Document Type

Honors Project


Children exhibit word-referent learning mechanisms such as statistical learning (SL), proposed by Yu and Smith (2007), and propose-but-verify (PBV), proposed by Medina (2011), but prior work has yet to investigate how children develop metalinguistic awareness under these two approaches. To evaluate how the corpus-data predictions of the SL and PBV mechanisms differ, this study proposes a learning bias: an Extralinguistic Reference Bias. Statistical learning predicts a constrained trajectory for children’s development of metalinguistic awareness. Children younger than approximately age 5 have limited access to metalinguistic language use while they are engaged in the initial mapping of forms to their primary, extralinguistic meanings. Children acquire metalinguistic language use only after first understanding extralinguistic reference through word-referent learning mechanisms.

Using data from the Child Language Data Exchange System (CHILDES), this corpus-linguistic study coded every token of four metalinguistic verb lemmas (say, ask, tell, talk) across all corpora of mainstream North American English varieties using random forest classification. If children have limited access to metalinguistic reference prior to age 5, as suggested by Piaget (1928) and Vygotsky (1962), their metalinguistic verb use occurs in fixed constructions that refer to speech acts, like say cheese, rather than in reported speech. Furthermore, some of the isolated tokens that appear to be reported speech are instances of children’s imitations of parents’ modeled speech.

Additionally, the development of metalinguistic awareness differs between the SL and PBV approaches. Under the SL mechanism, children would disregard any forms without an observable extralinguistic referent, whereas under the PBV mechanism, children would produce seemingly metalinguistic tokens without observable extralinguistic correlates.
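
The token-coding step described above (locating every token of say, ask, tell, and talk and separating fixed constructions from other uses) might be sketched as follows. This is only an illustration under stated assumptions, not the study's pipeline: a small lookup table stands in for real lemmatization, a single following-word check stands in for dependency parsing and random forest classification, and the names FORM_TO_LEMMA, FIXED_CONSTRUCTIONS, find_tokens, and count_labels are hypothetical. Only the fixed construction say cheese comes from the abstract.

```python
import re
from collections import Counter

# Inflected forms mapped to the four verb lemmas named in the abstract.
# Assumption: this lookup table replaces a real lemmatizer.
FORM_TO_LEMMA = {
    "say": "say", "says": "say", "said": "say", "saying": "say",
    "ask": "ask", "asks": "ask", "asked": "ask", "asking": "ask",
    "tell": "tell", "tells": "tell", "told": "tell", "telling": "tell",
    "talk": "talk", "talks": "talk", "talked": "talk", "talking": "talk",
}

# Hypothetical inventory of fixed constructions as (lemma, next word) pairs.
# Only "say cheese" is taken from the abstract.
FIXED_CONSTRUCTIONS = {("say", "cheese")}


def find_tokens(utterance):
    """Return (lemma, word_index, label) for each target verb in one utterance."""
    words = re.findall(r"[a-z']+", utterance.lower())
    hits = []
    for i, word in enumerate(words):
        lemma = FORM_TO_LEMMA.get(word)
        if lemma is None:
            continue  # not one of the four target lemmas
        next_word = words[i + 1] if i + 1 < len(words) else None
        # "fixed" if the verb plus the following word is a listed construction.
        label = "fixed" if (lemma, next_word) in FIXED_CONSTRUCTIONS else "other"
        hits.append((lemma, i, label))
    return hits


def count_labels(utterances):
    """Tally (lemma, label) pairs over a list of utterances."""
    counts = Counter()
    for utterance in utterances:
        for lemma, _, label in find_tokens(utterance):
            counts[(lemma, label)] += 1
    return counts
```

Calling count_labels on a list of child utterances yields one count per lemma and label. Tokens labeled "other" would still need further coding, for example to decide whether they are reported speech or imitations of parents' modeled speech.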

CHILDES-North America.xlsx (4536 kB)
Corpus linguistic study using spaCy dependency parser

Included in

Linguistics Commons



© Copyright is owned by author of this document