# Predictive Models of the Acquisition of Individual Words {#items-prediction}

Note:
~ *The contents of this chapter are lightly adapted from @braginsky2019.*

In this chapter, we take up the challenge posed in Chapter \@ref(items-consistency), that is, to explain consistency and variability in the acquisition of individual words. Our approach is to define regression models that attempt to predict which words are learned earlier or later on the basis of a range of features drawn from different data sources. We fit these models to data across different languages and then interpret the resulting coefficients to draw conclusions about the potential contribution of different factors to children's learning.

## Introduction

As discussed in Chapter \@ref(intro-theory), one classic approach to word learning focuses on the specific mechanisms that children bring to bear on the learning problem. Across many laboratory experiments, a variety of mechanisms have been identified as plausible drivers of early word learning, including co-occurrence-based and cross-situational word learning [@schwartz1983;@yu2007], social cue use [@baldwin1993], and syntactic bootstrapping [@gleitman1990;@mintz2003]. The individual contribution of each of these mechanisms has been difficult to assess, however.

Indeed, many theories of early word learning take the multiplicity of cue types and mechanisms as a central feature [e.g., @hollich2000;@bloom2000]. As important as this work is, these studies typically aim to understand how a small handful of words are learned in the laboratory under precisely defined learning conditions. They do not directly address how the composition and ordering of growth in the lexicon unfold across many different children in their natural environments, nor whether these patterns are consistent across languages.

An alternative approach to word learning -- one that we have been following throughout this book -- asks why some words are learned early and others much later. This question about the order of acquisition of first words provides a different window into the nature of children's language learning. In Chapter \@ref(items-consistency), we began approaching this question by examining the consistency of acquisition order for children's earliest words. In the current chapter, we pursue this goal further, using quantitative models to understand acquisition ordering.

Posed as a statistical problem, the challenge is to find the set of variables that best predicts the age at which different words are acquired. This approach was pioneered by @huttenlocher1991 and developed further by @goodman2008; it is now firmly established as an important method for understanding vocabulary learning at scale. This previous work has revealed that, in English, within a lexical category (e.g., nouns, verbs), words that are more frequent in speech to children are likely to be learned earlier [@goodman2008]. Further studies (also in English) have found evidence that age of acquisition is likely to be earlier for words that have more phonological neighbors [e.g., @storkel2004; @stokes2010; @jones2019; but see @swingley2007; @stager1997]; words that share more associations with other words in the learning environment [@hills2009]; words that occur more often in isolation [@brent2001; @swingley2018]; words whose meanings are more concrete [@swingley2018]; words that are rated more iconic and/or more associated with babies [such as "choo-choo" or "doggy", @perry2015]; and words that occur in more distinctive spatial, temporal, and linguistic contexts [@roy2015].

Each of these studies used a different dataset and focused on different predictors, however. In addition, nearly all analyzed data from English-learning children, providing no opportunity for cross-linguistic comparison of the relative importance of the many factors under consideration. In this chapter, we extend these approaches and assess the degree to which the predictors of word learning are consistent across languages and cultures, as well as whether they pattern similarly across word types (e.g., nouns vs. verbs).

We conduct cross-linguistic comparisons of the age of acquisition of particular words by integrating estimates of words' acquisition trajectories from the Wordbank data with independently derived characterizations of the word learning environment from other datasets. Using secondary datasets for these analyses is warranted because no currently available resource provides data on both children's language environments and their learning outcomes for more than a small handful of children. In particular, we derive our estimates of the language environment from transcripts of speech to children in the CHILDES database [@macwhinney2000]. This data-integration methodology was originated by @goodman2008; it relies on large samples to average out the (substantial) differences between children and care environments. It is a conservative strategy, because effects can only emerge if there are substantial commonalities across families. While it introduces additional sources of variability, it also allows for analyses that cannot be performed on smaller datasets or on datasets that measure only the child or only the environment.

As our measures of environmental input, we estimated each word's (a) frequency in parental speech to children, (b) mean length in words of the parental utterances containing that word (MLU-w), (c) frequency as a one-word utterance, and (d) frequency as the final word in an utterance. While these measures are crude, they are easy to compute and relatively comparable across the languages in our sample. As proxies for the meaning-based properties of each word, we used available psycholinguistic norms based on adult ratings of each word's (a) concreteness, (b) valence, (c) arousal, and (d) association with babies. Integrating these environmental and meaning-based measures, we predict each word's acquisition trajectory. We assess the relative contribution of each predictor, as well as how the predictors change over development and interact with the lexical category of the word being predicted.

These analyses address two questions. First, we ask how consistent the relative importance of each predictor is across languages. Consistency in the patterning of predictors would suggest that similar information sources are important for learners regardless of language, and that superficial linguistic dissimilarities (e.g., greater morphological complexity in Russian and Turkish, greater phonological complexity in Danish) do not dramatically alter the course of acquisition. Conversely, variability would show the degree to which learners face different challenges in learning different languages, posing a challenge for more universalist accounts. Further, systematicity in the variability between languages would reveal which languages are more similar to one another in the structure of these challenges.

Second, we ask which lexical categories are most influenced by linguistic environment factors, like frequency and utterance length, compared with meaning-based factors, like concreteness and valence. Division of dominance theory suggests that nouns might be more sensitive to meaning factors, while predicates and closed-class words might be more sensitive to linguistic environment factors [@gentner2001]. Following syntactic bootstrapping theories [@gleitman1990], nouns are argued to be learned via frequent co-occurrence patterns in the input (operationalized here by frequency), while verbs might be more sensitive to syntactic factors [operationalized here by utterance length; @snedeker2007]. Thus, examining the relative contribution of different predictors across lexical categories can help test the predictions of influential theories of acquisition.

## Methods

```{r aoapred-prereqs, child = "_items-prediction.Rmd"}
```

### Acquisition trajectories

Since the analyses in this chapter rely on unilemma mappings (see Section \@ref(unilemmas)), the set of languages represented is smaller than in other chapters. We use data from the items on WG forms for our comprehension measure, and data from the items in common between WG and WS forms for our production measure. Placeholder items, such as "child's own name," are excluded, as are longitudinal administrations. Table \@ref(tab:aoapred-lang-stats-table) gives an overview of our acquisition data.

```{r aoapred-lang-stats-table}
# Number of unilemma-mapped items included per language
uni_lemma_info <- uni_model_data %>%
  group_by(language) %>%
  summarise(num_included = n_distinct(uni_lemma))

# Every administration counts for production; only WG counts for comprehension
measure_admins <- admins %>%
  mutate(produces = TRUE, understands = form == "WG") %>%
  select(-form)

measure_sample_sizes <- bind_rows(
  measure_admins %>% filter(produces) %>% mutate(measure = "produces"),
  measure_admins %>% filter(understands) %>% mutate(measure = "understands")
)

instrument_info <- measure_sample_sizes %>%
  group_by(language, measure) %>%
  summarise(num_admins = n(),
            min_age = min(age),
            max_age = max(age)) %>%
  mutate(age_range = paste(min_age, max_age, sep = "-")) %>%
  select(-min_age, -max_age) %>%
  pivot_wider(names_from = measure, values_from = c(num_admins, age_range))

lang_stats <- instrument_info %>%
  left_join(uni_lemma_info, by = "language") %>%
  left_join(childes_sizes) %>%
  ungroup() %>%
  mutate(language = str_remove(language, " \\(.*\\)$")) %>%
  select(language, ni = num_included, pna = num_admins_produces,
         par = age_range_produces,
         una = num_admins_understands, uar = age_range_understands,
         types, tokens)

kable(lang_stats, format = opts_knit$get("rmarkdown.pandoc.to"),
      escape = FALSE, format.args = list(big.mark = ","),
      caption = "Statistics for data from Wordbank and CHILDES. N indicates number of children; ages are in months.",
      col.names = c("Language", "Items", "N", "Ages", "N", "Ages", "Types", "Tokens")) %>%
  add_header_above(c("", "", "Production" = 2, "Comprehension" = 2, "CHILDES" = 2)) %>%
  kable_styling(latex_options = "scale_down")
```

See Figure \@ref(fig:aoapred-demotraj) for example smoothed empirical item curves of the type being predicted in our subsequent analyses.

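
The curves in the figure are binomial GLMs of the proportion of children producing a word as a function of age, weighted by the number of children at each age. The base-R sketch below fits such a curve for a single word. The counts are invented for illustration, and summarising the curve by the age at which it crosses 50% is a common convention assumed here, not a quantity the models in this chapter depend on.

```r
# Hypothetical counts of children producing / not producing one word at each age
word_props <- data.frame(
  age       = 16:30,
  num_true  = c(2, 4, 6, 10, 15, 22, 30, 38, 46, 53, 58, 63, 66, 69, 71),
  num_false = c(78, 76, 74, 70, 65, 58, 50, 42, 34, 27, 22, 17, 14, 11, 9)
)

# Logistic curve: the log-odds of producing the word is linear in age.
# cbind(successes, failures) weights each age by its number of children.
fit <- glm(cbind(num_true, num_false) ~ age,
           family = binomial, data = word_props)

# Age at which the fitted curve crosses 50%: solve b0 + b1 * age = 0
b <- coef(fit)
aoa_50 <- unname(-b[1] / b[2])

# Fitted proportion at each age (what a smoothed line would trace)
word_props$fitted <- predict(fit, type = "response")
```
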
```{r aoapred-demotraj, fig.width=7, fig.height=3.5, fig.cap='Example production trajectories for the words "dog" and "jump" across languages. Points show the proportion of children producing each word for each one-month age group. Lines show the best-fitting logistic curve. Labels show the forms of the words in each language.'}
demo_lemmas <- c("dog", "jump")

demo_data <- uni_prop_data %>%
  ungroup() %>%
  filter(uni_lemma %in% demo_lemmas,
         measure == "produces") %>%
  mutate(language = language %>% str_remove(" \\(.*\\)"))

# Word forms per language, placed in the top-left ("dog") or
# bottom-right ("jump") corner of each panel
word_data <- demo_data %>%
  select(language, uni_lemma, items) %>%
  distinct(language, uni_lemma, .keep_all = TRUE) %>%
  unnest(cols = items) %>%
  mutate(x = ifelse(uni_lemma == demo_lemmas[1], 8, 35),
         y = ifelse(uni_lemma == demo_lemmas[1], 1, 0))

ggplot(demo_data, aes(x = age, y = prop, colour = uni_lemma)) +
  facet_wrap(~language, ncol = 5) +
  geom_point(size = 0.8, alpha = 0.4) +
  # Logistic curve per word, weighted by the number of children at each age
  geom_smooth(aes(weight = num_true + num_false), method = "glm",
              method.args = list(family = "binomial"),
              se = FALSE, size = 1) +
  geom_label(aes(x = x, y = y, label = definition), data = word_data, size = 3,
             label.padding = unit(0.15, "lines"), vjust = "inward",
             hjust = "inward", family = .font) +
  .scale_colour_discrete(guide = "none") +
  scale_y_continuous(name = "Proportion of children producing") +
  scale_x_continuous(name = "Age (months)", breaks = seq(10, 30, 10))
```

### Word properties

For each word that appears on the forms in each of our `r num_langs` languages, we used corpora of child-directed speech in that language from CHILDES to estimate its frequency, the mean length of the utterances in which it appears, its frequency as the sole constituent of an utterance, and its frequency in utterance-final position (with frequency residualized out of the solo and final frequencies). Additionally, we computed each word's length in phonemes.

To capture meaning-based factors in acquisition, we included ratings of each word's concreteness, valence, arousal, and relatedness to babies. All of these ratings were compiled from previous studies using adult raters. Because these ratings are available primarily for English, we used the unilemma mappings (see Section \@ref(unilemmas)) to apply the rating for each English word to the corresponding words in the other languages. Example words for these predictors in English are shown in Table \@ref(tab:aoapred-extremes-table).

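
Applying the English norms in another language amounts to a left join on the unilemma. Below is a minimal sketch with base R's `merge()`, assuming hypothetical column names (`uni_lemma`, `concreteness`, `babiness`) and made-up rating values. Items whose unilemma has no English rating come out as `NA`, to be filled in later by imputation.

```r
# Hypothetical items from one language's CDI, each mapped to an English unilemma
items <- data.frame(
  language   = "Spanish",
  definition = c("perro", "saltar", "pelota", "ahora"),
  uni_lemma  = c("dog", "jump", "ball", "now"),
  stringsAsFactors = FALSE
)

# Hypothetical English norms keyed by unilemma (not every word is rated)
english_norms <- data.frame(
  uni_lemma    = c("dog", "jump", "ball"),
  concreteness = c(4.9, 4.0, 5.0),
  babiness     = c(6.5, 5.2, 8.1),
  stringsAsFactors = FALSE
)

# Left join: every CDI item is kept; unrated unilemmas get NA ratings
items_rated <- merge(items, english_norms, by = "uni_lemma", all.x = TRUE)
```
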
```{r aoapred-extremes-table}
num_extremes <- 3

# For each language and predictor, find the items with the highest and lowest
# values (comprehension rows only, so that each item appears once)
extremes <- uni_model_data %>%
  filter(measure == "understands") %>%
  select(language, uni_lemma, all_of(predictors)) %>%
  distinct() %>%
  mutate(uni_lemma = gsub("(.*) \\(.*\\)", "\\1", uni_lemma)) %>%
  split(.$language) %>%
  map_df(function(lang_data) {
    map_df(predictors, function(predictor) {
      # For corpus-derived predictors, drop items tied at the minimum frequency
      if (predictor %in%
          c("frequency", "final_frequency", "solo_frequency", "MLU")) {
        filtered_lang_data <- lang_data %>%
          filter(frequency != min(frequency))
      } else {
        filtered_lang_data <- lang_data
      }
      highest <- filtered_lang_data %>%
        arrange(desc(.data[[predictor]])) %>%
        pull(uni_lemma) %>%
        head(num_extremes)
      lowest <- filtered_lang_data %>%
        arrange(.data[[predictor]]) %>%
        pull(uni_lemma) %>%
        head(num_extremes)
      data.frame(language = unique(lang_data$language),
                 Predictor = predictor,
                 highest = paste(highest, collapse = ", "),
                 lowest = paste(rev(lowest), collapse = ", "))
    })
  })

extremes_display <- extremes %>%
  filter(language == "English (American)") %>%
  select(-language) %>%
  arrange(Predictor) %>%
  mutate(Predictor = display_predictors(Predictor)) %>%
  select(Predictor, Lowest = lowest, Highest = highest)

kable(extremes_display, align = "llr",
      caption = "Items with the highest and lowest values for each predictor in English.") %>%
  kable_styling(latex_options = "scale_down")
```

Previous studies have shown robust consistency in the types of words that children learn very early [@tardif2008]. These words seem to describe concepts that are important or exciting in the lives of infants in a way that standard psycholinguistic features like concreteness do not capture. Capturing this intuition quantitatively is difficult, but @perry2015 provide a proxy measure as a first step: the degree to which a particular word is rated as "associated with babies." Intuitively, we expect this measure to capture the degree to which words like _ball_ or _bottle_ feature heavily in the environment (and presumably the mental life) of many babies.

Each numeric predictor was centered and scaled so that all predictors would have comparable units.

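
Centering and scaling turns each predictor into a z-score, `(x - mean(x)) / sd(x)`, so that each coefficient describes the effect of a one-standard-deviation change in that predictor. A short base-R illustration with made-up values:

```r
# Hypothetical raw predictor values for five words
raw <- data.frame(
  frequency    = c(-9.2, -7.5, -6.1, -8.3, -5.4),
  num_phonemes = c(3, 5, 4, 7, 2)
)

# Center (subtract the mean) and scale (divide by the SD) every column.
# scale() returns a matrix; as.data.frame() restores a data frame.
scaled <- as.data.frame(scale(raw))

# The same transformation written out by hand for one column
z_freq <- (raw$frequency - mean(raw$frequency)) / sd(raw$frequency)
```
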
**Frequency.** For each language, we estimated word frequency from unigram counts based on all corpora in CHILDES for that language (all corpora from a given language were included, regardless of dialect). Frequencies varied widely both within and across lexical categories. Each word's count includes the counts of words that share the same stem (so that _dogs_ counts as _dog_) or are synonymous (so that _father_ counts as _daddy_). For polysemous word pairs (e.g., _orange_ as a color or a fruit), occurrences of the word in the corpus were split uniformly between the senses on the CDI (there were only between `r min(poly$n)` and `r max(poly$n)` such word pairs in the various languages; in the absence of cross-linguistic corpus resources for word-sense disambiguation, this is a necessary simplification). Counts were normalized to the length of each corpus, Laplace smoothed (i.e., counts of 0 were replaced with counts of 1), and then log transformed.

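
The frequency pipeline can be sketched in base R for a toy corpus. The forms, counts, corpus size, and form-to-item table below are hypothetical. The sketch sums stems and synonyms into one item, splits a polysemous form's count evenly between its CDI senses, replaces zero counts with one (applying this to the raw counts before normalizing is an assumption about ordering), normalizes by corpus length, and takes logs.

```r
# Hypothetical token counts from a child-directed speech corpus
token_counts  <- c(dog = 30, dogs = 12, daddy = 50, father = 5,
                   orange = 40, zebra = 0)
corpus_tokens <- 100000  # total tokens in the (hypothetical) corpus

# Hypothetical mapping from corpus forms to CDI items: stems and synonyms
# collapse onto one item; the polysemous form "orange" feeds two items
form_map <- data.frame(
  form = c("dog", "dogs", "daddy", "father", "orange", "orange", "zebra"),
  item = c("dog", "dog", "daddy", "daddy",
           "orange (color)", "orange (fruit)", "zebra"),
  stringsAsFactors = FALSE
)

# Split each form's count uniformly across the items it maps to
senses_per_form <- table(form_map$form)
form_map$count <- token_counts[form_map$form] /
  as.numeric(senses_per_form[form_map$form])

# Sum forms into item-level counts
item_counts <- tapply(form_map$count, form_map$item, sum)

# Replace zero counts with one, normalize by corpus size, log-transform
smoothed  <- ifelse(item_counts == 0, 1, item_counts)
frequency <- log(smoothed / corpus_tokens)
```
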
**Solo and Final Frequencies.** Using the same dataset as for frequency, we estimated the frequency with which each word occurs as the sole word in an utterance, and the frequency with which it appears as the final word of an utterance (not counting single-word utterances). As with frequency, solo and final counts were normalized to the length of each corpus, Laplace smoothed, and log transformed. Since both of these estimates are necessarily highly correlated with frequency, we then residualized unigram frequency out of both of them, so that their values reflect the effects of solo frequency and final frequency over and above frequency itself.

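
Residualizing amounts to regressing solo (or final) frequency on frequency and keeping the residuals. A base-R sketch with invented, already log-transformed values for eight words; final frequency would be handled the same way.

```r
# Hypothetical log frequencies for eight words
words <- data.frame(
  frequency      = c(-5.1, -6.3, -7.0, -7.8, -8.4, -9.0, -9.9, -10.5),
  solo_frequency = c(-7.0, -8.9, -8.8, -10.6, -10.1, -11.9, -12.0, -13.4)
)

# Regress solo frequency on frequency and keep the residuals: what remains
# is the part of solo frequency that frequency does not linearly predict
words$solo_frequency_resid <- resid(lm(solo_frequency ~ frequency, data = words))
```
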
**MLU-w.** MLU-w is only a rough proxy for syntactic complexity, but unlike other metrics it is relatively straightforward to compute across languages. For each language, we estimated each word's MLU-w by calculating the mean length in words of the utterances in which that word appeared, across all corpora in CHILDES for that language. For words that occurred fewer than 10 times, MLU-w estimates were treated as missing.

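
A base-R sketch of the MLU-w computation on a toy set of tokenized utterances. The function name `word_mlu` is hypothetical, and so is one detail: each utterance is counted once per word type, so the occurrence threshold here counts utterances containing the word rather than tokens.

```r
# Hypothetical child-directed utterances, already tokenized into words
utterances <- list(
  c("look", "at", "the", "dog"),
  c("dog"),
  c("where", "is", "the", "ball"),
  c("the", "dog", "is", "jumping", "now"),
  c("ball")
)

word_mlu <- function(utterances, min_count = 10) {
  lens  <- lengths(utterances)            # utterance lengths in words
  types <- lapply(utterances, unique)     # each word once per utterance
  occ   <- data.frame(word = unlist(types),
                      len  = rep(lens, lengths(types)),
                      stringsAsFactors = FALSE)
  mlu <- tapply(occ$len, occ$word, mean)   # mean length of containing utterances
  n   <- tapply(occ$len, occ$word, length) # number of containing utterances
  mlu[n < min_count] <- NA                 # too rare: treat as missing
  mlu
}

# The toy corpus is tiny, so lower the threshold to see some estimates
mlu_toy <- word_mlu(utterances, min_count = 2)
```
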
**Number of phonemes.** In the absence of consistent resources for cross-linguistic pronunciation, we computed the number of phonemes in each word in each language based on phonemic transcriptions of each word obtained using the `eSpeak` tool [@duddington2012]. We then spot-checked these transcriptions for accuracy.

**Concreteness.** We used previously collected norms for concreteness [@brysbaert2014], which were gathered by asking adult participants to rate how concrete the meaning of each word is on a 5-point scale from abstract to concrete.

**Valence and Arousal.** We also used previously collected norms for valence and arousal [@warriner2013], for which adult participants were asked to rate words on a 1-9 happy-unhappy scale (valence) and a 1-9 excited-calm scale (arousal).

**Babiness.** Lastly, we used previously collected norms of "babiness," a measure of association with infancy [@perry2015], for which adult participants were asked to judge a word's association with babies on a 1-10 scale.

**Lexical category.** Category was determined on the basis of the conceptual categories presented on the CDI form (e.g., "Animals", "Action Words"), such that the Nouns category contains common nouns, Predicates contains verbs and adjectives, and Function Words contains closed-class words [following @bates1994]; the remaining items are binned as Other.

**Imputation.** The resulting sets of predictor values for each language had varying numbers of missing values, depending on resource availability (number of phonemes 0%, concreteness 0%-1%, arousal and valence 8%-13%, [solo/final] frequency 2%-14%, babiness 10%-33%, MLU-w 2%-53%). We used iterative regression imputation to fill in these missing values (separately within each language): we first replaced missing values with samples drawn randomly with replacement from the observed values, and then iteratively re-imputed each predictor's missing values from a linear regression predicting that predictor from all the others.

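
The two-stage procedure (random fill-in, then repeated regression) can be sketched in base R as follows. This is a simplified stand-in rather than the chapter's code: the function `impute_iterative`, the fixed number of iterations, and the use of deterministic regression predictions (no added noise) are assumptions.

```r
# Hypothetical iterative regression imputation for a numeric data frame
impute_iterative <- function(df, n_iter = 10, seed = 1) {
  set.seed(seed)
  missing <- lapply(df, is.na)  # remember which cells were originally missing
  # Stage 1: fill each gap with a random draw (with replacement)
  # from the observed values of the same predictor
  for (col in names(df)) {
    miss <- missing[[col]]
    if (any(miss)) {
      observed <- df[[col]][!miss]
      df[[col]][miss] <- observed[sample.int(length(observed), sum(miss),
                                             replace = TRUE)]
    }
  }
  # Stage 2: repeatedly re-predict each predictor's missing cells from a
  # linear regression on all other predictors, fit to its observed rows
  for (iter in seq_len(n_iter)) {
    for (col in names(df)) {
      miss <- missing[[col]]
      if (!any(miss)) next
      fit <- lm(reformulate(setdiff(names(df), col), response = col),
                data = df[!miss, , drop = FALSE])
      df[[col]][miss] <- predict(fit, newdata = df[miss, , drop = FALSE])
    }
  }
  df
}

# Usage on simulated predictors with a few gaps
set.seed(42)
preds <- data.frame(frequency = rnorm(50), concreteness = rnorm(50),
                    babiness = rnorm(50))
preds$babiness[c(3, 10, 27)] <- NA
preds$concreteness[5] <- NA
imputed <- impute_iterative(preds)
```
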
**Collinearity.** A potential concern for comparing coefficient estimates is predictor collinearity. Fortunately, in every language, the only relatively large correlations are between MLU-w and solo frequency (mean over languages `r mean_pair_cor("MLU", "solo_frequency")`), as expected given the similarity of these factors. There are also modest correlations between frequency and concreteness (mean over languages `r mean_pair_cor("concreteness", "frequency")`) and between frequency and number of phonemes (mean over languages `r mean_pair_cor("frequency", "num_phons")`), the latter a reflection of Zipf's law of abbreviation, whereby more frequent words tend to be shorter [@zipf1935]. More importantly, the variance inflation factor for each predictor in each language is no greater than `r roundp(max(vifs$vif), 1)`, indicating that multicollinearity among the predictors is low.

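
The variance inflation factor for predictor j is `1 / (1 - R_j^2)`, where `R_j^2` comes from regressing predictor j on all the other predictors; values near 1 indicate little shared variance. A base-R sketch on simulated data follows. The function `vif_base` is hypothetical; packages such as `car` provide a `vif()` function that works on fitted models.

```r
# VIF for each column of a predictor data frame
vif_base <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(col) {
    r2 <- summary(lm(reformulate(setdiff(names(X), col), response = col),
                     data = X))$r.squared
    1 / (1 - r2)
  })
}

# Simulated predictors: a modest correlation and an independent predictor
set.seed(7)
n <- 200
frequency    <- rnorm(n)
concreteness <- 0.3 * frequency + rnorm(n)
valence      <- rnorm(n)
vifs_demo <- vif_base(data.frame(frequency, concreteness, valence))
```
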
### Analysis

We used mixed-effects logistic regression models [fit with the `MixedModels` package in `Julia`; @bates2018] to predict whether each child understands or produces each word from the child's age, properties of the word, interactions between each property and age, and interactions between each property and lexical category (which was contrast coded). Each model was fit to all data from a particular language and included a random intercept and a random slope of age for each word. Computational and technical limitations prevented us from including random effects for children or fitting data from all languages in one joint model.

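
The chapter's models were fit in Julia, but their structure can be written as an R formula in `lme4` syntax, which may help readers who work in R. The sketch below only builds the formula. The predictor and column names are hypothetical, as are two choices the text above does not specify: including a main effect of lexical category, and using sum-to-zero contrasts.

```r
# Hypothetical predictor names, assumed to be centered and scaled columns
predictors <- c("frequency", "solo_frequency", "final_frequency", "MLU",
                "num_phons", "concreteness", "valence", "arousal", "babiness")

# Fixed effects: age, each predictor, predictor-by-age interactions, a
# category main effect (assumed), and predictor-by-category interactions
fixed <- c("age", predictors, "lexical_category",
           paste0("age:", predictors),
           paste0("lexical_category:", predictors))

# Random intercept and random age slope for each word: (age | item)
model_formula <- reformulate(c(fixed, "(age | item)"), response = "produces")

# Contrast coding for the category factor (sum-to-zero is an assumption)
lexical_category <- factor(c("nouns", "predicates", "function_words", "other"))
contrasts(lexical_category) <- contr.sum(4)

# With data in hand, an lme4 analogue of the Julia model would be:
# lme4::glmer(model_formula, data = item_data, family = binomial)
```
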
The magnitude of the standardized coefficient on each property estimates its independent contribution to words being understood or produced by more children. Interactions between properties and age estimate how this effect changes with age, and hence how it differs for earlier-learned and later-learned words. For example, a positive effect of babiness means that words associated with babies are known by more children; a negative interaction with age means that this advantage is larger for younger children than for older children. Similarly, interactions between properties and lexical category estimate how each effect differs among nouns, predicates, and function words.

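
A worked example of reading an age interaction, with invented coefficients on the log-odds scale (age and babiness both standardized). The effect of a one-standard-deviation increase in babiness at a given age is `b_babiness + b_interaction * age`, so a negative interaction shrinks the effect as age increases.

```r
# Hypothetical coefficients on the log-odds scale
b <- c(intercept = -0.5, age = 1.5, babiness = 0.5, age_x_babiness = -0.2)

# Log-odds that a child knows a word, given scaled age and babiness
log_odds <- function(age, babiness) {
  b[["intercept"]] + b[["age"]] * age + b[["babiness"]] * babiness +
    b[["age_x_babiness"]] * age * babiness
}

# Effect of +1 SD of babiness (log-odds) at a younger vs an older age
effect_young <- log_odds(-1, 1) - log_odds(-1, 0)  # 0.5 - 0.2 * (-1)
effect_old   <- log_odds( 1, 1) - log_odds( 1, 0)  # 0.5 - 0.2 * (+1)

# The same contrast on the probability scale (babiness 0 vs +1 SD)
p_young <- plogis(log_odds(-1, c(0, 1)))
p_old   <- plogis(log_odds( 1, c(0, 1)))
```
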
## Results

**English predictor effects.** To illustrate the structure of our analysis, we first describe the results for English data. Figure\ \@ref(fig:aoapred-refcoefs) shows the main-effect and age-interaction coefficient estimates, with 95% confidence intervals, for comprehension and production. For main effects, words are more likely to be known by more children if they are higher in frequency or concreteness, as well as if they are higher in babiness (for comprehension) or in final frequency or solo frequency (for production). Conversely, MLU-w has a negative effect: words that appear in shorter utterances are more likely to be reported as understood or produced. For age interactions, most predictors have consistent effects over age, but the effects of frequency and concreteness are larger for older children, whereas the effect of valence is larger for younger children. In addition, babiness has a negative interaction with age in comprehension, and MLU-w has a positive interaction with age in production.

**Cross-linguistic predictor effects.** Figure\ \@ref(fig:aoapred-langcoefs) shows the coefficient estimate for each predictor in each language and measure. Frequency is the strongest predictor of acquisition (mean across languages and measures `r mean_term_coef("frequency")`). Other relatively strong predictors overall include concreteness (`r mean_term_coef("concreteness")`), solo frequency (`r mean_term_coef("solo_frequency")`), MLU-w (`r mean_term_coef("MLU")`), and final frequency (`r mean_term_coef("final_frequency")`). The effect of number of phonemes is comparatively large for production (`r mean_term_measure_coef("produces", "num_phons")`) but not for comprehension (`r mean_term_measure_coef("understands", "num_phons")`); conversely, the effect of babiness is comparatively large for comprehension (`r mean_term_measure_coef("understands", "babiness")`) but not for production (`r mean_term_measure_coef("produces", "babiness")`). Finally, valence (`r mean_term_coef("valence")`) and arousal (`r mean_term_coef("arousal", 3)`) have much smaller effects.

Given the emphasis on frequency effects in the literature [@ambridge2015], one might have expected frequency to dominate, but several other predictors are also quite strong. In addition, some factors previously argued to be important for word learning, namely valence and arousal [@moors2013], appear to have limited relevance compared with other factors. These results provide a strong argument for our approach of including multiple predictors and languages in the analysis.

```{r aoapred-refcoefs, fig.width=7, fig.height=4.5, fig.cap="(ref:aoapred-refcoefs-cap)"}
ggplot(ref_coefs, aes(x = estimate, y = term)) +
  facet_grid(language + measure ~ effect, scales = "free",
             labeller = as_labeller(label_caps)) +
  # Point estimates with 95% Wald intervals (estimate +/- 1.96 SE)
  ggstance::geom_pointrangeh(aes(colour = term, shape = signif,
                                 xmin = estimate - 1.96 * std_error,
                                 xmax = estimate + 1.96 * std_error)) +
  geom_vline(xintercept = 0, color = .grey, linetype = "dotted") +
  .scale_colour_discrete(guide = "none") +
  scale_shape_manual(values = c(19, 21), guide = "none") +
  labs(y = "", x = "Coefficient estimate")
```

(ref:aoapred-refcoefs-cap) Estimates of coefficients in predicting words' developmental trajectories for English comprehension and production data. Error bars indicate 95% confidence intervals; filled points indicate coefficients for which _p_ < 0.05.

```{r aoapred-langcoefs, fig.width=7, fig.height=4.5, fig.cap="(ref:aoapred-langcoefs-cap)"}
ggplot(plt_lang_coefs, aes(x = estimate, y = term, colour = term)) +
  facet_grid(measure ~ effect, scales = "free",
             labeller = as_labeller(label_caps)) +
  geom_point(aes(shape = signif), size = 1, alpha = 0.4) +
  # Cross-language mean for each predictor, drawn as a crossbar
  ggstance::stat_summaryh(geom = "crossbarh", fun.x = mean, fun.xmin = mean,
                          fun.xmax = mean, fatten = 3) +
  geom_vline(xintercept = 0, color = .grey, linetype = "dotted") +
  .scale_colour_discrete(guide = "none") +
  scale_shape_manual(values = c(19, 21), guide = "none") +
  labs(x = "Coefficient estimate", y = "")
```

(ref:aoapred-langcoefs-cap) Estimates of coefficients in predicting words' developmental trajectories for all languages and measures. Each point represents a predictor's coefficient in one language, and each bar shows the mean across languages. Filled points indicate coefficients for which _p_ < 0.05.

**Consistency.** Apart from valence and arousal, all predictors have the same direction of effect in all or almost all languages and measures (at least `r most_opposite(c("arousal", "valence"))` of the `r num_langs * 2` language-measure combinations). Thus, across languages, words are likely to be understood and produced by more children if they are more frequent, shorter, more concrete, more frequently the only word in an utterance, more associated with babies, more frequently the final word in an utterance, and appear in shorter utterances.

Additionally, there is considerable consistency in the magnitudes of the predictors across languages. _A priori_, different languages could have shown wildly different effects of various factors (due to linguistic or cultural differences), but this is not what we observe. Instead, the coefficients of different languages are more strongly correlated with one another than would be expected by chance. As shown in Figure\ \@ref(fig:aoapred-consistency), each language's mean pairwise correlation with other languages' coefficients (i.e., the correlation of the English coefficients with the Russian coefficients, with the Spanish coefficients, and so on) lies outside the bootstrapped 95% confidence interval of a randomized baseline created by shuffling predictor coefficients within language. The pairwise correlations are more consistent for production (mean _r_ = `r roundp(mean(filter(plt_coef_summary, measure == "produces")$mean_cor))`) than for comprehension (mean _r_ = `r roundp(mean(filter(plt_coef_summary, measure == "understands")$mean_cor))`), for which French and Russian effects are more idiosyncratic.

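
The consistency analysis can be sketched in base R on a simulated languages-by-predictors coefficient matrix. Each language's observed value is its mean correlation with the other languages; the baseline shuffles coefficients across predictors within each language and takes the 2.5% and 97.5% quantiles over many shuffles. The simulated coefficients, the number of shuffles, and the use of simple permutation quantiles are assumptions; the chapter describes a bootstrapped interval, whose exact resampling scheme may differ.

```r
# Simulated coefficients: a shared profile plus language-specific noise
set.seed(3)
shared <- c(frequency = 0.8, concreteness = 0.5, solo = 0.3, MLU = -0.3,
            phonemes = -0.2, babiness = 0.2, valence = 0.0, arousal = 0.0)
langs <- c("English", "Spanish", "Russian", "Turkish", "Danish", "French")
coefs <- t(sapply(langs, function(lang) shared + rnorm(length(shared), sd = 0.15)))

# Each language's mean correlation with every other language's coefficients
mean_pairwise_cor <- function(m) {
  r <- cor(t(m))   # language-by-language correlation matrix
  diag(r) <- NA    # drop self-correlations
  rowMeans(r, na.rm = TRUE)
}
observed <- mean_pairwise_cor(coefs)

# Baseline: shuffle coefficients within each language (row), recompute,
# and summarise the 95% range over many shuffles
baseline    <- replicate(1000, mean_pairwise_cor(t(apply(coefs, 1, sample))))
baseline_ci <- apply(baseline, 1, quantile, probs = c(0.025, 0.975))
above_baseline <- observed > baseline_ci["97.5%", ]
```
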
```{r aoapred-consistency, fig.width=6, fig.height=3, fig.cap="Correlations of coefficient estimates between languages. Each point represents the mean of one language's coefficients' correlation with each other language's coefficients, with the vertical line indicating the overall mean across languages. The shaded region and line show a bootstrapped 95\\% confidence interval of a randomized baseline where predictor coefficients are shuffled within language."}
ggplot(plt_coef_summary, aes(x = mean_cor, y = language)) +
  facet_grid(. ~ measure, labeller = as_labeller(label_caps)) +
  # Overall mean across languages for each measure
  geom_vline(aes(xintercept = mean_cor), colour = .grey, size = 0.4,
             data = plt_coef_summary %>% group_by(measure) %>%
               summarise(mean_cor = mean(mean_cor))) +
  geom_point(aes(colour = language), size = 2) +
  # Randomized baseline: 95% interval (shaded) and mean (segment) per language
  geom_rect(aes(xmin = ci_lower_cor, xmax = ci_upper_cor,
                ymin = as.numeric(language) - 0.4,
                ymax = as.numeric(language) + 0.4,
                fill = language),
            data = plt_baseline_coef_summary,
            alpha = .2, linetype = 0) +
  geom_segment(aes(x = mean_cor, xend = mean_cor,
                   y = as.numeric(language) - 0.4,
                   yend = as.numeric(language) + 0.4),
               data = plt_baseline_coef_summary,
               colour = .grey) +
  scale_x_continuous(breaks = seq(0, 1, 0.2)) +
  labs(x = "Mean correlation with other languages' coefficients",
       y = "") +
  scale_colour_manual(values = lang_colours_aoa, guide = "none") +
  scale_fill_manual(values = lang_colours_aoa, guide = "none")
```

**Variability.** While some particular coefficients differ substantially from the trend across languages (e.g., the effect of frequency for comprehension in Spanish is near 0), these individual datapoints are difficult to interpret. Many unmeasurable factors could potentially account for these differences: Spanish frequency estimates could be less accurate due to corpus sparsity or idiosyncrasy, the samples of children in the Spanish CDI and CHILDES data could differ more demographically, or Spanish-learning children could in fact rely less on frequency. Rather than attempting to interpret individual coefficients, we instead ask how the patterns of difference among languages reflect systematic substructure in the variability of the effects.

To examine the substructure of predictor variability, we followed Chapter \@ref(items-consistency) in using hierarchical clustering analysis to find the similarity structure among the pairwise correlations between languages' coefficients. The resulting dendrograms (Figure\ \@ref(fig:aoapred-clustering)) broadly reflect language typology, especially for production data. This result suggests that the profile of coefficient magnitudes returned by our analysis captures some of the similarity between languages.

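
A base-R sketch of clustering languages by their coefficient profiles, using made-up coefficients. Converting correlations to distances as `1 - r` and using average linkage are assumptions; this section does not specify which distance or linkage was used.

```r
# Made-up coefficient matrix: rows are languages, columns are predictors
coefs <- rbind(
  English = c(0.80, 0.50, 0.30, -0.30, -0.20, 0.20),
  Danish  = c(0.75, 0.55, 0.25, -0.35, -0.15, 0.20),
  Spanish = c(0.40, 0.60, 0.10, -0.10, -0.40, 0.30),
  Italian = c(0.45, 0.65, 0.15, -0.05, -0.45, 0.25)
)
colnames(coefs) <- c("frequency", "concreteness", "solo", "MLU",
                     "phonemes", "babiness")

# Turn language-by-language correlations into distances, then cluster
d  <- as.dist(1 - cor(t(coefs)))
hc <- hclust(d, method = "average")

# Cut the tree into two groups; plot(hc) would draw the dendrogram
groups <- cutree(hc, k = 2)
```
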
```{r aoapred-clustering, fig.width=5, fig.height=3.5, fig.cap="Dendrograms of the similarity structure among languages' coefficients."}
# Dendrogram line segments for each measure
coef_clust_segments <- coef_clust %>%
  mutate(segment = map(clust, ~ggdendro::segment(.x))) %>%
  select(-data, -clust) %>%
  unnest(cols = segment)

# Language family for each language, read from a YAML mapping
typology_list <- yaml::read_yaml("data/items-prediction/typology.yaml")
typology <- tibble(label = names(typology_list),
                   family = unlist(typology_list, use.names = FALSE))

# Leaf labels for each dendrogram, coloured by language family
coef_clust_labels <- coef_clust %>%
  mutate(segment = map(clust, ~ggdendro::label(.x))) %>%
  select(-data, -clust) %>%
  unnest(cols = segment) %>%
  mutate(label = label %>% str_remove(" \\(.*\\)")) %>%
  left_join(typology, by = "label")

plt <- ggplot(coef_clust_segments) +
  facet_grid(. ~ measure, scales = "free",
             labeller = as_labeller(label_caps)) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_text(aes(x = x, y = y - 0.02, label = label, colour = family),
            data = coef_clust_labels, hjust = 0, family = .font) +
  coord_flip() +
  scale_x_reverse() +
  scale_y_reverse() +
  .scale_color_discrete(name = "Language family",
                        guide = guide_legend(title.position = "top",
                                             title.hjust = 0.5,
                                             override.aes = list(size = 0),
                                             keyheight = unit(0, "lines"))) +
  expand_limits(y = -1) +
  theme_get() +
  theme(axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        legend.position = "bottom",
        legend.text = element_blank(),
        legend.title = element_text(size = rel(0.8), margin = margin(b = -8)),
        legend.margin = margin(t = 0),
        panel.border = element_blank())

# Legend row: family names as coloured text, matching the dendrogram labels
lgnd <- typology %>%
  distinct(family) %>%
  mutate(x = as.numeric(scale(seq_len(n())))) %>%
  ggplot(aes(x = x, y = 0)) +
  geom_text(aes(label = family, colour = family), family = .font, size = 3) +
  .scale_colour_discrete(guide = "none") +
  lims(x = c(-3, 3)) +
  theme_void()

cowplot::plot_grid(plt, lgnd, ncol = 1, rel_heights = c(20, 1))
```

**Comprehension vs. production.** As mentioned above, word length is the one predictor of acquisition that varies substantially between measures: it is far more predictive for production than for comprehension. Not all children will produce precisely the citation form of CDI words. Even so, words with longer citation forms will tend, on average, to have more difficult realizations in child speech (e.g., although some children will say “raf” for “giraffe,” others will attempt the full citation form). Length as measured here therefore seems to reflect production constraints (i.e., how difficult a word is to say) rather than comprehension constraints (i.e., how difficult it is to store or access). This result may explain why the hierarchical clustering above resembles linguistic typology more closely for production than for comprehension: the role of production difficulty may be more similar across typologically related languages. Another possibility is that measure is confounded with age, since comprehension is only measured for younger children. On this account, word length may play a larger role later in acquisition. Similarly, the stronger effect of babiness in comprehension than in production could reflect its greater importance earlier in development.
**Developmental change.** For both comprehension and production, positive age interactions appear in at least 9 of 10 languages for concreteness and frequency. Conversely, for comprehension, babiness and valence show negative age interactions in at least 9 of 10 languages. This suggests that concreteness and frequency become more facilitative later in development, while babiness and valence are most facilitative early in development. This result is consistent with the speculation above that the babiness predictor captures meanings that have special salience to very young infants.
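The link between the sign of an age interaction and *when* a predictor matters can be made concrete with a little arithmetic. This sketch assumes a model in which age is centered. Under that assumption, a predictor's effect at a given age is its main effect plus the interaction coefficient times centered age. The coefficient values are invented, and `effect_at_age()` is a hypothetical helper rather than part of the chapter's code.

```r
# Invented coefficients for illustration: a predictor's main effect at the
# centering age, and its interaction with centered age.
beta_main <- 0.5
beta_age_interaction <- 0.125

# Predictor effect at a given centered age: main effect + interaction * age.
effect_at_age <- function(age_centered,
                          main = beta_main,
                          interaction = beta_age_interaction) {
  main + interaction * age_centered
}

# Ages in months relative to the (assumed) centering age.
ages_centered <- c(-4, 0, 4)
effect_at_age(ages_centered)
# [1] 0.0 0.5 1.0
```

With a negative interaction (`interaction = -0.125`), the same ages give 1.0, 0.5, and 0.0. The predictor then matters most at the youngest ages, which is the pattern described above for babiness and valence in comprehension.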
**Lexical categories.** Previous work suggests that predictors' relationship with age of acquisition differs among lexical categories [@goodman2008]. We investigate these differences by including lexical category interaction terms in our model. Figure\ \@ref(fig:aoapred-lexcatcoefs) shows the resulting effects for each lexical category, combining the main effect of a given predictor with the main effect of the lexical category and the interaction between that predictor and that lexical category.
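The per-category quantity plotted in Figure\ \@ref(fig:aoapred-lexcatcoefs) can be computed from a model's named coefficient vector. The sketch follows the figure caption's definition: predictor main effect + lexical category main effect + their interaction. It assumes R's default treatment coding with nouns as the reference category. Under that coding, nouns have no main-effect or interaction term of their own, and those terms count as zero. The coefficient values and the helpers `coef_or_zero()` and `combined_effect()` are hypothetical. The chapter's model may code lexical category differently.

```r
# Hypothetical coefficients, named as R would name the terms of a model like
# y ~ frequency * lexical_category with treatment coding (nouns = reference).
coefs <- c(
  frequency = 0.5,
  lexical_categorypredicates = -0.25,
  lexical_categoryfunction_words = -0.5,
  "frequency:lexical_categorypredicates" = 0.125,
  "frequency:lexical_categoryfunction_words" = -0.25
)

# Look up a coefficient, treating terms absent from the model as 0.
coef_or_zero <- function(coefs, name) {
  if (name %in% names(coefs)) coefs[[name]] else 0
}

# Predictor main effect + category main effect + their interaction.
combined_effect <- function(coefs, predictor, category) {
  coef_or_zero(coefs, predictor) +
    coef_or_zero(coefs, paste0("lexical_category", category)) +
    coef_or_zero(coefs, paste0(predictor, ":lexical_category", category))
}

freq_by_category <- sapply(
  c("nouns", "predicates", "function_words"),
  function(category) combined_effect(coefs, "frequency", category)
)
freq_by_category
```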
Across languages, the strongest predictors of acquisition for both nouns and predicates are concreteness (nouns `r mean_lexcat_coef("concreteness", "nouns")`; predicates `r mean_lexcat_coef("concreteness", "predicates")`) and frequency (nouns `r mean_lexcat_coef("frequency", "nouns")`; predicates `r mean_lexcat_coef("frequency", "predicates")`). Thus content words are known by more children when they are more frequent or more concrete. In contrast, function words are most influenced by number of phonemes (`r mean_lexcat_coef("num_phons", "function_words")`), babiness (`r mean_lexcat_coef("babiness", "function_words")`), and MLU-w (`r mean_lexcat_coef("MLU", "function_words")`). Function words are therefore known by more children when they are shorter, less associated with babies, or appear in shorter sentences. These patterns support the hypothesis that different word classes are learned in different ways. At a minimum, the bottleneck on learning appears to differ across classes, with different information sources varying in relevance across categories.
```{r aoapred-lexcatcoefs, fig.width=7, fig.height=4, fig.cap="Estimates of effect in predicting words' developmental trajectories for each language, measure, and lexical category (main effect of predictor + main effect of lexical category + interaction between predictor and lexical category). Each point represents a predictor's effect in one language, with the bar showing the mean across languages."}
# One point per language, plus a crossbar at the mean across languages
ggplot(plt_lexcat_coefs, aes(x = estimate, y = term, colour = term)) +
  facet_grid(measure ~ lexical_category,
             labeller = as_labeller(label_caps)) +
  geom_point(size = 1, alpha = 0.4) +
  ggstance::stat_summaryh(geom = "crossbarh", fun.x = mean, fun.xmin = mean,
                          fun.xmax = mean, fatten = 3) +
  geom_vline(xintercept = 0, color = .grey, linetype = "dotted") +
  .scale_colour_discrete(guide = "none") +
  labs(x = "Coefficient estimate", y = "")
```
Additionally, the mean pairwise correlation of coefficients between languages is much larger for nouns (`r lexcat_mean_cor("nouns")`) and predicates (`r lexcat_mean_cor("predicates")`) than for function words (`r lexcat_mean_cor("function_words")`). This greater between-language variability for function words suggests that learning processes differ substantially more across languages for function words than for content words.
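The mean pairwise correlation can be sketched as follows. The sketch assumes that each language's coefficients for a given lexical category are stored as one column of a predictor-by-language matrix. The helper `mean_pairwise_cor()` is a hypothetical stand-in for the chapter's `lexcat_mean_cor()`, whose implementation is not shown here, and the toy matrix is invented.

```r
# Mean correlation over all distinct pairs of languages.
# coef_mat: predictors in rows, languages in columns.
mean_pairwise_cor <- function(coef_mat) {
  r <- cor(coef_mat)
  # Upper triangle only: each pair counted once, diagonal (self-correlation) excluded
  mean(r[upper.tri(r)])
}

# Toy profiles: LangA and LangB agree perfectly; LangC is reversed.
toy <- cbind(LangA = c(1, 2, 3),
             LangB = c(2, 4, 6),
             LangC = c(3, 2, 1))
mean_pairwise_cor(toy)  # (1 + -1 + -1) / 3 = -1/3
```

Restricting the average to the upper triangle avoids both double-counting each pair and inflating the mean with the diagonal of 1s.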
## Discussion
What makes words easier or harder for young children to learn? Previous experimental work has largely addressed this question using small-scale lab studies. While such experiments can identify sources of variation, they typically do not allow different sources to be compared directly. In contrast, observational studies allow the effects of individual factors to be measured across ages and lexical categories [e.g., @goodman2008; @hills2009; @swingley2018]. However, they have been limited in the size and scope of the datasets and in the number of languages that can be compared directly. We derived several new findings from our analyses, in part because of the larger set of languages and predictors we were able to include.
First, we found consistency in the patterning of predictors across languages substantially greater than expected under a chance model. This consistency supports the idea that differences in culture or language structure do not lead to fundamentally different acquisition strategies, at least at the level of detail we were able to examine. Instead, acquisition patterns are likely produced by processes that are similar across populations and languages. We return to a discussion of these "process universals" in Chapter \@ref(conclusion-scale).
Second, predictors varied substantially in their weights across lexical categories. Frequent, concrete nouns were learned earlier, consistent with theories that emphasize the importance of early referential speech [e.g., @baldwin1995]. For predicates, concreteness was somewhat less important and frequency somewhat more important. For function words, length and MLU-w were more predictive. This may be because the meanings of function words are easiest to decode when they are used in short sentences (or because such words have meanings that are easiest to decode). Overall, these findings are consistent with some predictions of both division of dominance theory, which highlights the role of conceptual structure in noun acquisition [@gentner2001], and syntactic bootstrapping theory, which emphasizes linguistic structure over conceptual complexity in the acquisition of lexical categories other than nouns [@snedeker2007]. More generally, our methods provide a way to test the predictions of these theories across languages and at the level of the entire lexicon rather than individual words.
In addition to these new insights, several findings emerged that confirm and extend previous reports. Environmental frequency was an important predictor of learning, with more frequently heard words learned earlier [@goodman2008; @swingley2018]. Predictors also changed in relative importance across development. For example, words whose meanings were more strongly associated with babies appeared to be learned early across the languages in our sample -- perhaps explaining our findings in Chapter \@ref(items-consistency) [see also @tardif2008]. Finally, word length showed a dissociation between comprehension and production, suggesting that challenges in production do not carry over to comprehension (at least in parent-report data).
```{r aoapred-cvs}
# Coefficient of variation of each predictor's main-effect estimates across languages
cvs <- plt_lang_coefs %>%
  filter(effect == "main effect") %>% # only main effects for now
  group_by(measure, term) %>%
  summarise(cv = cv(estimate),
            sem = cv_sem(estimate),
            n = n(),
            category = "Predictors",
            .groups = "drop") %>%
  mutate(signature = term) %>%
  select(-term)

write_feather(cvs, "data/cvs/aoapred_cvs.feather")
```