FAQs

On this page, we have listed frequently asked questions about the scope of the project, corpus construction, ethical considerations, data processing, and publication. Please click on the questions to display the answers.

Scope of the project

Is the project focused only on documenting minority and endangered languages?

Do you have advice on sketches for multilingual children?

Is there any funding available to construct the corpus and create the sketches?

How valid are the sketch corpora? Have you tried to compare the results of sketch corpora with those of larger acquisition corpora?

We have done initial comparisons for German (Urbanczik 2023) and Inuktitut (Lee & Allen 2023). The encouraging result is that the sketch corpora contain high-frequency phenomena and major stages in the acquisition trajectory, and they do not produce wrong results. But the corpora are too small to capture everything and we have to calibrate generalizations accordingly: they do not allow us to make statements about what children know or do not know at a given age, but they are rich enough to allow us to formulate predictions and hypotheses. In general, analyses should stay as close as possible to the observable corpus data. If you are interested in contributing acquisition sketches of well-described languages, we encourage you to consider the possibility of comparing sketch results with what we know from larger datasets, and thus help assess the validity of the sketch format.

Corpus construction

What are the ages of the children to be recorded?

Should the children be recorded for an entire day at each age?

What is the minimum number of children you recommend?

Should we record the same two children at each of the five age points?

Can we write the sketch on the basis of (existing or new) data that deviate from the guidelines, e.g., semi-structured data (instead of naturalistic data), one-on-one recordings (instead of multiple participants), younger/older children (instead of the target ages of 2-4), one child (instead of two children), etc.?

I have access to a larger number of children - should I limit myself to recording only two children? How selective should I be?

We recommend not being too picky and including more children, as it is likely that some children will drop out during the course of the study for various reasons; also, it is always better to have more data to choose from. Keep in mind, though, that the project should remain feasible. In the end, a small project that is completed is better than a comprehensive project that is not completed.

Should we record all children within the same recording context, or is it better to capture as many contexts as possible?

Both approaches have advantages and disadvantages: collecting data within the same context makes the data more comparable, but collecting data from different contexts gives us better insights into the diversity of learning environments. We therefore refrain from making recommendations. However, we advise you to consider the recording context in your analysis, as different contexts favor different types of interaction and thus may increase or decrease the frequency of certain phenomena.

Will the focus on talkative children lead to a lack of representativity?

Maybe. But the intention is to maximize the amount of language within the 5 hours of recorded interaction. The goal is to develop a basic understanding of what the acquisition trajectory is like in the language - and this will be much easier with more data and clear speech (i.e. a talkative and clear child). Hopefully this won't have too much effect on whether a particular structure is used or not at a given age. On this basis, it will be possible to later extend the scope of the project and aim for comparisons with, e.g., less talkative children. You can also decide to record a wider range of children now, and then use their data for comparison later.

Does it introduce a bias when working with a very small group of people who are intrinsically interested in language?

No study is unbiased in its sampling: to a certain extent, there will always be a bias introduced by the particular people who come forward and wish to work with you. You may acknowledge it and flag that not everybody in the community necessarily shows the same behavior. Understanding the limits of generalizability is something that runs right through the sketch. Still, you are not getting these recordings from nowhere and it is not going to be completely different in the community. For instance, older children often exaggerate elements of child-directed language - but still, their behavior reflects patterns present in the community. The patterns we find may not always be representative but they are pointing us in the direction of how acquisition might work in the language.

Ethical considerations

Regarding ethics approval, is there anything specific that needs to be included in the informed consent forms?

This kind of research will have to pass through a more complex approval procedure, as the collected data and metadata are highly sensitive: video recordings of spontaneous interaction involving minors, plus metadata that captures information on the development of children and their social networks. It is difficult to give general recommendations, though, because the details will vary from university to university and country to country. However, one important issue that needs to be addressed is the possibility of making the data or parts of the data available, e.g. through a language documentation archive or through CHILDES. This may be problematic for the audio/video recordings, but we encourage you to discuss the possibilities with the language community and the archive of your choice, and to consider making at least the transcripts available.

Who can give consent?

Since children are minors, their parents or guardians will have to give consent for them. In addition, it is possible and appropriate to seek children’s assent to participate, including watching out for children showing signs of displeasure at being recorded or demonstrating their unwillingness to participate in any way. Furthermore, a long-term relationship with the community may make it possible to discuss ethical aspects of the recordings directly with the children as they grow older.

It is not uncommon for other children (neighbors, friends) to unexpectedly join a recording - how to best handle consent for these children?

Does the consent have to be in writing?

There are contexts where it may be ethically problematic to obtain written consent, e.g. because of illiteracy (where asking for written consent would force people to reveal that they cannot read and write), or a community may have had bad experiences of communication with institutions (e.g. with the police, in legal documents etc.) and is therefore suspicious of the written form. In such contexts, it is generally more appropriate to obtain verbal consent, e.g. to go through the consent form together and record the entire procedure on video. However, be aware that ethics approval is often contingent on obtaining written consent. If this is the case, your possibilities are limited - but we would encourage you to take up any such issues with your ethics board: in recent times, some ethics boards have accepted oral consent under such circumstances.

How can we prepare the families for what to expect, so that their consent is, indeed, informed consent?

We have had good experience with doing trial recordings with the first set of participants, and then reviewing and discussing the video together. When recruiting further families, we recommend involving participants who were previously recorded: they can explain how the recording worked for them (e.g. what they did during the recording, where they placed the camera, how they handled scenes that they didn't want to be recorded etc.) and, if possible, consent to showing their video (so that other families get an idea of the output). This allows families to discuss issues, resolve doubts etc.

Families may consent to participate and sign the consent form simply because they trust the research team and/or the advice of fellow community members and leaders - how can we make sure that families can reach an informed decision under these circumstances?

Building personal relationships and community trust is often an indispensable prerequisite to conducting research in small communities where people have little experience with academic settings and principles of research. At the same time, this approach can interfere with informed consent, as people may participate out of trust. Yet, paradoxically, this is also often the key to successfully explaining ethical considerations, and we encourage you to engage with ethics as an ongoing process rather than a one-off event. Long-term personal relationships make it possible to repeatedly show and discuss data, report on progress or present research results. This will not only create a deeper understanding of the project, but also give people the opportunity to voice concerns and revisit earlier decisions. We have also had good experience with local intermediaries: they were much better at explaining ethical issues in a way that was meaningful to the community; and by engaging with their explanations, we were able to uncover misunderstandings and recognize where our own previous explanations were not clear enough.

Families may feel community or other pressure to participate - how can we ensure that all participation is voluntary?

If local opinion leaders are supportive of a project, it is possible that individual families may feel that they cannot openly refuse to participate. Similarly, it might be socially inappropriate to refuse a request from a researcher who is valued as a guest of the community. In our experience, people are likely to find less explicit ways of signaling their refusal in such cases (e.g. postponing appointments, avoiding the camera etc.), and we will have to be perceptive of such signals. It is also a good idea to create opportunities for people to voice their concerns in private spaces, e.g. when discussing the logistics of an upcoming recording (where to place the camera, what to record etc.), or when reviewing a trial recording. Such contexts give families the opportunity to address specific issues - and to either resolve these issues to the family’s satisfaction, or else allow the family to frame their ‘no’ as a problem with a specific issue, not as a general refusal.

Should families be paid for their participation?

There are different regional and local practices on how compensation is handled, and we advise you to inquire about what is appropriate in your context. Generally, though, participation in data collection is a form of work and people could have done other things during that time (e.g. engage in paid work, do subsistence farming etc.) - and some form of compensation is thus advisable and appropriate. However, it needs to be aligned with community standards and expectations, so that it neither becomes a form of coercion (e.g. where people feel they have to consent because they are being paid) nor causes problems in the community (e.g. because the compensation is well above local standards). It is often helpful to discuss these issues with local contacts and seek their advice on what to do and not to do in this regard.

What if families would like to participate, but do not meet the inclusion criteria?

The acquisition sketch format allows for some flexibility (e.g. with regard to the exact ages of the children, gender etc.), but there are nevertheless inclusion/exclusion criteria. Similarly, local opinion might consider families as more or less suitable participants for this type of research. All this may lead to the exclusion of families who would like to participate. In this case, we have had good experience with exploring alternative ways for them to be involved, e.g. contributing to interviews on socialization practices, helping with transcription etc.

When recording children (and especially when recording for an entire day), we cannot foresee what will happen, and the camera is likely to capture private moments and/or participants who are not aware of the recording - how can we deal with this?

We recommend reviewing the recording with the family and other participants afterwards, so that they can decide if they want to cut scenes (maybe a breastfeeding scene, maybe someone entering the scene without a T-shirt etc.). Furthermore, people may not be aware at first that someone sitting at the edge of a scene is captured by the camera as well, or that conversations behind the camera may be picked up by the microphone. Reviewing recordings is thus also a great opportunity to discuss the logistics of future recordings (e.g. where to put the camera next time such that participants can easily leave the scene whenever they do not want to be recorded).

Is it possible to hand over control to the families, so that they can decide when and what to record?

Yes. We have had good experience with leaving the camera with families, showing them how to operate it. Such an approach minimizes unwanted intrusions into family life and helps avoid recording scenes that families are not comfortable with. It can also lead to more natural data, as it mitigates the observer’s paradox and as families are in the best position to judge when and where their children are happy to be recorded. However, this approach may not be an option in all cases, and you need to weigh up the advantages against possible disadvantages. People may have no experience with video recording, thus affecting the quality of the recording (e.g. the sound quality, or the camera angle). Especially in the case of sign language research, it is unlikely that families can do the recordings themselves, as this requires lots of experience (e.g. with lighting and background). Furthermore, this approach might introduce a bias (e.g. not capturing contexts that would be important from a research perspective) and could make the data less comparable.

What do I have to keep in mind when older siblings/other children are present in the recording?

In our Acquisition Sketch Meeting 11, we talked about why the presence of other children and especially older siblings may be helpful, i.e. to compare child-directed language across ages. Regarding ethics and informed consent, of course it is more manageable if the other children are older siblings, i.e. belong to the same family you are working with anyway. This also adds to consistency in who the adults are talking to. You might consider setting up the recording in a location where you can control access, to avoid recording people who accidentally pass by. Otherwise you may have to stop the recording earlier than you want to. Also, keep in mind that transcribing the session gets harder the more children are present.

Data collection and processing

What kinds of microphones are most comfortable for children, which should I use?

All of us have used different setups for different reasons and we thus cannot give a clear answer. For instance, you can equip each of your focus children with a small Zoom recorder placed on their chest. Or you may use lapel microphones in little backpacks with a small clip at the front, so they cannot take it off. It's important that the setup is unobtrusive so that they do not want to instantly remove it. A head-mounted device works best with very young children who are not yet very dexterous with their fingers and who soon forget it's there. A standalone microphone in front of the children (who do not wear a microphone) can also work well. It really depends on your setting and you may want to test a few options in the beginning.

How can we segment the recordings?

Who can do the transcription and translation?

Speakers of the language need to be involved. Depending on the setup of your project, it could be you as the project leader working on your own language, community members transcribing on their own, or community members transcribing together with you as an outside researcher. Ideally, you would want to involve people who know the child well and can decipher the more idiosyncratic aspects of the child’s language. If community members transcribe independently, you will likely need to do some training (either separate training sessions or on-the-job training), give regular feedback and check for quality and consistency.

Who can do the glossing?

Morphemic glossing requires linguistic skills and knowledge. It is a very difficult task for non-linguists, but it may be possible to train community members to provide word-level glosses. If there are grammatical materials available, you may be able to train students of linguistics to do a first-pass glossing (but make sure to check the results). Most likely, you will have to do the glossing yourself. Fortunately, there are many books on fieldwork and language documentation that provide helpful advice (see p. 7 of the SAM, Part I for some suggestions).

How to ensure consistency when working with multiple transcribers?

Think about relevant tiers for your transcription/annotation and establish a tier template to be used by all transcribers. We also recommend that you record (and regularly update) your annotation conventions in a shared file that can be accessed and modified by everyone in the team. If you’re working in an environment with low availability of electricity, you may print out your tier template (and use the sheets like a notebook to be filled in by the transcribers) and annotation conventions.
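One way to check that all transcribers follow the same tier template is a small script that reports missing tiers. The following is a minimal sketch, not a project tool: it assumes ELAN files, in which each tier is stored as a TIER element with a TIER_ID attribute, and the tier names in REQUIRED_TIERS are hypothetical placeholders to be replaced with your own template.

```python
import xml.etree.ElementTree as ET

# Hypothetical tier template; these names are placeholders, not project conventions.
REQUIRED_TIERS = {"transcript", "translation", "gloss", "interpretation"}

def missing_tiers(eaf_xml, required=REQUIRED_TIERS):
    """Return the required tier names that are absent from an ELAN (.eaf) document."""
    root = ET.fromstring(eaf_xml)
    # Collect the identifiers of all tiers present in the file.
    present = {tier.get("TIER_ID") for tier in root.iter("TIER")}
    return set(required) - present

# Invented, heavily reduced example of an .eaf file with two of the four tiers.
sample = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="transcript" LINGUISTIC_TYPE_REF="default"/>
  <TIER TIER_ID="translation" LINGUISTIC_TYPE_REF="default"/>
</ANNOTATION_DOCUMENT>"""

print(sorted(missing_tiers(sample)))  # ['gloss', 'interpretation']
```

Running such a check on every incoming file makes deviations from the template visible early, before they spread across the corpus.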

In what format should the data be transcribed?

Ideally in CHAT or ELAN format. There are pros and cons for each of these setups, and we recommend that you use the tools that you (and/or your closest colleagues) are most familiar with. In any case, you will need to train the transcribers in the use of the respective program. We recommend that you use the following types of tiers: transcript, translation, tiers needed for a morphemic analysis, tier for the interpretation of the child utterances, and possibly an addressee tier. Depending on your research question, you might want to add additional tiers. We provide some examples in our meeting on data processing (part one).

I am not familiar with CHAT. I have been using SALT. Will that do?

Can I use AI-assisted speech recognition tools for the transcription?

This may work well as a first pass for languages that the speech recognition tool is trained on - but be aware that the output will still need to be checked manually. For all other languages, it is unlikely that this will lead to satisfactory results, as you will need to provide training data for the model in the first place. You might want to try out ELPIS, which allows you to build your own speech recognition model.

To what extent should we transcribe gestures and non-verbal actions?

Language is fundamentally multimodal, and children's (and carers') manual and non-manual gestures are an essential part of the communicative event. Transcribing them is very time-consuming, though, and we therefore recommend being selective. Good candidates are emblems with their conventionalized meanings (such as nodding for 'yes' or shaking the head for 'no'), iconics (depicting some aspect of a referent, e.g. roundness when talking about a ball) and/or deictic pointing gestures. You might also consider transcribing some actions if needed to understand a particular situation (e.g. a child hitting their brother). Annotating co-speech gestures, on the other hand, is likely to exceed your capacities. In any case, we recommend focusing on salient cases and/or annotating only a selected part of the corpus - unless, of course, your research focuses on gesture. The CHAT Transcription Manual provides useful notation conventions (of the format &=hits:table, &=pats:head, &=ges:frustration, &=points:car etc.).
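If you use codes of the form quoted above (&=hits:table, &=points:car etc.), they are easy to extract and count later on. The following is a minimal sketch under stated assumptions: the transcript lines are invented for illustration, the function name is hypothetical, and the pattern only covers codes written as action:target, so any codes without a colon would need a broader pattern.

```python
import re
from collections import Counter

# Matches codes of the form &=action:target, e.g. &=points:car
GESTURE_CODE = re.compile(r"&=(\w+):(\w+)")

def gesture_counts(lines):
    """Count how often each action (hits, points, ...) is annotated."""
    counts = Counter()
    for line in lines:
        for action, _target in GESTURE_CODE.findall(line):
            counts[action] += 1
    return counts

# Invented example lines, not taken from any sketch corpus.
sample = [
    "*CHI: ball &=points:ball .",
    "*MOT: no &=hits:table .",
    "*CHI: &=points:car car .",
]

print(gesture_counts(sample))  # Counter({'points': 2, 'hits': 1})
```

Such counts only reflect what was annotated, so they are meaningful only for the part of the corpus where gestures were transcribed consistently.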

How to deal with multilingual children? How to annotate their use of different languages?

Archiving

Where can we archive the sketch corpus?

Several options are available, including many documentation archives such as ELAR (Endangered Languages Archive) and child language databases such as CHILDES (Child Language Data Exchange System). Our recommendation is that you let yourself be guided by your prior experience and familiarity. For example, if you and/or your close colleagues are familiar with a particular archive, deposit the sketch data there. Or if you prefer a particular data processing setup, deposit in an archive that supports this setup (e.g. a CHAT setup would favor CHILDES, while an ELAN/Toolbox-type setup would favor a documentation archive etc.). We are setting up an institutional cooperation with the Language Archive Cologne, and you will be able to deposit the sketch data there - but this is not a requirement, and you should feel free to archive wherever is most appropriate for your purposes.

Can we archive in multiple places?

In principle, yes. But you should carefully think about what this decision entails. The sketch corpora are work-in-progress, as you will continue to add annotations over time. This means that you will periodically archive newer versions of the same annotation files - and if you have the same data in multiple archives, it can become very difficult to keep track of the many different versions, both for yourself and for others interested in the data. Furthermore, each archive has different setups and procedures, which you would need to follow. This is likely to increase your workload quite substantially. Our recommendation is therefore to archive the data in one place only. It might make sense, though, to add copies of parts of the data to databases that are dedicated to specific research purposes (e.g. to the PhonBank project for phonological research etc.). In this case, the database managers are usually able to offer support and hands-on advice on how to do this.

How can others find the archived sketch data, given that sketch corpora will be deposited in different archives?

It is true that archives cater for different research communities, e.g. child language researchers will know about CHILDES, but are less likely to access language documentation archives and vice versa. As the Acquisition Sketch Project grows, we will therefore collate information on all acquisition sketches on our website, including a list of all available sketch corpora and links to their deposits.

Sketch writing

To what extent should we consider previous work on the language when writing the sketch? How much detail should we go into? Should we try to locate unpublished manuscripts, theses etc.?

Tracking down unpublished sources is likely to be too much work. Of course, if you are aware of the existence of a highly relevant unpublished source, it might well be worth the extra effort. However, your primary goal should be to describe the sketch data. So, if you have access to relevant literature, use it to briefly contextualize your findings. If not, do not spend too much energy on locating such sources.

Is it possible to write a sketch if the adult language is underdescribed or not described at all?

Yes. The basics of the language will be visible in the corpus data: in the language used by adult interlocutors and in the developmental changes between the ages of 2 and 4. During the transcription and annotation process, you will have a chance to discuss the observed data with your collaborators. This will result in a lot of additional valuable information on the adult language, which will help you interpret the child language.

How much detail should be included when covering the core topics in each section?

It is impossible to cover every core topic in equal detail and you will need to prioritize. This may be based on your interests and the interests of the community, as well as the typological structure of the language. The decision is yours, and all we ask is that you cover the core topics in all sections: some in more detail, some in less detail. If you are interested in pursuing a more in-depth study of a specific phenomenon, keep in mind that you can always publish it as a paper in its own right, outside the sketch approach.

Should we systematically compare patterns in child-directed language with patterns in child language?

Section 4 of the sketch deals with learning environments and ethnotheories, but people may not reflect consciously on their language use with children or on how children learn language. So it may be difficult to obtain such information from (formal or informal) interviews. Do you have recommendations for collecting such information under these circumstances?

We have had good experience with discussing these issues with the help of specific examples from the sketch corpus, or based on things that we have observed in the community. When discussing specific examples, people often add information on how typical they are in child language or child-directed language, or they interpret the examples with reference to common ideas in the community about developmental stages etc. We have also observed that people often focus on lexicon rather than grammar - i.e., it will likely be easier to talk about people's ideas on lexicon and lexical development than on grammar.

People may report on their language use in interview settings, but this may be different from their actual behavior. How do we handle this in the acquisition sketch?

It is possible that the interviews will generate stereotypical ideas about how language should be used (and why it should be used in such a way), and these ideas may or may not be reflected in actual language use. For the acquisition sketch, you should report on both: the information you gain from interviews as well as what you observe in the sketch data; this also includes taking note of any differences. However, understanding the relationship between what we believe we do and what we actually do is a highly complex issue and requires a multi-method approach that is beyond the scope of the sketch approach.

What other prompts like singing are worth noting?

How to deal with nursery rhymes? Should we include them in the sketch?

If they play an important role in the language, you could include them in section 4 on learning environments: list those that appear in the sketch corpus (and/or others that are known to you from other sources) and describe the interactional contexts where they are used. If they appear in the sketch corpus, you might be able to observe communication about nursery rhymes (e.g., expanding on them, playing with the words etc.). In this case, such information could also go into the respective sections on child-directed language or child language. For grammatical analyses, nursery rhymes should be excluded (as they constitute routinized utterances).

We noticed many apprehensive constructions in the sketch data (e.g. "don't do this, or you will fall") - should we include them in the sketch?

Should I ask caregivers about child-directed language they are aware of using?

You may certainly include information about typical elements of child-directed language in your sketch. For instance, you or a member of the community could systematically speak with adults about how they talk to children and why they talk to children in a different way than to other adults. Or, more informally, if there are comments on child-directed language during the transcription sessions, you may ask the transcribers to elaborate. It can be very helpful to talk about an example at hand rather than trying to imagine what elements of child-directed language may be used in the community. Moreover, unprompted comments may be more insightful than direct questioning: not everyone is equally aware of speech patterns; plus speakers may feel that certain patterns are socially more desirable than others and answer accordingly.

The sketch corpus does not contain much adult-directed language, so that I cannot compare adult-directed and child-directed language. How do I deal with this?

The corpora are set up in such a way that they will only ever contain limited amounts of adult-directed language. We therefore recommend that you focus on comparing the language addressed to younger vs. older age groups in the sketch corpus - because child-directed language is known to vary with the age of the child. Of course, if you have knowledge of the adult language and/or if adult language corpora are available, you can use this information to guide your initial explorations, e.g. to help you identify candidate features of child-directed language. However, be aware that this data will likely not be directly comparable to the sketch data (e.g. different adult speakers, different contexts etc.).

There is only limited adult/child interaction in the sketch corpus, so that I cannot investigate how adults talk to children. How do I deal with this?

In this case, it is likely that the sketch corpus will feature a lot of peer interaction. We therefore recommend that you focus on investigating how older children communicate with the focus children - because older children are known to adopt features of their community’s child-directed language when interacting with young children. If the sketch corpus features both adult/child and peer interaction, it may be worthwhile to compare the child-directed language of adults to that of older children.

How to deal with variability in the chapters on child-directed language?

Child-directed language is known to vary across individuals, contexts and socio-demographic backgrounds. It is thus very likely that you will see such variation in your sketch corpus. You probably won’t be able to identify the causes of such variation, as this would necessitate controlled studies (where you can manipulate the variables of interest). However, we recommend that you describe any variability that you observe in the data, as this will give valuable information on the possible variation space.

How to deal with individual variability in children's development?

Do you have recommendations for software to analyze prosodic features in child-directed and child language?

Is it possible to do a quantitative phonetic study based on the sketch recordings?

In general, it is possible to obtain good quality audio data, but the acquisition sketch contexts do not represent ideal recording conditions (as the focus is on natural interaction). However, it is still possible to do some quantitative analysis such as syllable counts. You might start with 20 clear utterances. Such a sampled approach may be more feasible than analyzing a whole stretch in full detail. Depending on your setting, you might even be able to do in-depth phonetic analyses. In sum, you can get very good quality audio recordings, but you probably can't rely on getting this quality in every recording.
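The sampled approach mentioned above can be made reproducible with a few lines of code. This is a minimal sketch, assuming that each utterance has already been marked by hand as clear or not; the field names "id" and "clear" and the function name are hypothetical.

```python
import random

def sample_clear_utterances(utterances, n=20, seed=0):
    """Draw up to n utterances that were manually marked as clear.

    A fixed seed makes the selection repeatable, so that co-authors
    can reconstruct exactly which utterances were analyzed.
    """
    clear = [u for u in utterances if u["clear"]]
    rng = random.Random(seed)
    return rng.sample(clear, min(n, len(clear)))

# Invented data: 50 utterances, every second one marked as clear.
utterances = [{"id": i, "clear": i % 2 == 0} for i in range(50)]
selected = sample_clear_utterances(utterances)
print(len(selected))  # 20
```

Drawing the sample at random, rather than picking the first clear utterances in a recording, avoids over-representing a single stretch of interaction.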

How can we measure productivity in our data?

You can approach the issue by looking at combinatorics: If something appears with more than one root or word, you could take this finding as tentatively indicating productivity. So you can put forward a hypothesis about whether or not something might be productive. However, you would need different kinds of data, possibly even experimental data, to show whether something is really productive.
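The combinatorial heuristic described above can be sketched in code as follows. This is an illustration only: it assumes that the child utterances have been glossed and reduced to (root, affix) pairs, the example data are invented, and the function name and the threshold parameter min_roots are hypothetical. The result lists candidates for a hypothesis, not evidence of productivity.

```python
from collections import defaultdict

def candidate_productive(tokens, min_roots=2):
    """Return affixes that occur with at least min_roots distinct roots.

    tokens: iterable of (root, affix) pairs from glossed child utterances.
    """
    roots_by_affix = defaultdict(set)
    for root, affix in tokens:
        roots_by_affix[affix].add(root)
    return {
        affix: sorted(roots)
        for affix, roots in roots_by_affix.items()
        if len(roots) >= min_roots
    }

# Invented example: "-ed" occurs with two different roots, "-ing" with one.
sample = [("walk", "-ed"), ("jump", "-ed"), ("walk", "-ed"), ("go", "-ing")]

print(candidate_productive(sample))  # {'-ed': ['jump', 'walk']}
```

Repeated tokens of the same root-affix combination do not count towards the threshold, since they may reflect a single rote-learned form.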

How can we measure productivity? Because even if you observe the expansion of an element to other contexts or the emergence of small paradigms, the forms might still be rote-learned - can we really be sure about their productivity?

This is always a tricky issue, even in large-scale corpora. We recommend that you deal with this by taking a descriptive approach, i.e. focus on describing the patterns that you observe in the data, without making any strong claims about productivity. Also, it is clear that the sketch data is limited. It is thus not necessary to preface every statement with a disclaimer about the limits of the data; consider adding a general disclaimer in the introductory section, though.

Can we distinguish between the first appearance of a structure and the mastery of it?

It is often difficult to perceive phonological distinctions in another language - how can we deal with this?

It is essential to closely cooperate with speakers of the language throughout the transcription process, and to especially discuss those instances where your perception and the transcribers' perception differ. This will allow you to identify problem areas. We have had good experience with developing a joint metalanguage to talk in non-linguistic terms about distinctions that are difficult to perceive.

Children may be able to produce a sound, but may not yet have acquired the phoneme - how do we differentiate between the acquisition of phones vs. phonemes?

The sketch corpus probably won't contain the kinds of evidence needed, at least not on a systematic basis and not for all sounds. However, it is possible that the corpus contains, e.g., instances where an adult explicitly teaches a pronunciation that the child repeats more or less successfully; or a game where children substitute sounds to create new words etc. Such instances are highly relevant, and you should report your observations in the sketch. In principle, you could further extend this line of research and expand on acoustic measurements. Research has shown that children may be aware of the phonemic structure, but cannot yet realize it, e.g. they may simplify a consonant cluster in the onset to a single consonant, but may nevertheless produce such syllables systematically longer than syllables with a simple onset - thereby showing awareness of the complex onset category, but not yet being able to realize a consonant cluster. This would presuppose a clear audio signal, skills in phonetic analysis, and prior knowledge of the phonetics of the adult language, and we therefore do not include such analyses among the core topics. But it is always possible to conduct in-depth follow-up studies on the basis of the sketch data.

What should I do if I find that certain phonemes from the adult language are missing in the child data?

We can’t use the absence of something to say it’s not acquired, e.g. the phoneme may be infrequent in the adult language (and the children aren’t using it either) and/or no words with the sound in question appear in the recordings (and so the children have no opportunity to produce it). Usually, one would expect to see a developmental trajectory with the production of a phoneme gradually increasing: if, however, a phoneme is completely absent at time 1, but is stably produced at time 2, it might be an accidental gap in the data. If you are interested in pursuing a more in-depth description of the acquisition of particular phonemes, the sketch data will provide a lot of helpful information for designing follow-up experiments and for selecting the ages you could be targeting in the future.

How do you suggest analysing the acquisition of syllable types?

You could approach this question in a similar way to an analysis of phoneme acquisition: start with a list of adult syllable types, and check whether the children regularly produce each type at the different age points. After that, you can turn to the non-target-like syllable productions and reflect on what they tell you. For example, a reduced consonant cluster may recur in your data. You can also look for general substitutions, e.g. the child might produce a retroflex nasal as an alveolar one.
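As a rough illustration of the first step, the following Python sketch counts syllable types per age point. It makes several simplifying assumptions that you would need to adapt: the data are already syllabified, each character stands for one segment, and the vowel inventory `VOWELS` is a placeholder. The function names and the sample records are invented.

```python
from collections import Counter

# Placeholder vowel inventory; replace with the inventory of your language.
VOWELS = set("aeiou")

def cv_skeleton(syllable):
    """Map a syllable to its CV skeleton, assuming one character per segment."""
    return "".join("V" if ch in VOWELS else "C" for ch in syllable)

def syllable_type_counts(records):
    """Count syllable types per age point.

    records: iterable of (age_point, syllable) pairs (hypothetical format).
    """
    counts = {}
    for age, syllable in records:
        counts.setdefault(age, Counter())[cv_skeleton(syllable)] += 1
    return counts

# Invented child productions, for illustration only.
records = [("2;0", "ma"), ("2;0", "ba"), ("2;0", "a"),
           ("3;0", "pra"), ("3;0", "mat")]
counts = syllable_type_counts(records)
for age, types in counts.items():
    print(age, dict(types))
```

Comparing the resulting counts with your list of adult syllable types shows which types are attested at which age point. A type that does not appear may simply be an accidental gap in the data.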

Should we investigate the acquisition of stress patterns?

We have not included stress patterns among the core topics to be investigated, because this is notoriously difficult for underdescribed languages: for many languages, we have no clear idea of the stress patterns in the adult language, or, indeed, whether the language has stress in the first place. Under these circumstances, an investigation of the acquisition trajectory of stress patterns is likely to be impossible. However, this is not the case for all languages: in some languages, stress patterns are clear and salient (English and German being famous examples), and/or there is prior research that you can resort to. In such cases, it will be useful to extend your investigation to stress patterns.

It is likely that the sketch corpus captures a biased snapshot of semantic fields - those that are tied to the activities that we happen to have recorded. How do we deal with this?

It is true that vocabulary is highly context-dependent: a child-bathing session will feature other kinds of words than a semi-structured toy-playing session etc. We recommend that you make this transparent and describe the setting for each recording in section 3 of the sketch. We also recommend that you initially spend some time discussing suitable recording contexts with community members, trying to identify typical contexts that children participate in (see section 2.2.1 of The SAM, Part I). If you settle on semi-structured contexts, we have had good experience with using play dough, as this allows carers and children to create culturally salient items.

Should we describe the use and distribution of function words in child-directed and child language?

When describing lexical development, where should we put the focus?

In languages where lexemes can occur both as nouns and verbs, it may be difficult to be confident about the syntactic category of lexemes used by young children. What can we do?

Syntactic category is often assigned on the basis of the adult language, but this is not an option for such languages. In such cases, we recommend starting with the older children (where morphology and syntax will help determine the syntactic category), and then working your way backwards to the younger children, noting down problematic cases and discussing them in the sketch. If you have more time and/or resources at hand, you could analyze the context of the utterance (e.g. whether children are talking about actions or objects), and/or resort to information about frequencies (if it exists).

In multilingual settings, should we analyze the development of lexemes separately for each language?

We recommend that you code for the language of the lexeme (in the sketch corpus, when compiling concordance lists etc.), but it then depends on the available amount of data whether or not it is possible to analyze them separately. Minimally, you can describe your observations, e.g. whether the lexicon comes predominantly from one language, whether certain semantic fields are predominantly associated with one language, whether children tend to use both languages to express the same concept etc. If you are working in a community that is highly multilingual, you may for instance also have a look at the approach and especially the Little Kids' Word List from Carmel O'Shannessy's project tracking language development paths in such communities.

How should a type-token ratio be calculated if there are homophonous elements or elements for which grammarians are divided over their interpretation?

We recommend coding for both interpretations, as this allows you to find evidence for whether children treat them the same/differently, and whether this changes over time. Given limited data and resources, you will have to focus on a small number of grammatical domains. Nevertheless, we recommend coding for all inflections (not only those that you focus on), as this will allow you to provide quantification for various questions and contextualize the focus inflections within the overall development of inflectional morphology.
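If each token is coded under both interpretations, the type-token ratio can be computed once per interpretation and the two values compared. The Python sketch below assumes a hypothetical coding format of (form, split analysis, merged analysis). The forms and labels are invented and do not come from any particular language.

```python
def type_token_ratio(tokens):
    """Number of distinct types divided by number of tokens (0.0 if empty)."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Each token: (form, analysis treating homophones as distinct,
#              analysis treating them as one element). Invented data.
coded = [
    ("ka", "ka.DEM", "ka"),
    ("ka", "ka.COP", "ka"),
    ("mama", "mama", "mama"),
    ("ka", "ka.DEM", "ka"),
]

ttr_split = type_token_ratio(split for _, split, _ in coded)
ttr_merged = type_token_ratio(merged for _, _, merged in coded)
print(ttr_split, ttr_merged)
```

Reporting both values makes the effect of the analytical decision visible. Here, the split analysis yields three types and the merged analysis two types over the same four tokens.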

What do you count as repetitions? And do you include repetitions in your counts, e.g. when counting the mean length of utterances (MLU)?

We count as repetitions identical utterances that are adjacent and/or occur within a short time span. There are arguments both for and against counting repetitions. We follow the guidelines of the CHAT protocol, which essentially specify that repetitions are not included. In any case, you should tag each repetition so that you can filter for this criterion later. For more information, see the CHAT manual.
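Once repetitions are tagged, the MLU can be calculated with or without them. The following Python sketch assumes a hypothetical data format in which each utterance is a dictionary with a list of morphemes and a manually assigned `is_repetition` flag. It does not read CHAT files, and the utterances are invented.

```python
def mlu(utterances, include_repetitions=False):
    """Mean length of utterance in morphemes.

    utterances: list of dicts with keys 'morphemes' (list of str) and
                'is_repetition' (bool); hypothetical format.
    By default, utterances tagged as repetitions are excluded.
    """
    kept = [u for u in utterances
            if include_repetitions or not u["is_repetition"]]
    if not kept:
        return 0.0
    return sum(len(u["morphemes"]) for u in kept) / len(kept)

# Invented utterances, for illustration only.
utterances = [
    {"morphemes": ["want", "ball"], "is_repetition": False},
    {"morphemes": ["want", "ball"], "is_repetition": True},
    {"morphemes": ["dog", "run", "-s"], "is_repetition": False},
]

print(mlu(utterances))                            # repetitions excluded
print(mlu(utterances, include_repetitions=True))  # repetitions included
```

Keeping the tag in the data, rather than deleting repeated utterances, lets you report both figures and check how much the decision affects the result.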

How should we count morphemes that express multiple meanings?

Such portmanteau morphemes are counted as a single morpheme, as the different meaning components cannot be assigned to different forms. However, we recommend coding for the context. E.g., a single morpheme may express information on gender, number and case - it is thus not clear whether a non-target-like use is due to the child's developing understanding of gender, number and/or case. However, if you code for contextual factors (e.g., morpheme X was used in a context that required a feminine singular accusative form), you may be able to detect patterns.

Are statistical analyses performed on any of the counts?

When analysing variation sets, how do we identify which parts of a conversation still belong to the same set?

There are different ways of approaching this question in the literature. We based our approach on the definitions of Küntay and Slobin (1996) and Waterfall (2006), but you should choose the definition that makes sense for your work. For instance, you may adopt a definition that allows a maximum of two intervening utterances between two instances. A further criterion may be that the communicative context or intent stays the same. Depending on the definition, functional variation and variation sets can be differentiated.
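To make the distance criterion concrete, here is a minimal Python sketch of one possible operationalization. It is deliberately crude and is not the definition used in the cited works: two utterances are linked if they share at least one word form (our simplifying assumption, which over-links through frequent function words) and if no more than `max_gap` utterances intervene. Criteria such as communicative intent are not modelled, and the utterances are invented.

```python
def variation_sets(utterances, max_gap=2):
    """Group utterance indices into candidate variation sets.

    An utterance joins an existing set if it shares a word with that set's
    most recent member and at most `max_gap` utterances lie between them.
    Only sets with two or more members are returned.
    """
    sets = []  # each set is a list of utterance indices
    for i, utterance in enumerate(utterances):
        words = set(utterance.split())
        for candidate in reversed(sets):  # try the most recent sets first
            last = candidate[-1]
            close_enough = i - last - 1 <= max_gap
            if close_enough and words & set(utterances[last].split()):
                candidate.append(i)
                break
        else:
            sets.append([i])  # no match: start a new candidate set
    return [s for s in sets if len(s) > 1]

# Invented caregiver utterances, for illustration only.
utterances = [
    "put the ball here",
    "yes",
    "put it here",
    "the ball goes here",
    "where is daddy",
]
print(variation_sets(utterances))             # [[0, 2, 3]]
print(variation_sets(utterances, max_gap=0))  # [[2, 3]]
```

Changing `max_gap` shows how strongly the resulting sets depend on the chosen definition, which is why the definition should be stated explicitly in the sketch.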
