
ECLS Tip Sheets

How to Obtain and Access ECLS Data (PDF)

Identifying and Locating ECLS Variables (PDF)

The ECLS Approaches to Learning Items (PDF)

Frequently Asked Questions

General

When using a Mac (i.e., Apple Macintosh) operating system, access to the ECLS data varies by dataset. Many analysts use syntax files created with the ECLS's Electronic Codebook (ECB) software in conjunction with the ASCII file containing the raw ECLS data to create their analytic data files. Since the ECB software is not compatible with Mac operating systems, there are several alternatives available for Mac users who wish to create ECLS files for analysis.

Creating a customized data file with variables of interest (recommended due to the size of the full data file)

  • ECLS-K: Mac users can create data files with their variables of interest by using the NCES Online Codebook. Alternatively, the study's file record layout (available at https://nces.ed.gov/ecls/pdf/ECLSK_file_record_layout.pdf) can be used as a reference for editing the full syntax file to include only selected variables of interest.
  • ECLS-B: The ECLS-B file record layout (available at https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2010011) can be used as a reference for editing the full syntax file to include only selected variables of interest, but accessing the ASCII data file requires a restricted-use data license. There is no online codebook available for the ECLS-B.
  • ECLS-K:2011: The ECLS-K:2011 file record layout (available at https://nces.ed.gov/ecls/pdf/K5_file_record_layout.pdf) can be used as a reference for editing the full syntax file to include only selected variables of interest. Currently, there is no online codebook available for the ECLS-K:2011.
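
Mac users who prefer not to edit SPSS, SAS, or Stata syntax can apply the same idea (keep only the variables of interest) directly to the ASCII data file. The Python sketch below is not an NCES tool and is offered only as an illustration. It assumes that each case occupies one line of the ASCII file and that the record layout gives each variable's starting column and width; the variable names and positions in SELECTED_LAYOUT are hypothetical placeholders to be replaced with the values from the study's record layout.

```python
import csv
import io

# HYPOTHETICAL layout entries: variable name -> (1-based start column, width).
# Replace these with the positions listed in the study's file record layout.
SELECTED_LAYOUT = {
    "CHILDID": (1, 8),
    "VAR_A": (9, 2),
    "VAR_B": (11, 5),
}


def extract_fields(line, layout):
    """Slice one fixed-width record into {variable: raw text} for the chosen variables."""
    record = {}
    for name, (start, width) in layout.items():
        begin = start - 1  # convert the 1-based column to a 0-based index
        record[name] = line[begin:begin + width].strip()
    return record


def extract_records(lines, layout):
    """Yield one dict per non-blank line, keeping only the selected variables."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip():
            yield extract_fields(line, layout)


def write_extract(lines, layout, out):
    """Write the selected variables to CSV, one column per variable."""
    writer = csv.DictWriter(out, fieldnames=list(layout))
    writer.writeheader()
    writer.writerows(extract_records(lines, layout))


if __name__ == "__main__":
    # Two made-up records laid out according to SELECTED_LAYOUT.
    sample = io.StringIO("0000000112 3.50\n0000000207 1.25\n")
    buffer = io.StringIO()
    write_extract(sample, SELECTED_LAYOUT, buffer)
    print(buffer.getvalue())
```

Values are kept as raw text; converting them to numbers and handling missing-value codes as the codebook describes is left to the analyst.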

Creating a full ECLS data file (not often recommended due to the large size of the resulting data file)

  • ECLS-K and ECLS-K:2011: Mac users can create full data files for the ECLS-K and ECLS-K:2011 by using the syntax files provided on the Data Products page.
  • ECLS-B: A full data file for the ECLS-B can be created using the syntax files sent by the IES Data Security Office to researchers who are approved to use the ECLS-B restricted-use files.

For further guidance about getting started with the ECLS data, please refer to the Data Products page. Additional guidance is provided in the How to Obtain and Access ECLS Data tip sheet available above.

There are two approaches to loading ECLS data into R.

Option 1
The first approach requires two steps. First, researchers can use the downloadable SPSS, SAS, or Stata syntax from the Data Products page to read the ASCII file into that software and create a full data file. Next, those data can be read into R. For example, researchers can use the "read.spss" function from the "foreign" R package or the "read_sav" function from the "haven" R package to read a full data file with a .sav extension. More information about the "read.spss" function is available in the documentation for the "foreign" R package, and more information about the "haven" R package, which imports and exports SPSS, Stata, and SAS files, is available in that package's documentation.

Option 2
A second approach is to use the "read" functions in the "EdSurvey" R package (e.g., readECLS_B, readECLS_K1998, readECLS_K2011) to read in the data. Researchers can then use functions within the EdSurvey package to conduct analyses, or extract a dataset of selected variables with the "getData" function for further manipulation and analysis. Detailed information about using the EdSurvey package to analyze ECLS data is available online, and the EdSurvey team can provide additional support if needed.

Links to non-governmental websites and the mention of trade names, commercial products, or organizations do not imply endorsement by the U.S. Government.

The ECLS kindergarten direct child assessments were not designed as a measure of children's kindergarten readiness. The ECLS assessments are not tied to any specific curriculum or standards. They were designed to be a broad measure of the knowledge and skills that children would have in a particular grade. The assessment frameworks guiding the development of the assessments for each domain are further discussed in the study documentation.

Additionally, NCES does not have a standard definition of "kindergarten readiness," and the ECLS data files do not contain any composite variables developed to serve as indicators of this construct.

However, the ECLS program studies do collect data that analysts can use to create their own kindergarten readiness measures based upon their own operational definitions. For example, all of the ECLS studies have collected data in a fall kindergarten round, both directly from the child and indirectly from parents and teachers, on skills that are typically associated with school readiness.

NCES does not provide recommendations to analysts on how to proceed with research using ECLS data to investigate school readiness; it is up to analysts to operationalize their definition of kindergarten readiness and identify which ECLS data best align with their research. Researchers may be interested in using the NCES Bibliography Search Tool to explore the existing literature base of publications that use the ECLS data and to see how other researchers have examined kindergarten readiness.

Yes, children with limited English skills participated in the ECLS-K, ECLS-B, and ECLS-K:2011 studies. These children will be included in the ECLS-K:2024 as well.

All of the ECLS studies were designed to assess children with limited English skills using the study's direct child assessments. The purpose of the procedures used to measure children's English language skills was to determine whether the children could be fairly assessed in English or whether they should be administered the assessments in a non-English language when available. The studies' assessments are not indicators of whether the children were proficient in English according to criterion-referenced objective measures of proficiency. As a result, study children's scores on the ECLS language measures may not align with schools' identification of students as English language learners.

For the kindergarten cohorts with released data, the ECLS-K and ECLS-K:2011, this was accomplished by administering a language screener as the first component of the direct cognitive assessment in the kindergarten and first-grade rounds of data collection. The screener determined whether children understood English well enough (for example, whether they understood what they were being asked to do) to receive the full direct child assessment in English. In the ECLS-K, children who did not pass an established cut score on the language screener received a reduced version of the ECLS-K assessments, as described in section 2.1 of the ECLS-K Base Year Public-Use Data Files User's Manual (NCES 2001029REV). In the ECLS-K:2011, all children completed the screener as well as the first 18 items of the reading assessment in English, regardless of their home language or performance on the screener. These 18 items, plus two items from the preLAS “Art Show” task (a total of 20 items), make up the section of the reading assessment referred to as the English basic reading skills (EBRS) section, named for the skills these items measure. Spanish-speaking children who did not achieve at least the minimum score on the screener were then administered a short reading assessment in Spanish that measured Spanish early reading skills (SERS), as well as the mathematics and executive function assessments that had been translated into Spanish. Children whose home language was neither English nor Spanish and who did not achieve at least the minimum score on the screener were not administered any of the remaining cognitive assessments beyond the EBRS.

The screener was not administered after first grade in either the ECLS-K or ECLS-K:2011 because, by then, nearly all of the children had demonstrated sufficient English language skills to complete the full assessment battery in English.

However, the way that English language proficiency was measured, as well as which study components children were administered based on their score on the ECLS screener, differed between the ECLS-K and the ECLS-K:2011 in three ways:

  1. The language screener used in the ECLS-K consisted of three subtests of the preLAS 2000: “Simon Says,” which measured listening comprehension; “Art Show,” which was a picture vocabulary assessment; and “Let's Tell Stories,” which evaluated the child's natural speech. Together these three subtests were referred to as the Oral Language Development Scale (OLDS). The OLDS was administered only to children who were identified by the school as language minority (i.e., children who spoke a language other than English at home).

    In the ECLS-K:2011, only two of the preLAS subtests, “Simon Says” and “Art Show,” were included in the English language screener. Also, this screener was administered to all children in the ECLS-K:2011, as opposed to only children identified by the school as language-minority children. This shortened screener served mainly as a warm-up for children whose home language was English because the items were relatively easy for a native English speaker.

  2. In the ECLS-K, only children who achieved at least a minimum score on the English language screener were administered the remaining cognitive assessments in English.

    In order to capture the beginning English reading skills of English language learners (ELL), all children in the ECLS-K:2011 were administered a set of items from the main English reading assessment, regardless of their performance on the preLAS subtests. These items measured English basic reading skills (EBRS), and associated EBRS scores are available on the ECLS-K:2011 data file.

  3. The ECLS-K did not include a Spanish reading assessment. In contrast, Spanish-speaking children in the ECLS-K:2011 who did not achieve at least a minimum score on the preLAS subtests were administered a reading assessment in Spanish to measure their Spanish early reading skills (called the Spanish Early Reading Skills assessment, or SERS) after they were administered the EBRS. Scores associated with the SERS are available in the ECLS-K:2011 data file.
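
The routing rules described above for the kindergarten and first-grade rounds can be summarized as a small decision function. The Python sketch below is a simplified illustration, not study software: the Child class, its field names, and the component labels are hypothetical, and the content of the ECLS-K reduced battery is left to the user's manual.

```python
from dataclasses import dataclass


@dataclass
class Child:
    home_language: str               # "English", "Spanish", or any other language
    passed_screener: bool            # reached at least the minimum screener score
    language_minority: bool = False  # ECLS-K only: identified by the school as language minority


def eclsk2011_components(child):
    """Components given in an ECLS-K:2011 kindergarten or first-grade round (simplified)."""
    # Every child takes the screener and the first 18 English reading items (EBRS).
    components = ["English language screener", "first 18 English reading items (EBRS)"]
    if child.passed_screener:
        components.append("remaining assessments in English")
    elif child.home_language == "Spanish":
        components += ["SERS", "mathematics in Spanish", "executive function in Spanish"]
    # Other home languages below the minimum score: nothing beyond the EBRS.
    return components


def eclsk_components(child):
    """Components given in an ECLS-K kindergarten or first-grade round (simplified)."""
    if not child.language_minority:
        # The OLDS screener was given only to children identified as language minority.
        return ["full assessment in English"]
    if child.passed_screener:
        return ["OLDS screener", "full assessment in English"]
    return ["OLDS screener", "reduced assessment battery"]
```

Writing the two studies as separate functions keeps the three differences listed above visible side by side: who takes the screener, whether the EBRS items are given to everyone, and whether a Spanish reading assessment exists.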

Please refer to the ECLS-K and ECLS-K:2011 study documentation, such as the user's manuals and psychometric reports available on the Publications and Products page, for more detailed information.

While all ECLS studies overlap in time points (i.e., all have kindergarten data collections) and measures (e.g., similar direct child assessment and parent interview/survey constructs in the kindergarten round of data collection), the ECLS-B was different in that it followed a nationally representative sample of children born in 2001 from birth through kindergarten. Therefore, the ECLS-B's procedures for including non-English-speaking children and families differed from those used in the kindergarten cohort studies. At 9 months and 2 years, the child assessments were administered in the child's primary language; when this was not English, the administration was done either by a bilingual assessor or with the assistance of a translator. At preschool and kindergarten, children's English language skills were assessed at the beginning of the cognitive assessments. Children who did not demonstrate sufficient English language skills to complete the assessment in English were either routed to a comparable assessment in Spanish (if Spanish-speaking) or excluded from the cognitive assessments. Regardless of English language skills, all study children were administered the motor items (via a translator if needed) and their physical measurements were taken.

The level of English language proficiency that children needed to demonstrate to be included in the ECLS-B assessments administered in English was set very low; as a result, relatively few children were excluded from the cognitive assessments administered in English. Also, all children's English language skills were reassessed at kindergarten regardless of their demonstrated proficiency at preschool. Consequently, children who were excluded at preschool were administered the cognitive assessments in English at kindergarten if they demonstrated sufficient English language skills when they were reassessed at the beginning of the kindergarten assessment.
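
The ECLS-B preschool and kindergarten routing can be sketched the same way. In this hypothetical Python function, sufficient_english stands for the result of the English check given at the start of that round's cognitive assessments; because the check was repeated at kindergarten, calling the function once per round shows how a child excluded at preschool could still be assessed in English at kindergarten. The function name and labels are illustrative assumptions, not study terminology.

```python
def eclsb_components(spanish_speaking, sufficient_english):
    """Components given in an ECLS-B preschool or kindergarten round (simplified).

    sufficient_english is the result of the English check given at the start of
    the cognitive assessments in this round; it is re-evaluated at kindergarten.
    """
    components = ["English language skills check"]
    if sufficient_english:
        components.append("cognitive assessments in English")
    elif spanish_speaking:
        components.append("comparable cognitive assessments in Spanish")
    # Otherwise the child is excluded from the cognitive assessments.
    # Motor items (via a translator if needed) and physical measurements apply to all.
    components += ["motor items", "physical measurements"]
    return components


# Hypothetical child whose home language is neither English nor Spanish:
preschool = eclsb_components(spanish_speaking=False, sufficient_english=False)
kindergarten = eclsb_components(spanish_speaking=False, sufficient_english=True)
```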

Please refer to the ECLS-B's Data Collection Procedure page or the Distance Learning Dataset Training (DLDT) module on the ECLS-B for more detailed information.

Yes, the ECLS studies collect information on children's participation in child care and early childhood education (ECE) programs, including participation prior to kindergarten. There are some differences across studies in who reports this information and the time periods covered, as discussed further below. The primary reporters of information about child care/ECE arrangements are the children's parents. Among other characteristics of children's child care arrangements, parents are asked whether the care that the child received was provided in a private home by a relative (someone other than the child's parents who was related to the child), provided in a private home by a nonrelative (caregivers such as home child care providers, regular sitters, or neighbors), or provided in a center-based care setting (e.g., care in a day care center or a before- or after-school program that was at the child's school or another location), and whether any care received was through a Head Start program. In the kindergarten rounds of the ECLS-K and ECLS-K:2011, parents were asked to report retrospectively about children's care and education arrangements in the year prior to kindergarten; the same is planned for the ECLS-K:2024. In the ECLS-B, during each wave of data collection, parents were asked about their child's current regular early care and education experiences, although in the final round parents were only asked about before- and after-school care. Finally, in all ECLS studies, parents were asked for information about children's participation in before- and after-school care arrangements when the child was in elementary school.

In addition to the parent interviews, the ECLS studies completed to date also collected child care and early childhood education information from other sources.

  • The ECLS-K administered a Head Start Center Questionnaire as part of a Head Start Verification Study which (1) identified which of the children reported by either their parents or their schools as having attended Head Start the year prior to kindergarten did indeed attend a Head Start program and (2) evaluated the process of identifying Head Start participation through parent and school reports and provided further information on the actual process of verifying these reports. More information about the Head Start verification data is available in the ECLS-K Base Year Restricted-Use Head Start File (NCES 2001–025) documentation (Tourangeau, Burke, et al. 2001a).
  • In the ECLS-B, beginning when the children were 2 years old, children's care and early childhood education providers were asked to provide information about their own experience and training and their setting's learning environment. Additionally, in the 2-year and preschool rounds, a subsample of ECLS-B children in regular nonparental child care and education arrangements had their settings observed to obtain information on the quality of those arrangements. During the ECLS-B kindergarten rounds, the before- and after-school care providers of children enrolled in kindergarten were asked to provide information about their own experience, their training, and their setting's learning environment.
  • The ECLS-K:2011 fall kindergarten parent interview included items to identify the child's current before- or after-school care provider from whom data about the child's care arrangement could be collected. The kindergarten Before- and After-School Care (BASC) questionnaires collected important information about children's before- and after-school care settings, including their learning experiences in nonparental care with regular before- or after-school care providers. Adults other than the child's parents/guardians who cared for the study child for at least 5 hours per week were asked to provide information, such as the location where care was provided, children's activities while in care, characteristics of other children in care, and their own background and experience.

The user's manuals available on the Publications and Products page have more detailed descriptions of the topics on child care and early childhood education included in each study. To view the actual questions asked of parents and providers, please see the Instruments and Assessment pages for the ECLS-K, ECLS-B, and ECLS-K:2011 studies.

Yes. In addition to the direct child cognitive assessments, the ECLS program studies collect a variety of direct and indirect measures of children's socioemotional, physical, and psychomotor development. An overview of these measures is included in each of the studies' user's manuals, whereas a more detailed description of the design, development, administration, quality control procedures, and/or psychometric characteristics of these measures is available in the psychometric reports on the ECLS Publications and Products page. Brief summaries of these measures are included below, by study.

ECLS-B

Socioemotional Development: When the ECLS-B children were about 9 months of age, the study measured children's socioemotional development (e.g., social skills, emotion regulation) directly using the Nursing Child Assessment Teaching Scale (NCATS) and indirectly through parental reports. When the children were about 2 years old, their socioemotional development was measured directly using a semi-structured play activity (the Two Bags Task) and indirectly through parent and provider reports. Additionally, at 2 years, attachment quality was assessed using a computerized Q-sort task, the Toddler Attachment Q-Sort (TAS-45), which was completed by the interviewer after the home visit. In the preschool collection, children's socioemotional development was measured directly again using the Two Bags Task and indirectly through parent and provider reports. During the kindergarten rounds of data collection, children's socioemotional development was assessed indirectly through parent and teacher/provider reports.

Physical Development: Children's length/height, weight, and middle-upper arm circumference (MUAC) were measured directly during the home visit at each wave of the ECLS-B. Head circumference was measured for children who were born with very low birth weight.

Psychomotor Development: Children's fine and gross motor skills were assessed in the ECLS-B using items from the Bayley Short Form — Research Edition at 9 months and 2 years. Parents of toddlers were asked about delays in learning to walk and disabilities that affect movement. At preschool and kindergarten, fine motor skills were assessed by asking children to copy a series of forms/shapes drawn by assessors and to build structures using blocks. Also at preschool and kindergarten, gross motor skills were assessed by asking children to jump, balance on one foot, hop on one foot, skip, walk backward along a line, and catch a bean bag. Parents were asked to report on their child's limb use and coordination relative to their same-age peers and whether or not their child had been evaluated for issues with their limbs or coordination.

ECLS-K

Socioemotional Development: The ECLS-K included measures of socioemotional development that focused on aspects of social competence, including social skills (e.g., social interaction, attentional focus, and self-control) and problem behaviors (e.g., internalizing and externalizing problem behaviors). One of the main instruments for measuring children's socioemotional development in the ECLS-K was the Social Rating Scale (SRS), which included a combination of items taken directly from Gresham and Elliott's Social Skills Rating System (SSRS), items adapted from the SSRS, and items developed specifically for the ECLS-K, such as the “Approaches to Learning” items. The SRS was completed by teachers (in kindergarten, first grade, third grade, and fifth grade) and parents (in kindergarten and first grade only). Parents and teachers were the primary sources of information on children's social competence and skills in kindergarten and first grade. In the third- and fifth-grade years, children provided information about themselves by completing a short self-description questionnaire (the SDQ or Self-Description Questionnaire I) (Marsh 1990). On the SDQ, children rated their perceptions of their competence and their interest in reading, mathematics, and school in general. They also rated their popularity with peers and competence within peer relationships, as well as other aspects of socioemotional development (e.g., anxiety). For eighth grade, a new version of the SDQ was developed using items from a published instrument designed to be used with adolescents (Self-Description Questionnaire II) (Marsh 1992). In addition, two scales in the ECLS-K eighth-grade self-administered paper-and-pencil student questionnaire, which were adapted from the National Education Longitudinal Study of 1988 (NELS:88), measured students' self-concept and their perceptions of how much control they had over their own lives. The eighth-grade questionnaire also included questions about students' school experiences, their activities, and their perceptions of themselves.

Physical Development: The ECLS-K included physical measurements of height and weight at all rounds.

Psychomotor Development: During the fall of kindergarten, the ECLS-K included a psychomotor assessment that was administered to gauge children's fine and gross motor skills. Fine motor skills were assessed by asking children to copy a series of forms/shapes drawn by assessors and to build structures using blocks. Gross motor skills were assessed by asking children to jump, balance on one foot, hop on one foot, skip, walk backward along a line, and catch a bean bag. Parents were also asked to report on their child's limb use and their coordination relative to their same-age peers and whether or not their child had been evaluated for issues with their limbs or coordination.

ECLS-K:2011

Socioemotional Development: The ECLS-K:2011 included measures of socioemotional development that focused on aspects of social competence, including social skills (e.g., social interaction, attentional focus, and self-control) and problem behaviors (e.g., internalizing and externalizing problem behaviors). The Social Rating Scale (SRS) initially developed for the ECLS-K and administered in the ECLS-K:2011 included a combination of items taken directly from Gresham and Elliott's Social Skills Rating System (SSRS), items adapted from the SSRS, and items developed specifically for the ECLS-K, such as the “Approaches to Learning” items. The SRS was completed by ECLS-K:2011 teachers (in all rounds of data collection) and parents (in the kindergarten and first-grade rounds only). Teachers also reported on closeness and conflict between themselves and individual study children (in kindergarten, first grade, second grade, and third grade). Parents and teachers were the primary sources of information on children's social competence and skills in kindergarten, first grade, and second grade. In the third, fourth, and fifth grades, ECLS-K:2011 students completed a self-administered questionnaire themselves. This self-administered questionnaire included a short self-description questionnaire (the SDQ or Self-Description Questionnaire I) (Marsh 1990). On the SDQ, children rated their perceptions of their competence and their interest in reading, mathematics, science, and school in general. The child questionnaire also asked children to report on their relationships with peers; social anxiety/fear of negative evaluation; loneliness; occurrences of peer victimization; grit; worry/stress about school; life satisfaction; behavioral engagement; school belonging; and prosocial behavior.

Physical Development: The ECLS-K:2011 included physical measurements of height and weight at all rounds. Additionally, in the fall of second grade, spring of third grade, and spring of fifth grade, a subsample of children had their hearing evaluated.

Psychomotor Development: Psychomotor development was not directly assessed in the ECLS-K:2011. However, parents were asked in each spring data collection to report on their child's coordination relative to their same-age peers and whether their child had been evaluated for issues with their limbs or coordination. Parents were also asked to report on the types of exercise or physical activity their child engages in, either with their parents (e.g., sports or calisthenics) or as their own activity (dance, soccer, jumping rope, etc.).

Researchers are strongly encouraged to obtain the most recent release of the data for the ECLS study they are interested in analyzing, regardless of which rounds of data collection their research interests involve. The final public-use ECLS-K Kindergarten–Eighth Grade data file and the final public-use ECLS-K:2011 Kindergarten–Fifth Grade data file are available on the Data Products page. The final ECLS-B 9-month–Kindergarten 2007 data file can be obtained from the IES Data Security Office by data users with an IES Restricted-Use Data License. Information about applying for a restricted-use data license can be found at https://nces.ed.gov/statprog/instruct.asp.

These final longitudinal data files include all released data for all cases that ever participated in the study. An additional reason to use each of the ECLS studies' final longitudinal data files rather than prior file releases is that they include important updates, data recalibrations, and/or corrections to errors discovered in previously released data. Data addenda, data anomalies, errata, and data considerations known at the time of publication are detailed in the data file user's manuals associated with each of the studies. The data file user's manuals for the ECLS-K and ECLS-K:2011 are available on the Publications and Products page, while the data file user's manuals for the ECLS-B are distributed by the IES Data Security Office with the data file. Please note that while it is important to use the most recent data files, researchers will likely need to refer to previously released user's manuals when analyzing data and variables derived from earlier rounds of data collection, as the primary focus of each manual is the round of data collection with which it was released.

ECLS studies provide various types of geographic or geocode information about children's homes, schools, and/or child care sites, such as the state-level Federal Information Processing Standard (FIPS) code, county-level FIPS code, Zone Improvement Plan (ZIP) code, ZIP Code Tabulation Area (ZCTA) code, and census tract code. These geographic area codes can be used to link the ECLS data to other geography-based data, such as Census data. While each ECLS study offers a different amount of geographic or geocode information, most of this information is available only in restricted-use data files, which require an IES Restricted-Use Data License. Information about applying for a restricted-use data license can be found at https://nces.ed.gov/statprog/instruct.asp. Analysts interested in using the geographic code information to link to other data should clearly specify that in their restricted-use license application.
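
As an illustration of linking with these codes, census tract data are commonly keyed by an 11-digit tract identifier formed from the 2-digit state FIPS code, the 3-digit county FIPS code, and the 6-digit tract code. The Python sketch below assumes that the ECLS county variable holds only the 3-digit county part and that the tract variable holds the 6-digit tract code without a decimal point; both assumptions should be checked against the study's codebook. The function names and the tract_table lookup (a dictionary keyed by tract identifier) are hypothetical.

```python
def tract_geoid(state_fips, county_fips, tract_code):
    """Build an 11-digit census tract identifier: 2-digit state + 3-digit county + 6-digit tract.

    ASSUMPTION: county_fips is the county part only and tract_code has no decimal point.
    """
    digits = []
    for value, width in [(state_fips, 2), (county_fips, 3), (tract_code, 6)]:
        text = str(value).strip()
        if not text.isdigit() or len(text) > width:
            raise ValueError(f"unexpected geographic code: {value!r}")
        digits.append(text.zfill(width))  # restore leading zeros lost in numeric storage
    return "".join(digits)


def link_tract_data(records, tract_table, state_var, county_var, tract_var):
    """Attach tract-level attributes to each record.

    Records with missing or malformed codes, or with no match in tract_table,
    are kept unchanged rather than dropped.
    """
    linked = []
    for record in records:
        try:
            geoid = tract_geoid(record[state_var], record[county_var], record[tract_var])
        except ValueError:
            geoid = None
        merged = dict(record)
        merged.update(tract_table.get(geoid, {}))
        linked.append(merged)
    return linked
```

For example, with ECLS-K:2011 school codes from the spring fifth-grade round, the variable arguments would be "F9FIPSST", "F9FIPSCT", and "F9CENTRC".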

It should be noted that both the ECLS-B and ECLS-K:2011 include the detailed geographic information in their respective main restricted-use file (RUF), while the ECLS-K has several geocode data files with additional geographic/geocode information. These restricted-use geocode data files, which are supplemental to the main ECLS-K RUF, are distributed as “Census and Geocoded Location Data for the Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLS-K)” to restricted-use license holders upon request from the IES Data Security Office. In addition to the variables listed in the table below, the ECLS-K geocode data files include about 600 Census (or Census-derived) sociodemographic variables for the people living in each census tract or ZCTA, including income, race/ethnicity, and many others. Supporting documentation is distributed with the ECLS-K Census and Geocoded Location Data and consists of a user's manual, data file record layouts describing the variables on each of the ASCII data files, and SAS syntax files that can be run to generate an extract data file from the ASCII data file.

Please note that while restricted-use data may include some state identifiers, the samples for the ECLS studies were not designed to be representative of children, parents, teachers, or schools in particular states or to support state-, city-, or county-level estimates.

The table below lists the types of geographic or geocode information available, and when the information was collected, for each of the ECLS studies, along with the corresponding variable names.

Geocode Data Availability by Type and ECLS Study

| Geocode type | ECLS-B¹ | ECLS-K² | ECLS-K:2011³ |
|---|---|---|---|
| School FIPS State Code | | R#FIPSST (#=3 through 7) | F#FIPSST (#=1 through 9) |
| School FIPS County Code | | R#FIPSCT (#=3 through 7) | F#FIPSCT (#=1 through 9) |
| School Census Tract Code | | R7CENTRC; tract | F#CENTRC (#=1 through 9) |
| School Census Block Code | | R7CENBLK | |
| School Census Region | | | X#REGION (#=1 through 9) |
| School ZCTA Code | | zcta | |
| School ZIP Code | | R#SCHZIP (#=3 through 7); rzip | F#SCHZIP (#=1 through 9) |
| School Locale Code | | KURBAN_R⁴; R#URBAN⁵ (#=3 through 7); R#LOCALE⁶ (#=3 through 7) | X#LOCALE⁷ (#=1 through 9) |
| School Latitude | | R7SCLAT; latitude | |
| School Longitude | | R7SCLNG; longitude | |
| Home FIPS State Code | | P7FIPSST | |
| Home FIPS County Code | | P7FIPSCT | |
| Home Census Tract Code | | P7CENTRC; tract# (#=1 through 5) | P#CENTRC (#=1 through 9) |
| Home Census Block Code | | P7CENBLK | |
| Home ZCTA Code | | zcta# (#=1 through 5) | |
| Home ZIP Code | X#HHZIP (#=1 through 5) | R#HOMZIP (#=3 through 7); rzip# (#=1 through 5) | P#HOMZIP (#=1 through 9) |
| Home Locale Code | X#HHLOCL (#=3, 4, 5) | | |
| Child Care ZIP Code | X#CCZIP (#=2, 3, 4) | | |
| Child Care Locale Code | X#CCLOCL (#=3, 4) | | |
| Wrap-Around Care ZIP Code | X#WCZIP (#=4, 5) | | |
| Wrap-Around Care Locale Code | X#WCLOC (#=4, 5) | | |

¹ The # sign in this column refers to the data collection round number in ECLS-B: 1 = 9-month round; 2 = 2-year round; 3 = preschool round; 4 = kindergarten 2006 round; 5 = kindergarten 2007 round. Text not in italics or bold refers to information available in the ECLS-B restricted-use file (RUF).

² The # sign in this column refers to the data collection round number in ECLS-K: 1 = fall kindergarten; 2 = spring kindergarten; 3 = fall first grade; 4 = spring first grade; 5 = spring third grade; 6 = spring fifth grade; 7 = spring eighth grade. Variable names in italics in this column identify variables that are available in the ECLS-K geocode data files. Variable names in bold identify variables that are available in the ECLS-K restricted-use file (RUF) and the ECLS-K public-use file (PUF). Variable names that are not in italics or bold identify variables only available in the ECLS-K RUF.

³ The # sign in this column refers to the data collection round number in ECLS-K:2011: 1 = fall kindergarten; 2 = spring kindergarten; 3 = fall first grade; 4 = spring first grade; 5 = fall second grade; 6 = spring second grade; 7 = spring third grade; 8 = spring fourth grade; 9 = spring fifth grade. Text in bold in this column refers to information available in the ECLS-K:2011 restricted-use file (RUF) and the ECLS-K:2011 public-use file (PUF). Text not in bold refers to information available only in the ECLS-K:2011 RUF.

⁴ School location type in the base-year sampling frame.

⁵ School location type: 7-category version, collapsed categories in PUF, full categories in RUF.

⁶ School location type: 8-category version, collapsed categories in PUF, full categories in RUF.

⁷ School location type: 12-category version, collapsed categories in PUF, full categories in RUF.

Weights are used to adjust for disproportionate sampling, survey nonresponse, and undercoverage of the target population when analyzing complex survey data such as the ECLS data. Including the weights in your analyses also corrects for the overrepresentation of certain groups that were purposely oversampled in the ECLS studies and, consequently, produces more accurate estimates. Estimates produced in analyses that do not adjust for nonresponse, undercoverage, and oversampling may be biased if the characteristics of the nonresponding, underrepresented, or oversampled groups are related to the outcomes being studied. Weights are also necessary to produce national-level estimates of kindergarten children in 1998–99, of kindergarten teachers in 1998–99, and of schools educating kindergartners in 1998–99 (for the ECLS-K); of children born in 2001 (for the ECLS-B); and of kindergartners in 2010–11 and of schools educating kindergartners in 2010–11 (for the ECLS-K:2011). Note that weights should still be used when analyses are focused on a subgroup (e.g., only students who are English language learners) to ensure that the results generalize to that focal subgroup in the full population.
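As a minimal illustration of how a weight enters an estimate, the Python sketch below computes a weighted mean: each case counts in proportion to its weight, so an oversampled group (whose cases carry smaller weights) no longer dominates the estimate. The scores and weights are hypothetical, and the sketch produces only a point estimate. Standard errors that reflect the complex sample design require the variance estimation methods described in the user's manuals, which are not shown here.

```python
def weighted_mean(values, weights):
    """Weighted mean of `values` using sampling `weights` (point estimate only)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical data: the second child belongs to an oversampled group,
# so that case carries a smaller weight (it stands for fewer children).
scores = [40.0, 60.0, 50.0]
weights = [300.0, 100.0, 200.0]

print(weighted_mean(scores, weights))  # weighted estimate
print(sum(scores) / len(scores))       # unweighted estimate, for comparison
```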

Another sampling design issue that necessitates the use of weights is that subsampling of the full sample was used in different ways to reduce field costs in the ECLS-K and the ECLS-K:2011 studies. Including the appropriate weights in analyses that use data collected from the subsamples adjusts for the subsampling designs and, consequently, produces more accurate estimates.

In the ECLS-K, the fall first-grade data collection was conducted with all eligible students from a 30 percent subsample of schools that participated in the base year. In that study, subsampling was also used for some children (i.e., children not part of a protected group as specified in the study's sampling plan) who moved from their original sampled school after the base year. In the ECLS-K fifth-grade and eighth-grade rounds, subsampling was done to determine eligibility for the mathematics and science teacher questionnaires, such that about half of the children were subsampled to have mathematics teacher data and the other half to have science teacher data. In the ECLS-K:2011, the fall first-grade and fall second-grade data collections were conducted with all eligible students from a 30 percent subsample of schools that participated in the base year. As in the ECLS-K, subsampling in the ECLS-K:2011 was also used for children (again, children not part of a protected group) who moved from their original sampled school after the base year. Finally, in the ECLS-K:2011 fourth and fifth grades, subsampling for mathematics and science teacher data was done as described for the ECLS-K. The ECLS-K and ECLS-K:2011 data files include weights that adjust for these subsampling designs, and these weights should be used in analyses.

The ECLS data file user's manuals describe the calculation and use of sample weights. The manuals also provide information on the weights available and guidance on how to choose the best one for analysis. Usually the best approach for choosing a sample weight for a given analysis is to select one that maximizes the number of sources of data included in the analyses for which nonresponse adjustments are made, which in turn minimizes bias in estimates while maintaining as large an unweighted sample size as possible.
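The rule of thumb above can be sketched as a small selection function. This is a simplified illustration, not the procedure in the user's manuals: each candidate weight is described by a hypothetical set of components it adjusts for and a count of cases with a positive weight, and the function prefers the weight that covers the most of the analysis's data sources, breaking ties by the larger unweighted sample. The weight names are invented for the example; the actual ECLS weights and their components are listed in each study's user's manual.

```python
def choose_weight(candidates, sources_needed):
    """Pick a weight following the rule of thumb (hypothetical helper).

    candidates: dict mapping weight name -> (set of components the weight
                adjusts for, number of cases with a positive weight)
    sources_needed: set of data components used in the analysis
    """
    def score(item):
        _name, (components, n_positive) = item
        covered = len(components & sources_needed)
        # First maximize coverage of the analysis's sources,
        # then keep the unweighted sample as large as possible.
        return (covered, n_positive)

    best_name, _info = max(candidates.items(), key=score)
    return best_name


# Hypothetical weights and components.
candidates = {
    "W_CHILD": ({"assessment"}, 9000),
    "W_CHILD_PARENT": ({"assessment", "parent"}, 8000),
    "W_ALL": ({"assessment", "parent", "teacher"}, 6000),
}
print(choose_weight(candidates, {"assessment", "parent"}))
```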

In addition to the study documentation, NCES provides multiple online modules in the Distance Learning Dataset Training (DLDT) system that address weighting. There is a common module about analyzing NCES complex survey data generally as well as study-specific modules that explain weighting and variance estimation for the ECLS-B, ECLS-K, and ECLS-K:2011 studies.

Analysts are usually encouraged to use the same weight for all analyses included in a research paper, even when there is a different ideal weight for each analysis. Weights are assigned to cases with valid data associated with the component(s) contributing to the weight. Therefore the application of different weights results in analyses that have been run with different analytic samples (i.e., the exact cases contributing to the analyses). Using the same weight for all analyses in a research paper ensures that the analyses are all run using the same overall analytic sample (noting that item-level missing data in each analysis may result in the exclusion of additional cases for each of the analyses).
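To see why different weights imply different analytic samples, the sketch below assumes that a case contributes to a weighted analysis only when its weight is positive; cases outside a weight's components are assumed to carry a zero or missing value. The weight names W_A and W_B and the case records are hypothetical.

```python
def analytic_sample(cases, weight_name):
    """IDs of cases that contribute under `weight_name` (weight > 0)."""
    # A missing key or a None value is treated as a zero weight.
    return {c["id"] for c in cases if (c.get(weight_name) or 0) > 0}


cases = [
    {"id": 1, "W_A": 10.0, "W_B": 12.0},
    {"id": 2, "W_A": 8.0, "W_B": 0.0},
    {"id": 3, "W_A": 0.0, "W_B": 5.0},
]

# Two weights give two different sets of contributing cases;
# reusing one weight keeps the analytic sample identical across analyses.
print(analytic_sample(cases, "W_A"))
print(analytic_sample(cases, "W_B"))
```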

However, there can be exceptions to this general guidance. For example, the ECLS-K sample is representative of the population of children enrolled in kindergarten in the United States during the 1998-99 school year; in the base year, the data are also representative of kindergarten teachers and schools with kindergarten programs or educating kindergarten-age children in ungraded settings. Similarly, the ECLS-K:2011 sample is representative of the population of children enrolled in kindergarten in the United States during the 2010-11 school year; in the base year, the data are also representative of schools with kindergarten programs or educating kindergarten-age children in ungraded settings. Analyses that examine different populations (i.e., children, teachers, or schools) should use the corresponding weights to produce estimates from the sample representative of the target population.

The decision on whether to use the same weight or different weights for a set of analyses is ultimately left to the researcher's discretion.

No. The ECLS-K, ECLS-B, and ECLS-K:2011 samples are designed to support national and regional estimates. They are not designed to estimate characteristics of children, families, and schools at or below the state level.

ECLS-B

Were children with disabilities sampled in ECLS-B?

Children with disabilities and special health needs were included in the ECLS-B sample. NCES and the Office of Special Education Programs (OSEP) conducted an evaluation of the feasibility of oversampling young children with disabilities and special health needs and determined that such an oversampling could not be pursued. However, the ECLS-B oversampled twins and infants born with low and very low birth weight. Low birth weight and short gestation (common among twins) are strongly associated with early developmental difficulties and special health needs. Additionally, information was gathered on early health and developmental disabilities and on the services children received. The ECLS-B aimed to be as inclusive as possible, and assessments were designed to include children of all levels of ability and skill. Exclusion from the assessments was considered on a case-by-case basis; in most cases where an exclusion occurred, a child was excluded only from certain components of the assessment rather than from the entire assessment. For the 9-month and 2-year data collections, all children were included in the untimed one-on-one assessments. Accommodations were made (and documented) when necessary. For the preschool and kindergarten data collections, most children were included in the untimed one-on-one assessments. However, children who required Braille or sign language were not administered the cognitive assessments, though they did participate in the motor and physical assessments with accommodations. Children who used a wheelchair did not participate in the gross motor assessments, and accommodations were made to obtain their physical measurements. The special needs of all other children who had them were accommodated (e.g., children who normally used some type of assistive device were allowed to use the device during the assessment).

What instruments were used to assess children's cognitive development?

Information on children's development was captured directly from the children themselves and indirectly through a parent/primary caregiver interview. Both the direct and indirect child assessments include measures developed specifically for the ECLS-B and measures taken from other well-established and/or standardized assessments.

9-month and 2-year collections
The assessment of cognitive development used at 9 months and 2 years was the Bayley Short Form—Research Edition (BSF-R), an adaptation of the Bayley Scales of Infant Development–II (BSID-II). The BSF-R includes a subset of BSID-II items that can be used to approximate children's performance on the full BSID-II. This assessment captures children's babbling, vocabulary, active exploration, understanding of repetitive actions, and problem-solving skills.

Preschool and kindergarten collections
The early reading and mathematics direct cognitive assessments used in the preschool and kindergarten collections were similar to the assessments used in the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 (ECLS-K). They incorporated items developed for the ECLS-K, as well as items from the following copyrighted assessments:

Peabody Individual Achievement Test-Revised (PIAT-R)
Peabody Picture Vocabulary Test - Third Edition (PPVT-III)
PreLAS 2000
Preschool Comprehensive Test of Phonological and Print Processing (Pre-CTOPPP)
Comprehensive Test of Phonological Processing (CTOPP)
Test de Vocabulario en Imagenes Peabody (TVIP)
Test of Early Mathematics Ability - Third Edition (TEMA-3)
Test of Early Reading Ability - Third Edition (TERA-3)
Test of Preschool Early Literacy (TOPEL)
Woodcock-Johnson III and Woodcock-Johnson III-Revised Tests of Achievement

At preschool only, children were asked about their color knowledge using a task developed by the Head Start Impact Study (Color Bears).

In the preschool and kindergarten collections, parents were asked about their children's skills and knowledge of things like colors, letters, and numbers using items from the National Household Education Surveys Program (NHES) questionnaires.

Parents also reported on children's vocabulary using a subset of items taken from the MacArthur Communicative Development Inventory (M-CDI) in the 2-year and preschool collections. Additionally, in the preschool collection, parents were asked about children's conversational language (Leventhal 1999).

How do I choose the most appropriate score for analysis of my research question?

There are many direct cognitive, socioemotional, and physical measure scores available on the ECLS-B data files. For guidance in selecting scores, please see Choosing Scores.

What populations of children can be studied with the ECLS-B data?

The ECLS-B describes children born in the United States in 2001, with the exception of children born to mothers younger than age 15 and children who died or were adopted prior to the 9-month data collection. Additionally, the ECLS-B sample is large enough to support the analysis of many different subgroups of children. Because the ECLS-B oversampled certain groups of children that are relatively rare in the general population (twins; Chinese, Other Asian and Pacific Islander, and American Indian/Alaska Native children; and children born with moderately low or very low birth weight), reliable estimates can be produced for these groups of children.

What statements can be made about participation in early child care and education programs, and its relation to development?

The ECLS-B data on child care and early education can be used to examine nonparental care and education experiences of the children born in 2001, both at a particular point in time (e.g., when they are 9 months old) and longitudinally. For example, it is possible to compare children's nonparental care and early childhood education experiences before kindergarten to their experiences in before- and/or after-school programs and activities once they enter kindergarten.

All statements about nonparental child care and early education should be made in relation to the experiences of children born in the United States in 2001. The ECLS-B data cannot be used to make statements about nonparental child care and early education in the United States generally or about the population of providers. For example, the data cannot be used to generate estimates of the number of providers in the United States who provide different types of care and early education (i.e., center-based, relative care, or nonrelative care). Because early care and education providers were identified through their link to the ECLS-B sample of children born in 2001, as opposed to being identified through a random sample of providers from a universal list of providers, the sample of providers is not nationally representative.

To study the relationship between early care and education and children's development, it is important to have data both before children receive care and education from persons other than parents and after they begin receiving nonparental care and education. The ECLS-B captures information on children's participation longitudinally, making it possible to compare differences in key child development outcomes before and after experiencing nonparental care and education.

What is the Bayley Short Form—Research Edition (BSF-R)? How does the BSF-R compare to the BSID-II? How does it compare to the BSID-III?

The Bayley Scales of Infant Development-Second Edition (BSID-II), which assesses young children's cognitive and motor development, was too long and complex for administration by non-clinicians during the ECLS-B home visit. Consequently, with the publisher's permission, the Bayley Short Form—Research Edition (BSF-R) was developed. The BSF-R comprises a subset of items from the BSID-II that can be used to estimate performance on the full BSID-II, yet it is feasible for non-clinicians to administer in the home. The subset of items selected to approximate children's performance on the full BSID-II was chosen using Item Response Theory (IRT) modeling. Children's estimated BSID-II scores, derived from their performance on the BSF-R, are on the ECLS-B data file. (The item-level BSF-R data used to estimate the BSID-II are not on the file.)

Creating the BSF-R.

The items administered in the BSF-R were selected based on ease of administration and analysis of the item properties using IRT modeling. A two-parameter IRT model was used (discrimination power and item difficulty level). BSID-II publisher data from the administration of BSID-II items to a standardization sample were obtained, and all the items were scaled on one metric. Items were then chosen based on difficulty level and discrimination power. First, the item pool was reduced to items that represented the constructs appropriate for the targeted age range at assessment, were spaced at even intervals of difficulty, and had a discrimination power of approximately 1. Then, within the reduced pool, items that were simple to administer and straightforward to score were chosen. Whenever possible, "twofers" were chosen: these are sets of items that can be scored from one administration (e.g., a child is given a cup and 5 blocks; items "puts one block in cup," "puts 3 blocks in cup," and "puts 5 blocks in cup" can all be scored). Also, items requiring the fewest materials for administration were preferred.
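For reference, the two-parameter logistic (2PL) model gives the probability of a correct response as a function of a child's ability θ, the item's discrimination a, and its difficulty b: P(θ) = 1 / (1 + exp(-a(θ - b))). The sketch below implements that function and the "discrimination of approximately 1" screen described above. The item pool and the 0.2 tolerance are hypothetical, and some 2PL parameterizations also include a scaling constant (often 1.7) that is omitted here.

```python
import math


def p_correct_2pl(theta, a, b):
    """2PL probability that a child with ability `theta` answers correctly.

    a: item discrimination (slope); b: item difficulty (location).
    No 1.7 scaling constant is applied in this sketch.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def near_unit_discrimination(pool, tolerance=0.2):
    """Names of items whose discrimination is within `tolerance` of 1."""
    return [name for name, a, _b in pool if abs(a - 1.0) <= tolerance]


# Hypothetical item pool: (name, discrimination a, difficulty b).
pool = [
    ("item_1", 1.0, -2.0),
    ("item_2", 0.4, -1.0),
    ("item_3", 1.1, 0.0),
    ("item_4", 0.95, 1.0),
]
print(near_unit_discrimination(pool))
print(p_correct_2pl(theta=0.5, a=1.0, b=0.0))
```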

Once the final set of items was determined, they were organized to approximate the BSID-II age sets. That is, the BSID-II groups items into age sets such that no one child receives all the items. A child would begin with the items in his/her age set (e.g., a 9-month-old would begin with the 9-month age set, which has items appropriate for ages 8-11 months). If these items were too difficult, the assessor then administered the age set for younger children (e.g., the 8-month-old age set, or even the 7-month-old age set). Conversely, if the items were too easy, the child would be administered an age set for older children (e.g., the 10-month-old age set, or even the 11-month-old age set). To approximate the BSID-II age sets, the items chosen for the BSF-R were organized into a Core set (i.e., administered to all), a Basal set (i.e., administered to those who performed poorly on the Core set), and a Ceiling set (i.e., administered to those who performed perfectly or nearly perfectly on the Core set). In this way, deciding whether to administer a supplementary set, and which one, was straightforward, and all children were appropriately challenged and assessed. The BSF-R diverges from the BSID-II primarily in its use of shortened core, basal, and ceiling item sets.
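The Core/Basal/Ceiling routing described above amounts to a simple decision rule. The text does not give the exact cutoffs that define "performed poorly" or "nearly perfectly," so the basal and ceiling cutoffs in this sketch are hypothetical parameters.

```python
def route_after_core(core_correct, core_total, basal_cutoff, ceiling_cutoff):
    """Which supplementary item set follows the Core set (hypothetical cutoffs).

    core_correct: number of Core items the child passed
    basal_cutoff: at or below this score, the Basal set is administered
    ceiling_cutoff: at or above this score, the Ceiling set is administered
    """
    if not 0 <= core_correct <= core_total:
        raise ValueError("core_correct must be between 0 and core_total")
    if basal_cutoff >= ceiling_cutoff:
        raise ValueError("basal_cutoff must be below ceiling_cutoff")
    if core_correct >= ceiling_cutoff:
        return "Ceiling"
    if core_correct <= basal_cutoff:
        return "Basal"
    return "Core only"


# Example with a 10-item Core set and made-up cutoffs.
for score in (2, 5, 9, 10):
    print(score, route_after_core(score, 10, basal_cutoff=3, ceiling_cutoff=9))
```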

Lastly, the BSID-II uses a 30-item Behavior Rating Scale (BRS) to help interpret children's performance. Nine items were chosen from the BRS and included in the ECLS-B for this purpose. These items do not, however, approximate the full BRS.

Scores.

Children's performance on the BSF-R was used to estimate their performance on the BSID-II through the use of IRT modeling. The standard errors of these estimates can be found on the data file (e.g., X1MTLSE, X1MTRSE, X2MTLSE, X2MTRSE). Separate scores were produced for the mental and the motor scales. Three types of scores were generated and can be found on the ECLS-B data file. The scale score (i.e., X1RMTLS, X1RMTRS, X2MTLSCL, X2MTRSCL) represents the number of items a child would have gotten correct on the full BSID-II. It is an unadjusted score and does not take prematurity into account.

Also on the file is the child's ranking relative to other children his/her age in the ECLS-B sample, correcting for prematurity. This ranking is similar to the Developmental Index scores on the BSID-II; it is a standardized score that can be used to compare groups of children. T-scores were used to standardize ECLS-B children's scale scores (i.e., X1MTLT, X1RMTR1, X2MTLTSC, X2MTRTSC). The T-scores have a mean of 50 and a standard deviation of 10. As mentioned above, these scores take premature birth into account. The child's chronological age at the time of the assessment was obtained by subtracting the child's birth date from the date of the assessment. For children who were born at least 21 days (i.e., 3 weeks) early, the amount of prematurity (e.g., 4 weeks) was then subtracted from the child's age at assessment. In this way, children born prematurely were ranked relative to other children at the same developmental age (as opposed to chronological age).
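The two adjustments described above, a prematurity-corrected age and a T-score with mean 50 and standard deviation 10, can be sketched as follows. This sketch assumes the reference group is a plain list of scale scores from children at the same developmental age and ignores sampling weights; the actual ECLS-B standardization may differ in these details. The function names are hypothetical.

```python
from datetime import date
from statistics import mean, pstdev


def age_for_norming(birth_date, assessment_date, days_premature):
    """Age in days at assessment, corrected when born 21 or more days early.

    days_premature: number of days the child was born before the due date.
    """
    age_days = (assessment_date - birth_date).days
    if days_premature >= 21:
        age_days -= days_premature
    return age_days


def t_score(score, reference_scores):
    """T-score (mean 50, SD 10) relative to a same-age reference group."""
    mu = mean(reference_scores)
    sigma = pstdev(reference_scores)
    return 50.0 + 10.0 * (score - mu) / sigma


# A child born 4 weeks early is compared with children 28 days younger.
print(age_for_norming(date(2001, 1, 1), date(2001, 10, 1), days_premature=28))
print(t_score(60.0, [40.0, 60.0]))
```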

Lastly, the ECLS-B data file includes 20 proficiency probability scores: 10 from the Mental Scale and 10 from the Motor Scale. The individual proficiencies are described in the User's Manual. Each proficiency probability was generated from the child's estimated performance on 4 to 6 BSID-II items and represents the probability that the child has mastered the skill represented by that proficiency. Thus, the proficiency probability scores range from 0 - 1. Scores on a particular proficiency can be averaged across children to produce estimates of mastery rates within population subgroups.
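Averaging proficiency probabilities within subgroups, as described above, is a (possibly weighted) group mean. The record layout and the field names group, prob, and w in this sketch are hypothetical; in practice the appropriate ECLS-B sample weight would be passed as the weight.

```python
from collections import defaultdict


def mastery_rates(records, weight_key=None):
    """Mean proficiency probability per subgroup, optionally weighted.

    records: iterable of dicts with 'group' and 'prob' keys (hypothetical names)
    weight_key: name of the weight field, or None for an unweighted mean
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for r in records:
        w = r[weight_key] if weight_key else 1.0
        num[r["group"]] += w * r["prob"]
        den[r["group"]] += w
    # Skip groups whose weights sum to zero (no contributing cases).
    return {g: num[g] / den[g] for g in num if den[g] > 0}


records = [
    {"group": "A", "prob": 0.2, "w": 1.0},
    {"group": "A", "prob": 0.8, "w": 3.0},
    {"group": "B", "prob": 0.5, "w": 2.0},
]
print(mastery_rates(records))       # unweighted
print(mastery_rates(records, "w"))  # weighted
```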

BSF-R v. BSID-II.

The BSF-R is a subset of the BSID-II and can be equated with the BSID-II using IRT modeling.

  • Like the BSID-II, the BSF-R has a Mental Scale and a Motor Scale. While the BSID-II has 178 Mental items and 111 Motor items, the BSF-R has 29 Mental items at 9 months and 33 Mental items at 2 years, and 35 Motor items at 9 months and 32 Motor items at 2 years.
  • The BSID-II groups items in age sets; the BSF-R has a Core set of items that are administered to all children and the supplementary Basal and Ceiling items sets that are administered if needed.
  • The BSID-II generates a raw score or "true" score that is then converted to a standardized score known as the Mental Developmental Index (MDI) or the Psychomotor Developmental Index (PDI). IRT modeling was used to estimate the BSID-II raw or true scores from the BSF-R. These estimated scale scores and their associated standard errors are on the ECLS-B data file. Additionally, T-scores are used to standardize the raw scores relative to the ECLS-B sample, taking into account prematurity.
  • The ECLS-B also provides proficiency probabilities based on the BSID-II raw scores.

Only BSID-II scores are on the ECLS-B data file; the item-level BSF-R scores used to estimate these BSID-II scores are not available on the file.

BSID-II v. BSID-III.

The BSID-III, published in October of 2005, differs from the BSID-II in that it assesses development in more than just the cognitive and motor domains. It also examines development in areas such as language, adaptive behavior, social development, and emotional development. Additionally, the BSID-III is normed on a more recent population than the BSID-II; the BSID-III uses the 2000 census in stratifying children by age. More information about the BSID-III can be found at www.PsychCorp.com.

How do I access ECLS-B data?

Due to NCES's confidentiality legislation, ECLS-B case-level data are available only to qualified researchers who are granted a restricted-use data license. Information about applying for or amending a restricted-use data license can be found at http://nces.ed.gov/pubsearch/licenses.asp. When presenting analyses, preparing manuscripts, publishing ECLS-B results, or corresponding through email (including with NCES staff), analysts must comply with ECLS-B rounding rules. Specifically, unweighted sample sizes must be rounded to the nearest 50. For example, a cell size of 25 to 74 is rounded to 50, a cell size of 75 to 124 is rounded to 100, and a cell size less than 25 is denoted by a symbol indicating “rounds to zero.” Further, all presentations and manuscripts prepared using ECLS-B restricted-use data must be sent to the NCES Data Security Office (IESData.Security@ed.gov) for disclosure review prior to publication or presentation, as required by the terms of the NCES restricted-use data license.
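The rounding rule above can be expressed as a small helper. This sketch rounds halves upward, which matches the examples given (25 becomes 50 and 75 becomes 100). The label returned for counts under 25 is a placeholder for whatever symbol a table actually uses, and the function name is hypothetical.

```python
def round_unweighted_n(n):
    """Round an unweighted sample size to the nearest 50 (halves round up).

    Returns an int, or the placeholder 'rounds to zero' for counts below 25.
    """
    if n < 0:
        raise ValueError("sample size cannot be negative")
    if n < 25:
        return "rounds to zero"
    # Integer arithmetic: add half the step, then floor to a multiple of 50.
    return 50 * ((n + 25) // 50)


for n in (12, 25, 74, 75, 124, 1234):
    print(n, round_unweighted_n(n))
```

Python's built-in round() rounds halves to the nearest even number, so round(125 / 50) * 50 gives 100 rather than 150; the integer arithmetic above avoids that.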

A subset of variables from the ECLS-B 9-month data collection is available to the general public in the Data Analysis System (DAS). The DAS allows users to develop tables with weighted estimates from the ECLS-B but does not provide users with access to case-level data. For more information about the DAS, see the Data Information page.

I saved my taglist in the ECB and used the “Extract” function to save a file with the variables I tagged, but when I try to open the file in SPSS/SAS/Stata, I don't see any data. What's wrong?

The ECB does not create a data file. Rather, the ECB creates syntax code that must be run in a statistical software package to generate a data file. The syntax file reads in raw data from the ASCII data file (the file with a .dat extension). In the ECB, there are two “save” steps in the “Extract” procedure. The first step saves the syntax file. In the second step, there is no file that is actually saved. Instead, this step writes a line of code in the syntax file indicating what to name the data file once the syntax file is run and a data file is generated.
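To show what the generated syntax does, the sketch below reads fixed-width ASCII records by slicing each line at column positions taken from a record layout. The layout here (variable names and columns) is entirely hypothetical; the actual positions come from the ECB-generated syntax or the study's file record layout, and real analyses would run that syntax in SPSS, SAS, or Stata.

```python
# Hypothetical record layout: (variable name, start column, end column),
# with 1-based, inclusive column positions as in a typical record layout.
LAYOUT = [("CHILDID", 1, 8), ("AGE_MONTHS", 9, 11), ("SCORE", 12, 16)]


def parse_record(line, layout):
    """Slice one fixed-width record into a dict of raw string values."""
    return {name: line[start - 1:end].strip() for name, start, end in layout}


def read_fixed_width(path, layout):
    """Parse every non-blank line of a fixed-width ASCII (.dat) file."""
    with open(path, encoding="ascii") as f:
        return [parse_record(line.rstrip("\n"), layout) for line in f if line.strip()]


print(parse_record("00000123009 45.5", LAYOUT))
```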

What is the difference between resident and nonresident fathers?

The resident father was identified during the parent interview as the person who resided in the household who was either the child's biological/adoptive/step-/foster father or the person identified as the partner or spouse of the parent interview respondent (i.e., was the father figure or played an important role in the child's life). As the partner or spouse of the parent respondent, the person completing the resident father questionnaire could be the child's grandfather if the child's grandmother was the parent respondent, or the male partner of the child's mother.

For the ECLS-B, fathers identified as eligible for the nonresident questionnaire had to be the child's biological father, could not reside in the household with the child, and had to meet one of the following criteria: (1) the father must have seen the child at least once in the last month; (2) the father must have seen the child at least 7 days in the last 3 months; or (3) the father must have been in touch with the child's birth mother at least once a month in the 3 months preceding the parent interview. Contact was defined as a telephone call or an in-person visit. Additionally, the biological mother had to be the respondent for the parent interview and she had to give permission for the nonresident biological father to be contacted.
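The eligibility rules above combine into a single boolean test. The function and parameter names in this sketch are hypothetical; it encodes only the criteria listed in the paragraph.

```python
def eligible_nonresident_father(is_biological_father, resides_with_child,
                                saw_child_in_last_month, days_seen_in_last_3_months,
                                monthly_contact_with_mother, mother_is_respondent,
                                mother_gave_permission):
    """Apply the ECLS-B nonresident father criteria as stated in the text."""
    # At least one of the three contact criteria must hold.
    contact_ok = (saw_child_in_last_month
                  or days_seen_in_last_3_months >= 7
                  or monthly_contact_with_mother)
    return (is_biological_father
            and not resides_with_child
            and contact_ok
            and mother_is_respondent
            and mother_gave_permission)


print(eligible_nonresident_father(True, False, False, 7, False, True, True))
```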

Are all fathers surveyed in all data waves?

Both resident fathers and nonresident biological fathers were surveyed in the first two waves of data collection (at 9 months and 2 years). Only resident fathers were surveyed during the preschool data collection. No fathers were surveyed during the kindergarten collections unless they were the primary caregiver and responded to the parent interview during the home visit.

What is the difference between teachers, Early Care and Education Providers (ECEPs), and Wrap-Around Early Care and Education Providers (WECEPs)?

In the ECLS-B, ‘teachers’ refers to the educators who taught the ECLS-B children in kindergarten or higher. Early Care and Education Providers, referred to as ECEPs, provided child care and early education below the kindergarten level. They may have been teachers in a preschool, babysitters, family daycare providers, nannies, or relatives; in short, anyone who was not the child's parent or guardian and regularly provided care and/or education prior to kindergarten for the study child. Wrap-Around Early Care and Education Providers, or WECEPs, were people who provided care and/or education for the ECLS-B children enrolled in kindergarten during the hours before and after school. Like ECEPs, they were a varied group and provided care and/or education in a variety of settings.

When were teachers and early care and education providers (ECEPs/WECEPs) surveyed?

Early care and education providers (ECEPs) were interviewed by telephone during the 2-year, preschool, and kindergarten 2006 data collections. In the 2-year collection, the interview was referred to as the Child Care Provider (CCP) interview. The name was changed to the Early Care and Education Provider (ECEP) interview at preschool to reflect the more educational settings children tended to be in as they got older (e.g., preschool, nursery school, public pre-kindergarten programs). Teachers and wrap-around early care and education providers (WECEPs) were surveyed in the two kindergarten data waves. In the first kindergarten collection in 2006, ECEP interviews were conducted by telephone for children who were not yet enrolled in kindergarten or higher and participated in regularly scheduled nonparental care and/or education (e.g., preschool, child care). Teachers of children who were enrolled in kindergarten or higher were mailed self-administered questionnaires to fill out and return. WECEP interviews were conducted by telephone for children enrolled in kindergarten who received regularly scheduled before- and/or after-school care. The WECEP phone interview was a modification of the ECEP phone interview, tailored to wrap-around settings and programs. While the teacher and WECEP components were included in the second kindergarten collection in 2007, the ECEP interview was not included because almost all of the ECLS-B children were in kindergarten or higher by then.

What supplemental data are available?

The ECLS-B currently has two supplemental datasets associated with it: the Twin Triad dataset and the Reading Aloud Profile – Together (RAPT) dataset.

The Twin Triad file consists of data from the 9-month collection for more than 50 twin pairs. The supplemental Twin Triad study is a modification of the 9-month teaching activity (The Nursing Child Assessment Teaching Scale, or NCATS). In the regular NCATS activity protocol, the parent was asked to interact with his/her child one-on-one and to teach the child one of a selected set of tasks that was slightly beyond the child's current abilities (e.g., how to stack a set of blocks). All the twins in the sample completed the NCATS task individually with their caregiver (usually the mother). However, a subset of twin pairs and their mothers agreed to do the NCATS task a third time, in a triadic interaction such that the mother simultaneously attempted to teach both twins a new NCATS task (one not yet attempted). The Twin Triad file contains information on how the threesome interacted during the teaching task and thus sheds light on how the triad might interact naturally in their day-to-day lives (as triadic interactions may be more common for twins than the one-on-one interactions observed in the dyadic NCATS interaction). The Twin Triad dataset is a supplemental restricted-use dataset and is available to restricted-use license holders upon request to the IES Data Security Office (IESData.Security@ed.gov).

The Reading Aloud Profile – Together (RAPT) data provide detailed information about parents' and children's behaviors while engaged in the joint book reading activity of the preschool Two Bags Task. In the preschool Two Bags Task, the parent and child were given 10 minutes to play with the contents of two bags. The first bag contained a book to read and the second bag contained toys with which to play. The RAPT data can be used to examine whether joint book reading behaviors of parents and children vary by family and child characteristics and whether joint book reading behaviors relate to children's early reading competency at preschool and upon entry to kindergarten. The RAPT sample of approximately 800 cases is a random sample drawn from the larger ECLS-B sample, so all the oversamples are represented, though it may not be possible to study each oversampled group individually using the RAPT data due to relatively small subgroup sample sizes. Although the RAPT data were initially released as a separate dataset, they are now included in the longitudinal 9-month—kindergarten 2007 restricted-use data file. For more information on the RAPT, see the Early Childhood Longitudinal Study, Birth Cohort (ECLS-B), Preschool–Kindergarten 2007 Psychometric Report (NCES 2010-009) (Najarian et al. 2010).

How do I determine if a case is in one of the supplemental samples?

In order to determine which twins are included in the Twin Triad dataset, one must request the Twin Triad dataset and merge it to the main data file using child ID. In order to determine if a case was included in the RAPT study, one may use any one of the Z3 variables. If a case has valid data for a Z3 variable, then it was included in the RAPT sample.
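Both checks can be sketched in a few lines. This sketch assumes each case is a dictionary keyed by variable name, that the child ID variable is called CHILDID, that the Z3 variable names are supplied by the analyst, and that missing Z3 data are stored as None or as negative reserve codes. All of these names and conventions are assumptions to verify against the user's manual.

```python
def in_rapt(record, z3_vars):
    """True if any Z3 variable holds valid (non-missing, non-negative) data."""
    return any(record.get(v) is not None and record[v] >= 0 for v in z3_vars)


def flag_twin_triad(main_records, triad_ids, id_key="CHILDID"):
    """Copy each main-file record with a hypothetical IN_TRIAD flag added."""
    triad = set(triad_ids)
    return [dict(r, IN_TRIAD=r[id_key] in triad) for r in main_records]


main = [{"CHILDID": "1", "Z3A": -9}, {"CHILDID": "2", "Z3A": 3}]
print([in_rapt(r, ["Z3A"]) for r in main])
print(flag_twin_triad(main, ["2"]))
```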

Can you apply the weights to the supplemental samples to get national estimates?

You cannot apply weights to the Twin Triad sample to obtain national estimates because the sample was not a random sample. Consequently, the findings can only be generalized to twins with caution. The RAPT sample, however, is a random sample of study children with Two Bags Task data. RAPT data can be weighted using a main sample weight adjusted by multiplying by the inverse of the probability of selection (for example, W3R0 * [8,900/800]).
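The adjustment in the example above multiplies the main-sample weight by the ratio of the eligible sample size to the RAPT sample size, which is the inverse of the selection probability. The sketch uses the approximate counts from the text (8,900 and 800); the W3R0 value and the function name are hypothetical.

```python
def rapt_weight(main_weight, n_eligible, n_rapt):
    """Scale a main-sample weight by the inverse RAPT selection probability."""
    if n_rapt <= 0 or n_eligible < n_rapt:
        raise ValueError("need 0 < n_rapt <= n_eligible")
    # Selection probability is n_rapt / n_eligible; dividing by it equals
    # multiplying by n_eligible / n_rapt.
    return main_weight * (n_eligible / n_rapt)


w3r0 = 100.0  # hypothetical W3R0 value for one case
print(rapt_weight(w3r0, n_eligible=8900, n_rapt=800))
```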

What are the proficiency probabilities on the data file?

The ECLS-B data file includes 20 proficiency probability scores from the 9-month and 2-year direct child assessments: 10 from the Mental Scale and 10 from the Motor Scale. The individual proficiencies are described in the User's Manual. Each proficiency probability was generated from the child's estimated performance on 4 to 6 assessment items and represents the probability that the child has mastered the skill represented by that proficiency. Thus, the proficiency probability scores range from 0 to 1. The proficiency probabilities are informative in that they indicate where growth has occurred (i.e., in what skills), whereas gains in scale scores only indicate that growth has occurred. That is, two children may have both gained the same number of points on the BSF-R mental scale, for example, but in different places. The proficiency probabilities could show that the first child has mastered jabbering expressively, while the second child had already mastered that skill and is now working on mastery of expressive vocabulary. Currently, the ECLS-B provides proficiency probabilities for the 9-month and 2-year BSF-R mental and motor assessments on the longitudinal 9-month—kindergarten 2007 restricted-use data file. Proficiency probabilities for the preschool–kindergarten cognitive assessments are not currently available.
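Comparing proficiency probabilities across rounds shows where growth occurred. The proficiency labels and probability values in this sketch are hypothetical, chosen to mirror the two-children example above.

```python
def largest_gains(probs_t1, probs_t2, top=1):
    """Proficiencies with the largest increase in mastery probability."""
    gains = {k: probs_t2[k] - probs_t1[k] for k in probs_t1 if k in probs_t2}
    return sorted(gains, key=gains.get, reverse=True)[:top]


# Hypothetical probabilities at two rounds for two children.
child_1 = ({"jabbering": 0.3, "expressive_vocabulary": 0.0},
           {"jabbering": 0.9, "expressive_vocabulary": 0.1})
child_2 = ({"jabbering": 0.95, "expressive_vocabulary": 0.2},
           {"jabbering": 1.0, "expressive_vocabulary": 0.8})

print(largest_gains(*child_1))
print(largest_gains(*child_2))
```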

ECLS-K

What is the difference between restricted-use data and public-use data files?

Several modifications are made to the data on the public-use files in order to reduce the likelihood that any respondent could be identified in the data.

  • Outlier data (i.e., unusual or rare responses) are top- or bottom-coded on the public-use files. For example, the number of kindergarten teachers who did not have at least a bachelor's degree was so small that such teachers are grouped in the same category as teachers who have a bachelor's degree. Bottom and top coding prevents identification of schools, teachers, parents, or children who have unique characteristics without affecting overall data quality. Outlier data appear in their original form on the restricted files.
  • Certain variables with too few cases having valid data or a sparse distribution are suppressed in the public-use files (i.e., no data are reported for those variables) but are available in the restricted-use files.
  • Certain continuous variables are transformed into categorical variables, and certain categorical variables have their categories collapsed in the public-use file. This categorization and collapsing reduce disclosure risk, while still providing data with adequate variability that can be used in many different kinds of analyses, such as regression analysis. Data that are modified in this way on the public files appear in their original form on the restricted files.

Additionally, ECLS-K restricted-use files are cross-sectional; most ECLS-K public-use files are longitudinal.

How will the differences between public-use files and restricted-use files impact analysts?

For most users, the public-use files provide all the data needed for their analyses. Both the public- and restricted-use files provide data at the individual child level; for the kindergarten round, data are also provided at the teacher and school levels. Overall, few variables have been suppressed on the public files. (Information about which variables have been suppressed can be found in the data file user's manuals. In addition, a suppressed variable has a value of -2 for every case in the data file.)
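The -2 convention allows a quick check of whether a public-use extract contains suppressed variables. The sketch below assumes the extract has been loaded as a list of dicts keyed by variable name. That structure, along with the variable names in the example, is hypothetical.

```python
SUPPRESSED = -2  # value used for suppressed variables on the public-use file


def suppressed_variables(rows):
    """Names of variables whose value is -2 for every case in the extract."""
    if not rows:
        return []
    return sorted(name for name in rows[0]
                  if all(row.get(name) == SUPPRESSED for row in rows))


extract = [{"CHILDID": "1", "X": 4, "Y": -2},
           {"CHILDID": "2", "X": -2, "Y": -2}]
print(suppressed_variables(extract))  # ['Y']
```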

Some users may find that only the restricted files have the specific data they need. For example, researchers examining groups of children whose representation in the population is relatively small, such as children in special education or children who speak a specific non-English language at home, or researchers interested in the types of kindergarten programs offered in schools, will find that the restricted files have more variables related to their topics of interest than the public files do. In many cases, however, even though the detailed information on the restricted-use files may be of interest, the sample sizes are too small for detailed analyses. Before requesting restricted-use data, NCES recommends examining the public-use files to determine whether they meet the researcher's needs.

The modifications used to reduce the likelihood that any respondent could be identified in the data do not affect the overall data quality.

Will you be collecting and releasing any more data on the ECLS-K sample?

At this time, NCES does not have plans to collect any more data from the students in the ECLS-K cohort or their families. The last round of ECLS-K data was released on the longitudinal kindergarten through eighth grade data file.

NCES has continued its ECLS kindergarten cohort data collections with the Early Childhood Longitudinal Study, Kindergarten Class of 2010-11 (ECLS-K:2011; https://nces.ed.gov/ecls/kindergarten2011.asp) and the Early Childhood Longitudinal Study, Kindergarten Class of 2023-24 (ECLS-K:2024; https://nces.ed.gov/ecls/kindergarten2024.asp).

How many children per school did ECLS-K sample?

On average, 23 kindergartners were sampled from each ECLS-K school. In some small schools and early childhood programs offering kindergarten programs, the number sampled was smaller. In some of these smaller schools, the entire population of kindergarten children was selected to participate in the ECLS-K. The average number of children per school decreases with each round of data collection as many children changed schools. For example, approximately one-quarter of children changed schools between kindergarten and first grade, and half of the children had changed schools at least once between kindergarten and third grade.

Did you sample whole classrooms?

No, children in each ECLS-K school were randomly sampled from a list of all kindergartners attending that school. During the design phase of the study, a number of different sample designs were considered and evaluated. The option of sampling entire classrooms was given strong consideration. In the end, this option was not adopted primarily because of the burden such a design would place on the teachers participating in the study and the loss of efficiency associated with an additional level of clustering.

Did the ECLS-K sample include children who were retained in kindergarten? Did it follow children if they were retained in later grades?

Yes, the ECLS-K sample included children who were repeating kindergarten. The base-year sample was composed of children who were kindergartners in the fall of 1998. Approximately 5 percent of these children were in their second year of kindergarten at that time. In addition, about 5 percent of the children who were first-time kindergartners in the fall of 1998 repeated kindergarten during the second year of the study (school year 1999-2000), when the majority of the sample was in first grade.

The ECLS-K did follow children who were retained in later grades. For instance, in the spring of 2002, when most children (89 percent) were in third grade, about 10 percent were in second grade and around 1 percent were in other grades (e.g., first or fourth grade). In the eighth-grade round of data collection (spring 2007), about 87 percent of the ECLS-K cohort was in eighth grade and about 13 percent was in a lower grade. Less than half a percent of the cohort was in a grade higher than eighth.

Were children with disabilities sampled for the ECLS-K?

Yes, children with disabilities were included in the sample for the ECLS-K, though disability status was not used as a sampling characteristic at the time of sampling (i.e., children with disabilities were not sampled at different rates than children without disabilities). Over the life of the study, many children in the sample were identified as needing special education services and began receiving them. Thus, the sample of children receiving special education services increased in size between kindergarten and eighth grade.

How did the ECLS-K identify children with disabilities?

All children with disabilities who meet federal eligibility requirements are expected to participate in special education programs or receive special education services through the school. During data collection, ECLS-K project staff asked schools whether the sampled children had an Individualized Education Plan (IEP), an Individualized Family Service Plan (IFSP), or a 504 plan on file with the school district. Once children were identified as receiving special educational assistance due to a disability, field supervisors identified what accommodations, if any, needed to be made in order to administer the direct child assessment battery to them appropriately. The special education teachers of children with an IEP, IFSP, or 504 plan were asked to complete questionnaires about their background and the services provided to the children and their families. Additionally, parents were asked a series of questions about children's health and disabilities in the parent interview.

What information was collected from the teachers and parents of disabled children?

Parents and teachers of children with disabilities were asked the same questions that parents and teachers of children without disabilities were asked. The parent and teacher instruments did contain additional items that asked about the services children with disabilities received. Also, a supplemental questionnaire was administered to the special education teacher of children who had an Individualized Education Plan (IEP). Copies of the parent and teacher instruments and the special education teacher questionnaires can be downloaded from the Instruments and Assessments page of the ECLS-K website.

Can you use the ECLS-K data to produce estimates that are nationally representative of school and teacher characteristics?

Yes, but only the ECLS-K kindergarten data will support such estimates. The base-year (i.e., kindergarten) school sample is nationally representative of schools that educate kindergartners. A separate school-level data file with school weights is included on the longitudinal kindergarten through eighth grade data file. During the kindergarten year, the ECLS-K sampled all kindergarten teachers in each of the ECLS-K schools. Data from this nationally representative sample of kindergarten teachers also are available as a separate file with teacher weights on the longitudinal kindergarten through eighth grade file. The ECLS-K data do not, however, support teacher-level or school-level estimates in first, third, fifth, or eighth grades. After kindergarten, teachers and schools were only included in the study if they educated one or more ECLS-K children. Therefore, no teacher-level or school-level weights are provided after the base year.

Who's included in the child, teacher, and school catalogs? Why are there teachers and schools in the child catalog, but there are no teachers and students in the school catalog? Why are there schools in the teacher catalog that are not in the school catalog?

In the base year, the ECLS-K was representative at three levels—kindergartners (i.e., the child level), kindergarten teachers, and schools educating kindergartners. The longitudinal kindergarten through eighth grade electronic codebook (ECB) contains a catalog pertaining to each of these levels. The child catalog is longitudinal and contains data for all children who participated in the kindergarten year, as well as data for all subsequent rounds of data collection. The teacher catalog is cross-sectional and contains information for the representative sample of kindergarten teachers only for the kindergarten round of data collection. The school catalog also is cross-sectional and contains information for the representative sample of schools educating kindergartners, again only for the kindergarten round of data collection.

Within the representative sample of schools educating kindergartners, kindergarten teachers were selected for the teacher sample, regardless of whether any children were sampled from their classrooms for the child sample. Thus, there are teachers in the teacher catalog who are not represented in the child catalog. Conversely, there are teachers in the child catalog who are not in the teacher catalog, because some children changed teachers during the kindergarten year and their new teachers were not part of the representative sample of teachers. Similarly, the school catalog contains only those schools that were sampled as part of the representative sample of schools educating kindergartners and that had a completed school administrator questionnaire. The school-level file does not contain those schools that children moved into but were not part of the initial representative sample of schools educating kindergartners. Data collected from children's new schools were, however, included in the child-level file.

A user might see school IDs on the teacher file that are not on the school file. This is because, while these schools were part of the representative sample of schools with kindergartens, they did not have a completed school administrator questionnaire.

Thus, users interested in creating teacher/classroom-level files or school-level files based on presence of child data, regardless of whether they were part of the representative teacher or school sample, should use the child-level file.
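Building a teacher-level file from the child-level file can be sketched as follows for the fall of kindergarten. The code uses the fall teacher link T1_ID and keeps one copy of the B1-prefixed TQB items for each linked teacher. It assumes rows are dicts and that a blank ID means no linked teacher. Both of these are assumptions, as are the item names in the example.

```python
def teacher_file_from_children(child_rows, id_var="T1_ID", prefix="B1"):
    """One record per teacher linked to at least one sampled child."""
    teachers = {}
    for row in child_rows:
        tid = row.get(id_var)
        if not tid:  # no linked teacher this round (assumed to be blank)
            continue
        if tid not in teachers:
            # TQB items describe the teacher, so any linked child's copy will do.
            teachers[tid] = {id_var: tid, "N_CHILDREN": 0}
            teachers[tid].update(
                {k: v for k, v in row.items() if k.startswith(prefix)})
        teachers[tid]["N_CHILDREN"] += 1
    return list(teachers.values())


kids = [{"T1_ID": "T01", "B1X": 3}, {"T1_ID": "T01", "B1X": 3},
        {"T1_ID": "T02", "B1X": 5}, {"T1_ID": "", "B1X": -9}]
print(teacher_file_from_children(kids))
```

The same pattern applies to schools if a school ID and the school-level prefix are substituted for the teacher ones.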

How are the data collected in the fall and spring kindergarten teacher Part B questionnaires presented in the child and teacher catalogs?

In the fall of kindergarten (round 1), teachers were asked about their characteristics and the characteristics of their classroom in Teacher Questionnaire, Part B (TQB). Teachers who were added to the study in the spring of kindergarten (either in a school that joined the study in round 2 or for a child who had a new teacher in round 2) were administered a similar version of the TQB in the spring (round 2). Teachers who answered TQB in the fall did not complete the spring version, so for any one teacher there is only one set of TQB data.

In the base-year teacher file, all of the TQB items have the B1 prefix, regardless of the round in which the data were collected. Two flags indicate the round in which the data were collected. If the flag B1TQUEX equals 1, the data were collected in the fall; if B2TQUEX equals 1, the data were collected in the spring.
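The two flags can be turned into a single collection-round label. The sketch below assumes each teacher record is a dict. The function name is hypothetical.

```python
def tqb_round(row):
    """Round in which a teacher's TQB data were collected, based on the flags."""
    if row.get("B1TQUEX") == 1:
        return "fall"
    if row.get("B2TQUEX") == 1:
        return "spring"
    return None  # neither flag is set: no TQB data for this teacher


print(tqb_round({"B1TQUEX": 0, "B2TQUEX": 1}))  # spring
```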

In the longitudinal child file there are two sets of TQB data, one set beginning with the B1 prefix and one set beginning with the B2 prefix. The B1 items pertain to the teacher linked to the child in the fall [with variable T1_ID] and the B2 items pertain to the teacher linked to the child in the spring [with variable T2_ID]. Most children have the same teacher in the fall and spring, so for these children their case-level information for the B1 TQB items is identical to their case-level information for the B2 TQB items. For children who changed teachers during the year, these two sets of data are different because they come from different teachers [you can determine whether children changed teachers by comparing T1_ID with T2_ID or by looking at variable FKCHGTCH]. For cross-sectional analyses, analysts should use the TQB items from the time period they are analyzing (i.e., B1 items for fall kindergarten; B2 items for spring kindergarten). For information that reflects the kindergarten experience as a whole, analysts might choose to limit their analysis to children whose teacher remained the same across the year.
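Restricting the analysis to children who kept the same teacher can be sketched by comparing T1_ID with T2_ID. FKCHGTCH is not used here because its codes are not described in this FAQ. Treating a blank or absent ID as "no link" is an assumption.

```python
def same_teacher_all_year(row):
    """True when the fall (T1_ID) and spring (T2_ID) teacher links match."""
    t1, t2 = row.get("T1_ID"), row.get("T2_ID")
    # Blank or absent IDs are treated as "no link" (assumption).
    return bool(t1) and bool(t2) and t1 == t2


def kindergarten_year_subset(rows):
    """Keep children whose teacher stayed the same from fall to spring."""
    return [row for row in rows if same_teacher_all_year(row)]


children = [{"T1_ID": "T01", "T2_ID": "T01"},
            {"T1_ID": "T01", "T2_ID": "T07"},
            {"T1_ID": "", "T2_ID": ""}]
print(len(kindergarten_year_subset(children)))  # 1
```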

Note:

  • The Spring Kindergarten TQB contains a subset of items asked in the Fall Kindergarten TQB. However, the variable structures for the B1 and B2 variables on the child-level data file are parallel; that is, all Fall Kindergarten TQB variables (B1 variables) have a corresponding Spring Kindergarten TQB variable (B2 variables) in the Electronic Codebook (ECB) and in the resulting data file.
  • Some questions asked of teachers in the Fall Kindergarten TQB were not asked of teachers new to the study in the Spring Kindergarten TQB. For children who had the same teacher in fall and spring, the information from the B1 variable pertaining to such a question is carried forward to the B2 variable. Children whose teachers are new to the study in the spring (i.e., they are in schools that joined the study in the spring or they changed teachers between fall and spring) do not have information pertaining to these questions in the spring. Data for these measures for these children are coded either as system missing or not ascertained, depending on whether the child's teacher responded to the survey.
  • On the Fall Kindergarten TQB, question 1 asks teachers about time spent in whole class activities, small group activities, individual activities, and child selected activities. This question was not repeated on the Spring Kindergarten TQB, but was asked on the Spring Kindergarten TQA (question 8). Therefore, there are three sources of information from kindergarten on time spent in whole class activities, small group activities, individual activities, and child selected activities [Fall Kindergarten TQB: B1WHLCLS; B1SMLGRP; B1INDVDL; B1CHCLDS] [Spring Kindergarten TQA: A2WHLCLS; A2SMLGRP; A2INDVDL; A2CHCLDS] and [Spring Kindergarten TQB: B2WHLCLS; B2SMLGRP; B2INDVDL; B2CHCLDS]. Since this question did not appear in the Spring Kindergarten TQB, the B2 variables are simply the responses that teachers had provided on the fall TQB. The A2 variables present the information that best reflects the spring kindergarten time period.
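When choosing among the three sources of activity-time data, one option is to map each season to the prefix that best reflects it. In the sketch below, fall uses B1 and spring uses A2, because the B2 versions only carry forward fall responses. The helper is hypothetical. The variable names come from the list above.

```python
ACTIVITY_STEMS = ["WHLCLS", "SMLGRP", "INDVDL", "CHCLDS"]


def activity_vars(season):
    """Activity-time variables that best reflect the given kindergarten season."""
    # Spring uses the TQA (A2) items; B2 items repeat the fall TQB answers.
    prefix = {"fall": "B1", "spring": "A2"}[season]
    return [prefix + stem for stem in ACTIVITY_STEMS]


print(activity_vars("spring"))  # ['A2WHLCLS', 'A2SMLGRP', 'A2INDVDL', 'A2CHCLDS']
```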

Is it possible to compute the elapsed time period between two assessments?

It is possible to calculate the elapsed time between two direct assessments for a child using variables in the Electronic Codebook (ECB). For each direct assessment, there are corresponding variables that indicate the month, day, and year in which the direct assessment was administered. For instance, in round 1 the assessment date variables are: C1ASMTMM (C1 Assessment month), C1ASMTDD (C1 Assessment day), and C1ASMTYY (C1 Assessment year-4 digits). To calculate the elapsed time between two assessments for a child, one can use the assessment date variables from the two rounds of interest to determine the number of days between the two direct assessments.

The ECB also includes composite variables for children's age at each direct assessment time point (e.g., R1_KAGE for Round 1 Composite child assessment age, in months). These variables are based on the children's date of birth and the date on which they were assessed. In some cases, there are discrepancies in the age at assessment variables due to masking of variables for the public-use file or improvements in the date of birth variables collected in earlier rounds of data collection. For this reason, we recommend that users calculate the elapsed time between assessments using the method described above, rather than use the composite assessment age variables on the public-use data file.

Below are examples of SPSS and SAS code that can be used to calculate elapsed time between direct assessments:

SPSS Code:
COMMENT Calculate elapsed time between R1 and R2 assessments.
COMMENT Combine the three assessment date variables (day, month, year) into one date variable.
COMPUTE date1=DATE.DMY(c1asmtdd, c1asmtmm, c1asmtyy).
FORMATS date1(DATE11).
VARIABLE WIDTH date1(11).
EXECUTE.

COMPUTE date2=DATE.DMY(c2asmtdd, c2asmtmm, c2asmtyy).
FORMATS date2(DATE11).
VARIABLE WIDTH date2(11).
EXECUTE.

COMMENT SPSS stores dates in seconds, so divide the difference by 86400 (seconds per day) to get elapsed days.
COMPUTE elapse = (date2 - date1)/86400.
EXECUTE.

SAS Code:

/* EXAMPLE - Calculate elapsed time between R1 and R2 */
/* Replace newfile and originalfile with your own data set names. */
data newfile;
set originalfile;
/*
Because C1ASMTMM, C1ASMTDD, and C1ASMTYY are numeric, the MDY function can convert them directly into a SAS date value (the number of days since January 1, 1960).
*/
date1=mdy(C1ASMTMM, C1ASMTDD, C1ASMTYY);
date2=mdy(C2ASMTMM, C2ASMTDD, C2ASMTYY);
/* Elapsed days; ABS() drops the sign, so the order of the two dates does not matter. */
diff=abs(date2-date1);
run;
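For users working outside SPSS or SAS, the same calculation can be sketched in Python with the standard datetime module. The sketch assumes each child's record is a dict keyed by the ECB variable names. It also assumes that missing or reserve codes (e.g., -9) should produce no result rather than an error. Unlike the SAS example, it keeps the sign of the difference.

```python
from datetime import date


def assessment_date(row, rnd):
    """Build a date from the month/day/year variables for a round (e.g., "C1")."""
    return date(row[rnd + "ASMTYY"], row[rnd + "ASMTMM"], row[rnd + "ASMTDD"])


def elapsed_days(row, first="C1", second="C2"):
    """Days from the first to the second assessment; None if a date is unusable."""
    try:
        return (assessment_date(row, second) - assessment_date(row, first)).days
    except (KeyError, TypeError, ValueError):
        # Missing variables, blanks, or negative reserve codes.
        return None


child = {"C1ASMTMM": 10, "C1ASMTDD": 15, "C1ASMTYY": 1998,
         "C2ASMTMM": 4, "C2ASMTDD": 20, "C2ASMTYY": 1999}
print(elapsed_days(child))  # 187
```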

I noticed that in the data file not all cases have data for the round 4 school administrator questionnaire (SAQ) variables. What do I do?

Schools that had already completed the school administrator questionnaire (SAQ) in round 2 were given a modified repeat school SAQ in round 4, whereas schools that were new to the study in round 4 were asked to complete an SAQ for new schools that collected more information than the modified SAQ (and was very similar to the SAQ used in round 2). The questions that were not in the repeat school SAQs (e.g., the grade levels included in the school, how many students the school site is designed to accommodate, what grades are tested with standardized tests) had already been asked in round 2, and variables associated with these questions are on the data file as S2 variables.

For those SAQ questions that were asked at round 2 but not at round 4 for repeat schools, a user can pull forward the data collected from round 2 to have a complete set of round 4 variables. For children who did not change schools between rounds 2 and 4, their round 2 child-level S2 variables can be used in analyses at round 4. Care should be taken to not replace updated information collected at round 4 with round 2 data (i.e., this procedure should only be used for data that were not collected at round 4). However, for children who changed schools between rounds 2 and 4 and moved into a repeat school, their own S2 data cannot be used at round 4. For these children, an analyst can use the child's round 4 school ID (S4_ID) to find another child who attended that school (i.e., had the same school ID) in round 2 and then pull that other child's round 2 school data forward.
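The carry-forward procedure can be sketched as follows. Several names are hypothetical: the round 2 school ID "S2_ID" and the item names built from a stem (for example, "S2" + stem and "S4" + stem). S4_ID comes from the FAQ. The function should only be applied to items that were not asked of repeat schools at round 4, and it never overwrites a value that was actually collected at round 4.

```python
def fill_forward(children, stem):
    """Complete a round 4 school item from round 2 data (hypothetical names)."""
    r2_name, r4_name = "S2" + stem, "S4" + stem

    # Round 2 answer for each school, taken from any child who attended it then.
    r2_by_school = {}
    for child in children:
        if child.get("S2_ID") and child.get(r2_name) is not None:
            r2_by_school.setdefault(child["S2_ID"], child[r2_name])

    for child in children:
        if child.get(r4_name) is not None:
            continue  # keep data actually collected at round 4
        if child.get("S2_ID") and child.get("S2_ID") == child.get("S4_ID"):
            # Same school in rounds 2 and 4: carry the child's own S2 value forward.
            child[r4_name] = child.get(r2_name)
        else:
            # Moved into a repeat school: borrow that school's round 2 value
            # from another child (None if the school has no round 2 data).
            child[r4_name] = r2_by_school.get(child.get("S4_ID"))
    return children
```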

When I try to download the online kindergarten through eighth grade data from https://nces.ed.gov/ecls/dataproducts.asp, I receive a message saying that the file is corrupted. What troubleshooting steps do you recommend?

The following are the steps that should be taken in order for an ASCII file to be created that contains all the data from kindergarten through eighth grade. (The procedures work the same for the smaller files with the base year school and teacher data.)

  1. Click on Childk8p.z01 from NCES's website. A prompt will appear that asks whether you want to "find," "save," or "cancel." Select "save" and save the file to the desired location on your computer. It may be helpful to create a specific folder in which to save all the files you will need to download.
  2. Repeat step 1 for Childk8p.z02, Childk8p.z03, Childk8p.z04, Childk8p.z05, and finally, the Childk8p.zip file, making sure to place the files in the same folder as that containing Childk8p.z01.
  3. Go to the location where you saved all your files and double-click the Childk8p.zip file.
  4. The WinZip dialogue box should open. Select the Childk8p.zip file and then select extract.
  5. A dialogue box may open that asks where you want WinZip to extract to. By default, it should be the folder in which you saved the files. If it is not, change the directory to the location where you would like the files to be located and select extract.
  6. The extraction process should begin, and after it is complete, you should see the childk8p.dat file. This is the ASCII file with all the data in it.
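Because the archive is split into several parts, a single missing or empty part can cause extraction to fail. One way to catch this before extracting is to confirm that all six parts are present in one folder and that none is empty. The sketch below does this using the file names listed in steps 1 and 2. The exact capitalization of the downloaded files may differ on your system. The helper name is hypothetical.

```python
from pathlib import Path

PARTS = ["Childk8p.z01", "Childk8p.z02", "Childk8p.z03",
         "Childk8p.z04", "Childk8p.z05", "Childk8p.zip"]


def check_download(folder):
    """List problems with the downloaded archive parts (empty list if none)."""
    problems = []
    for name in PARTS:
        path = Path(folder) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
        elif path.stat().st_size == 0:
            problems.append(f"{name}: empty (download may have failed)")
    return problems
```

A zero-byte check will not detect every corrupted download, but it does catch transfers that were interrupted outright.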

The procedures listed above do not seem to work with Microsoft Windows' default Extraction Wizard or with some free unzipping programs other than WinZip. Many data users find that WinZip is needed to open the online data properly, although the Extraction Wizard may be able to open the smaller zipped folders (e.g., the school and teacher data).

On computers with slower connections, large files such as these may become corrupted during the lengthy download. If your connection is slow, try the above procedures on a computer with a faster connection.

Please keep in mind that the public-use ECLS-K data are also available online through NCES's Online Codebook (https://nces.ed.gov/OnlineCodebook).

I saved my taglist in the ECB and used the “Extract” function to save a file with the variables I tagged, but when I try to open the file in SPSS/SAS/Stata, I don't see any data. What's wrong?

The ECB does not create a data file. Rather, the ECB creates syntax code that must be run in a statistical software package to generate a data file. The syntax file reads in raw data from the ASCII data file (the file with a .dat extension). In the ECB, there are two “save” steps in the “Extract” procedure. The first step saves the syntax file. The second step does not actually save a file; instead, it writes a line of code in the syntax file specifying the name of the data file that will be generated when the syntax file is run.
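To illustrate what the generated syntax does: it reads each tagged variable from fixed column positions in the .dat file. The Python sketch below shows the same idea. The variable names and column positions in LAYOUT are hypothetical. Real positions must be taken from the study's file record layout or the ECB-generated syntax.

```python
# Hypothetical layout: name -> (start column, end column), 1-based and inclusive.
LAYOUT = {
    "CHILDID": (1, 8),
    "C1ASMTMM": (9, 10),
    "C1ASMTDD": (11, 12),
}


def read_fixed_width(lines, layout=LAYOUT):
    """Slice each raw ASCII record into named fields (values kept as text)."""
    records = []
    for line in lines:
        records.append({name: line[start - 1:end].strip()
                        for name, (start, end) in layout.items()})
    return records


print(read_fixed_width(["0001000A1015"]))
# [{'CHILDID': '0001000A', 'C1ASMTMM': '10', 'C1ASMTDD': '15'}]
```

With a real extract, an open file object for the .dat file can be passed in place of the list, because iterating over a file yields one record per line.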