Educational attainment
Summary: The educational attainment levels of the working-age population (ages 25-64). Data for 2010 and 2022 represent five-year averages (e.g., 2018-2022).
Data Source(s): Integrated Public Use Microdata Series, IPUMS USA, University of Minnesota, www.ipums.org, 1990 and 2000 5% samples, 2010 and 2022 American Community Survey 5-year samples.
Universe: All people ages 25 through 64.
Methods: The number and percentage of people ages 25-64 by level of educational attainment were calculated by race/ethnicity, gender, nativity, and ancestry for each year and geography. For the map breakdown, census tract level data from the 2020 5-year American Community Survey was re-estimated into 2010 census tract boundaries using a 2020-to-2010 census tract geographic crosswalk. That crosswalk was developed using 2020 block level population data from the 2020 Census Redistricting Data along with a block level geographic crosswalk (2020 to 2010 blocks) from NHGIS. See the methodology page for other relevant notes.
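The tract re-estimation step can be illustrated with a minimal Python sketch. It assumes each block-level row gives a 2020 block's 2020 tract, the 2010 tract containing the 2010 block it maps to, the share of the 2020 block assigned to that 2010 block, and the block's 2020 population. The row layout and the function names `build_tract_weights` and `reestimate` are hypothetical and do not reflect the NHGIS file format. Tract-to-tract weights are the population-weighted share of each 2020 tract falling in each 2010 tract, and counts are then apportioned with those weights.

```python
from collections import defaultdict

def build_tract_weights(block_rows):
    """Derive 2020-tract -> 2010-tract weights from block-level rows.

    block_rows: iterable of (tract2020, tract2010, block_weight, block_pop2020),
    where block_weight is the (assumed) share of a 2020 block assigned to a
    2010 block that lies in tract2010.
    """
    pop_to = defaultdict(lambda: defaultdict(float))  # tract2020 -> tract2010 -> population
    for t20, t10, weight, pop in block_rows:
        pop_to[t20][t10] += pop * weight
    weights = {}
    for t20, targets in pop_to.items():
        total = sum(targets.values())
        if total > 0:  # tracts with no 2020 population get no weights
            weights[t20] = {t10: p / total for t10, p in targets.items()}
    return weights

def reestimate(counts2020, weights):
    """Apportion 2020-tract counts into 2010 tracts using the tract weights."""
    out = defaultdict(float)
    for t20, count in counts2020.items():
        for t10, w in weights.get(t20, {}).items():
            out[t10] += count * w
    return dict(out)
```

In this sketch, tracts with no 2020 population receive no weights, so any counts in them are not carried over.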
Notes:
- Latinos include people of Hispanic origin of any race and all other groups exclude people of Hispanic origin.
- The high school diploma category of education includes those with an actual high school diploma as well as high school equivalency or a General Educational Development (GED) certificate.
- Data for 2010 and 2022 represent 2006-2010 and 2018-2022 averages, respectively.
Disconnected Youth
Summary: The share of the population ages 16 to 24 who are not working or enrolled in school. Data for 2010 and 2022 represent five-year averages (e.g., 2018-2022).
Data Source(s): Integrated Public Use Microdata Series, IPUMS USA, University of Minnesota, www.ipums.org, 1990 and 2000 5% samples, 2010 and 2022 American Community Survey 5-year samples.
Universe: All people ages 16 through 24.
Methods: The number and percentage of disconnected youth among all youth ages 16 through 24 were calculated by race/ethnicity, gender, nativity, and ancestry for each year and geography. See the methodology page for other relevant notes.
Notes:
- Latinos include people of Hispanic origin of any race and all other groups exclude people of Hispanic origin.
- People in the armed forces during the time of the survey are considered to be employed (i.e., not disconnected).
- Data for 2010 and 2022 represent 2006-2010 and 2018-2022 averages, respectively.
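The disconnected youth definition, including the treatment of the armed forces, can be expressed as a short Python sketch. The `Person` fields are hypothetical stand-ins for the employment, military, and enrollment status derived from the survey microdata; the mapping from actual IPUMS variables is not shown, and shares are assumed to be computed with survey person weights.

```python
from dataclasses import dataclass

@dataclass
class Person:
    age: int
    employed: bool       # has a job during the survey reference period (hypothetical field)
    armed_forces: bool   # in the armed forces; treated as employed
    enrolled: bool       # currently enrolled in school
    weight: float        # survey person weight

def is_disconnected(p: Person) -> bool:
    """Ages 16-24, not working (armed forces count as working), and not enrolled."""
    working = p.employed or p.armed_forces
    return 16 <= p.age <= 24 and not working and not p.enrolled

def disconnected_share(people) -> float:
    """Weighted share of all youth ages 16-24 who are disconnected."""
    youth = [p for p in people if 16 <= p.age <= 24]
    total = sum(p.weight for p in youth)
    disconnected = sum(p.weight for p in youth if is_disconnected(p))
    return disconnected / total if total else float("nan")
```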
School poverty
Summary: The percentage of students attending public elementary and secondary schools by school poverty level. The year indicated is the later of the two calendar years spanned by a school year (e.g., 2023 refers to the 2022-2023 school year). School poverty levels are defined by the share of students eligible for free or reduced-price lunch (FRPL) and include: "Low" (less than 25% FRPL), "Mid-low" (25-50% FRPL), "Mid-high" (50-75% FRPL), and "High" (greater than 75% FRPL). It is important to note that the Healthy, Hunger-Free Kids Act of 2010 changed eligibility requirements, which can affect the comparability of the school poverty data over time.
Data Source(s): National Center for Education Statistics, Common Core of Data, Public Elementary/Secondary School Universe Survey.
Universe: All students attending public schools.
Methods: The share of students eligible for free or reduced-price lunch (FRPL) was calculated at the school level for all public elementary and secondary schools. Schools were then classified into four school poverty level categories (low, mid-low, mid-high, and high) based on this share, and the number and share of students in each category were aggregated to the various Atlas geographies for each racial/ethnic group. For the vast majority of schools, the total student count equals the sum of the counts by race/ethnicity. For a small number of schools, however, the total is slightly higher because the sum by race/ethnicity excludes any students belonging to an unknown or non-CCD race category. For this reason, data for all racial/ethnic groups combined (the "All" category) reported in the Atlas is based on the sum of student counts by race/ethnicity.
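The classification and aggregation steps can be sketched in Python as follows. The school record layout is hypothetical. The category boundaries follow the summary definitions (exactly 25% falls in "Mid-low" and exactly 75% in "Mid-high"); placing exactly 50% in "Mid-high" is an assumption, since the source ranges overlap at that point. As described above, "All" is built from the sum of the race/ethnicity counts rather than total enrollment.

```python
from collections import defaultdict

def poverty_category(frpl_share: float) -> str:
    """Map a school's FRPL share (0 to 1) to a school poverty category."""
    if frpl_share < 0.25:
        return "Low"
    if frpl_share < 0.50:
        return "Mid-low"
    if frpl_share <= 0.75:   # assumption: exactly 50% is treated as Mid-high
        return "Mid-high"
    return "High"

def shares_by_category(schools):
    """schools: iterable of dicts with 'frpl' (eligible count), 'enrollment'
    (total count), and 'by_race' ({group: count}).
    Returns {group: {category: share of that group's students}}, plus "All"."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in schools:
        category = poverty_category(s["frpl"] / s["enrollment"])
        for group, n in s["by_race"].items():
            counts[group][category] += n
            # "All" is the sum of race/ethnicity counts, not total enrollment.
            counts["All"][category] += n
    return {
        group: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for group, cats in counts.items()
    }
```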
It is important to note that the measure of school poverty used, the share of students eligible for FRPL, is not always reported and is subject to some degree of error at the school level. The reasons for this include the fact that the count of students deemed FRPL-eligible may be taken at a different time than the total student count, and in some states, a single school may administer the free lunch program for a group of schools (in which case its count and share of FRPL-eligible students would be overstated). However, any bias caused by these school-level reporting inconsistencies is likely to be largely mitigated once the data is aggregated across the many schools in a given Atlas geography. It is also important to note that the Healthy, Hunger-Free Kids Act of 2010 changed eligibility requirements, which can affect the comparability of the school poverty data over time. In particular, the Act introduced the Community Eligibility Option (CEO), which was available in 11 states (including the District of Columbia) by the 2013-14 school year and in all states in the 2014-15 school year, and which allows more children to be eligible for FRPL.
Given the prevalence of missing data for some schools and changes to eligibility requirements in recent years, we took precautions to avoid reporting data that are inaccurate or misleading. First, we do not report school poverty information if 10 percent or more of the relevant student population attends schools that do not report valid (non-missing) FRPL eligibility data. Due to the impact of the COVID-19 pandemic on school reporting of FRPL eligibility data, this restriction most affects the data we report for the 2020-2021 and 2021-2022 school years, when the share of all students in schools reporting FRPL eligibility data nationwide fell to about 82 percent. By the 2022-2023 school year, the share had increased to 90 percent, but it was still below pre-pandemic levels of around 94 to 99 percent. Second, after making an initial calculation of the overall share of students eligible for FRPL based on available data for the 2009-10 through 2022-23 school years, we examined changes in this measure over time for all 731 Atlas geographies and flagged any dramatic year-to-year changes. School poverty data for some Atlas geographies in certain years were set to missing based on this examination. See the methodology page for other relevant notes.
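The first reporting restriction (no data when 10 percent or more of the relevant students attend schools without valid FRPL data) can be sketched as below. The record layout, with `None` marking a missing FRPL count, is an assumption for illustration; the second, judgment-based screen for dramatic year-to-year changes is not modeled because no numeric threshold is given.

```python
def reportable(schools, group: str) -> bool:
    """True if less than 10 percent of the group's students attend schools
    without valid FRPL data. Each school is a dict with 'frpl' (None when
    missing) and 'by_race' ({group: count}); "All" sums the race counts."""
    total = missing = 0
    for s in schools:
        if group == "All":
            n = sum(s["by_race"].values())
        else:
            n = s["by_race"].get(group, 0)
        total += n
        if s["frpl"] is None:
            missing += n
    return total > 0 and missing / total < 0.10
```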
Notes:
- Disaggregated data for Asian Americans and Pacific Islanders is only available for 2010 and 2023.
- Latinos include people of Hispanic origin of any race and all other groups exclude people of Hispanic origin.
- Data are not reported if 10 percent or more of the relevant student population attends schools that do not report valid (non-missing) FRPL eligibility data.
- The year indicated is the later of the two calendar years spanned by a school year (e.g., 2023 refers to the 2022-2023 school year).
Life expectancy
Summary: Estimated life expectancy at birth based on abridged life tables constructed from mortality data by race/ethnicity and gender. The "years above average" measure in the ranking and map breakdowns reports the difference in life expectancy between a given racial/ethnic group and the overall population in a given geography, and therefore appears as zero/missing when the race/ethnicity filter is set to "all." Data for each year represent an average over the five years ending in that year (e.g., 2020 is a 2016-2020 average).
Data Source(s): Centers for Disease Control and Prevention, CDC WONDER, wonder.cdc.gov.
Universe: All people.
Methods: Life expectancy at birth by race/ethnicity for each year and geography was estimated with an abridged life table approach, using mortality data and mid-year population estimates from the WONDER database. A life table reports the number of deaths, total population, probability of dying, and remaining life expectancy by single year of age for a given year or time period. Abridged life tables are similar, but present the information for age groups rather than by single year of age. Remaining life expectancy for each age group is largely a function of the probability of dying for people in their own age group and in older age groups. Because death counts are not disclosed for some age group cells by race/ethnicity, geography, and year, death rates were at times substituted from higher levels of geographic aggregation. Given this, measures were taken to avoid reporting unreliable estimates, that is, estimates with too many substitutions. Specifically, we only report estimates for which at least 90 percent of the total number of deaths for a population are from age groups that had disclosed death counts in the underlying data and did not require substitution of death probabilities from higher levels of geography. We also only report estimates based on at least 100 total deaths (for all age groups combined). See the methodology page for other relevant notes.
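A generic abridged life table calculation, plus the two reporting rules, can be sketched in Python. This is a textbook version under stated assumptions, not necessarily the Atlas implementation: deaths within each closed age group are assumed to occur at the group's midpoint (so the probability of dying is q = n·m / (1 + 0.5·n·m) for a group of width n and death rate m), the last group is open-ended, and the `AgeGroup` fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgeGroup:
    width: Optional[int]   # years in the group; None for the open-ended last group (e.g., 85+)
    deaths: float          # disclosed, or estimated from a substituted rate
    population: float      # mid-year population
    disclosed: bool = True

def life_expectancy_at_birth(groups, radix: float = 100_000.0) -> float:
    """Abridged life table e0; groups ordered youngest to oldest, last one open-ended."""
    survivors = radix      # l_x: survivors at the start of the age group
    person_years = 0.0     # running total of nL_x, which ends as T_0
    for g in groups:
        m = g.deaths / g.population          # age-specific death rate
        if g.width is None:
            # Open-ended group: all survivors eventually die; nL_x = l_x / m_x.
            person_years += survivors / m
            break
        n = g.width
        q = n * m / (1 + 0.5 * n * m)        # deaths assumed at mid-interval
        next_survivors = survivors * (1 - q)
        person_years += n * (survivors + next_survivors) / 2
        survivors = next_survivors
    return person_years / radix              # e_0 = T_0 / l_0

def is_reportable(groups) -> bool:
    """At least 100 deaths and at least 90 percent from disclosed age groups."""
    total = sum(g.deaths for g in groups)
    disclosed = sum(g.deaths for g in groups if g.disclosed)
    return total >= 100 and disclosed / total >= 0.90
```

The "years above average" measure in the summary is then simply a group's e0 minus the e0 for the overall population of the same geography.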
Notes:
- Latinos include people of Hispanic origin of any race and all other groups exclude people of Hispanic origin.
- No data is available for the Mixed/other population or for all people of color combined.
- No data is reported unless at least 90 percent of the total number of deaths for a population are from age groups that had disclosed death counts in the underlying data and did not require substitution of death probabilities from higher levels of geography.
- No data is reported unless based on at least 100 total deaths (for all age groups combined).
Automation risk
Summary: Estimated average automation risk associated with a worker’s occupation. Automation risk is a measure that attempts to capture the percentage of tasks associated with an occupation that can be automated, and thus reflects the likelihood of computerization of the underlying tasks for a given occupation.
Data Source(s): Lightcast (formerly Burning Glass Technologies); Integrated Public Use Microdata Series, IPUMS USA, University of Minnesota, http://www.ipums.org/, 2019 American Community Survey 5-year sample.
Universe: The employed civilian noninstitutionalized population age 16 or older.
Methods: The American Community Survey microdata was filtered to employed civilian noninstitutionalized people age 16 or older. Nationwide data from Lightcast on an index of automation risk, or “automation factor,” along with occupational entry-level education requirements (both expressed at the Standard Occupational Classification, or SOC, level) were matched to the microdata by occupation. The number of employed civilian noninstitutionalized people age 16 or older and the estimated automation factor were then aggregated by race/ethnicity, gender, nativity, education requirements, and industry for all states, regions, and the US overall. The automation factor for each group was estimated as a weighted average of the SOC-level factors, using the number of employed civilian noninstitutionalized people age 16 or older as weights.
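The employment-weighted average can be sketched in Python as follows. The input layout, one `(group, soc_code, person_weight)` tuple per worker, is an assumption, and the SOC codes and factor values in the test are illustrative only, not Lightcast or Frey and Osborne figures. Workers in occupations without a matched factor are skipped in this sketch.

```python
from collections import defaultdict

def average_automation_factor(workers, factor_by_soc):
    """workers: iterable of (group, soc_code, person_weight) for employed
    civilian noninstitutionalized people age 16+.
    factor_by_soc: {soc_code: automation factor between 0 and 1}.
    Returns {group: employment-weighted average automation factor}."""
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for group, soc, weight in workers:
        factor = factor_by_soc.get(soc)
        if factor is None:   # occupation with no matched factor is skipped
            continue
        weighted_sum[group] += factor * weight
        total_weight[group] += weight
    return {g: weighted_sum[g] / total_weight[g] for g in total_weight if total_weight[g] > 0}
```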
The underlying automation factor is defined for an occupation as the percentage of tasks that can be computerized (i.e., automated). It is drawn from a 2013 paper by Carl Benedikt Frey and Michael A. Osborne, The Future of Employment: How Susceptible Are Jobs to Computerisation? The data reported on the number of jobs at risk of automation are calculated based on the assumption that the percentage of tasks within an occupation that can be automated approximates the percentage of jobs that are at risk of automation.
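Under the stated assumption that the share of automatable tasks approximates the share of jobs at risk, the count of jobs at risk is employment times the automation factor, summed over occupations. A minimal sketch, with hypothetical SOC codes and factor values:

```python
def jobs_at_risk(employment_by_soc, factor_by_soc):
    """Employment in each occupation times its automation factor, summed
    across occupations that have a known factor."""
    return sum(
        jobs * factor_by_soc[soc]
        for soc, jobs in employment_by_soc.items()
        if soc in factor_by_soc
    )
```

For example, 100 jobs with a factor of 0.2 and 300 jobs with a factor of 0.8 yield 20 + 240 = 260 jobs at risk.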
Notes:
- Latinos include people of Hispanic origin of any race and all other groups exclude people of Hispanic origin.
- No data are available for cities or counties.
Future-ready jobs
Summary: Percentage of workers with a job that is future ready. Future-ready jobs are those that provide stable, family-supporting incomes for workers and strong prospects for employers and communities.
Data Source(s): Lightcast (formerly Burning Glass Technologies); Integrated Public Use Microdata Series, IPUMS USA, University of Minnesota, http://www.ipums.org/, 2019 American Community Survey 5-year sample.
Universe: The employed civilian noninstitutionalized population age 16 or older.
Methods: We define future readiness based on three criteria: (1) stable or growing employment, (2) resilience to automation, and (3) payment of a living wage. A job is considered future-ready if it meets all three criteria, and not future-ready if it fails to meet one or more of them.
Whether a job is stable or growing is determined using Lightcast’s job growth data. Resilience to automation is defined as having a probability of computerization (or “automation factor”) lower than 50 percent, given the full array of tasks that comprise a particular job. The automation factor is based on the automation risk associated with each occupation from a 2013 paper by Carl Benedikt Frey and Michael A. Osborne, The Future of Employment: How Susceptible Are Jobs to Computerisation? Lastly, the living wage criterion is a function of the average wage of an occupation and the cost of living in each geographic area. Jobs are determined to pay a living wage when the average wage is sufficient to sustain a family of two working adults and two children.
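The three-part test can be sketched in Python. The `Job` fields are hypothetical, "stable or growing" is assumed here to mean a projected change of zero or more, and `living_wage` is assumed to be a per-worker annual threshold for the area (already derived from the two-adult, two-child family budget); none of these specifics come from Lightcast.

```python
from dataclasses import dataclass

@dataclass
class Job:
    projected_growth: float   # projected change in employment, as a fraction (hypothetical field)
    automation_factor: float  # probability of computerization, 0 to 1
    average_wage: float       # average annual wage of the occupation in the area

def is_future_ready(job: Job, living_wage: float) -> bool:
    """All three criteria must hold; failing any one makes a job not future-ready."""
    stable_or_growing = job.projected_growth >= 0       # assumption: no projected decline
    resilient = job.automation_factor < 0.5             # strictly below 50 percent
    pays_living_wage = job.average_wage >= living_wage  # assumed per-worker threshold
    return stable_or_growing and resilient and pays_living_wage
```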
The projection of future-ready jobs is extracted from a proprietary dataset from Lightcast based on its job counts data. Lightcast uses the four most recent Quarterly Census of Employment and Wages (QCEW) datasets from the Bureau of Labor Statistics to calculate current employment. Then, job counts of future years are projected based on past trends. Various adjustments are made to the job growth projection using additional data sources including the National Industry-Occupation Employment Matrix (NIOEM) and state-level projection data. See here for a detailed explanation of Lightcast’s projection methodology. Estimated job growth by race/ethnicity and gender assumes that current staffing patterns of detailed occupations remain constant going forward.
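The constant-staffing-pattern assumption amounts to splitting each occupation's projected job count across groups in proportion to each group's current share of that occupation. A minimal sketch with a hypothetical input layout:

```python
from collections import defaultdict

def project_by_group(current, projected_total):
    """current: {occupation: {group: current workers}};
    projected_total: {occupation: projected total jobs}.
    Each occupation's projected jobs are split across groups in proportion
    to its current staffing pattern, which is held constant."""
    out = defaultdict(float)
    for occ, groups in current.items():
        total_now = sum(groups.values())
        if total_now == 0 or occ not in projected_total:
            continue
        for group, workers in groups.items():
            out[group] += projected_total[occ] * workers / total_now
    return dict(out)
```

For instance, if an occupation is currently 60 percent group A and is projected to reach 110 jobs, group A is assigned 66 of those jobs.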
Notes:
- Latinos include people of Hispanic origin of any race and all other groups exclude people of Hispanic origin.
- No data are available for cities or counties.
Policing in schools
Summary: The number and rate (per 10,000 students) of arrest events and referrals to law enforcement of students enrolled in public elementary and secondary schools by grade level and classroom size. Classroom size is defined at the school level based on the ratio of students to teachers and whether it falls more than one standard deviation above the mean for schools in the same state. The year indicated is the later of the two calendar years spanned by a school year (e.g., 2018 refers to the 2017-2018 school year).
Data Source(s): National Center for Education Statistics, Common Core of Data, Public Elementary/Secondary School Universe Survey; US Department of Education, Office for Civil Rights, Civil Rights Data Collection, 2017-2018 Enrollment File and Law Enforcement Referral File. All school level data was cleaned and provided by Civilytics Consulting prior to analysis and aggregation by the National Equity Atlas team.
Universe: All students attending public schools.
Methods: An underlying school level file covering all public elementary and secondary schools, merging data from the National Center for Education Statistics Common Core of Data (CCD) and the Civil Rights Data Collection (CRDC), was provided to the Atlas team by Civilytics Consulting (https://www.civilytics.com). The file included regular public schools as well as special education, vocational education, alternative, charter, magnet, juvenile justice facility, and Title 1-eligible schools.
While the most important information required to develop the indicator, including the number of students, arrests, and referrals to law enforcement by race/ethnicity and gender, is drawn from the CRDC data, the CCD data was used to attach school level information on geographic location (latitude and longitude), number of full-time equivalent (FTE) teachers, grade levels served, and total number of students. The latitude and longitude information was used to geocode schools and accurately assign them to each Atlas geography, while the other fields were used to define broad classroom size and grade level categories and to validate how well the CRDC data represents the full population of students enrolled in public schools, as described below.
Because the two underlying sources (CRDC and CCD) are produced by different entities, achieving a perfect match between the two files is not as easy as one might expect. Civilytics Consulting implemented techniques to match schools between the two files when a simple match based on standard school identifiers could not be made, using fields that were common in both files such as the school’s name, the local education agency identification numbers (LEAIDs), and the local education agency names. Ultimately, among the 97,632 schools represented in the CRDC data, only 441 could not be matched to the CCD data for the 2017-2018 school year. Those 441 schools represent 0.5 percent of all schools in the CRDC data (or 0.4 percent of all students) and are excluded from the analysis.
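A simplified two-pass match of the kind described above can be sketched in Python: an exact match on a school identifier first, then a fallback on the LEA ID plus a normalized school name. This is an illustrative sketch, not Civilytics Consulting's actual procedure; the record keys `school_id`, `leaid`, and `name` are hypothetical, and the name normalization rule is an assumption.

```python
import re

def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(name.split())

def match_schools(crdc, ccd):
    """Return (list of (crdc_record, ccd_record) pairs, list of unmatched CRDC records)."""
    by_id = {s["school_id"]: s for s in ccd}
    by_lea_name = {(s["leaid"], normalize_name(s["name"])): s for s in ccd}
    matched, unmatched = [], []
    for s in crdc:
        # Pass 1: standard identifier; pass 2: LEA ID plus normalized name.
        hit = by_id.get(s["school_id"]) or by_lea_name.get((s["leaid"], normalize_name(s["name"])))
        if hit is None:
            unmatched.append(s)
        else:
            matched.append((s, hit))
    return matched, unmatched
```

Applied to the counts above, 441 unmatched schools out of 97,632 is about 0.45 percent, which rounds to the 0.5 percent reported.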
Classroom sizes were calculated at the school level by dividing total student enrollment by the number of full-time equivalent (FTE) teachers (both CCD variables). When setting the thresholds to define “larger” versus “smaller” classroom sizes, we took into consideration the substantial variation in class sizes by state. We observed that a disproportionate share of the "crowded" schools were in a handful of states (e.g., California, Oregon, Nevada, and Arizona), so using an absolute threshold to define larger versus smaller classroom sizes would lead to some states having virtually all schools defined as having “larger” class sizes and other states having nearly all schools defined as having “smaller” class sizes. We also recognize that this state level variation is in part shaped by different state level policies and legal frameworks that affect the distribution of funding across schools and districts within each state. To achieve a better balance between schools with “larger” and “smaller” classroom sizes in each state and adjust the definition of classroom size for state level variation, we defined schools as having “larger” classroom size if their student-to-teacher ratio fell more than one standard deviation above the mean across all schools in the state, and defined all other schools as having “smaller” classroom size.
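The state-relative threshold can be sketched in Python. The school record keys are hypothetical, schools with missing or zero FTE teachers are assumed to have no ratio (and so no class size category), and the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether a population or sample standard deviation was used.

```python
from collections import defaultdict
from statistics import mean, pstdev

def classify_class_size(schools):
    """schools: list of dicts with 'state', 'enrollment', and 'fte_teachers'.
    Returns a parallel list of "larger", "smaller", or None (missing ratio)."""
    ratios = []
    ratios_by_state = defaultdict(list)
    for s in schools:
        fte = s.get("fte_teachers")
        ratio = s["enrollment"] / fte if fte else None  # missing or zero FTE -> no ratio
        ratios.append(ratio)
        if ratio is not None:
            ratios_by_state[s["state"]].append(ratio)
    # State-specific cutoff: mean plus one (population) standard deviation.
    cutoff = {st: mean(r) + pstdev(r) for st, r in ratios_by_state.items()}
    return [
        None if ratio is None else ("larger" if ratio > cutoff[s["state"]] else "smaller")
        for s, ratio in zip(schools, ratios)
    ]
```

With ratios of 10, 10, 10, 10, and 30 in one state, the mean is 14 and the standard deviation is 8, so only the school at 30 (above the cutoff of 22) is classified as "larger."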
Grade level was defined based on information from the CCD about the highest and lowest grades offered at each school. The available codes in the data included prekindergarten, primary, middle, high, other, ungraded, adult education, secondary, missing/not reported, and not applicable. Most schools (93 percent) fell in the most common school levels of primary, middle, and high school/secondary. These levels were used to define grade levels for which data are reported (with data for high and secondary schools combined into the “high schools” category). The rest of the schools are included in the data reported for “all public schools” on the Atlas.
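The grade-level grouping amounts to a small lookup table. In this sketch the level strings and the category labels are hypothetical; schools at any other level map to `None` and count only toward "all public schools."

```python
from typing import Optional

# CCD school level -> grade-level category used for reporting (labels hypothetical).
GRADE_LEVEL_MAP = {
    "primary": "primary schools",
    "middle": "middle schools",
    "high": "high schools",
    "secondary": "high schools",   # combined with high schools
}

def atlas_grade_level(ccd_level: str) -> Optional[str]:
    """Return the grade-level category, or None if the school only counts
    toward the "all public schools" totals."""
    return GRADE_LEVEL_MAP.get(ccd_level.strip().lower())
```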
To ensure that the data we report is accurate and reflective of the full public school student population, we took measures to avoid reporting data that could be misleading or incomplete. First, we only report data by classroom size if at least 90 percent of the relevant student population (e.g., by race/sex/grade level) attends schools with valid class size information. Nationwide, about 5.5 percent of schools (representing 3.1 percent of students) had missing classroom size information, but the extent of the missing information varies by geography. This constraint on reporting helps ensure consistency between the data we report for all schools combined in each geography and the data we report by class size. Notably, no class size information is available for any of the schools in Utah, so data is only available for all schools combined. Second, because the CCD data are generally considered to be more accurate and reliable than the CRDC data, we only report data for a geography if its total enrollment in the CRDC data is at least 90 percent of its CCD enrollment. This ensures that the CRDC data is adequately representative of the full student population in each geography.
Despite these efforts to ensure the accuracy and consistency of the data we report, there are many known issues with the CRDC that are worth highlighting. These include differences across Local Education Agencies (LEAs) in how they define and report arrests and referrals to law enforcement, suppression of data for some LEAs due to data quality concerns, and reporting errors, among other issues. One egregious example is the suppression of data reported by New York City Public Schools due to data quality issues. As a result, the data we report for New York City (and its constituent counties/boroughs) shows only nine arrests, which is not plausible given that the district enrolled over one million students and reported about 3,000 referrals to law enforcement. Due to these data quality issues, users should interpret the data with caution and hold local and state education officials accountable for improving data quality. See the methodology page for other relevant notes.
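The two 90 percent reporting rules, along with the per-10,000 rate from the summary, can be sketched as simple Python functions. The function names and inputs are hypothetical; in practice the student counts would be computed for the relevant race/sex/grade-level slice of each geography.

```python
def reportable_by_class_size(students_total: float, students_with_class_size: float) -> bool:
    """At least 90% of the relevant students attend schools with valid class size data."""
    return students_total > 0 and students_with_class_size / students_total >= 0.90

def reportable_geography(crdc_enrollment: float, ccd_enrollment: float) -> bool:
    """CRDC enrollment must be at least 90% of CCD enrollment for the geography."""
    return ccd_enrollment > 0 and crdc_enrollment / ccd_enrollment >= 0.90

def rate_per_10k(events: float, enrollment: float) -> float:
    """Arrests or referrals to law enforcement per 10,000 enrolled students."""
    return events / enrollment * 10_000
```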
Notes:
- Due to data quality issues, users should interpret the data with caution.
- For display purposes, zero values were set to missing to prevent the appearance of zeros in cases where data is unavailable, excluded, or where there are no cases.
- No data are reported by classroom size unless at least 90 percent of the relevant student population (e.g., by race/sex/grade level) attend schools with valid class size information.
- No data are reported unless total enrollment from the CRDC data is at least 90 percent of CCD enrollment.
- No data are available by class size for the state of Utah and its constituent geographies.