Summary Measures from Integrated Public Use Microdata Series (IPUMS USA) Microdata

Relevant indicators:

  • Nativity and ancestry
  • Population growth
  • Median age
  • Wages: Median
  • Wages: $15/hr
  • Poverty
  • Working poor
  • Unemployment
  • Income growth
  • Income inequality
  • Homeownership
  • Educational attainment
  • Disconnected youth
  • Commute time
  • Housing burden
  • Car access
  • Economic gains: Racial equity in income
  • Economic gains: Eliminate rent burden

About IPUMS microdata

Although a variety of data sources were used to produce the indicators on the Atlas, many are based on a unique dataset created using microdata samples (i.e., “individual-level” data) from IPUMS USA for several points in time: 1980, 1990, 2000, and five-year pooled files for 2010 and later (e.g., 2006 through 2010 for 2010, and 2013 through 2017 for 2017). The 1980 through 2000 files are based on the decennial census and each cover about 5 percent of the U.S. population. The five-year pooled files are from the American Community Survey (ACS) and cover only about 1 percent of the U.S. population in each year of the pool. Five years of ACS microdata were pooled for years after 2000 to improve statistical reliability and to achieve a sample size comparable to that available in previous years. Compared with the more commonly used census “summary files,” which include a limited set of summary tabulations of population and housing characteristics, the microdata samples offer the flexibility to create more illuminating metrics of equity and inclusion, and to provide a more nuanced view of groups defined by age, race/ethnicity, and nativity in each region of the United States.

A note on sample size

While the IPUMS microdata allows for the tabulation of detailed population characteristics, it is important to keep in mind that because such tabulations are based on samples, they are subject to a margin of error and should be regarded as estimates—particularly in smaller regions/cities and for smaller demographic subgroups. In an effort to avoid reporting highly unreliable estimates, we do not report any estimated ratios or measures of central tendency (e.g., means and medians) that are based on a universe of fewer than 100 individual survey respondents. For example, the universe for the unemployment rates reported for Latinos by the Unemployment indicator is the Latino civilian noninstitutional population ages 25-64, so unemployment rates are only reported for Atlas geographies that have at least 100 actual survey respondents who fall in this particular universe. However, even with this restriction in place, users should not assume that small differences in indicator values between demographic subgroups are statistically significant.
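
The reporting rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Atlas's actual code: the function name `weighted_rate` and the input layout (one flag and one survey weight per respondent in the universe) are hypothetical, and only the 100-respondent threshold comes from the text.

```python
from typing import Optional, Sequence

MIN_RESPONDENTS = 100  # threshold stated in the text


def weighted_rate(flags: Sequence[bool], weights: Sequence[float]) -> Optional[float]:
    """Weighted share of the universe with flag=True, or None if suppressed.

    `flags` and `weights` hold one entry per survey respondent in the
    universe (hypothetical layout). Suppression depends on the unweighted
    count of respondents, not on the weighted population.
    """
    if len(flags) != len(weights):
        raise ValueError("flags and weights must have the same length")
    if len(flags) < MIN_RESPONDENTS:
        return None  # too few actual respondents: do not report
    total = sum(weights)
    if total <= 0:
        return None
    return sum(w for f, w in zip(flags, weights) if f) / total


# 99 respondents: suppressed. 100 respondents: reported.
too_small = weighted_rate([True] * 10 + [False] * 89, [1.0] * 99)
large_enough = weighted_rate([True] * 10 + [False] * 90, [1.0] * 100)
```

Note that the check uses the number of records rather than the sum of weights, since the rule refers to actual survey respondents.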

Geography of IPUMS microdata

A key limitation of the IPUMS microdata is geographic detail. Each year of the data has a lowest level of geography associated with the individuals included: the Public Use Microdata Area (PUMA) for 1990 and later, or the County Group in 1980. PUMAs are generally drawn to contain a population of about 100,000 and vary greatly in geographic size, from fairly small in densely populated urban areas to very large in rural areas, where a single PUMA often contains one or more counties.

While not a challenge for producing state-level data (as PUMAs do not cross state boundaries), summarizing IPUMS data at the city and regional levels is complicated by the fact that PUMAs do not neatly align with the boundaries of cities and metropolitan areas. Large cities and metropolitan areas typically have several PUMAs entirely contained within their core, along with several more peripheral PUMAs that straddle the city or metropolitan area boundary.

PUMA-to-region and PUMA-to-city crosswalks

To create a geographic crosswalk between PUMAs and metropolitan areas, and between PUMAs and cities, for 1980, 1990, 2000, 2010, and later years, we used the approach described below. For simplicity, the description refers only to the PUMA-to-region crosswalk, but the same procedure was used to generate the PUMA-to-city crosswalk. We first estimated the share of each PUMA’s population that fell inside each metro area using year-specific population information from Geolytics, Inc. expressed at the 2000 census block group level of geography (for years 1980 through 2000) or from the ACS 5-year summary file (for years 2010 and later) expressed at contemporaneous block group geographies. If the share was at least 50 percent, the PUMA was assigned to the metro area and included in generating our regional summary measures. For most PUMAs assigned to a region, the share was 100 percent. For the remaining PUMAs, the share was somewhere between 50 and 100 percent, and this share was used as the “PUMA adjustment factor” to adjust downward the survey weights for individuals in such PUMAs when estimating regional summary measures. Finally, we made one further adjustment to the individual survey weights in all PUMAs assigned to a metro area: we applied a “regional adjustment factor” to ensure that the weighted sum of the population from the PUMAs assigned to each metro area matched the total population reported in the official census summary files for each year/period. The final adjusted survey weight used to make all metro-area estimates was thus equal to the product of the original survey weight in the IPUMS microdata, the PUMA adjustment factor, and the regional adjustment factor.
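
The weight adjustment just described can be illustrated with a short Python sketch for a single metro area. It is a simplified reading of the procedure, not the code used for the Atlas: the `Puma` class, the function names, and the example numbers are all hypothetical, and the in-metro shares are taken as given rather than derived from block group data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MIN_SHARE = 0.5  # assignment threshold stated in the text


@dataclass
class Puma:
    puma_id: str
    share_in_metro: float  # share of the PUMA's population inside the metro area
    weighted_pop: float    # sum of original survey weights for the PUMA


def adjustment_factors(pumas: List[Puma], official_pop: float) -> Tuple[Dict[str, float], float]:
    """Return (PUMA adjustment factor by PUMA id, regional adjustment factor)."""
    # A PUMA is assigned only if at least 50 percent of its population is in
    # the metro area; its in-metro share becomes its PUMA adjustment factor.
    puma_factor = {p.puma_id: p.share_in_metro for p in pumas if p.share_in_metro >= MIN_SHARE}
    adjusted_pop = sum(
        p.weighted_pop * puma_factor[p.puma_id] for p in pumas if p.puma_id in puma_factor
    )
    if adjusted_pop <= 0:
        raise ValueError("no PUMAs assigned to this metro area")
    # Scale so the adjusted weighted population matches the official total.
    return puma_factor, official_pop / adjusted_pop


def final_weight(original_weight: float, puma_factor: float, regional_factor: float) -> float:
    """Product of the original weight and the two adjustment factors."""
    return original_weight * puma_factor * regional_factor


# Hypothetical metro area: PUMA C (30 percent inside) is not assigned.
pumas = [Puma("A", 1.0, 100_000.0), Puma("B", 0.6, 100_000.0), Puma("C", 0.3, 100_000.0)]
puma_factor, regional_factor = adjustment_factors(pumas, official_pop=168_000.0)
weight_in_b = final_weight(20.0, puma_factor["B"], regional_factor)
```

In this made-up example the assigned PUMAs contribute an adjusted population of 160,000 against an official total of 168,000, so the regional adjustment factor is about 1.05 and a respondent in PUMA B with an original weight of 20 ends up with a final weight of about 12.6.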

Because the PUMAs used to generate summary measures for metro areas and cities do not always correspond perfectly to their geographic boundaries, all such summary measures are subject to some degree of geographic—and therefore statistical—inaccuracy in describing metro area/city social and demographic characteristics. In order to quantify the degree of any such inaccuracies, we developed measures of both the “geographic fit”—that is, how well the PUMAs assigned to a metro area/city line up with the actual metro area/city boundaries—and “demographic fit”—that is, how well demographic measures summarized from the IPUMS microdata line up with the same measures derived from sources for which there is no geographic mismatch (such as census summary data). These measures of geographic and demographic fit were examined and used to withhold the reporting of IPUMS-based data in the Atlas for a handful of cities in 1990 and 2000, and to flag other cities and metro areas for which data in the Atlas should be used with extra caution.

To measure geographic fit, we calculated three measures: the share of the metro area/city population in each year that was derived from PUMAs that were 80 percent, 90 percent, and 100 percent contained in the metro area/city (based on population). For example, a metro area/city with perfect geographic fit would be one in which 100 percent of the population was derived from PUMAs for which 100 percent of the PUMA population was contained in that metro area/city. For brevity in the discussion below, we refer to such a metro area/city as having 100 percent of its population from 100-percent-contained PUMAs. A metro area/city of dubious geographic fit thus might be one in which zero percent of its population was from 80-percent-contained PUMAs (indicating that all of the PUMAs assigned to it were somewhere between 50 and 80 percent contained, since a PUMA must be at least 50 percent contained to be assigned to the metro area/city in the first place). For most cities and metros in each year, the population shares from 80-, 90-, and 100-percent-contained PUMAs are near 100 percent, with the vast majority at or above 80 percent. For others, however, these measures of geographic fit fell below 80 percent, and these cities and metros are regarded as having “poor geographic fit.”
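
A minimal Python sketch of the three geographic fit measures follows. Two points are assumptions rather than statements from the text: the population a PUMA contributes to a geography is taken to be its weighted population multiplied by its contained share, and the “poor geographic fit” flag is applied to the 80-percent-contained measure. The function names and example figures are hypothetical.

```python
from typing import Dict, List, Tuple

THRESHOLDS = (0.8, 0.9, 1.0)


def geographic_fit(assigned: List[Tuple[float, float]]) -> Dict[float, float]:
    """Share of a geography's population from 80-, 90-, and 100-percent-contained PUMAs.

    `assigned` lists (contained_share, weighted_pop) for each PUMA assigned
    to the geography. Assumption: a PUMA contributes
    contained_share * weighted_pop to the geography's population.
    """
    contributions = [(share, share * pop) for share, pop in assigned]
    total = sum(c for _, c in contributions)
    if total <= 0:
        raise ValueError("geography has no assigned population")
    return {t: sum(c for s, c in contributions if s >= t) / total for t in THRESHOLDS}


def has_poor_geographic_fit(fit: Dict[float, float], cutoff: float = 0.8) -> bool:
    """Assumed reading: flag when the 80-percent-contained share is below 80 percent."""
    return fit[0.8] < cutoff


# Hypothetical city built from three PUMAs that are 100, 85, and 60 percent contained.
fit = geographic_fit([(1.0, 100_000.0), (0.85, 100_000.0), (0.6, 100_000.0)])
poor = has_poor_geographic_fit(fit)
```

Here roughly 76 percent of the city's population comes from 80-percent-contained PUMAs and about 41 percent from 90- and 100-percent-contained PUMAs, so under the assumed rule the city would be flagged.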

To measure demographic fit, we then summarized the IPUMS microdata for the 150 largest metro areas and the 100 largest cities to calculate three demographic measures for each year that are related to the sorts of demographic measures included in the Atlas, but far simpler: the percentage of people of color, the poverty rate, and the percentage of immigrants. We then calculated the same three measures for all 250 geographies in each year using summary data from Geolytics, Inc. for 1980 through 2000 and from the corresponding 5-year ACS summary files for years 2010 and later, and compared the estimates for each measure derived from the microdata to the corresponding measure derived from the summary data. Some variation between the two estimates is expected even for metro areas/cities with perfect geographic fit, due to sampling differences between the census microdata and census summary data in all years and to any additional variation caused by the estimation procedures applied by Geolytics, Inc. to re-allocate data to 2010 census geographies for 1980 through 2000. We therefore calculated the maximum absolute difference (and percentage difference) between the two estimates for each of the three measures across the set of metro areas/cities with perfect geographic fit, and used these as benchmarks to assess the degree of demographic fit for the remaining metro areas/cities. If the absolute difference (or percentage difference) in any of the three demographic measures for a metro area/city was well beyond the benchmark value, we identified the metro area/city as having “poor demographic fit.”
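
The benchmark comparison can be sketched as below. The text does not say how far “well beyond” the benchmark a difference must be, so the `tolerance` multiplier here is a purely hypothetical stand-in, as are the function names, the measure keys, and the example values. For brevity the sketch uses only absolute differences, and the percentage difference would be handled the same way.

```python
from typing import Dict, List, Tuple

MEASURES = ("pct_people_of_color", "poverty_rate", "pct_immigrant")


def benchmarks(perfect_fit: List[Tuple[Dict[str, float], Dict[str, float]]]) -> Dict[str, float]:
    """Maximum absolute microdata-vs-summary difference per measure.

    `perfect_fit` holds (microdata_estimates, summary_estimates) pairs for
    geographies with perfect geographic fit.
    """
    return {m: max(abs(micro[m] - summ[m]) for micro, summ in perfect_fit) for m in MEASURES}


def has_poor_demographic_fit(
    micro: Dict[str, float],
    summ: Dict[str, float],
    bench: Dict[str, float],
    tolerance: float = 2.0,  # hypothetical: the source only says "well beyond"
) -> bool:
    """Flag a geography if any measure differs by more than tolerance * benchmark."""
    return any(abs(micro[m] - summ[m]) > tolerance * bench[m] for m in MEASURES)


# Made-up estimates (in percentage points) for two perfect-fit geographies.
bench = benchmarks([
    ({"pct_people_of_color": 40.0, "poverty_rate": 12.0, "pct_immigrant": 10.0},
     {"pct_people_of_color": 40.5, "poverty_rate": 12.2, "pct_immigrant": 10.3}),
    ({"pct_people_of_color": 25.0, "poverty_rate": 15.0, "pct_immigrant": 5.0},
     {"pct_people_of_color": 24.8, "poverty_rate": 14.6, "pct_immigrant": 5.1}),
])
flagged = has_poor_demographic_fit(
    {"pct_people_of_color": 30.0, "poverty_rate": 18.0, "pct_immigrant": 8.0},
    {"pct_people_of_color": 30.4, "poverty_rate": 16.0, "pct_immigrant": 8.1},
    bench,
)
```

In this example only the poverty rate differs by more than twice its benchmark, which is enough to flag the geography because the rule applies to any of the three measures.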

This table provides information on the quality of the demographic and geographic fit of the IPUMS microdata for all of the 100 largest cities and 150 largest metro areas included in the Atlas. The share of the city/metro area population from 80-percent-contained PUMAs in each year is reported in the table, and the quality of the demographic fit is indicated by the shading of the cells in the table. Also indicated in the table is whether or not IPUMS-based data is included in the Atlas for each geography/year. While data is included in the Atlas for all of the 150 largest metro areas in all years, IPUMS-based data is excluded from the Atlas for some of the 100 largest cities in some years, either because there was a poor demographic fit or because there were no PUMAs with at least 50 percent of their population residing in the city (and thus no PUMAs were assigned to the city).