Who Gets Counted as Indigenous? A Deep Dive into Methods and Data Challenges
By Ryan Fukumori
Introduction
In extending our research series on metropolitan rental affordability to focus on Indigenous residents across the US, the National Equity Atlas team altered our existing methodology in how we identify and categorize Indigenous people. Indigenous communities’ unique histories and circumstances shape how researchers collect, process, and report data on Indigenous communities, as well as the challenges that researchers face. As we will discuss, locating Indigenous people in the survey data is not just a technical consideration about how to organize the data but also a process that requires understanding the social and cultural trends among people who identify themselves as Indigenous on tools like the American Community Survey (ACS).
As discussed in our primary analysis, the National Equity Atlas research team does not include experts on Indigenous communities in the United States. The qualitative insights on Indigenous people we share, both in the main analysis and in this discussion of methods, were informed by conversations with leading Indigenous advocates and scholars across the US and by our review of work by the Indigenous experts cited in the text. As researchers and data scientists, however, we are well-equipped to explain the tools used to collect Indigenous population data, and the extent to which these tools are sufficient to serve that purpose.
To discuss how we use tools like the ACS to produce disaggregated data on Indigenous residents, we should review the entire survey process, from individual residents’ responses to research insights:
- Data Collection: Individual residents and households self-report their racial/ethnic identity and ancestry on population surveys like the American Community Survey. Because respondents can select multiple racial, ethnic, and ancestral groups and include write-in responses, the original, pre-cleaned data includes a very large number of different combinations of responses.
- Data Cleaning and Codification: Survey administrators like the US Census Bureau group the whole universe of individual responses into a smaller set of coded categories, and data distributors like IPUMS USA include these numerical codes in their raw population datasets. For instance, the US Census Bureau assigns three different codes to respondents who identify as both American Indian and white, those who identify as both American Indian and Latinx, and those who identify as American Indian only.
- Data Analysis: Researchers like the National Equity Atlas team design data methodologies that translate these coded categories into a smaller set of racial and ethnic groups (i.e., American Indian and Alaska Native, Asian, Black, Latinx, Multiracial, Native Hawaiian and Pacific Islander, Other, and white). As an example, most National Equity Atlas analyses group respondents who identify as both “Native American” and “white” under the umbrella “Multiracial” category, while grouping people who identify as both “Native American” and “Latinx” under the “Latinx” category.
Most of the data tools at the National Equity Atlas only address the third step as part of our methods: how we adapt the codified survey data into a set of accessible and familiar social categories. This is the only step in which our team has control over how the survey data is organized and displayed. However, a population analysis of Indigenous communities requires a thorough discussion of all three steps, including the upstream processes that are beyond the control of researchers. Trends in how people self-report their identity, as well as alterations to the US Census Bureau’s codification process in recent years, all factor into our quantitative findings.
Here, we explore all three steps of the survey data process in reverse order, starting with our own data methods and decisions around categorization. Each section outlines the particular methodological choices and social trends behind the numbers and how these real-world considerations might shape the numerical data in our analysis. We end on some takeaways and considerations for future research.
Determining Indigenous Racial Categories: Our Methods
This analysis uses IPUMS USA microdata from the 2016 and 2022 American Community Survey five-year estimates to determine Indigenous population sizes and median household incomes, based on respondents’ reported race, ethnicity, and/or ancestry. While previous National Equity Atlas analyses provide disaggregated data for American Indians and Alaska Natives (AIAN), the methods for this analysis differ in two main ways: 1) a more expansive and inclusive grouping of Indigenous residents; and 2) an incorporation of all multiracial Indigenous residents into the data sample.
Many Atlas analyses categorize American Indians and Alaska Natives as residents who only identify their race as AIAN on the American Community Survey, and not as any other additional racial or ethnic identities. Under that methodology, any AIAN residents who also identify as Latinx are categorized as Latinx, and AIAN residents who are not Latinx but also identify with another racial group fall under the Multiracial category. However, this analysis required a departure from this convention for a few key reasons.
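The conventional categorization described above can be sketched as a small decision rule. This is a minimal illustration under stated assumptions, not the Atlas team's actual code: the function name, the string labels, and the input shape (a set of self-reported races plus a Latinx flag) are all hypothetical, and the rule that Latinx origin takes priority for every race is generalized from the AIAN case described in the text.

```python
def conventional_group(races: set[str], is_latinx: bool) -> str:
    """Assign one mutually exclusive group (hypothetical labels).

    races: race labels the respondent selected.
    is_latinx: whether the respondent reported Latinx/Hispanic origin.
    """
    if is_latinx:
        # Assumption: Latinx origin takes priority over any reported race.
        return "Latinx"
    if len(races) > 1:
        # Non-Latinx respondents who report two or more races.
        return "Multiracial"
    if races == {"AIAN"}:
        # Only single-race, non-Latinx AIAN respondents land here.
        return "AIAN"
    # Any other single race is passed through; no race falls back to "Other".
    return next(iter(races), "Other")


print(conventional_group({"AIAN"}, is_latinx=False))           # AIAN
print(conventional_group({"AIAN", "White"}, is_latinx=False))  # Multiracial
print(conventional_group({"AIAN"}, is_latinx=True))            # Latinx
```

Under this rule, two of the three example respondents who identify as AIAN are reported under a different group, which is the limitation this analysis set out to address.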
Expanding the Category of “Indigenous”
This analysis includes all survey respondents who indicated that their race and/or ancestry includes one of the following identities:
- American Indians or Alaska Natives. Survey respondents were invited to write in their specific tribal affiliation for both ACS questions on race and ancestry.1 We include respondents who indicated that they are AIAN but did not write in a specific tribe. In addition to the blanket AIAN category, the US Census Bureau also codes data for 24 of the 574 federally recognized AIAN tribes with critical population masses, based on write-in answers. There are also about 400 federally unrecognized AIAN tribes who fall under the blanket category.
Besides native North Americans from the 49 continental US states, this analysis also includes people of Puerto Rican descent who indicate that they are AIAN. Many ethnic Puerto Ricans are variably descended from the mixture of European colonists, enslaved and emancipated Africans, and Indigenous Taínos who populated the island in the era of Spanish imperialism, prior to the start of US occupation in 1898. However, we only include Puerto Ricans who individually claim AIAN identity on the American Community Survey.
While primarily composed of native North Americans from the 49 continental US states, the AIAN category also includes Indigenous peoples from Mexico and the rest of Latin America. Beyond the Taínos, most Indigenous people in Latin America have not been directly subject to indefinite US occupation. However, many have endured deep histories of social and economic exclusion in their homelands, from the era of Spanish imperial rule to modern-day national independence. Indigenous Latin Americans in the US who do not speak Spanish are also subject to social marginalization in seeking critical supports and services.
- Chamorros, Micronesians, Native Hawaiians, Samoans, or other Pacific Islanders from the states in the Compacts of Free Association. While usually grouped under the racial category of “Pacific Islander,” the native peoples of Hawaiʻi, Guam, and American Samoa have been under US colonial occupation since the 1890s. The US annexed the Hawaiian islands in 1898, five years after a group of Western business interests, Protestant missionaries, and US military forces illegally overthrew the native constitutional monarchy of Queen Liliʻuokalani. In 1899, Spain ceded its colony of Guam to the US following Spain’s defeat in the Spanish-American War, and the US and Germany partitioned the Samoan islands into two colonial territories.
Meanwhile, the Pacific Islander states in Compacts of Free Association (the Federated States of Micronesia, the Marshall Islands, and Palau) are independent nations that have voluntarily agreed to receive US military defense and federal funds for social services. These island nations came under Japanese imperial occupation between World War I and World War II. Following Japan’s defeat, the United Nations placed the islands under a trusteeship. The US administered the trusteeship for several decades, until the islands successfully obtained national independence and entered into the Compacts: Micronesia and the Marshall Islands in 1986, Palau in 1994.
Indigenous Pacific Islanders under US occupation or jurisdiction have not encountered the same legal structures and regulatory agencies governing the Indigenous peoples of the continental US, such as the Bureau of Indian Affairs. However, many Indigenous Pacific Islanders bear a historical relationship to US colonial expansion and territorial dispossession that is fundamentally similar to that of American Indian and Alaska Native nations on the mainland.2 While the Micronesian nations are independent, the enduring presence of US military bases on the islands is a constitutive feature of their local economy and governance.
Moreover, as with American Indian and Alaska Native residents, many Indigenous Pacific Islanders have voluntarily left their homelands in search of opportunity across the United States, especially in metropolitan areas. For instance, the Native Hawaiian diaspora in the mainland US now outnumbers the Native population living in Hawaiʻi itself. Because of these structural similarities, we considered it appropriate to include these particular Indigenous Pacific Islander communities in the analysis.
Our previous analysis of Asian American and Pacific Islander rental affordability in metro America included Chamorros, Micronesians, Native Hawaiians, and Samoans within the broader category of “Pacific Islander.” However, that earlier analysis also included other Pacific Islander communities whose home countries are independent nations and/or decolonized territories without a history of US rule, such as Fiji and Tonga. The current analysis restricts the Pacific Islander population to the peoples whose ancestral lands were or are still under US occupation, jurisdiction, and/or protectorship.
Incorporating Multiracial Indigenous Residents
Besides including several Pacific Islander groups under the umbrella of Indigenous peoples, this analysis also differs from prior National Equity Atlas data tools by including all residents who identify as Indigenous, regardless of any other racial or ethnic identities they specify.
A more inclusive lens for Indigenous identity is necessary, especially in major metropolitan areas where Indigenous residents coexist with a diverse array of people. Despite the endurance of Indigenous segregation, exclusion, and displacement throughout US history, Native Americans have also been part of multiracial communities and families for centuries. Indeed, a majority of Native Americans and Native Hawaiians nationwide are multiracial or multiethnic, and a quarter of Latinx residents nationwide also identify as Indigenous when asked directly about their ancestry. Restricting this analysis to non-Latinx people who only identify as Indigenous would necessarily exclude the experiences of many Indigenous people, including many people who are active within their communities and governments.
Because so many Indigenous people identify as multiracial, restricting the analysis to non-Latinx people who only identify as Indigenous would also radically reduce population sample sizes in most of the metropolitan areas under study. Indigenous communities are generally smaller relative to other communities of color, so this reduction would likely remove many metropolitan areas from the dataset altogether due to insufficient data.
Including all multiracial Indigenous residents does not come without its own methodological challenges, as we discuss below. However, this more expansive set of criteria is essential to capture the full spectrum of Indigenous communities.
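The more inclusive criteria described in this section amount to a membership test rather than a single mutually exclusive assignment. The sketch below is illustrative only: the labels in `INDIGENOUS_GROUPS` and the function name are hypothetical stand-ins, not the numerical codes that appear in IPUMS USA microdata.

```python
# Hypothetical labels for the groups this analysis treats as Indigenous.
INDIGENOUS_GROUPS = frozenset({
    "AIAN",
    "Chamorro",
    "Micronesian",
    "Marshallese",
    "Palauan",
    "Native Hawaiian",
    "Samoan",
})


def is_indigenous(races: set[str], ancestries: set[str]) -> bool:
    """Return True if any reported race or ancestry is in the Indigenous set.

    Other reported identities, including Latinx origin, do not affect
    the result, so multiracial respondents are retained.
    """
    return not INDIGENOUS_GROUPS.isdisjoint(races | ancestries)


print(is_indigenous({"White", "AIAN"}, set()))        # True
print(is_indigenous({"White"}, {"Native Hawaiian"}))  # True
print(is_indigenous({"Tongan"}, set()))               # False
```

Because the test ignores a respondent's other identities, a person counted here as Indigenous may also appear under another group in analyses that use mutually exclusive categories.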
Changes and Challenges in the US Census Bureau’s Data Practices
Before researchers access American Community Survey microdata, the US Census Bureau cleans the raw dataset by grouping the millions of individual survey responses into a more manageable set of coded categories. This data cleaning process is crucial for write-in responses to questions on ancestry, occupation, and language, as well as questions on race, where respondents can select multiple options and therefore produce a large number of different response combinations. For example, a numerical code can identify all people who fall under the broad umbrella of “Multiracial,” while a separate code can distinguish these multiracial residents through specific combinations—AIAN and white, AIAN and Black, etc.
The US Census Bureau regularly updates its backend methodology for these codes, based on changing social circumstances and changes to the survey itself. Large-scale shifts in economic activity require new occupation codes; the ACS started tracking same-sex partnerships in the 2013 dataset; and sustained advocacy has led to the inclusion of Middle Eastern and North African (MENA) as its own racial and ethnic category, starting with the 2030 Decennial Census.
At times, changes to this coding methodology can create issues in comparing population data from the years before and after the change was implemented. Moreover, despite the thoroughness of its data reporting, the ACS does not capture some critical information about Indigenous communities, which in turn limits researchers’ ability to further disaggregate Indigenous populations in the survey data.
New Codifications for Multiracial Indigenous People
Starting with the 2020 Decennial Census and 2020 American Community Survey, the US Census Bureau changed its coding and reporting practices for respondents who identify as multiracial. This means the 2016 and 2022 ACS use slightly different survey methodologies, and the 2016 version might omit some multiracial Indigenous residents who are counted as Indigenous in the 2022 version.3
Compared to the 2016 ACS, the question about racial identity on the 2022 ACS (question #6 on both surveys) features a few more write-in fields, which now allow white and Black respondents to include any specific ethnic and ancestral origins in addition to their race. The Census Bureau also now uses a more detailed set of practices to turn these write-in responses into codes for specific groups:
- Incorporation of more write-in responses: Prior to 2020, a respondent would only be grouped within a particular racial identity if they checked its corresponding box. Starting with the 2020 version of the survey, however, if a respondent checked only the “White” box but wrote “German and Cherokee” in the write-in field, the US Census Bureau would also identify them as AIAN, even though they had not checked the AIAN box.
- Expanded multiracial identity: Prior to 2020, for any respondents who included a write-in response to their race, the Census Bureau converted their responses into codes that only captured a maximum of two of those identities. Thus, some respondents to the 2016 ACS might not be counted as Indigenous if they included more than two groups in a write-in field for their racial identity. By contrast, since 2020, the ACS codes capture up to six write-in responses for race and ethnicity, ensuring that more multiethnic Indigenous people are accurately captured.
- Expanded inclusion of Indigenous identity within Latinx/Hispanic origin: Similarly, prior to 2020, for any respondents who included a write-in response to the question about Latinx/Hispanic origin, the Census Bureau converted responses into codes that only captured a maximum of two identities and prioritized Latinx/Hispanic groups over racial groups, like AIAN. This data cleaning process could then omit cases where Latinx respondents also identified as Indigenous. Since 2020, the ACS codes also capture up to six responses to the Latinx/Hispanic origin question, substantially increasing the number of AIAN residents of Latinx/Hispanic origin identified in the survey.
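To make the effect of these rule changes concrete, the toy model below treats two of them as independent switches: whether write-in responses can assign a racial group, and how many write-in responses are retained. It is a deliberately simplified sketch and not the Census Bureau's procedure. The lookup table, the function name, and both parameters are hypothetical, and the text above does not specify how the real rules interact.

```python
# Hypothetical mapping from write-in terms to race groups (not Census code lists).
WRITE_IN_TO_GROUP = {
    "German": "White",
    "Irish": "White",
    "Nigerian": "Black",
    "Cherokee": "AIAN",
}


def coded_groups(checked, write_ins, *, max_write_ins, write_ins_assign_race):
    """Return the set of race groups a respondent would be coded into.

    checked: race boxes the respondent ticked.
    write_ins: write-in terms, in the order given.
    max_write_ins: how many write-in terms the coding step retains.
    write_ins_assign_race: whether a retained write-in can add a group
        whose box was not ticked.
    """
    groups = set(checked)
    if write_ins_assign_race:
        for term in write_ins[:max_write_ins]:
            group = WRITE_IN_TO_GROUP.get(term)
            if group is not None:
                groups.add(group)
    return groups


# First rule: the "German and Cherokee" example with only "White" checked.
before = coded_groups({"White"}, ["German", "Cherokee"],
                      max_write_ins=2, write_ins_assign_race=False)
after = coded_groups({"White"}, ["German", "Cherokee"],
                     max_write_ins=6, write_ins_assign_race=True)
print("AIAN" in before, "AIAN" in after)  # False True

# Second rule: a third write-in is dropped under a two-response cap.
capped = coded_groups(set(), ["Irish", "Nigerian", "Cherokee"],
                      max_write_ins=2, write_ins_assign_race=True)
uncapped = coded_groups(set(), ["Irish", "Nigerian", "Cherokee"],
                        max_write_ins=6, write_ins_assign_race=True)
print("AIAN" in capped, "AIAN" in uncapped)  # False True
```

In both cases the respondent's answers are identical; only the coding rules determine whether the person appears in the AIAN count.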
Comparing year-over-year data from the ACS 1-year estimates demonstrates how much these backend methodological changes altered the population counts for Indigenous residents. The number of people nationwide who identify as American Indians and/or Alaska Natives (whether alone or in combination with other racial/ethnic identities) jumped from 5.6 million in 2019 to 8.9 million in 2020, a 58 percent increase. The increase was even greater among AIAN residents who also identify as Latinx: from nearly 1.4 million to over 2.8 million people, a 102 percent rise in the population count. The number of AIAN residents did not grow by over 3 million people in the span of a year, especially in an era where AIAN birth rates have been declining. Rather, the Census Bureau’s new coding system could finally incorporate the uncounted millions of multiracial Americans with claims to Indigenous identity.
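The percentage changes cited above follow from a simple calculation, sketched here with the rounded population figures from the text. Because the inputs are rounded to one decimal place, the results are approximate and differ slightly from the reported 58 and 102 percent, which presumably rest on unrounded counts; that is our assumption, not a documented fact.

```python
def percent_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


# Rounded figures from the text, in millions of people.
aian_total = percent_change(5.6, 8.9)   # all AIAN, alone or in combination
aian_latinx = percent_change(1.4, 2.8)  # AIAN residents who also identify as Latinx

print(f"{aian_total:.1f}")   # 58.9
print(f"{aian_latinx:.1f}")  # 100.0
```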
In turn, these methodological changes might shape our own comparison of 2016 and 2022 ACS data. It is likely that the 2022 dataset included a wider range of multiracial residents identifying as AIAN, which in turn alters the median household incomes for each metropolitan area. Because the Indigenous populations in the top 100 metro areas are relatively small, the expansion of the Indigenous population sample in 2022 could have created more dramatic increases in the median household income, especially if the Census Bureau’s new criteria are more likely to include higher earners in the sample.
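A small numerical sketch shows why a change in who is counted can move a median even when no household's income changes. All incomes below are invented for illustration and are not drawn from ACS data; the function name is also hypothetical.

```python
from statistics import median


def median_before_and_after(original, newly_counted):
    """Return the median of the original sample and of the expanded sample."""
    return median(original), median(original + newly_counted)


# Hypothetical household incomes for one small metro-area sample.
counted_under_old_rules = [28_000, 35_000, 41_000, 47_000, 52_000]
# Hypothetical households counted only under the more inclusive rules.
added_under_new_rules = [68_000, 75_000, 90_000]

old_median, new_median = median_before_and_after(
    counted_under_old_rules, added_under_new_rules
)
print(old_median)  # 41000
print(new_median)  # 49500.0
```

In this invented sample, the median rises by about a fifth purely because of who is included. The smaller the original sample, the larger the swing that a handful of newly counted households can produce.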
Limits to Survey Data and Indigenous Identity
One challenge to surveying Indigenous populations is that “American Indian” and “Alaska Native” are both racial as well as political identities: enrolled members of the 574 federally recognized tribes have dual citizenship in the US and their tribal nation. Because the American Community Survey tracks race and not tribal citizenship, ACS data makes no distinction between people who are formal members of Indigenous groups and residents who claim Indigenous ancestry but who have no personal, familial, or political ties to contemporary Indigenous communities. Tribal citizens include many multiracial American Indians and Alaska Natives, so being multiracial cannot indicate whether or not Indigenous people are part of Indigenous institutions and social networks.
Recent US Census Bureau data underscores the extent to which self-reported American Indian populations exceed the number of tribal citizens. For instance, over 1.5 million people identified as Cherokee on the 2020 Decennial Census, a total over three times the number of enrolled Cherokee citizens, which was slightly over 450,000. Without the ability to distinguish enrolled tribal citizens and embedded community members from people who only self-report Indigenous identity, this analysis risks including more higher-income residents who are far removed from the structural disparities facing Indigenous communities, and who thus elevate the median Indigenous household income in many metro areas. That is, the available population data may artificially inflate the overall share of affordable neighborhoods for Indigenous renters in our dataset.
Despite these methodological challenges in determining Indigenous identity, restricting the criteria to the political identity—i.e., only enrolled and affiliated tribal citizens—comes with an additional problem. There are around 400 Indigenous tribes in the US without federal recognition, whose members would be excluded by these narrower criteria. Federally unrecognized tribes are ineligible for many government funding opportunities, and their members thus face their own financial constraints and structural discrimination. Hence, using the political definition for American Indians and Alaska Natives would arbitrarily excise many tribal groups who are actively petitioning for federal recognition, further marginalizing these communities.
In all, these methodological decisions about determining Indigenous identity from survey data are inseparable from the real-world political and legal challenges over recognizing Indigeneity. This is all the more relevant when we consider the deep history of people in the US claiming Indigenous heritage without any social ties to Indigenous communities.
The Politics and History of Claiming Indigenous Identity
In order to create a year-over-year, nationwide dataset with the American Community Survey, the US Census Bureau relies on individuals and households in each survey sample to self-report their racial, ethnic, and ancestral identities. These self-reporting practices—not just in the ACS and Decennial Census, but also in other public surveys we use—are the foundation for the demographic data we report at the National Equity Atlas.
Self-reporting is an essential practice, but it comes with its challenges. Write-in fields for race, ethnicity, and ancestry, such as those on the ACS, help ensure that individuals from relatively small communities are being counted in data that government agencies use to determine resource allocations. At the same time, self-reporting depends on respondents’ understanding of their own genealogy and their personal interpretation of socially constructed identities like race and ethnicity. In one example, then-President Barack Obama, whose mother was white, identified himself only as Black on the 2010 Decennial Census. In other words, demographic data from the ACS is not some incontrovertible “truth” about Americans’ ancestry, but rather an aggregation of millions of individual decisions about how to write themselves into the survey’s format.
It is likely that one trend in racial self-reporting—Americans without formal ties to Indigenous communities claiming to be Indigenous—is still present in ACS demographic data, and it is difficult to quantify the extent of this trend. In other words, it is possible that Indigenous population changes between 2016 and 2022 (and in turn, changes to Indigenous median household incomes) could result in part from a surge in the number of people who newly self-report as Indigenous. However, claiming oneself as Indigenous is a practice that extends deep into the history of the Americas, so it is also important to explore this phenomenon across time.
Home Ancestry Kits and Racial Self-Reporting
Despite centuries of legal restrictions, mixed-race partnerships have been a constant throughout US history. As such, it is reasonable that many people in the US without formal ties to modern-day Indigenous communities could locate at least one Indigenous ancestor back in their family tree. While it is likely that many of these residents do not self-report as Indigenous, it is also possible that some people do identify as Indigenous on demographic surveys solely based on their self-understanding of family history.
The proliferation of take-home DNA ancestry kits in recent years may have led to an increase in the number of people who newly presume that they have some Indigenous ancestors and subsequently self-report that ancestry. While difficult to quantify, this phenomenon has been common enough for research scientists, employees of ancestry kit companies, and Indigenous tribal organizations alike to caution against using these tests to affirm one’s Native American identity. DNA ancestry kits compare one’s genetic profile to a reference population and thus indicate a probability of ancestry based on genetic overlap, not a definitive breakdown of one’s lineage.
Scientists and Indigenous leaders alike stress that Indigenous identity is a matter of social belonging, political membership, and cultural experience, not a question of genetics. Uncritically reading the results of a home ancestry kit can enable people without Indigenous social ties to misappropriate Indigenous culture and unjustifiably speak on behalf of Indigenous community members. In addition, treating Native ancestry as some incontrovertible truth coded in one’s DNA also mischaracterizes racial identity as a rigid biological marker—which it is not—as opposed to understanding race as a social construct that has resulted from centuries of policies and practices, and one which continues to evolve.
Claiming Indigenous Ancestry as a Form of National Mythmaking
Shared mythologies about Indigenous ancestry are not exclusive to the US. Many Latinx residents trace their origins to nations where the social construct of race revolves around the concept of “mestizaje.” Mestizaje claims that the nation is unified around a shared multiethnic, multicultural identity rooted in the historical intermixture of Indigenous peoples, enslaved Africans, and European colonists. Scholars and writers critical of mestizaje have argued that this seemingly patriotic celebration of ethnic mixture can downplay the centuries of exploitation and exclusion that enslaved Africans and Indigenous peoples experienced throughout the Americas, reframing racial violence as consensual coexistence. Moreover, by positing that all non-immigrants in these nations are multiethnic, mestizaje might fail to recognize the modern-day Afrolatinx and Indigenous people within these nations, who still endure the social and economic disparities that grew out of enslavement and colonization.
It is important to weigh the popularity of concepts like mestizaje in discussing demographic trends among Latinx people in the US. About one-quarter of Latinx adults nationwide also identify as Indigenous when asked directly, though just 5 percent of Latinx people identify as AIAN when identifying their race on a multiple-choice survey question.4 To be sure, many Latinx people in the US do have some Indigenous ancestry. However, it is difficult to use ACS data to distinguish the specific Latinx residents with familial, social, and/or political ties to modern-day Indigenous communities in Latin America from the wider universe of Latinx respondents.
Because these ideas about widespread Indigenous ancestry are commonplace in the US and Latin America, recent developments like home ancestry kits can serve to reinforce or confirm existing presumptions about one’s lineage. Indigenous leaders and data scientists who caution against people uncritically claiming an Indigenous identity face the challenge of confronting these longstanding cultural conceits.
Final Takeaways
This discussion serves as a reminder that there is no such thing as purely objective population data. Data collectors, data administrators, and researchers all make deliberate decisions about how to categorize thousands, if not millions, of individual responses for the sake of analysis and storytelling. Because race and ethnicity are not fixed biological categories, but social constructs, people also have a wide array of self-definitions about their own racial and ethnic identity. In multiracial societies like the US, many people fall in between categories, or identify with multiple racial and ethnic groups.
These broad racial and ethnic categories are still essential for population data studies, because they illuminate larger structural and social trends in racial and economic inequity. However, it is inevitable that frictions or challenges can emerge in how researchers make decisions about turning the full spectrum of survey responses into clear-cut racial categories.
The challenges became clear in working with Indigenous population data due to the unique nature of tribal citizenship, the political and legal divides between federally recognized and unrecognized tribes, and the longstanding social trends of people outside of tribal communities claiming Indigenous heritage. Without additional research, we can only speculate on how many members of these different social groups exist under the umbrella of “American Indian and Alaska Native” on the American Community Survey.
However, these methodological challenges should not detract from the fact that many metropolitan areas are broadly unaffordable for Indigenous renters. Moreover, Indigenous community members face a number of challenges around housing precarity, poor housing quality, and limited social supports for Indigenous people away from reservation lands and from federally unrecognized tribes. It is essential that we carry forth both dialogues: how to refine our understanding of Indigenous community change through better data and how to advance policies and practices that support the well-being of Indigenous renters specifically, and all renters in general.
1 Note that we use the term “tribe” here to conform with the language used in the ACS survey and analysis; however, in general we aim to refer to an Indigenous group as a “people” or “nation,” in light of the pejorative meaning “tribe” carries for many groups, used historically to equate Indigenous people with being “savage” or “primitive” and frequently misused in modern contexts to perpetuate stereotypes about African and Indigenous peoples (Kojo Institute 2020).
2 Some US historians place the conquest of the North American continent and Pacific Islands within a single continuum of American empire. See, e.g.: Daniel Immerwahr, How to Hide an Empire: A History of the Greater United States, (Farrar, Straus and Giroux, 2019); V.G. Kiernan, America: The New Imperialism: From White Settlement to World Hegemony, (Verso Books, 2005).
3 The 2022 ACS 5-year estimates, which use survey data from the years 2018 to 2022, combine survey data from years with the less inclusive criteria for Indigenous residents (2018 and 2019) and years with the more inclusive criteria (2020, 2021, and 2022). The 2024 ACS 5-year estimates will be the first multiyear survey that fully incorporates the new, more inclusive criteria.
4 The figure of 5 percent includes all Latinx adults who identify as American Indian and/or Alaska Native, whether alone or in combination with other racial identities. Only 2 percent of Latinx adults identify only as AIAN.