Summary of Data Sources
The California Immigrant Data Portal draws data from a variety of federal, state, and local sources to better understand California’s diverse immigrant population, and the context for immigrant inclusion along key measures of economic mobility, warmth of welcome, and civic participation. The portal provides decades of data for counties, sub-county areas, cities, and the state overall that are geographically consistent over time and disaggregated by immigration status, race, and ancestry.
While the 2000 Census and American Community Survey (ACS) microdata from the Integrated Public Use Microdata Series (IPUMS USA) is used for many indicators due to the rich level of demographic detail it provides, data is also drawn from the following sources: the ACS 5-year summary files from the U.S. Census Bureau, the National Historical Geographic Information System (IPUMS NHGIS), Guidestar, the California Department of Education, the California Department of Justice, the Refugee Processing Center, Political Data, Inc., and the Transactional Records Access Clearinghouse.
Details on the data sources and methods used for each indicator can be found by clicking on the links accompanying each indicator display (see the question mark). However, some aspects of dataset construction, such as how we restrict reporting when sample sizes are too low, how we define race, nativity, and ancestry, and how we estimate immigration status, are relevant to many indicators, so we provide that information on this page.
The portal includes data for the following geographies:
County: all 58 California counties
Sub-county: 110 Consistent Public Use Microdata Areas (CPUMAs)
City or place: 33 large California cities and one Census Designated Place (East Los Angeles)
The user should bear in mind that while data is available for some indicators for more cities and/or other more detailed geographies than we include in the portal, we sought to include only geographies that we could generate consistent data for across multiple indicators. Given that many indicators rely on the 2000 Census and ACS microdata from IPUMS USA, the set of sub-county and city or place geographies for which we report data on the portal is largely based on those that can be identified in the microdata. They are generally geographies of at least 100,000 people.
Some may be wondering, “what is a CPUMA?” These sub-county geographies are “Consistent Public Use Microdata Areas” created by the Integrated Public Use Microdata Series (IPUMS USA). The version used in the portal is based on the CPUMA0010 variable and is drawn to essentially form what is the lowest common denominator from a geographic perspective between 2000 and 2010 Public Use Microdata Areas (PUMAs). PUMAs are statistical geographic areas of at least 100,000 people and are the lowest level of geography available in the Public Use Microdata Sample (PUMS) data from the Decennial Census and American Community Survey (ACS). In some cases a “CPUMA” may be the same size as a county. Due to the centrality of the microdata samples for several indicators and the fact that the CPUMA is the lowest level of geography at which we can report indicators from the microdata consistently over time, we adopted it as a geography to include in the portal.
A list of all geographies for which data is available on the portal is included below. Please note that not all indicators are available for all geographies due to sample size and other limitations.
County
Contra Costa County
Del Norte County
El Dorado County
Los Angeles County
San Benito County
San Bernardino County
San Diego County
San Francisco County
San Joaquin County
San Luis Obispo County
San Mateo County
Santa Barbara County
Santa Clara County
Santa Cruz County
Sub-county
Alameda County--Berkeley & Albany
Alameda County--Oakland (Northwest) & Emeryville
Alameda County--Oakland (East) & Piedmont
Alameda County--Oakland (South Central)
Alameda County--San Leandro, Alameda & Oakland (Southwest)
Alameda County--Castro Valley, San Lorenzo & Ashland
Alameda County--Union City, Newark & Fremont (West)
Alameda County--Fremont (East)
Alameda County--Livermore, Pleasanton & Dublin
Colusa, Glenn, Tehama & Trinity Counties
Contra Costa County
Northern Border Counties
El Dorado County
Lake & Mendocino Counties
Los Angeles County (North)
Los Angeles County (West, North, Central)
Los Angeles County--LA City (Sunland, Sun Valley & Tujunga)
Los Angeles County (East, Central)
Los Angeles County--Glendora, Claremont, San Dimas, La Verne & Pomona
Los Angeles County--Diamond Bar, La Habra Heights (East) & Rowland Heights
Los Angeles County--Glendale
Los Angeles County--Burbank, LA City (East, Central), Alhambra & South Pasadena
Los Angeles County--LA City (Northeast/North Hollywood & Valley Village)
Los Angeles County--LA City (North Central/Van Nuys & North Sherman Oaks)
Los Angeles County (Southwest, West, Central)
Los Angeles County--LA City (Central/Hancock Park & Mid-Wilshire)
Los Angeles County--LA City (Central/Koreatown)
Los Angeles County--Whittier, Hacienda Heights, La Mirada & Santa Fe Springs
Los Angeles County--Pico Rivera & Montebello
Los Angeles County (Central, South)
Los Angeles County--East Los Angeles
Los Angeles County--LA City (South Central, Southeast)
Los Angeles County--LA City (Central/Univ. of Southern California & Exposition Park)
Los Angeles County--LA City (Central/West Adams & Baldwin Hills)
Los Angeles County--Inglewood & Hawthorne
Los Angeles County--Downey, Norwalk, Bellflower & Paramount
Los Angeles County--Compton & West Rancho Dominguez
Los Angeles County--LA City (South/San Pedro), Gardena, Lawndale & West Athens
Los Angeles County--Redondo Beach, Manhattan Beach & Hermosa Beach
Los Angeles County--Torrance, Long Beach (Southwest & Central) & Palos Verdes Peninsula
Los Angeles County--Carson
Los Angeles County--Long Beach City (North)
Los Angeles County--Long Beach (East), Lakewood, Cerritos, Artesia & Hawaiian Gardens
Marin County--Novato & San Rafael (North)
Marin County--San Rafael (South), Mill Valley & Sausalito
Monterey and San Benito Counties
Orange County (North, Central)
Orange County (Southeast, South Central)
Riverside County (East, Central)
Riverside County (West)
Sacramento County--Citrus Heights
Sacramento County (excluding Citrus Heights)
San Bernardino County (excluding Upland, Montclair, Ontario, Chino, Chino Hills)
San Bernardino County--Upland, Montclair, Ontario, Chino, Chino Hills
San Diego County--Oceanside and inland areas
San Diego County--San Diego (Northeast), Poway Cities & Encinitas
San Diego County--San Diego (coastal, central)
San Diego County--San Diego (Central, East), El Cajon, Santee & La Mesa
San Diego County--San Diego (Central/Mid-City)
San Diego County--San Diego (Southeast), Chula Vista & National City
San Diego County--San Diego (South) & Imperial Beach
San Francisco County--Richmond District
San Francisco County--North Beach & Chinatown
San Francisco County--South of Market & Potrero
San Francisco County--Inner Mission & Castro
San Francisco County--Sunset District (North)
San Francisco County--Sunset District (South)
San Francisco County--Bayview & Hunters Point
San Joaquin County
San Luis Obispo County
San Mateo County--Daly City, Pacifica & Colma
San Mateo County--South San Francisco, San Bruno & Brisbane
San Mateo County--San Mateo (North), Burlingame & Millbrae
San Mateo County--San Mateo (South) & Half Moon Bay
San Mateo County--Redwood City, San Carlos & Belmont
San Mateo County--Menlo Park, East Palo Alto & Atherton Town
Santa Barbara County
Santa Clara County--Mountain View, Palo Alto & Los Altos
Santa Clara County (Central, Northwest)
Santa Clara County--Milpitas & San Jose (Northeast)
Santa Clara County--San Jose (East Central) & Alum Rock
Santa Clara County (Central, East, Southwest)
Santa Clara County--San Jose (Northwest)
Santa Clara County (Central)
Santa Cruz County
Solano County--Vallejo & Benicia
Solano County--Fairfield & Suisun City
Solano County--Vacaville & Dixon
Sonoma County--Windsor Town, Healdsburg & Sonoma
Sonoma County--Petaluma, Rohnert Park & Cotati
Sonoma County--Santa Rosa
Sutter & Yuba Counties
Tulare County (excluding Visalia)
City or place
East Los Angeles CDP
Censoring Observations with Small Sample Sizes
Most indicators in the portal are measures of central tendency (e.g., means and medians) based on survey data, and are subject to a margin of error. While we do not report margins of error, we do make efforts to avoid reporting highly unreliable estimates. Unless otherwise noted, for all indicators derived from the Census and ACS microdata from IPUMS USA, we do not report any estimates based on a universe of fewer than 100 individual survey respondents. For example, the universe for the Median Hourly Wage is full-time wage and salary workers ages 25-64, and we do not report the median hourly wage if there are fewer than 100 individual survey respondents (i.e., unweighted) in that universe for any particular geography/demographic group. When reporting data for the estimated immigration statuses (see below) of undocumented, lawful permanent resident, and eligible-to-naturalize adult, we increase the minimum threshold for reporting to 200 individual survey respondents. We do this both to account for the greater uncertainty (i.e., beyond sampling error) accompanying our status estimates and to protect privacy for vulnerable populations.
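The reporting rule described above can be sketched in a few lines of Python. This is only an illustration of the thresholds stated in the text, not the portal's actual code; the function names and the status codes are hypothetical.

```python
from typing import Optional

# Thresholds stated in the text (unweighted survey respondents).
MIN_RESPONDENTS = 100
MIN_RESPONDENTS_ESTIMATED_STATUS = 200

# Hypothetical codes for the statuses that are estimated rather than self-reported.
ESTIMATED_STATUSES = {"undocumented", "lawful_permanent_resident", "eligible_to_naturalize"}


def min_respondents(status: Optional[str] = None) -> int:
    """Return the minimum unweighted universe size required for reporting."""
    if status in ESTIMATED_STATUSES:
        return MIN_RESPONDENTS_ESTIMATED_STATUS
    return MIN_RESPONDENTS


def censor(estimate: float, unweighted_n: int, status: Optional[str] = None) -> Optional[float]:
    """Return the estimate, or None when the universe is too small to report."""
    if unweighted_n < min_respondents(status):
        return None
    return estimate
```

Note that the check uses the unweighted count of respondents in the indicator's universe, not the weighted population estimate.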
It is important to keep in mind that even with these restrictions in place, all indicator values should be regarded as estimates, and particular care should be taken when interpreting data for less populated geographies and for smaller demographic subgroups. Users should not assume that small differences in indicator values between demographic subgroups are statistically significant. Finally, even with the aforementioned sample size restrictions in place, estimates of zero or 0 percent are possible. Such estimates should be regarded as very small numbers/percentages and not actually zero. Similarly, estimates of 100 percent should be regarded as high percentages, and not actually 100 percent.
Categorizing People by Race, Nativity, and Ancestry
In the portal, categorization of people by race is generally based on individual responses to various surveys. For most indicators, people are categorized into seven mutually exclusive groups on the basis of their response to two separate questions on race and Hispanic origin as follows:
- “White” is used to refer to all people who identify as White alone and do not identify as being of Hispanic origin.
- “Black” is used to refer to all people who identify as Black or African American alone and do not identify as being of Hispanic origin.
- “Latino” is used to refer to all people who identify as being of Hispanic origin, regardless of racial identification.
- “Asian American” is used to refer to all people who identify as Asian alone and do not identify as being of Hispanic origin.
- “Pacific Islander” is used to refer to all people who identify as Native Hawaiian or Pacific Islander alone and do not identify as being of Hispanic origin.
- “Native American” is used to refer to all people who identify as Native American or Alaska Native alone and do not identify as being of Hispanic origin.
- “Mixed/other” is used to refer to all people who identify with a single racial category not included above, or who identify with multiple racial categories, and do not identify as being of Hispanic origin.
Any exceptions to this categorization are noted in the data notes that can be found by clicking on the question mark above each indicator display.
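As a rough illustration of how the two survey questions combine into mutually exclusive groups, consider the Python sketch below. The input race codes ("white", "black", "asian", "nhpi", "aian") are hypothetical stand-ins rather than actual survey or IPUMS codes. The sketch also assumes that each single-race group means that race alone, which is what keeps the groups mutually exclusive.

```python
from typing import Iterable

# Hypothetical single-race codes mapped to the group labels used in the portal.
SINGLE_RACE_LABELS = {
    "white": "White",
    "black": "Black",
    "asian": "Asian American",
    "nhpi": "Pacific Islander",
    "aian": "Native American",
}


def race_group(hispanic_origin: bool, races: Iterable[str]) -> str:
    """Assign one mutually exclusive group from Hispanic origin and reported races."""
    # Hispanic origin takes precedence, regardless of racial identification.
    if hispanic_origin:
        return "Latino"
    reported = set(races)
    # A single reported race maps to its group if it is one of the five above.
    if len(reported) == 1:
        (only,) = reported
        return SINGLE_RACE_LABELS.get(only, "Mixed/other")
    # Multiple races, or a single race not listed above, fall into Mixed/other.
    return "Mixed/other"
```

Checking Hispanic origin first is what makes “Latino” apply regardless of the race response.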
Categorization of people by nativity is generally based on individual responses to survey questions on country of birth and parental citizenship. Unless otherwise noted, people are categorized into two mutually exclusive groups as follows:
- “U.S.-born” refers to all people who identify as being born in the United States (including U.S. territories and outlying areas), or born abroad of at least one U.S. citizen parent.
- “Immigrant” and “foreign born” refer to all people who identify as being born abroad, outside of the United States, of non-U.S. citizen parents.
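The nativity rule reduces to a two-input test, sketched below in Python. The argument names are hypothetical, and the sketch assumes the survey responses have already been reduced to these two flags.

```python
def nativity(born_in_us_or_territory: bool, has_us_citizen_parent: bool) -> str:
    """Classify a respondent as U.S.-born or immigrant under the definitions above."""
    # Born in the U.S. (including territories and outlying areas), or born
    # abroad of at least one U.S. citizen parent, counts as U.S.-born.
    if born_in_us_or_territory or has_us_citizen_parent:
        return "U.S.-born"
    # Everyone else was born abroad of non-U.S. citizen parents.
    return "Immigrant"
```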
Some portal indicator breakdowns include further detail by ancestry. Most breakdowns are based on 2000 Census and ACS microdata from IPUMS USA. While the ancestry groups often reflect (and likely are consistent with) countries of origin, it is important to note that they are actually based on reported ancestry. This was done so that comparisons could be made between the U.S.-born and immigrant populations within a given group. Data by ancestry was also tabulated to be consistent with the mutually exclusive racial groups described above. The ancestral groupings were defined by examining each broad racial/ethnic group separately and selecting the ancestries within each group that capture a reasonably large number of people identified statewide. The ancestral groups broken out for each broad racial/ethnic group are based on the first response to the census question on ancestry, recorded in the IPUMS USA variable “ANCESTR1.”
Many community-based and national organizations that serve the Middle Eastern and North African (MENA) community have taken issue with the way that federal surveys misclassify the MENA population as white, and there is now long-awaited movement toward adding a MENA checkbox to future surveys administered by the U.S. Census Bureau. Even so, the vast majority of the MENA community is still included among the white population in data that is disaggregated by race/ethnicity in federal surveys and in the portal. For indicators that include breakdowns by ancestry, however, disaggregated data for the MENA immigrant community by detailed ancestry groups can be found (see the "by ancestry" breakdown of the Housing Burden indicator, for example). Doing a better job of identifying the MENA community in federal surveys is critical to garnering equitable support from the federal, state, and local programs, as well as the philanthropic community, that support immigrant inclusion and community development.
Estimating Immigration Status and the Eligible to Naturalize
Several indicators that rely on the ACS microdata from IPUMS USA report data by immigration status, including undocumented immigrants, lawful residents, and naturalized U.S. citizens, while the Naturalization indicator reports data for eligible-to-naturalize adults. Among these different statuses, only naturalized U.S. citizenship is self-reported, while all the others are estimated following an approach developed by Professor Manuel Pastor at the USC Equity Research Institute.
The approach relies on an increasingly common strategy that involves first determining who among the non-citizen population is least likely to be undocumented due to a series of conditions (a process called “logical edits”) and then sorting the remainder into documented and undocumented based on a series of probability estimates. The probability estimates are derived from a logistic regression model run on the 2014 Survey of Income and Program Participation (SIPP) from the U.S. Census Bureau, from which coefficients are then applied to non-citizen, non-Cuban immigrants in the 5-year ACS microdata from IPUMS USA to estimate each respondent's probability of being undocumented. Unlike most surveys, the questions included in the SIPP allow researchers to deduce documentation status.
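To make the step of applying coefficients concrete, the Python sketch below shows how logistic regression coefficients estimated on one survey can be applied to records from another to produce a probability for each respondent. The predictor names and coefficient values are invented placeholders for illustration only. They are not the predictors or estimates from the SIPP model described above.

```python
import math

# HYPOTHETICAL coefficients and predictors, for illustration only.
# The actual model specification is not given in this summary.
COEFS = {
    "intercept": -1.0,
    "years_in_us": -0.05,
    "age": -0.01,
    "no_hs_degree": 0.8,
}


def undocumented_probability(person: dict, coefs: dict = COEFS) -> float:
    """Apply logistic coefficients to one record and return a probability in (0, 1)."""
    # Linear predictor: intercept plus coefficient times value for each predictor.
    z = coefs["intercept"] + sum(
        coefs[name] * person[name] for name in coefs if name != "intercept"
    )
    # Inverse logit turns the linear predictor into a probability.
    return 1.0 / (1.0 + math.exp(-z))
```

In the approach described above, this scoring would apply only to non-citizens not already assumed to be documented by the logical edits.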
Individuals in the ACS microdata who are not assumed to be documented based on the logical edits are then tagged as “undocumented” until estimated control totals from experts at the Office of Immigration Statistics, the Migration Policy Institute, and the Center for Migration Studies are met. Estimated control totals at both the national level by country of origin, and at the state level (for all countries combined) are applied. It is important to note that when tagging individuals as “undocumented,” the tagging is not simply done from the top down in terms of estimated probabilities of being undocumented, but is rather done in such a way that the distribution of probabilities for those tagged as undocumented mimics the distribution observed among those identified as undocumented in the SIPP.
All non-citizens not tagged as undocumented are assumed to be either Lawful Permanent Residents (LPRs) or holders of student or H1B visas. Student visa holders include those who immigrated as adults and were enrolled in higher education at the time they were surveyed. H1B visa holders are identified through a procedure that considers age, country of origin, length of time in the U.S., and occupation. Those not identified as student or H1B visa holders are assumed to be LPRs.
With identifiers in place for who is an LPR among non-citizens in the ACS microdata, we then apply some basic conditions to determine which are likely to be eligible-to-naturalize adults. We include all individuals at least 18 years old who had been in the United States for at least five years prior to the survey, or three years if married to a U.S. citizen. We assume a 12.5 percent undercount of the undocumented and a 2.5 percent undercount of LPRs (including the eligible to naturalize) in the ACS microdata and adjust survey weights accordingly. For this reason, counts of immigrants by status reported on the portal may sum up to more than the total immigrant population.
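One way to picture tagging that follows a target distribution rather than working from the top down is the simplified Python sketch below. It is not the procedure actually used. It assumes a single control total (the real approach applies totals by country of origin and by state), assumes the target distribution is expressed as shares across probability bins, and selects at random within each bin. All names are hypothetical.

```python
import random


def tag_undocumented(candidates, bin_edges, target_shares, control_total, seed=0):
    """Tag candidates so the tagged weight in each probability bin matches a target share.

    candidates: list of dicts with 'weight' (survey weight) and 'prob' (estimated probability).
    bin_edges: interior cut points between bins, in ascending order.
    target_shares: share of the control total assigned to each bin (len(bin_edges) + 1 values).
    Returns a list of booleans aligned with candidates.
    """
    rng = random.Random(seed)
    tags = [False] * len(candidates)

    # Group candidate indices by probability bin.
    bins = [[] for _ in target_shares]
    for i, person in enumerate(candidates):
        b = sum(person["prob"] >= edge for edge in bin_edges)
        bins[b].append(i)

    # Fill each bin's weighted quota in random order (assumed selection rule).
    for b, share in enumerate(target_shares):
        quota = control_total * share
        order = bins[b][:]
        rng.shuffle(order)
        tagged_weight = 0.0
        for i in order:
            if tagged_weight >= quota:
                break
            tags[i] = True
            tagged_weight += candidates[i]["weight"]
    return tags
```

A top-down rule would instead sort everyone by probability and tag from the highest value until the control total is reached, which would leave the lower-probability bins with no one tagged.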
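The eligibility conditions and the undercount adjustment can be sketched in Python as follows. The eligibility test follows the conditions stated above. The weight adjustment is an assumption: the text does not say how weights are adjusted, so the sketch reads an undercount of r as meaning the survey captures a share (1 - r) of the true population and divides the weight by that share. Multiplying by (1 + r) would be another plausible reading. Function and key names are hypothetical.

```python
# Undercount rates stated in the text; status keys are hypothetical labels.
UNDERCOUNT = {"undocumented": 0.125, "lpr": 0.025}


def eligible_to_naturalize(is_lpr: bool, age: int, years_in_us: float,
                           married_to_citizen: bool) -> bool:
    """Apply the basic adult eligibility conditions described above."""
    if not is_lpr or age < 18:
        return False
    # Five years of residence, or three if married to a U.S. citizen.
    required_years = 3 if married_to_citizen else 5
    return years_in_us >= required_years


def adjusted_weight(weight: float, status: str) -> float:
    """Inflate a survey weight for an assumed undercount (ASSUMED formula: w / (1 - r))."""
    rate = UNDERCOUNT.get(status, 0.0)
    return weight / (1.0 - rate)
```

Because only the undocumented and LPR weights are inflated, status counts built from adjusted weights can exceed a total computed from unadjusted weights.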
For more detail on the methodology summarized above, see here.