COVID-19 Data Project

The COVID-19 Data Project has aggregated COVID-19 case data by race and ethnicity at the local level to facilitate an in-depth look at real-world disparities in health outcomes. This health equity dataset is arranged by CDC and U.S. Census Bureau-compliant race and ethnicity categories, which are further detailed in this codebook spreadsheet. The data are updated monthly and can be downloaded for further analysis as a .csv or .xlsx file.

The COVID-19 Data Project: Race and Ethnicity Data




According to the Centers for Disease Control and Prevention (CDC), the presence of systemic health and social inequalities results in greater risk of long-term health and socioeconomic impacts from COVID-19 among racial and ethnic minority groups.1 Causal factors for these disparities include high representation among frontline workers, low healthcare access and utilization, gaps in education, wealth, and income, lack of access to affordable housing, and discrimination.1 On April 6th, 2020, Louisiana became the first state to release COVID-19 data by race. Despite making up only 33% of the state’s population, African Americans accounted for 70% of the state’s COVID-19-related deaths.2 Moreover, in a case surveillance report published in June 2020, the CDC reported that among cases where race and ethnicity were known, approximately 33% were Hispanic, 22% were Black, and 1.3% were American Indian/Alaska Native, even though these groups account for only 18%, 13%, and 0.7% of the U.S. population, respectively.3
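One way to read the CDC figures above is as a ratio of each group's share of cases to its share of the population; a ratio above 1 means the group is overrepresented among cases. The short Python sketch below works through that arithmetic using only the percentages quoted in this section. The function name `representation_ratio` is illustrative and is not part of the dataset or codebook.

```python
def representation_ratio(case_share_pct, population_share_pct):
    """Return the share of cases divided by the share of the population."""
    if population_share_pct <= 0:
        raise ValueError("population share must be positive")
    return case_share_pct / population_share_pct

# Percentages quoted above from the June 2020 CDC case surveillance report:
# (share of cases with known race/ethnicity, share of U.S. population)
cdc_june_2020 = {
    "Hispanic": (33.0, 18.0),
    "Black": (22.0, 13.0),
    "American Indian/Alaska Native": (1.3, 0.7),
}

ratios = {
    group: round(representation_ratio(cases, population), 2)
    for group, (cases, population) in cdc_june_2020.items()
}
for group, ratio in ratios.items():
    print(f"{group}: {ratio}")
```

Under this reading, each of the three groups appears among known-race cases at roughly 1.7 to 1.9 times its share of the population.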


In Georgia, early reports suggested that patients hospitalized with COVID-19 were disproportionately Black, and that these patients experienced more complications due to a higher prevalence of underlying health conditions, such as diabetes and cardiovascular disease.4 These disparities were reported at the state level, but viewing data at the local level can reveal more. Gathering data at the county level and supplementing it with data about socioeconomic status (SES) can provide information regarding place-based risk.5 In June 2020, the Department of Health and Human Services released new requirements mandating that all laboratories report COVID-19 data on race, ethnicity, age, and sex to state and local public health departments by August 1, 2020.6 The COVID-19 Data Project identified the need to gather race and ethnicity data by county for the U.S. in order to address this gap in national data sets.


The COVID-19 Data Project: Health Equity Handbook


Standardizing and streamlining the data gathering process became extremely important. We therefore developed a handbook that includes information on methodologies, definitions, and troubleshooting for data entry. A link to the handbook, which is typically used to assist interns and contains more specifics about training and data entry, can be found here.


Data Collection


On June 7th, 2020, we took the first step: determining which U.S. counties were providing confirmed COVID-19 case counts by race and ethnicity via state or county public health department websites. From June onward, this process has been repeated at the beginning of each month to identify new counties reporting this data.


The first collection of confirmed case counts by race and ethnicity began on June 15th, 2020, with new counties added monthly. As of September 2020, around 100 interns were collecting COVID-19 case data by race and/or ethnicity at the county level from over 1,000 counties in the United States. From June 15th onward, case counts were collected daily for the following race and ethnicity categories, in compliance with CDC and U.S. Census Bureau procedure: White, Black/African American, Asian, American Indian/Alaska Native, Native Hawaiian/Pacific Islander, 2+ Races, Other, Unknown, Hispanic (all races), Non-Hispanic, and Not Specified. Quality assurance measures are taken daily to locate errors and inconsistencies in our data.
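As an illustration of what a daily quality assurance check could look like, the Python sketch below flags two kinds of inconsistency in a single county-day record: negative counts, and race totals that do not match ethnicity totals. This is a hypothetical example, not the project's documented procedure. The record layout, the function name `check_record`, and the grouping of "Not Specified" with the ethnicity categories are all assumptions.

```python
# Category names as listed above. Treating "Not Specified" as the
# unknown-ethnicity bucket is an assumption made for this sketch.
RACE_CATEGORIES = [
    "White", "Black/African American", "Asian", "American Indian/Alaska Native",
    "Native Hawaiian/Pacific Islander", "2+ Races", "Other", "Unknown",
]
ETHNICITY_CATEGORIES = ["Hispanic (all races)", "Non-Hispanic", "Not Specified"]

def check_record(record):
    """Return a list of human-readable problems found in one county-day record.

    `record` is a hypothetical dict mapping category name to a case count;
    categories a county does not report are simply absent.
    """
    problems = []
    for category in RACE_CATEGORIES + ETHNICITY_CATEGORIES:
        value = record.get(category)
        if value is not None and value < 0:
            problems.append(f"negative count for {category}")

    race_values = [record.get(c) for c in RACE_CATEGORIES]
    ethnicity_values = [record.get(c) for c in ETHNICITY_CATEGORIES]
    # Compare totals only when every category in both groups is present.
    if None not in race_values and None not in ethnicity_values:
        if sum(race_values) != sum(ethnicity_values):
            problems.append("race total does not match ethnicity total")
    return problems

# Hypothetical record whose race counts sum to 100 but ethnicity counts sum to 95.
example = dict(zip(RACE_CATEGORIES, [40, 30, 5, 2, 1, 2, 10, 10]))
example.update({"Hispanic (all races)": 20, "Non-Hispanic": 70, "Not Specified": 5})
print(check_record(example))
```

A record that reports only some categories skips the totals comparison, since missing categories make the two sums incomparable.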




Varying Reporting Methodologies 


Percentages vs. Numbers. Some counties report COVID-19 cases as percentages (%) of total daily cases, while others report their cases as raw counts (#). Because percentages cannot be compared or combined with raw counts unless the underlying totals are known, this can lead to misrepresentation of the distribution of cases by race/ethnicity.
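When a county publishes only percentages, approximate counts can be recovered if the total number of cases is also known. The Python sketch below shows one possible conversion. The function name, the example figures, and the rounding rule are assumptions for illustration, not the project's documented procedure. Because of rounding, the converted counts may not sum exactly to the total, so the difference is returned for review.

```python
def percentages_to_counts(total_cases, percentages):
    """Convert per-category percentages of a known total into approximate counts.

    Returns (counts, discrepancy), where discrepancy is total_cases minus the
    sum of the rounded counts. A nonzero discrepancy signals rounding error
    or percentages that do not add up to 100.
    """
    counts = {
        category: round(total_cases * pct / 100)
        for category, pct in percentages.items()
    }
    discrepancy = total_cases - sum(counts.values())
    return counts, discrepancy

# Hypothetical county: 200 total cases, reported only as percentages.
example_pct = {
    "White": 45.0,
    "Black/African American": 35.0,
    "Other": 12.5,
    "Unknown": 7.5,
}
counts, discrepancy = percentages_to_counts(200, example_pct)
print(counts, discrepancy)
```

Without the total, no such conversion is possible, which is why percentage-only reports are hard to combine with count-based reports from other counties.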


Race and Ethnicity Categorization. Race and ethnicity are independent of each other, yet some counties combine racial and ethnic categories in one chart or table, which makes their data harder to interpret. For example, when “Hispanic” is treated as a racial category, it is unclear whether Black Hispanics with positive cases of COVID-19 are counted as Black, Hispanic, or both. Some counties classify their data into only two to four distinct racial categories, such as “White,” “Black/African American,” “Other,” and “Unknown.” As a result, data for other racial categories, such as “Asian” and “American Indian or Alaska Native,” are not available for those counties.


Untimely Data Releases


COVID-19 case counts are collected daily. When counties update their data only on specific days (e.g., weekdays), gaps are unavoidable, which complicates our ability to observe daily trends for those counties. In addition, some data do not reflect a 24-hour time period, because some counties release new data either multiple times within 24 hours or less frequently than every 24 hours.
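To make such gaps explicit before looking at daily trends, one option is to compare the dates a county actually reported against a complete daily calendar. The Python sketch below assumes cumulative case counts keyed by report date. The function names and the sample values are hypothetical and do not come from the dataset.

```python
from datetime import date, timedelta

def find_missing_days(cumulative_by_date):
    """Return the calendar days between the first and last report with no data."""
    reported = sorted(cumulative_by_date)
    if not reported:
        return []
    missing = []
    day = reported[0]
    while day <= reported[-1]:
        if day not in cumulative_by_date:
            missing.append(day)
        day += timedelta(days=1)
    return missing

def average_daily_increase(cumulative_by_date):
    """Spread the increase between consecutive reports evenly over the days elapsed.

    The result is keyed by the later report date of each pair.
    """
    reported = sorted(cumulative_by_date)
    result = {}
    for previous, current in zip(reported, reported[1:]):
        days_elapsed = (current - previous).days
        increase = cumulative_by_date[current] - cumulative_by_date[previous]
        result[current] = increase / days_elapsed
    return result

# Hypothetical county that does not report on the weekend of June 20-21, 2020.
series = {
    date(2020, 6, 18): 100,
    date(2020, 6, 19): 110,
    date(2020, 6, 22): 140,
}
missing = find_missing_days(series)
rates = average_daily_increase(series)
print(missing, rates)
```

Averaging over the elapsed days avoids reading a Monday catch-up report as a one-day spike, though it cannot recover the true day-by-day pattern inside the gap.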


Issues with Disease Reporting Systems


Due to the large volume of COVID-19 tests being processed and technical issues arising with disease reporting systems, we have experienced additional gaps in the data. In August 2020, California experienced complications with its electronic laboratory reporting system. This led to a backlog of test results and to state and county health departments underreporting daily new COVID-19 cases.7




As more counties overcome data collection and reporting barriers, we hope to continue to collect, enter, and manage health equity data for all counties across the United States. We hope that these data will help identify health disparities related to COVID-19 and drive policy change aimed at reducing them.


Future Releases


The most recent release of this data and document occurred on March 2nd, 2021; previous releases occurred on September 29th, 2020, November 3rd, 2020, December 8th, 2020, and February 2nd, 2021. As we continue to collect and release COVID-19 data by race and ethnicity, these notes will be updated to document our data collection methods and findings. We will also continue to update our "ReadMe" document on our GitHub. We feel it is imperative that we document processes and release data to help inform strategies to keep communities safe during this pandemic. Our hope is that the information may also serve to inform prevention strategies for future pandemics.


Suggested Citation


When using data images, downloaded data, or shared document formats, please attribute BroadStreet as well as the original source, when applicable. For examples and more information, review this article, which answers the question “How do I cite BroadStreet?”




Audrey Caprio

Bachelor of Science in Nursing | Purdue University

Master of Public Health Candidate | The George Washington University


Jasmine Johnson

Bachelor of Science in Mathematics | California State University, East Bay

Master of Science in Statistics | California State University, East Bay


Mel Terry

Bachelor of Science in Biology | Madonna University

Master of Science in Epidemiology Candidate | Columbia University Mailman School of Public Health





  1.  Centers for Disease Control and Prevention. Health Equity Considerations and Racial and Ethnic Minority Groups. Centers for Disease Control and Prevention website. Accessed September 15, 2020.

  2.  Villarosa L. ‘A Terrible Price’: The Deadly Racial Disparities of COVID-19 in America. The New York Times. April 29, 2020. Accessed September 15, 2020.

  3.  Stokes EK, Zambrano LD, Anderson KN, et al. Coronavirus Disease 2019 Case Surveillance -- United States, January 22-May 30, 2020. MMWR Morb Mortal Wkly Rep 2020;69:759-765.

  4.  Killerby ME, Link-Gelles R, Haight SC, et al. Characteristics Associated with Hospitalization Among Patients with COVID-19 -- Metropolitan Atlanta, Georgia, March-April 2020. Centers for Disease Control and Prevention website. Accessed September 15, 2020.

  5.  Chowkwanyun M, Reed AL. Racial Health Disparities and COVID-19 -- Caution and Context. N Engl J Med 2020;383:201-203. DOI: 10.1056/NEJMp2012910

  6.  Weiland N, Mandavilli A. Trump Administration Sets Demographic Requirements for Coronavirus Reports. The New York Times. June 4, 2020. Accessed September 15, 2020.

  7.  McGough M. Coronavirus updates: California reports 200 new deaths; case totals still delayed by glitch. The Sacramento Bee website. Published August 5, 2020. Accessed September 18, 2020.