From projects created by volunteers, like 17-year-old Avi Schiffmann’s Covid tracking site and Nextstrain’s Genomic Epidemiology Tracker of Novel Coronavirus, to initiatives from leading institutions, like the Johns Hopkins Covid Tracker and the New York Times Covid Maps, experts around the world are leveraging data to better understand COVID-19, and we at Alation wanted to support them. To help these data heroes work more productively and share knowledge and insights collaboratively, we have created the COVID-19 Data Catalog.

With so much data out there, it can be difficult to find, understand, and compare relevant datasets. And while knowledge sharing is vital for impactful analysis, it isn’t easy for our disparate data heroes working across the globe to collaborate. To help overcome these challenges, the COVID-19 Data Catalog creates one place where experts can easily find COVID-19 datasets. It also includes related datasets on weather, census population, and demographics, so that insightful comparisons are easy to make. Beyond the data itself, the catalog creates a community where experts can connect and share information with others who have the same goal in mind: helping us contain, combat, and respond to COVID-19. With the COVID-19 Data Catalog, we want to enable the data community to focus on finding insights rather than spending time on the incredible amount of grunt work that comes with analyzing data.

What went into making the COVID-19 Data Catalog:

  • Creating a home-grown team of “Data Hunters” whose goal is to find, classify, and prioritize new data sources and catalog and document them.
  • Providing an open data warehouse (DW) with our partner Amazon Web Services (AWS) that anyone from the community can access.
  • Staging the latest data in the DW and, with our partner Trifacta, cleaning it and pushing it through automated data pipelines.
  • Leveraging Alation’s query publishing to make it easier for everyone using the catalog to build off of each other’s work (particularly exciting for me as a data scientist).
  • Adding the ability to search in other data sources, like Snowflake’s Covid StarSchema.
  • Implementing the ability to query against not only U.S. data but also data from countries like India and China.
  • Making it easy to identify projects and submit your own, and much more.

This project has been near and dear to the hearts of everyone at Alation, and it has struck a chord with me personally. A few years ago, I was a community lead for one of the largest volunteer data science collectives in the United States, Data 4 Democracy (D4D). The goal of this collective was to unite an “… enthusiastic network of individuals utilizing data to drive better choices and improve the world where we live.” Within six months of launching, we went from a hundred members or so to over 2,000. Our projects ranged from building prediction models for Boston EMS and winning the UN’s #IDETECT challenge to helping advance important work on global data ethics. As you can imagine, the data challenges were real and painful, and some of them are the very same ones we face today. I’m intimately familiar with these challenges and hope that the COVID-19 Data Catalog can help alleviate them for the heroes digging into the data to help us beat this disease.

If you want to help with this initiative, you can request access to the Alation COVID-19 Data Catalog by registering here!
