COVID-19 Data Source and Download

We need data scientists around the world to help with COVID-19 data analysis. So, in this post, I will talk about the data: where it lives, how to download it, and where it comes from.

First of all, you can download the processed and cleaned data from my own repository (https://github.com/guibacellar/COVID-19).

To help those who want to explore and analyze this data, I'm providing daily updated data (in JSON and CSV formats), as well as the Python code to download, sanitize, convert and generate new features from the original data.

The main source of this data is the https://thevirustracker.com API.
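As a rough sketch, fetching one of the CSV files from the repository could look like the code below. The raw-content host, the "master" branch name, UTF-8 encoding and the helper names (build_raw_url, download_csv_rows) are assumptions for illustration, not part of the repository's own code.

```python
import csv
import io
import urllib.request

# Assumption: files are served from the "master" branch of the repository.
RAW_BASE = "https://raw.githubusercontent.com/guibacellar/COVID-19/master"


def build_raw_url(path, base=RAW_BASE):
    """Join the raw-content base URL with a repository path."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def download_csv_rows(url, timeout=30):
    """Download a CSV file and return its rows as lists of strings.

    Assumption: the file is UTF-8 encoded.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return list(csv.reader(io.StringIO(text)))
```

Calling download_csv_rows(build_raw_url("/data/all_timeline.csv")) would then return the rows of the original file, provided the assumed branch and path are right.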

Available MetaData

The original CSV (/data/all_timeline.csv) contains 18 fields, but only 5 are really useful:

Name            Index  Type     Description
date            0      String   Date of Record as MM/DD/YY Format
countrycode     1      String   Country Code with 2 Bytes (Ex: BR)
totalcases      15     Integer  Number of Total Infection Cases
totaldeaths     16     Integer  Number of Total Fatalities
totalrecovered  17     Integer  Number of Total Recovered Individuals
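The five fields above can be picked out of each row by position. The following is a minimal sketch using only the standard library. It assumes the file starts with a header row and that blank numeric cells mean zero. The names parse_timeline and to_int are made up for this example.

```python
import csv
from datetime import datetime

# Column positions taken from the table above.
DATE, COUNTRY, CASES, DEATHS, RECOVERED = 0, 1, 15, 16, 17


def to_int(value):
    """Convert a cell to int; blank cells are treated as 0 (an assumption)."""
    value = value.strip()
    return int(value) if value else 0


def parse_timeline(lines, has_header=True):
    """Keep only the five useful fields from each 18-field row."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row, if any
    records = []
    for row in reader:
        if len(row) < 18:
            continue  # ignore short or malformed rows
        records.append({
            "date": datetime.strptime(row[DATE], "%m/%d/%y").date(),
            "countrycode": row[COUNTRY],
            "totalcases": to_int(row[CASES]),
            "totaldeaths": to_int(row[DEATHS]),
            "totalrecovered": to_int(row[RECOVERED]),
        })
    return records
```

parse_timeline accepts any iterable of text lines, so it works with an open file as well as with a list of strings.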

These 5 simple fields are not enough to perform a comprehensive, more advanced data analysis.

Extended MetaData

After a preliminary exploration and analysis of the original data set, I ended up creating new features to help everyone who uses the data.

This data set is available in CSV format at /data/all_timeline_with_features.csv in my COVID-19 GitHub repository.

Name                                     Index  Type    Description
date                                     0      String  Date of Record as MM/DD/YY Format
countrycode                              1      String  Country Code with 2 Bytes (Ex: BR)
countrylabel                             2      String  Same as countrycode
totalcases                               3      Double  Number of Total Infection Cases
totaldeaths                              4      Double  Number of Total Fatalities
totalrecovered                           5      Double  Number of Total Recovered Individuals
ft01_previous_day_totalcases             6      Double  totalcases from previous day
ft02_previous_day_totaldeaths            7      Double  totaldeaths from previous day
ft03_previous_day_totalrecovered         8      Double  totalrecovered from previous day
ft04_new_cases_per_day                   9      Double  New cases registered on the day
ft05_new_deaths_per_day                  10     Double  New deaths registered on the day
ft06_new_recovered_per_day               11     Double  New recovered persons on the day
ft07_previous_day_new_cases_per_day      12     Double  ft04_new_cases_per_day from previous day
ft08_previous_day_new_deaths_per_day     13     Double  ft05_new_deaths_per_day from previous day
ft09_previous_day_new_recovered_per_day  14     Double  ft06_new_recovered_per_day from previous day
ft10_cases_evolution_rate                15     Double  Evolution Rate for Infection Cases
ft11_deaths_evolution_rate               16     Double  Evolution Rate for Deaths
ft12_recovered_evolution_rate            17     Double  Evolution Rate for Recovered Persons
ft13_death_percent                       18     Double  Percentage of Deaths for each day
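To give an idea of how some of these features relate to the original fields, here is a small sketch for the cases-related columns. It is not the code from the repository. It assumes the rows belong to a single country, are sorted by date, and that the first day has a previous-day value of 0. The formula for ft13_death_percent (totaldeaths / totalcases * 100) is also an assumption, and the evolution-rate features are left out because this post does not define them.

```python
def add_case_features(rows):
    """Return copies of the rows with a few derived features added.

    rows: one country's records, sorted by date, each with
    "totalcases" and "totaldeaths" keys.
    """
    enriched = []
    prev_total = 0.0  # assumption: no cases before the first record
    prev_new = 0.0
    for row in rows:
        total = float(row["totalcases"])
        deaths = float(row["totaldeaths"])
        new_cases = total - prev_total

        out = dict(row)
        out["ft01_previous_day_totalcases"] = prev_total
        out["ft04_new_cases_per_day"] = new_cases
        out["ft07_previous_day_new_cases_per_day"] = prev_new
        # Assumed definition: share of total cases that ended in death.
        out["ft13_death_percent"] = deaths / total * 100.0 if total > 0 else 0.0
        enriched.append(out)

        prev_total, prev_new = total, new_cases
    return enriched
```

The deaths and recovered features (ft02, ft03, ft05, ft06, ft08, ft09) would follow the same pattern with their own running totals.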

Sample

Here is a simple sample of the available data that uses only 3 fields (date, totalcases, countrylabel).

Timeline – Total Cases per Country
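A timeline like this one can be prepared by grouping the three fields by country. The sketch below is one possible way to do it with the standard library. It assumes the features CSV has a header row with the column names listed earlier. The function name total_cases_timeline is made up for this example.

```python
import csv
from collections import defaultdict
from datetime import datetime


def total_cases_timeline(lines):
    """Group (date, totalcases) points by countrylabel, sorted by date."""
    series = defaultdict(list)
    for row in csv.DictReader(lines):
        day = datetime.strptime(row["date"], "%m/%d/%y").date()
        series[row["countrylabel"]].append((day, float(row["totalcases"])))
    for points in series.values():
        points.sort()  # chronological order within each country
    return dict(series)
```

Each country's list of points can then be handed to any plotting library, with the dates on the x axis and the total cases on the y axis.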

Now, download the data yourself, explore it, and help us better understand our current problem.

And join us in tackling one of the ten important challenges in the COVID-19 Open Research Dataset Challenge (CORD-19): https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
