We need data scientists around the world to help with COVID-19 data analysis. So, in this post I will talk about the data: where it lives, how to download it, and where it comes from.
First of all, you can download the processed and cleaned data from my own repository (https://github.com/guibacellar/COVID-19).
To help those who want to explore and analyze this data, I’m providing daily updated data (in JSON and CSV formats), along with the Python code to download, sanitize, convert and generate new features from the original data.
The main source of this data is the https://thevirustracker.com API.
Available MetaData
The original CSV (/data/all_timeline.csv) contains 18 fields, but only 5 are really useful:
Name | Index | Type | Description |
---|---|---|---|
date | 0 | String | Date of record in MM/DD/YY format |
countrycode | 1 | String | Two-character country code (e.g. BR) |
totalcases | 15 | Integer | Number of Total Infection Cases |
totaldeaths | 16 | Integer | Number of Total Fatalities |
totalrecovered | 17 | Integer | Number of Total Recovered Individuals |
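As a minimal sketch of how these five fields could be pulled out by position, the snippet below uses only the Python standard library. The column indexes and the MM/DD/YY date format come from the table above; the presence of a header row, the treatment of empty numeric cells as 0, the helper names (`read_useful_fields`, `to_int`) and the sample row are assumptions made for illustration, not part of the repository code.

```python
import csv
import io
from datetime import datetime

# Column positions taken from the table above.
NUMERIC_COLUMNS = {"totalcases": 15, "totaldeaths": 16, "totalrecovered": 17}


def to_int(value):
    """Parse a numeric cell; empty cells become 0 (an assumption)."""
    value = value.strip()
    return int(float(value)) if value else 0


def read_useful_fields(handle, has_header=True):
    """Return the 5 useful fields of every row as a list of dicts.

    `has_header` is an assumption: set it to False if the file
    starts directly with data rows.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row
    records = []
    for row in reader:
        if len(row) < 18:  # skip malformed or truncated rows
            continue
        record = {
            # MM/DD/YY, as documented in the table
            "date": datetime.strptime(row[0].strip(), "%m/%d/%y").date(),
            "countrycode": row[1].strip(),
        }
        for name, index in NUMERIC_COLUMNS.items():
            record[name] = to_int(row[index])
        records.append(record)
    return records


# Made-up 18-column sample; with the real file you would pass
# open("data/all_timeline.csv", newline="") instead.
header = ",".join(f"col{i}" for i in range(18))
row = ",".join(["3/15/20", "AA"] + ["0"] * 13 + ["25", "1", "2"])
records = read_useful_fields(io.StringIO(header + "\n" + row + "\n"))
print(records[0])
```

Reading by index rather than by column name keeps the sketch independent of the exact header labels, which are not listed here for the 13 unused fields.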
These 5 simple fields are not enough to perform a comprehensive, more advanced data analysis.
Extended MetaData
After a preliminary exploration and analysis of the original data set, I ended up creating new features to help everyone who uses the data.
This data set is available in CSV format at /data/all_timeline_with_features.csv in my COVID-19 GitHub repository.
Name | Index | Type | Description |
---|---|---|---|
date | 0 | String | Date of record in MM/DD/YY format |
countrycode | 1 | String | Two-character country code (e.g. BR) |
countrylabel | 2 | String | Same as Above |
totalcases | 3 | Double | Number of Total Infection Cases |
totaldeaths | 4 | Double | Number of Total Fatalities |
totalrecovered | 5 | Double | Number of Total Recovered Individuals |
ft01_previous_day_totalcases | 6 | Double | totalcases from previous day |
ft02_previous_day_totaldeaths | 7 | Double | totaldeaths from previous day |
ft03_previous_day_totalrecovered | 8 | Double | totalrecovered from previous day |
ft04_new_cases_per_day | 9 | Double | New cases registered on the day |
ft05_new_deaths_per_day | 10 | Double | New deaths registered on the day |
ft06_new_recovered_per_day | 11 | Double | New recovered persons on the day |
ft07_previous_day_new_cases_per_day | 12 | Double | ft04_new_cases_per_day from previous day |
ft08_previous_day_new_deaths_per_day | 13 | Double | ft05_new_deaths_per_day from previous day |
ft09_previous_day_new_recovered_per_day | 14 | Double | ft06_new_recovered_per_day from previous day |
ft10_cases_evolution_rate | 15 | Double | Evolution Rate for Infection Cases |
ft11_deaths_evolution_rate | 16 | Double | Evolution Rate for Deaths |
ft12_recovered_evolution_rate | 17 | Double | Evolution Rate for Recovered Persons |
ft13_death_percent | 18 | Double | Percentage of Deaths for each day |
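To make the first nine features concrete, here is a hedged sketch of how they could be derived for a single country. It is not the repository's actual code: the function name `add_lag_features`, the choice of 0.0 for the first day (which has no previous record), and the reading of "new per day" as today's total minus the previous day's total are assumptions. The evolution rates and ft13_death_percent are left out because their formulas are not stated in the table.

```python
# Feature names taken from the table above, grouped by base metric.
FEATURES = {
    "totalcases": ("ft01_previous_day_totalcases",
                   "ft04_new_cases_per_day",
                   "ft07_previous_day_new_cases_per_day"),
    "totaldeaths": ("ft02_previous_day_totaldeaths",
                    "ft05_new_deaths_per_day",
                    "ft08_previous_day_new_deaths_per_day"),
    "totalrecovered": ("ft03_previous_day_totalrecovered",
                       "ft06_new_recovered_per_day",
                       "ft09_previous_day_new_recovered_per_day"),
}


def add_lag_features(rows):
    """Add ft01..ft09 to the rows of ONE country, sorted by date ascending.

    Assumption: the first day has no previous record, so its
    "previous day" values are set to 0.0.
    """
    enriched = []
    previous = None
    for row in rows:
        out = dict(row)  # do not mutate the caller's data
        for metric, (prev_total, new_per_day, prev_new) in FEATURES.items():
            yesterday_total = float(previous[metric]) if previous else 0.0
            yesterday_new = previous[new_per_day] if previous else 0.0
            out[prev_total] = yesterday_total
            # Assumed definition: today's total minus yesterday's total.
            out[new_per_day] = float(row[metric]) - yesterday_total
            out[prev_new] = yesterday_new
        enriched.append(out)
        previous = out
    return enriched


# Made-up totals for three consecutive days of one country.
days = [
    {"date": "3/13/20", "totalcases": 10, "totaldeaths": 0, "totalrecovered": 0},
    {"date": "3/14/20", "totalcases": 25, "totaldeaths": 1, "totalrecovered": 2},
    {"date": "3/15/20", "totalcases": 45, "totaldeaths": 3, "totalrecovered": 5},
]
enriched = add_lag_features(days)
print(enriched[2]["ft04_new_cases_per_day"])
```

The input must be filtered to one country and sorted by date before calling the function, otherwise the "previous day" of one country would be taken from another.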
Sample
Here is a simple sample of the available data using only 3 fields (date, totalcases, countrylabel).
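One possible way to build such a sample yourself is sketched below: it groups totalcases by countrylabel into one time series per country. It assumes the features CSV starts with a header row that uses the field names from the table above; the function name `cases_by_country`, the country labels and the numbers in the inline sample are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def cases_by_country(handle):
    """Map each countrylabel to a list of (date, totalcases) pairs.

    Assumes a header row with the column names listed in the table above.
    """
    series = defaultdict(list)
    for row in csv.DictReader(handle):
        series[row["countrylabel"]].append(
            (row["date"], float(row["totalcases"]))
        )
    return dict(series)


# Made-up rows; with the real file you would pass
# open("data/all_timeline_with_features.csv", newline="") instead.
sample_csv = """date,countrycode,countrylabel,totalcases
3/14/20,AA,AA,10.0
3/15/20,AA,AA,25.0
3/15/20,BB,BB,7.0
"""
series = cases_by_country(io.StringIO(sample_csv))
for label, points in series.items():
    print(label, points)
```

Each resulting list can be handed directly to a plotting library to draw one line per country.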
Now, download the data yourself, explore it, and help us better understand our current problem.
And join us in solving one of the ten important challenges of the COVID-19 Open Research Dataset Challenge (CORD-19): https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge