A Simple COVID-19 Data Project

This is a quick tour of Python-based data analysis using open data.

European Centre for Disease Prevention and Control (ECDC) Covid-19 Dataset

The ECDC releases daily data containing statistics for every country. Using the Python requests library, it's easy to download this automatically every day. Once downloaded, the data is written to a CSV file.
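The download step might look something like the sketch below. The URL is my assumption of the address ECDC used for its daily case distribution CSV, and the function and file names are made up for illustration, so check the ECDC open data portal before relying on any of it.

```python
from datetime import date
from pathlib import Path

import requests

# Assumed endpoint for the ECDC daily case distribution CSV; verify it
# against the ECDC open data portal before use.
ECDC_CSV_URL = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv"


def dated_filename(day: date, prefix: str = "ecdc_covid19") -> str:
    """Build a per-day file name, e.g. ecdc_covid19_2020-05-01.csv."""
    return f"{prefix}_{day.isoformat()}.csv"


def download_csv(url: str, out_dir: Path, day: date) -> Path:
    """Download the CSV at `url` and write it to `out_dir` under a dated name."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / dated_filename(day)
    out_path.write_bytes(response.content)
    return out_path
```

Calling `download_csv(ECDC_CSV_URL, Path("data"), date.today())` from a scheduled job would then give one CSV file per day.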

The next step is to use pandas to create a dataframe from the downloaded file, clean up the data, and add new columns for cumulative cases and deaths per country.
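As a rough sketch of that step, the code below loads a CSV, tidies the column names and dates, and adds the cumulative columns. The column names (dateRep, countriesAndTerritories, countryterritoryCode) and the day/month/year date format are assumptions about the ECDC file layout, and the sample rows are invented so the example runs without a download.

```python
import io

import pandas as pd

# Invented rows in the assumed ECDC layout (not real figures).
SAMPLE_CSV = """dateRep,cases,deaths,countriesAndTerritories,countryterritoryCode
03/04/2020,30,3,Country_A,AAA
01/04/2020,10,1,Country_A,AAA
02/04/2020,20,2,Country_A,AAA
01/04/2020,5,0,Country_B,BBB
02/04/2020,7,1,Country_B,BBB
"""


def load_and_clean(source) -> pd.DataFrame:
    """Read the CSV, tidy names and dates, and add cumulative columns."""
    df = pd.read_csv(source)
    df = df.rename(columns={
        "dateRep": "date",
        "countriesAndTerritories": "country",
        "countryterritoryCode": "country_code",
    })
    # Assumed day/month/year format for the date column.
    df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
    # Sort oldest first within each country so the running totals are correct.
    df = df.sort_values(["country", "date"]).reset_index(drop=True)
    df["cumulative_cases"] = df.groupby("country")["cases"].cumsum()
    df["cumulative_deaths"] = df.groupby("country")["deaths"].cumsum()
    return df


df = load_and_clean(io.StringIO(SAMPLE_CSV))
print(df[["date", "country", "cases", "cumulative_cases", "cumulative_deaths"]])
```

For the real file, pass the path of the downloaded CSV to `load_and_clean` instead of the in-memory sample.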

Now we can list the countries we are interested in and iterate over that list, finding the latest date in the dataset and using it to generate a summary per country.
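A minimal sketch of that loop is shown below. The dataframe is a small invented stand-in for the cleaned data, and the column names are the same assumed ones as before.

```python
import pandas as pd

# Invented stand-in for the cleaned dataframe (not real figures).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-04-01", "2020-04-02", "2020-04-01", "2020-04-02", "2020-04-02",
    ]),
    "country": ["Country_A", "Country_A", "Country_B", "Country_B", "Country_C"],
    "cases": [10, 20, 5, 7, 1],
    "deaths": [1, 2, 0, 1, 0],
    "cumulative_cases": [10, 30, 5, 12, 1],
    "cumulative_deaths": [1, 3, 0, 1, 0],
})

countries = ["Country_A", "Country_B"]  # the countries we care about
latest = df["date"].max()               # most recent date in the dataset

summaries = []
for country in countries:
    rows = df[(df["country"] == country) & (df["date"] == latest)]
    if rows.empty:
        continue  # no record for this country on the latest date
    record = rows.iloc[0]
    summaries.append({
        "country": country,
        "date": latest.date(),
        "new_cases": int(record["cases"]),
        "new_deaths": int(record["deaths"]),
        "total_cases": int(record["cumulative_cases"]),
        "total_deaths": int(record["cumulative_deaths"]),
    })

summary = pd.DataFrame(summaries)
print(summary)
```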

Worldwide Analysis

Now we create another dataframe that holds chart data for all countries in the data set. We add a new column for the 7-day rolling mean, then reshape the dataframe using melt so that daily cases and the rolling mean share a single value column, with a second column naming the series. We're then ready to plot using Plotly Express.
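The steps above can be sketched roughly as follows. The data is invented, the column names are assumptions, and a line chart is just one plausible choice for the plot. Plotly is imported inside the plotting function so the reshaping code can be run without it installed.

```python
import pandas as pd

# Invented daily case counts for two countries over eight days.
df = pd.DataFrame({
    "date": list(pd.date_range("2020-04-01", periods=8)) * 2,
    "country": ["Country_A"] * 8 + ["Country_B"] * 8,
    "cases": [1, 2, 3, 4, 5, 6, 7, 8] + [1] * 8,
})

# One row per date, summing cases across all countries.
world = df.groupby("date", as_index=False)["cases"].sum()

# 7-day rolling mean; the first six days have no full window, so they are NaN.
world["rolling_mean"] = world["cases"].rolling(window=7).mean()

# Reshape to long form: one 'value' column plus a 'series' label column.
long_df = world.melt(
    id_vars="date",
    value_vars=["cases", "rolling_mean"],
    var_name="series",
    value_name="value",
)


def plot_series(data: pd.DataFrame, title: str):
    """Return a Plotly Express line chart with one line per series."""
    import plotly.express as px  # local import: only needed when plotting

    return px.line(data, x="date", y="value", color="series", title=title)
```

Calling `plot_series(long_df, "Worldwide cases").show()` would then display the chart.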


The resulting chart shows that the earliest recorded cases were in mid-January, with a peak in early February, followed by an exponential growth phase between March and early April, before flattening again. The rolling mean (red series) removes the spikes in the data, but appears to be slightly cyclical.

United Kingdom Analysis

Repeating the same steps as above, but first running a query on the dataframe so that it contains only United Kingdom records, gives the equivalent chart for the UK.
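A hedged sketch of the filtering step is below. It assumes the country column holds the value "United_Kingdom" (with an underscore), and the figures are invented.

```python
import pandas as pd

# Invented daily deaths for two countries over eight days.
df = pd.DataFrame({
    "date": list(pd.date_range("2020-04-01", periods=8)) * 2,
    "country": ["United_Kingdom"] * 8 + ["Country_B"] * 8,
    "deaths": [8, 7, 6, 5, 4, 3, 2, 1] + [0] * 8,
})

# Keep only the United Kingdom rows, oldest first.
uk = df.query("country == 'United_Kingdom'").sort_values("date").copy()

# Same 7-day rolling mean as before, now on the filtered rows only.
uk["rolling_mean"] = uk["deaths"].rolling(window=7).mean()
print(uk)
```

From here the melt and plotting steps are the same as for the worldwide data.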


Great news: we have passed the peak, as Boris has told us. There are still spikes, which could be better understood with more detailed data, but the 7-day rolling mean of deaths is decreasing at a rate that suggests we will have to persist with our lockdown measures for a few more weeks.

The same analysis can be applied to cases, which disappointingly don't seem to be decreasing as quickly as deaths. This warrants further investigation.


This quick tour of Python data analysis on the ECDC Covid-19 data set shows what can be achieved using only a small amount of programming expertise and the pandas library. The same could be done in a spreadsheet, but that would quickly become unmanageable as the data set grows by 200 records every day. There's a lot more that could be done, such as adding country-specific data and merging different dataframes together using the country code, but I wanted to keep this one short and focus on the good news that we're beating Covid-19!
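As a pointer for that follow-up work, merging on the country code might look like the sketch below. Both dataframes and all figures are invented, and the column names are assumptions.

```python
import pandas as pd

# Invented per-country totals keyed by country code.
cases = pd.DataFrame({
    "country_code": ["AAA", "BBB", "CCC"],
    "cumulative_cases": [60, 12, 1],
})

# Hypothetical country-specific data; the population figures are invented.
population = pd.DataFrame({
    "country_code": ["AAA", "BBB"],
    "population": [1_000_000, 250_000],
})

# A left merge keeps every row of `cases`; codes with no match get NaN.
merged = cases.merge(population, on="country_code", how="left")
print(merged)
```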


More articles by Paul Whiteside
