By Parul Pandey
Downloading the dataset
We’ll be working with the World Development Indicators dataset, which is an open dataset on Kaggle. We will be using the ‘indicators.csv’ file from that dataset.
Since we are dealing with geospatial maps, we also need the country coordinates for plotting. Download the file from here.
The file can also be downloaded from my GitHub repo.
Exploring the data set
The World Development Indicators dataset is a slightly modified version of the dataset that’s available from the World Bank. It contains over a thousand annual indicators of economic development from about 247 countries around the world from 1960 to 2015. A few of the indicators are:
1. Adolescent fertility rate (births per 1,000 women)
2. CO2 emissions (metric tons per capita)
3. Merchandise exports by the reporting economy
4. Time required to build a warehouse (days)
5. Total tax rate (% of commercial profits)
6. Life expectancy at birth, female (years)
- Open a Jupyter notebook and import the required libraries. For convenience, create the notebook in the same folder as the data.
- Set up the country coordinates
- Read in the dataset and explore it
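The three steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's exact code: the GeoJSON file name `world-countries.json`, the column names, and the indicator codes are assumptions, and the small in-memory table stands in for the real CSV with placeholder values that are not real statistics.

```python
import io

import pandas as pd

# Path to the country-borders GeoJSON file (file name is an assumption).
country_geo = "world-countries.json"

# In the notebook you would read the downloaded file instead, e.g.
#   data = pd.read_csv("Indicators.csv")
# The tiny stand-in below keeps this sketch runnable on its own.
# Column names are assumed; the values are placeholders, not real figures.
sample_csv = io.StringIO("""\
CountryName,CountryCode,IndicatorName,IndicatorCode,Year,Value
Country A,AAA,"Life expectancy at birth, female (years)",SP.DYN.LE00.FE.IN,2013,80.0
Country A,AAA,CO2 emissions (metric tons per capita),EN.ATM.CO2E.PC,2013,5.0
Country B,BBB,"Life expectancy at birth, female (years)",SP.DYN.LE00.FE.IN,2013,70.0
Country B,BBB,"Life expectancy at birth, female (years)",SP.DYN.LE00.FE.IN,2012,69.5
""")
data = pd.read_csv(sample_csv)

# Basic exploration: size, first rows, and how many distinct
# indicators and countries are present.
print(data.shape)
print(data.head())
print(data["IndicatorName"].nunique(), "indicators")
print(data["CountryCode"].nunique(), "countries")
```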
We obtain the following. It seems that the indicators dataset has different indicators for different countries, along with the year and value of each indicator.
Life expectancy at birth, female (years) appears to be a good indicator to investigate. So, let’s pull out the life expectancy data for all the countries in 2013. The year is chosen at random.
Let us also set up our data for plotting by keeping just the country code and the values that we want to plot. We’ll also extract the name of the indicator for use as the legend in the figure.
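One way to do this filtering with pandas is sketched below. The column names (`CountryCode`, `IndicatorName`, `Year`, `Value`) and the variable names are assumptions, and the small DataFrame is a stand-in with placeholder values rather than real data.

```python
import pandas as pd

LIFE_EXP = "Life expectancy at birth, female (years)"

# Stand-in for the indicators table (assumed columns, placeholder values).
data = pd.DataFrame({
    "CountryCode": ["AAA", "AAA", "BBB", "BBB"],
    "IndicatorName": [LIFE_EXP, "CO2 emissions (metric tons per capita)",
                      LIFE_EXP, LIFE_EXP],
    "Year": [2013, 2013, 2013, 2012],
    "Value": [80.0, 5.0, 70.0, 69.5],
})

hist_indicator = LIFE_EXP
hist_year = 2013

# Exact equality is used instead of str.contains, because the parentheses
# in the indicator name would otherwise be treated as regex syntax.
mask = (data["IndicatorName"] == hist_indicator) & (data["Year"] == hist_year)
stage = data[mask]

# Keep only what the map needs: the country code and the value.
plot_data = stage[["CountryCode", "Value"]]

# Name of the indicator, reused later as the legend label.
legend_label = stage.iloc[0]["IndicatorName"]
print(plot_data)
print(legend_label)
```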
Creating the Folium interactive map
Now we’re going to create the Folium interactive map. We’ll create a map with a fairly high-level (zoomed-out) view. Next, we’ll use the built-in method called choropleth to attach the countries’ geographic JSON and the plot data.
We also need to specify the relevant parameters. The `key_on` parameter refers to the label in the JSON object which has the country code as the feature ID attached to each country’s border information. This is the tie that we need to set up in our data: the country code in the data frame should match the feature ID in the JSON object.
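Because the join depends on the codes matching, it can help to compare the two sets of keys before plotting. The sketch below assumes each GeoJSON feature stores the country code under a top-level `id` key and uses a made-up three-country GeoJSON and made-up codes purely for illustration.

```python
import pandas as pd


def square(x, y):
    """Return a small square polygon ring starting at (x, y), as [lon, lat] pairs."""
    return [[[x, y], [x + 10, y], [x + 10, y + 10], [x, y + 10], [x, y]]]


# Minimal GeoJSON stand-in (hypothetical countries and shapes).
country_geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": code, "properties": {"name": name},
         "geometry": {"type": "Polygon", "coordinates": square(x, 0)}}
        for code, name, x in [("AAA", "Country A", 0),
                              ("BBB", "Country B", 20),
                              ("CCC", "Country C", 40)]
    ],
}

# Stand-in plot data (placeholder values); "ZZZ" has no matching border.
plot_data = pd.DataFrame({"CountryCode": ["AAA", "BBB", "ZZZ"],
                          "Value": [80.0, 70.0, 75.0]})

geo_ids = {feature["id"] for feature in country_geo["features"]}
data_codes = set(plot_data["CountryCode"])

# Codes present in the data but with no border to colour.
unmatched_in_data = sorted(data_codes - geo_ids)
# Borders that will receive no value from the data.
unmatched_in_geo = sorted(geo_ids - data_codes)

print("In data only:", unmatched_in_data)
print("In GeoJSON only:", unmatched_in_geo)
```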
Next, we specify some of the aesthetics, such as the color scheme and the opacity, and then label the legend.
The output of this plot is saved as an HTML file, which is what makes it interactive. So, we need to save it and then read it back into the notebook in order to interact with the map.
We will obtain a map like the one below:
And now we have our map. Notice first that the darker colors imply higher life expectancy for females. Clearly, the US and the majority of Europe have a higher life expectancy for females.
So, this is an example of how to do geographic overlays. It is also an example of how to use additional visualization libraries and how powerful they can be, depending on our visualization needs.
This was a pretty simple first step into the world of choropleth maps using pandas DataFrames and Folium. You can explore more about Folium and the interactivity it provides at the official documentation page.
To see the actual interactivity of the map, visit the GitHub repo.