Combining Datasets with Shapefiles

It's difficult to find a complete walkthrough of how to combine TopoJSON map files with CSV datasets. Here are the steps we use on a Mac, borrowing heavily from Mike Bostock's "Let's Make a Map" tutorial.

Prepare Your Machine

  • Installing these tools can be complicated by any number of factors specific to your computer.
  • If a step fails or returns an error, double-check your spelling first.
  • All of these Terminal commands are case-sensitive.
  • Make sure you have a complete and recent backup of your machine before beginning.

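Get the Installation Tools

  • The commands below rely on Homebrew (the brew command) and npm, the package manager that comes with Node.js. Install both first if they aren't already on your machine.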

Get the Mapping Tools

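  • Open Terminal (in the /Applications/Utilities folder) and paste in the following commands. The first installs GDAL, which provides the ogr2ogr command; the second installs the topojson command. The topojson commands in this walkthrough use the version 1 syntax, and later major versions split that command into separate tools (such as geo2topo) with different options, so we pin version 1.
brew install gdal
npm install -g topojson@1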

Get the Map File

  • We used a map file from the Natural Earth Data project, a wonderful resource for free geographic data. For demonstration purposes we downloaded the low-resolution (1:110m) sovereignty map, ne_110m_admin_0_sovereignty, but any shapefile should work with the same steps.

Prep the Map File

  • Natural Earth Data files come in the shapefile format (.shp), a binary format that isn't human-readable. Luckily, the GDAL tools we just installed can convert it into a friendlier, plain-text GeoJSON file. Still in Terminal, enter the following commands, which assume that the unzipped Natural Earth Data folder is on your desktop.

This command changes the Terminal's working directory to the Natural Earth Data folder.

cd ~/Desktop/ne_110m_admin_0_sovereignty/

This command converts the downloaded Natural Earth Data shapefile into a GeoJSON file.

ogr2ogr -f GeoJSON worldmap.geo.json ne_110m_admin_0_sovereignty.shp

Opening the newly created worldmap.geo.json file in a text editor (we recommend Sublime Text) shows how rich Natural Earth Data files are. Besides detailed border geometry, each sovereign country carries attributes such as population, GDP, region, postal code, and size classifications, along with many other fields that are useful for visualization.
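If you'd rather not scroll through the whole file, a short script can list which property names the features carry. The sketch below is in Python (our choice; nothing else in this walkthrough requires it) and assumes worldmap.geo.json is a standard GeoJSON FeatureCollection. The helper names load_geojson and property_names are our own, not part of any tool used here.

```python
import json
from collections import Counter


def load_geojson(path):
    """Read a GeoJSON file such as worldmap.geo.json into a Python dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def property_names(collection):
    """Count how many features in a FeatureCollection carry each property name."""
    counts = Counter()
    for feature in collection.get("features", []):
        # GeoJSON allows "properties": null, so treat that as an empty dict
        counts.update((feature.get("properties") or {}).keys())
    return counts
```

For example, print(sorted(property_names(load_geojson("worldmap.geo.json")))) lists every property name, and the counts show whether a field is present on every country or only some of them.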

The ogr2ogr command can do more than convert shapefiles. Its -where option filters features with an SQL-style condition on any attribute field. For instance, each country has a subregion field. If we only wanted a map of South Asia, we could run the following modified command, using a value copied from the converted file. (In the Natural Earth files we've seen, this subregion is spelled 'Southern Asia', while 'South Asia' appears in the World Bank region field, region_wb; check the spelling in your own file.)

ogr2ogr -f GeoJSON -where "subregion IN ('Southern Asia')" southasia.json ne_110m_admin_0_sovereignty.shp
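To make the -where filter concrete, here is a rough Python equivalent that works on an already converted GeoJSON FeatureCollection rather than on the shapefile. It is a sketch of the idea, not how ogr2ogr works internally, and it only covers the simple "field IN (values)" case; filter_features is a hypothetical helper name.

```python
def filter_features(collection, field, allowed):
    """Keep only features whose properties[field] is one of the allowed values.

    Roughly what ogr2ogr's -where "field IN (...)" selects; other SQL
    conditions (comparisons, AND/OR, LIKE) are not handled here.
    """
    allowed = set(allowed)
    kept = [
        feature
        for feature in collection.get("features", [])
        if (feature.get("properties") or {}).get(field) in allowed
    ]
    # Copy the top-level members (type, crs, ...) and swap in the filtered list
    return {**collection, "features": kept}
```

Pass the same field name and values you would put in the -where clause. This function compares strings exactly, so the values must match the file's spelling and capitalization.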

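We can now convert the GeoJSON file that ogr2ogr created into a TopoJSON file. TopoJSON stores each border shared by neighboring countries only once and encodes coordinates as small integers, so the result is usually much smaller than the GeoJSON, which helps a D3 map load quickly in the browser.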

topojson -o worldmap.topo.json worldmap.geo.json
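Part of TopoJSON's size saving comes from quantization: coordinates are snapped to an integer grid described by a "transform" (a scale and a translate), and each position along an arc is stored as an offset from the previous one. The Python sketch below illustrates just those two steps on a list of points; it ignores arc sharing, and the function names are our own, not part of the topojson tool.

```python
def quantize_and_delta(points, bbox, n=10_000):
    """Snap (x, y) points onto an n-by-n integer grid spanning bbox, then delta-encode.

    Returns the encoded offsets plus a TopoJSON-style transform
    {"scale": [sx, sy], "translate": [x0, y0]} for decoding.
    """
    x0, y0, x1, y1 = bbox
    # Grid steps per coordinate unit; guard against a zero-width or zero-height box
    kx = (n - 1) / (x1 - x0) if x1 != x0 else 1.0
    ky = (n - 1) / (y1 - y0) if y1 != y0 else 1.0
    encoded, prev_x, prev_y = [], 0, 0
    for x, y in points:
        qx, qy = round((x - x0) * kx), round((y - y0) * ky)
        # Store the offset from the previous point; small integers take few characters
        encoded.append([qx - prev_x, qy - prev_y])
        prev_x, prev_y = qx, qy
    return encoded, {"scale": [1 / kx, 1 / ky], "translate": [x0, y0]}


def decode(encoded, transform):
    """Undo the delta encoding and map grid cells back to approximate coordinates."""
    (sx, sy), (tx, ty) = transform["scale"], transform["translate"]
    x = y = 0
    points = []
    for dx, dy in encoded:
        x += dx
        y += dy
        points.append((x * sx + tx, y * sy + ty))
    return points
```

Decoding recovers each point only to within half a grid cell, so the grid size n is a trade-off between file size and precision.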

By default, this strips all of the extra country data out of the GeoJSON file, leaving only the geographic paths in worldmap.topo.json. To keep some of that data for our visualization, such as estimated population and GDP, we could enter the following command instead. Again, we found these property names by examining the GeoJSON file directly. Depending on the Natural Earth release, the names may be capitalized (for example, POP_EST instead of pop_est), so copy them exactly as they appear in your file.

topojson -o worldmap.topo.json --properties pop_est,gdp_md_est worldmap.geo.json
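The effect of a property list like this, including the target=source renaming form, can be sketched in a few lines of Python. This is a loose model of topojson 1.x's behavior written for illustration, not its implementation; select_properties is a hypothetical helper, and it does not handle the "+" prefix for numbers.

```python
def select_properties(collection, spec):
    """Keep only the properties named in a spec like "population=pop_est,gdp_md_est".

    Each comma-separated entry is either "source" (kept under the same name)
    or "target=source" (renamed). Features are copied; the input is not modified.
    """
    mapping = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        target, _, source = entry.partition("=")
        # A bare name has no "=", so partition leaves source empty
        mapping[target] = source or target

    features = []
    for feature in collection.get("features", []):
        props = feature.get("properties") or {}
        kept = {target: props[source] for target, source in mapping.items() if source in props}
        features.append({**feature, "properties": kept})
    return {**collection, "features": features}
```

Properties missing from a feature are simply skipped, so a misspelled name silently produces no output field; that is one reason to copy names directly from the GeoJSON file.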

Get More Data

We may also want to combine the Natural Earth Data with information from other sources. For instance, we could download the Happy Planet Index 2016 dataset to visualize how happiness levels vary between countries. That dataset is a Microsoft Excel file rather than the plain-text CSV or JSON formats that our command-line tools read. In Excel, we can simplify the downloaded file (at minimum we need the country names and index values), rename the headers so the country column is called name and the index column is called HPI, and save the result as a CSV file. We named the converted file hpi.csv and placed it in the Natural Earth Data folder on our desktop.
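If you prefer to do the trimming and renaming outside Excel, a small Python script can read a CSV export of the spreadsheet and write just the two columns we need. This is a sketch under assumptions: the source header names "Country" and "Happy Planet Index" are hypothetical placeholders for whatever your export actually uses, and it assumes the header row is the first line of the exported file.

```python
import csv


def trim_hpi_rows(lines, name_col="Country", hpi_col="Happy Planet Index"):
    """Return [["name", "HPI"], [country, score], ...] from CSV text lines.

    name_col and hpi_col are hypothetical source headers; change them to match your file.
    """
    rows = [["name", "HPI"]]
    for row in csv.DictReader(lines):
        name = (row.get(name_col) or "").strip()
        if name:  # skip blank spacer or footnote rows
            rows.append([name, (row.get(hpi_col) or "").strip()])
    return rows


def write_hpi_csv(src_path, dst_path, **columns):
    """Read an exported CSV at src_path and write the trimmed name,HPI table to dst_path."""
    # utf-8-sig also strips the byte-order mark that Excel's "CSV UTF-8" export adds
    with open(src_path, newline="", encoding="utf-8-sig") as src:
        rows = trim_hpi_rows(src, **columns)
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(rows)
```

For example, write_hpi_csv("hpi_export.csv", "hpi.csv", name_col="Country") would produce the hpi.csv file used below, where hpi_export.csv is a hypothetical name for your Excel export.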

Combine the Datasets

Our hpi.csv file has a name column, and the Natural Earth features have a name property. A slight edit to our topojson command joins the two files on this shared field: the -e option points to the external hpi.csv file, and the --id-property option chooses the field to match on. Only countries whose names are spelled identically in both files will be joined. We can also rename properties in the resulting TopoJSON file with the target=source syntax.

The HPI column from the CSV file can be kept through the conversion as well by adding +HPI to the list of properties to preserve; the leading + tells topojson to convert the value to a number.

topojson -o happiness.topo.json -e hpi.csv --id-property name --properties population=pop_est,gdp=gdp_md_est,+HPI worldmap.geo.json

Start Plotting

We can now build a D3 visualization that combines the Natural Earth geography with the Happy Planet Index values in happiness.topo.json. Ours uses a Robinson projection and filled paths, which are the subject of the next tutorial.