Dealing with Datasets

When we began our Communication Design Workshop: Data Visualization, our first assignment seemed fairly easy. We were tasked with finding a publicly available dataset, geographic in nature, that we thought could reveal something interesting when visualized in two and three dimensions.

Finding an interesting dataset was easy. Working with it? Not so much.

I came across a unique dataset describing the state of human trafficking. Each year the US State Department publishes a new Trafficking in Persons (TIP) Report describing the current state of human trafficking around the world. In 2013 Richard Frank synthesized the data from the 2001-2011 reports into a spreadsheet and accompanying report, “Human Trafficking Indicators, 2000-2011: A New Data Set”. The data explores the level at which countries are involved in human trafficking, the kinds of crimes being committed, and what each sovereign state is doing about it.

At first glance, the spreadsheet seemed to effectively compress the information available in the yearly TIP reports. I still found it hard to digest, though, for a few reasons.

  • It is next to impossible for the human eye to scan numerical information, translate it over time, and infer insights.
  • All of the data is coded numerically, though the problems it describes are not inherently quantifiable or normalizable. The original reports describe what was happening in each country, and don't attempt to evaluate or assign rankings.
  • The data is one expert's interpretation of the TIP information. Surely there would be variability if another individual parsed the available data.
  • Though it is great that this collation exists, the result is challenging to work with. The dataset uses many different kinds of scales and rankings — 0-1-2-3, -1-0-1, 0-1, and more — which are inconsistently applied.
  • Some aspects of the data use 1-2-3 to represent ascending severity, while others use 1-2-3 to represent ascending levels of progress. There is no uniformity, since a higher number can mean either a worse or a better situation. This makes producing flexible visualizations and making comparisons very difficult.
  • Some countries that existed in 2001 no longer existed in 2011, and vice versa. Other countries had significant name changes. This asymmetry caused many issues in programming.
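The last three problems can be partly worked around before any drawing happens. Here is a minimal sketch in Python, not the code used for this project: the variable names, scale ranges, and alias table are all hypothetical stand-ins, since the real column names and codes come from the spreadsheet's codebook. It flips and rescales each variable so that a higher number always means a worse situation, and gives every year the same list of countries.

```python
from typing import Dict, Optional

# Hypothetical scale metadata: (minimum, maximum, higher_is_worse).
# These names and ranges are illustrative, not taken from the dataset.
SCALES = {
    "severity": (0, 3, True),    # 0-1-2-3, higher = worse
    "progress": (1, 3, False),   # 1-2-3, higher = better
    "trend": (-1, 1, False),     # -1-0-1, higher = better
}

# Hypothetical alias table mapping older or alternate names to one spelling.
ALIASES = {"Burma": "Myanmar"}


def normalize(variable: str, value: Optional[float]) -> Optional[float]:
    """Rescale a raw code to 0.0-1.0, where 1.0 always means the worse end."""
    if value is None:
        return None  # keep missing data missing instead of guessing
    low, high, higher_is_worse = SCALES[variable]
    scaled = (value - low) / (high - low)
    return scaled if higher_is_worse else 1.0 - scaled


def canonical_name(name: str) -> str:
    """Return a single agreed spelling for a country name."""
    cleaned = name.strip()
    return ALIASES.get(cleaned, cleaned)


def align_years(
    by_year: Dict[int, Dict[str, float]]
) -> Dict[int, Dict[str, Optional[float]]]:
    """Give every year the same set of countries, using None where absent."""
    renamed = {
        year: {canonical_name(country): value for country, value in rows.items()}
        for year, rows in by_year.items()
    }
    all_countries = sorted(set().union(*(rows.keys() for rows in renamed.values())))
    return {
        year: {country: rows.get(country) for country in all_countries}
        for year, rows in renamed.items()
    }
```

Rescaling like this only lines up the direction and range of each variable so that one color scale can serve the whole map. It does not make the underlying judgments any more comparable, and countries that split or merged still need a decision made by hand.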

I began my exploration of the data by visualizing the most elemental aspect of human trafficking (severity by country) on a global scale. This attempt did not reveal anything revolutionary, but it was a necessary first step. When discussing this with my professors, we realized that, in order for the data to reveal anything novel, we would need to look at different aspects and levels of the data at the same time.

We were able to visualize the data as an interactive map with D3's assistance. Here, we are able to look at the data geographically across time and toggle between variables easily through a drop-down menu. It has become a tool for reaching new realizations, and it exposes patterns that would otherwise have gone unnoticed. This is the power of data visualization as a discovery tool!

From the map it is possible to compare countries, see trends, and identify anomalies across time and space. It had seemed like this would be the end of the design process, but it is clearly just the beginning. Moving forward, I will be examining and researching individual theories based on the data visualized.

Look out for the final visualization and discovery tool soon!