Data Exploration

In the field of design research, we use data visualization as an additional means to understand the context in which we propose design solutions. Similar to ethnographic research, interviews, secondary research and literature reviews, making sense of readily available data through visualization can serve as another mechanism to enhance our topic knowledge.

Modern-day, algorithmically produced data visualization was originally lauded as a means to visually explore quantitative data, which until then had conventionally appeared in spreadsheets. The invention of data visualization allowed us to leverage our perceptual and spatial capabilities to conduct visual and aesthetic analysis of information. With time, and with the expansion of data visualization into other domains, new roles have emerged. For example, data journalism uses visualizations to discover story-worthy topics or to explain an event to the public. Citizen activism uses data visualizations to persuade and to promote action.

This all sounds good and logical. However, we’re finding in class that the notion of ‘exploring data’ has many nuances.

A little background: For our class project, we required students to choose a topic area, collect datasets, and merge and clean the data. On the visual side, we asked them early on to create an obvious visualization to get that out of their system, and then to sketch out possible visualizations based on the data variables. This would give them initial visual structures and ideas for when they began to code. Simultaneously, they learned to code and to apply D3 functions to produce basic visualizations of their datasets. Their weekly assignments were to use the week’s coding tutorial and come as close as they could to implementing their intended visualization in code.

Our design students wrestled with various parts of this process. While these challenges are detailed in literature about data visualization, our students viscerally experienced several key learnings about data and the act of visually representing it. Here are some of their difficulties:

  • Visualizing to build possible hypotheses

    One of the questions that has surfaced in class is, ‘what should I show in my visualization?’ Being designers, as we conduct research, we cannot help but start to produce seed ideas; it’s in the designer’s DNA to speculate about the future and iterate. With data visualization tools, this instinct may be counterproductive and may inject bias into the visualization. Knowing how to answer the question, ‘what can I show in my visualization?’ presumes you already have a hypothesis about what to show and how to show it. Several students were redirected to make the visualization first in order to figure out what, if anything, is interesting or meaningful in their dataset. Adjusting how the visualization is shown, by changing visual parameters, can also surface insights or patterns that might otherwise go unnoticed.

  • Describing the world for a new understanding

    A designer may have a hunch about what the data may generally look like (lots of lines radiating out from one location, lots of dots in certain areas, bounded areas of different colors), but we don’t know if that story is true until all of the data in the set is represented. This is problematic for designers, who often need a clear goal in sight before creating visual assets. Designers often engage in conceptual abstraction starting from qualitative data captured during observation, but data visualization provides an opportunity for a different kind of abstraction. It allows the designer to view the gestalt of the dataset visually while simultaneously examining the specifics of individual data points.

  • Diagnosing to find problem spots

    Akin to what journalists and citizen activists do, spatial data visualization allows researchers to see whether there are correlations between variables that suggest an effect, and to begin looking for what might be causing it. In a design context, this can point designers to next steps, such as visiting the location directly to obtain qualitative information or seeking out additional, complementary datasets.

  • Learning about the nature of the dataset itself

    Publicly available data comes in all shapes and sizes. Some datasets are institutionally entrenched in the way they are formatted. For example, in Chicago, restaurant violations assessed by health inspectors rank establishments into high, medium or low categories. It is unclear to the layperson what determines the ranking: whether it is based on the number of simultaneous violations, the severity or duration of a violation, or whether the establishment is a repeat offender. Data visualizations allow us to easily encode rankings in a simple visual code, but the nature of the data may require further understanding and explanation to prevent oversimplification and misinterpretation.
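
To make that last point concrete, here is a minimal sketch of what encoding a ranking in a simple visual code can look like. It is written in plain TypeScript rather than D3 so that it stands on its own. The record shape, field names, colors and sample establishments are all hypothetical, made up for illustration; they are not the schema or contents of the actual Chicago dataset.

```typescript
// Hypothetical risk categories, modeled on the high/medium/low ranking described above.
type Risk = "high" | "medium" | "low";

// Hypothetical inspection record; field names are illustrative, not a real schema.
interface Inspection {
  name: string;
  risk: Risk;
  violationCount: number;
}

// A simple ordinal visual code: one fill color per category (colors are arbitrary choices).
const riskColor: Record<Risk, string> = {
  high: "#d73027",
  medium: "#fee08b",
  low: "#1a9850",
};

// Map each record to the mark that would be drawn for it.
// Note that only the category survives; violationCount is dropped.
function encode(records: Inspection[]): { name: string; fill: string }[] {
  return records.map((r) => ({ name: r.name, fill: riskColor[r.risk] }));
}

// Made-up sample records, for illustration only.
const sample: Inspection[] = [
  { name: "Cafe A", risk: "high", violationCount: 1 },
  { name: "Cafe B", risk: "high", violationCount: 7 },
  { name: "Cafe C", risk: "low", violationCount: 0 },
];

const marks = encode(sample);
console.log(marks);
```

In this sketch, the hypothetical Cafe A and Cafe B are drawn in the same color even though their violation counts differ. The encoding is easy to write, but whatever determined the ranking is invisible in the result, which is the kind of oversimplification that calls for further explanation.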