Sunday, 21 February 2016

Fundamental rules of data visualization

We read many "rules" of data visualization in many publications. Some contradict others and some just don't make any sense. Some are accompanied by extensive amounts of proofiness, but what is often missing is an appreciation of the fundamentals. I can use algebra to prove to you that 1+1=1, using what look like perfectly legitimate algebraic transformations, but the proof is invalid because it breaks a fundamental rule (for those who are interested, I will add it at the end of the post).

I like to preach three fundamental rules of data visualization to those who will listen:

1.  Data visualization is all about ratios
This is so fundamental that it is almost ridiculous to have to mention it, but we need to mention it. Any visualization that seeks to juxtapose several values for interpretation must do so using some kind of visual ratio.

There are many kinds of visual ratios and some are more effective than others. Cleveland and McGill (1984) gave us the order of effectiveness of interpretation for these ratios:

  • Position on a common scale
  • Position on non-aligned scales
  • Length
  • Direction
  • Angle
  • Area
  • Volume
  • Curvature
  • Shading
  • Color saturation

Trying to create a data visualization that is not based on some kind of visual ratio is a fundamentally flawed approach. Not every ratio is appropriate for every visualization either, so we need to learn what works where.

2.  Data visualization is all about context
We can create the most wonderfully beautiful bar charts and present them on a large screen in Times Square or print them on the most opulent paper in the most vivid colors, but without context they are just rectangles.
Context devices will include such simple elements as titles and axes - enough annotation so as to allow the reader to understand exactly what they are looking at.
As Amanda Cox, Graphics Editor at the New York Times, said in her Eyeo Festival talk:

The annotation layer is the most important thing we do... otherwise it's a case of here it is, you go figure it out.

3.  Data visualization is about SFW
This is the most important thing from a business point of view - and good data visualization is about creating a good solution for the business. SFW stands for So What.
I will always remember the day when I had spent hours on a great dashboard to present to a board-level executive at one of our most important clients. It was technically awesome! Really pushing the boundaries of what the tool could do.
I proudly showed it off at the executive presentation. My client sat patiently through it until, finally, he looked me straight in the eye and said:

So f***ing what?

He was right of course. My technically advanced dashboard had a huge fundamental flaw - I had failed to connect it correctly to the business problem. It wasn't a good solution at all - except in my head.
Fundamentally, we need to make sure that our data visualizations connect with the audience that they are intended for. The first two rules give us the correct technical result, the last gives us the brilliant business solution.

We can create some great business solutions by following these three rules. They may not look great, they may have garish colors, but if the CEO is able to use them to track his business then that is a very good dashboard.

To achieve glory among your peers, you need to start going beyond the fundamentals. Learn what works and what doesn't in most situations. Know when you should use a pie chart and when you shouldn't. Learn how to lay things out. Learn the best colors to use. This does lead to a fourth rule that could be considered fundamental:

4.  Get out of the way and show the numbers
We don't talk about all the color and layout stuff for the good of our health. There are good reasons for doing things in the ways that you will read about in the books. Learn about the reasons for good consistent layout, easy on the eye colors and clean presentation.
Above all, learn that if we don't follow the fundamentals then we start to potentially obscure the data, and this is a flaw that is important to correct.
Get out of the way and show the numbers.

For those that are interested, 1 + 1 = 1:

a = b = 1

a = b

a^2 = ab

a^2 - b^2 = ab - b^2

(a + b)(a - b) = b(a - b)

a + b = b

1 + 1 = 1
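
For anyone who wants to see where it goes wrong, here is a minimal sketch in Python (my own illustration; the function name `check_steps` is made up for this post). It evaluates both sides of each line for a = b = 1. Every line holds until the last one, because getting from (a + b)(a - b) = b(a - b) to a + b = b means dividing both sides by a - b, which is zero.

```python
def check_steps(a, b):
    """Report, line by line, whether each equation in the 'proof' holds."""
    return {
        "a^2 = ab": a * a == a * b,
        "a^2 - b^2 = ab - b^2": a * a - b * b == a * b - b * b,
        "(a + b)(a - b) = b(a - b)": (a + b) * (a - b) == b * (a - b),
        # This line only follows from the previous one if we divide by (a - b).
        "a + b = b": a + b == b,
    }


steps = check_steps(1, 1)
divisor = 1 - 1  # the (a - b) that the last step silently divides by

for line, holds in steps.items():
    print(f"{line}: {holds}")
print(f"a - b = {divisor}")
```

The first three lines are true only because both sides are zero; the last line is false, and the divisor is the reason.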

Stephen Redmond is a Data Visualization professional. He is author of Mastering QlikView, QlikView Server and Publisher and the QlikView for Developer's Cookbook
Follow me on Twitter   LinkedIn

Tuesday, 2 February 2016

How to lie with charts - crude oil versus retail gasoline prices

After watching a news item this morning, I posted the following question to social media:

If oil has dropped from > $100 / barrel to < $30, why are consumers still paying > €1 / litre?

There were some interesting responses. There was in my mind a suspicion that the retail prices were not coming down as quickly as the crude prices - but I had nothing to back that up with. I decided to investigate.

Taking crude oil prices from the US Energy Information Administration and monthly retail price data from AA Ireland, I put the two together quickly in QlikView. I decided to fix the time period to January 2010 to January 2016, as the last time the Irish government added an additional excise duty to fuel was in December 2009, so I knew that wouldn't interfere with the figures.

I plotted the data on a time series and, Aha!:

"Black and white!", I thought to myself. How obvious. While the crude price has been dropping like a stone, the retail price has had a much gentler descent. I had better get straight onto the press to reveal the petrol companies' evil intent towards the good people of Ireland.

But wait! There is a real problem here. The problem is that we have started both axes at zero - which is usually a sacrosanct rule. However, in this case, because we are not comparing the same value ranges, it is actually a mistake. By forcing both ranges into one area, I am actually distorting both of them.
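
To put rough numbers on the squeeze, here is a small Python sketch. The prices are hypothetical round figures, not the EIA or AA Ireland data, and `axis_share` is just a helper name I have made up. It measures how much of the axis height a series actually occupies when the axis is forced to start at zero, compared with when the axis is fitted to the data.

```python
def axis_share(values, axis_min, axis_max):
    """Fraction of the axis height covered by the series' own range."""
    return (max(values) - min(values)) / (axis_max - axis_min)


# Hypothetical prices, for illustration only
crude = [100.0, 80.0, 50.0, 30.0]   # $ per barrel
retail = [1.60, 1.50, 1.35, 1.20]   # EUR per litre

# Axes forced to start at zero
crude_forced = axis_share(crude, 0.0, max(crude))
retail_forced = axis_share(retail, 0.0, max(retail))

# Axes fitted to each series' own minimum and maximum
crude_fitted = axis_share(crude, min(crude), max(crude))
retail_fitted = axis_share(retail, min(retail), max(retail))

print(f"forced zero: crude {crude_forced:.0%}, retail {retail_forced:.0%}")
print(f"fitted:      crude {crude_fitted:.0%}, retail {retail_fitted:.0%}")
```

With these made-up figures, the crude series covers about 70% of a zero-based axis while the retail series covers only about 25%, so the same kind of movement looks far flatter for retail. Fitting each axis to its own data lets both series use the full height.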

In QlikView, the fix is simple: we just turn off the Force Zero option for both expressions, revealing a much different state of affairs:

The crude and retail prices have actually been varying in a very similar way over the period. If I calculate the Pearson's correlation coefficient for these two series, it comes out at approximately 0.77 - which is generally considered a high correlation for this type of data. In fact, if I drill into the last couple of years, the correlation is even tighter:

The correlation coefficient for the last 25 months of data calculates at approximately 0.95!
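
For anyone who wants to repeat the calculation outside QlikView, here is a minimal sketch of Pearson's correlation coefficient in plain Python. The two short lists are hypothetical placeholders, not the actual data; swap in the real monthly series to reproduce the figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly prices, for illustration only
crude = [100.0, 80.0, 50.0, 30.0]   # $ per barrel
retail = [1.60, 1.50, 1.35, 1.20]   # EUR per litre

r = pearson(crude, retail)
print(f"r = {r:.2f}")
```

The coefficient ignores the units and the scale of each series, which is why it can compare dollars per barrel with euro per litre.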

Any data scientists in the room might be tempted to normalize the data (calculating the z-scores) so that we can plot them on the same axis. When we do, we get a similar view to the one above:
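
The normalization itself can be sketched in a few lines of Python. This is an illustration only: the values are made up, the function name `z_scores` is my own, and I have assumed the population standard deviation rather than the sample one.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale a series to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined for a constant series")
    return [(v - mu) / sigma for v in values]


# Hypothetical prices, for illustration only
crude = [100.0, 80.0, 50.0, 30.0]   # $ per barrel
retail = [1.60, 1.50, 1.35, 1.20]   # EUR per litre

crude_z = z_scores(crude)
retail_z = z_scores(retail)

# Both series are now unitless and can share one axis
for c, r in zip(crude_z, retail_z):
    print(f"{c:6.2f} {r:6.2f}")
```

Because a z-score is only a shift and a rescale of each series, it changes where the lines sit on the axis but not the correlation between them.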

And here is an example in Qlik Sense Cloud:

So, perhaps the oil companies are playing a straight bat on this one. Many different components go into the retail price of a litre of fuel. The crude oil price is only one of them, but it is quite a significant one. If we can see a good correlation between the two, then we can have some confidence that all is operating fairly.

The main point here though is that it is quite easy in a lot of visualisation tools to accidentally tell the wrong story. You may have the best intentions, but you may end up telling visual lies.

Be careful out there!


Friday, 29 January 2016

CRISP-DM for Data Viz projects

Do you have a methodology for implementing Data Visualization projects?

How do you go about working with your stakeholders to deliver value?

The conception of CRISP-DM is 20 years old this year. It was conceived of as a process to formalize data mining (Cross Industry Standard Process for Data Mining) but if we have a look at its six phases in the diagram below (business understanding, data understanding, data preparation, modeling, evaluation and deployment), it really fits for data visualization too!

When do we not do all of these steps in a data visualization project? If you are not doing them, why not? I'm OK if you don't, as long as you know why you are not.

It is definitely worth a data visualization practitioner's while to review the documentation - much of it freely available online (start with the CRISP-DM 1.0 Step-by-step data mining guide).


Monday, 10 August 2015

Data Experience

Do people believe in the data more if they are holding something in their hand? Do they, literally, give it more weight? According to recent Dutch research, they just might.

My interest was piqued by the recent episode of the Data Stories podcast (interestingly, they are back to being sponsored by Qlik again) where they interviewed Dani Llugany Pearson from Domestic Data Streamers.

Domestic Data Streamers create some wonderful installations, transforming data into art. These are installations that people can interact with and influence by adding to the data. It is a really marvelous concept. People can see, walk around, touch, and engage with data. Dani described it as an Info Experience as opposed to a static info graphic.

During the discussion, some recent research from the Netherlands was mentioned and they kindly shared the link to the research in the show notes. This research, performed by Nils B Jostmann, Daniël Lakens and Thomas Schubert, shows that holding a heavier object affects cognition and leads people to assign more importance to what they are judging. It is an intriguing idea and you can read the research here:

    Weight as an embodiment of importance

So, if we attach more importance to more weight, do we see data visualizations on an iPad as being more important than the same data on an iPhone? Intriguing! How does that data seem on a desktop computer, where the only weight is in the mouse that we slide across the desk?

Weight, of course, is only one facet of our data experience. The visuals must be important too. Research shows that people eat less when they can't see their food; they just don't enjoy it as much:

    Vision and eating behavior. (2002. Linné Y, Barkeling B, Rössner S, Rooth P.)

How about being able to touch and interact with the data? How does that make me feel about it?

In their 2011 paper, David Spiegelhalter et al. discuss the ethical imperative to provide transparent information. This is because when we build dashboards for other people, "the desired outcome must be considered from the start" - we have to think about what we are trying to present before we design the dashboard. Too often we are persuading when we should be informing:

    Visualizing Uncertainty About the Future (2011. Spiegelhalter, Pearson, Short)

So, a visualization tool that I can hold in my hand and feel the importance of the data, that looks good enough to eat, and allows me to inform myself rather than being persuaded by someone else would be the ideal Data Experience. I wonder where I could get one of those?


Sunday, 26 July 2015

Crime and employment - Color as a trend

I was playing around with Qlik Cloud again. I grabbed some data from the Irish Central Statistics Office on employment statistics in Ireland and reported crime - both over a period from Q1 2003 up to, and including, Q1 2015.

It was easy enough to upload the data to the cloud and then load it into the application. A scatter chart seemed to be a good choice to see if there is any correlation between the two sets of data:

My first impression of the correlation is that there doesn't appear to be one. Perhaps there is a small negative correlation, with crime incidents falling as unemployment increases (which is not what I would have thought before looking at the data).

The second thing that I thought about was that I couldn't quickly see the trend of the data in this chart. Now, I know that a line chart would be more useful for seeing any trend, but I felt that, with each dot on the scatter representing a calendar quarter, it would be useful to discern the trend in this chart. I thought that it might be useful to use color to achieve this.

I had a field called QuarterNumber, which is a 5 digit number representing the four digit year and the quarter number from 1 to 4. I created this color expression:

    (QuarterNumber-Min(Total QuarterNumber))/
    (Max(Total QuarterNumber)-Min(Total QuarterNumber)),

This results in a chart in which it is very easy to see where in time a point exists. This, I believe, adds more clarity to the chart:
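
The scaling in the color expression above is a plain min-max normalization, which can be sketched in Python as follows. This is an illustration rather than the Qlik implementation, and the helper names are my own. It assumes that QuarterNumber is laid out as the year followed by the quarter digit (so Q4 2003 is 20034). Under that assumption the raw number jumps by 7 between Q4 of one year and Q1 of the next, so the sketch also shows a sequential quarter index that spaces the colors evenly.

```python
def min_max_scale(values):
    """Scale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def quarter_index(quarter_number):
    """Turn a YYYYQ number (assumed layout) into a running count of quarters."""
    year, quarter = divmod(quarter_number, 10)
    return year * 4 + (quarter - 1)


# Five consecutive quarters, Q1 2003 to Q1 2004
quarters = [20031, 20032, 20033, 20034, 20041]

raw_scale = min_max_scale(quarters)
even_scale = min_max_scale([quarter_index(q) for q in quarters])

print(raw_scale)
print(even_scale)
```

Over only five quarters the year boundary dominates the raw scale. Over the full 2003 to 2015 range the effect is much smaller, so the raw field may well be good enough for a quick look.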

As to the analysis, it appears that the negative correlation is true for many crimes, and that shows up in the overall figure. However, crimes that people are quite concerned about - theft, burglary and robbery - show a different correlation:

With those three types selected, there is a distinct positive correlation.

The good news for the Irish is that the highest quarter - Q4 of 2014 - had only 29,088 such incidents. This is a very low rate, and Ireland is actually one of the safest countries in the world to live in.


Thursday, 16 July 2015

Low down on Qlik Cloud 2.0

Qlik Cloud was launched along with Qlik Sense 1.0 at the end of last year. It allowed users of Qlik Sense Desktop to upload a document to the cloud and then share that application with other users. There were limitations. For example, the number of users that you could share with was limited to 3. Also, once they were up there, the applications could not be edited.

With the launch of Qlik Sense 2.0, we were promised new functionality, and this appears to have been made available in the last couple of days.

Now, as well as uploading applications that have been created from Qlik Sense Desktop, we can also upload files and create brand new applications directly on the service. Not only that, but we can make use of Qlik's new DataMarket to bring in curated data sources such as demographics, currencies and weather.

The number of users that we can share applications with has been increased to 5. But we have a brand new feature in that we can choose to share individual charts from these applications on our blogs and other media - like this:

Creating new applications is very straightforward. First, we need to provide some data (Excel, CSV, etc.):

We can now start to create new applications in our personal cloud:

When we create the application, we can choose to load in the data that we have uploaded:

And Qlik will parse it out for us:

Or we can choose to get data from the Data Market:

We can bring in multiple data sources and Qlik provides a profiler to suggest the correct data links.

Once we have loaded the data successfully, we can start creating content with the drag/drop interfaces:

When we have created a chart, there is a right-click (or tap-and-hold) option to share it. We can get a link that can be used to share via email, social media, etc., or an embed link to share via blogs and other web pages (as I have used above):

All very, very easy!

Part of the success of Tableau Software is that they have had Tableau Public, where users can create content and share it for free. Only time will tell whether the Qlik Cloud solution will challenge that, but not having to have an installed Windows application will certainly be interesting for many people.

So, that is the down-low on Qlik Cloud - don't keep it on the low down. Time for you to go and play and start creating data applications in the cloud!


Saturday, 11 July 2015

Invisible servants

The King woke in the morning and stretched. The sun streamed through the opened curtains.

He eased out of bed straight into his waiting slippers and dressing gown.

He made his way into his bathroom and eased himself into the bath - which was at just the right temperature for him.

After he dried off, using perfectly warmed towels, he made his way to his dressing room to don his pressed trousers and freshly ironed shirt.

He wandered down to the breakfast room to sit down in front of the exact breakfast that he wanted. Laid beside his breakfast plate were the day's letters. Of course, only the ones of interest to him were there; there was no junk.

A lovely story, but what is missing? Of course, it is the servants. The servant who opened the curtains at the right time. The servant who laid out the slippers and dressing gown. The servant who prepared the bath at exactly the right temperature.

What on earth has this got to do with data visualization???

It has everything to do with data visualization! Your data visualization needs to be the servants. Your users are the King. The servants should present the right data in the right way so that the King doesn't even realize that the servants are there!

Remember your data-ink ratio and let the data dominate.

Let your servants fade into the background and crown your users.
