Tuesday, 2 February 2016

How to lie with charts - crude oil versus retail gasoline prices

After watching a news item this morning, I posted the following question to social media:

If oil has dropped from > $100 / barrel to < $30, why are consumers still paying > €1 / litre?

There were some interesting responses. There was in my mind a suspicion that the retail prices were not coming down as quickly as the crude prices - but I had nothing to back that up with. I decided to investigate.

Taking crude oil prices from the US Energy Information Administration and monthly retail price data from AA Ireland, I put the two together quickly in QlikView. I decided to fix the time period to January 2010 to January 2016, as the last time the Irish government added an additional excise duty to fuel was in December 2009, so I knew that wouldn't interfere with the figures.

I plotted the data on a time series and, Aha!:


"Black and white!", I thought to myself. How obvious. While the crude price has been dropping like a stone, the retail price has had a much gentler decent. I better get straight onto the press to reveal the petrol companies evil intent towards the good people of Ireland.

But wait! There is a real problem here. The problem is that we have started both axes at zero - which is usually a sacrosanct rule. However, in this case, because we are not comparing the same value ranges, it is actually a mistake. By forcing both ranges into one area, I am actually distorting both of them.
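One way to see the effect is to work out how much of the chart's vertical space each series actually uses when the axis is forced to start at zero. The sketch below is in Python rather than QlikView, and the price ranges are round, made-up figures for illustration only, not the EIA or AA Ireland data.

```python
def share_of_axis_used(low, high, axis_min):
    """Fraction of the axis height (axis_min..high) spanned by the series' own range."""
    return (high - low) / (high - axis_min)


# Hypothetical ranges for illustration only (not the real data).
SERIES = [
    ("crude ($/barrel)", 30.0, 110.0),
    ("retail (EUR/litre)", 1.20, 1.70),
]

for name, low, high in SERIES:
    forced = share_of_axis_used(low, high, axis_min=0.0)   # axis starts at zero
    fitted = share_of_axis_used(low, high, axis_min=low)   # axis starts at the series minimum
    print(f"{name}: {forced:.0%} of the axis with zero forced, {fitted:.0%} without")
```

With these assumed numbers the crude line uses roughly three quarters of the vertical space while the retail line uses under a third, so the retail line looks flatter largely because most of its axis is empty space below the lowest price.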

In QlikView, the fix is simple: we just turn off the force zero option for both expressions, revealing a much different state of affairs:


The crude and retail prices have actually been varying in a very similar way over the period. If I calculate Pearson's correlation coefficient for these two series, it comes out at approximately 0.77 - which is generally considered a high correlation for this type of data. In fact, if I drill into the last couple of years, the correlation is even tighter:


The correlation coefficient for the last 25 months of data calculates at approximately 0.95!
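For anyone who wants to check a figure like this outside QlikView, here is a minimal Python sketch of Pearson's correlation coefficient. The two short lists are invented monthly values used only to exercise the function; they are not the series behind the 0.77 and 0.95 figures above.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly values, for illustration only.
crude = [105.0, 98.0, 80.0, 60.0, 48.0, 31.0]    # $ per barrel
retail = [1.55, 1.52, 1.45, 1.38, 1.30, 1.25]    # EUR per litre

print(round(pearson_r(crude, retail), 3))
```

Note that the coefficient is unaffected by where either axis starts, which is why it makes a useful check on what the chart appears to show.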

Any data scientists in the room might be tempted to normalize the data (calculating the z-scores) so that we can plot them on the same axis. When we do, we get a similar view to the one above:
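As a rough sketch of that normalization, the Python below converts each series to z-scores (subtract the mean, divide by the standard deviation) so that both end up on a common, unitless scale. The values are again hypothetical, and the choice of the population standard deviation is my assumption; the sample version would work equally well for plotting.

```python
import statistics


def z_scores(values):
    """Return each value as its distance from the mean, in standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (an assumption)
    if sd == 0:
        raise ValueError("z-scores are undefined for a constant series")
    return [(v - mean) / sd for v in values]


# Hypothetical monthly values, for illustration only.
crude = [105.0, 98.0, 80.0, 60.0, 48.0, 31.0]    # $ per barrel
retail = [1.55, 1.52, 1.45, 1.38, 1.30, 1.25]    # EUR per litre

for c, r in zip(z_scores(crude), z_scores(retail)):
    print(f"{c:6.2f} {r:6.2f}")
```

Each normalized series has a mean of 0 and a standard deviation of 1, so the two can share one axis without either being squashed.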


And here is an example in Qlik Sense Cloud:


So, perhaps the oil companies are playing a straight bat on this one. Many different components go into the retail price of a litre of fuel. The crude oil price is only one of those, but quite a significant one. If we can see a good correlation between the two, then we can have some confidence that all is operating fairly.

The main point here, though, is that it is quite easy in a lot of visualisation tools to accidentally tell the wrong story. You may have the best intentions, but you may end up telling visual lies.

Be careful out there!


Stephen Redmond is a Data Visualization professional. He is author of Mastering QlikView, QlikView Server and Publisher and the QlikView for Developer's Cookbook

Friday, 29 January 2016

CRISP-DM for Data Viz projects

Do you have a methodology for implementing Data Visualization projects?

How do you go about working with your stakeholders to deliver value?

CRISP-DM is 20 years old this year. It was conceived as a process to formalize data mining (Cross Industry Standard Process for Data Mining) but, if we have a look at the diagram below, it really fits data visualization too!


When do we not do all of these steps in a data visualization project? If you are not doing them, why not? I'm OK if you don't, as long as you know why you are not.

It is definitely worth a data visualization practitioner's while to review the documentation - much of it freely available online (start with CRISP-DM 1.0 Step-by-step data mining guide).



Monday, 10 August 2015

Data Experience

Do people believe in the data more if they are holding something in their hand? Do they, literally, give it more weight? According to recent Dutch research, they just might.


My interest was piqued by the recent episode of the Data Stories podcast (interestingly, they are back to being sponsored by Qlik again) where they interviewed Dani Llugany Pearson from Domestic Data Streamers.

Domestic Data Streamers create some wonderful installations, transforming data into art. These are installations that people can interact with and influence by adding to the data. It is a really marvelous concept. People can see, walk around, touch, and engage with data. Dani described it as an Info Experience as opposed to a static info graphic.

During the discussion, some recent research from the Netherlands was mentioned and they kindly shared the link to the research in the show notes. This research, performed by Nils B Jostmann, Daniël Lakens and Thomas Schubert, shows that holding a heavier object affects cognition and leads people to assign more importance to what they are judging. It is an intriguing idea and you can read the research here:

    Weight as an embodiment of importance

So, if we attach more importance to more weight, do we see data visualizations on an iPad as being more important than the same data on an iPhone? Intriguing! How does that data seem on a desktop computer, where the only weight is in the mouse that we slide across the desk?

Weight, of course, is only one facet of our data experience. The visuals must be important too, just as research shows that people eat less when they can't see their food - they just don't enjoy it as much:

    Vision and eating behavior. (2002. Linné Y, Barkeling B, Rössner S, Rooth P.)

How about being able to touch and interact with the data? How does that make me feel about it?

In their 2011 paper, David Spiegelhalter et al. discuss the ethical imperative to provide transparent information. This matters because when we build dashboards for other people, "the desired outcome must be considered from the start" - we have to think about what we are trying to present before we design the dashboard. Too often we are persuading where we should be informing:

    Visualizing Uncertainty About the Future (2011. Spiegelhalter, Pearson, Short)

So, a visualization tool that I can hold in my hand and feel the importance of the data, that looks good enough to eat, and allows me to inform myself rather than being persuaded by someone else would be the ideal Data Experience. I wonder where I could get one of those?



Sunday, 26 July 2015

Crime and employment - Color as a trend

I was playing around with Qlik Cloud again. I grabbed some data from the Irish Central Statistics Office on employment statistics in Ireland and reported crime - both over a period from Q1 2003 up to, and including, Q1 2015.

It was easy enough to upload the data to the cloud and then load it into the application. A scatter chart seemed to be a good choice to see if there is any correlation between the two sets of data:


My first impression is that there doesn't appear to be a correlation. Perhaps there is a small negative one, with crime incidents falling as unemployment increases (which is not what I would have thought before looking at the data).

The second thing that I thought about was that I couldn't quickly see the trend of the data in this chart. Now, I know that a line chart would be more useful for seeing any trend, but I felt that, with each dot on the scatter representing a calendar quarter, it would be useful to discern the trend in this chart. I thought that it might be useful to use color to achieve this.

I had a field called QuarterNumber, which is a five-digit number made up of the four-digit year and the quarter number from 1 to 4. I created this color expression:

  ColorMix1(
    (QuarterNumber-Min(Total QuarterNumber))/
    (Max(Total QuarterNumber)-Min(Total QuarterNumber)),
    LightGray(),
    LightBlue()
  )

This results in a chart in which it is very easy to see where in time a point exists. This, I believe, adds more clarity to the chart:
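For readers who don't use Qlik, the idea behind the expression above can be sketched in Python: scale each QuarterNumber to a value between 0 and 1, then blend between two colors by that amount. This is only an illustration of the technique. The RGB triples are placeholders rather than Qlik's actual palette values, the straight linear blend is my assumption about how the gradient behaves, and the sample values assume QuarterNumber is encoded as year x 10 + quarter.

```python
def normalise(value, lowest, highest):
    """Scale value into the range 0..1, as the first argument of the expression does."""
    return (value - lowest) / (highest - lowest)


def color_mix(t, color_zero, color_one):
    """Linearly blend two RGB triples; t=0 gives color_zero, t=1 gives color_one."""
    t = min(1.0, max(0.0, t))  # clamp so out-of-range inputs still give a valid color
    return tuple(round(a + (b - a) * t) for a, b in zip(color_zero, color_one))


# Placeholder colors (assumed values, not taken from Qlik).
LIGHT_GRAY = (192, 192, 192)
LIGHT_BLUE = (0, 0, 255)

# Hypothetical QuarterNumber values: year x 10 + quarter.
quarter_numbers = [20031, 20032, 20033, 20034, 20041]
first, last = min(quarter_numbers), max(quarter_numbers)

for q in quarter_numbers:
    print(q, color_mix(normalise(q, first, last), LIGHT_GRAY, LIGHT_BLUE))
```

One thing this makes visible: under that encoding the number jumps from 20034 to 20041 at each year end, so the shading steps are not perfectly even. A sequential quarter index would give a smoother gradient, although the effect is small over a twelve-year span.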



As to the analysis, it appears that the negative correlation is true for many crimes, and that shows up in the overall figure. However, crimes that people are quite concerned about - theft, burglary and robbery - show a different correlation:


With those three types selected, there is a distinct positive correlation.

The good news for the Irish is that the highest quarter - Q4 of 2014 - saw only 29,088 such incidents. Relative to the population, this is a very low rate, and Ireland is actually one of the safest countries in the world to live in.



Thursday, 16 July 2015

Low down on Qlik Cloud 2.0

Qlik Cloud was launched along with Qlik Sense 1.0 at the end of last year. It allowed users of Qlik Sense Desktop to upload a document to the cloud and then share that application with other users. There were limitations. For example, the number of users that you could share with was limited to 3. Also, once they were up there, the applications could not be edited.

With the launch of Qlik Sense 2.0, we were promised new functionality, and this appears to have been made available in the last couple of days.

Now, as well as uploading applications that have been created from Qlik Sense Desktop, we can also upload files and create brand new applications directly on the service. Not only that, but we can make use of Qlik's new DataMarket to bring in curated data sources such as demographics, currencies and weather.

The number of users that we can share applications with has been increased to 5. But we have a brand new feature in that we can choose to share individual charts from these applications on our blogs and other media - like this:



Creating new applications is very straightforward. First, we need to provide some data (Excel, CSV, etc.):


We can now start to create new applications in our personal cloud:


When we create the application, we can choose to load in the data that we have uploaded:


And Qlik will parse it out for us:


Or we can choose to get data from the Data Market:



We can bring in multiple data sources and Qlik provides a profiler to suggest the correct data links.

Once we have loaded the data successfully, we can start creating content with the drag/drop interfaces:



When we have created a chart, there is a right-click (or tap-and-hold) option to share it. We can get a link that can be used to share via email, social media, etc., or an embed link to share via blogs and other web pages (as I have used above):


All very, very easy!

Part of the success of Tableau Software is that they have had Tableau Public, where users can create content and share it for free. Only time will tell whether the Qlik Cloud solution will challenge that, but not needing an installed Windows application will certainly be interesting for many people.

So, that is the lowdown on Qlik Cloud - don't keep it on the down-low. Time for you to go and play and start creating data applications in the cloud!



Saturday, 11 July 2015

Invisible servants

The King woke in the morning and stretched. The sun streamed through the opened curtains.

He eased out of bed straight into his waiting slippers and dressing gown.

He made his way into his bathroom and eased himself into the bath - which was at just the right temperature for him.

After he dried off, using perfectly warmed towels, he made his way to his dressing room to don his pressed trousers and freshly ironed shirt.

He wandered down to the breakfast room to sit down in front of the exact breakfast that he wanted. Laid beside his breakfast plate were the day's letters. Of course, only the ones of interest to him were there; there was no junk.



A lovely story, but what is missing? Of course, it is the servants. The servant who opened the curtains at the right time. The servant who laid out the slippers and dressing gown. The servant who prepared the bath at exactly the right temperature.

What on earth has this got to do with data visualization???

It has everything to do with data visualization! Your data visualization needs to be the servants. Your users are the King. The servants should present the right data in the right way so that the King doesn't even realize that the servants are there!

Remember your data-ink ratio and let the data dominate.

Let your servants fade into the background and crown your users.



Wednesday, 10 June 2015

Are you answering the right question?

I bring before you the story of the statistician Abraham Wald.

During World War II, Wald was part of a team looking at the problem of bomber losses and considering how the planes should be reinforced to better protect them. The proposal was to look at the frequency of damage sustained by returning bombers and to use that information to recommend where the planes should be reinforced.

Wald's brilliant insight was to turn the problem on its head. He suggested that the places where the returning planes were being damaged most frequently were the places where those planes could actually sustain damage and, mostly, still return to base. The real question was where the planes that weren't returning had been hit - the damage that meant they failed to return!
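The effect is easy to reproduce with a toy simulation. The Python sketch below is purely illustrative: the list of sections, the loss probabilities and the one-hit-per-mission rule are all invented for the example and have nothing to do with Wald's actual data or method.

```python
import random

SECTIONS = ["engine", "cockpit", "fuselage", "wings", "tail"]

# Hypothetical chance that a single hit in each section brings the plane down.
LOSS_PROBABILITY = {
    "engine": 0.6,
    "cockpit": 0.5,
    "fuselage": 0.05,
    "wings": 0.1,
    "tail": 0.1,
}


def simulate(missions=10_000, seed=1):
    """Give every plane one hit in a random section; count hits overall and on returners."""
    rng = random.Random(seed)
    all_hits = {s: 0 for s in SECTIONS}
    returned_hits = {s: 0 for s in SECTIONS}
    for _ in range(missions):
        section = rng.choice(SECTIONS)  # every section is equally likely to be hit
        all_hits[section] += 1
        if rng.random() >= LOSS_PROBABILITY[section]:
            returned_hits[section] += 1  # the plane survived, so its damage gets observed
    return all_hits, returned_hits


all_hits, returned_hits = simulate()
for s in SECTIONS:
    print(f"{s:9} hit {all_hits[s]:5} times, seen on returning planes {returned_hits[s]:5} times")
```

Although every section is hit about equally often, the returning planes show far fewer engine and cockpit hits than fuselage hits. Reinforcing where the visible damage is would protect exactly the wrong places.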

It seems so simple when you think about it. But sometimes we are so sure that we are looking at a problem the right way that a new insight telling us we are looking at it completely wrong is not always well received. We should receive it, look at it, and dismiss it only if we can logically decide that it should be dismissed.

So, think outside the box and solve this problem:

Here is a pattern of 9 dots, arranged in a 3x3 grid:




Now, I want you to connect the pattern of 9 dots using four straight lines drawn without lifting the pen from the paper or retracing any lines. Simple, eh?

Please don't post solutions below. If you discover it, just be happy that you have done so and feel good that you have thought differently and bring that skill to your daily work.

