Tuesday 2 February 2016

How to lie with charts - crude oil versus retail gasoline prices

After watching a news item this morning, I posted the following question to social media:

If oil has dropped from > $100 / barrel to < $30, why are consumers still paying > €1 / litre?

There were some interesting responses. There was in my mind a suspicion that the retail prices were not coming down as quickly as the crude prices - but I had nothing to back that up with. I decided to investigate.

Taking crude oil prices from US Energy Information Administration and monthly retail price data from AA Ireland, I put the two together quickly in QlikView. I decided to fix the time period to January 2010 to January 2016, as the last time the Irish government added an additional excise duty to fuel was in December 2009, so I knew that wouldn't interfere with the figures.

I plotted the data on a time series and, Aha!:

"Black and white!", I thought to myself. How obvious. While the crude price has been dropping like a stone, the retail price has had a much gentler decent. I better get straight onto the press to reveal the petrol companies evil intent towards the good people of Ireland.

But wait! There is a real problem here. The problem is that we have started both axes at zero - which is usually a sacrosanct rule. However, in this case, because we are not comparing the same value ranges, it is actually a mistake. By forcing both ranges into one area, I am actually distorting both of them.

In QlikView, the fix is simple, we just take off the force zero option for both expressions, revealing a much different state of affairs:

The crude and retail prices have actually been varying in a very similar way over the period. If I calculate the Pearson's correlation coefficient for these two series, it comes out at aproximately .77 - which is generally considered a high correlation for this type of data. In fact, if I drill into the last couple of years, the correlation is even tighter:

The correlation coefficient for the last 25 months data calculates at approximately .95!

Any data scientists in the room might be tempted to normalize the data (calculating the z-scores) so that we can plot them on the same axis. When we do, we get a similar view to the one above:

And here is an example in Qlik Sense Cloud:

So, perhaps the oil companies are playing a straight bat on this one. There are many different variations into what goes into the retail price of a litre of fuel. The crude oil price is one of those, but quite significant. If we can see a good correlation between the two, then we can have some sense of confidence that all is operating fairly.

The main point here though is that it is quite easy in a lot of visualisation tools to accidentally tell the wrong story. You may have best intentions, but you may end up telling visual lies.

Be careful out there!

Stephen Redmond is a Data Visualization professional. He is author of Mastering QlikView, QlikView Server and Publisher and the QlikView for Developer's Cookbook
Follow me on Twitter   LinkedIn