Wednesday, 1 April 2020

Exponential data and logarithmic scales


I have to admit that when I first saw some of the recent data visualisations from the likes of the Financial Times and the New York Times, I wasn't an immediate fan. That is because they were using a logarithmic scale, which distorts the data. My feeling was that they should be using a population-based metric to compare different territories (XX per 100,000 is common).

Comparison of exponential data shown on a normal scale and on a logarithmic scale
A regular scale has regular increments on the "Y" axis, so if one point is twice as high as another, you can tell that it is twice the value. A logarithmic scale grows exponentially: each equal step up the axis multiplies the value by a constant factor. With log 2 that means doubling at each step, though the presentation usually places grid lines at powers of 10. If a point is twice as high as another (measured from an axis that starts at 1), the higher value is the lower value squared (e.g. 4 -> 16, 8 -> 64). That can be difficult for people to interpret, especially if they are not mathematical. In fact, I would suggest that it is almost impossible for most users to quickly tell the accurate difference in magnitude between different points - merely that one point is greater or lesser than another.
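To make the "twice as high means squared" point concrete, here is a minimal sketch in Python. The function names are my own for illustration, and it assumes a base 2 axis whose bottom sits at 1; with a different axis minimum the squaring rule no longer holds exactly.

```python
import math

def height_on_log_axis(value, base=2.0, axis_min=1.0):
    """Height of a value above the bottom of a log axis, in 'base' steps."""
    return math.log(value / axis_min, base)

def value_at_height(height, base=2.0, axis_min=1.0):
    """Inverse of height_on_log_axis: the value plotted at a given height."""
    return axis_min * base ** height

# Find where 4 and 8 sit on the axis, then read off the value twice as high.
for value in (4, 8):
    height = height_on_log_axis(value)
    twice_as_high = value_at_height(2 * height)
    print(f"{value} sits at height {height:.2f}; "
          f"twice that height is {twice_as_high:.0f}")
```

On a regular scale the same doubling of height would simply give 8 and 16, which is why the log version is so easy to misread.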

There is a general situation where it is useful to use a log scale, and that is where there is some skew in the data. For example, where there is a mix of some very high and many lower values - such as with exponentially growing data. In that situation, the scale of the higher values can obscure the lower values.

Growth in ten US states shown on a normal scale. The higher value in one state hides detail in the other states. The dashed grey lines show example exponential growth patterns.
As an example, consider the chart above, which shows growth patterns in several US states. All have an exponential type of growth, but the higher values in New York make it difficult to see the direction of the smaller values in any detail. The scale needs to accommodate the high New York values, but most of the "action" in this chart is at the smaller values.

Comparison of ten US states using a logarithmic scale. The trajectory lines are straightened and it is easier to see the trajectory of the states with lower values.
When the same data is presented on a logarithmic chart, all of the lines are straightened and we get a much better view of the trajectory of each state.
I can now clearly see that Michigan's trajectory appears to be heading in a slightly worse direction than New York's. I am not concerning myself with how much farther ahead on the trajectory New York is, only the direction that they are both travelling and hence making mental forecasts about Michigan's future.
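The straightening happens because the logarithm of start × growth^day is log(start) + day × log(growth), which is a straight line in the day number. A rough sketch with made-up numbers (these are synthetic series, not the state data in the charts, and the function names are hypothetical):

```python
import math

def exponential_series(start, daily_growth, days):
    """Synthetic series that multiplies by daily_growth each day."""
    return [start * daily_growth ** day for day in range(days)]

def log_steps(series):
    """Day-to-day change in log10(value): the slope on a log chart."""
    logs = [math.log10(v) for v in series]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]

# A large series growing 20% a day and a small one growing 30% a day.
large_slow = exponential_series(1000, 1.2, 10)
small_fast = exponential_series(10, 1.3, 10)

steps_large = log_steps(large_slow)
steps_small = log_steps(small_fast)

# Each series has a constant step (a straight line on the log chart),
# and the smaller but faster-growing series has the steeper line.
print(f"large/slow step per day: {steps_large[0]:.4f}")
print(f"small/fast step per day: {steps_small[0]:.4f}")
```

On a normal scale the smaller series would be squashed against the bottom of the chart; on the log scale its steeper slope is the first thing you see.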

Bar chart with a logarithmic scale - don't do this kids! The log scale removes the comparative power of the bar chart.
BTW, I am good with using log scales like this for lines, but don't do it for bar charts! The effect of the logarithmic scale is to remove the power that the bar chart has of aiding our understanding of the difference in magnitudes. These differences are encoded by the length of the bar, and a log scale will distort that length. Don't do it!
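A quick way to see the distortion is to compare the ratio of two values with the ratio of their bar lengths on a log axis. This is only a sketch with invented values (10 and 1000), a base 10 axis, and a hypothetical helper name; note that the apparent ratio even changes with where the axis happens to start.

```python
import math

def bar_length_on_log_axis(value, axis_min=1.0):
    """Length of a bar drawn from axis_min up to value on a log10 axis."""
    return math.log10(value / axis_min)

small, large = 10, 1000

# What a bar chart is supposed to show: the large value is 100 times the small.
true_ratio = large / small

# What the bar lengths show when the log axis starts at 1.
ratio_axis_from_1 = (bar_length_on_log_axis(large)
                     / bar_length_on_log_axis(small))

# The same bars when the log axis starts at 0.1 instead.
ratio_axis_from_tenth = (bar_length_on_log_axis(large, axis_min=0.1)
                         / bar_length_on_log_axis(small, axis_min=0.1))

print(f"true ratio of values:            {true_ratio:.0f}x")
print(f"bar length ratio, axis from 1:   {ratio_axis_from_1:.1f}x")
print(f"bar length ratio, axis from 0.1: {ratio_axis_from_tenth:.1f}x")
```

A hundredfold difference ends up looking like a bar roughly three times as long, or only twice as long if the axis starts lower, so the length no longer encodes the magnitude.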


Qlik Luminary, Master's Degree in Data Analytics, Stephen Redmond is a practicing Data Professional with over 20 years' experience. He is author of Mastering QlikView, QlikView Server and Publisher and the QlikView for Developer's Cookbook.
