Wednesday 29 April 2020

The most important form of work

If you haven't been following the new series of Qlik Virtual Meetups, they are worth tuning into, and there is probably one in your region. Because they are virtual, it doesn't matter whether they are in your region or not!

The recent edition of the Qlik Virtual Meetup Scotland featured an old friend, Treehive Strategy's Donald Farmer (it also featured the ever-excellent Michael Tarallo talking about Qlik Sense April 2020!). In an excellent presentation, Donald mentioned two particular quotes that resonated with me and with some of my own ideas around data visualisation and data literacy.

The first was from (the excellent!) Michael Lewis's book, Losers: The Road to Everyplace but the White House. The quote was:

"an explanation is where the mind comes to rest"

Donald uses the quote in the context of an analyst following the data to what they feel is a conclusion. He feels, and I agree, that it is not enough to say that you have just followed the data: once the mind comes to rest, you stop being critical. Donald also paraphrases Deming: "with data, you're just another person with data and an opinion". This is because we are all fundamentally human beings with many, many biases. Where our minds come to rest, we feel comfortable.

For me, an interesting example of this occurred just yesterday with a chart published in this article: Three charts that show where the coronavirus death rate is heading. The chart, which at first glance looks like a spiral mess, becomes clear and quite interesting with time. It is a great example of a chart that you need to spend some time with to appreciate, but the effort is rewarded.

I shared my opinion on Twitter, and some of the discussion there is interesting. However, most of the discussion stopped at "Ugh!" - people were not prepared to give the chart the benefit of the doubt and invest any time. "Ugh!" was their explanation. That is where their minds rested.

The second quote was from a Harvard Business Review article titled What's So New About the New Economy written by Alan M. Webber:

"In the new economy, conversations are the most important form of work."

Donald argued that data literacy is about communicating with data. It is not just about a few people understanding data; it is about raising the level across society.

Conversations are enormously important in raising all the boats. Some data visualizations are merely about getting to rest - getting users to a position where they have the explanation that they want. Others are designed to get users beyond that point. But it is hard to get a body at rest moving. Conversations can help us overcome that inertia. Conversations about data. Conversations over data.

Will you join the conversation?

As well as holding a Master's Degree in Data Analytics, Stephen Redmond is a practicing Data Professional with over 20 years of experience. He is the author of Mastering QlikView, QlikView Server and Publisher, and the QlikView for Developers Cookbook.

Friday 17 April 2020

How Many Segments? And other stories

Following my recent presentation at the first-ever Qlik Virtual Meetup Scotland (link opens the meeting recording), a few questions came up. I thought it would be useful to answer them in a blog post.

  • I like a pie chart, but how many segments before it becomes difficult to judge a value?
  • Would you not just group smaller segments into OTHER?
  • Pies / Bars: I think the long tail on a bar chart, with the Sense minichart, is very obvious, whereas lots of tiny pie wedges are not

Let's start with the "how many segments" question; hopefully, answering it will lead on to the others. First, consider this typical pie chart, produced in Qlik Sense:

A typical pie chart with many segments

Like the majority of visualization tools, Qlik Sense defaults to the sensible option of sorting the segments by size. This immediately allows us to see that the USA is bigger than Germany, which is bigger than Austria. I can roughly estimate the values of each (as shown in my research), and I can quickly tell that the three markets make up just under half of the whole.
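To make the sorting and the "just under half" reading concrete, here is a minimal Python sketch. The market values are hypothetical placeholders, not the data behind the chart; the code simply sorts segments largest first, as Qlik Sense does by default, and reports the share of the largest few.

```python
# Hypothetical market values (illustrative only; NOT the data behind the chart).
markets = {
    "Germany": 14.0, "Poland": 6.0, "USA": 22.0, "Austria": 11.0,
    "France": 10.0, "Belgium": 7.0, "Italy": 9.0, "Spain": 8.0,
    "Sweden": 5.0, "Denmark": 4.0, "Norway": 4.0,
}


def sort_segments(values: dict[str, float]) -> list[tuple[str, float]]:
    """Return (label, value) pairs largest first, as a default pie would order them."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)


def top_n_share(values: dict[str, float], n: int) -> float:
    """Fraction of the overall total contributed by the n largest segments."""
    ordered = sort_segments(values)
    total = sum(v for _, v in ordered)
    return sum(v for _, v in ordered[:n]) / total


if __name__ == "__main__":
    for label, value in sort_segments(markets):
        print(f"{label:<8} {value:5.1f}")
    print(f"Top 3 share of the whole: {top_n_share(markets, 3):.0%}")
```

With these made-up numbers, the top three segments come to 47% of the whole.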

The very simple answer to the question of "how many segments?" is: as many as make sense to meet the business requirements.

On the question of grouping smaller segments, I can modify the original chart as below, and still answer those same business questions:

Pie chart showing the top 3 segments versus all others

In this situation, I can still answer that business question, and I have reduced the number of segments on display. I would argue that I can answer the question equally well with either chart, although the first may deliver additional insights. That is the critical thing about either chart: that it answers the business question.
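The grouping itself can be sketched in a few lines of Python. This is a minimal illustration using a plain dictionary of hypothetical values; in Qlik Sense you would more likely use the dimension limitation settings with the "Show others" option than write code.

```python
# Hypothetical market values (illustrative only; NOT the data behind the charts).
markets = {
    "USA": 22.0, "Germany": 14.0, "Austria": 11.0, "France": 10.0,
    "Italy": 9.0, "Spain": 8.0, "Belgium": 7.0, "Poland": 6.0,
    "Sweden": 5.0, "Denmark": 4.0, "Norway": 4.0,
}


def group_others(values: dict[str, float], keep: int,
                 label: str = "Others") -> dict[str, float]:
    """Keep the `keep` largest segments and merge everything else into one segment."""
    ordered = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    grouped = dict(ordered[:keep])
    remainder = sum(v for _, v in ordered[keep:])
    if remainder > 0:
        grouped[label] = remainder
    return grouped


if __name__ == "__main__":
    grouped = group_others(markets, keep=3)
    total = sum(grouped.values())
    for name, value in grouped.items():
        print(f"{name:<8} {value / total:.0%}")
```

The overall total is unchanged by the grouping, which is what keeps the part-to-whole reading honest.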

What can sometimes be a problem, however, is that with interactive tools such as Qlik Sense, the user could drill down to just those 3 countries:

Pie chart after user has drilled to 3 countries 

Now the user is no longer able to see these countries' share of the whole market. Instead, they are looking at the part-to-whole of just these countries. This may be OK! It depends on the business question that the user wants to answer. If it is a problem, you can use Set Analysis in Qlik Sense to address it, similar to one of my previous pie posts.
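One way to keep the part-to-whole view after a selection is to compute each share against the unfiltered total and show the unselected remainder as its own segment. The Python sketch below illustrates that idea with hypothetical values; it is not Qlik syntax. In Qlik Sense, the equivalent would typically be a Set Analysis expression whose denominator ignores selections (for example, via the {1} set identifier), but the exact expression depends on your data model.

```python
# Hypothetical market values (illustrative only; NOT the data behind the charts).
markets = {
    "USA": 22.0, "Germany": 14.0, "Austria": 11.0, "France": 10.0,
    "Italy": 9.0, "Spain": 8.0, "Belgium": 7.0, "Poland": 6.0,
    "Sweden": 5.0, "Denmark": 4.0, "Norway": 4.0,
}


def shares_within_selection(values: dict[str, float],
                            selected: set[str]) -> dict[str, float]:
    """What a plain pie shows after drilling down: shares of the selection only."""
    subset = {k: v for k, v in values.items() if k in selected}
    total = sum(subset.values())
    return {k: v / total for k, v in subset.items()}


def shares_of_whole(values: dict[str, float], selected: set[str],
                    rest_label: str = "Not selected") -> dict[str, float]:
    """Shares of the FULL total for the selection, plus one remainder slice.

    The denominator ignores the selection, so the selected countries still
    appear as a fraction of the whole market rather than of each other.
    """
    total = sum(values.values())
    shares = {k: v / total for k, v in values.items() if k in selected}
    shares[rest_label] = sum(v for k, v in values.items() if k not in selected) / total
    return shares


if __name__ == "__main__":
    picked = {"USA", "Germany", "Austria"}
    print("Within selection:", shares_within_selection(markets, picked))
    print("Of the whole:    ", shares_of_whole(markets, picked))
```

With these numbers, the USA is about 47% of the drilled-down pie but 22% of the whole market; the second view keeps the original business question answerable.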

Let's consider some alternatives (alt-pie charts!), starting with the simple bar chart.

Typical bar chart showing a measure versus a categorical value

Again, we typically order these charts by value, so we can still quickly see that the US is larger than Germany and both are larger than Austria. If our purpose is to compare one country against others, then this is the perfect chart to use. Even so, it is not really that easy to compare the US with Poland or even Belgium, although interactivity can help with this. It is, however, quite a lot more difficult to see how much of the total market is made up by the top 3 countries, and that would be harder still if there were a longer tail of smaller values that you might have to scroll through. That is why I prefer the pie chart if the business question is a part-to-whole one. We can, of course, clump the other countries into "Others" in the bar chart, but it is still not easy to see the part-to-whole:

Bar chart with "Others" bar

The choice for part-to-whole that anti-pie advocates recommend is the horizontal bar:

Examples of horizontal bars as an alternative to pie charts

As my research has shown, the horizontal bar does not always work as well as a pie chart. It is a valid option, and one that you could consider, but it should definitely not be the default.

To summarize, if you are asking the question "how many segments", then you may be asking the wrong question. Always remember Redmond's Rules:

  • Use the right visual encodings (and pie charts are a valid choice!)
  • Add labels and annotations to provide context to the user
  • SFW! Make sure that you are answering the business question

The last rule can be hard, because sometimes you don't actually know the questions that the users want to answer! In those circumstances, following a Design Thinking methodology will probably get you where you want to be.


Wednesday 1 April 2020

Exponential data and logarithmic scales

I have to admit that when I first saw some of the recent data visualisations from the likes of the Financial Times and the New York Times, I wasn't an immediate fan. That is because they were using a logarithmic scale, which distorts the data. My feeling was that they should be using a population-based metric to compare different territories (XX per 100,000 is common).
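For reference, the per-100,000 normalisation is simple arithmetic: divide the count by the population and multiply by 100,000. A minimal Python sketch, with made-up territories and figures purely for illustration:

```python
def rate_per_100k(count: int, population: int) -> float:
    """Return count per 100,000 people.

    Multiplying before dividing keeps round inputs exact in floating point.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Hypothetical territories: (cases, population). Illustrative numbers only.
territories = {
    "Territory A": (5_000, 10_000_000),
    "Territory B": (1_200, 800_000),
}

if __name__ == "__main__":
    for name, (cases, pop) in territories.items():
        print(f"{name}: {rate_per_100k(cases, pop):.1f} per 100,000")
```

Here Territory B has far fewer cases than Territory A but three times the rate, which is the kind of difference that raw counts hide.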

Comparison of exponential data shown on a normal scale and on a logarithmic scale
A regular scale has regular increments on the "Y" axis, so if one point is twice as high as another, you can tell that it is twice the value. On a logarithmic scale, the values grow exponentially: each equal step up the axis multiplies the value by the same factor. This is generally log 2, so the value doubles with each equal step (though grid lines are usually drawn at powers of 10). If a point is twice as high as another, measured from a baseline of 1, the higher value is the lower value squared (e.g. 4 -> 16, 8 -> 64). This can be difficult for people to interpret, especially if they are not mathematically minded. In fact, I would suggest that it is almost impossible for most users to quickly judge the actual difference in magnitude between different points - merely that one point is greater or lesser than another.
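The arithmetic behind these statements can be checked directly. A minimal Python sketch, assuming a base-2 log axis whose baseline (height 0) sits at a value of 1:

```python
import math


def log2_height(value: float) -> float:
    """Height on a base-2 log axis whose baseline (height 0) sits at a value of 1."""
    return math.log2(value)


# Equal steps up the axis: each doubling of the value adds exactly one unit of height.
heights = [log2_height(v) for v in (1, 2, 4, 8, 16)]
gaps = [b - a for a, b in zip(heights, heights[1:])]

# Twice as high (measured from 1) means the value is squared: 4 -> 16, 8 -> 64.
ratios = {low: log2_height(low ** 2) / log2_height(low) for low in (4, 8)}

if __name__ == "__main__":
    print("height gaps between doublings:", gaps)
    for low, ratio in ratios.items():
        print(f"{low} -> {low ** 2}: {ratio:.1f}x the height")
```

Changing the base only rescales the axis, so the squaring relationship holds for any base, but only when heights are measured from 1.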

There is one general situation where a log scale is useful, and that is where there is some skew in the data: for example, a mix of a few very high values and many lower ones, as with exponentially growing data. In that situation, the scale needed for the higher values can obscure the lower values.
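To see how high values crowd out low ones, compare how much of the axis height each value reaches on each scale. This Python sketch uses hypothetical values from 10 to 30,000 and assumes a linear axis running from 0 to the maximum and a log axis running from 1 to the maximum:

```python
import math

# Hypothetical skewed values: one large series among several small ones.
values = [10, 50, 200, 1_000, 30_000]
top = max(values)


def linear_fraction(v: float) -> float:
    """Share of a 0-to-max linear axis that the value reaches."""
    return v / top


def log_fraction(v: float) -> float:
    """Share of a 1-to-max logarithmic axis that the value reaches."""
    return math.log10(v) / math.log10(top)


if __name__ == "__main__":
    for v in values:
        print(f"{v:>6}: linear {linear_fraction(v):6.1%}   log {log_fraction(v):6.1%}")
```

On the linear axis, everything up to 1,000 is squeezed into the bottom few percent of the chart; on the log axis, each value gets a usable share of the height.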

Growth in ten US states shown on a normal scale. The higher values in one state hide detail in the other states. The dashed grey lines show example exponential growth patterns.
As an example, consider the chart above, which shows growth patterns in several US states. All have an exponential-type growth, but the higher values in New York make it difficult to see the direction of the smaller values in detail. The scale needs to accommodate the high New York values, but most of the "action" in this chart is at the smaller values.

Comparison of ten US states using a logarithmic scale. The trajectory lines are straightened and it is easier to see the trajectory of the states with lower values.
When the same data is presented on a logarithmic chart, all of the lines are straightened (exponential growth at a steady rate becomes a straight line on a log scale), and we get a much better view of the trajectory of each state.
I can now clearly see that Michigan's trajectory appears to be heading in a slightly worse direction than New York's. I am not concerning myself with how much farther ahead on the trajectory New York is, only with the direction in which they are both travelling, and hence I can make mental forecasts about Michigan's future.
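The straightening happens because exponential growth, y = a·b^t, becomes linear in t after taking logs: log2(y) = log2(a) + t·log2(b). The slope of that line is the growth rate, which is what the eye reads as "trajectory", and its reciprocal is the doubling time. A minimal Python sketch, fitting a least-squares line to log2 of a hypothetical series that doubles every 3 days (no real state data is used):

```python
import math


def doubling_time(days: list[float], counts: list[float]) -> float:
    """Estimate doubling time by fitting a straight line to log2(counts).

    On a log2 scale, exponential growth is a straight line; its slope is the
    number of doublings per day, so 1 / slope is the number of days per doubling.
    """
    logs = [math.log2(c) for c in counts]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx  # doublings per day
    if slope <= 0:
        raise ValueError("series is not growing, so it has no doubling time")
    return 1.0 / slope


# Hypothetical series: 100 cases on day 0, doubling every 3 days.
days = list(range(15))
counts = [100 * 2 ** (d / 3) for d in days]

if __name__ == "__main__":
    print(f"Estimated doubling time: {doubling_time(days, counts):.2f} days")
```

Two series with the same slope are on the same trajectory even if one is far ahead; a steeper slope (a shorter doubling time) is the "worse direction" described above.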

Bar chart with a logarithmic scale - don't do this, kids! The log scale removes the comparative power of the bar chart.
BTW, I am good with using log scales like this for lines, but don't do it for bar charts! A logarithmic scale removes the power that the bar chart has to aid our understanding of differences in magnitude. Those differences are encoded by the length of the bar, and a log scale distorts it. Don't do it!
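A quick calculation shows the distortion. On a log axis, a bar's length depends on where the axis starts (a log scale has no zero), and the ratio between bars no longer matches the ratio between values. A minimal Python sketch with two hypothetical values:

```python
import math


def bar_length(value: float, baseline: float, log_scale: bool) -> float:
    """Length of a bar drawn from `baseline` up to `value` on the chosen scale."""
    if log_scale:
        return math.log10(value) - math.log10(baseline)
    return value - baseline


small, large = 10, 1_000  # the large value is 100 times the small one

linear_ratio = bar_length(large, 0, False) / bar_length(small, 0, False)
log_ratio_from_1 = bar_length(large, 1, True) / bar_length(small, 1, True)
log_ratio_from_0_1 = bar_length(large, 0.1, True) / bar_length(small, 0.1, True)

if __name__ == "__main__":
    print(f"linear axis from 0:  {linear_ratio:.0f}x")
    print(f"log axis from 1:     {log_ratio_from_1:.1f}x")
    print(f"log axis from 0.1:   {log_ratio_from_0_1:.1f}x")
```

The same pair of values reads as 100x, 3x or 2x depending only on the axis, which is why length-encoded charts need a linear scale starting at zero.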

A Qlik Luminary with a Master's Degree in Data Analytics, Stephen Redmond is a practicing Data Professional with over 20 years of experience. He is the author of Mastering QlikView, QlikView Server and Publisher, and the QlikView for Developers Cookbook.