Last year, I attended Performance Measure Blueprint training with Stacey Barr. One of Stacey's recommended visualizations was the statistical control chart. Stacey also recommended "Understanding Variation: The Key to Managing Chaos" by Donald J. Wheeler in this regard.
Essentially, the control chart introduces some solid statistical thinking into the analysis of data. Quite often, people have a bipolar view of variation in data: things are up on last month, so everything is good; things are down on last month, so everything is bad and we need to start changing the process to make sure that things are up next month! The control chart lets us see that these up/down variations are part of statistically expected change and, unless they fall outside the control lines or sit repeatedly above or below the mean, they are not a signal that anything in the process has really changed.
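To make the two signals concrete, here is a minimal Python sketch of one common way to draw the control lines, the XmR (individuals) chart that Wheeler's book describes: the centre line is the mean of a baseline, and the limits sit 2.66 times the average moving range either side of it. The function names, the sample numbers and the default run length of 8 are assumptions for illustration only; this is not the calculation from the QlikView document.

```python
from statistics import mean


def xmr_limits(baseline):
    """Return (centre, lower, upper) limits for an XmR chart from baseline values."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline values")
    centre = mean(baseline)
    # Moving range: absolute difference between each pair of consecutive values.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    # 2.66 is the usual scaling constant for individuals (XmR) charts.
    spread = 2.66 * mean(moving_ranges)
    return centre, centre - spread, centre + spread


def find_signals(values, centre, lower, upper, run_length=8):
    """Return (outside, runs).

    outside: indices of points beyond the control limits.
    runs: indices at which a streak of at least run_length points
          on the same side of the centre line has been reached.
    """
    outside = [i for i, v in enumerate(values) if v < lower or v > upper]
    runs = []
    streak, side = 0, 0
    for i, v in enumerate(values):
        current = (v > centre) - (v < centre)  # +1 above, -1 below, 0 on the line
        if current != 0 and current == side:
            streak += 1
        else:
            streak = 1 if current != 0 else 0
            side = current
        if streak >= run_length:
            runs.append(i)
    return outside, runs


# Made-up numbers, purely to show the calls.
centre, lower, upper = xmr_limits([10, 12, 11, 13, 12])
outside, runs = find_signals([11, 12, 16, 7, 12], centre, lower, upper)
```

Anything that ends up in neither list is routine variation: up or down on last month, but not evidence of a real change.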
A recent after-dinner discussion on the probability of having a good/bad summer or a cold/mild winter led me to think of applying control charts to climate data for Dublin.
I was able to obtain historical data from the European Climate Assessment & Dataset and more recent data from MET Éireann (the Irish meteorological service).
The QlikView document that I created is available to download from QlikCommunity and will shortly be available to view on share.qlikview.com.
Some interesting analysis!
There is a perception amongst Irish people (especially after a very unusual period of snow last December) that winters are getting much colder and hence we are likely to have a cold winter again this year.
Looking at the data since 1881 (the control limits are based on the average between 1961 and 1990, which is the comparison period used by MET Éireann), it appears that the Dublin climate has been up and down through the roughly 130 years but has always been within the control limits - except for 2010. The average temperature for 2010 made a huge dive south and is now out of control. It would appear that yes, the winters are getting colder and we should all invest in the snow chains and shovels that we never needed before.
Looking at the data by month, this seems even more apparent. The last couple of winters both have troughs outside the control limits. I do need a new woolen coat - I might get it cheaper if I buy it now.
There is an interesting thing about the two troughs in the monthly chart, though: they both represent months in the same calendar year, 2010. 2010 had 2 of the coldest months for a long time (December was the coldest on record), which is why the year as a whole was so far south of the control line. But those 2 months were not consecutive - they were at either end of the year. Maybe I don't need to get that coat yet?
Perhaps, if I want to look at "winter", I should look at the data in seasons instead. So, in my QlikView document, I created a dimension called Season and one called YearSeason - breaking the data into "traditional" seasons of winter = Nov-Jan, spring = Feb-Apr, etc.
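For anyone who wants to try the same grouping outside QlikView, here is a rough Python sketch of the idea. The names are hypothetical and this is not the script from my document; summer = May-Jul and autumn = Aug-Oct are inferred from the "etc." pattern above, and January is assumed to belong to the winter that started the previous November.

```python
# Month number -> "traditional" season (Nov-Jan = Winter, Feb-Apr = Spring, ...).
SEASON_BY_MONTH = {
    11: "Winter", 12: "Winter", 1: "Winter",
    2: "Spring", 3: "Spring", 4: "Spring",
    5: "Summer", 6: "Summer", 7: "Summer",
    8: "Autumn", 9: "Autumn", 10: "Autumn",
}


def year_season(year, month):
    """Return (season, year_season_label) for a calendar year and month."""
    if month not in SEASON_BY_MONTH:
        raise ValueError(f"month must be 1-12, got {month}")
    season = SEASON_BY_MONTH[month]
    # Assumption: January counts towards the winter that began the previous
    # November, so it takes the previous calendar year as its season year.
    season_year = year - 1 if month == 1 else year
    return season, f"{season_year} {season}"


# November 2010, December 2010 and January 2011 all share one label.
labels = [year_season(2010, 11), year_season(2010, 12), year_season(2011, 1)]
```

Averaging the monthly temperatures per label then gives one point per season, so the two cold months at either end of calendar 2010 land in two different winters.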
Hmm. This data is every winter from 1960 to 2010 (the winter of 2010 includes January 2011). This is quite interesting because it shows that, despite having the coldest December on record, the winter of 2010 was still inside the control limits. In fact, other cold winters such as 1962 and 1976 are also within the control limits.
So, how cold will the winter of 2011 be? I can't say for sure, but I do have a good level of statistical confidence that it will be less severe than 2010. The system is still in control and the historical evidence is that it will stay in control.
Maybe I don't need those snow chains - not just yet anyway.
Stephen Redmond is CTO of CapricornVentis, a QlikView Elite Partner.