Friday, 2 December 2011

Easy Ad-Hoc Analysis in QlikView

I was demonstrating server object collaboration recently (the ability for end-user to create different charts and other objects and then share these with other users) to a group of users.  For me, it is quite straightforward to create a new chart but as their eyes glazed over, I thought to myself, "there must be an easier way."

So I created a view something similar to this one:



What I have here is 2 charts and a few list boxes (main filter list-boxes in a container).  The top two list boxes are for "Measure" and "Sort Order".  These are "Data Island" values loaded like this:


Set HidePrefix='%';


Expressions:
Load 
%MeasureID, 
%Measure, 
Replace(%MeasureExpression, '##', '$') As %MeasureExpression 
Inline [
%MeasureID, %Measure, %MeasureExpression
1, Total Sales, Sum(LineSalesAmount)
2, Cost of Sales, Sum(COGS)
3, Sales Margin, Sum(Margin)
4, Sales Current Year, Sum({} LineSalesAmount)
5, Sales Last Year, Sum({} LineSalesAmount)
];


ExpressionSort:
Load * Inline [
%SortOrder
Y-Value
Default
];


Now, the expressions table here would probably not be inline.  It is more likely to be in an Excel document or in a database and then added to based on user demand.  You will note that I have to use "##" and then replace with the "$" as it doesn't work with the "$" in situ (QlikView thinks it is a dollar-expansion).  You wouldn't have this problem if the source was an external one.

So, this table is a list of the most common expressions that would be used in this particular document.  It could be 2, it could be 100.  There is no limit here.

A couple of things to note.  I have prefixed with "%" and set the HidePrefix value.  This is important because your Expression selection does not then appear in Current Selections - which wouldn't make sense to the user.

The second thing to note is that you need to include the MeasureID field because we will use that later to sort the measures.

My bar chart (although it has a fast-change to line) has a Cyclic group as its dimension so I can choose any of my main dimensions.  It has two expressions:

1:  $(=FirstSortedValue(%MeasureExpression, %MeasureID))

and

2:  if(Count(DISTINCT %Measure)=2,
     $(=FirstSortedValue(%MeasureExpression, -%MeasureID)),
     Null())

This structure allows me to pick up to 2 expressions (I check for this in the calculation condition of the chart).  If only one is selected then the 2nd expression is Null()  (I have an equivalent formula for the expression label) and QlikView treats it as if it isn't there - which is what I want.

For the sort order, I override the default group order and set as a descending expression:

    if(%SortOrder='Y-Value', $(=FirstSortedValue(%MeasureExpression, %MeasureID)), Null())

If all the other sort options (e.g. Text Value) are not checked, the null in this expression will default to the default sort order of the dimension anyway.

The pivot table (which fast-changes to a straight table) is similarly constructed except I use 3 cyclic groups as the dimensions.  This gives the users a lot of options.

Worth having a play with it and seeing what you can come up with.

BTW - this will, of course, work with Small Business edition of QlikView which does not support server objects.  Also, users selections can be stored as bookmarks.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Saturday, 26 November 2011

Delivering BI to Customers

It was great to see a project that I have been involved with being mentioned in the press:

   Electricity company pushes out BI tool to customers

It would have been nice to see CapricornVentis being mentioned in the article.

We now have several clients who deliver BI to clients using QlikView.  These clients range from:

Utilities - delivering usage information to end users to encourage behavior change, saving energy.

Banking - delivering information to their customers to allow better decision making.

Global Farm Feed Company - allowing farmers in 25 countries to measure feed efficiency.

While none of these implementations is trivial, it is great to work with a great technology like QlikView as the enabler.  And looking forward to delivering many more - especially on version 11.

If you need to deliver BI to your customers, feel free to give CapricornVentis a call.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Sunday, 6 November 2011

Denormalize for Performance

It was gratifying for me to find out, after many QlikView implementations, that most of what I was doing was close to best practice in Dimensional Modelling.  I thought that I was mostly just building key tables to fix data connection issues and the denormalizing to clean the data structure and improve performance.

If you are unfamiliar with Dimensional Modelling, it has been around for a very long time - as long as QlikView!  Ralph Kimball, one of the early data warehouse pioneers, has literally written the book on the subject and has many great articles on his website:

http://www.kimballgroup.com/html/articles.html

If you are not familiar with relational database theory, the idea of normalization is to remove redundancy.  For example, if I have a simple product table like this:

Product Table
ProductIDProductNameCategory
1BingBongBings
2BingBangBings
3BingBungBings


The category name of "Bings" is repeated several times.  In a relational database this value is actually stored several times.  From a storage point of view, it is more efficient to split the product table into a product and a category table like this:

Product Table
ProductIDProductNameCategoryID
1BingBong1
2BingBang1
3BingBung1


Category Table
CategoryIDCategory
1Bings


In this case, the numeric ID "1" is stored several times but that takes up a lot less space than the word "Bings" which is now only stored once.  It also means that if I need to change the name of that category, I only need to change it in one location, thus improving my data consistency.

This system has been implemented widely.  Some architectures that you might come across, such as SAP or Oracle Financials, have thousands of tables, all perfectly crafted to store the data most efficiently in the relational database.  Normalization works great for transactional databases like this.

For designers, the problem occurred when people started reading data out of the transactional system, especially for reporting.  Because the queries now had to hop across several joins, the performance suffers immensely.  So, they started either undoing the normalisation process (denormalising) or, worse, keeping the normalised tables but duplicating the data into other tables so that queries ran quicker - a nightmare for data consistency.  Both approaches lead to an increase in the data storage requirements meaning that databases bloat more an more - but we can just throw hardware at that issue, right?

So, what is the best thing to do in QlikView?  A colleague of mine who was recently trained asked me about this because of the exercise that was to move the category table's data into the product table, hence denormalizing.  "Why?", he asked.  "Doesn't QlikView work just as well either way?".

There are a couple of reasons as to why you might do this.  And a couple of reasons why you might not.

Reasons for:


In one of his articles, Kimball suggests that such "snow-flaking" (leaving the category out of the product table) "compromises cross-attribute browsing performance".  Not only that, it "may interfere with the legibility of the database".

So that gives us 2 reasons:

- Performance
- Improved schema readability

And I will add a 3rd:

- Data size.

In the QlikView underlying data cloud, there will be an association between the product fields and the CategoryID field.  There will also be an association between the CategoryID field and the CategoryName field.  If I move the CategoryName into the product table (either map or join), I can then remove the CategoryID field and hence remove a redundant set of associations.  The CategoryName is not duplicated in QlikView because the columnar database still only stores it once.

Reasons against:

- Performance
- Improved schema readability
- Data size

I'm sure that you will be thinking that there is some sort of error here.  How can I have the same set of reasons "against" as I did "for"?

It depends!  It depends really on the number of records in each table and the number of fields.  For a relationship like product and category, there is probably only a few fields in the category table and it can really make sense to denormalise (it almost always make sense to do this for a table with only 2 fields - using ApplyMap).  However, with a couple of tables like, say, Order and Calendar, where Order may have millions of records and Calendar could have ten or more fields, it makes sense to keep them apart and just have the key association on the DateID field.  If you did join them in, it would mean that you could drop the DateID field but you would then have associations maintained between each field in the Order table and Year, and Month, and Day, and Week, etc., etc.  Many more associations being maintained than before and increasing your data size and possibly reducing your performance.  It may also become difficult to see the date fields in the schema whereas they are plain to see in their own Calendar table.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Friday, 28 October 2011

Color coding RAGs

I have been thinking a lot about color and its use in data presentation recently.  Color use is something that is especially of interest to me as I find myself to be one of the 8-10% of the male population who suffers from color blindness.  In my case, I have the most common difficulty which is red-green.  Essentially what happens is that different shades of red and green become more brown to me and therefore two distinct colors, that have a similar saturation and brightness, may actually appear to be the same to me.

One area where this is particularly significant is with RAGs - Red, Amber, Green indicators which will usually indicate Bad, not so bad, and good respectively.


In this example, I have shown pure red, amber and pure green.  For me, while they are relatively easy to spot here, the green and amber colors can look very similar on smaller displays or where they are not right together (like in a traffic-light or beacons on a page).

I always like to use Stephen Few as a reference and he has discussed this in his books as well as this useful article:  Practical Rules for Using Colors in Charts.

If we think about it, the situations of good, not so good, and bad are actually a sequence so would fall under Stephen's rule 6:


When using color to encode a sequential range of quantitative values, stick with a single hue (or a small set of closely related hues) and vary intensity from pale colors for low values to increasingly darker and brighter
colors for high values.


So, why should we have to use red, amber and green?  Perhaps another solution would be just to use shades of red:


Here we have an almost white color representing good (and white would be a good option too as most people are not looking for good, just for the bad).  The range gets darker along with the situation.

Of course, one of the problems that might be raised with you is that people "intuitively" understand RAG and that one needs to conform to the "norm".

This thinking was fresh in my mind while recently passing through a UK airport.  They have the normal type of gate signs which they have color coded for 4 statii which are to advise passengers on the status of boarding their flight: nothing happening, go to gate, plane is boarding, gate is closing.


The "nothing" status is represented by no color - it matches the black background and that is probably a good choice.  The "closing" status is dark red and this is also a good choice because that indicates (in "western" society anyway) danger and the need to get a move on.  However, the "go to gate" is coded in amber and the "boarding" is green.  This doesn't appear very logical to me as, if I was some way from the gate, "boarding" would be something much more urgent than "go to gate".  I am sure that there is a logical explanation, but this doesn't come across to me.

Here, I think, would be a good idea to use something like white background for "go to gate" and something like amber or yellow for "boarding" - they are logically closer to the red of "closing" and should indicate to the passenger that they should get to the gate asap.



Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Thursday, 15 September 2011

Setting default display format

Following on from the last post on Setting default sort order, there are other things that we can load up in advance that can be of assistance.

For example, I have a date field called OrderDate which is coming in with a default format of DD/MM/YYYY hh:mm:ss but the time portion has no use to me because it is just 00:00:00 for all the dates.

Now, I could load this using a simple:

   ...
   Date(OrderDate, 'DD/MM/YYYY') As OrderDate, 

   ...


This is what I would normally do.  However, what if the data is coming from a very large QVD and I would like to keep the optimized load.  Applying a format function like this will prevent the load being optimized.  So, I should do the format in the script that creates the QVD - but what if I am not controlling that?

Simple - just as with the load order, the default format of a field is set when you first load it.  We can create a simple load like this:

   FormatTable:
   Load
Num(0, '#,##0.00') as LineSalesAmount,
Date(Today(), 'DD/MM/YYYY') As OrderDate
   AutoGenerate(1);

Now, I go ahead and load my Order table (note that I have included a format for the sales value too - you can have as many as you need).

Sometime after I have loaded the Order table, I need to drop the Format Table:

Drop Table FormatTable;

I hope this saves you a bit of time.



Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Setting default sort order

Sometimes we load some categorical data (e.g. Country) that we want to display in a specific order other than the default (alphabetical for example).  Here's an example that you might have come across:

  High, Medium, Low

An interesting thing about QlikView is that it remembers how things are first loaded.  We can use this little trick to load a temporary table with just the field that we want in the order that we want.  Then, even if we drop this table later, QlikView remembers that order and we can use the "Load Order" sort option.

Here is an example.  If I had a sales table like this:


Sales:
CrossTable(Year, Sales)
Load * Inline [
   Country, 2009, 2010, 2011
   Ireland, 200, 300, 400
   Scotland, 100, 150, 300
   Wales, 500, 300, 400
   England, 800, 700, 800
];


But, rather than the alpha-sort of England, Ireland, Scotland, Wales, I would like to have Wales, Ireland, England, Scotland, then I can load a table just before the Sales table like this:


SortOrder:
Load * Inline [
   Country
   Wales
   Ireland
   England
   Scotland
];


Then, sometime after I have loaded the Sales table, I can drop the SortOrder table:

Drop Table SortOrder;

Now, I can use the "Load Order" sort:




Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Monday, 15 August 2011

Irish crime stats. Sleeping easy in my bed?

I was interested to see the recent announcement of the quarterly Irish crime statistics for Q2 of 2011 by the Central Statistics Office.  The press surrounding this latest information hailed the quarter-on-quarter reduction in crime across all categories (for example, from the Irish Examiner).

So, I thought that I would grab the CSO figures and run them into QlikView.  You can download the resulting document from QlikCommunity.



It does indeed appear that the crime figures are down across all categories.  Of course, one swallow does not make a summer and neither does a change between one period and another indicate a trend.  This is where a control charts come in most usefully - is there a downward trend?


Looking at the national figures in a control chart, it does look like there is a steady downward trend.  The latest figures have dropped below the control limits (my limits are based on the 2006-2008 data).  There is a certain seasonality about the crime figures so I have added a 4 period mean here.  Even that appears to be dropping.  Looks like the police are doing a great job.

I thought that I might have a delve around in the figures and thought about which if the different crime categories would be more of a concern to me.  For example, I am not terribly worried about public order offences, but I am a little more worried about my car being broken into.  I don't worry about homicide (Ireland has one of the lowest rates in the world) but I do have concerns about being assaulted.  There are some crimes that are more "personal" than others.


If I break down the categories into the ones that I am more concerned about (assaults, burglary, robbery, theft, sexual assault), I see that these represent 45% of the total figures for the whole data set.  Interestingly, this is now 48.6% for just 2011!


When I look at these categories in the control chart, I see that these are not decreasing.  They are steadily above the mean line for the last 16 quarters.  While some have associated increases with the recessionary times, it appears that the increase started in 2007 - a year before Lehman Brothers collapsed.

So, while the GardaĆ­ (Irish police) are doing a good job in reducing crime in some areas, there are some that perhaps need more attention.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner