Tuesday 6 December 2011

Saving Space by Nulling Zeros

If reducing the memory requirement of your QlikView document is important, this tip might help shave a few bytes that could add up to a significant performance gain.

There are several documents on performance improvement that will suggest that you should shed fields that do not contain data that will be used.  My tip is to look at fields that has data that you do use and shed the data in those that you won't!

One example would be a numeric field that may contain a lot of zeros.  If the only use of this field is in a sum, then there is no different in that zero being there or not.  The only difference would be in a count or avg.  Therefore, if you have a lot of zeros in a field that will only be summed, consider transforming them to null() in your load.

   ...
   if(Field1=0,Null(),Field1) as Field1,
   ...

If you need to use that field in a horizontal calculation, you can use the Alt function:

   Alt(Field1, 0) + Alt(Field2, 0)

Another type field that we are often told to look at getting rid of is key fields.  They are often highly unique so will take up the most space in the dataset.  It is often straightforward to remove them, but what if they are needed?  Perhaps as a key link in a 1:M relationship.

For example, say you had an Account table with a key field called AccountID.  There is also an AccountHistory table which associates on AccountID.  You can't readily join the two tables so you need to keep the key field.  However, if there is not an entry for every Account in the AccountHistory table, you don't need to keep the value for that particular AccountID in the field!  Null it!

I might create a mapping table:

   KeyMap:
   Mapping Load Distinct
      AccountID, AccountID
   Resident AccountHistory;

Then, in my Account table load:

   ...
   ApplyMap('KeyMap', AccountID, Null()) As AccountID,
   ...

Now, this is not going to be significant over a million records.  However, over a hundred million, you will see a difference.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Sunday 4 December 2011

The Right Background

I have talked about color choice recently, but whatever choice you make for the colors you are going to use, the choice of background color is critical.

You may have seen an image like this before:


If you haven't, the question is - which of the 5 squares inside the colored box is the same color as the square outside the box.  The answer is that all of the 5 squares are the same color.  The only difference is the background.

The rule, therefore, is that when choosing a background, it must be a single color.  You can't use any type of "gradient" effect or, even worse, use an image as your background.

So, what is the best color to use for the background?

In the world of software development, most IDEs (Integrated Developments Environments), such as Microsoft Visual Studio, have a default of a white background for the code and darker contrasting text.  There have been many debates on the use of white as a background.  The main one being that the white is wearing on the eye.

Some prefer black or dark blue with lighter contrasting text.  However, there is a visual effect that works against this - well known by photography experts - called Halation.  This is the effect of brighter colors seeming to spread when on darker backgrounds and hence look a little blurred.  This can cause eye strain.

Some suggest that softer grey backgrounds.  However, the issue here is that the softer contrast can have an effect on interpretation of color coding in scripting, and it is difficult to find a right choice of usable colors to offset this.  This is a feature that increases productivity in developers, so any reduction in its efficacy is not a welcome.

So, it appears that white might just be the right background color for developers, but this post is not about the best color to use for developing - it is for use in the display of quantitative data.

In an article called "Choosing Colors for Data Visualization", Maureen Stone, of StoneSoup Consulting and formerly of the Xerox Palo Alto Research Center, tells us that most color palettes are designed to be printed on white paper so using a white background is the right choice in digital form.  If your presentation is going to end up on paper, then designing on a white background is the right choice.

Further, Maureen tells us that there are perceptual advantages in using white as the human visual system is designed to adapt its color perception relative to the local definition of white and giving a white background gives us a stable definition of "white".


As a last word on this subject, I looked at six of the worlds most popular websites:


All use a lighter background and darker foreground  (not necessarily black, blue/purple are popular) and, with the exception of YouTube (which has only changed recently), that color is white.  These companies have the resources to spend a lot of money on choosing the right background colors.

Who am I to disagree with them.



Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Friday 2 December 2011

Easy Ad-Hoc Analysis in QlikView

I was demonstrating server object collaboration recently (the ability for end-user to create different charts and other objects and then share these with other users) to a group of users.  For me, it is quite straightforward to create a new chart but as their eyes glazed over, I thought to myself, "there must be an easier way."

So I created a view something similar to this one:



What I have here is 2 charts and a few list boxes (main filter list-boxes in a container).  The top two list boxes are for "Measure" and "Sort Order".  These are "Data Island" values loaded like this:


Set HidePrefix='%';


Expressions:
Load 
%MeasureID, 
%Measure, 
Replace(%MeasureExpression, '##', '$') As %MeasureExpression 
Inline [
%MeasureID, %Measure, %MeasureExpression
1, Total Sales, Sum(LineSalesAmount)
2, Cost of Sales, Sum(COGS)
3, Sales Margin, Sum(Margin)
4, Sales Current Year, Sum({} LineSalesAmount)
5, Sales Last Year, Sum({} LineSalesAmount)
];


ExpressionSort:
Load * Inline [
%SortOrder
Y-Value
Default
];


Now, the expressions table here would probably not be inline.  It is more likely to be in an Excel document or in a database and then added to based on user demand.  You will note that I have to use "##" and then replace with the "$" as it doesn't work with the "$" in situ (QlikView thinks it is a dollar-expansion).  You wouldn't have this problem if the source was an external one.

So, this table is a list of the most common expressions that would be used in this particular document.  It could be 2, it could be 100.  There is no limit here.

A couple of things to note.  I have prefixed with "%" and set the HidePrefix value.  This is important because your Expression selection does not then appear in Current Selections - which wouldn't make sense to the user.

The second thing to note is that you need to include the MeasureID field because we will use that later to sort the measures.

My bar chart (although it has a fast-change to line) has a Cyclic group as its dimension so I can choose any of my main dimensions.  It has two expressions:

1:  $(=FirstSortedValue(%MeasureExpression, %MeasureID))

and

2:  if(Count(DISTINCT %Measure)=2,
     $(=FirstSortedValue(%MeasureExpression, -%MeasureID)),
     Null())

This structure allows me to pick up to 2 expressions (I check for this in the calculation condition of the chart).  If only one is selected then the 2nd expression is Null()  (I have an equivalent formula for the expression label) and QlikView treats it as if it isn't there - which is what I want.

For the sort order, I override the default group order and set as a descending expression:

    if(%SortOrder='Y-Value', $(=FirstSortedValue(%MeasureExpression, %MeasureID)), Null())

If all the other sort options (e.g. Text Value) are not checked, the null in this expression will default to the default sort order of the dimension anyway.

The pivot table (which fast-changes to a straight table) is similarly constructed except I use 3 cyclic groups as the dimensions.  This gives the users a lot of options.

Worth having a play with it and seeing what you can come up with.

BTW - this will, of course, work with Small Business edition of QlikView which does not support server objects.  Also, users selections can be stored as bookmarks.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner