Thursday 20 December 2012

Powerful Preceding

The preceding load functionality in the QlikView load script is extremely powerful for transforming data.

Many developers will never use it.  They simply use SQL queries to format their data and transform it.  Others may use it - but accidentally.  They turned the "Preceding Load" check box on in the "Select" data wizard during training and have never turned it off.  They get a simple SQL statement like:

SQL SELECT * FROM TableName;

And barely notice that there is a QlikView Load statement sitting above it.

This is brilliant though - just because it is a QlikView statement.  This means that we can use the great set of QlikView functions within this statement.  From the simple Year or Month, to parse a date, to the most useful ApplyMap - one I use all the time.

Even there, we are not finished.  We can add additional Preceding Load statements above other Preceding Load statements!

Another thing that some people may not realize is that Preceding Load statements can include their own WHERE and GROUP BY clauses.  This is really powerful.

For a quick example, I want to load in a list of Father Ted episode from:

http://en.wikipedia.org/wiki/List_of_Father_Ted_episodes

I start off with the normal QlikView file wizard:


I see an immediate problem here.  There are 3 columns but there is a 4th column of data wrapped onto the next row.  The simple QlikView script to load this will look like this:


LOAD F1 as Episode, 
     Title, 
     [Original airdate]
FROM
[http://en.wikipedia.org/wiki/List_of_Father_Ted_episodes]
(html, codepage is 1252, embedded labels, table is @3);


Looking at this in a table box shows me the problem.



I need to come up with a simple rule - if the Episode field is numeric, that line has Episode, Title and Original airdate.  If not, that line has the description.  I can then re-craft my load like this:


Load
if(IsNum(Episode), Episode, Previous(Episode)) As Episode,
if(IsNum(Episode), Title, Previous(Title)) As Title,
if(IsNum(Episode), [Original airdate], Previous([Original airdate])) As [Original airdate],
if(IsNum(Episode), Null(), Episode) As Description;
LOAD F1 as Episode, 
     Title, 
     [Original airdate]
FROM
[http://en.wikipedia.org/wiki/List_of_Father_Ted_episodes]
(html, codepage is 1252, embedded labels, table is @3);


Here I have put a Preceding Load on top of the original.  I am checking if the Episode value is numeric.  If it is, I use that value for Episode and use Title for Title and Original airdate for Original airdate.  I use the Null() function to create a blank Description field.  If the Episode field is not numeric, I then grab the Previous values for Episode, Title and Original airdate and use the new text value of Episode as my description.  This leaves me with:


Now, I can see that I have the first 3 values populated on each row, but Description only on every 2nd row. I am going to make one last tweak to my code:


Load
if(IsNum(Episode), Episode, Previous(Episode)) As Episode,
if(IsNum(Episode), Title, Previous(Title)) As Title,
if(IsNum(Episode), [Original airdate], Previous([Original airdate])) As [Original airdate],
if(IsNum(Episode), Null(), Episode) As Description
Where Not IsNum(Episode);
LOAD F1 as Episode, 
     Title, 
     [Original airdate]
FROM
[http://en.wikipedia.org/wiki/List_of_Father_Ted_episodes]
(html, codepage is 1252, embedded labels, table is @3);


By adding a Where clause to the Preceding Load, I am now excluding the lines that do not have the Description.  My mind slightly boggles at this because I am not sure how I would achieve this quickly and easily using a SQL Query.

Can you think of any ways where a Preceding Load might save you multiple steps elsewhere?


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner. We are always looking for the right people to join our team.
Follow me on Twitter: @stephencredmond