Friday, 29 July 2011

Out of Control? Dublin Climate Data

Last year, I attended Performance Measure Blueprint training with Stacey Barr.  One of Stacey's recommended visualizations was the statistical control chart.  Stacey also recommended "Understanding Variation: The Key to Managing Chaos" by Donald J. Wheeler in this regard.

Essentially, the control chart introduces some solid statistical thinking into the analysis of data.  Quite often, people might have a bipolar view of variation in data - things are up on last month, everything is good, things are down on last month, everything is bad and we need to start changing the process to make sure that things are up next month!  The control chart allows us to see that these up/down variations are part of a statistically expected change and, unless they are outside of the control lines or, repeatedly above or below the mean.

An after-dinner discussion recently, on the probability of having a good/bad summer or a cold/mild winter, led me to think of applying control charts to climate data for Dublin.

I was able to obtain historical data from European Climate Assessment & Dataset and more recent data from MET Éireann (the Irish meteorological service).


The QlikView document that I created is available to download from QlikCommunity and will shortly be available to view on share.qlikview.com.

Some interesting analysis!


There is a perception amongst Irish people (especially after a very unusual period of snow last December) that winters are getting much colder and hence we are likely to have a cold winter again this year.

Looking at the data since 1881 (control is based on average between 1961 and 1990 - this is the comparison period used by MET Éireann), it appears that the Dublin climate has been up and down through the 150 years but has always been within the Control - except for 2010.  The average temperature for 2010 made a huge dive south and is now out of control.  It would appear that yes, the winters are getting colder and we should all invest in snow chains and shovels that we never needed before.


Looking at the data by month, it appears to be even more apparent.  The last couple of winters both have troughs outside the control.  I do need a new woolen coat - I might get it cheaper if I buy it now.


An interesting thing about the two troughs in the monthly chart though, they are both representing months that are in one year period - 2010.  2010 had 2 of the coldest months for a long time (December was the coldest on record) so that would be why the year as a whole was so far south of the control line.  But those 2 months were not concurrent - they were at either end of the year.  Maybe I don't need to get that coat yet?

Perhaps, if I want to look at "winter", I should look at the data in seasons instead.  So, in my QlikView document, I created a dimension called Season and one called YearSeason - breaking the data into "traditional" seasons of winter = Nov-Jan, spring = Feb-Apr, etc.


Hmm.  This data is every winter from 1960 to 2010 (which includes January 2011).  This is quite interesting because it shows that, despite having the coldest December on record, the winter of 2010 was still inside the control.  In fact, other cold winters such as 1962 and 1976 also are within the control.

So, how cold will the winter of 2011 be?  I can't say for sure but, I do have a good level of statistical confidence that it will be less severe than 2010.  The system is still in control and the historical evidence is that it will stay in control.

Maybe I don't need those snow chains - not just yet anyway.



Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Tuesday, 12 July 2011

Is it time to replace the Quick Chart Wizard?

I have been contemplating the Quick Chart Wizard in QlikView recently.

Creating charts in QlikView is very easy once you have a reasonably good grasp of the properties and where everything is.  However, for "business" users, this is not always simple.  Enter the Quick Chart Wizard.  My experience, however, is that the output of the Quick Chart Wizard is almost never an immediately usable chart - you almost always have to then go into the properties to clean it up.

There are, I believe, only a couple of options that need to be set in the Wizard to make it produce immediately usable charts almost all the time.  And while we are at it, why not do away with a "Wizard" at all and just have everything visible on one page with a preview of what you are building.

Here is my proposed solution:




What do you think?

Design reasoning:

Dimensions - only 2 (instead of 3 in current Wizard).  Vast majority of charts will have 1 dimension.  3 is a rarer requirement so remove from Quick Wizard.

Expression - Only one option (as per current Wizard).  Added option to label the expression.

Chart options - reduced to 4 of the most common used (removed Pie and Guage from existing Wizard).  Give options on Orientation and Mode (which grey out if not relevant)

Titles - not available in current Wizard but something that users will almost always want to change.

Sort by:  I never create a Bar chart that I don't specify a sort order on, so this should be available as an option.

Dimension Limitations:  With bar charts, this is something that I probably do more often than not.

Preview Window:  Show the user exactly what they are building as they are building it.

I am really interested in all opinions on this.  Does anyone think that the existing Wizard has value?

Join the discussion on QlikCommunity.



Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Alternative to Grid-Style Multi-Boxes in Ajax

I quite like the "grid style" format for multi-boxes.  In a limited width, I think that it is a better format for the drop-downs because you get the full width of the control rather than the truncated width of just the drop-down part in the standard view.



A problem for me is that "grid style" does not render at all in Ajax (I can't understand why!)

I picked up an interesting tip yesterday from QlikView's Kenneth Lewis.  How about using a Container control with multiple list boxes?  I thought about it and then tried it out.  I think that it works quite nicely.  By adding several list boxes and then setting the Appearance property of the Container to "dropdown at top", I get a result that works in both QV and Ajax.


It is not a perfect replacement for the grid-style multibox but, in some ways, it is better.  I like the fact that you have the list box icons available and it is easier to see the Green/White/Grey.

Have a play with it and see what you think.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Thursday, 30 June 2011

Nobody said it was easy ... Configuring LDAP

Recently I was in discussions with a client where they wanted to authenticate users against a Netscape LDAP server.  Of course, I told them that this would be no problem because QlikView 10 now comes with a Configurable LDAP directory service connector.  Easy peasy!

The client was satisfied and look like they will go ahead and order the new licenses (yay!) and hardware to implement it on.

Gulp!  I still hadn't played with this LDAP connector so I though I better get some handle on it before the implementation went ahead.  Of course, there is bountiful documentation on it in the server reference manual (well, just under 2 pages).  Luckily I had recently met the genius that is Michael Robertshaw, a Solutions Consultant for QlikView in the Netherlands.  With some guidance from him, I was able to get up and running.

I found a distribution of OpenLDAP for Windows and installed it on my PC.  Once I had an idea of what I was doing, it was actually quite straightforward (Note - choose BDB database, not LDAP during setup and all will be well!)  Once I was up and running, I used a tool called LDAP Admin to add new users and groups.

Now to the QlikView side of things.  I had thought that this should be quite straightforward.  It wasn't initially, but with the help of Michael I got there in the end.  The important thing to know is that LDAP and Active Directory LDAP are not necessarily the same thing.  There are a lot of differences.  Unfortunately, the defaults for the DSP entries are AD ones so these needed to change.  Also, the way that the server is specified is not as easy as I thought it should be, but easy to get right once you know.

I had thought that the correct Path entry for my server should have been:

   LDAP://myserver

This is what you will see for AD.  However, I also found that you need to specify the base search path in this string for me, with my settings of "mydomain.com", this was:

   LDAP://myserver/dc=mydomain,dc=com

Once I added this and the correct username and password (note that the username needs to be the fully qualified name, e.g. cn=Manager,dc=mydomain,dc=com), I was able to think about the DSP settings.  These are accessed via the pencil icon beside the password box.  Now, this is where the Active Directory defaults will need to be changed.  The defaults are:

   Account Name       : sAMAccountName
   Directory Label    : DSP1 *
   Display Name       : name
   Distinguished Name : distinguishedName
   Group member       : 
   Group object class : group
   Id Property name   : sAMAccountName
   LDAP Filter        : (&(!(objectclass=computer))(objectGUID=*))
   User member of     : memberOf
   User object class  : user

* the directory label is how you will identify users to QlikView, e.g. in this case - DSP1\username.

With advice from Michael and some testing, I found the best settings for OpenLDAP (other LDAPs may be different - use the LDIF information to help you work it out) to be:


   Account Name       : cn
   Directory Label    : MyDomain
   Display Name       : displayName
   Distinguished Name : dn
   Group member       : memberUid
   Group object class : posixGroup
   Id Property name   : cn
   LDAP Filter        :        (yes, this is blank!)
   User member of     : memberOf  **
   User object class  : posixAccount


** memberOf (used to check users are in a group) will only work in OpenLDAP if the memberOf overlay is enabled and working.

With these settings, I am now able to use the UserManagement in Enterprise Management Console to query the LDAP directory.

I tested that it was working by creating a new document on my server and used DMS to give access to a user called MyDomain\jdoe (who was a valid user in my LDAP).  I then generated a ticket for the user and connected successfully to the document.





Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Tuesday, 14 June 2011

AND Mode in Set Analysis

Was on QlikCommunity recently and answered a query by Lawrence Chapman - Like for Like

When Lawrence replied, he said that he was looking for something quite reasonable in his chart - only sites that had records for all the 4 years.

I was thinking that this should be quite straightforward and then had to have another think!  The thing about QlikView set analysis is that it is just a set of selections.  The default selection mode in a field in QlikView is "OR".  So, if I select all the 4 years, I will still select all the sites because all of them are in at least one year.  Creating a set based in the year field wouldn't work for what Lawrence wanted.

I thought about how I would select those sites if I was just doing it in the layout (which is a great strategy for designing a set).  The only way that I can think of doing it is to have a selection box using QlikView's AND-mode.

Now, for AND-mode to be enabled, there are a certain number of criteria that need to be fulfilled.  Basically, the field needs to be in a table of only 2 fields and it is the 2nd field (the first is the key field associating the rest of the data set).  Also, the table needs to be loaded with the DISTINCT keyword to remove duplicates (have a look in the help file for more info).

Let me create a quick example to show this in action.  Here is a sales table:


Sales:
LOAD * INLINE [
    Customer, Year, Sales
    Customer A, 2009, 111
    Customer B, 2009, 111
    Customer C, 2009, 111
    Customer A, 2010, 222
    Customer B, 2010, 222
    Customer C, 2010, 222
    Customer A, 2011, 333
    Customer C, 2011, 333
];


You will note that only Customer A and Customer C exist in all years.  However, if I load this data, there is no set that I can create that shows me just the totals for those 2 customers based only on the fact that they have sales in each of the years.

So, I will create an AND-mode field:


SalesYear:
Load Distinct 
Customer,
Year As SalesYear
Resident Sales;


Once loaded, I add this field to the layout and select the "AND-mode" option in the List Box properties (this option will be grayed out unless the field fulfills the criteria above).

N.B.  At this stage, just after setting "AND-mode" in the list box, you may need to just run a re-load to set the internal flag that makes everything else below work.

Now, I can create a set expression like this:

Sum({<SalesYear={2009,2010,2011}>} Sales)

And it will only show me the sales for Customer A and C.

I could make this more generic by limiting the SalesYear to the currently selected Years:

Sum({<SalesYear=P({$} Year)>} Sales)


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Friday, 20 May 2011

It's the discovery, stupid!

QlikView is Qool.  It looks great.  It is easy to create great looking content.  It is fantastically easy to create content and distribute to multiple users.  What could possibly be bad?  I love QlikView, I think it looks great.

I was surprised recently to read Jay Jakosky's blog entry entitled, "Can a humble chart object get some love"

Amazingly, after 5 years working with QlikView, this was the first time that I have heard anyone suggest that QlikView displays are anything but fantastically beautiful.  Jay makes some valid points about displays and about how QlikView's competitors do things.  The real point is that it's not about the display!

You see, 99 times out of a 100, you don't need to think about displays like these.  For the other 1 time, you can probably do it another way - probably a much simpler and more effective way.

It's not about the display.  It's not about the bells and whistles.  It is about the effective presentation of lots of data in an intuitive and easy to use fashion that users love to use.  It is about allowing users to navigate around their data so that they can find real answers to real business problems real quickly.

It's the discovery, stupid!


P.S.  I'm not really calling anyone stupid just because they have a different opinion to mine - just paraphrasing the famous Clinton camp slogan from the 1992 US Presidential election.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner

Tuesday, 17 May 2011

Rules for creating a Key/Link Table in QlikView

A key/link table is frequently required in QlikView to resolve Synthetic Key or Circular Join issues.  It can also help to tidy  up a schema that has tables all over the place.  The goal is to create a star (and sometimes snowflake) schema with dimension tables connected to a central key table.  Unlike in classical data warehousing, the central table doesn't often have the measures - they generally stay in the dimension tables.

There are 3 rules for creating a key table.  The first 2 are very straightforward - the last is where you need to use your skill and judgement to create the right result.  So, here are the rules:

1.  All tables should have a primary key.  If a table doesn't have one unique key, derive one using a function like autonumber or autonumberhash256.

2.  Break all the existing joins by renaming foreign keys (e.g. in the Orders table, rename CustomerID to O_CustomerID).  For a "pure" star schema, there should be no QlikView Associations remaining and all the tables should be standalone.  From a pragmatic point of view, it is fine to leave some hierarchical tables associated (e.g. Product to ProductCategory) to have a more "snowflake" schema.

3.  Use a mixture of Concatenate and Join to generate the Key table using the Resident data.  You will load the previously renamed foreign key with the correct name so that they connect to the right dimension table (e.g. ... O_CustomerID As CustomerID).

For an example, if I have a simple structure with Customer, Calendar, Order, OrderDetail and Product.

Step 1 - Customer, Product, Calendar (DateKey), and Order already have a primary key.  In OrderDetail I will create an ID from OrderID and LineNo (we will do a bit of step 2 while we are at it):

   ...
   AutoNumberHash256(OrderID, LineNo) As OrderDetailID,
   OrderID as OD_OrderID,   // rename order foreign key
   ProductID as OD_ProductID, // rename product foreign key
   ...

Step 2 - Customer and Product are not an issue because they don't have a foreign key.  I already renamed my foreign keys in OrderDetail so I need to attend to Order:

   ...
   OrderID,
   CustomerID as O_CustomerID,
   DateKey as O_DateKey,
   ...

Now all my links will be broken.

Step 3 - Now I load my key table.  I will begin with data from the Order table:

   Key:
   Load 
      OrderID,
      O_CustomerID As CustomerID,
      O_DateKey as DateKey,
   ...
   Resident Order;

Now Join in the Product and OrderDetail keys from the OrderDetail table.

   Join (Key) Load
      OD_OrderID As OrderID,
      OD_ProductID As ProductID,
      OrderDetailID,
   ...
   Resident OrderDetail;

Now I have a key table which will connect all my detail.  I can extend this by Joining or Concatenating additional tables.  For example, I could concatenate data from Purchases that also has Date and ProductID information.

Any of the previously renamed foreign keys (e.g. O_CustomerID) can actually be dropped now - their information is encapsulated in the key table so that keeping them is just duplicating data.  I might choose to hang onto them for a while, just to test my relations, but best practice will be to remove them.


Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner