Monday, 18 February 2013

What is a "Data Scientist" and how do I become one?

I read an interesting post yesterday by Daniel Tunkelang of LinkedIn called "Data Science: What's in a name".

Essentially, a Data Scientist is someone who applies the scientific method - explore, hypothesize, test, repeat - to data.  This is probably what a lot of "QlikViewers" and other "data discovery" (Tableau, Spotfire, etc.) experts think they do.  But there are potential gaps that need to be considered.

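Daniel's article introduces the Data Science Venn Diagram from Drew Conway, which places data science at the overlap of hacking skills, math and statistics knowledge, and substantive expertise.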

My experience is that a lot of data discovery experts come from an IT background - databases, reporting, etc. - and are very much "computer science" types.  They may have built up substantive expertise in many areas of business, and they probably have a great set of "hacker" skills accumulated over the years, but they often lack the core statistical skills and understanding.

This is where I have found myself over the last number of years.  I have a great range of "hacking" skills for getting at data and shaping it into a form that I can use to answer business questions, where I can apply my built-up business expertise.  But Drew Conway labels this combination the "Danger Zone!" on his Venn diagram - someone who knows enough to be dangerous.

I think that I recognized this in myself a while ago, so I decided to take action.  I started with just reading - I highly recommend How to Lie with Statistics by Darrell Huff - and then moved on to taking Statistics One on Coursera.  I even find that regularly listening to More or Less on BBC Radio 4 is an education in itself.

Massive Open Online Courses (MOOCs) are a great way for people to fill the gaps in their knowledge.  They do require a commitment of time, but I think that it is worth it.  One of the great things about them is the ability to interact with others from right around the world.

I see that a new one from Coursera, called Introduction to Data Science, is scheduled to start in April 2013.  Time to sign yourself up?

Stephen Redmond is CTO of CapricornVentis, a QlikView Elite Partner.
  2. Hi Stephen,

    An interesting post, and good advice indeed. Like you, some time ago I also realized that my math & statistical skills had gotten a little rusty, and I have been working hard to refresh and improve them since.

    Besides the sites you mention, these two are also interesting (though more bite-sized in content):

    - : lots of short movies on a variety of math subjects, very well explained in my opinion.

    - : not that many posts yet, but looks promising. Explains analytics concepts in an entertaining way through narratives involving skeezy characters and their lives.

    Regarding the Venn diagram, personally I would expand the danger zone a little bit towards the part that overlaps with math & statistical skills. IMHO people with only limited math & statistical skills are even more dangerous than people who know that they don't know.


