Wednesday, May 16, 2018

What do you do when your Data Analysis project is not working? #analytics

We have all suffered project "block" at some stage - we reach a point where we don't know what to do next, or something we are trying to do just doesn't work the way we would like it to. It is common in Analytics for students to understand how a basic data set used in class works, and perhaps also to follow the slightly more complicated example used in a tutorial - but when they try to use a new data set, it doesn't fit neatly into what was covered in class. I carefully select the data sets I use in class (many are recommended by textbooks). They work for me and are often perfect for explaining concepts such as regression and principal component analysis. But what happens when students try to use their own or third-party data in a project? Believe it or not, students often say to me "I can't find a data set", but what I really feel they are saying is "I can't find a data set that does what I want it to do".

Jonathan Nolis, writing on Medium, takes up this problem in "So your data science project isn’t working". He wonders what to do when you try to "predict something no one has predicted before", "optimize something no one has optimized before", or "understand data that no one has looked at before".

Sometimes the data you want just doesn't exist in the format you need, or it is inaccessible behind a firewall or a paywall that students can't afford. As Nolis says, "If the data isn’t there then you can’t science it". Sometimes, even after a suitable data set is found, the analysis leads to very little insight or a model just doesn't work. Students often forget that a "no" or a negative answer can also be useful. Either a data set will tell you something useful or it won't - go figure.

Perhaps you have asked the wrong or an inappropriate question - mistakes happen. Nolis advises: "Flops will happen to you and it’s okay! You can’t avoid them, so accept them and let them happen early and often". Most of the time we cannot control what data is stored or made available by an organization, nor can we make it give us the insight we want. There is no guaranteed pot of gold at the end of the Big Data rainbow. If there is value to be found, it will take effort to find it.
