Monday, August 17, 2020

90% of Data is Crap? #analytics

In his recent book "World Wide Waste", Gerry McGovern tells us that up to "90% of digital data is not used" and that we collect data "and then don’t use" it. He cites several sources to back this up:
  • Around 90% of data is never accessed three months after it is first stored, according to TechTarget.
  • 80% of all digital data is never accessed or used again after it is stored, according to a 2018 report by Active Archive Alliance.
  • Businesses typically only analyze around 10% of the data they collect, according to search technology specialist Lucidworks. 
  • 90% of unstructured data is never analyzed, according to IDC.
  • 90% of all sensor data collected from Internet of Things devices is never used, according to IBM.

McGovern goes on to say: "Cheap storage combined with cheap processing power made the World Wide Web the World Wide Waste" and that the "Web is an ocean full of crap". I don't disagree with this. My own Google Drive right now has 192,178 files, 19,853 folders, and takes up 360,216,594,133 bytes - I have no idea what all this "crap" is!

This 90% figure echoes Sturgeon's Law, which states that "ninety percent of everything is crap", and it resembles the Pareto Principle, or 80/20 rule ("80% of the effects come from 20% of the causes"). I'm sure everyone can think of situations where this applies, so it is no surprise that data is regarded the same way.

So if 90% is "crap", the remaining 10% is what's useful. The key question is how to identify that useful 10%. This is where we need skilled data scientists and analysts posing the right questions and using the right tools to find value in data. Learning how to prioritise the 10% is not easy, but it starts with questions. If 90% of your sales come from 10% of your customers, do you know which customers make up that 10%?
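
To make the customer question concrete, here is a minimal Python sketch of one way to answer it: add up sales per customer, rank customers by spend, and keep adding them until the chosen share of total sales is covered. The function name top_customers, the share parameter, and the sample figures are all made up for illustration; real sales data would need cleaning first.

```python
from collections import defaultdict


def top_customers(sales, share=0.9):
    """Return the smallest group of top-spending customers whose
    combined sales reach `share` of total sales.

    `sales` is an iterable of (customer, amount) pairs (hypothetical format).
    """
    # Add up the amounts for each customer.
    totals = defaultdict(float)
    for customer, amount in sales:
        totals[customer] += amount

    grand_total = sum(totals.values())
    if grand_total <= 0:
        return []

    # Rank customers from biggest to smallest spender.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    # Keep adding customers until the target share is covered.
    selected, running = [], 0.0
    for customer, amount in ranked:
        if running >= share * grand_total:
            break
        selected.append(customer)
        running += amount
    return selected


# Made-up sample data: five customers, 1,000 in total sales.
sales = [
    ("acme", 500), ("bolt", 40), ("acme", 420),
    ("cog", 20), ("dyn", 12), ("elm", 8),
]

print(top_customers(sales))              # customers covering 90% of sales
print(top_customers(sales, share=0.95))  # customers covering 95% of sales
```

In this made-up data a single customer out of five covers more than 90% of sales. The same ranking idea carries over to files, reports, or sensor feeds, once you have decided what "value" means for them.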

Asking a question is easy, but asking the right question is not.
