A few days ago I wrote about some advice for Data Analysts from Brent Dykes, who explains on Forbes.com Why Companies Must Close The Data Literacy Divide. Amongst other things, he advises us all to be both sceptical and curious about any data that we analyse. Critical thinking is a very important skill for students to develop, so I often tell my class to challenge any number that they see. My most basic example is the claim that "8 out of 10 cats prefer Whiskas" (according to their owners). How did they test this? Did they set up an experiment with 10 cats, or more? What was the sample size? Was there a control group? Did the owners conduct a test? Such a claim would not pass a scientific test today!
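Why does the sample size matter so much? A quick way to show students is to put an uncertainty range around the headline figure. The sketch below uses the Wilson score interval (my choice for illustration, not something from the original claim or from Dykes) at roughly 95% confidence. It assumes each cat is an independent random draw, which is exactly the kind of thing the advert never tells us, and it says nothing about bias from owners reporting on their own cats.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    Assumes independent, randomly sampled observations; it measures
    sampling noise only, not bias in how the data were collected.
    """
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Same headline proportion (80%), very different sample sizes.
for successes, n in [(8, 10), (800, 1000)]:
    low, high = wilson_interval(successes, n)
    print(f"{successes}/{n}: roughly {low:.0%} to {high:.0%}")
```

With only 10 cats, the range runs from roughly 49% to 94%, so "8 out of 10" is barely distinguishable from a coin toss; with 1,000 cats the same 80% is pinned down to within a few percentage points.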
Brent Dykes offers up some questions that you can ask to challenge the source and value of data. As he states: "it is important to be able to step back and weigh other less obvious factors that may be influencing the results and its interpretation". Here are his questions, and I recommend them to students analysing any data set:
- Collection method: Could the method or way in which the data was collected influence the results?
- Credibility: How credible or reliable is the source of the data?
- Bias: Is there potential bias from either the data producer or you as the consumer?
- Truthful: Is the data being manipulated in a way—intentionally or inadvertently—that misrepresents its true meaning?
- Assumptions: Are there any implied assumptions that could be affecting how the numbers are interpreted?
- Context: Is there additional context or background information that is missing and needed to properly understand the data?
- Comparisons: If supplemental data is included for comparison purposes (e.g., period-over-period data), does it offer a fair and relevant comparison? Alternatively, is an obvious comparison missing?
- Causation: Are you potentially confusing correlation with causation, which represents a direct pattern of cause and effect?
- Significance: If the data is statistically significant, is it also practically significant?
- Outliers: Is an outlier important or is it unnecessarily skewing the overall results?
- Quality: Are you able to distinguish between data that is unusable or that which is still directionally helpful?
Source: Dykes (2017)
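The outliers question is easy to demonstrate with a handful of numbers. The data below are entirely hypothetical (a small shop's order values with one very large order), and the 1.5 × IQR cut-off is just a common rule of thumb I have chosen for illustration, not part of Dykes's advice. The point is to compare how the mean and the median react before deciding whether the outlier is an error or something important.

```python
import statistics

# Hypothetical order values; 500 might be a typing error or a genuine bulk order.
orders = [20, 22, 25, 19, 23, 21, 500]

print("mean:  ", round(statistics.mean(orders), 1))
print("median:", statistics.median(orders))

# Flag values more than 1.5 IQRs outside the quartiles (a common rule of thumb).
q1, _, q3 = statistics.quantiles(orders, n=4)
iqr = q3 - q1
kept = [x for x in orders if q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr]
print("mean without flagged values:", round(statistics.mean(kept), 1))
```

One large value pulls the mean up to 90 while the median stays at 22. Whether to drop it is a judgement call: investigate where it came from before deciding it is "unnecessarily skewing" the results.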