By the end of the Covid-19 crisis, we will all have become more data literate. Every day new figures are thrown at us, and we are learning new terms and expressions such as "flattening the curve". Today I want to give some attention to the word "median".
The median is a measure of central tendency - it is the middle value of a data set when it is ranked from lowest to highest (or vice versa). In other words, half the values in the data set are higher than the median, and half are lower. (When there is an even number of values, the median is the average of the two middle ones.) In a normal distribution of data, the median will be the same as, or very close to, the mean (average). In 1955, R.R. Sokal and P.E. Hunter (who obviously had nothing better to do) measured the wing lengths of 100 house flies (in units of 0.1 mm). When they plotted the results in a histogram, they found an almost perfectly normal distribution, which statistics educators have used ever since as a textbook example. In this data set, the mean (average) is 45.5, and the median is also 45.5. Here's what the distribution looks like:
Data source: Sokal & Hunter (1955)
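To make the "middle value when ranked" idea concrete, here is a minimal Python sketch. The function name `median_by_ranking` and the `sample` values are made up for illustration (they are not the house fly measurements); the sample is simply a small symmetric data set, so the mean and the median come out the same.

```python
import statistics


def median_by_ranking(values):
    """Return the middle value of `values` after ranking them lowest to highest."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is exactly one middle value.
        return ranked[mid]
    # Even count: average the two middle values.
    return (ranked[mid - 1] + ranked[mid]) / 2


# Hypothetical, symmetric sample (NOT the Sokal & Hunter data).
sample = [2, 4, 5, 6, 6, 7, 8, 10]

print("median (by ranking):", median_by_ranking(sample))
print("median (statistics):", statistics.median(sample))
print("mean:", statistics.mean(sample))
```

In practice you would just call `statistics.median` from Python's standard library; the hand-written version is only there to show the ranking step.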
Now let's take a look at some Covid-19 data. We are hearing a lot about the median age at death of Covid-19 victims - why not use the mean (average)? First, let's look at the distribution of ages at death for 32 males and 16 females in South Korea:
Data source: DS4C: Data Science for COVID-19 in South Korea.
You can see straight away that the shape of the histogram differs a lot from the house fly data above. This histogram tells us at a glance that more older people are dying from Covid-19 than middle-aged or younger people. The mean (average) age at death is 73.6, but the median age is 75 - almost a year and a half higher. The median gives us a clearer picture of the typical age at death than the mean. If you use the mean as an indicator, it understates that age, because the mean is pulled towards unusually low values. You can see in the histogram above that one person under 40 sits far away from everyone else - this one value alone lowers the mean, but has very little impact on the median.
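The effect of a single low value can be checked with a few lines of Python. This is only a sketch: the ages below are invented for illustration and are not taken from the DS4C data set, and `shift_from_outlier` is a hypothetical helper name.

```python
import statistics


def shift_from_outlier(values, outlier):
    """Return how far the mean and the median each move when `outlier` is added."""
    with_outlier = values + [outlier]
    mean_shift = abs(statistics.mean(with_outlier) - statistics.mean(values))
    median_shift = abs(statistics.median(with_outlier) - statistics.median(values))
    return mean_shift, median_shift


# Hypothetical ages at death (NOT the South Korean data).
ages = [68, 71, 74, 75, 76, 79, 82, 85, 88]

# Add one much younger person and compare the two measures.
mean_shift, median_shift = shift_from_outlier(ages, 35)

print(f"mean moves by   {mean_shift:.1f} years")
print(f"median moves by {median_shift:.1f} years")
```

With these made-up numbers the mean drops by several years while the median moves by only half a year, which is the same pattern described above.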
If you would like to learn more about median values, check out my YouTube video below: