Understanding variance and its effect on insights is critical. Too often, however, the visualizations and metrics used for reporting, which executives rely on to understand the health of their business, fail to account for or represent variance.
The most common example is reporting a mean without also reporting the standard deviation.
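As a minimal sketch of why the standard deviation belongs next to the mean, the snippet below summarizes two made-up series. The values and the `summarize` helper are illustrative assumptions, not data from this post.

```python
import statistics

# Hypothetical daily revenue figures (illustrative values, not the post's data).
steady = [10, 10, 11, 9, 10, 10, 10]
volatile = [1, 25, 3, 18, 2, 20, 1]


def summarize(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


for name, values in [("steady", steady), ("volatile", volatile)]:
    mean, sd = summarize(values)
    print(f"{name}: mean = {mean:.2f}, sd = {sd:.2f}")
```

Both series report the same mean, so only the second number reveals how different they are.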
To illustrate why understanding variance matters, I created three random data sets that all have a mean of 10 but different standard deviations.
I then created two additional files from ‘high_variance.csv’: one that sorts the values in ascending order, and another that sorts them in descending order. In total, five data files generate the visualizations in this post.
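One way to build data like this is sketched below. It is not necessarily how the files for this post were produced: the Gaussian draw, the rescaling step, the 30-day length, the three standard deviations, the seeds, and the `make_series` helper are all assumptions made for illustration.

```python
import random
import statistics


def make_series(n, target_mean, target_sd, seed):
    """Draw n Gaussian values, then rescale so the sample mean and
    sample standard deviation match the targets."""
    rng = random.Random(seed)
    raw = [rng.gauss(0, 1) for _ in range(n)]
    raw_mean = statistics.mean(raw)
    raw_sd = statistics.stdev(raw)
    return [target_mean + target_sd * (x - raw_mean) / raw_sd for x in raw]


# Assumed settings: 30 days per series and these three standard deviations.
low = make_series(30, 10, 1, seed=1)
medium = make_series(30, 10, 3, seed=2)
high = make_series(30, 10, 5, seed=3)

# Sorted copies of the high-variance series, mirroring the two extra files.
high_ascending = sorted(high)
high_descending = sorted(high, reverse=True)

for name, series in [("low", low), ("medium", medium), ("high", high)]:
    mean = statistics.mean(series)
    sd = statistics.stdev(series)
    print(f"{name}: mean = {mean:.2f}, sd = {sd:.2f}")
```

Sorting changes only the order of the values, so the two sorted copies keep the same mean and standard deviation as the series they came from. Note that a Gaussian draw with a large standard deviation can produce negative values, which would need clipping or a different distribution if the numbers must stay plausible as revenue.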
Suppose each file represents one month of gross revenue from a lemonade stand, and we want to evaluate how the stand is performing. The most common (and misleading) way to support this evaluation visually is to compare monthly revenue using a bar chart.
The chart is not very interesting, largely because we have aggregated away the variance. Each bar shows only a monthly aggregate (total gross revenue or, equivalently, average daily revenue), so the day-to-day spread disappears.
Many different data patterns, when aggregated, produce visualizations like the one above. Too often, this is where the analytic inquiry ends. Revenue looks stable, consistent, flat; it is nearly impossible to interpret the visualization above in any way that implies action or even suggests what to do next.
The visualizations of daily gross revenue below all produce the same aggregated metric (monthly gross revenue), but each tells a very different story, with different business implications for our lemonade stand. If we rely on summary metrics in a bar chart to initiate our analysis, we miss potential insights that could be used to draft and test hypotheses, pivot a business plan, or otherwise act in a data-driven way.
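To make the point concrete, here is a small sketch in which three orderings of the same daily values share one total yet imply different stories. The ten values and the `half_means` helper are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical daily revenue for ten days (illustrative only).
shuffled = [4, 17, 9, 2, 15, 11, 6, 19, 3, 14]
rising = sorted(shuffled)
falling = sorted(shuffled, reverse=True)


def half_means(values):
    """Return the mean of the first half and the mean of the second half."""
    mid = len(values) // 2
    return statistics.mean(values[:mid]), statistics.mean(values[mid:])


for name, series in [("shuffled", shuffled), ("rising", rising), ("falling", falling)]:
    first, second = half_means(series)
    print(f"{name}: total = {sum(series)}, "
          f"first half = {first:.1f}, second half = {second:.1f}")
```

A bar chart of totals would draw three identical bars here, while even a crude first-half versus second-half comparison separates a growing stand from a declining one.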
Next week's post will provide some more effective ways of visualizing data that capture variance.
Andrew Malinow, PhD, leads the Data Science team at Zylotech, where he leverages his background as a Cognitive Psychologist, statistical expertise and passion for surfacing actionable insights from large, messy data sets. At home he loves to spend time with his wife and 4 kids, doing anything outdoors, and tending to his ever-growing flock of chickens on his farm in Pomfret, CT.