Correlated Data Could be Costing You Millions

analytics data stakeholders Apr 30, 2023
Photo by John Barkiple on Unsplash

Stakeholders frequently ask for data from analysts without providing context. Worse, with the increase in self-service tools, stakeholders are drawing their own conclusions with data, even though they lack the proper understanding and training to use the data. These conclusions are then used to make multi-million-dollar business decisions, even though they are wrong.

All too often I see scenarios where an analyst, a stakeholder, or an everyday person takes accurate data and uses it to explain causation when the data is merely correlated. I’ve also seen plenty of cases where someone misses nuances in the data that could significantly change its interpretation. This could be a system outage that few people are aware of or a situation such as Covid-19, which caused massive abnormalities in businesses and data. While Covid-19 is still somewhat fresh in some people’s minds today, many won’t immediately triangulate various pieces of information that aren’t all neatly put together on a chart.

I remember one specific example from many years ago when I was working at a small tech company. This company developed a desktop software application that seemed to be successful, but roughly 60% of new users were uninstalling it within 24 hours, and 80% within the first 60 days. Given the high abandonment rates and other factors, we didn’t consider a user to be a “retained user” until they crossed the 60-day mark. Obviously, this level of abandonment wasn’t good for business, so the product team needed to come up with a solution.

To solve our abandonment issue, or at least test this theory, the team decided to build the equivalent of an app for the iPhone. The theory was that our main desktop product had issues that were difficult, if not impossible, to solve, and this was causing users to uninstall the product. This seemed plausible because we know that few people want to use what they perceive as a completely or partially broken product. So, the team built a new app that was designed to overcome the challenges of our desktop product in hopes that customers would install the desktop product, then install the new app, find that everything worked extremely well, and retain the desktop product.

Once the new app was built, we rolled it out to customers to determine if this new app was the solution to the problems found in our desktop product. Next, we compared customers that downloaded our desktop product (without using the app) to customers that downloaded the desktop product and used the new app. 


Bad Assumptions with Data

After waiting 60 days to reach our defined “retained user” threshold, the product team asked for an analysis, and I put a few slides together to present to the team. One of the first slides had a graph that showed that customers with the new app had a significantly higher retention rate than those that didn’t install the app. But before I could get another word out of my mouth, the product managers were jumping for joy, thinking that they had solved world hunger and that the new app was the solution to all our problems.

Problem solved! We had a lower uninstall rate when customers used our new app, so it’s time to invest in the app and build more features. But not so fast. While it was true that the customers that had used the new app had a much lower uninstall rate, the data may have only been correlated, not necessarily causal. Meaning, did using the new app cause customers to keep the desktop product (and not uninstall it), or was there simply a correlation between retention of the desktop product and use of the new app?

To determine this, we must ask how, when, and why customers installed the app and ask the same questions about the desktop product. One of the quickest ways to understand this is to perform a quartile analysis of the new app installations. With this quartile analysis, my goal was to better understand when people were installing the app, because timing matters. If users were installing the new app almost immediately and then becoming retained users, we could probably conclude that the app may be causing retention. But that wasn’t the case.

75% of users didn’t install the new app until around day 57. In other words, the customers that were using the app not only hadn’t installed it right away, but had already committed to retaining the desktop product before they decided to install the new app. The new app didn’t cause retention. It was an add-on product that customers picked up after they had already decided to retain the primary product. Had those users installed the app on day 1 and had a higher retention rate for the primary product, it would have been easier to logically assume that the app was the cause of retention. Another issue was that the team didn’t need to wait 60 days. Because most of the abandonment was occurring within the first day, a quick analysis showing that few customers installed the new app on day 1 would have been a red flag.
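For readers who want to try this kind of timing check themselves, here is a minimal sketch in Python. It is not the analysis I ran at the time. The list of install days is made up purely for illustration, and the function names (install_day_quartiles, share_installed_by) are hypothetical. Each number represents how many days after installing the desktop product a customer installed the new app.

```python
from statistics import quantiles

# Hypothetical, invented data for illustration only (not real company data):
# for each app user, the number of days after the desktop install
# on which they installed the new app.
app_install_days = [1, 3, 57, 58, 59, 60, 61, 63, 65]


def install_day_quartiles(days):
    """Return the 25th, 50th, and 75th percentile of app install day."""
    # method="inclusive" treats the list as the full population of app users
    # rather than a sample drawn from a larger one.
    q1, q2, q3 = quantiles(days, n=4, method="inclusive")
    return q1, q2, q3


def share_installed_by(days, cutoff_day):
    """Return the fraction of app users who installed on or before cutoff_day."""
    if not days:
        return 0.0
    return sum(1 for d in days if d <= cutoff_day) / len(days)


if __name__ == "__main__":
    q1, q2, q3 = install_day_quartiles(app_install_days)
    print(f"Install day quartiles: Q1={q1}, median={q2}, Q3={q3}")

    # Early red-flag check: how many app users installed on day 1,
    # when most of the abandonment happens?
    day_one_share = share_installed_by(app_install_days, 1)
    print(f"Share of app users who installed by day 1: {day_one_share:.0%}")
```

With this made-up list, the first quartile lands at day 57, which mirrors the pattern described above: three quarters of app users installed the app only after they were nearly past the retention threshold. The day-1 share is the quick check that doesn’t require waiting 60 days. If it is small, the app can’t be what is keeping most new users from uninstalling on the first day.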




This is why as a data analyst it is so important to think like a business owner and a customer to understand how a customer might use a product. It’s not enough to be able to pull data at the request of a stakeholder. The surface-level data might indeed be what the stakeholder asked for, but there’s a good chance that the data might not tell the entire story. And by not telling the entire story, your company might invest millions of dollars into the development of something, only to later find out that they aren’t getting the desired results, all because of a misinterpretation of the data.


Brandon Southern, MBA, is the founder of Analytics Mentor, specializing in providing analytics advising, consulting, training, and mentorship for organizations and individuals. Brandon has been in tech for 20 years in roles including analytics, software development, release management, quality assurance, six-sigma process improvement, project & product management, and more. He has been an individual contributor as well as a senior leader at start-up companies, GameStop, VMWare, eBay, Amazon, and more. Brandon specializes in building world-class analytics organizations and elevating individuals.

You can learn more about Brandon and Analytics Mentor at


