Tips and Components of a Great Analytics Project

analytics data Feb 07, 2022

Recently I was mentoring an individual early in his career, and he asked me a question: “What makes an analytics project a good project?” I thought that this was a really good question and worth talking about, because the answer matters both for people working on existing projects and for people preparing for job interview questions.

My answer to this question isn’t a simple one or two sentences, because there are a number of components that comprise an analytics project. The answer is further complicated because different people will care about different components, and some of the work that makes a project a “good” project will never be directly seen by most people in the organization.

Below are the most common components of a “good” project, along with examples and reasons for why these components are important.



Storytelling

It doesn’t matter how accurate or insightful your work is if you can’t communicate it effectively. Even if you spend days, weeks, or even months working on a very important analysis or model, if your audience isn’t able to understand the significance of your work, all of your efforts might have been for nothing.

This storytelling can take the form of verbal communication, emails, presentations, or write-ups (e.g., Amazon-style). You’ll need to ensure that you understand what your audience cares about, how well they understand the topic, what their priorities are, and how your work fits into this picture. Unfortunately, even if you do have a great story and communication, sometimes you’ll have an uphill battle with people who are distracted (a sick child, putting out “fires”, or thinking they already know the answer when they don’t). While you might not be able to overcome all of these challenges, one of the most important things you can do is set the context for your audience.

Chances are that you know the situation better than anyone else because you pulled the data. But because you’re so close to the situation, you may overlook the fact that others don’t have this information or situation at the top of their mind. This can lead to diving into the deep end with data and charts before ever helping the audience to understand the problem. To overcome this potential issue, provide proper context about the situation, what you are trying to solve or explain (the task), the results, and how you will provide supporting evidence.


Visual representations

When dealing with visual representations, we aren’t only talking about Tableau or PowerBI, although these are quite common in analytics environments today. Anything that you represent visually falls into this category. This includes charts, graphs, Excel workbooks, as well as your visualization dashboards.

What constitutes a “good” visual? For starters, the basics, such as a title, axis labels, and units of measurement, are all table stakes. But those alone don’t necessarily lead to a “good” visual. Your data should be presented in a way that does not mislead the reader, and it should be easy to comprehend with little explanation.

Here’s an example of something that could be misleading. The chart on the right looks drastically different from the chart on the left, yet it uses the same data from the last 16 periods. Both charts are “right,” but context matters: without units of measurement, a description of what is being measured, or any other details, we can’t understand what is actually being presented.
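
The mechanics behind this kind of distortion are easy to demonstrate numerically. Here is a minimal sketch, using made-up values, of how moving the axis baseline changes how big a difference *looks*:

```python
# Sketch: how a truncated y-axis exaggerates differences.
# The values below are made up for illustration.
values = [100, 101, 102, 104]  # e.g., a metric over four periods

def drawn_height(value, axis_min):
    """Height of a bar as it appears on screen, measured from the axis baseline."""
    return value - axis_min

# Axis starting at 0: the last bar looks only ~4% taller than the first.
ratio_full = drawn_height(values[-1], 0) / drawn_height(values[0], 0)

# Axis truncated at 99: the exact same bar now looks 5x taller.
ratio_truncated = drawn_height(values[-1], 99) / drawn_height(values[0], 99)

print(f"full axis: {ratio_full:.2f}x taller, truncated axis: {ratio_truncated:.1f}x taller")
```

Neither rendering is “wrong,” which is exactly why the surrounding context (units, baseline, what is being measured) has to accompany the chart.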

Aside from the basics, your visualizations should look professional and be worthy of being presented on a web page of a top-tier company, as opposed to something that looks like it was built with Microsoft clipart from 1999.



Accuracy

Accuracy is extremely important and can be hard to prove at times, because there are many situations where you may not have data to compare against. But in almost all situations, you can find ways to estimate whether your data feels correct. Remember those interview questions like, “How many fire hydrants do you think are in the city of New York?” Your ability to obtain an answer that is roughly in the ballpark will help you determine if your data is accurate. Because if your data isn’t accurate, there is a potentially disastrous downstream impact.
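
To make that concrete, here is a quick Fermi-style calculation for the fire hydrant question. Every input below is a rough, labeled assumption for illustration, not an official statistic; the goal is only to land within an order of magnitude:

```python
# Fermi estimate: fire hydrants in New York City.
# All inputs are rough assumptions, not official figures.
FEET_PER_MILE = 5280
street_miles = 6_000          # assumption: NYC has roughly 6,000 miles of streets
feet_between_hydrants = 300   # assumption: one hydrant about every 300 feet of street

estimate = street_miles * FEET_PER_MILE // feet_between_hydrants
print(f"~{estimate:,} hydrants")  # on the order of 100,000
```

If a hydrant dataset came back with 10,000 rows or 1,000,000 rows, this five-minute sanity check would immediately flag that something is off.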

For example, a few years ago Amazon had to scrap one of its machine learning models used for candidate recruiting. Why? Because they accidentally built a biased model.

If you’re new to machine learning, here’s how it works at an extremely high level. You take a set of data for an employee (years of work experience, level of education, current title, years in current role, certifications, etc.), put a label (such as “good” or “bad”) on each record, and then have the computer tell you which combination of attributes is “best” for hiring the highest-performing people. The problem is, if you have bias in your original data, your model is going to amplify that bias, which is what Amazon’s model did.
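
You can see the effect even in a toy version of this setup. Below is a deliberately simplified, hypothetical sketch — a frequency-based “model” rather than a real ML pipeline — showing that when historical labels are skewed, anything trained on them inherits the skew:

```python
from collections import Counter

# Hypothetical training data: (gender, label) pairs from a workforce where
# most historical "good" outcomes were recorded for men. The bias lives in
# the data, not in the algorithm.
training = ([("male", "good")] * 80 + [("male", "bad")] * 20 +
            [("female", "good")] * 5 + [("female", "bad")] * 15)

def p_good(gender):
    """Toy 'model': P(good hire | gender) estimated from historical labels."""
    counts = Counter(label for g, label in training if g == gender)
    return counts["good"] / (counts["good"] + counts["bad"])

print(p_good("male"))    # 0.8
print(p_good("female"))  # 0.25
```

A real model uses far more features, but any feature correlated with gender (school, title history, even word choice on a résumé) lets it learn the same pattern indirectly.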

At Amazon, the company’s employees were predominantly male. And given the low numbers of female employees, you could probably imagine that few female employees rose to the highest ranks in the company. As such, the model might tell you that females wouldn’t make for good high-ranking employees and you shouldn’t hire them. Now, I’m sure that you can see that this is a problem, but if you’re letting the model make predictions, one individual at a time, you may never know that there is an issue.

This brings us to the bigger issue: Few people actually invest in ensuring that analytics output is accurate. You might shake your head and say that isn’t true but allow me to share a parallel.

When you buy a car, a phone, a piece of software, or a medication, it has been tested (in most situations). In the United States, the Food and Drug Administration (FDA) approves new pharmaceuticals only after numerous drug trials. For cars, auto manufacturers perform crash safety tests along with other tests to ensure that parts last for a minimum amount of time or wear. And for software, engineers write (or should write) unit, functional, and integration tests. Software companies even hire quality assurance engineers to double-check the work of the software engineers, and the engineers use numerous tools and well-defined processes to reduce defects.

But when it comes to analytics, seldom, if ever, will someone check your work. If you have read some of my other articles, you’ll probably remember that almost every analytics team that I’ve encountered does not use a code repository, conduct code reviews, or build automated tests for their dashboards or data. Worse yet, the companies that make the dashboard visualization tools tell everyone how easy it is to “drag and drop” all of your data into their software. And like magic, you have “analytics.” Unfortunately, these practices, or lack thereof, can contribute to poor decisions being made in your organization.
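
Automated data tests don’t have to be elaborate. Here is a minimal sketch of the kind of check an analytics team could run before publishing a dashboard refresh; the (date, revenue, users) row shape and the rules are hypothetical examples:

```python
# Minimal automated data check, run before a dashboard refresh is published.
# The row shape and the validation rules here are hypothetical examples.

def check_daily_rows(rows):
    """Return a list of human-readable problems found in (date, revenue, users) rows."""
    problems = []
    for i, (date, revenue, users) in enumerate(rows):
        if revenue < 0:
            problems.append(f"row {i} ({date}): negative revenue {revenue}")
        if users == 0 and revenue > 0:
            problems.append(f"row {i} ({date}): revenue booked with zero users")
    return problems

rows = [("2022-02-01", 1200.0, 40),
        ("2022-02-02", -50.0, 38),   # bad feed: refunds double-counted
        ("2022-02-03", 900.0, 0)]    # bad join: user count dropped
print(check_daily_rows(rows))
```

Even a handful of checks like these, run automatically, catches the kind of silent data breakage that otherwise flows straight into an executive’s dashboard.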

Ensuring accuracy is a thankless job, but it will help to protect the company, the people that you work with, society, and your personal image and brand.


“The Why” or “So What”

Every project should have a point, and if you were asked, “Why should I care?” or “So what do I do with this?”, you should have a strong answer. Unfortunately, many times analytics projects get off track from the original task, or maybe there wasn’t a well-defined task to begin with. And by the time you’re done, you have a lot of data, some charts, and fancy dashboards, but they don’t really provide any insights or meaning.

To be a good project, your output doesn’t always have to have an answer. Sometimes the answer is that your findings are inconclusive, and in a way, I guess that is an answer. But you should be able to take away something meaningful, such as where to take action or whether we should care.

For example, what if I were to tell you that our investment revenue is down by $1m this month? Does this matter? Well, the answer is, it depends. If you’re Elon Musk, who’s worth about $240B, the answer is probably no. $1m to Elon Musk is equivalent to someone with $100,000 losing 42 cents. But if you have $1,100,000, losing $1m is quite meaningful.
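
That 42-cent figure is just a proportional scaling, which you can verify in a couple of lines (the net worth figure is the one stated above, used here only for illustration):

```python
# Scale a $1m loss from a ~$240B net worth down to a $100k savings account.
net_worth = 240e9   # ~$240B, as stated above
loss = 1e6          # the $1m drop
savings = 100_000   # the comparison person's savings

equivalent_loss = loss / net_worth * savings
print(f"${equivalent_loss:.2f}")  # about 42 cents
```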

And while context matters, it’s also important to understand what to do next with this information. Sometimes presenting the information is important for KPIs on a dashboard, but if you’re building a project, simply outputting facts isn’t usually enough. You’ll want to have insights and recommendations for what actions you can take based on this information.


Code & Architecture

Much like accuracy, your code (R, SQL, Python) is often overlooked and ignored by most people in your organization. Stakeholders almost certainly don’t care about your code because it’s not the end product. But your current and future analytics peers should care about your code because they are going to be the people that have to borrow, leverage, debug, and maintain the code that you wrote in order to produce a model, dashboard, or results for an ad-hoc request.

Good quality code should be well formatted, properly commented, readable, bug-free, efficient from a query-execution standpoint, and follow best-practice conventions. And while it is often overlooked, poorly written code and poor coding practices lead to massive problems within an analytics organization. Like a virus, once unleashed, the problems multiply rapidly and cause quality issues, multiple sources of [inaccurate] truth, increased costs, slower speed-to-market, and increased frustration. You can read more about these issues in my series, Building a Better Analytics Organization.



Practices

How you go about building your analytics project or analysis matters a lot. These practices are really about architecture, project management, documentation, and intelligent decision making. But again, many of these items tend to be overlooked by most teams and organizations because most people are focused only on the output.

Years ago I worked for a company whose employee review process focused on the “what” and the “how” when rating employees. Unfortunately, they viewed the “how” [you get things done] as “how you communicated” and the “what” [did you deliver] as the output. No value was ever placed on the real “how”, which is not only how you verbally communicate and help colleagues, but how much you improve life for current and future employees through the manner in which you construct your solutions.

For example, two people could each build a new set of dashboards, and the two could generate the same end results. And while most people would be none the wiser, the actual dashboards could be drastically different. Case in point: about 6 years ago I completely re-architected an entire team’s set of dashboards, code, and datasets. Not only did the work remove a major bottleneck (only 5 analysts had access) and enable over 120 analysts across the company to access the data, but it also completely automated the job of those 5 analysts, freeing them to work on more important projects. Even better, the code and tools practically run themselves today, because that’s the way they were architected.

But the practices that you use aren’t limited to code architecture. Everything that you create can one day become technical debt, which can be incredibly expensive if you leave the company or no longer have a perfect memory of exactly what you did and why you did it. For this reason, having proper documentation of your tables, views, and dashboards is extremely important. Also, it’s important to leave breadcrumbs of why you did what you did, which is where Jira and other project management tools and practices come into play because again, context matters.



Intuition

For me, intuition is about making intelligent decisions regarding your project. Sometimes these decisions surround how you architect your project, but many times they come down to whether or not you should even begin the project in the first place.

Frequently in the analytics domain, stakeholders will ask you for a number as an output and many times, they won’t provide you with any context of why they are asking for this number. But this context is extremely important for many of the reasons mentioned above. Without this context, you may end up working on a project that adds little to no value.

For example, if someone asked you to calculate the average revenue per user, you might run off and conduct an in-depth analysis, troubleshoot data issues, chase down edge cases in the data, build a dashboard, and create documentation, investing hours, days, or even weeks in deriving a conclusion. While this could be necessary in some situations, many situations only require a directional or less precise answer. What you spent weeks building could have been determined through 5 minutes of critical thinking and intuition.
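
A directional version of that average-revenue-per-user question can literally be two numbers and a division. The figures here are hypothetical, standing in for numbers you could read off existing reports:

```python
# Directional (back-of-the-envelope) ARPU, instead of a multi-week deep dive.
# Both inputs are hypothetical, as if read off existing dashboards.
monthly_revenue = 2_400_000   # assumed: last month's revenue
active_users = 120_000        # assumed: last month's active users

arpu = monthly_revenue / active_users
print(f"~${arpu:.2f} revenue per user")  # ~$20.00
```

If the stakeholder only needed to know whether ARPU is closer to $2 or $200, this answers the question; the weeks-long version is justified only once you know the precision actually matters.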



Conclusion

While some of these aspects of a project can be hard to measure, they are an important part of determining whether you constructed a “good” analytics project. I hope that you’ll consider each of these components on your next project and reflect on previous projects for opportunities to improve.

Thanks for reading!

