Tips – Data analysis workflow

An analysis is almost always a special type of project. Like any discipline, analytics needs its own workflow and methodology to deliver real results. I have often seen workflows that do not fit analytics work (such as software development methodologies) forced onto these projects, which damages the end results. There is no single right answer to how to run such projects, but I will try to offer some insight.

This post was originally inspired by an article I read on DataScience 101. It points out that delivering data products is the key objective of any analytics project, yet a data product is not a software product. Code is an essential element in many cases, and software engineering is the closest workflow discipline, but the two are not the same.

Take the concept of the burn-down chart. It presumes steady, linear progress, like in a factory. Now I ask you: how many times have you come across a new key finding in the middle of an analysis exercise and had to restart the whole workflow because the new insight changed the key concept? I bet it has already happened a few times.

Principles

There are already a few methods out there for a specific analytics or data science workflow, which I will introduce in a moment. They share some basic principles and steps:

  1. Understand the context first – in many cases also called business understanding
  2. Do spend time on preparing the data
  3. While going through the workflow prepare for iterations – points where you need to take a step back and restart certain parts of the process

Of the most important general methodologies, I would like to highlight two. There are others, and new ones will keep coming, but these two seem to be the most accepted and mature workflow concepts.

CRISP-DM

The oldest and most popular method is CRISP-DM, which was originally designed for data mining projects. Because of this, the workflow is relatively close to software engineering, but it still has some specifics. It comprises 6 steps, listed below:

  1. Business Understanding
  2. Data Understanding
  3. Data Preparation
  4. Modeling
  5. Evaluation
  6. Deployment

Figure: CRISP-DM workflow steps
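
To make the iterative nature of these steps more concrete, here is a minimal Python sketch. The six phase names come from the list above; everything else is an assumption made for illustration. In particular, the `STEP_BACK` table reflects one reading of the back-arrows usually drawn in the CRISP-DM diagram, and the `next_phase` and `run` helpers are hypothetical names, not part of any library.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Assumption: where a phase sends you when a finding forces a step back.
# Phases not listed here have no back-edge in this sketch.
STEP_BACK = {
    Phase.DATA_UNDERSTANDING: Phase.BUSINESS_UNDERSTANDING,
    Phase.MODELING: Phase.DATA_PREPARATION,
    Phase.EVALUATION: Phase.BUSINESS_UNDERSTANDING,
}


def next_phase(current, needs_rework=False):
    """Return the phase that follows `current`, or None after deployment."""
    if needs_rework and current in STEP_BACK:
        return STEP_BACK[current]
    if current is Phase.DEPLOYMENT:
        return None
    return Phase(current.value + 1)


def run(rework_at):
    """Walk the workflow; each phase in `rework_at` triggers one step back."""
    pending = set(rework_at)
    current = Phase.BUSINESS_UNDERSTANDING
    path = []
    while current is not None:
        path.append(current)
        rework = current in pending
        pending.discard(current)  # each rework happens only once
        current = next_phase(current, rework)
    return path


if __name__ == "__main__":
    # A surprise during evaluation sends the project back to the start.
    for phase in run([Phase.EVALUATION]):
        print(phase.name)
```

A project with no surprises visits each phase once. A single surprise during evaluation nearly doubles the number of phase visits, which is exactly the kind of non-linear progress a burn-down chart does not anticipate.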

Data science project lifecycle

It can be considered an upgraded version of the CRISP-DM methodology. What I particularly like about the data science project lifecycle is that the iterative, back-and-forth nature of data analysis shines through clearly in this process. It is a 7-step process that focuses more on the actual analysis work:

  1. Data acquisition
  2. Data preparation
  3. Hypothesis and modeling
  4. Evaluation and Interpretation
  5. Deployment
  6. Operations
  7. Optimization

Figure: Data Science project lifecycle

In conclusion

I’m certain there are more, and maybe even better, analysis workflows out there. The point is that you should figure out what works for you: rather than simply copying other methods, build on the key principles and unique aspects of data analysis.

Do you know more resources or have questions? Let me know in the comments below!

