The Pareto principle is everywhere online. “Don’t be a busy fool - only 20% of what you do matters anyway.”

For those of you who don’t know, the Pareto principle maintains that you achieve 80% of your results from 20% of your effort. I’m not sure Pareto himself meant for it to be applied indiscriminately across nearly every area of human endeavour, and I’m fairly sure he didn’t mean to suggest that the remaining 20% of results wasn’t worthwhile.

I think he was just trying to frame your expectations when it came to return on time-investment.

In either case, I’m not sure it holds in machine learning and data science.

The first part of a machine learning project feels like climbing a steep hill. You do feature engineering blindfolded, you scrape websites, you write queries. You do statistical tests, you prepare exploratory charts.

You clean up the data. You try a simple model, something linear. You get more exotic: decision trees, SVMs, AdaBoost, XGBoost, over-sampling, under-sampling, grid search.
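That “try a simple model, then grid-search it” loop can be sketched in a few lines. This is a toy illustration only, not anything from the original project: a one-parameter threshold classifier on synthetic data, with a hand-rolled grid search standing in for what a real library (scikit-learn, say) would do.

```python
# Minimal sketch of the model-then-grid-search loop on synthetic 1-D data.
# Everything here (data, model, grid) is hypothetical, for illustration.
import random

random.seed(0)

# Synthetic data: the positive class tends to have larger x.
data = [(random.gauss(0.0, 1.0), 0) for _ in range(200)] + \
       [(random.gauss(1.5, 1.0), 1) for _ in range(200)]
random.shuffle(data)
train, test = data[:300], data[300:]

def accuracy(threshold, rows):
    """Fraction of rows correctly classified by 'predict 1 if x > threshold'."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)

# "Grid search": sweep candidate thresholds on the training split only,
# then report performance on the held-out test split.
grid = [i / 10 for i in range(-10, 21)]
best = max(grid, key=lambda t: accuracy(t, train))

print(f"best threshold: {best:.1f}")
print(f"test accuracy:  {accuracy(best, test):.2f}")
```

The shape is the same whatever the model: fit candidates on one split, pick by a score, and hold out data you never touched for the final check.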

You get something that kind of works.

Then you get sidetracked. Maybe you need to make an API. Maybe the underlying data source has changed meaningfully. Maybe the product you’re interfacing with was scrapped or altered. Maybe you decide to reframe the problem in a new way or try something else. You go back to the drawing board.

It doesn’t feel like it, but you’re 80% done.

You’ll have a sudden insight, things will start clicking, and you’ll speed down the hill.

You’ll get 80% of the results for 20% of the effort, but only after you’ve expended that first 80%. In data science, the Pareto principle runs the other way round: we have to work hard to earn that first fifth of the results. We have to experiment. It’s called data science for a reason.