Machine Learning is Complicated.
I get it.
I help people like you discover machine learning.
You should never ask a ‘Can I?’ question until you’ve tried the thing you’re asking about. If you’re looking for a job, asking random strangers online whether you’ll be successful is a losing proposition. If they say yes, you’ve wasted time; if they say no, you’re defeated before you’ve begun.
Whether I’m speaking to a client, a student, or a distant family member, people always ask me for examples of how I’ve applied Machine Learning in the real world. It seems that even though we’re being bombarded by articles and tutorials, some context is missing. In this article I’m going to discuss three (semi-)recent projects of mine so that you can better understand how machine learning and data science work in practice.
The last article I wrote urged people to use experimentation in order to learn data science. But for those people who need more concrete advice, here are five essential ideas that I feel will help any beginning data scientist build intuition about what is happening when they build and deploy machine learning models.
Right now, I’m in a fairly unique position. On the one hand I’m writing a book (The Science of Data Science), which I hope will be as inclusive and as easy to read as possible. On the other, I’m trying to settle on a topic for my PhD thesis, which means going out to the edges of the known and poking around to see where the wall gives.
While it’s true that the best data science is done by those who know their organisation very well, there’s a lot about data science that lends itself well to consulting-style engagements. I’ve worked as a freelancer in data science (and analytics more generally) for the better part of a decade, and in this post I’ll be showing how you can freelance using your data science skills.
Churn prediction is difficult. Before you can do anything to prevent customers from leaving, you need to know everything from who’s going to leave and when, to how much it will impact your bottom line. In this post I’m going to explain some techniques for churn prediction and prevention using survival analysis.
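To make that concrete, here is a minimal, dependency-free sketch of the Kaplan-Meier estimator - the workhorse of survival analysis - framed for a churn setting. Names and the toy data are illustrative, not from the post itself; in practice a library such as lifelines handles this (and much more) for you.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival curve: S(t) = product over event times
    t_i <= t of (1 - d_i / n_i), where d_i is the number of churn
    events at t_i and n_i the number of customers still at risk.
    `events[i]` is 1 if customer i churned, 0 if censored (still active).
    Returns a list of (time, survival probability) pairs."""
    times = sorted({t for t, e in zip(durations, events) if e == 1})
    s, curve = 1.0, []
    for t in times:
        n_at_risk = sum(1 for d in durations if d >= t)          # still subscribed just before t
        d_events = sum(1 for d, e in zip(durations, events)      # churn events exactly at t
                       if d == t and e == 1)
        s *= 1.0 - d_events / n_at_risk
        curve.append((t, s))
    return curve

# toy example: months subscribed; 1 = churned, 0 = still a customer
curve = kaplan_meier([2, 3, 3, 5, 8, 8, 12], [1, 1, 0, 1, 1, 0, 0])
```

The key point for churn work is the censoring flag: customers who haven’t churned yet still contribute information, which a naive classifier trained only on past churners would throw away.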
There’s an idea in machine learning called ‘the unreasonable effectiveness of data’. It describes the observation that, given enough data, even simple models tend to perform remarkably well - the amount of data often matters more than the choice of algorithm.
For the past 10 years, I’ve been working with businesses of all shapes and sizes. I’ve worked on problems that ran the gamut from simple to incredibly complex. All that time, I’ve been trying to extract a framework for achieving results in analytics.
The Pareto principle is everywhere online. “Don’t be a busy fool - only 20% of what you do matters anyway”.
If you’re a data analyst right now, or anyone working in business intelligence, you might be eager to get your hands on the latest and greatest machine learning algorithms. You might be trying to sell your business on the benefits of predictive analytics and patiently waiting for the go ahead.
It’s high time that higher education was usurped by a cheaper and more flexible alternative. The replacement of broken, centuries-old institutions was one of the primary promises of the information revolution, but this hasn’t yet come to pass for education.
The R vs Python debate has been around a long time. Choosing between these immensely popular languages has been the source of countless infographics, Twitter-wars, and blog posts.
I wanted to write a post about getting a job as a data scientist. But my god, there are so many already out there. There’s advice about portfolio building. There are listicles filled with suggested skills. There are template answers to common interview questions. Reading all these got me thinking - what kind of candidates is this community creating?
Data Scientists are in demand. And while that job title is applied to all sorts of disciplines, the reality is that nearly anybody with some analytical and predictive capabilities can (safely) call themselves a data scientist.
Imbalanced learning problems often stump those new to dealing with them. When the class ratio in your data is 1:100 or worse, early attempts to model the problem are rewarded with very high accuracy but very low sensitivity on the minority class - the model simply learns to predict the majority class. You can solve this sensitivity problem in imbalanced learning in a few different ways:
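Common remedies include resampling, class weighting, moving the decision threshold, and optimising a metric other than accuracy. As one illustrative sketch (not necessarily the list this post goes on to give), here is a plain-Python random oversampler that duplicates minority-class rows until the classes are balanced:

```python
import random

def oversample_minority(X, y, minority_label=1, seed=0):
    """Random oversampling: duplicate minority-class rows (sampled
    with replacement) until both classes have equal counts.
    A simple baseline; class weights or SMOTE are common alternatives."""
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t == minority_label]
    majority = [(x, t) for x, t in zip(X, y) if t != minority_label]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    rows = majority + minority + extra
    rng.shuffle(rows)                      # avoid all duplicates sitting together
    X_bal, y_bal = zip(*rows)
    return list(X_bal), list(y_bal)

# 5 positives vs 95 negatives -> balanced to 95 vs 95
X = [[i] for i in range(100)]
y = [1] * 5 + [0] * 95
X_bal, y_bal = oversample_minority(X, y)
```

Note that oversampling must happen after the train/test split - duplicating rows before splitting leaks copies of test examples into training.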
When you undertake feature engineering for a new project, two outcomes are most likely:
Today I met my PhD supervisor face-to-face for the first time. Obviously, most of what was said primarily relates to my specific research topic, but we did discuss a few general things which may be of use to others (and myself in the future!)
Something that comes up surprisingly often in my data work is the idea of capturing local (in the geographical sense) patterns - whether that’s modelling an individual’s likelihood to make a purchase based on their neighbours’ activity (classic Keeping Up With the Joneses!) or predicting crime risk from local crime history.
A lot of the projects I work on are time-bound in one way or another. My clients need to know the churn rate next week, the risk of fraud next month, their anticipated revenue next quarter. But what features does a model need to do this well?
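A common starting point for such time-bound predictions is lag and rolling-window features: what happened one, two, three periods ago, and the recent average. The sketch below (lag choices and the window size are illustrative, not a recommendation) turns a series into one-step-ahead training rows:

```python
def lag_features(series, lags=(1, 2, 3), window=3):
    """Build supervised rows for one-step-ahead forecasting:
    each row is (lagged values..., rolling mean) and the target
    is the next value in the series. Returns (X, y)."""
    X, y = [], []
    start = max(max(lags), window)        # first index with full history
    for t in range(start, len(series)):
        row = [series[t - lag] for lag in lags]            # lagged values
        row.append(sum(series[t - window:t]) / window)     # rolling mean
        X.append(row)
        y.append(series[t])
    return X, y

# toy series: each row predicts the next value from the previous three
X, y = lag_features([1, 2, 3, 4, 5, 6])
```

The same construction extends naturally to churn, fraud, or revenue: the features are always "what we knew as of time t", and the target is what happened afterwards - which also makes leakage easy to spot.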