Hi, I'm Carl Dawson, an AI researcher, freelance data scientist, and founder based in London, UK. I solve difficult problems with data and I love to write about technology and its impact on our lives.
Today, I’m excited to announce a new project.
There’s an idea in machine learning called ‘the unreasonable effectiveness of data’. It describes the tendency of all models to converge on accurate representations, given enough data.
If you haven’t heard yet, consider yourself lucky. There’s a new service called Predictim. I’m not going to link to it because I think it’s abhorrent, but Google’s always there to help.
For the past 10 years, I’ve been working with businesses of all shapes and sizes. I’ve worked on problems that ran the gamut from simple to incredibly complex. All that time, I’ve been trying to extract a framework for achieving results in analytics.
The Pareto principle is everywhere online. “Don’t be a busy fool - only 20% of what you do matters anyway”.
If you’re a data analyst right now, or anyone working in business intelligence, you might be eager to get your hands on the latest and greatest machine learning algorithms. You might be trying to sell your business on the benefits of predictive analytics and patiently waiting for the go-ahead.
It’s high time that higher education was usurped by a cheaper and more flexible alternative. The replacement of broken, centuries-old institutions was one of the primary promises of the information revolution, but this hasn’t yet come to pass for education.
The R vs Python debate has been around a long time. Choosing between these immensely popular languages has been the source of countless infographics, Twitter-wars, and blog posts.
I wanted to write a post about getting a job as a data scientist. But my god, there are so many already out there. There’s advice about portfolio building. There are listicles filled with suggested skills. There are template answers to common interview questions. Reading all these got me thinking: what kind of candidates is this community creating?
Something’s just dawned on me: most of my time at the computer is spent interacting with people who desperately want my attention. Seeing as most of my time is spent at the computer, this leaves me in a kind of tragic position.
Data Scientists are in demand. And while that job title is applied to all sorts of disciplines, the reality is that nearly anybody with some analytical and predictive capabilities can (safely) call themselves a data scientist.
Imbalanced learning problems often stump those new to dealing with them. When the ratio between classes in your data is 1:100 or larger, early attempts to model the problem are rewarded with very high accuracy but very low sensitivity: the model can score well simply by predicting the majority class, while recall on the rare class collapses. You can solve the sensitivity problem in imbalanced learning in a few different ways:
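One common remedy is class weighting, which penalises minority-class mistakes more heavily. A minimal sketch, using synthetic data and scikit-learn purely for illustration (the 1:100-style split and model choice are assumptions, not from the post):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Build an imbalanced problem: ~99% negatives, ~1% positives.
X, y = make_classification(
    n_samples=20_000, n_features=20, weights=[0.99, 0.01], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# 'balanced' reweights each class inversely to its frequency, so a
# missed minority example costs the model roughly 100x more.
naive = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)

print("naive recall:   ", recall_score(y_te, naive.predict(X_te)))
print("weighted recall:", recall_score(y_te, weighted.predict(X_te)))
```

Resampling (over-sampling the minority or under-sampling the majority) achieves a similar effect by changing the data rather than the loss.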
When you undertake feature engineering for a new project, two outcomes are most likely:
Today I met my PhD supervisor face-to-face for the first time. Naturally, most of what we discussed relates to my specific research topic, but we did cover a few general things which may be of use to others (and to myself in the future!)
Something that comes up surprisingly often in my data work is the idea of capturing local (in the geographical sense) patterns, whether that’s modelling an individual’s likelihood to make a purchase based on their neighbours’ activity (the classic Keeping Up With the Joneses!) or predicting crime risk using local crime history.
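One simple way to encode that local signal is to give each point a feature summarising its nearest geographical neighbours. A sketch, assuming made-up coordinates and a binary "activity" flag (all names and data here are invented for illustration):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(500, 2))   # (x, y) positions of individuals
activity = rng.integers(0, 2, size=500)      # e.g. did this person purchase?

# Query k+1 neighbours because each point's nearest neighbour is itself.
k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(coords)
_, idx = nn.kneighbors(coords)

# Drop the self-match in column 0, then average the neighbours' activity:
# the resulting rate is a "what are the Joneses doing?" feature.
neighbour_rate = activity[idx[:, 1:]].mean(axis=1)
print(neighbour_rate[:5])
```

The same pattern works for crime risk: replace the purchase flag with historical incident counts and average (or distance-weight) over nearby locations.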
A lot of the projects I work on are time-bound in one way or another. My clients need to know the churn rate next week, the risk of fraud next month, their anticipated revenue next quarter. But what features does a model need to do this well?
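The usual answer starts with lagged and rolling views of the target itself, shifted so the model only ever sees the past. A minimal sketch with pandas, on an invented monthly revenue series (the numbers are illustrative, not client data):

```python
import pandas as pd

revenue = pd.DataFrame(
    {"revenue": [100, 120, 130, 125, 140, 150, 160, 155]},
    index=pd.period_range("2023-01", periods=8, freq="M"),
)

# Lag features: what did revenue look like one and two months ago?
revenue["lag_1"] = revenue["revenue"].shift(1)
revenue["lag_2"] = revenue["revenue"].shift(2)

# Rolling mean of the previous three months. Shifting before rolling
# keeps the current month out of its own feature (no target leakage).
revenue["rolling_3"] = revenue["revenue"].shift(1).rolling(3).mean()

print(revenue)
```

Churn and fraud versions look the same, just with event counts or rates in place of revenue; the shift-before-aggregate step is what keeps the features honest.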