Hi, I'm Carl Dawson, an AI researcher, freelance data scientist, and founder based in London, UK. I solve difficult problems with data and I love to write about technology and its impact on our lives.


  • Churn Prediction as a Service

    Today, I’m excited to announce a new project.


  • The fuel on the fire

    There’s an idea in machine learning called ‘the unreasonable effectiveness of data’. It describes the tendency of all models to converge on accurate representations, given enough data.


  • Services, not people

    If you haven’t heard yet, consider yourself lucky. There’s a new service called Predictim. I’m not going to link to it because I think it’s abhorrent, but Google’s always there to help.


  • How I do data science

    For the past 10 years, I’ve been working with businesses of all shapes and sizes. I’ve worked on problems that ran the gamut from simple to incredibly complex. All that time, I’ve been trying to extract a framework for achieving results in analytics.


  • The inverse Pareto principle

    The Pareto principle is everywhere online. “Don’t be a busy fool - only 20% of what you do matters anyway”.


  • Remember the users

    If you’re a data analyst right now, or anyone working in business intelligence, you might be eager to get your hands on the latest and greatest machine learning algorithms. You might be trying to sell your business on the benefits of predictive analytics and patiently waiting for the go-ahead.


  • The Achilles' heel of online education

    It’s high time that higher education was usurped by a cheaper and more flexible alternative. The replacement of broken, centuries-old institutions was one of the primary promises of the information revolution, but this hasn’t yet come to pass for education.


  • R vs Python: why I'm going back to R

    The R vs Python debate has been around a long time. Choosing between these immensely popular languages has been the source of countless infographics, Twitter wars, and blog posts.


  • How to hire a data scientist

    I wanted to write a post about getting a job as a data scientist. But my god, there are so many already out there. There’s advice about portfolio building. There are listicles filled with suggested skills. There are template answers to common interview questions. Reading all these got me thinking - what kind of candidates is this community creating?


  • Stepping out into the internet

    Something’s just dawned on me - most of the time I spend at the computer is spent interacting with people who desperately want my attention. Seeing as most of my time is spent at the computer, this leaves me in a kind of tragic position.


  • Why I choose to be a freelance data scientist

    Data scientists are in demand. And while that job title is applied to all sorts of disciplines, the reality is that nearly anybody with some analytical and predictive capabilities can (safely) call themselves a data scientist.


  • Outlier detection with one-class SVMs

    Imbalanced learning problems often stump those new to dealing with them. When the ratio between classes in your data is 1:100 or larger, early attempts to model the problem are rewarded with very high accuracy but very low specificity. You can solve the specificity problem in imbalanced learning in a few different ways:


  • Finding and fixing multicollinearity

    When you undertake feature engineering for a new project, two outcomes are most likely:


  • Meeting my supervisor

    Today I met my PhD supervisor face-to-face for the first time. Obviously, most of what was said relates to my specific research topic, but we did discuss a few general things that may be of use to others (and to myself in the future!).


  • Geographic features in SQL

    Something that comes up surprisingly often in my data work is the idea of capturing local (in the geographical sense) patterns, whether that means modelling an individual’s likelihood to make a purchase based on their neighbours’ activity (classic Keeping Up With the Joneses!) or predicting crime risk using local crime history.


  • Slopes as features: making time-sensitive predictions

    A lot of the projects I work on are time-bound in one way or another. My clients need to know the churn rate next week, the risk of fraud next month, their anticipated revenue next quarter. But what features does a model need to do this well?