# 5 Essential Machine Learning Ideas

The last article I wrote urged people to use experimentation in order to learn data science. But for those people who need a little more concrete advice, here’s a list of 5 essential ideas that I feel will help any beginning data scientist gain intuition about what is happening when they build and deploy machine learning models.

Instead of giving in-depth details of how to implement each of these ideas, I’m just going to post a brief explanation of each and point to a resource that you can use to learn more.

Learning the intuition behind each of these ideas has really helped me understand what people do when they do machine learning and has continually informed my work and research. Good luck!

# 1. Gradient Descent

Computing the optimal values for the parameters in machine learning models analytically can be very computationally expensive, or outright impossible. Gradient descent instead minimises a loss function iteratively: at each step, the parameters are nudged a little in the direction that reduces the loss the fastest. This saves a lot of the computational work but still arrives at good solutions (in most cases).
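As a toy illustration (my own, not from any course), here's a minimal sketch of gradient descent on the one-parameter loss f(w) = (w − 3)², whose derivative is 2(w − 3). Stepping against the derivative walks w toward the minimum at 3:

```python
def gradient_descent(lr=0.1, steps=100):
    """Minimise f(w) = (w - 3)^2 by repeated small steps downhill."""
    w = 0.0                      # arbitrary starting point
    for _ in range(steps):
        grad = 2 * (w - 3)       # derivative of the loss with respect to w
        w -= lr * grad           # step in the direction of steepest descent
    return w

print(gradient_descent())  # converges to ~3.0
```

Real models have thousands (or billions) of parameters, but the update rule is exactly this, applied to each parameter's partial derivative.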

Gradient descent is used all over the place in machine learning including gradient boosting and neural networks (via backpropagation).

In order to understand and implement Gradient Descent you’ll need to have a basic awareness of partial differentiation (which is covered in Calculus).

To learn more: Check out Andrew Ng’s Machine Learning course on Coursera.

# 2. The Kernel Trick

Separability is *the* goal when you’re trying to do classification. If your data isn’t separable in the space it’s in, you can increase the dimensionality until you’re able to find a separating surface.

You could use the features in your dataset to explicitly create higher-order terms (the ones that increase dimensionality), but computing every observation’s position in the higher-dimensional space gets expensive fast. The kernel trick is the observation that, for many such mappings, the inner product between two mapped points can be computed directly from the original features, without ever materialising the high-dimensional coordinates. The ‘kernel’ is the function that computes that inner product, and it’s all an algorithm needs to compare and measure distances between points, and therefore classify them correctly.
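To make this concrete, here's a small sketch (my own illustration) using the degree-2 polynomial kernel k(x, y) = (x·y)² in 2D. The explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²) lands in 3D, yet the kernel yields the same inner product without ever computing φ:

```python
import math

def phi(x):
    # Explicit map from R^2 to R^3: (x1^2, sqrt(2)*x1*x2, x2^2)
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def kernel(x, y):
    # Degree-2 polynomial kernel: (x . y)^2, computed entirely in R^2
    return (x[0] * y[0] + x[1] * y[1]) ** 2

x, y = (1.0, 2.0), (3.0, 4.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))
print(explicit, kernel(x, y))  # both 121.0: same inner product, no explicit mapping needed
```

For kernels like the RBF kernel the implicit space is infinite-dimensional, so the shortcut isn't just cheaper, it's the only option.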

In order to understand and implement The Kernel Trick (and algorithms like the Support Vector Machine), you’ll need to understand inner products (including the dot product), distance metrics, and matrix transposes (covered in Linear Algebra).

To learn more: Try Christopher Bishop’s book Pattern Recognition and Machine Learning.

# 3. Dimensionality Reduction

It’s difficult to test for correlation in datasets that contain a lot of features. However, it is crucial to the performance of many algorithms (due to their reliance on non-singular, i.e. invertible, matrices) that the features used for learning only add dimensions when they yield increased predictive accuracy.

Dimensionality reduction techniques, including Principal Component Analysis and Linear Discriminant Analysis, are a set of statistical procedures which reduce the dimensionality of the data with minimal loss of predictive power.

PCA, for example, finds a set of orthogonal (mutually perpendicular) directions in the data, ordered by the variance of the data along each one, leaving you with a feature set of uncorrelated transformations. You keep the top few directions and discard the rest.
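Here's a rough sketch of PCA via eigendecomposition of the sample covariance matrix (NumPy assumed; the correlated two-feature dataset is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated features: the second is mostly a rescaled copy of the first.
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2])

Xc = X - X.mean(axis=0)                   # centre the data
cov = Xc.T @ Xc / (len(Xc) - 1)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]            # principal components, highest variance first

# Nearly all the variance lies along the first component,
# so one dimension captures what two correlated features encoded.
explained = eigvals[order] / eigvals.sum()
print(explained)
```

Because the two features are nearly redundant, the first component should explain well over 95% of the variance; projecting onto it alone loses almost nothing.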

In order to understand and implement Dimensionality Reduction techniques you’ll need to have a good grasp of Linear Algebra and the idea of variance.

To learn more: The Elements of Statistical Learning uses Dimensionality Reduction throughout to improve other algorithms.

# 4. Deep Neural Networks

Feature Engineering is one of the crucial parts of any data science project. However, determining features from images and textual data is complex and subject to many nuances. Deep learning has somewhat reduced the need for feature engineering in those disciplines by automagically learning hierarchical features as it iteratively updates node weights.

In order to implement Deep Neural Networks (from scratch) you’ll have to understand Gradient Descent, matrix operations (including dot products) and Logistic Regression (for the sigmoid activation function).
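As a hedged sketch of those ingredients working together (toy data and NumPy assumed, nothing like a production network), here is a two-layer network's forward pass and a single backpropagation step, which is just the chain rule plus one gradient descent update:

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 points with XOR-style binary labels.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# One hidden layer of 4 units; weights initialised small and random.
W1 = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)

def forward(X):
    h = sigmoid(X @ W1 + b1)        # hidden activations
    return h, sigmoid(h @ W2 + b2)  # output probability

def loss(p):
    return float(np.mean((p - y) ** 2))  # squared error, for simplicity

h, p = forward(X)
before = loss(p)

# One backpropagation step: chain rule through both layers.
lr = 0.1
dp = 2 * (p - y) / len(X) * p * (1 - p)   # gradient at the output layer
dW2 = h.T @ dp;              db2 = dp.sum(axis=0)
dh = dp @ W2.T * h * (1 - h)              # gradient pushed back to the hidden layer
dW1 = X.T @ dh;              db1 = dh.sum(axis=0)
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1

after = loss(forward(X)[1])
print(before, after)  # the loss drops after one step
```

Repeat that step many times and the hidden layer learns its own features; deep learning frameworks automate exactly this differentiation for arbitrarily deep stacks of layers.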

To learn more: Check out the Deep Learning Book by Ian Goodfellow (and others), it’s not for the faint of heart, but it’s very good!

# 5. Reinforcement Learning

Why stop at minimising a loss function? Deep Reinforcement Learning trains a network to estimate another function, the Q function, which scores the expected cumulative reward of taking an action in a given state, and the agent acts to *maximise* that value over its lifespan.

Reinforcement Learning is often considered the third branch of machine learning, after Supervised and Unsupervised Learning, and has been used to play Atari games at superhuman levels and beat the world Go champion. Its use in robotics and self-driving cars makes it a very attractive area of machine learning to study.

In order to implement and understand Deep Reinforcement Learning, you’ll need to understand neural networks, Markov decision processes, and Dynamic Programming.
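As a rough sketch of Watkins' tabular Q-learning (the corridor environment and the parameter choices here are my own toy assumptions), the update rule Q(s,a) ← Q(s,a) + α·(r + γ·max Q(s',·) − Q(s,a)) looks like:

```python
import random

random.seed(0)

# A tiny deterministic corridor: states 0..4, reward 1 only for reaching state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # step left or step right

def step(state, action):
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]      # one value per (state, action)
alpha, gamma, eps = 0.5, 0.9, 0.2              # learning rate, discount, exploration

for _ in range(500):                           # episodes
    s = 0
    while True:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda i: Q[s][i])
        nxt, r, done = step(s, ACTIONS[a])
        target = r + (0.0 if done else gamma * max(Q[nxt]))
        Q[s][a] += alpha * (target - Q[s][a])  # the Q-learning update
        if done:
            break
        s = nxt

# The greedy policy should now walk right, toward the goal.
policy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(N_STATES)]
print(policy[:4])  # [1, 1, 1, 1]
```

Swap the table for a neural network that predicts Q values from the state, and you have the core of Deep Q-Networks, which is where the Markov decision process and dynamic programming prerequisites above come together.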

To learn more: Read Chris Watkins’ PhD thesis which introduced Q-Learning.