# Learning Math for Machine Learning

Vincent Chen is a student at Stanford University studying Computer Science. He is also a Research Assistant at the Stanford AI Lab.

It’s not entirely clear what level of mathematics is necessary to get started in machine learning, especially for those who didn’t study math or statistics in school.

In this piece, my goal is to suggest the mathematical background necessary to build products or conduct academic research in machine learning. These suggestions are derived from conversations with machine learning engineers, researchers, and educators, as well as my own experiences in both machine learning research and industry roles.

To frame the math prerequisites, I first propose different mindsets and strategies for approaching your math education outside of traditional classroom settings. Then, I outline the specific backgrounds necessary for different kinds of machine learning work, as these subjects range from high school-level statistics and calculus to the latest developments in probabilistic graphical models (PGMs). By the end of the post, my hope is that you’ll have a sense of the math education you’ll need to be effective in your machine learning work, whatever that may be!

To preface the piece, I acknowledge that learning styles, frameworks, and resources are unique to each learner's personal needs and goals — your opinions would be appreciated in the discussion on HN!

## A Note on Math Anxiety
It turns out that a lot of people — including engineers — are scared of math. To begin, I want to address the myth of “being good at math.”

The truth is, people who are good at math have lots of practice doing math. As a result, they’re comfortable being stuck while doing math. A student’s mindset, as opposed to innate ability, is the primary predictor of one’s ability to learn math (as shown by recent studies).

To be clear, it will take time and effort to achieve this state of comfort, but it’s certainly not something you’re born with. The rest of this post will help you figure out what level of mathematical foundation you need and outline strategies for building it.

https://www.ycombinator.com/library/51-learning-math-for-machine-learning

# A Concrete Introduction to Probability (using Python)

This notebook covers the basics of probability theory, with Python 3 implementations. (You should have some background in probability and Python.)

In 1814, Pierre-Simon Laplace wrote:

Probability … is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible … when nothing leads us to expect that any one of these cases should occur more than any other.

Laplace really nailed it, way back then! If you want to untangle a probability problem, all you have to do is be methodical about defining exactly what the cases are, and then careful in counting the number of favorable and total cases. We’ll start being methodical by defining some vocabulary:

• Experiment: An occurrence with an uncertain outcome that we can observe.
For example, rolling a die.
• Outcome: The result of an experiment; one particular state of the world. What Laplace calls a “case.”
For example: `4`.
• Sample Space: The set of all possible outcomes for the experiment.
For example, `{1, 2, 3, 4, 5, 6}`.
• Event: A subset of possible outcomes that together have some property we are interested in.
For example, the event “even die roll” is the set of outcomes `{2, 4, 6}`.
• Probability: As Laplace said, the probability of an event with respect to a sample space is the number of favorable cases (outcomes from the sample space that are in the event) divided by the total number of cases in the sample space. (This assumes that all outcomes in the sample space are equally likely.) Since it is a ratio, probability will always be a number between 0 (representing an impossible event) and 1 (representing a certain event).
For example, the probability of an even die roll is 3/6 = 1/2.
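
Laplace's definition translates almost directly into code. Here is a minimal sketch (my own, not necessarily the notebook's exact implementation) using Python's `fractions.Fraction` so the result stays an exact ratio:

```python
from fractions import Fraction

def P(event, space):
    """The probability of an event, given a sample space of equiprobable
    outcomes: the number of favorable cases divided by all the cases."""
    return Fraction(len(event & space), len(space))

die = {1, 2, 3, 4, 5, 6}   # sample space for one die roll
even = {2, 4, 6}           # the event "even die roll"

print(P(even, die))        # 1/2
```

Intersecting the event with the sample space (`event & space`) guards against events that mention outcomes not actually in the space.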

This notebook will develop all these concepts; I also have a second part that covers paradoxes in Probability Theory.

http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb

# PyThalesians – Open Source Financial Library

PyThalesians is a Python financial library developed by the Thalesians (http://www.thalesians.com). I have used the library to develop my own trading strategies and I’ve included simple samples which show some of the functionality including an FX trend following model and other bits of financial analysis.

There are many open source Python libraries for building trading strategies! However, I’ve developed this one to be as flexible as possible in the types of strategies you can develop with it. In addition, much of the library can be used to analyse and plot financial data for broader-based analysis, of the kind I’ve had to do working in markets over the years. Hence, it can be used by a wider array of users.

At present, PyThalesians offers:

• Backtesting of systematic trading strategies for cash markets (including cross sectional style trading strategies)
• Sensitivity analysis for systematic trading strategy parameters
• Seamless historic data downloading from Bloomberg (requires licence), Yahoo, Quandl, Dukascopy and other market data sources
• Produces beautiful line plots with PyThalesians wrapper (via Matplotlib), Plotly (via cufflinks) and a simple wrapper for Bokeh
• Seasonality analysis of markets
• Calculation of technical indicators, with trading signals based on these
• Helper functions built on top of Pandas
• Automatic tweeting of charts
• And much more!
• Please bear in mind that PyThalesians is currently a highly experimental alpha project and isn’t yet fully documented
• Uses Apache 2.0 licence
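
To give a flavour of what a trend-following model like the FX sample does, here is a toy moving-average crossover signal in plain Python. This sketch uses none of PyThalesians' actual APIs (see the repo for the library's real examples); the price series is made up:

```python
def sma(prices, window):
    """Simple moving average; None until enough history has accumulated."""
    return [None if i + 1 < window
            else sum(prices[i + 1 - window:i + 1]) / window
            for i in range(len(prices))]

def trend_signal(prices, fast=2, slow=4):
    """+1 (long) when the fast average is above the slow one, else -1 (short)."""
    f, s = sma(prices, fast), sma(prices, slow)
    return [None if s[i] is None else (1 if f[i] > s[i] else -1)
            for i in range(len(prices))]

# Hypothetical FX rates: an uptrend that rolls over into a downtrend.
prices = [1.10, 1.12, 1.15, 1.14, 1.16, 1.13, 1.11]
print(trend_signal(prices))
```

A real backtest would also handle transaction costs, position sizing, and data alignment — exactly the plumbing a library like this exists to provide.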

https://github.com/thalesians/pythalesians

# Why Percentiles Don’t Work the Way you Think

“Customers ask us for p99 (99th percentile) of metrics pretty frequently.

It’s a request that certainly makes sense, and we plan to add such a functionality to VividCortex (more on that later). But a lot of the time, when customers make this request, they actually have something very specific in mind — something problematic. They’re not asking for the 99th percentile of a metric, they’re asking for a metric of 99th percentile. This is very common in systems like Graphite, and it doesn’t achieve what many people seem to think it does. This blog post explains how you might have the wrong idea™ about percentiles, the degree of the mistake (it depends), and what you can do instead…”
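
The core problem is that percentiles don't aggregate: averaging per-interval p99 values (a "metric of 99th percentile") is not the p99 over all the raw observations. A small pure-Python illustration with made-up latency data (nearest-rank percentile; not VividCortex's implementation):

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the data is less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical response times (ms) for two one-minute windows.
minute_1 = [10] * 100               # a calm minute
minute_2 = [10] * 90 + [1000] * 10  # a minute with a burst of slow requests

avg_of_p99s = (percentile(minute_1, 99) + percentile(minute_2, 99)) / 2
global_p99 = percentile(minute_1 + minute_2, 99)

print(avg_of_p99s)  # 505.0 -- the "metric of 99th percentile" view
print(global_p99)   # 1000  -- the true p99 over all requests
```

The averaged per-minute figure understates the tail by almost half, which is exactly the kind of distortion the post is warning about.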

https://www.vividcortex.com/blog/why-percentiles-dont-work-the-way-you-think

# Beginning deep learning with 500 lines of Julia

“There are a number of deep learning packages out there. However most sacrifice readability for efficiency. This has two disadvantages: (1) It is difficult for a beginner student to understand what the code is doing, which is a shame because sometimes the code can be a lot simpler than the underlying math. (2) Every other day new ideas come out for optimization, regularization, etc. If the package used already has the trick implemented, great. But if not, it is difficult for a researcher to test the new idea using impenetrable code with a steep learning curve. So I started writing KUnet.jl which currently implements backprop with basic units like relu, standard loss functions like softmax, dropout for generalization, L1-L2 regularization, and optimization using SGD, momentum, ADAGRAD, Nesterov’s accelerated gradient etc. in less than 500 lines of Julia code. Its speed is competitive with the fastest GPU packages (here is a benchmark). For installation and usage information, please refer to the GitHub repo. The remainder of this post will present (a slightly cleaned up version of) the code as a beginner’s neural network tutorial (modeled after Honnibal’s excellent parsing example)…”
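
The post's point that "the code can be a lot simpler than the underlying math" holds beyond Julia: the whole backprop loop — forward pass, loss, gradient, SGD update — fits in a few lines. Here is a deliberately tiny pure-Python sketch of one relu unit trained with SGD (illustrative only; KUnet.jl's actual code is Julia and operates on whole layers):

```python
def train(data, lr=0.05, epochs=1000):
    """Fit y_hat = relu(w * x + b) to (x, y) pairs with squared loss and SGD."""
    w, b = 0.5, 0.0
    for _ in range(epochs):
        for x, y in data:
            pre = w * x + b                        # forward: pre-activation
            y_hat = max(0.0, pre)                  # forward: relu
            err = y_hat - y                        # dL/dy_hat for L = 0.5 * err**2
            grad_pre = err if pre > 0 else 0.0     # backprop through relu
            w -= lr * grad_pre * x                 # SGD update
            b -= lr * grad_pre
    return w, b

# Toy data drawn from y = 2x; the unit should recover w ~= 2, b ~= 0.
w, b = train([(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]])
```

Momentum, ADAGRAD, dropout, and the rest of the tricks the post mentions are all local modifications of this same loop.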

http://www.denizyuret.com/2015/02/beginning-deep-learning-with-500-lines.html

# How juries are fooled by statistics

“Oxford mathematician Peter Donnelly reveals the common mistakes humans make in interpreting statistics — and the devastating impact these errors can have on the outcome of criminal trials…”
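
One of the errors Donnelly describes is the prosecutor's fallacy: confusing P(evidence | innocent) with P(innocent | evidence). With a rare trait and a large pool of potential suspects, Bayes' rule gives a very different answer than intuition suggests. A quick sketch with illustrative numbers (not figures from the talk):

```python
def posterior_guilt(prior_guilt, p_match_if_innocent, p_match_if_guilty=1.0):
    """P(guilty | match) via Bayes' rule."""
    num = p_match_if_guilty * prior_guilt
    den = num + p_match_if_innocent * (1 - prior_guilt)
    return num / den

# A forensic match with a 1-in-a-million false-positive rate sounds damning,
# but if the suspect was picked from a city of 10 million, the prior is 1e-7:
print(posterior_guilt(prior_guilt=1e-7, p_match_if_innocent=1e-6))  # ~0.09
```

The match raises the probability of guilt to roughly 9%, not the 99.9999% a jury might infer from the quoted false-positive rate alone.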