A Concrete Introduction to Probability (using Python)

This notebook covers the basics of probability theory, with Python 3 implementations. (You should have some background in probability and Python.)

In 1814, Pierre-Simon Laplace wrote:

Probability … is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible … when nothing leads us to expect that any one of these cases should occur more than any other.

Laplace really nailed it, way back then! If you want to untangle a probability problem, all you have to do is be methodical about defining exactly what the cases are, and then careful in counting the number of favorable and total cases. We’ll start being methodical by defining some vocabulary:

  • Experiment: An occurrence with an uncertain outcome that we can observe.
    For example, rolling a die.
  • Outcome: The result of an experiment; one particular state of the world. What Laplace calls a “case.”
    For example: 4.
  • Sample Space: The set of all possible outcomes for the experiment.
    For example, {1, 2, 3, 4, 5, 6}.
  • Event: A subset of possible outcomes that together have some property we are interested in.
    For example, the event “even die roll” is the set of outcomes {2, 4, 6}.
  • Probability: As Laplace said, the probability of an event with respect to a sample space is the number of favorable cases (outcomes from the sample space that are in the event) divided by the total number of cases in the sample space. (This assumes that all outcomes in the sample space are equally likely.) Since it is a ratio, probability will always be a number between 0 (representing an impossible event) and 1 (representing a certain event).
    For example, the probability of an even die roll is 3/6 = 1/2.
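
The vocabulary above translates almost directly into Python sets. What follows is a minimal sketch under the equally-likely-outcomes assumption, not necessarily the notebook's own code; the names P, D, and even are chosen here for illustration.

```python
from fractions import Fraction

def P(event, space):
    """Laplace's definition: favorable cases divided by all possible cases.

    Assumes every outcome in `space` is equally likely.
    """
    favorable = event & space  # only outcomes that are actually in the sample space
    return Fraction(len(favorable), len(space))

D = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a die
even = {2, 4, 6}         # event: "even die roll"

print(P(even, D))  # 1/2
```

Using Fraction rather than floating-point division keeps the answer exact, so 3/6 is reported as 1/2.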

This notebook will develop all these concepts; I also have a second part that covers paradoxes in Probability Theory.

http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb

PyThalesians – Open Source Financial Library

PyThalesians is a Python financial library developed by the Thalesians (http://www.thalesians.com). I have used the library to develop my own trading strategies and I’ve included simple samples which show some of the functionality including an FX trend following model and other bits of financial analysis.

There are many open source Python libraries around for building trading strategies. However, I’ve developed this one to be as flexible as possible in the types of strategies you can develop with it. In addition, much of the library can be used to analyse and plot financial data for broader analysis, of the kind I’ve had to do while working in markets over the years. Hence, it can be used by a wider range of users.

At present, PyThalesians offers:

  • Backtesting of systematic trading strategies for cash markets (including cross sectional style trading strategies)
  • Sensitivity analysis for systematic trading strategies parameters
  • Seamless historic data downloading from Bloomberg (requires licence), Yahoo, Quandl, Dukascopy and other market data sources
  • Line plots with a PyThalesians wrapper (via Matplotlib), Plotly (via cufflinks) and a simple wrapper for Bokeh
  • Seasonality analysis of markets
  • Calculation of some technical indicators, with trading signals based on them
  • Helper functions built on top of Pandas
  • Automatic tweeting of charts
  • And much more!
  • Please bear in mind that PyThalesians is currently a highly experimental alpha project and isn’t yet fully documented
  • Uses Apache 2.0 licence

https://github.com/thalesians/pythalesians

Why Percentiles Don’t Work the Way you Think

“Customers ask us for p99 (99th percentile) of metrics pretty frequently.

It’s a request that certainly makes sense, and we plan to add such a functionality to VividCortex (more on that later). But a lot of the time, when customers make this request, they actually have something very specific in mind — something problematic. They’re not asking for the 99th percentile of a metric, they’re asking for a metric of 99th percentile. This is very common in systems like Graphite, and it doesn’t achieve what many people seem to think it does. This blog post explains how you might have the wrong idea™ about percentiles, the degree of the mistake (it depends), and what you can do instead…”
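
One reading of the distinction in the excerpt is that a percentile computed per time window and then averaged is not the percentile of the underlying data. The sketch below illustrates that with made-up latencies; the nearest-rank percentile helper and the two windows are assumptions for illustration, not taken from the post.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile: the smallest value with at least q% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical request latencies in milliseconds for two time windows.
quiet = [10] * 100                 # 100 requests, all fast
busy = [10] * 900 + [500] * 100    # 1000 requests, 10% of them slow

p99_quiet = percentile(quiet, 99)
p99_busy = percentile(busy, 99)

# "A metric of the 99th percentile": average the per-window p99 values.
average_of_p99s = (p99_quiet + p99_busy) / 2

# "The 99th percentile of the metric": compute p99 over all requests.
true_p99 = percentile(quiet + busy, 99)

print(average_of_p99s, true_p99)
```

With this data the averaged figure and the true p99 disagree, because the average gives the quiet window the same weight as the busy one even though it holds a tenth of the requests.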

https://www.vividcortex.com/blog/why-percentiles-dont-work-the-way-you-think

Beginning deep learning with 500 lines of Julia

“There are a number of deep learning packages out there. However most sacrifice readability for efficiency. This has two disadvantages: (1) It is difficult for a beginner student to understand what the code is doing, which is a shame because sometimes the code can be a lot simpler than the underlying math. (2) Every other day new ideas come out for optimization, regularization, etc. If the package used already has the trick implemented, great. But if not, it is difficult for a researcher to test the new idea using impenetrable code with a steep learning curve. So I started writing KUnet.jl which currently implements backprop with basic units like relu, standard loss functions like softmax, dropout for generalization, L1-L2 regularization, and optimization using SGD, momentum, ADAGRAD, Nesterov’s accelerated gradient etc. in less than 500 lines of Julia code. Its speed is competitive with the fastest GPU packages (here is a benchmark). For installation and usage information, please refer to the GitHub repo. The remainder of this post will present (a slightly cleaned up version of) the code as a beginner’s neural network tutorial (modeled after Honnibal’s excellent parsing example)…”
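
The excerpt names relu units and SGD without showing them. As a rough illustration only, here is a pure-Python sketch of those two pieces. It is not the post's Julia code and does not reflect KUnet.jl's API; the function names relu_forward, relu_backward, and sgd_step are hypothetical.

```python
def relu_forward(x):
    """Rectified linear unit: pass positive inputs through, clamp the rest to zero."""
    return [max(0.0, v) for v in x]

def relu_backward(x, grad_out):
    """Backprop through relu: the gradient flows only where the input was positive."""
    return [g if v > 0 else 0.0 for v, g in zip(x, grad_out)]

def sgd_step(weights, grads, lr=0.1):
    """Plain stochastic gradient descent: move each weight against its gradient."""
    return [w - lr * g for w, g in zip(weights, grads)]

x = [-1.0, 2.0]
print(relu_forward(x))                # [0.0, 2.0]
print(relu_backward(x, [5.0, 5.0]))   # [0.0, 5.0]
print(sgd_step([1.0], [2.0], lr=0.5)) # [0.0]
```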

http://www.denizyuret.com/2015/02/beginning-deep-learning-with-500-lines.html

Little Book of R for Time Series

“By Avril Coghlan, Parasite Genomics Group, Wellcome Trust Sanger Institute, Cambridge, U.K. Email: alc@sanger.ac.uk

This is a simple introduction to time series analysis using the R statistics software.

There is a pdf version of this booklet available at https://media.readthedocs.org/pdf/a-little-book-of-r-for-time-series/latest/a-little-book-of-r-for-time-series.pdf.

If you like this booklet, you may also like to check out my booklet on using R for biomedical statistics, http://a-little-book-of-r-for-biomedical-statistics.readthedocs.org/, and my booklet on using R for multivariate analysis, http://little-book-of-r-for-multivariate-analysis.readthedocs.org/…”

http://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/index.html