Beginning deep learning with 500 lines of Julia

“There are a number of deep learning packages out there. However, most sacrifice readability for efficiency. This has two disadvantages: (1) It is difficult for a beginner student to understand what the code is doing, which is a shame because sometimes the code can be a lot simpler than the underlying math. (2) Every other day new ideas come out for optimization, regularization, etc. If the package used already has the trick implemented, great. But if not, it is difficult for a researcher to test the new idea using impenetrable code with a steep learning curve. So I started writing KUnet.jl which currently implements backprop with basic units like relu, standard loss functions like softmax, dropout for generalization, L1-L2 regularization, and optimization using SGD, momentum, ADAGRAD, Nesterov’s accelerated gradient etc. in less than 500 lines of Julia code. Its speed is competitive with the fastest GPU packages (here is a benchmark). For installation and usage information, please refer to the GitHub repo. The remainder of this post will present (a slightly cleaned up version of) the code as a beginner’s neural network tutorial (modeled after Honnibal’s excellent parsing example)…”
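KUnet.jl itself is written in Julia, and the sketch below is not its code. It is a minimal Python illustration of the textbook update rules behind four of the optimizers listed above: plain SGD, momentum, Nesterov’s accelerated gradient (in the look-ahead form), and ADAGRAD. It assumes a hypothetical one-dimensional objective f(w) = (w - 3)^2. The function names and hyperparameter defaults are illustrative choices.

```python
import math

# Hypothetical toy objective f(w) = (w - 3)^2, minimized at w = 3.
def grad(w):
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=100):
    # Plain gradient descent: step against the gradient.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def momentum(w, lr=0.1, mu=0.9, steps=300):
    # Classical momentum: a velocity term accumulates past gradients.
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(w)
        w += v
    return w

def nesterov(w, lr=0.1, mu=0.9, steps=300):
    # Nesterov: evaluate the gradient at the look-ahead point w + mu * v.
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(w + mu * v)
        w += v
    return w

def adagrad(w, lr=1.0, eps=1e-8, steps=500):
    # ADAGRAD: divide each step by the root of the summed squared gradients.
    g2 = 0.0
    for _ in range(steps):
        g = grad(w)
        g2 += g * g
        w -= lr * g / (math.sqrt(g2) + eps)
    return w

for name, fn in [("sgd", sgd), ("momentum", momentum),
                 ("nesterov", nesterov), ("adagrad", adagrad)]:
    print(name, round(fn(0.0), 6))
```

Each optimizer starts from w = 0, and all four should end near the minimizer w = 3. In a real network, w would be a weight array and grad would come from backprop.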

Little Book of R for Time Series

“By Avril Coghlan, Parasite Genomics Group, Wellcome Trust Sanger Institute, Cambridge, U.K. Email:

This is a simple introduction to time series analysis using the R statistics software.

There is a pdf version of this booklet available at

If you like this booklet, you may also like to check out my booklet on using R for biomedical statistics, and my booklet on using R for multivariate analysis…”

Markov Chains – Explained

“A Markov chain is a probabilistic process that relies on the current state to predict the next state. For Markov chains to be effective, the current state has to depend on the previous state in some way. For instance, from experience we know that if it looks cloudy outside, the next state we expect is rain. We can also say that when the rain starts to subside into cloudiness, the next state will most likely be sunny. Not every process has the Markov property. In the lottery, for example, this week’s winning numbers have no dependence on the previous week’s winning numbers…”
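The weather example can be sketched as a three-state Markov chain. The transition probabilities below are hypothetical and were chosen only so that cloudy days often lead to rain and rainy days often ease into cloudiness. Each simulated day depends only on the previous day. The stationary function estimates the long-run share of days in each state by repeatedly applying the transition table.

```python
import random

STATES = ["sunny", "cloudy", "rainy"]

# Hypothetical transition probabilities: P[today][tomorrow]; each row sums to 1.
P = {
    "sunny":  {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.3, "rainy": 0.4},
    "rainy":  {"sunny": 0.2, "cloudy": 0.5, "rainy": 0.3},
}

def next_state(current, rng):
    """Sample tomorrow's weather using only today's state (the Markov property)."""
    r = rng.random()
    cumulative = 0.0
    for state in STATES:
        cumulative += P[current][state]
        if r < cumulative:
            return state
    return STATES[-1]  # guard against floating-point round-off

def simulate(start, days, seed=0):
    """Return a weather path of length days + 1 beginning at `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(days):
        path.append(next_state(path[-1], rng))
    return path

def stationary(iterations=1000):
    """Long-run distribution, found by power iteration on the transition table."""
    dist = {s: 1.0 / len(STATES) for s in STATES}
    for _ in range(iterations):
        dist = {t: sum(dist[s] * P[s][t] for s in STATES) for t in STATES}
    return dist

print(simulate("cloudy", 7))
print(stationary())
```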

How to gamble if you must—the mathematics of optimal stopping

“Every decision is risky business. Selecting the best time to stop and act is crucial. When Microsoft prepares to introduce Word 2020, it must decide when to quit debugging and launch the product. When a hurricane veers toward Florida, the governor must call when it’s time to stop watching and start evacuating. Bad timing can be ruinous. Napoleon learned that the hard way after invading Russia. We face smaller-consequence stopping decisions all the time, when hunting for a better parking space, responding to a job offer or scheduling retirement.

The basic framework of all these problems is the same: A decision maker observes a process evolving in time that involves some randomness. Based only on what is known, he or she must make a decision on how to maximize reward or minimize cost. In some cases, little is known about what’s coming. In other cases, information is abundant. In either scenario, no one predicts the future with full certainty. Fortunately, the powers of probability sometimes improve the odds of making a good choice.
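The framework above can be made concrete with a toy problem that does not come from the article. A player may roll a fair die up to n times. After each roll, the player either stops and collects the face value or rolls again. Backward induction works from the last roll toward the first, and it gives the optimal rule: stop whenever the current face beats the expected value of continuing. The game and the function name game_value are illustrative assumptions. Exact fractions avoid rounding.

```python
from fractions import Fraction

def game_value(rolls_left):
    """Expected reward under the optimal stopping rule with `rolls_left` rolls."""
    if rolls_left < 1:
        raise ValueError("need at least one roll")
    if rolls_left == 1:
        return Fraction(7, 2)  # the last roll must be accepted: average of 1..6
    continue_value = game_value(rolls_left - 1)
    # Keep the face if it beats the value of rolling on; otherwise continue.
    return sum(max(Fraction(face), continue_value) for face in range(1, 7)) / 6

for n in range(1, 4):
    print(n, game_value(n))
```

With one roll the value is 7/2. With two rolls it rises to 17/4, because the player stops on a first roll of 4, 5 or 6. With three rolls it is 14/3, and the player stops on a first roll only for a 5 or 6. More information about the future lets the player be pickier.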

While much of mathematics has roots that reach back millennia to Euclid and even earlier thinkers, the history of probability is far shorter. And its lineage is, well, a lot less refined. Girolamo Cardano’s famed 1564 manuscript De Ludo Aleae, one of the earliest writings on probability and not published until a century after he wrote it, primarily analyzed dice games. Although Galileo and other 17th-century scientists contributed to this enterprise, many credit the mathematical foundations of probability to an exchange of letters in 1654 between two famous French mathematicians, Blaise Pascal and Pierre de Fermat. They too were concerned with odds and dice throws—for example, whether it is wise to bet even money that a pair of sixes will occur in 24 rolls of two fair dice. Some insisted it was, but the true probability of a double six in 24 rolls is about 49.1 percent…”
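The 49.1 percent figure follows from the complement rule. A single roll of two fair dice misses a double six with probability 35/36. Because the rolls are independent, the chance of at least one double six in n rolls is 1 - (35/36)^n. The short Python check below uses exact fractions. The helper names p_double_six and rolls_needed are illustrative, not from the article. It also searches for the smallest number of rolls that makes the even-money bet favorable.

```python
from fractions import Fraction

def p_double_six(n_rolls):
    """Probability of at least one double six in n rolls of two fair dice."""
    return 1 - Fraction(35, 36) ** n_rolls

def rolls_needed():
    """Smallest n for which the chance of a double six exceeds one half."""
    n = 1
    while p_double_six(n) <= Fraction(1, 2):
        n += 1
    return n

print(float(p_double_six(24)))  # roughly 0.4914, just under even odds
print(rolls_needed())
```

At 24 rolls the bet is slightly unfavorable. It first becomes favorable at 25 rolls.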