Over 150 of the Best Machine Learning, NLP, and Python Tutorials I’ve Found

While machine learning has a rich history dating back to 1959, the field is evolving at an unprecedented rate. In a recent article, I discussed why the broader artificial intelligence field is booming and likely will for some time to come. Those interested in learning ML may find it daunting to get started.

As I prepare to start my Ph.D. program in the Fall, I’ve been scouring the web for good resources on all aspects of machine learning and NLP. Typically, I’ll find an interesting tutorial or video, and that leads to three or four more tutorials or videos, and before I know it, I have 20 tabs of new material I need to go through. (On a side note, Tab Bundler has been helpful to stay organized.)

After finding over 25 ML-related “cheat sheets”, I created a post that links to all the good ones.

To help others that are going through a similar discovery process, I’ve put together a list of the best tutorial content that I’ve found so far. It’s by no means an exhaustive list of every ML-related tutorial on the web — that would be overwhelming and duplicative. Plus, there is a bunch of mediocre content out there. My goal was to link to the best tutorials I found on the important subtopics within machine learning and NLP.

By tutorial, I’m referring to introductory content that is intending to teach a concept succinctly. I’ve avoided including chapters of books, which have a greater breadth of coverage, and research papers, which generally don’t do a good job in teaching concepts. Why not just buy a book? Tutorials are helpful when you’re trying to learn a specific niche topic or want to get different perspectives.

I’ve split this post into four sections: Machine LearningNLPPython, and Math. I’ve included a sampling of topics within each section, but given the vastness of the material, I can’t possibly include every possible topic.

For future posts, I may create a similar list of books, online videos, and code repos as I’m compiling a growing collection of those resources too.

If there are good tutorials you are aware of that I’m missing, please let me know! I’m trying to limit each topic to five or six tutorials since much beyond that would be repetitive. Each link should have different material from the other links or present information in a different way (e.g. code versus slides versus long-form) or from a different perspective.


An Upgrade to SyntaxNet, New Models and a Parsing Competition

At Google, we continuously improve the language understanding capabilities used in applications ranging from generation of email responses to translation. Last summer, we open-sourced SyntaxNet, a neural-network framework for analyzing and understanding the grammatical structure of sentences. Included in our release was Parsey McParseface, a state-of-the-art model that we had trained for analyzing English, followed quickly by a collection of pre-trained models for 40 additional languages, which we dubbed Parsey’s Cousins. While we were excited to share our research and to provide these resources to the broader community, building machine learning systems that work well for languages other than English remains an ongoing challenge. We are excited to announce a few new research resources, available now, that address this problem.


Oxford Deep NLP 2017 course

This repository contains the lecture slides and course description for the Deep Natural Language Processing course offered in Hilary Term 2017 at the University of Oxford.

This is an advanced course on natural language processing. Automatically processing natural language inputs and producing language outputs is a key component of Artificial General Intelligence. The ambiguities and noise inherent in human communication render traditional symbolic AI techniques ineffective for representing and analysing language data. Recently statistical techniques based on neural networks have achieved a number of remarkable successes in natural language processing leading to a great deal of commercial and academic interest in the field

This is an applied course focussing on recent advances in analysing and generating speech and text using recurrent neural networks. We introduce the mathematical definitions of the relevant machine learning models and derive their associated optimisation algorithms. The course covers a range of applications of neural networks in NLP including analysing latent dimensions in text, transcribing speech to text, translating between languages, and answering questions. These topics are organised into three high level themes forming a progression from understanding the use of neural networks for sequential language modelling, to understanding their use as conditional language models for transduction tasks, and finally to approaches employing these techniques in combination with other mechanisms for advanced applications. Throughout the course the practical implementation of such models on CPU and GPU hardware is also discussed.

This course is organised by Phil Blunsom and delivered in partnership with the DeepMind Natural Language Research Group.


Practical Natural Language Processing for Determing Wifi Quality in Hostels

“I was planning my trip to Amsterdam in January and was looking through hostels in Hostel World filtering for different features and amenities. One amenity that I thought I would definitely need was free wifi if I wanted to do some programming from the hostel and also just because life demands it in general. While there’s a ton of hostels that offer free wifi, I’ve definitely been at the end of the stick where the quality of wifi has been unmentionably bad. This probably goes for hotels as well as hostels, but generally hostels are cheaper and offer less in the way of complementary services.

That got me thinking about creating an interesting application that could judge the quality of wifi in reviews. Randomly I decided to spin up a new idea for a scraping/api for Hostel World where I could actually find the reviews that mention wifi and other amenities that would be useful. Instead of meticulously scanning through hundreds of reviews, I could just scrape the reviews, parse out keywords, and assign sentiment scores to each review…”