Python for NLP: Introduction to the TextBlob Library

This is the seventh article in my series on Python for NLP. In my previous article, I explained how to perform topic modeling using Latent Dirichlet Allocation and Non-Negative Matrix Factorization; we used the Scikit-Learn library to do so.
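As a quick refresher, a minimal version of that Scikit-Learn workflow might look like the sketch below; the four-document corpus and the two-topic setting are placeholders, and the feature-name call assumes scikit-learn 1.0 or later.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for a real document collection.
docs = [
    "the cat sat on the mat with another cat",
    "dogs and cats make affectionate pets",
    "stocks fell as markets reacted to interest rates",
    "investors watch interest rates and markets closely",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)                 # document-term counts

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top three words for each discovered topic.
terms = vectorizer.get_feature_names_out()         # scikit-learn >= 1.0
for i, topic in enumerate(lda.components_):
    top_words = [terms[j] for j in topic.argsort()[-3:]]
    print(f"Topic {i}: {top_words}")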

In this article, we will explore TextBlob, another extremely powerful NLP library for Python. TextBlob is built upon NLTK and provides an easy-to-use interface to it. We will see how TextBlob can be used to perform a variety of NLP tasks, ranging from part-of-speech tagging and sentiment analysis to language translation and text classification.
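To give a feel for that interface, here is a minimal sketch of two of those tasks; it assumes TextBlob is installed and that the NLTK corpora it relies on have been downloaded (e.g. via python -m textblob.download_corpora).

from textblob import TextBlob

blob = TextBlob("TextBlob makes common NLP tasks surprisingly simple.")

print(blob.tags)          # part-of-speech tags as (word, tag) pairs
print(blob.noun_phrases)  # noun phrases found in the text
print(blob.sentiment)     # Sentiment(polarity=..., subjectivity=...)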

https://stackabuse.com/python-for-nlp-introduction-to-the-textblob-library/

The Illustrated Word2vec

I find the concept of embeddings to be one of the most fascinating ideas in machine learning. If you’ve ever used Siri, Google Assistant, Alexa, Google Translate, or even a smartphone keyboard with next-word prediction, then chances are you’ve benefitted from this idea, which has become central to Natural Language Processing models. There has been quite a lot of development over the last couple of decades in using embeddings for neural models (recent developments include contextualized word embeddings, leading to cutting-edge models like BERT and GPT-2).

Word2vec is a method to efficiently create word embeddings and has been around since 2013. But in addition to its utility as a word-embedding method, some of its concepts have been shown to be effective in creating recommendation engines and making sense of sequential data even in commercial, non-language tasks. Companies like Airbnb, Alibaba, Spotify, and Anghami have all benefitted from carving out this brilliant piece of machinery from the world of NLP and using it in production to empower a new breed of recommendation engines.

In this post, we’ll go over the concept of embedding, and the mechanics of generating embeddings with word2vec. But let’s start with an example to get familiar with using vectors to represent things. Did you know that a list of five numbers (a vector) can represent so much about your personality?
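The post opens with exactly that kind of example: representing a person as a small vector of trait scores and comparing people with cosine similarity, the same measure used to compare word embeddings. The sketch below uses made-up numbers just to show the mechanics.

import numpy as np

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way; 0 means unrelated.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Five made-up "personality trait" scores per person.
jay      = np.array([-0.4, 0.8, 0.5, -0.2, 0.3])
person_1 = np.array([-0.3, 0.2, 0.3, -0.4, 0.9])
person_2 = np.array([-0.5, 0.4, 0.7, -0.1, 0.3])

print(cosine_similarity(jay, person_1))  # ~0.66: less similar to Jay
print(cosine_similarity(jay, person_2))  # ~0.90: more similar to Jay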

https://jalammar.github.io/illustrated-word2vec/

Practical Text Classification With Python and Keras

Imagine you could know the mood of the people on the Internet. Maybe you are not interested in the Internet as a whole, but only in whether people on your favorite social media platform are happy today. After this tutorial, you’ll be equipped to do this. Along the way, you will get a grasp of current advances in (deep) neural networks and how they can be applied to text.

Reading the mood from text with machine learning is called sentiment analysis, and it is one of the prominent use cases in text classification. This falls into the very active research field of natural language processing (NLP). Other common use cases of text classification include detection of spam, auto tagging of customer queries, and categorization of text into defined topics. So how can you do this?
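To make this concrete, here is a hedged sketch of the kind of small Keras model such a tutorial builds for binary sentiment classification; the vocabulary size, layer widths, and random stand-in data are all placeholder assumptions.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

vocab_size = 10_000   # placeholder vocabulary size
maxlen = 100          # placeholder padded sequence length

model = tf.keras.Sequential([
    layers.Embedding(vocab_size, 16),       # learn a vector per word
    layers.GlobalAveragePooling1D(),        # average them per document
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # P(positive sentiment)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Random stand-ins for integer-encoded, padded text and 0/1 labels.
X = np.random.randint(0, vocab_size, size=(32, maxlen))
y = np.random.randint(0, 2, size=(32,))
model.fit(X, y, epochs=2, verbose=0)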

https://realpython.com/python-keras-text-classification/

Deep Learning for NLP Best Practices

This post is a collection of best practices for using neural networks in Natural Language Processing. It will be updated periodically as new insights become available and in order to keep track of our evolving understanding of Deep Learning for NLP.

There has been a running joke in the NLP community that an LSTM with attention will yield state-of-the-art performance on any task. While this has been true over the course of the last two years, the NLP community is slowly moving away from this now-standard baseline and towards more interesting models.
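For readers who haven’t met that baseline, below is a hedged sketch of one common realization of “an LSTM with attention” for classification, using Keras’s built-in dot-product attention layer over the LSTM’s hidden states; the sizes are arbitrary and this is only one of several attention variants.

import tensorflow as tf
from tensorflow.keras import layers

tokens = layers.Input(shape=(None,), dtype="int32")
x = layers.Embedding(20_000, 64)(tokens)
states = layers.LSTM(64, return_sequences=True)(x)  # one state per token

# Attention([query, value]) computes dot-product attention; feeding the
# LSTM states in both slots gives self-attention over the sequence.
attended = layers.Attention()([states, states])
pooled = layers.GlobalAveragePooling1D()(attended)
output = layers.Dense(1, activation="sigmoid")(pooled)

model = tf.keras.Model(tokens, output)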

However, we as a community do not want to spend the next two years independently (re-)discovering the next LSTM with attention. We do not want to reinvent tricks or methods that have already been shown to work. While many existing Deep Learning libraries already encode best practices for working with neural networks in general, such as initialization schemes, many other details, particularly task or domain-specific considerations, are left to the practitioner.

This post is not meant to keep track of the state-of-the-art, but rather to collect best practices that are relevant for a wide range of tasks. In other words, rather than describing one particular architecture, this post aims to collect the features that underlie successful architectures. While many of these features will be most useful for pushing the state-of-the-art, I hope that wider knowledge of them will lead to stronger evaluations, more meaningful comparisons to baselines, and inspiration by shaping our intuition of what works.

I assume you are familiar with neural networks as applied to NLP (if not, I recommend Yoav Goldberg’s excellent primer [43]) and are interested in NLP in general or in a particular task. The main goal of this article is to get you up to speed with the relevant best practices so you can make meaningful contributions as soon as possible.

I will first give an overview of best practices that are relevant for most tasks. I will then outline practices that are relevant for the most common tasks, in particular classification, sequence labelling, natural language generation, and neural machine translation.

Disclaimer: Treating something as best practice is notoriously difficult: Best according to what? What if there are better alternatives? This post is based on my (necessarily incomplete) understanding and experience. In the following, I will only discuss practices that have been reported to be beneficial independently by at least two different groups. I will try to give at least two references for each best practice.

http://ruder.io/deep-learning-nlp-best-practices/

Over 150 of the Best Machine Learning, NLP, and Python Tutorials I’ve Found

While machine learning has a rich history dating back to 1959, the field is evolving at an unprecedented rate. In a recent article, I discussed why the broader artificial intelligence field is booming and likely will for some time to come. Those interested in learning ML may find it daunting to get started.

As I prepare to start my Ph.D. program in the Fall, I’ve been scouring the web for good resources on all aspects of machine learning and NLP. Typically, I’ll find an interesting tutorial or video, and that leads to three or four more tutorials or videos, and before I know it, I have 20 tabs of new material I need to go through. (On a side note, Tab Bundler has been helpful to stay organized.)

After finding over 25 ML-related “cheat sheets”, I created a post that links to all the good ones.

To help others that are going through a similar discovery process, I’ve put together a list of the best tutorial content that I’ve found so far. It’s by no means an exhaustive list of every ML-related tutorial on the web — that would be overwhelming and duplicative. Plus, there is a bunch of mediocre content out there. My goal was to link to the best tutorials I found on the important subtopics within machine learning and NLP.

By tutorial, I’m referring to introductory content that is intended to teach a concept succinctly. I’ve avoided including chapters of books, which have a greater breadth of coverage, and research papers, which generally don’t do a good job of teaching concepts. Why not just buy a book? Tutorials are helpful when you’re trying to learn a specific niche topic or want to get different perspectives.

I’ve split this post into four sections: Machine Learning, NLP, Python, and Math. I’ve included a sampling of topics within each section, but given the vastness of the material, I can’t possibly include every possible topic.

For future posts, I may create a similar list of books, online videos, and code repos as I’m compiling a growing collection of those resources too.

If there are good tutorials you are aware of that I’m missing, please let me know! I’m trying to limit each topic to five or six tutorials since much beyond that would be repetitive. Each link should have different material from the other links or present information in a different way (e.g. code versus slides versus long-form) or from a different perspective.

https://unsupervisedmethods.com/over-150-of-the-best-machine-learning-nlp-and-python-tutorials-ive-found-ffce2939bd78

An Upgrade to SyntaxNet, New Models and a Parsing Competition

At Google, we continuously improve the language understanding capabilities used in applications ranging from generation of email responses to translation. Last summer, we open-sourced SyntaxNet, a neural-network framework for analyzing and understanding the grammatical structure of sentences. Included in our release was Parsey McParseface, a state-of-the-art model that we had trained for analyzing English, followed quickly by a collection of pre-trained models for 40 additional languages, which we dubbed Parsey’s Cousins. While we were excited to share our research and to provide these resources to the broader community, building machine learning systems that work well for languages other than English remains an ongoing challenge. We are excited to announce a few new research resources, available now, that address this problem.

https://research.googleblog.com/2017/03/an-upgrade-to-syntaxnet-new-models-and.html

Oxford Deep NLP 2017 course

This repository contains the lecture slides and course description for the Deep Natural Language Processing course offered in Hilary Term 2017 at the University of Oxford.

This is an advanced course on natural language processing. Automatically processing natural language inputs and producing language outputs is a key component of Artificial General Intelligence. The ambiguities and noise inherent in human communication render traditional symbolic AI techniques ineffective for representing and analysing language data. Recently, statistical techniques based on neural networks have achieved a number of remarkable successes in natural language processing, leading to a great deal of commercial and academic interest in the field.

This is an applied course focussing on recent advances in analysing and generating speech and text using recurrent neural networks. We introduce the mathematical definitions of the relevant machine learning models and derive their associated optimisation algorithms. The course covers a range of applications of neural networks in NLP including analysing latent dimensions in text, transcribing speech to text, translating between languages, and answering questions. These topics are organised into three high level themes forming a progression from understanding the use of neural networks for sequential language modelling, to understanding their use as conditional language models for transduction tasks, and finally to approaches employing these techniques in combination with other mechanisms for advanced applications. Throughout the course the practical implementation of such models on CPU and GPU hardware is also discussed.
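To illustrate the first of those themes, a neural sequential language model at its most minimal is a recurrent network trained to predict the next token at every position. The sketch below (in Keras, with arbitrary sizes) is an assumption-laden toy, not the course’s own code.

import tensorflow as tf
from tensorflow.keras import layers

vocab_size = 5_000  # placeholder vocabulary size

model = tf.keras.Sequential([
    layers.Embedding(vocab_size, 64),
    layers.LSTM(128, return_sequences=True),         # one state per step
    layers.Dense(vocab_size, activation="softmax"),  # P(next token)
])
# Train on inputs tokens[:-1] against targets tokens[1:], so every
# position is scored on predicting the token that follows it.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")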

This course is organised by Phil Blunsom and delivered in partnership with the DeepMind Natural Language Research Group.

https://github.com/oxford-cs-deepnlp-2017/lectures