The Illustrated Word2vec

I find the concept of embeddings to be one of the most fascinating ideas in machine learning. If you’ve ever used Siri, Google Assistant, Alexa, Google Translate, or even a smartphone keyboard with next-word prediction, then chances are you’ve benefitted from this idea that has become central to Natural Language Processing models. The use of embeddings in neural models has developed considerably over the last couple of decades (recent developments include contextualized word embeddings, which led to cutting-edge models like BERT and GPT-2).

Word2vec is a method to efficiently create word embeddings and has been around since 2013. But in addition to its utility as a word-embedding method, some of its concepts have been shown to be effective in creating recommendation engines and making sense of sequential data even in commercial, non-language tasks. Companies like Airbnb, Alibaba, Spotify, and Anghami have all benefitted from carving out this brilliant piece of machinery from the world of NLP and using it in production to empower a new breed of recommendation engines.

In this post, we’ll go over the concept of embedding, and the mechanics of generating embeddings with word2vec. But let’s start with an example to get familiar with using vectors to represent things. Did you know that a list of five numbers (a vector) can represent so much about your personality?

https://jalammar.github.io/illustrated-word2vec/
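To make the personality example concrete, here is a minimal Python sketch of comparing five-number vectors with cosine similarity, one common way to measure how alike two embeddings are. The trait scores and the names `you`, `person1`, and `person2` are hypothetical values chosen for illustration, not data from the post.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical five-trait personality scores, each scaled to [-1, 1].
you = [-0.4, 0.8, 0.5, -0.2, 0.3]
person1 = [-0.3, 0.2, 0.3, -0.4, 0.9]
person2 = [-0.5, -0.4, -0.2, 0.7, -0.1]

# Score each candidate against `you` and pick the most similar one.
scores = {
    "person1": cosine_similarity(you, person1),
    "person2": cosine_similarity(you, person2),
}
most_similar = max(scores, key=scores.get)
print(scores, most_similar)
```

Because cosine similarity depends only on the angle between vectors, the same function works unchanged for the hundreds of dimensions common in word embeddings.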

Anomaly detection on Amazon DynamoDB Streams using the Amazon SageMaker Random Cut Forest algorithm

Have you considered introducing anomaly detection technology to your business? Anomaly detection is a technique used to identify rare items, events, or observations that raise suspicion by differing significantly from the majority of the data you are analyzing. Its applications are wide-ranging, including detecting abnormal purchases or cyber intrusions in banking, spotting a malignant tumor in an MRI scan, identifying fraudulent insurance claims, finding unusual machine behavior in manufacturing, and detecting strange patterns in network traffic that could signal an intrusion.

There are many commercial products for this, but you can easily implement an anomaly detection system yourself by using Amazon SageMaker, AWS Glue, and AWS Lambda. Amazon SageMaker is a fully managed platform that helps you quickly build, train, and deploy machine learning models at any scale. AWS Glue is a fully managed ETL service that makes it easy to prepare your data for analytics. AWS Lambda is a well-known serverless platform that runs code in response to events in real time. Using these services, your model can be retrained automatically as new data arrives, and the updated model can raise alerts on anomalies in real time with better accuracy.

In this blog post I’ll describe how you can use AWS Glue to prepare your data and train an anomaly detection model using Amazon SageMaker. For this exercise, I’ll store a sample of the NAB NYC Taxi data in Amazon DynamoDB to be streamed in real time using an AWS Lambda function.

The solution that I describe provides the following benefits:

  • You can make the best use of existing resources for anomaly detection. For example, if you have been using Amazon DynamoDB Streams for disaster recovery (DR) or other purposes, you can use the data in that stream for anomaly detection. In addition, standby storage usually has low utilization, so the data sitting in it can be put to work as training data.
  • You can automatically retrain the model with new data on a regular basis with no user intervention.
  • You can easily use Random Cut Forest, a built-in Amazon SageMaker algorithm. Amazon SageMaker offers flexible distributed training options that adjust to your specific workflows in a secure and scalable environment.

https://aws.amazon.com/pt/blogs/machine-learning/anomaly-detection-on-amazon-dynamodb-streams-using-the-amazon-sagemaker-random-cut-forest-algorithm/
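The post relies on SageMaker's Random Cut Forest; the sketch below deliberately swaps in a much simpler technique, the modified z-score based on the median absolute deviation (MAD), only to make "differing significantly from the majority of the data" concrete. It is not Random Cut Forest. The function name `mad_anomalies`, the ride counts, and the 3.5 cutoff (a commonly used value for this score) are illustrative assumptions.

```python
import statistics

def mad_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median. Raises StatisticsError on an
    empty list.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, so no scale to judge against
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]

# Hypothetical taxi ride counts per time bucket, with one spike at the end.
rides = [100, 102, 98, 101, 99, 100, 103, 97, 100, 400]
anomalies = mad_anomalies(rides)
print(anomalies)  # indices of suspicious buckets
```

The median and MAD are used instead of the mean and standard deviation because a single large spike inflates the standard deviation enough to partly hide itself, especially in small samples.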

190 universities just launched 600 free online courses. Here’s the full list

If you haven’t heard, universities around the world are offering their courses online for free (or at least partially free). These courses are collectively called MOOCs or Massive Open Online Courses.

In the past six years or so, over 800 universities have created more than 10,000 of these MOOCs, and I’ve been keeping track of them at Class Central ever since they rose to prominence.

In the past four months alone, 190 universities have announced 600 such free online courses. I’ve compiled a list of them and categorized them according to the following subjects: Computer Science, Mathematics, Programming, Data Science, Humanities, Social Sciences, Education & Teaching, Health & Medicine, Business, Personal Development, Engineering, Art & Design, and finally Science.

If you have trouble figuring out how to sign up for Coursera courses for free, don’t worry — here’s an article on how to do that, too.

Many of these are completely self-paced, so you can start taking them at your convenience.

https://qz.com/1437623/600-free-online-courses-you-can-take-from-universities-worldwide/

Practical Text Classification With Python and Keras

Imagine you could know the mood of the people on the Internet. Maybe you are not interested in all of it, but only in whether people are happy today on your favorite social media platform. After this tutorial, you’ll be equipped to do this. Along the way, you will get a grasp of current advances in (deep) neural networks and how they can be applied to text.

Reading the mood from text with machine learning is called sentiment analysis, and it is one of the prominent use cases in text classification. This falls into the very active research field of natural language processing (NLP). Other common use cases of text classification include spam detection, auto-tagging of customer queries, and categorization of text into defined topics. So how can you do this?

https://realpython.com/python-keras-text-classification/
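Before a classifier can read the mood of a sentence, the text has to become numbers. A common first step is a bag-of-words representation: build a vocabulary and count how often each word occurs. The minimal Python sketch below shows that idea; the helper names and example sentences are hypothetical, and libraries such as scikit-learn's `CountVectorizer` provide a fuller version with configurable tokenization.

```python
def build_vocabulary(sentences):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = {word for s in sentences for word in s.lower().split()}
    return {word: i for i, word in enumerate(sorted(tokens))}

def vectorize(sentence, vocabulary):
    """Count occurrences of each vocabulary word; unknown words are ignored."""
    counts = [0] * len(vocabulary)
    for word in sentence.lower().split():
        if word in vocabulary:
            counts[vocabulary[word]] += 1
    return counts

# Hypothetical training sentences (no punctuation, to keep tokenizing trivial).
sentences = ["John likes ice cream", "John hates chocolate"]
vocab = build_vocabulary(sentences)
features = [vectorize(s, vocab) for s in sentences]
print(vocab)
print(features)
```

Each row of `features` has one column per vocabulary word, which is the fixed-length numeric input that a classifier such as a Keras model expects. Word order is discarded, which is the main limitation that word embeddings help address.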

A visual introduction to machine learning

In machine learning, computers apply statistical learning techniques to automatically identify patterns in data. These techniques can be used to make highly accurate predictions.

http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
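As a tiny illustration of a computer "identifying a pattern" from data, the following Python sketch learns a single threshold rule on one numeric feature. This is a decision stump, the one-split building block of a decision tree. The function name, feature values, and labels are hypothetical.

```python
def best_split(values, labels):
    """Find the threshold on one numeric feature that misclassifies the fewest
    points when predicting label 1 for values > threshold and 0 otherwise.
    Ties keep the smallest threshold."""
    best_threshold, best_errors = None, len(labels) + 1
    for t in sorted(set(values)):
        # Count points where the rule's prediction disagrees with the label.
        errors = sum((v > t) != bool(lab) for v, lab in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold, best_errors

# Hypothetical data: one numeric feature per home and a 0/1 class label.
values = [5, 12, 20, 33, 60, 150, 240, 310]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, errors = best_split(values, labels)
print(threshold, errors)
```

The stump only considers thresholds equal to observed values and always predicts 1 above the threshold. A full decision tree would also try the reverse direction, search over several features, and repeat the search on each side of the split.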

Model Tuning and the Bias-Variance Tradeoff

The goal of modeling is to approximate real-life situations by identifying and encoding patterns in data. Models make mistakes if those patterns are overly simple or overly complex.

http://www.r2d3.us/visual-intro-to-machine-learning-part-2/
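The tradeoff described above has a standard formalization for squared-error loss. Assuming the data are generated as $y = f(x) + \varepsilon$, with zero-mean noise $\varepsilon$ of variance $\sigma^2$ that is independent of the fitted model $\hat f$, and taking expectations over training sets and noise, the expected error at a point $x$ splits into three terms:

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Overly simple models tend to have high bias and overly complex ones high variance. Tuning a model means looking for the complexity that minimizes the sum of the first two terms, since the noise term cannot be reduced by any model.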

Face recognition with OpenCV, Python, and deep learning

In today’s blog post you are going to learn how to perform face recognition in both images and video streams using:

  • OpenCV
  • Python
  • Deep learning

As we’ll see, the deep learning-based facial embeddings we’ll be using here today are both (1) highly accurate and (2) capable of being executed in real time.

To learn more about face recognition with OpenCV, Python, and deep learning, just keep reading!

https://www.pyimagesearch.com/2018/06/18/face-recognition-with-opencv-python-and-deep-learning/
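Once each face is turned into an embedding vector, recognizing it can be as simple as finding the nearest known embedding. The sketch below is a plain nearest-neighbor match with a distance cutoff, not necessarily the exact matching scheme the post uses. The 3-dimensional toy vectors stand in for real face embeddings (such as the 128-d vectors produced by dlib's face recognition model), the names are hypothetical, and the 0.6 tolerance is an assumed value borrowed from the default in the `face_recognition` library's `compare_faces`.

```python
import math

def identify(query, known, tolerance=0.6):
    """Return the name whose embedding is closest (Euclidean distance) to
    query, or "Unknown" if even the closest one is farther than tolerance."""
    best_name, best_dist = "Unknown", float("inf")
    for name, embedding in known.items():
        d = math.dist(query, embedding)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= tolerance else "Unknown"

# Hypothetical enrolled faces with toy 3-d embeddings.
known = {
    "alice": [0.1, 0.2, 0.3],
    "bob": [0.9, 0.8, 0.7],
}
print(identify([0.12, 0.21, 0.29], known))  # close to alice's embedding
print(identify([5.0, 5.0, 5.0], known))     # far from everyone
```

The tolerance controls the tradeoff between false matches and missed matches: lowering it makes the matcher stricter, so more genuine faces may come back as "Unknown".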

AWS DeepLens: first impressions + tutorial

Getting up-and-running with Amazon’s new machine learning-enabled camera

tl;dr It’s awesome. Get one.

At the end of 2017, Amazon announced DeepLens, a camera with specialized hardware that allows developers to deploy machine learning and computer vision models to “the edge,” and integrate the data it collects with other AWS services.

On a whim, I put in a one-click order on Prime (devices started shipping just last week). It arrived a couple of days later, and within hours of unboxing, with one or two minor hiccups, I had it up and running and integrated with other AWS services. I’ve been pleasantly surprised, to say the least.

https://medium.com/@CUlstrup/aws-deeplens-first-impressions-tutorial-17e6d448d58d

Machine Learning on AWS

Why machine learning on AWS?

Machine Learning for everyone

Whether you are a data scientist, ML researcher, or developer, AWS offers machine learning services and tools tailored to meet your needs and level of expertise.

API-driven ML services

Developers can easily add intelligence to any application with a diverse selection of pre-trained services that provide computer vision, speech, language analysis, and chatbot functionality.

Broad framework support

AWS supports all the major machine learning frameworks, including TensorFlow, Caffe2, and Apache MXNet, so that you can bring or develop any model you choose.

Breadth of compute options

AWS offers a broad array of compute options for training and inference with powerful GPU-based instances, compute and memory optimized instances, and even FPGAs.

Deep platform integrations

ML services are deeply integrated with the rest of the platform including the data lake and database tools you need to run ML workloads. A data lake on AWS gives you access to the most complete platform for big data.

Comprehensive analytics

Choose from a comprehensive set of services for data analysis, including data warehousing, business intelligence, batch processing, stream processing, and data workflow orchestration.

Secure

Control access to resources with granular permission policies. Storage and database services offer strong encryption to keep your data secure. Flexible key management options allow you to choose whether you or AWS will manage the encryption keys.

Pay-as-you-go

Consume services as you need them and only for the period you use them. AWS pricing has no upfront fees, termination penalties, or long-term contracts. The AWS Free Tier helps you get started with AWS.