What you should know to really understand the Node.js Event Loop

Node.js is an event-based platform: everything that happens in Node is a reaction to an event, and a transaction passing through Node traverses a cascade of callbacks.
All of this is abstracted away from the developer and handled by a library called libuv, which provides a mechanism called the event loop.

This event loop is maybe the most misunderstood concept of the platform.

I work for Dynatrace, a performance-monitoring vendor, and when we approached the topic of event loop monitoring, we put a lot of effort into properly understanding what we were actually measuring.

In this article I will cover what we learned about how the event loop really works and how to monitor it properly.

https://medium.com/the-node-js-collection/what-you-should-know-to-really-understand-the-node-js-event-loop-and-its-metrics-c4907b19da4c


A practical explanation of a Naive Bayes classifier

The simplest solutions are usually the most powerful ones, and Naive Bayes is a good proof of that. In spite of the great advances in machine learning over the last few years, it has proven to be not only simple but also fast, accurate, and reliable. It has been used successfully for many purposes, but it works particularly well with natural language processing (NLP) problems.

Naive Bayes is a family of probabilistic algorithms that take advantage of probability theory and Bayes’ Theorem to predict the category of a sample (like a piece of news or a customer review). They are probabilistic, which means that they calculate the probability of each category for a given sample, and then output the category with the highest one. The way they get these probabilities is by using Bayes’ Theorem, which describes the probability of a feature, based on prior knowledge of conditions that might be related to that feature.
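Written out for text classification, Bayes’ Theorem reads P(category | text) = P(text | category) × P(category) / P(text); since P(text) is the same for every category, the classifier simply picks the category that maximizes P(text | category) × P(category).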

We’re going to be working with an algorithm called Multinomial Naive Bayes. We’ll walk through the algorithm applied to NLP with an example, so by the end not only will you know how this method works, but also why it works. Then, we’ll lay out a few advanced techniques that can make Naive Bayes competitive with more complex Machine Learning algorithms, such as SVM and neural networks.
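For a sense of how little code the method needs in practice, here is a minimal Multinomial Naive Bayes sketch using scikit-learn; the toy sentences and labels are made up for illustration and are not from the article:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data (illustrative only): short texts, one category each.
texts = [
    "a great game",
    "the election was over",
    "very clean match",
    "a clean but forgettable game",
    "it was a close election",
]
labels = ["sports", "not sports", "sports", "sports", "not sports"]

# Multinomial Naive Bayes works on word counts, so vectorize first.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

clf = MultinomialNB()  # applies Laplace smoothing (alpha=1.0) by default
clf.fit(X, labels)

# Predict the category of a new, unseen sample.
sample = vectorizer.transform(["a very close game"])
print(clf.predict(sample))  # expected: ['sports']
```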

https://monkeylearn.com/blog/practical-explanation-naive-bayes-classifier/

Deep Learning for NLP Best Practices

This post is a collection of best practices for using neural networks in Natural Language Processing. It will be updated periodically as new insights become available and in order to keep track of our evolving understanding of Deep Learning for NLP.

There has been a running joke in the NLP community that an LSTM with attention will yield state-of-the-art performance on any task. While this has been true over the course of the last two years, the NLP community is slowly moving away from this now-standard baseline and towards more interesting models.
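For anyone who has not seen that baseline spelled out, the attention step reduces to a softmax-weighted average of the LSTM’s hidden states. A minimal numpy sketch of dot-product attention, with shapes and names that are illustrative rather than taken from the post:

```python
import numpy as np

def dot_product_attention(hidden_states, query):
    """hidden_states: (seq_len, dim) LSTM outputs; query: (dim,) vector."""
    scores = hidden_states @ query      # one relevance score per time step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()            # softmax over the time steps
    return weights @ hidden_states      # (dim,) context vector
```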

However, we as a community do not want to spend the next two years independently (re-)discovering the next LSTM with attention. We do not want to reinvent tricks or methods that have already been shown to work. While many existing Deep Learning libraries already encode best practices for working with neural networks in general, such as initialization schemes, many other details, particularly task or domain-specific considerations, are left to the practitioner.

This post is not meant to keep track of the state-of-the-art, but rather to collect best practices that are relevant for a wide range of tasks. In other words, rather than describing one particular architecture, this post aims to collect the features that underlie successful architectures. While many of these features will be most useful for pushing the state-of-the-art, I hope that wider knowledge of them will lead to stronger evaluations, more meaningful comparisons to baselines, and new inspiration by shaping our intuition of what works.

I assume you are familiar with neural networks as applied to NLP (if not, I recommend Yoav Goldberg’s excellent primer [43]) and are interested in NLP in general or in a particular task. The main goal of this article is to get you up to speed with the relevant best practices so you can make meaningful contributions as soon as possible.

I will first give an overview of best practices that are relevant for most tasks. I will then outline practices that are relevant for the most common tasks, in particular classification, sequence labelling, natural language generation, and neural machine translation.

Disclaimer: Treating something as best practice is notoriously difficult: Best according to what? What if there are better alternatives? This post is based on my (necessarily incomplete) understanding and experience. In the following, I will only discuss practices that have been reported to be beneficial independently by at least two different groups. I will try to give at least two references for each best practice.

http://ruder.io/deep-learning-nlp-best-practices/

Let’s Encrypt and Google App Engine in 2017

So yesterday I set out to protect my Google App Engine (GAE) API with HTTPS. I had already set up my Google App Engine instance. I was vaguely familiar with Let’s Encrypt (https://letsencrypt.org/), although I had never used it. I also had a basic grasp of cryptographic primitives used in encryption and certificates.

There are a lot of blog posts out there about this, along with a bunch of StackOverflow posts, all with varying instructions. The official ones by Google seem outdated, and the only thing I could find on the Let’s Encrypt side was a bug report saying something along the lines of “plz add support for GAE”. I tried a few, and by combining steps from several of them (and after a lot of frustration and some eventually-resolved confusion) I got it to work. I’m writing up this blog post primarily for myself (so that when I need to renew the certificate in a few months I know how), but hopefully someone out there might find it useful.

https://medium.com/google-cloud/lets-encrypt-and-google-app-engine-in-2017-7cfe0928768e

A hacker stole $31M of Ether — how it happened, and what it means for Ethereum

Yesterday, a hacker pulled off the second biggest heist in the history of digital currencies.

Around 12:00 PST, an unknown attacker exploited a critical flaw in the Parity multi-signature wallet on the Ethereum network, draining three massive wallets of over $31,000,000 worth of Ether in a matter of minutes. Given a couple more hours, the hacker could’ve made off with over $180,000,000 from vulnerable wallets.

But someone stopped them.

Having sounded the alarm bells, a group of benevolent white-hat hackers from the Ethereum community rapidly organized. They analyzed the attack and realized that there was no way to reverse the thefts, yet many more wallets were vulnerable. Time was of the essence, so they saw only one available option: hack the remaining wallets before the attacker did.

By exploiting the same vulnerability, the white-hats hacked all of the remaining at-risk wallets and drained their accounts, effectively preventing the attacker from reaching any of the remaining $150,000,000.

https://medium.freecodecamp.org/a-hacker-stole-31m-of-ether-how-it-happened-and-what-it-means-for-ethereum-9e5dc29e33ce

Credit card OCR with OpenCV and Python

Today’s blog post is a continuation of our recent series on Optical Character Recognition (OCR) and computer vision.

In a previous blog post, we learned how to install the Tesseract binary and use it for OCR. We then learned how to clean up images using basic image processing techniques to improve the output of Tesseract OCR.

However, as I’ve mentioned multiple times in these previous posts, Tesseract should not be considered a general, off-the-shelf solution for Optical Character Recognition capable of obtaining high accuracy.

In some cases, it will work great — and in others, it will fail miserably.

A great example of such a use case is credit card recognition, where, given an input image, we wish to:

  1. Detect the location of the credit card in the image.
  2. Localize the four groupings of four digits, pertaining to the sixteen digits on the credit card.
  3. Apply OCR to recognize the sixteen digits on the credit card.
  4. Recognize the type of credit card (i.e., Visa, MasterCard, American Express, etc.).

In these cases, the Tesseract library is unable to correctly identify the digits (this is likely due to Tesseract not being trained on credit card example fonts). Therefore, we need to devise our own custom solution to OCR credit cards.

In today’s blog post I’ll be demonstrating how we can use template matching as a form of OCR to help us create a solution to automatically recognize credit cards and extract the associated credit card digits from images.
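To make the idea concrete, template matching for digit OCR boils down to scoring each extracted digit region against reference images of 0 through 9 and keeping the best score. A rough OpenCV sketch, where digit_templates is a hypothetical stand-in for the templates the full tutorial cuts from a reference font image:

```python
import cv2

def recognize_digit(roi, digit_templates):
    """Score a digit ROI against each 0-9 template; return the best match.

    roi: grayscale image of a single digit, resized to the template size.
    digit_templates: dict mapping a digit to its grayscale template image.
    """
    scores = {}
    for digit, template in digit_templates.items():
        # Correlation-based matching; higher score means a closer match.
        result = cv2.matchTemplate(roi, template, cv2.TM_CCOEFF)
        (_, max_val, _, _) = cv2.minMaxLoc(result)
        scores[digit] = max_val
    return max(scores, key=scores.get)
```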

To learn more about using template matching for OCR with OpenCV and Python, just keep reading.

http://www.pyimagesearch.com/2017/07/17/credit-card-ocr-with-opencv-and-python/

An intuitive library to add plotting functionality to scikit-learn objects

Scikit-learn with plotting.

The quickest and easiest way to go from analysis to a finished plot, such as the ROC curves shown in the project README.

Scikit-plot is the result of an unartistic data scientist’s dreadful realization that visualization is one of the most crucial components in the data science process, not just a mere afterthought.

Gaining insights is simply a lot easier when you’re looking at a colored heatmap of a confusion matrix complete with class labels rather than a single-line dump of numbers enclosed in brackets. Besides, if you ever need to present your results to someone (virtually any time anybody hires you to do data science), you show them visualizations, not a bunch of numbers in Excel.

With that in mind, there are a number of visualizations that frequently pop up in machine learning. Scikit-plot is a humble attempt to provide aesthetically-challenged programmers (such as myself) the opportunity to generate quick and beautiful graphs and plots with as little boilerplate as possible.
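As a taste of the “as little boilerplate as possible” claim, plotting a labeled confusion matrix takes roughly this much code; scikit-plot’s function names have varied across versions, so treat the exact API below as an assumption rather than gospel:

```python
import matplotlib.pyplot as plt
import scikitplot as skplt
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Train any scikit-learn classifier as usual.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = GaussianNB().fit(X_train, y_train)

# One call replaces the usual heatmap, tick-label, and colorbar boilerplate.
skplt.metrics.plot_confusion_matrix(y_test, clf.predict(X_test))
plt.show()
```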

https://github.com/reiinakano/scikit-plot