Introduction to TensorFlow Datasets and Estimators

Datasets and Estimators are two key TensorFlow features you should use:

  • Datasets: The best practice way of creating input pipelines (that is, reading data into your program).
  • Estimators: A high-level way to create TensorFlow models. Estimators include pre-made models for common machine learning tasks, but you can also use them to create your own custom models.


How Cargo Cult Bayesians encourage Deep Learning Alchemy

There is a struggle today for the heart and minds of Artificial Intelligence. It’s a complex “Game of Thrones” conflict that involves many houses (or tribes) (see: “The Many Tribes of AI”). The two waring factions I focus on today is on the practice Cargo Cult science in the form of Bayesian statistics and in the practice of alchemy in the form of experimental Deep Learning.

For the uninitiated, let’s talk about what Cargo Cult science means. Cargo Cult science is a phrase coined by Richard Feynman to illustrate a practice in science of not working from fundamentally sound first principles. Here is Richard Feynman’s original essay on “Cargo Cult Science”. If you’ve never read it before, it great and refreshing read. I read this in my youth while studying physics. I am unsure if its required reading for physicists, but a majority of physicists are well aware of this concept.

Think Bayes

Think Bayes is an introduction to Bayesian statistics using computational methods.

The premise of this book, and the other books in the Think X series, is that if you know how to program, you can use that skill to learn other topics.

Most books on Bayesian statistics use mathematical notation and present ideas in terms of mathematical concepts like calculus. This book uses Python code instead of math, and discrete approximations instead of continuous mathematics. As a result, what would be an integral in a math book becomes a summation, and most operations on probability distributions are simple loops.

I think this presentation is easier to understand, at least for people with programming skills. It is also more general, because when we make modeling decisions, we can choose the most appropriate model without worrying too much about whether the model lends itself to conventional analysis. Also, it provides a smooth development path from simple examples to real-world problems.

Think Bayes is a Free Book. It is available under the Creative Commons Attribution-NonCommercial 3.0 Unported License, which means that you are free to copy, distribute, and modify it, as long as you attribute the work and don’t use it for commercial purposes.

Other Free Books by Allen Downey are available from Green Tea Press.

Web Scraping With Python: Scrapy, SQL, Matplotlib To Gain Web Data Insights

Now I’m going to show you a comprehensive example how you can make raw web data useful and interesting using Scrapy, SQL and Matplotlib. It’s really supposed to be just an example because there are so many types of data out there and there are so many ways to analyze them and it really comes down to what is the best for you and your business.

Scraping And Analyzing Soccer Data

Briefly, this is the process I’m going to be using now to create this example project:

  • Task Zero: Requirements Of Reports

Figuring out what is really needed to be done. What are our (business) goals and what reports should we create? What would a proper analysis look like?

  • Task One: Data Fields And Source Of Data

Planning ahead what data fields and attributes we’ll need to satisfy the requirements. Also, looking for websites where I can get data from.

  • Task Two: Scrapy Spiders

Creating scrapers for the website(s) that we’ve chosen in the previous task.

  • Task Three: Process Data

Cleaning, standardizing, normalizing, structuring and storing data into a database.

  • Task Four: Analyze Data

Creating reports that help you make decisions or help you understand data more.

  • Task Five: Conclusions

Draw conclusions based on analysis. Understand data.

Storytime is over. Start working!

An Introduction to Implementing Neural Networks using TensorFlow


If you have been following Data Science / Machine Learning, you just can’t miss the buzz around Deep Learning and Neural Networks. Organizations are looking for people with Deep Learning skills wherever they can. From running competitions to open sourcing projects and paying big bonuses, people are trying every possible thing to tap into this limited pool of talent. Self driving engineers are being hunted by the big guns in automobile industry, as the industry stands on the brink of biggest disruption it faced in last few decades!

If you are excited by the prospects deep learning has to offer, but have not started your journey yet – I am here to enable it. Starting with this article, I will write a series of articles on deep learning covering the popular Deep Learning libraries and their hands-on implementation.

In this article, I will introduce TensorFlow to you. After reading this article you will be able to understand application of neural networks and use TensorFlow to solve a real life problem. This article will require you to know the basics of neural networks and have familiarity with programming. Although the code in this article is in python, I have focused on the concepts and stayed as language-agnostic as possible.

Let’s get started!


Using the TensorFlow API: An Introductory Tutorial Series

This post summarizes and links to a great multi-part tutorial series on learning the TensorFlow API for building a variety of neural networks, as well as a bonus tutorial on backpropagation from the beginning.

By Erik Hallström, Deep Learning Research Engineer.

Editor’s note: The TensorFlow API has undergone changes since this series was first published. However, the general ideas are the same, and an otherwise well-structured tutorial such as this provides a great jumping off point and opportunity to consult the API documentation to identify and implement said changes.

Schematic of RNN processing sequential over time
Schematic of a RNN processing sequential data over time.

The Perceptron Algorithm explained with Python code

Most tasks in Machine Learning can be reduced to classification tasks. For example, we have a medical dataset and we want to classify who has diabetes (positive class) and who doesn’t (negative class). We have a dataset from the financial world and want to know which customers will default on their credit (positive class) and which customers will not (negative class).
To do this, we can train a Classifier with a ‘training dataset’ and after such a Classifier is trained (we have determined  its model parameters) and can accurately classify the training set, we  can use it to classify new data (test set). If the training is done properly, the Classifier should predict the class probabilities of the new data with a similar accuracy.

There are three popular Classifiers which use three different mathematical approaches to classify data. Previously we have looked at the first two of these; Logistic Regression and the Naive Bayes classifier. Logistic Regression uses a functional approach to classify data, and the Naive Bayes classifier uses a statistical (Bayesian) approach to classify data.

Logistic Regression assumes there is some function f(X) which forms a correct model of the dataset (i.e. it maps the input values correctly to the output values). This function is defined by its parameters \theta_1, \theta_2, .... We can use the gradient descent method to find the optimum values of these parameters.

The Naive Bayes method is much simpler than that; we do not have to optimize a function, but can calculate the Bayesian (conditional) probabilities directly from the training dataset. This can be done quiet fast (by creating a hash table containing the probability distributions of the features) but is generally less accurate.

Classification of data can also be done via a third way, by using a geometrical approach. The main idea is to find a line, or a plane, which can separate the two classes in their feature space. Classifiers which are using a geometrical approach are the Perceptron and the SVM (Support Vector Machines) methods.

Below we will discuss the Perceptron classification algorithm. Although Support Vector Machines is used more often, I think a good understanding of the Perceptron algorithm is essential to understanding Support Vector Machines and Neural Networks.

For the rest of the blog-post, click here.