Introduction to TensorFlow Datasets and Estimators

Datasets and Estimators are two key TensorFlow features you should use:

  • Datasets: The best practice way of creating input pipelines (that is, reading data into your program).
  • Estimators: A high-level way to create TensorFlow models. Estimators include pre-made models for common machine learning tasks, but you can also use them to create your own custom models.


Think Bayes

Think Bayes is an introduction to Bayesian statistics using computational methods.

The premise of this book, and the other books in the Think X series, is that if you know how to program, you can use that skill to learn other topics.

Most books on Bayesian statistics use mathematical notation and present ideas in terms of mathematical concepts like calculus. This book uses Python code instead of math, and discrete approximations instead of continuous mathematics. As a result, what would be an integral in a math book becomes a summation, and most operations on probability distributions are simple loops.

I think this presentation is easier to understand, at least for people with programming skills. It is also more general, because when we make modeling decisions, we can choose the most appropriate model without worrying too much about whether the model lends itself to conventional analysis. Also, it provides a smooth development path from simple examples to real-world problems.

Think Bayes is a Free Book. It is available under the Creative Commons Attribution-NonCommercial 3.0 Unported License, which means that you are free to copy, distribute, and modify it, as long as you attribute the work and don’t use it for commercial purposes.

Other Free Books by Allen Downey are available from Green Tea Press.

ZFS from a MySQL perspective

Since the purpose of a database system is to store data, there is close relationship with the filesystem. As MySQL consultants, we always look at the filesystems for performance tuning opportunities. The most common choices in term of filesystems are XFS and EXT4, on Linux it is exceptional to encounter another filesystem. Both XFS and EXT4 have pros and cons but their behaviors are well known and they perform well. They perform well but they are not without shortcomings.

Over the years, we have developed a bunch of tools and techniques to overcome these shortcomings. For example, since they don’t allow a consistent view of the filesystem, we wrote tools like Xtrabackup to backup a live MySQL database. Another example is the InnoDB double write buffer. The InnoDB double write buffer is required only because neither XFS nor EXT4 is transactional. There is one filesystem which offers nearly all the features we need, ZFS.  ZFS is arguably the most advanced filesystem available on Linux. Maybe it is time to reconsider the use of ZFS with MySQL.

ZFS on Linux or ZoL (from the OpenZFS project), has been around for quite a long time now. I first started using ZoL back in 2012, before it was GA (general availability), in order to solve a nearly impossible challenge to backup a large database (~400 GB) with a mix of InnoDB and MyISAM tables. Yes, ZFS allows that very easily, in just a few seconds. As of 2017, ZoL has been GA for more than 3 years and most of the issues that affected it in the early days have been fixed. ZFS is also GA in FreeBSD, illumos, OmniOS and many others.

This post will hopefully be the first of many posts, devoted to the use of ZFS with MySQL. The goal here is not to blindly push for ZFS but to see when ZFS can help solve real problems. We will first examine ZFS and try to draw parallels with the architecture of MySQL. This will help us to better understand how ZFS works and behaves. Future posts will be devoted to more specific topics like performance, PXC, backups, compression, database operations, bad and poor use cases and sample configurations.

Web Scraping With Python: Scrapy, SQL, Matplotlib To Gain Web Data Insights

Now I’m going to show you a comprehensive example how you can make raw web data useful and interesting using Scrapy, SQL and Matplotlib. It’s really supposed to be just an example because there are so many types of data out there and there are so many ways to analyze them and it really comes down to what is the best for you and your business.

Scraping And Analyzing Soccer Data

Briefly, this is the process I’m going to be using now to create this example project:

  • Task Zero: Requirements Of Reports

Figuring out what is really needed to be done. What are our (business) goals and what reports should we create? What would a proper analysis look like?

  • Task One: Data Fields And Source Of Data

Planning ahead what data fields and attributes we’ll need to satisfy the requirements. Also, looking for websites where I can get data from.

  • Task Two: Scrapy Spiders

Creating scrapers for the website(s) that we’ve chosen in the previous task.

  • Task Three: Process Data

Cleaning, standardizing, normalizing, structuring and storing data into a database.

  • Task Four: Analyze Data

Creating reports that help you make decisions or help you understand data more.

  • Task Five: Conclusions

Draw conclusions based on analysis. Understand data.

Storytime is over. Start working!

An Introduction to Implementing Neural Networks using TensorFlow


If you have been following Data Science / Machine Learning, you just can’t miss the buzz around Deep Learning and Neural Networks. Organizations are looking for people with Deep Learning skills wherever they can. From running competitions to open sourcing projects and paying big bonuses, people are trying every possible thing to tap into this limited pool of talent. Self driving engineers are being hunted by the big guns in automobile industry, as the industry stands on the brink of biggest disruption it faced in last few decades!

If you are excited by the prospects deep learning has to offer, but have not started your journey yet – I am here to enable it. Starting with this article, I will write a series of articles on deep learning covering the popular Deep Learning libraries and their hands-on implementation.

In this article, I will introduce TensorFlow to you. After reading this article you will be able to understand application of neural networks and use TensorFlow to solve a real life problem. This article will require you to know the basics of neural networks and have familiarity with programming. Although the code in this article is in python, I have focused on the concepts and stayed as language-agnostic as possible.

Let’s get started!


Using the TensorFlow API: An Introductory Tutorial Series

This post summarizes and links to a great multi-part tutorial series on learning the TensorFlow API for building a variety of neural networks, as well as a bonus tutorial on backpropagation from the beginning.

By Erik Hallström, Deep Learning Research Engineer.

Editor’s note: The TensorFlow API has undergone changes since this series was first published. However, the general ideas are the same, and an otherwise well-structured tutorial such as this provides a great jumping off point and opportunity to consult the API documentation to identify and implement said changes.

Schematic of RNN processing sequential over time
Schematic of a RNN processing sequential data over time.

How to Build Your Own Blockchain

How to Build Your Own Blockchain Part 1 — Creating, Storing, Syncing, Displaying, Mining, and Proving Work

I can actually look up how long I have by logging into my Coinbase account, looking at the history of the Bitcoin wallet, and seeing this transaction I got back in 2012 after signing up for Coinbase. Bitcoin was trading at about $6.50 per. If I still had that 0.1 BTC, that’d be worth over $500 at the time of this writing. In case people are wondering, I ended up selling that when a Bitcoin was worth $2000. So I only made $200 out of it rather than the $550 now. Should have held on…

How to Build Your Own Blockchain Part 2 — Syncing Chains From Different Nodes

Welcome to part 2 of the JackBlockChain, where I write some code to introduce the ability for different nodes to communicate.

Initially my goal was to write about nodes syncing up and talking with each other, along with mining and broadcasting their winning blocks to other nodes.   In the end, I realized that the amount of code and explanation to accomplish all of that was way too big for one post. Because of this, I decided to make part 2 only about nodes beginning the process of talking for the future…

How to Build Your Own Blockchain Part 3 — Writing Nodes that Mine and Talk

Hello all and welcome to Part 3 of building the JackBlockChain — JBC. Quick past intro, in Part 1 I coded and went over the top level math and requirements for a single node to mine its own blockchain; I create new blocks that have the valid information, save them to a folder, and then start mining a new block. Part 2 covered having multiple nodes and them having the ability to sync. If node 1 was doing the mining on its own and node 2 wanted to grab node 1’s blockchain, it can now do so…