Datasets and Estimators are two key TensorFlow features you should use:
- Datasets: The best practice way of creating input pipelines (that is, reading data into your program).
- Estimators: A high-level way to create TensorFlow models. Estimators include pre-made models for common machine learning tasks, but you can also use them to create your own custom models.
There is a struggle today for the heart and minds of Artificial Intelligence. It’s a complex “Game of Thrones” conflict that involves many houses (or tribes) (see: “The Many Tribes of AI”). The two waring factions I focus on today is on the practice Cargo Cult science in the form of Bayesian statistics and in the practice of alchemy in the form of experimental Deep Learning.
For the uninitiated, let’s talk about what Cargo Cult science means. Cargo Cult science is a phrase coined by Richard Feynman to illustrate a practice in science of not working from fundamentally sound first principles. Here is Richard Feynman’s original essay on “Cargo Cult Science”. If you’ve never read it before, it great and refreshing read. I read this in my youth while studying physics. I am unsure if its required reading for physicists, but a majority of physicists are well aware of this concept.
Think Bayes is an introduction to Bayesian statistics using computational methods.
The premise of this book, and the other books in the Think X series, is that if you know how to program, you can use that skill to learn other topics.
Most books on Bayesian statistics use mathematical notation and present ideas in terms of mathematical concepts like calculus. This book uses Python code instead of math, and discrete approximations instead of continuous mathematics. As a result, what would be an integral in a math book becomes a summation, and most operations on probability distributions are simple loops.
I think this presentation is easier to understand, at least for people with programming skills. It is also more general, because when we make modeling decisions, we can choose the most appropriate model without worrying too much about whether the model lends itself to conventional analysis. Also, it provides a smooth development path from simple examples to real-world problems.
Think Bayes is a Free Book. It is available under the Creative Commons Attribution-NonCommercial 3.0 Unported License, which means that you are free to copy, distribute, and modify it, as long as you attribute the work and don’t use it for commercial purposes.
Other Free Books by Allen Downey are available from Green Tea Press.
For the latest episode of Stack Stories, we did something a bit different. We decided to focus on the origin story of the one of the most popular technologies in the software development world: React. We sat down with Pete Hunt, one of the original creators of React, now CEO at Smyte, to get the untold, in-depth story of why React was first created, how it gained adoption within Facebook due to the Instagram acquisition, and it’s eventual release to the public.
Listen to the interview in full or check out the transcript below (edited for brevity).
Check out Stack Stories’ sponsor STRV at strv.com/stackshare.
Since the purpose of a database system is to store data, there is close relationship with the filesystem. As MySQL consultants, we always look at the filesystems for performance tuning opportunities. The most common choices in term of filesystems are XFS and EXT4, on Linux it is exceptional to encounter another filesystem. Both XFS and EXT4 have pros and cons but their behaviors are well known and they perform well. They perform well but they are not without shortcomings.
Over the years, we have developed a bunch of tools and techniques to overcome these shortcomings. For example, since they don’t allow a consistent view of the filesystem, we wrote tools like Xtrabackup to backup a live MySQL database. Another example is the InnoDB double write buffer. The InnoDB double write buffer is required only because neither XFS nor EXT4 is transactional. There is one filesystem which offers nearly all the features we need, ZFS. ZFS is arguably the most advanced filesystem available on Linux. Maybe it is time to reconsider the use of ZFS with MySQL.
ZFS on Linux or ZoL (from the OpenZFS project), has been around for quite a long time now. I first started using ZoL back in 2012, before it was GA (general availability), in order to solve a nearly impossible challenge to backup a large database (~400 GB) with a mix of InnoDB and MyISAM tables. Yes, ZFS allows that very easily, in just a few seconds. As of 2017, ZoL has been GA for more than 3 years and most of the issues that affected it in the early days have been fixed. ZFS is also GA in FreeBSD, illumos, OmniOS and many others.
This post will hopefully be the first of many posts, devoted to the use of ZFS with MySQL. The goal here is not to blindly push for ZFS but to see when ZFS can help solve real problems. We will first examine ZFS and try to draw parallels with the architecture of MySQL. This will help us to better understand how ZFS works and behaves. Future posts will be devoted to more specific topics like performance, PXC, backups, compression, database operations, bad and poor use cases and sample configurations.
Now I’m going to show you a comprehensive example how you can make raw web data useful and interesting using Scrapy, SQL and Matplotlib. It’s really supposed to be just an example because there are so many types of data out there and there are so many ways to analyze them and it really comes down to what is the best for you and your business.
Scraping And Analyzing Soccer Data
Briefly, this is the process I’m going to be using now to create this example project:
- Task Zero: Requirements Of Reports
Figuring out what is really needed to be done. What are our (business) goals and what reports should we create? What would a proper analysis look like?
- Task One: Data Fields And Source Of Data
Planning ahead what data fields and attributes we’ll need to satisfy the requirements. Also, looking for websites where I can get data from.
Creating scrapers for the website(s) that we’ve chosen in the previous task.
Cleaning, standardizing, normalizing, structuring and storing data into a database.
Creating reports that help you make decisions or help you understand data more.
Draw conclusions based on analysis. Understand data.
Storytime is over. Start working!