Web Scraping With Python: Scrapy, SQL, Matplotlib To Gain Web Data Insights

Now I’m going to show you a comprehensive example how you can make raw web data useful and interesting using Scrapy, SQL and Matplotlib. It’s really supposed to be just an example because there are so many types of data out there and there are so many ways to analyze them and it really comes down to what is the best for you and your business.

Scraping And Analyzing Soccer Data

Briefly, this is the process I’m going to be using now to create this example project:

  • Task Zero: Requirements Of Reports

Figuring out what is really needed to be done. What are our (business) goals and what reports should we create? What would a proper analysis look like?

  • Task One: Data Fields And Source Of Data

Planning ahead what data fields and attributes we’ll need to satisfy the requirements. Also, looking for websites where I can get data from.

  • Task Two: Scrapy Spiders

Creating scrapers for the website(s) that we’ve chosen in the previous task.

  • Task Three: Process Data

Cleaning, standardizing, normalizing, structuring and storing data into a database.

  • Task Four: Analyze Data

Creating reports that help you make decisions or help you understand data more.

  • Task Five: Conclusions

Draw conclusions based on analysis. Understand data.

Storytime is over. Start working!



A Guide to Automating & Scraping the Web with JavaScript (Chrome + Puppeteer + Node JS)

Learn to Automate and Scrape the web with Headless Chrome

What Will We Learn?

In this tutorial you’ll learn how to automate and scrape the web with JavaScript. To do this, we’ll use Puppeteer. Puppeteer is a Node library API that allows us to control headless Chrome. Headless Chrome is a way to run the Chrome Browser without actually running Chrome.


Modern JavaScript Explained For Dinosaurs

Images from Dinosaur Comics by Ryan North

Learning modern JavaScript is tough if you haven’t been there since the beginning. The ecosystem is growing and changing so rapidly that it’s hard to understand the problems that different tools are trying to solve. I started programming in 1998 but only began to learn JavaScript seriously in 2014. At the time I remember coming across Browserify and staring at its tagline:

“Browserify lets you require(‘modules’) in the browser by bundling up all of your dependencies.”

I pretty much didn’t understand any word in this sentence, and struggled to make sense of how this would be helpful for me as a developer.

The goal of this article is to provide a historical context of how JavaScript tools have evolved to what they are today in 2017. We’ll start from the beginning and build an example website like the dinosaurs did — no tools, just plain HTML and JavaScript. Then we’ll introduce different tools incrementally to see the problems that they solve one at a time. With this historical context, you’ll be better able to learn and adapt to the ever-changing JavaScript landscape going forward. Let’s get started!


An Introduction to Implementing Neural Networks using TensorFlow


If you have been following Data Science / Machine Learning, you just can’t miss the buzz around Deep Learning and Neural Networks. Organizations are looking for people with Deep Learning skills wherever they can. From running competitions to open sourcing projects and paying big bonuses, people are trying every possible thing to tap into this limited pool of talent. Self driving engineers are being hunted by the big guns in automobile industry, as the industry stands on the brink of biggest disruption it faced in last few decades!

If you are excited by the prospects deep learning has to offer, but have not started your journey yet – I am here to enable it. Starting with this article, I will write a series of articles on deep learning covering the popular Deep Learning libraries and their hands-on implementation.

In this article, I will introduce TensorFlow to you. After reading this article you will be able to understand application of neural networks and use TensorFlow to solve a real life problem. This article will require you to know the basics of neural networks and have familiarity with programming. Although the code in this article is in python, I have focused on the concepts and stayed as language-agnostic as possible.

Let’s get started!



Learning Go by porting a medium-sized web backend from Python

Summary: To learn Go, I ported the backend of a small site I run from Python to Go, and had a fun, pain-free experience doing so.

I’ve been wanting to learn Go for a while now: I like the philosophy of a language that’s small, has a gentle learning curve, and compiles very fast (for a statically-typed language). What pushed me over the line to actually go and do it was seeing more and more fast, robust tools that are written in Go – Docker and ngrok are two I’ve used recently.

The philosophy of Go is not to everyone’s taste (no exceptions, no user-defined generics, etc), but it fit my mental model well. Simple, speedy, does things the obvious way. During the port, I was especially impressed with how robust the standard library and tooling was.


Javalin – Simple REST APIs for Java and Kotlin

Javalin is a very lightweight web framework for Kotlin and Java, inspired by Sparkjava and koa.js. Javalin is written in Kotlin with a few functional interfaces written in Java. This was necessary to provide an enjoyable and near identical experience for both Kotlin and Java developers.

Javalin is really more library than framework; you don’t need to extend or implement anything and there are very few “Javalin-concepts” you have to learn.

REST API Simplicity

Javalin started as a fork of the Spark Java and Kotlin web framework but quickly turned into a ground-up rewrite influenced by koa.js. All of these web frameworks are inspired by the modern micro web framework grandfather: Sinatra, so if you’re coming from Ruby then Javalin shouldn’t feel too unfamiliar.

Javalin is not aiming to be a full web framework, but rather just a lightweight REST API library. There is no concept of MVC, but there is support for template engines, websockets, and static file serving for convenience. This allows you to use Javalin for both creating your RESTful API backend, as well as serving an index.html with static resources (in case you’re creating an SPA). This is practical if you don’t want to deploy to apache or nginx in addition to your Javalin service. If you wish to use Javalin to create web-pages instead of just REST APIs, there are some simple template engine wrappers available for a quick and easy setup.

If you want a more established Java web framework, you could try Spark. If you want a more established Kotlin web framework, you can try the Spark Kotlin Wrapper.



Using the TensorFlow API: An Introductory Tutorial Series

This post summarizes and links to a great multi-part tutorial series on learning the TensorFlow API for building a variety of neural networks, as well as a bonus tutorial on backpropagation from the beginning.

By Erik Hallström, Deep Learning Research Engineer.

Editor’s note: The TensorFlow API has undergone changes since this series was first published. However, the general ideas are the same, and an otherwise well-structured tutorial such as this provides a great jumping off point and opportunity to consult the API documentation to identify and implement said changes.

Schematic of RNN processing sequential over time
Schematic of a RNN processing sequential data over time.