A Guide to Automating & Scraping the Web with JavaScript (Chrome + Puppeteer + Node JS)

Learn to Automate and Scrape the web with Headless Chrome

What Will We Learn?

In this tutorial you’ll learn how to automate and scrape the web with JavaScript. To do this, we’ll use Puppeteer, a Node library that provides an API for controlling headless Chrome. Headless Chrome is a way to run the Chrome browser without a visible browser window.
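
As a rough illustration of what controlling headless Chrome from Node can look like, here is a minimal sketch that opens a page and collects its title and link URLs. The function name `scrapePage` and the idea of passing the `puppeteer` module in as an argument are assumptions made for this example (they make it easy to try with a stub); the tutorial itself may structure things differently.

```javascript
// Minimal sketch, not the tutorial's own code.
// `puppeteer` is passed in so the function can be tried with a stub object.
async function scrapePage(puppeteer, url) {
  const browser = await puppeteer.launch(); // Puppeteer launches Chrome headless by default
  try {
    const page = await browser.newPage();
    await page.goto(url);

    // Read the document title and every anchor's href from the loaded page.
    const title = await page.title();
    const links = await page.$$eval('a', (anchors) => anchors.map((a) => a.href));

    return { title, links };
  } finally {
    // Always close the browser, even if navigation or scraping throws.
    await browser.close();
  }
}

// Usage (assumes `npm install puppeteer` and network access):
// const puppeteer = require('puppeteer');
// scrapePage(puppeteer, 'https://example.com').then(console.log);
```

The `try`/`finally` block is there so a failed navigation does not leave a stray Chrome process running.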



Getting Started with Headless Chrome


Headless Chrome is a way to run the Chrome browser in a headless environment. Essentially, running Chrome without chrome! It brings all modern web platform features provided by Chromium and the Blink rendering engine to the command line.

Why is that useful?

A headless browser is a great tool for automated testing and server environments where you don’t need a visible UI shell. For example, you may want to run some tests against a real web page, create a PDF of it, or just inspect how the browser renders a URL.
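
The tasks mentioned above (creating a PDF, taking a screenshot, inspecting the rendered DOM) map onto Chrome command-line flags. The sketch below builds the argument list for each one and starts Chrome with Node's `child_process`. The helper names `headlessArgs` and `runHeadless` and the `chromePath` parameter are hypothetical; where the Chrome binary lives depends on your system.

```javascript
const { spawn } = require('child_process');

// Build the command-line flags for a one-shot headless Chrome task.
// task: 'pdf' | 'screenshot' | 'dom'
function headlessArgs(task, url, outputPath) {
  const args = ['--headless'];
  if (task === 'pdf') {
    args.push(`--print-to-pdf=${outputPath}`); // save the page as a PDF
  } else if (task === 'screenshot') {
    args.push(`--screenshot=${outputPath}`); // save a screenshot of the page
  } else if (task === 'dom') {
    args.push('--dump-dom'); // print the rendered DOM to stdout
  } else {
    throw new Error(`Unknown task: ${task}`);
  }
  args.push(url);
  return args;
}

// Start Chrome with those flags. `chromePath` is an assumption: pass the
// location of the Chrome or Chromium binary on your machine.
function runHeadless(chromePath, task, url, outputPath) {
  return spawn(chromePath, headlessArgs(task, url, outputPath), { stdio: 'inherit' });
}

// Usage (the binary path is illustrative only):
// runHeadless('/usr/bin/google-chrome', 'pdf', 'https://example.com', 'page.pdf');
```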


DeepBreath: Preventing angry emails with machine learning

We all have bad days. Maybe deadlines are slipping, your cat destroyed your couch (again), or you just have a regular case of the Mondays. Whatever the source of your stress, you hit “Send” on a Gmail draft at work, and you immediately regret it. No matter what, you never want to send excessively emotional or angry emails to coworkers, clients or even friends.

Inspired by many other fun use cases of Google Cloud Natural Language API, we wrote a Chrome plugin called DeepBreath that automatically sends all your saved drafts to Cloud Natural Language API for sentiment analysis. The API automatically detects how positive or negative any given piece of text is with a simple API call, so a plugin to solve the angry email problem was very easy and quick to build for Gmail, and could also be easily repurposed for any other places you write text (forums, project management tools, etc). Please see “A Note On User Data Privacy” below before considering making these extensions.

If your email is of sufficiently negative magnitude, it will automatically display a warning so you can consider a rewrite before you hit send, rather than after. The warning gives you a chance to take a literal deep breath and reconsider the contents of the email.

How does it work? Every time a draft is saved, the body of the draft is sent to the analyzeSentiment API endpoint. A score (the positive or negative sentiment) and a magnitude (how strong the feeling is) are returned. You can read more about score and magnitude in the docs. If the score is sufficiently negative and the magnitude sufficiently strong, a warning pops up. Only one warning pops up per draft.
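
The decision logic described above can be sketched as follows. This is not the DeepBreath source: the threshold values, the `warnedDrafts` set, and the function names are assumptions chosen for illustration. The request body follows the shape the Cloud Natural Language REST API expects for `documents:analyzeSentiment`, and the response is assumed to expose `score` and `magnitude` under `documentSentiment`.

```javascript
// Body for POST https://language.googleapis.com/v1/documents:analyzeSentiment
function buildSentimentRequest(draftBody) {
  return {
    document: { type: 'PLAIN_TEXT', content: draftBody },
    encodingType: 'UTF8',
  };
}

// Hypothetical thresholds; the article does not state the values it uses.
const SCORE_THRESHOLD = -0.5;    // at or below this counts as "sufficiently negative"
const MAGNITUDE_THRESHOLD = 1.0; // at or above this counts as "sufficiently strong"

// Draft IDs that have already triggered a warning (one warning per draft).
const warnedDrafts = new Set();

// `sentiment` is the `documentSentiment` object from the API response.
function shouldWarn(draftId, sentiment) {
  if (warnedDrafts.has(draftId)) return false;

  const tooNegative =
    sentiment.score <= SCORE_THRESHOLD &&
    sentiment.magnitude >= MAGNITUDE_THRESHOLD;

  if (tooNegative) warnedDrafts.add(draftId);
  return tooNegative;
}
```

Checking both values matters: a strongly negative score with a tiny magnitude usually means a short or mild message, which is not worth interrupting the writer for.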


AMP + Progressive Web Apps: Start fast, stay engaged – Google I/O 2016

Alex Russell on AMP + Progressive Web Apps: Start fast, stay engaged.

AMP delivers outstanding page-load performance for users browsing content on the mobile web, which is hugely important on limited or flaky networks. AMP gets content in front of users fast.

Progressive Web Apps deliver reliable performance for re-visits to sites thanks to Service Workers and the App Shell architecture. This technique allows sites to deliver rich experiences without worrying about networks.

Until now, however, these approaches for accelerating the mobile web have appeared to be in conflict. What if it were possible to use them in conjunction to deliver fast initial loading and reliable second-visit performance, as well as advanced features like offline reading and richer UI treatment?

Come learn about how to make AMP-based PWAs and hear about how this architecture is working for real-world publishers today.
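
To make the Service Worker side of this concrete, here is a minimal sketch of a cache-first lookup, one common way an app shell is served reliably on repeat visits. It is an illustration only, not the approach shown in the talk. The function name `cacheFirst` and the cache name `app-shell-v1` are hypothetical, and the cache and fetch function are passed in so the logic can be tried outside a browser.

```javascript
// Cache-first: answer from the cache when possible, otherwise go to the
// network and store a copy for the next visit.
// `cache` needs match(request) and put(request, response), like the browser
// Cache API; `fetchFn` plays the role of the global fetch().
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  if (cached) return cached;

  const response = await fetchFn(request);
  // A response body can be read only once, so store a clone and return the original.
  await cache.put(request, response.clone());
  return response;
}

// Inside a service worker this could be wired up roughly like so:
// self.addEventListener('fetch', (event) => {
//   event.respondWith(
//     caches.open('app-shell-v1').then((cache) => cacheFirst(event.request, cache, fetch))
//   );
// });
```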


Configuring & Optimizing WebSocket Compression

“Good news, browser support for the latest draft of “Compression Extensions” for WebSocket protocol — a much needed and overdue feature — will be landing in early 2014: Chrome M32+ (available in Canary already), and Firefox and Webkit implementations should follow.

Specifically, it enables the client and server to negotiate a compression algorithm and its parameters, and then selectively apply it to the data payloads of each WebSocket message: the server can compress delivered data to the client, and the client can compress data sent to the server…”
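
The negotiation happens through the `Sec-WebSocket-Extensions` header exchanged during the WebSocket handshake. The sketch below parses that header and picks out a compression offer. It assumes the extension name `permessage-deflate`, which is the name later standardized in RFC 7692; the function names are hypothetical, and the parser is deliberately naive (it does not handle commas inside quoted parameter values).

```javascript
// Parse a Sec-WebSocket-Extensions header value into { name, params } entries.
// Example input: 'permessage-deflate; client_max_window_bits'
function parseExtensions(header) {
  return header
    .split(',')
    .map((offer) => offer.trim())
    .filter(Boolean)
    .map((offer) => {
      const [name, ...rest] = offer.split(';').map((part) => part.trim());
      const params = {};
      for (const part of rest.filter(Boolean)) {
        const [key, value] = part.split('=').map((s) => s.trim());
        // Parameters without a value are flags; strip quotes from the rest.
        params[key] = value === undefined ? true : value.replace(/^"|"$/g, '');
      }
      return { name, params };
    });
}

// Return the parameters of the first permessage-deflate offer, or null if
// the peer did not offer compression.
function negotiateCompression(header) {
  const offer = parseExtensions(header).find(
    (ext) => ext.name === 'permessage-deflate'
  );
  return offer ? offer.params : null;
}
```

A server would typically run something like `negotiateCompression` on the client's handshake request and echo the accepted parameters back in its response; in practice a WebSocket library does this for you.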