AMP + Progressive Web Apps: Start fast, stay engaged – Google I/O 2016

Alex Russell on AMP + Progressive Web Apps: Start fast, stay engaged.

AMP delivers outstanding page-load performance for users browsing content on the mobile web, which is hugely important on limited or flaky networks. AMP gets content in front of users fast.

Progressive Web Apps deliver reliable performance for re-visits to sites thanks to Service Workers and the App Shell architecture. This technique allows sites to deliver rich experiences without worrying about networks.

Until now, however, these approaches for accelerating the mobile web have appeared to be in conflict. What if it were possible to use them in conjunction to deliver fast initial loading and reliable second-visit performance, as well as advanced features like offline reading and richer UI treatment?

Come learn about how to make AMP-based PWAs and hear about how this architecture is working for real-world publishers today.

Amazon Web Services in Plain English

Hey, have you heard of the new AWS services: ContainerCache, ElastiCast and QR72? Of course not, I just made those up.

But with more than 50 opaquely named services, we decided that enough was enough and that some plain English descriptions were needed.

How Discord Stores Billions of Messages

Discord continues to grow faster than we expected and so does our user-generated content. With more users comes more chat messages. In July, we announced 40 million messages a day, in December we announced 100 million, and as of this blog post we are well past 120 million. We decided early on to store all chat history forever so users can come back at any time and have their data available on any device. This is a lot of data that is ever increasing in velocity and size, and it must remain available. How do we do it? Cassandra!

Reindexing Data with Elasticsearch

Sooner or later, you’ll run into a problem of reindexing the data of your Elasticsearch instances. When we do Elasticsearch consulting for clients we always look at whether they have some way to efficiently reindex previously indexed data. The reasons for reindexing vary – from data type changes, analysis changes, to introduction of new fields that need to be populated. No matter the case, you may either reindex from your source of truth or treat your Elasticsearch instance as such. Up to Elasticsearch 2.3 we had to use external tools to help us with this operation, like Logstash or stream2es. We even wrote about how to approach reindexing of data with Logstash. However, today we would like to look at the new functionality that will be added to Elasticsearch 2.3 – the re-index API.

The pre-requisites are quite low – you only need Elasticsearch 2.3 (not yet officially released as of this writing) and you need to be able to run a command on it. And that’s it, nothing more is needed and Elasticsearch will do the rest for us.
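As a rough illustration of what "running a command" against the re-index API can look like, here is a minimal Python sketch that only builds the HTTP request and does not send it. The host `http://localhost:9200` and the index names `old_index` and `new_index` are placeholders, and the helper `build_reindex_request` is a hypothetical name introduced for this example.

```python
import json
from urllib import request


def build_reindex_request(base_url, source_index, dest_index):
    """Build (but do not send) a POST to the _reindex endpoint.

    base_url, source_index and dest_index are illustrative placeholders.
    """
    body = {
        "source": {"index": source_index},  # index to copy documents from
        "dest": {"index": dest_index},      # index to copy documents into
    }
    return request.Request(
        url=f"{base_url.rstrip('/')}/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_reindex_request("http://localhost:9200", "old_index", "new_index")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually run it against a live cluster (not done here):
#     with request.urlopen(req) as resp:
#         print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the body before touching a real cluster.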

Caching at Reddit

Performance matters. One of the first tools we as developers reach for when looking to get more performance out of a system is caching. As Reddit has grown in users and response times have improved, the amount of caching has grown to be quite large as well.

In this post we’ll talk about some of the nuts-and-bolts numbers of Reddit’s caching infrastructure—the number of instances, size of instances, and overall throughput. We hope that sharing this information may help others gauge what type of performance and sizing they can expect when building similar clusters. At the very least, we hope you’ll find it interesting to see a bit more about how Reddit works under the hood.

We’ll also go over the Reddit-specific type of work our caches do, how we use mcrouter to manage our caches more effectively, and the custom monitoring (MemcachedSlabCollector and mcsauna) we’ve written to help us understand what’s going on behind the scenes. We’ll also talk about some of the more subtle issues that we’ve run into when deploying changes to our caches.

Dismissing Python Garbage Collection at Instagram

By dismissing the Python garbage collection (GC) mechanism, which reclaims memory by collecting and freeing unused data, Instagram can run 10% more efficiently. Yes, you heard it right! By disabling GC, we can reduce the memory footprint and improve the CPU LLC cache hit ratio. If you’re interested in knowing why, buckle up!
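For readers who have not turned off the collector before, the sketch below shows the basic standard-library mechanism using the `gc` module. It is not necessarily how Instagram applied it in production. The `gc_disabled` context manager and the `Node` class are hypothetical names for this example. Note that reference counting still frees most objects; only the cyclic collector is paused.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_disabled():
    """Temporarily switch off the cyclic garbage collector."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state rather than enabling unconditionally.
        if was_enabled:
            gc.enable()


class Node:
    """Illustrative object that refers to itself, forming a reference cycle."""

    def __init__(self):
        self.me = self


before = gc.isenabled()
with gc_disabled():
    inside = gc.isenabled()
    # Cycles created here are not reclaimed automatically while GC is off.
    for _ in range(1000):
        Node()
after = gc.isenabled()

# An explicit collection still works and cleans up the cycles created above.
gc.collect()
print(before, inside, after)
```

Scoping the change with a context manager keeps the experiment reversible, which is useful when measuring whether disabling the collector helps a particular workload.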

Monetize your APIs in AWS Marketplace using API Gateway

Amazon API Gateway helps you quickly build highly scalable, secure, and robust APIs. Today, we are announcing an integration of API Gateway with AWS Marketplace. You can now easily monetize your APIs built with API Gateway, market them directly to AWS customers, and reuse AWS bill calculation and collection mechanisms.

AWS Marketplace lists over 3,500 software listings across 35 product categories with over 100K active customers. With the recent announcement of SaaS Subscriptions, API sellers can, for the first time, take advantage of the full suite of Marketplace features, including customer acquisition, unified billing, and reporting. For AWS customers, this means that they can now subscribe to API products through AWS Marketplace and pay on an existing AWS bill. This gives you direct access to the AWS customer base.

To get started, identify the API on API Gateway that you want to sell on AWS Marketplace. Next, package that API into usage plans. Usage plans allow you to set throttling limits and quotas to your APIs and allow you to control third-party usage of your API. You can create multiple usage plans with different limits (e.g., Silver, Gold, Platinum) and offer them as different API products on AWS Marketplace.
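To make the idea of tiered usage plans concrete, here is a small plain-Python model of three plans with throttling limits and quotas. It does not call any AWS API. The `UsagePlan` class, its field names, and all numeric limits are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsagePlan:
    """Hypothetical model of a usage plan: a throttle plus a quota."""

    name: str
    rate_limit: float   # steady-state requests per second (throttle)
    burst_limit: int    # short-term burst capacity (throttle)
    quota_limit: int    # maximum requests allowed per quota period
    quota_period: str   # e.g. "DAY", "WEEK" or "MONTH"


# Illustrative tiers; the numbers are made up for this example.
PLANS = [
    UsagePlan("Silver", rate_limit=10.0, burst_limit=20, quota_limit=10_000, quota_period="MONTH"),
    UsagePlan("Gold", rate_limit=50.0, burst_limit=100, quota_limit=100_000, quota_period="MONTH"),
    UsagePlan("Platinum", rate_limit=200.0, burst_limit=400, quota_limit=1_000_000, quota_period="MONTH"),
]


def within_quota(plan, requests_so_far):
    """Return True if one more request still fits in the plan's quota."""
    return requests_so_far < plan.quota_limit


for plan in PLANS:
    print(plan.name, plan.rate_limit, plan.quota_limit, plan.quota_period)
```

Each tier could then be offered as a separate API product, with the throttle protecting the backend and the quota defining what the customer has paid for.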

20 Python libraries you aren’t using (but should)

Discover lesser-known Python libraries that are easy to install and use, cross-platform, and applicable to more than one domain.

The Python ecosystem is vast and far-reaching in both scope and depth. Starting out in this crazy, open-source forest is daunting, and even with years of experience, it still requires continual effort to keep up-to-date with the best libraries and techniques.

In this report we take a look at some of the lesser-known Python libraries and tools. Python itself already includes a huge number of high-quality libraries; collectively these are called the standard library. The standard library receives a lot of attention, but there are still some libraries within it that should be better known. We will start out by discussing several extremely useful tools in the standard library that you may not know about.

We’re also going to discuss several exciting, lesser-known libraries from the third-party ecosystem. Many high-quality third-party libraries are already well-known, including Numpy and Scipy, Django, Flask, and Requests; you can easily learn more about these libraries by searching for information online. Rather than focusing on those standouts, this report is instead going to focus on several interesting libraries that are growing in popularity.

11 IPython Tutorials for Data Science and Machine Learning

The 11 IPython Tutorials
  • Example Machine Learning – Notebook by Randal S. Olson, supported by Jason H. Moore. University of Pennsylvania Institute for Bioinformatics
  • Python Machine Learning Book – 400 pages rich in useful material, covering just about everything you need to know to get started with machine learning … from theory to the actual code that you can directly put into action!
  • Learn Data Science – The initial beta release consists of four major topics: Linear Regression, Logistic Regression, Random Forests, K-Means Clustering
  • Machine Learning – This repo contains a collection of IPython notebooks detailing various machine learning algorithms. In general, the mathematics follows that presented by Dr. Andrew Ng’s Machine Learning course taught at Stanford University (materials available from iTunes U, Stanford Machine Learning), Dr. Tom Mitchell’s course at Carnegie Mellon, and Christopher M. Bishop’s “Pattern Recognition And Machine Learning”.
  • Research Computing Meetup – Linux and Python for data analysis (tutorials). University of Colorado, Computational Science and Engineering.
  • Theano Tutorial – A brief IPython notebook-based tutorial on basic Theano concepts, including a toy multi-layer perceptron example.
  • IPython Theano Tutorials – A collection of tutorials in ipynb format that illustrate how to do various things in Theano.
  • IPython Notebooks – Demonstrations and use cases for many of the most widely used “data science” Python libraries. Implementations of the exercises presented in Andrew Ng’s “Machine Learning” class on Coursera. Implementations of the assignments from Google’s Udacity course on deep learning.

Tutorial: Deep Learning in PyTorch

This Blogpost Will Cover:

  • Part 1: PyTorch Installation
  • Part 2: Matrices and Linear Algebra in PyTorch
  • Part 3: Building a Feedforward Network (starting with a familiar one)
  • Part 4: The State of PyTorch

Pre-Requisite Knowledge:

  • Simple Feedforward Neural Networks (Tutorial)
  • Basic Gradient Descent (Tutorial)

Torch is one of the most popular Deep Learning frameworks in the world, dominating much of the research community for the past few years (only recently being rivaled by major Google-sponsored frameworks TensorFlow and Keras). Perhaps its only drawback to new users has been the fact that it requires one to know Lua, a language that used to be very uncommon in the Machine Learning community. Even today, this barrier to entry can seem a bit much for many new to the field, who are already in the midst of learning a tremendous amount without also taking on a completely new programming language.

However, thanks to the wonderful and brilliant Hugh Perkins, Torch recently got a new face, PyTorch… and it’s much more accessible to the Python hacker turned Deep Learning Extraordinaire than its Luariffic cousin. I have a passion for tools that make Deep Learning accessible, and so I’d like to lay out a short “Unofficial Startup Guide” for those of you interested in taking it for a spin.