Python at Netflix

As many of us prepare to go to PyCon, we wanted to share a sampling of how Python is used at Netflix. We use Python through the full content lifecycle, from deciding which content to fund all the way to operating the CDN that serves the final video to 148 million members. We use and contribute to many open-source Python packages, some of which are mentioned below.


Developer Experience Lessons Operating a Serverless-like Platform At Netflix

The Netflix API is based on a dynamic scripting platform that handles thousands of changes per day. This platform allows our client developers to create a customized API experience on over a thousand device types by executing server-side adapter code in response to HTTP requests. Developers are only responsible for the adapter code they write; they do not have to worry about infrastructure concerns related to server management and operations. To these developers, the scripting platform, in effect, provides an experience similar to that offered by serverless or FaaS platforms. It is important to note that the similarities are limited to the developer experience (DevEx); the runtime is a custom implementation that is not designed to support general-purpose serverless use cases. A few years of developing and operating this platform for a diverse set of developers has yielded several DevEx learnings for us…

In Part 1 of this series, we outlined key learnings the Edge Developer Experience team gained from operating the API dynamic scripting platform, which provides a serverless or FaaS-like experience for client application developers. We addressed the concerns around getting code ready for production deployment. Here, we look at what it takes to deploy it safely and operate it on an ongoing basis…

Distributed delay queues based on Dynomite

Netflix’s Content Platform Engineering runs a number of business processes that are driven by asynchronous orchestration of microservice-based tasks, and queues form an integral part of the orchestration layer among these services.
A few examples of these processes are:
  • IMF-based content ingest from our partners
  • Setting up new titles within Netflix
  • Content ingest, encode, and deployment to the CDN
Traditionally, we have used a Cassandra-based queue recipe along with Zookeeper for distributed locks, since Cassandra is the de facto storage engine at Netflix. However, using Cassandra for a queue-like data structure is a known anti-pattern. In addition, taking a global lock on the queue while polling limits concurrency on the consumer side, because the lock ensures that only one consumer can poll from the queue at a time. Sharding the queue alleviates this somewhat, but concurrency is still limited within each shard. As we started to build out a new orchestration engine, we looked at Dynomite for handling the task queues.
We wanted the following in the queue recipe:
  1. Distributed
  2. No external locks (e.g. Zookeeper locks)
  3. Highly concurrent
  4. At-least-once delivery semantics
  5. No strict FIFO
  6. Delayed queue (message is not taken out of the queue until some time in the future)
  7. Priorities within the shard
The queue recipe described here is used to build a message broker server that exposes various operations (push, poll, ack, etc.) via REST endpoints and can potentially be exposed by other transports (e.g. gRPC). Today, we are open sourcing the queue recipe.
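To make the delay and at-least-once requirements above more concrete, here is a minimal single-process sketch in Python. It is not the open-sourced recipe and does not use Dynomite; the class name DelayQueue, its method signatures, and the unack_timeout parameter are all hypothetical names chosen for illustration. It only shows the push, poll, and ack semantics: a message becomes visible once its delay has elapsed, a polled message is held in an unacked set, and it is redelivered if no ack arrives before the timeout. Distribution, sharding, and priorities are deliberately left out.

```python
import heapq
import itertools
import time


class DelayQueue:
    """Illustrative in-memory delay queue with at-least-once delivery."""

    def __init__(self, unack_timeout=30.0, clock=time.monotonic):
        self._clock = clock                # injectable so tests can control time
        self._unack_timeout = unack_timeout
        self._heap = []                    # entries: (due_time, seq, msg_id)
        self._payloads = {}                # msg_id -> payload, until acked
        self._unacked = {}                 # msg_id -> redelivery deadline
        self._seq = itertools.count()      # tie-breaker for equal due times

    def push(self, msg_id, payload, delay=0.0):
        """Store a message that becomes visible `delay` seconds from now."""
        self._payloads[msg_id] = payload
        due = self._clock() + delay
        heapq.heappush(self._heap, (due, next(self._seq), msg_id))

    def poll(self, max_messages=1):
        """Return up to `max_messages` due messages as (msg_id, payload)."""
        now = self._clock()
        self._requeue_expired(now)
        out = []
        while self._heap and len(out) < max_messages and self._heap[0][0] <= now:
            _, _, msg_id = heapq.heappop(self._heap)
            if msg_id not in self._payloads:
                continue                   # already acked; skip stale entry
            # Hold the message until it is acked or the timeout passes.
            self._unacked[msg_id] = now + self._unack_timeout
            out.append((msg_id, self._payloads[msg_id]))
        return out

    def ack(self, msg_id):
        """Confirm processing; returns False if the message is not in flight."""
        if self._unacked.pop(msg_id, None) is None:
            return False
        del self._payloads[msg_id]
        return True

    def _requeue_expired(self, now):
        """Make unacked messages visible again once their deadline passes."""
        expired = [m for m, deadline in self._unacked.items() if deadline <= now]
        for msg_id in expired:
            del self._unacked[msg_id]
            heapq.heappush(self._heap, (now, next(self._seq), msg_id))
```

Because a message that is polled but never acked comes back after the timeout, a consumer can see the same message more than once, so handlers built on this kind of queue should be idempotent.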

Netflix Billing Migration to AWS

On January 4, 2016, right before Netflix expanded into 130 new countries, the Netflix Billing infrastructure became 100% AWS cloud-native. Migration of the Billing infrastructure from the Netflix Data Center (DC) to the AWS Cloud was part of a broader initiative. This prior blog post is a great read that summarizes our strategic goals and direction towards AWS migration.

For a company, its billing solution is its financial lifeline, while at the same time, it is a visible representation of the company’s attitude towards its customers. A great customer experience is one of Netflix’s core values. Given the sensitive nature of Billing, with its direct impact on our monetary relationship with our members as well as on financial reporting, this migration needed to be handled as delicately as possible. Our primary goal was to define a secure, resilient and granular path for migration to the Cloud, without impacting the member experience.

This blog entry discusses our approach to migrating a complex Billing ecosystem from the Netflix Data Center (DC) into the AWS Cloud.

Netflix Open Connect Appliance Software

“Netflix delivers streaming content using a combination of intelligent clients, a central control system, and a network of Open Connect appliances.

When designing the Open Connect Appliance Software, we focused on these fundamental design goals:

  • Use of Open Source software
  • Ability to efficiently read from disk and write to network sockets
  • High-performance HTTP delivery
  • Ability to gather routing information via BGP…”