The mechanism for uploading files from a browser has been around since the early days of the Internet. In the server-full environment it’s very easy to use Django, Express, or any other popular framework. It’s not an exciting topic — until you experience the scaling problem.
Imagine this scenario — you have an application that uploads files. All is well until the site suddenly gains popularity. Instead of handling a gigabyte of uploads a month, usage grows to 100Gb an hour for the month leading up to tax day. Afterwards, the usage drops back down again for another year. This is exactly the problem we had to solve.
File uploading at scale gobbles up your resources — network bandwidth, CPU, storage. All this data is ingested through your web server(s), which you then have to scale — if you’re lucky this means auto-scaling in AWS, but if you’re not in the cloud you’ll also have to contend with the physical network bottleneck issues.
You can also face some difficult race conditions if your server fails in the middle of handling the uploaded file. Did the file make to its end destination? What was the state of the processing? It can be very hard to replay the steps to failure or know the state of transactions when the server is overloaded.
Fortunately, this particular problem turns out to be a great use case for serverless — as you can eliminate the scaling issues entirely. For mobile and web apps with unpredictable demand, you can simply allow the application to upload the file directly to S3. This has the added benefit of enabling an https endpoint for the upload, which is critical for keeping the file’s contents secure in transit.
All this sounds great — but how does this work in practice when the server is no longer there to do the authentication and intermediary legwork?
Similarity computation is a very common task in real-world machine learning and data mining problems such as recommender systems, spam detection, online advertising etc. Consider a tweet recommendation problem where one has to find tweets similar to the tweet user previously clicked. This problem becomes extremely challenging when there are billions of tweets created each day.
In this post, we will discuss the two most common similarity metric, namely Jaccard similarity and Cosine similarity; and Locality Sensitive Hashing based approximation of those metrics.
The premise behind autoscaling in AWS is simple: you can maximize your ability to handle load spikes and minimize costs if you automatically scale your application out based on metrics like CPU or memory utilization. If you need 100 Docker containers to support your load during the day but only 10 when load is lower at night, running 100 containers at all times means that you’re using 900% more capacity than you need every night. With a constant container count, you’re either spending more money than you need to most of the time or your service will likely fall over during a load spike.
In the serverless world, we often get the impression that our applications can scale without limits. With the right design (and enough money), this is theoretically possible. But in reality, many components of our serverless applications DO have limits. Whether these are physical limits, like network throughput or CPU capacity, or soft limits, like AWS Account Limits or third-party API quotas, our serverless applications still need to be able to handle periods of high load. And more importantly, our end users should experience minimal, if any, negative effects when we reach these thresholds.
There are many ways to add resiliency to our serverless applications, but this post is going to focus on dealing specifically with quotas in third-party APIs. We’ll look at how we can use a combination of SQS, CloudWatch Events, and Lambda functions to implement a precisely controlled throttling system. We’ll also discuss how you can implement (almost) guaranteed ordering, state management (for multi-tiered quotas), and how to plan for failure. Let’s get started!
strace allows you to see what system calls are being made.
ltrace allows you to see what dynamic library calls are being made.
Though useful, these tools are limited. What if you want to trace what happens inside a system call or library call? What if you want to do more than just logging calls, e.g. you want to compile statistics on certain behavior? What if you want to trace multiple processes and correlate data from multiple sources?
In 2019, there’s finally a decent answer to that on Linux: bpftrace, based on eBPF technology. Bpftrace allows you to write small programs that execute whenever an event occurs.
This article shows you how to setup bpftrace and teaches you its basic usage. I’ll also give an overview of how the tracing ecosystem looks like (e.g. “what’s eBPF?”) and how it came to be what it is today.
In 2018, I wrote a series of blog posts on building a multi-region, active-active, serverless architecture on AWS [1, 2, 3 and 4]. The solution was built using DynamoDB Global Tables, Lambda, the regional API Gateway feature, and Route 53 routing policies. It worked well as a resiliency pattern and as a disaster recovery (DR) strategy . But there was an issue.