If you’ve ever tried to run operations on a large number of objects in S3, you might have encountered a few hurdles. Listing all files and running the operation on each object can get complicated and time consuming as the number of objects scales up. Many decisions have to be made: is running the operations from my personal computer fast enough? Or should I run it from a server that’s closer to the AWS resources, benefiting from AWS’s fast internal network? If so, I’ll have to provision resources (e.g. ec2 instance, lambda functions, containers, etc) to run the job.
Thankfully, AWS has heard our pains and announced AWS S3 Batch Operations preview during the last AWS Reinvent conference. This new service (which you can access by asking AWS politely) allows you to easily run operations on very large numbers of S3 objects in your bucket. Curious to know how it works? Let’s get going.
We have been using web frameworks to develop web applications since long before serverless came around, and middlewares are stable in these web frameworks. Express.js, for instance, lets you create middlewares at several stages of the request handling pipeline, and even ships with a few common middlewares out of the box.
As our code moves into Lambda functions and we move away from these web frameworks, are middlewares still relevant? If so, how might they look in this new world of serverless?
In this post, we’ll revisit the idea of middlewares, their role in application development with AWS Lambda, and how we can use middlewares to enforce consistent error handling across all of our Lambda functions.
Many Linux distributions use systemd to manage the system’s services (or daemons), for example to automatically start certain services in the correct order when the system boots.
Writing a systemd service in Python turns out to be easy, but the complexity of systemd can be daunting at first. This tutorial is intended to get you started.
When you feel lost or need the gritty details, head over to the systemd documentation, which is pretty extensive. However, the docs are distributed over several pages, and finding what you’re looking for isn’t always easy. A good place to look up a particular systemd detail is systemd.directives, which lists all the configuration options, command line parameters, etc., and links to their documentation.
Aside from this
README.md file, this repository contains a basic implementation of a Python service consisting of a Python script (
python_demo_service.py) and a systemd unit file (
The systemd version we’re going to work with is 229, so if you’re using a different version (see
systemctl --version) then check the systemd documentation for things that may differ.
In this post, I have presented the architecture behind an open-source project called “Beenion”.
It’s structured using Event Sourcing and CQRS patterns and is written in TypeScript.
The mechanism for uploading files from a browser has been around since the early days of the Internet. In the server-full environment it’s very easy to use Django, Express, or any other popular framework. It’s not an exciting topic — until you experience the scaling problem.
Imagine this scenario — you have an application that uploads files. All is well until the site suddenly gains popularity. Instead of handling a gigabyte of uploads a month, usage grows to 100Gb an hour for the month leading up to tax day. Afterwards, the usage drops back down again for another year. This is exactly the problem we had to solve.
File uploading at scale gobbles up your resources — network bandwidth, CPU, storage. All this data is ingested through your web server(s), which you then have to scale — if you’re lucky this means auto-scaling in AWS, but if you’re not in the cloud you’ll also have to contend with the physical network bottleneck issues.
You can also face some difficult race conditions if your server fails in the middle of handling the uploaded file. Did the file make to its end destination? What was the state of the processing? It can be very hard to replay the steps to failure or know the state of transactions when the server is overloaded.
Fortunately, this particular problem turns out to be a great use case for serverless — as you can eliminate the scaling issues entirely. For mobile and web apps with unpredictable demand, you can simply allow the application to upload the file directly to S3. This has the added benefit of enabling an https endpoint for the upload, which is critical for keeping the file’s contents secure in transit.
All this sounds great — but how does this work in practice when the server is no longer there to do the authentication and intermediary legwork?
Similarity computation is a very common task in real-world machine learning and data mining problems such as recommender systems, spam detection, online advertising etc. Consider a tweet recommendation problem where one has to find tweets similar to the tweet user previously clicked. This problem becomes extremely challenging when there are billions of tweets created each day.
In this post, we will discuss the two most common similarity metric, namely Jaccard similarity and Cosine similarity; and Locality Sensitive Hashing based approximation of those metrics.
Commission Free API Trading Can Open Up Many Possibilities
Alpaca provides commission-free stock trading API for individual algo traders and developers, and now almost 1,000 people hang around in our community Slack talking about many different use cases. Among other things, like automated long-term value investing and Google Spreadsheet trading, high-frequency trading (“HFT”) often came up as a discussion topic among our users.
Is High-Frequency Trading (“HFT”) That Special?
Maybe because I don’t come from a finance background, I’ve wondered what’s so special about hedge funds and HFTs that those “Wallstreet” guys talk about. Since I am a developer who always looks for ways to make things work, I decided to do research and to figure out myself on how I could build similar things to what HFTs do.
I am fortunate to work with colleagues who used to build strategies and trade at HFTs, so I learned some basic know-how from them and went ahead to code a working example that trades somewhat like an HFT style (please note that my example does not act like the ultra-high speed professional trading algorithms that collocate with exchanges and fight for nanoseconds latency). Also, because this working example uses real-time data streaming, it can act as a good starting point for users who want to understand how to use real-time data streaming.
The code of this HFT-ish example algorithm is here, and you can immediately run it with your favorite stock symbol. Just clone the repository from GitHub, set the API key, and go!