In my free time, I run a small blog that uses Amazon S3 to host static content and Amazon CloudFront to distribute it world-wide. I use a home-grown, static website generator to create and upload my blog content onto S3.
My blog uses two S3 buckets: one for staging and testing, and one for production. As a website owner, I want to update the production bucket with all changes from the staging bucket in a reliable and efficient way, without having to create and populate a new bucket from scratch. Therefore, to synchronize files between these two buckets, I use AWS Lambda and AWS Step Functions.
In this post, I show how you can use Step Functions to build a scalable synchronization engine for S3 buckets and learn some common patterns for designing Step Functions state machines while you do so.
s3-lambda enables you to run lambda functions over a context of S3 objects. It has a stateless architecture with concurrency control, allowing you to process a large number of files very quickly. This is useful for quickly prototyping complex data jobs without an infrastructure like Hadoop or Spark.
At Littlstar, we use
s3-lambda for all sorts of data pipelining and analytics.
This is a companion blog post to a talk I gave to the Boulder Python Meetup group about the infrastructure that runs Compelling Science Fiction. Slides from that talk can be found here.Hopefully you can use some of these tools to create something new as well!
Compelling Science Fiction is run entirely on extremely inexpensive Amazon Web Services (AWS). There are currently three primary use cases that I have:
- Serving web pages that contain the site. This is easily achieved by using the Amazon S3 feature that allows you to serve static web pages from an S3 bucket.
- Accepting and managing submissions from authors.
- Reading through the queue (“slush”) of stories that authors submit.
It’s the last two items on that list that I’ll be talking about today, because they both use the same basic infrastructure. That infrastructure is diagrammed below:As you can see, I use four different Amazon Web Services: the Simple Email Service (SES), Simple Storage Service (S3), Lambda, and DynamoDB. I’ll touch on all of the ways we use these services, but AWS Lambda is the most important, because it allows us to glue together all the services with Python without provisioning any servers.
In an industry where so many people want to change the world, it’s fair to say that low cost object storage has done just that. Building a business that requires flexible low-latency storage is now affordable in a way we couldn’t imagine before.
When building the Exoscale public cloud offering, we knew that a simple object storage service, protected by Swiss privacy laws, would be crucial. After looking at the existing object storage software projects, we decided to build our own solution: Pithos.
Pithos is an open source S3-compatability layer for Apache Cassandra, the column database. In other words, it allows you to use standard S3 tools to store objects in your own Cassandra cluster. If this is the first time that you’ve looked at object storage software then you may wonder why Pithos is built on top of a NoSQL database but it’s not all that unusual.