A Guide to S3 Batch on AWS

AWS just announced the release of S3 Batch Operations. This is a hotly-anticpated release that was originally announced at re:Invent 2018. With S3 Batch, you can run tasks on existing S3 objects. This will make it much easier to run previously difficult tasks like retagging S3 objects, copying objects to another bucket, or processing large numbers of objects in bulk.

In this post, we’ll do a deep dive into S3 Batch. You will learn when, why, and how to use S3 Batch. First, we’ll do an overview of the key elements involved in an S3 Batch job. Then, we’ll walkthrough an example by doing sentiment analysis on a group of existing objects with AWS Lambda and Amazon Comprehend.


Use Lambda Layers To Post To Slack

I have tried a few different ways of reporting Lambda errors to Slack, but haven’t found a reusable solution that gave all of the information I desired. I decided to solve that problem by creating my own Lambda layer. This solution doesn’t highlight the use of error logging, but is dynamic enough that you can just pass an error message into the layer.

For this to be useful to you, make sure you are familiar with the following:
1. AWS Lambda
2. Node JS
3. NPM
4. Slack


Building Serverless Pipelines with Amazon CloudWatch Events

Events and serverless go together like baked beans and barbecue. The serverless mindset says to focus on code and configuration that provide business value. It turns out that much of the time, this means working with events: structured data corresponding to things that happen in the outside world. Rather than maintaining long-running server tasks that chew up resources while polling, I can create serverless applications that do work only in response to event triggers.

I have lots of options when working with events in AWS: Amazon Kinesis Data Streams, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), and more, depending on my requirements. Lately, I’ve been using a service more often that has the word ‘event’ right in the name: Amazon CloudWatch Events.



AWS introduced Lambda Layers at re:invent 2018 as a way to share code and data between functions within and across different accounts. It’s a useful tool and something many AWS customers have been asking for. However, since we already have numerous ways of sharing code, including package managers such as NPM, when should we use Layers instead?

In this post, we will look at how Lambda Layers works, the problem it solves and the new challenges it introduces. And we will finish off with some recommendations on when to use it.


How Twitch monitors its services with Amazon CloudWatch

Twitch is the leading service and community for multiplayer entertainment and is owned by Amazon. Twitch also provides social and features and micro-transaction features that drive content engagement for its audiences. These services operate at a high transaction volume.

Twitch uses Amazon CloudWatch to monitor its business-critical services. It emits custom metrics then visualizes and alerts based on predefined thresholds for these key metrics. The high volume of transactions handled by the Twitch services makes it difficult to design a metric ingestion strategy that provides sufficient throughput of data while balancing the cost of data ingestion.

Amazon CloudWatch client-side aggregations is a new feature of the PutMetricData API service that helps customers to aggregate data on the client-side, which increases throughput and efficiency. In this blog post we’ll show you how Twitch uses client-side data aggregations to build a more effective metric ingestion architecture while achieving substantial cost reductions.


AWS Lambda Power Tuning

Step Functions state machine generator for AWS Lambda Power Tuning.

The state machine is designed to be quick and language agnostic. You can provide any Lambda Function as input and the state machine will estimate the best power configuration to minimize cost. Your Lambda Function will be executed in your AWS account (i.e. real HTTP calls, SDK calls, cold starts, etc.) and you can enable parallel execution to generate results in just a few seconds.


How to build a serverless clone of Imgur using Amazon Rekognition and DynamoDB

In a previous article, we managed to build a very simple and somewhat primitive Imgur clone — using Amazon Cognito for registration and login before uploading images to the site for all to see.

Now, it had a few issues and these must be addressed before we go on to any funding rounds. We don’t want to scare away any potential investors with a few teething issues.

The issues preventing funding

Let’s go through the issues that need to be resolved prior to a round of Series A funding from any potential investors.

  1. In order to render the home page, it would hit the s3 bucket storing all of these images and then return them as a big JSON list. No pagination, no smaller images. If this thing is going to scale in any real sense then this will have to be addressed. We will have to introduce a database and proper pagination of results.
  2. It doesn’t really do anything “cool”. In order to address this, I thought I’d play around with AWS Rekognition and see if we could add some machine learning image recognition to the site. We can then browse images based on type should we so wish!
  3. There were a couple of frontend things that could have been improved upon, like for instance, you can’t click on an image to view just that one image by itself. We need to add a single page that will fetch the image location and its tags from a database. I won’t cover how I fixed this, but feel free to browse the code which I link to at the bottom of the article!

Once we have addressed these we should hopefully be in a far better place to attract big-money investors. Our finished product after we’re finished with our updates should look something like this:

Notice the tags — these were generated using Amazon Rekognition


Design patterns for high-volume, time-series data in Amazon DynamoDB

Time-series data shows a pattern of change over time. For example, you might have a fleet of Internet of Things (IoT) devices that record environmental data through their sensors, as shown in the following example graph. This data could include temperature, pressure, humidity, and other environmental variables. Because each IoT device tracks these values over regular periods, your backend needs to ingest up to hundreds, thousands, or millions of events every second.

Graph of sensor data

In this blog post, I explain how to optimize Amazon DynamoDB for high-volume, time-series data scenarios. I do this by using a design pattern powered by automation and serverless computing.


AWS S3 Batch Operations: Beginner’s Guide

If you’ve ever tried to run operations on a large number of objects in S3, you might have encountered a few hurdles. Listing all files and running the operation on each object can get complicated and time consuming as the number of objects scales up. Many decisions have to be made: is running the operations from my personal computer fast enough? Or should I run it from a server that’s closer to the AWS resources, benefiting from AWS’s fast internal network? If so, I’ll have to provision resources (e.g. ec2 instance, lambda functions, containers, etc) to run the job.

Thankfully, AWS has heard our pains and announced AWS S3 Batch Operations preview during the last AWS Reinvent conference. This new service (which you can access by asking AWS politely) allows you to easily run operations on very large numbers of S3 objects in your bucket. Curious to know how it works? Let’s get going.