Lambda functions over S3 objects with concurrency control (forEach, map, reduce, filter)

s3-lambda enables you to run lambda functions over a context of S3 objects. It has a stateless architecture with concurrency control, allowing you to process a large number of files very quickly. This is useful for quickly prototyping complex data jobs without infrastructure such as Hadoop or Spark.

At Littlstar, we use s3-lambda for all sorts of data pipelining and analytics.

https://github.com/littlstar/s3-lambda


AWS Storage Gateway provides a file interface to objects in your Amazon S3 buckets

AWS Storage Gateway now provides a virtual on-premises file server, which enables you to store and retrieve Amazon S3 objects through standard file storage protocols. With file gateway, existing applications or devices can use secure and durable cloud storage without needing to be modified. File gateway simplifies moving data into S3 for in-cloud workloads, provides cost-effective storage for backup and archive, and expands your on-premises storage into the cloud.

File gateway is available as a virtual machine image which you download from the AWS Management Console. Once deployed in your data center and associated with your AWS account, your configured S3 buckets will be available as Network File System (NFS) mount points. Your applications read and write files and directories over NFS, interfacing to the gateway as a file server. In turn, the gateway translates these file operations into object requests on your S3 buckets. Like existing volume and tape gateways, your most recently used data is cached on the gateway for low-latency access, and data transfer between your data center and AWS is fully managed and optimized by the gateway. Once in S3, you can access the objects directly or manage them using features such as S3 Lifecycle Policies, object versioning, and cross-region replication.
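The caching behavior described above (recently used data kept locally, everything else fetched from S3) is the classic read-through LRU pattern. The sketch below is a generic illustration of that pattern only, not how Storage Gateway is implemented; the `fetch` callable and the in-memory `remote` dict are hypothetical stand-ins for object requests against a bucket.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughLRUCache:
    """Keeps the most recently used objects locally; misses fall through to `fetch`."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self._fetch = fetch  # hypothetical: e.g. a GET against the bucket
        self._capacity = capacity  # maximum number of cached objects
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        data = self._fetch(key)  # slow path: remote object request
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return data


remote = {"a.txt": b"alpha", "b.txt": b"beta", "c.txt": b"gamma"}
cache = ReadThroughLRUCache(remote.__getitem__, capacity=2)
cache.read("a.txt")
cache.read("b.txt")
cache.read("a.txt")  # served from the cache
cache.read("c.txt")  # evicts b.txt, the least recently used
print(cache.misses)  # 3
```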

There are no up-front commitments required, and you pay only for what you use.

https://aws.amazon.com/about-aws/whats-new/2016/11/aws-storage-gateway-provides-a-file-interface-to-objects-in-your-amazon-s3-buckets/

Going Serverless: AWS and Compelling Science Fiction

This is a companion blog post to a talk I gave to the Boulder Python Meetup group about the infrastructure that runs Compelling Science Fiction. Slides from that talk are linked from the original post. Hopefully you can use some of these tools to create something new as well!

Compelling Science Fiction is run entirely on extremely inexpensive Amazon Web Services (AWS). There are currently three primary use cases that I have:

  1. Serving web pages that contain the site. This is easily achieved by using the Amazon S3 feature that allows you to serve static web pages from an S3 bucket.
  2. Accepting and managing submissions from authors.
  3. Reading through the queue (“slush”) of stories that authors submit.

It’s the last two items on that list that I’ll be talking about today, because they both use the same basic infrastructure. That infrastructure is diagrammed in the original post. As you can see, I use four different Amazon Web Services: the Simple Email Service (SES), Simple Storage Service (S3), Lambda, and DynamoDB. I’ll touch on all of the ways we use these services, but AWS Lambda is the most important, because it allows us to glue together all the services with Python without provisioning any servers.
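The post does not spell out the exact flow, so the following is a minimal sketch under stated assumptions: incoming submission emails are stored by SES in an S3 bucket, each new object triggers a Lambda function, and that function records the submission in a DynamoDB table for the slush queue. The table name "slush-submissions" and the item fields are hypothetical. The record layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`, `eventTime`) follows the standard S3 event notification format, in which object keys arrive URL-encoded.

```python
from typing import Any
from urllib.parse import unquote_plus


def submissions_from_event(event: dict) -> list[dict]:
    """Turn S3 ObjectCreated event records into hypothetical submission items."""
    items = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        items.append({
            "submission_id": unquote_plus(s3["object"]["key"]),  # keys arrive URL-encoded
            "bucket": s3["bucket"]["name"],
            "received_at": record["eventTime"],
            "status": "unread",  # hypothetical slush-queue state
        })
    return items


def handle(event: dict, table: Any) -> int:
    """Write each submission to `table` (anything with a boto3-style put_item)."""
    items = submissions_from_event(event)
    for item in items:
        table.put_item(Item=item)
    return len(items)


def lambda_handler(event: dict, context: Any) -> dict:
    import boto3  # bundled with the AWS Lambda Python runtime

    table = boto3.resource("dynamodb").Table("slush-submissions")  # hypothetical name
    return {"stored": handle(event, table)}
```

Keeping the parsing in `handle` separate from the boto3 setup in `lambda_handler` lets the logic be tested locally with a fake table, with no AWS credentials needed.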

http://compellingsciencefiction.com/blog/2016-11-10.html

S3-compatible object storage on Cassandra

In an industry where so many people want to change the world, it’s fair to say that low-cost object storage has done just that. Building a business that requires flexible low-latency storage is now affordable in a way we couldn’t imagine before.

When building the Exoscale public cloud offering, we knew that a simple object storage service, protected by Swiss privacy laws, would be crucial. After looking at the existing object storage software projects, we decided to build our own solution: Pithos.

Pithos is an open source S3-compatibility layer for Apache Cassandra, the wide-column database. In other words, it allows you to use standard S3 tools to store objects in your own Cassandra cluster. If this is the first time that you’ve looked at object storage software, then you may wonder why Pithos is built on top of a NoSQL database, but it’s not all that unusual.
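The excerpt does not describe Pithos's actual schema, so the following is only a hypothetical sketch of why object storage maps naturally onto a column store: object metadata lives in one table, and the body is split into fixed-size chunks stored as rows keyed by (object id, chunk index). Plain Python dicts stand in for the Cassandra tables, and CHUNK_SIZE is deliberately tiny for illustration.

```python
import hashlib
from typing import Dict, Tuple

CHUNK_SIZE = 4  # tiny for illustration; a real store would use much larger chunks

# Hypothetical stand-ins for two tables:
#   chunks: primary key (object_id, chunk_index) -> bytes
#   meta:   primary key object_id -> size, chunk count, checksum
ChunkTable = Dict[Tuple[str, int], bytes]
MetaTable = Dict[str, dict]


def put_object(chunks: ChunkTable, meta: MetaTable, bucket: str, key: str, body: bytes) -> str:
    object_id = f"{bucket}/{key}"
    # Drop the chunks of any previous version of this object.
    old = meta.get(object_id)
    if old is not None:
        for i in range(old["chunks"]):
            del chunks[(object_id, i)]
    parts = [body[i:i + CHUNK_SIZE] for i in range(0, len(body), CHUNK_SIZE)]
    for index, part in enumerate(parts):
        chunks[(object_id, index)] = part
    digest = hashlib.md5(body).hexdigest()  # whole-object checksum
    meta[object_id] = {"size": len(body), "chunks": len(parts), "md5": digest}
    return digest


def get_object(chunks: ChunkTable, meta: MetaTable, bucket: str, key: str) -> bytes:
    object_id = f"{bucket}/{key}"
    info = meta[object_id]  # raises KeyError if the object does not exist
    return b"".join(chunks[(object_id, i)] for i in range(info["chunks"]))


chunk_rows: ChunkTable = {}
meta_rows: MetaTable = {}
put_object(chunk_rows, meta_rows, "photos", "cat.txt", b"hello world")
print(get_object(chunk_rows, meta_rows, "photos", "cat.txt"))  # b'hello world'
```

Chunking keeps individual rows small and evenly sized, which suits a distributed column store far better than storing multi-gigabyte objects as single values.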

https://www.linkedin.com/pulse/s3-compatible-object-storage-cassandra-matthew-revell