Orchestrating backend services with AWS Step Functions

The problem

In many use cases, there are processes that need to execute multiple tasks. We build micro-services or server-less functions like AWS Lambda functions to carry out these tasks. Almost all these services are stateless functions and there is need of queues or databases to maintain the state of individual tasks and the process as a whole. Writing code that orchestrates these tasks can be both painful and hard to debug and maintain. It’s not easy to maintain the state of a process in an ecosystem of micro-services and server-less functions.

https://medium.com/engineering-zemoso/orchestrating-backend-services-with-aws-step-functions-8648c90e5a57?hss_channel=tw-920289756919074817

Advertisements

Python for NLP: Introduction to the TextBlob Library

This is the seventh article in my series of articles on Python for NLP. In my previous article, I explained how to perform topic modeling using Latent Dirichlet Allocation and Non-Negative Matrix factorization. We used the Scikit-Learn library to perform topic modeling.

In this article, we will explore TextBlob, which is another extremely powerful NLP library for Python. TextBlob is built upon NLTK and provides an easy to use interface to the NLTK library. We will see how TextBlob can be used to perform a variety of NLP tasks ranging from parts-of-speech tagging to sentiment analysis, and language translation to text classification.

https://stackabuse.com/python-for-nlp-introduction-to-the-textblob-library/

AWS Lambda Power Tuning

Step Functions state machine generator for AWS Lambda Power Tuning.

The state machine is designed to be quick and language agnostic. You can provide any Lambda Function as input and the state machine will estimate the best power configuration to minimize cost. Your Lambda Function will be executed in your AWS account (i.e. real HTTP calls, SDK calls, cold starts, etc.) and you can enable parallel execution to generate results in just a few seconds.

https://github.com/alexcasalboni/aws-lambda-power-tuning

122+ announcements from Google Cloud Next ‘19

Hybrid Cloud

  • 2. Anthos (the new name for Cloud Services Platform) is now generally available on Google Kubernetes Engine (GKE) and GKE On-Prem, so you can deploy, run and manage your applications on-premises or in the cloud. Coming soon, we’ll extend that flexibility to third-party clouds like AWS and Azure. And Anthos is launching with the support of more than 30 hardware, software and system integration partners so you can get up and running fast.
  • 3. With Anthos Migrate, powered by Velostrata’s migration technology, you can auto-migrate VMs from on-premises or other clouds directly into containers in GKE with minimal effort.
  • 4. Anthos Config Management lets you create multi-cluster policies out of the box that set and enforce role-based access controls, resource quotas, and namespaces—all from a single source of truth.

Serverless

  • 5. Cloud Run, our fully managed serverless execution environment, offers serverless agility for containerized apps.
  • 6. Cloud Run on GKE brings the serverless developer experience and workload portability to your GKE cluster.
  • 7. Knative, the open API and runtime environment, brings a serverless developer experience and workload portability to your existing Kubernetes cluster anywhere.
  • 8. We’re also making new investments in our Cloud Functions and App Engine platforms with new second generation runtimes, a new open-sourced Functions Framework, and additional core capabilities, including connectivity to private GCP resources.

https://cloud.google.com/blog/topics/inside-google-cloud/100-plus-announcements-from-google-cloud-next19

How to build a serverless clone of Imgur using Amazon Rekognition and DynamoDB

In a previous article, we managed to build a very simple and somewhat primitive Imgur clone — using Amazon Cognito for registration and login before uploading images to the site for all to see.

Now, it had a few issues and these must be addressed before we go on to any funding rounds. We don’t want to scare away any potential investors with a few teething issues.

The issues preventing funding

Let’s go through the issues that need to be resolved prior to a round of Series A funding from any potential investors.

  1. In order to render the home page, it would hit the s3 bucket storing all of these images and then return them as a big JSON list. No pagination, no smaller images. If this thing is going to scale in any real sense then this will have to be addressed. We will have to introduce a database and proper pagination of results.
  2. It doesn’t really do anything “cool”. In order to address this, I thought I’d play around with AWS Rekognition and see if we could add some machine learning image recognition to the site. We can then browse images based on type should we so wish!
  3. There were a couple of frontend things that could have been improved upon, like for instance, you can’t click on an image to view just that one image by itself. We need to add a single page that will fetch the image location and its tags from a database. I won’t cover how I fixed this, but feel free to browse the code which I link to at the bottom of the article!

Once we have addressed these we should hopefully be in a far better place to attract big-money investors. Our finished product after we’re finished with our updates should look something like this:

Notice the tags — these were generated using Amazon Rekognition

https://read.acloud.guru/building-an-imgur-clone-part-2-image-rekognition-and-a-dynamodb-backend-abc9af300123

Design patterns for high-volume, time-series data in Amazon DynamoDB

Time-series data shows a pattern of change over time. For example, you might have a fleet of Internet of Things (IoT) devices that record environmental data through their sensors, as shown in the following example graph. This data could include temperature, pressure, humidity, and other environmental variables. Because each IoT device tracks these values over regular periods, your backend needs to ingest up to hundreds, thousands, or millions of events every second.

Graph of sensor data

In this blog post, I explain how to optimize Amazon DynamoDB for high-volume, time-series data scenarios. I do this by using a design pattern powered by automation and serverless computing.

https://aws.amazon.com/pt/blogs/database/design-patterns-for-high-volume-time-series-data-in-amazon-dynamodb/

The Illustrated Word2vec

I find the concept of embeddings to be one of the most fascinating ideas in machine learning. If you’ve ever used Siri, Google Assistant, Alexa, Google Translate, or even smartphone keyboard with next-word prediction, then chances are you’ve benefitted from this idea that has become central to Natural Language Processing models. There has been quite a development over the last couple of decades in using embeddings for neural models (Recent developments include contextualized word embeddings leading to cutting-edge models like BERT and GPT2).

Word2vec is a method to efficiently create word embeddings and has been around since 2013. But in addition to its utility as a word-embedding method, some of its concepts have been shown to be effective in creating recommendation engines and making sense of sequential data even in commercial, non-language tasks. Companies like AirbnbAlibabaSpotify, and Anghami have all benefitted from carving out this brilliant piece of machinery from the world of NLP and using it in production to empower a new breed of recommendation engines.

In this post, we’ll go over the concept of embedding, and the mechanics of generating embeddings with word2vec. But let’s start with an example to get familiar with using vectors to represent things. Did you know that a list of five numbers (a vector) can represent so much about your personality?

https://jalammar.github.io/illustrated-word2vec/