I think for many people out there, Blockchain is this phenomenon, which is hard to get your head around. I started watching videos and reading articles, but for me personally, it wasn’t until I wrote my own simple Blockchain, that I truly understood what it is and the potential applications for it.
The way I think about blockchain is it is an encrypted database that is public. If you were Amazon and you wanted to use the technology to track your stock levels, would using Blockchain make sense? Probably not, since your customers won’t want to expend their resources verifying your blockchain, since they state on their website, ‘Only 1 left!’, anyway.
I’ll leave you to think about future applications. So without further ado, lets set up our 7 functions!
In a pair of previous posts, we first discussed a framework for approaching textual data science tasks, and followed that up with a discussion on a general approach to preprocessing text data. This post will serve as a practical walkthrough of a text data preprocessing task using some common Python tools.
Preprocessing, in the context of the textual data science framework.Our goal is to go from what we will describe as a chunk of text (not to be confused with text chunking), a lengthy, unprocessed single string, and end up with a list (or several lists) of cleaned tokens that would be useful for further text mining and/or natural language processing tasks.
In the previous posts, we explored availability and reliability and the needs and means of building a multi-region, active-active architecture on AWS. In this blog post, I will walk you through the steps needed to build and deploy a serverless multi-region, active-active backend.
This actually blows my mind — since I will be able to explain this in one blog post and you should be able to deploy a fully functional backend in about an hour. In contrast, few years ago it would have required a lot more expertise, work, time and money!
Earlier this year, Amazon DynamoDB released Time to Live (TTL) functionality, which automatically deletes expired items from your tables, at no additional cost. TTL eliminates the complexity and cost of scanning tables and deleting items that you don’t want to retain, saving you money on provisioned throughput and storage. One AWS customer, TUNE, purged 85 terabytes of stale data and reduced their costs by over $200K per year.
Today, DynamoDB made TTL better with the release of a new CloudWatch metric for tracking the number of items deleted by TTL, which is also viewable for no additional charge. This new metric helps you monitor the rate of TTL deletions to validate that TTL is working as expected. For example, you could set a CloudWatch alarm to fire if too many or too few automated deletes occur, which might indicate an issue in how you set expiration time stamps for your items.
In this post, I’ll walk through an example of a serverless application using TTL to automate a common database management task: moving old data from your database into archival storage automatically. Archiving old data helps reduce costs and meet regulatory requirements governing data retention or deletion policies. I’ll show how TTL—combined with DynamoDB Streams, AWS Lambda, and Amazon Kinesis Firehose—facilitates archiving data to a low-cost storage service like Amazon S3, a data warehouse like Amazon Redshift, or to Amazon Elasticsearch Service.
Every concurrency API needs a way to run code concurrently. Here’s some examples of what that looks like using different APIs:
go myfunc(); // Golang
pthread_create(&thread_id, NULL, &myfunc); /* C with POSIX threads */
spawn(modulename, myfuncname, ) % Erlang
threading.Thread(target=myfunc).start() # Python with threads
asyncio.create_task(myfunc()) # Python with asyncio
There are lots of variations in the notation and terminology, but the semantics are the same: these all arrange for myfunc to start running concurrently to the rest of the program, and then return immediately so that the parent can do other things.
Another option is to use callbacks:
QObject::connect(&emitter, SIGNAL(event()), &receiver, SLOT(myfunc())) // C++ with Qt
g_signal_connect(emitter, "event", myfunc, NULL) /* C with GObject */
deferred.addCallback(myfunc) # Python with Twisted
future.add_done_callback(myfunc) # Python with asyncio
Again, the notation varies, but these all accomplish the same thing: they arrange that from now on, if and when a certain event occurs, then myfunc will run. Then once they’ve set that up, they immediately return so the caller can do other things. (Sometimes callbacks get dressed up with fancy helpers like promise combinators, or Twisted-style protocols/transports, but the core idea is the same.)
And… that’s it. Take any real-world, general-purpose concurrency API, and you’ll probably find that it falls into one or the other of those buckets (or sometimes both, like asyncio).
Researchers from NVIDIA, led by Guilin Liu, introduced a state-of-the-art deep learning method that can edit images or reconstruct a corrupted image, one that has holes or is missing pixels.
The method can also be used to edit images by removing content and filling in the resulting holes.
The method, which performs a process called “image inpainting”, could be implemented in photo editing software to remove unwanted content, while filling it with a realistic computer-generated alternative.
“Our model can robustly handle holes of any shape, size location, or distance from the image borders. Previous deep learning approaches have focused on rectangular regions located around the center of the image, and often rely on expensive post-processing,” the NVIDIA researchers stated in their research paper. “Further, our model gracefully handles holes of increasing size.”
You can use Amazon S3 to implement a data lake architecture as the single source of truth for all your data. Taking this approach not only allows you to reliably store massive amounts of data but also enables you to ingest the data at a very high speed and do further analytics on it. Ease of analytics is important because as the number of objects you store increases, it becomes difficult to find a particular object—one needle in a haystack of billions.
Objects in S3 contain metadata that identifies those objects along with their properties. When the number of objects is large, this metadata can be the magnet that allows you to find what you’re looking for. Although you can’t search this metadata directly, you can employ Amazon Elasticsearch Service to store and search all of your S3 metadata. This blog post gives step-by-step instructions about how to store the metadata in Amazon Elasticsearch Service (Amazon ES) using Python and AWS Lambda.
The requests library is arguably the mostly widely used HTTP library for Python. However, what I believe most of its users are not aware of is that its current stable version happily accepts responses whose length is less than what is given in the Content-Length header. If you are not careful enough to check this by yourself, you may end up using corrupted data without even noticing. I have witnessed this first-hand, which is the reason for the present blog post. Lets see why the current requests version does not do this checking (spoiler: it is a feature, not a bug) and how to check this manually in your scripts.
When I discuss AWS Lambda cold starts with folks in the context of API Gateway, I often get responses along the line of:
Meh, it’s only the first request right? So what if one request is slow, the next million requests would be fast.
Unfortunately that is an oversimplification of what happens.
Cold start happens once for each concurrent execution of your function.
API Gateway would reuse concurrent executions of your function that are already running if possible, and based on my observations, might even queue up requests for a short time in hope that one of the concurrent executions would finish and can be reused.
Location and navigation using global positioning systems (GPS) is deeply embedded in our daily lives, and is particularly crucial to Uber’s services. To orchestrate quick, efficient pickups, our GPS technologies need to know the locations of matched riders and drivers, as well as provide navigation guidance from a driver’s current location to where the rider needs to be picked up, and then, to the rider’s chosen destination. For this process to work seamlessly, the location estimates for riders and drivers need to be as precise as possible.
Since the (literal!) launch of GPS in 1973, we have advanced our understanding of the world, experienced exponential growth in the computational power available to us, and developed powerful algorithms to model uncertainty from fields like robotics. While our lives have become increasingly dependent on GPS, the fundamentals of how GPS works have not changed that much, which leads to significant performance limitations. In our opinion, it is time to rethink some of the starting assumptions that were true in 1973 regarding where and how we use GPS, as well as the computational power and additional information we can bring to bear to improve it.
While GPS works well under clear skies, its location estimates can be wildly inaccurate (with a margin of error of 50 meters or more) when we need it the most: in densely populated and highly built-up urban areas, where many of our users are located. To overcome this challenge, we developed a software upgrade to GPS for Android which substantially improves location accuracy in urban environments via a client-server architecture that utilizes 3D maps and performs sophisticated probabilistic computations on GPS data available through Android’s GNSS APIs.
In this article, we discuss why GPS can perform poorly in urban environments and outline how we fix it using advanced signal processing algorithms deployed at scale on our server infrastructure.