You can use Amazon S3 to implement a data lake architecture as the single source of truth for all your data. Taking this approach not only allows you to reliably store massive amounts of data but also enables you to ingest the data at a very high speed and do further analytics on it. Ease of analytics is important because as the number of objects you store increases, it becomes difficult to find a particular object—one needle in a haystack of billions.
Objects in S3 contain metadata that identifies those objects along with their properties. When the number of objects is large, this metadata can be the magnet that allows you to find what you’re looking for. Although you can’t search this metadata directly, you can employ Amazon Elasticsearch Service to store and search all of your S3 metadata. This blog post gives step-by-step instructions about how to store the metadata in Amazon Elasticsearch Service (Amazon ES) using Python and AWS Lambda.