“This post outlines ten things you need to know for operating MongoDB at scale based on my experience working with MongoDB customers and open source users:…”
“Server Density processes over 30TB/month of incoming data points from the servers and web checks we monitor for our customers, ranging from simple Linux system load average to website response times from 18 different countries. All of this data goes into MongoDB in real time and is pulled out when customers need to view graphs, update dashboards and generate reports…”
“The driver is Apache licensed, and is implemented purely in Rust without relying on any C extensions or libraries (with the exception of an embedded copy of md5, which can be removed when Rust offers a native implementation).
Using the driver will feel familiar to users of current MongoDB drivers. Basic BSON types, such as strings, numbers, and objects, have a built-in internal representation based around Rust’s algebraic data types. Users can also implement a trait (analogous to a Haskell typeclass, similar to a Java interface) which allows them to treat types native to their codebase as though they were native to BSON as well.
Once a user’s data is formatted, interacting with the database is done through familiar objects like Collection, DB, Cursor, and Client. These objects have similar APIs to their counterparts in other languages, and presently offer CRUD, indexing, as well as administrative functionality in a framework that is Rustically abstracted and balances the philosophy of Rust’s static guarantees with the flexibility of MongoDB. A small example is offered below…”
“With open-source technologies proliferating as “Big Data” and analytics explode, we thought it would be beneficial to let our users and friends utilize a script that takes care of the nitty gritty and allows them to explore what makes MongoDB great. We’re excited to present Twitter-Harvest, a Python script that utilizes the Twitter REST API v1.1 to retrieve tweets from a user’s timeline and inserts them into a MongoDB database…”
“Distributed systems are characterized by exchanging state over high-latency or unreliable links. The system must be robust to both node and network failure if it is to operate reliably–however, not all systems satisfy the safety invariants we’d like. In this article, we’ll explore some of the design considerations of distributed databases, and how they respond to network partitions.
IP networks may arbitrarily drop, delay, reorder, or duplicate messages send between nodes–so many distributed systems use TCP to prevent reordered and duplicate messages. However, TCP/IP is still fundamentally asynchronous: the network may arbitrarily delay messages, and connections may drop at any time. Moreover, failure detection is unreliable: it may be impossible to determine whether a node has died, the network connection has dropped, or things are just slower than expected…”
“MongoDB is a document-oriented database with a similar distribution design to Redis. In a replica set, there exists a single writable primary node which accepts writes, and asynchronously replicates those writes as an oplog to N secondaries. However, there are a few key differences…”
“MongoDB was used early on at Scrapinghub to store scraped data because it’s convenient. Scraped data is represented as (possibly nested) records which can be serialized to JSON. The schema is not known ahead of time and may change from one job to the next. We need to support browsing, querying and downloading the stored data. This was very easy to implement using MongoDB (easier than the alternatives available a few years ago) and it worked well for some time.
Usage has grown from a simple store for scraped data used on a few projects to the back end of our Scrapy Cloud platform. Now we are experiencing limitations with our current architecture and rather than continue to work with MongoDB, we have decided to move to a different technology (more in a later blog post). Many customers are surprised to hear that we are moving away from MongoDB, I hope this blog post helps explain why it didn’t work for us…”