KVM creators open-source fast Cassandra drop-in replacement Scylla

“Two key figures behind popular open-source hypervisor KVM are today unveiling a new NoSQL database that they describe as a far faster drop-in replacement for Apache Cassandra.

The Scylla database, from KVM inventor Avi Kivity and the man who oversaw the hypervisor’s development, Dor Laor, offers what they say is 10 times better throughput and latency than wide column store Cassandra, while maintaining complete compatibility.”…

“Scylla has been written in C++14, together with the project’s Seastar programming model. The Seastar C++ application framework is designed for high-concurrency server applications and described on GitHub as ‘an event-driven framework allowing you to write non-blocking, asynchronous code in a relatively straightforward manner’…”

http://www.zdnet.com/article/kvm-creators-open-source-fast-cassandra-drop-in-replacement-scylla/
https://github.com/scylladb/scylla
http://www.seastar-project.org/
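
As a rough illustration of the event-driven, non-blocking style the excerpt describes, here is a minimal sketch using Python's asyncio. It is only an analogy and not Seastar's C++ API; the function names and delays are made up for the example.

```python
import asyncio
from typing import List


async def handle_request(request_id: int, delay: float) -> str:
    # Awaiting hands control back to the event loop instead of blocking
    # the thread, so other requests can make progress in the meantime.
    await asyncio.sleep(delay)
    return f"request {request_id} done"


async def serve_all() -> List[str]:
    # All three requests run concurrently on a single thread.
    # gather() returns results in argument order, not completion order.
    return await asyncio.gather(
        handle_request(1, 0.03),
        handle_request(2, 0.01),
        handle_request(3, 0.02),
    )


results = asyncio.run(serve_all())
```

The total run time is close to the longest single delay rather than the sum of all three, which is the point of the non-blocking model.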

TITAN Distributed Graph Database

“Titan is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. Titan is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time.

In addition, Titan provides the following features…”

http://thinkaurelius.github.io/titan/

Understanding the Impact of Cassandra Compact Storage

“At Librato, our primary data store for time-series metrics is Apache Cassandra built using a custom schema we’ve developed over time. We’ve written and presented on it several times in the past. We store both real-time metrics and historical rollup time-series in Cassandra. Cassandra storage nodes have the largest footprint in our infrastructure and hence drive our costs, so we are always looking for ways to improve the efficiency of our data model.

As part of our ongoing efficiency improvements and development of new backend functionality, we recently took the time to reevaluate our storage schema. Coming from the early days of Cassandra 0.8.x, our schema has always been built atop the legacy Thrift APIs, and whenever we stood up a new ring, we built its schema using the `cassandra-cli` command. We’ve been closely following the development of CQL and had already moved parts of our read-path to the new native interface in 2.0.x. However, we wanted to take a closer look at fully constructing our schema migrations (creating the CQL tables, or “column families” as they were called) using the native CQL interface…”

http://blog.librato.com/posts/cassandra-compact-storage

Building a Distributed Fault-Tolerant Key-Value Store

“First of all, let’s discuss briefly what a Key-Value store is, and how it compares to a relational database.

Key-Value Stores offer a simple abstraction over your data, working as a dictionary data-structure. Such a database provides a mechanism for storage and retrieval of data that is modeled and manipulated by means of basic CRUD operations (create, read, update, delete). The API of these databases is usually kept simple, and even if they provide an SQL-like language like Cassandra’s Query Language, it’s intentionally kept much simpler than full-blown SQL.

This simpler functionality means that Key-Value Stores, and NoSQL databases in general, often give more responsibility to the user, who now needs to manually do a lot of the work that the system takes care of automatically in a relational database. They sacrifice the expressivity of a language like SQL, and the integrity checks brought by schema-based models. This in turn means that NoSQL systems are free to choose other trade-offs that will result in higher availability, performance, scalability or other specific qualities.

One important thing regarding RDBMS and NoSQL is their respective theoretical models, which establish the guarantees that such a system provides to the end user. They are known as the ACID and BASE models, in one of those fancy metaphors made out of acronyms.

So, why are NoSQL systems popular nowadays? Well, not without some controversy, but the main selling points could be summarized as:

  • Speed
  • Single Point of Failure (SPOF) avoidance
  • Better support for Large amounts of unstructured data
  • Lower TCO (total cost of ownership, sysadmins)
  • Incremental scalability

These are the sort of qualities that define Google, Facebook and the other big players and their business models, so it makes perfect sense to them. Whether it makes sense for your particular situation (probably not), well, it’s the core of that controversy, and it’s not really the intention of this post to dig into that…”

http://blog.fourthbit.com/2015/04/12/building-a-distributed-fault-tolerant-key-value-store
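
The "dictionary with CRUD operations" abstraction from the excerpt can be sketched in a few lines of Python. This is a hypothetical single-process, in-memory toy, not the distributed, fault-tolerant store the linked post builds; the class and method names are assumptions made for illustration.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory key-value store exposing basic CRUD."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def create(self, key: str, value: bytes) -> None:
        # Refuse to overwrite: the caller must use update() for that.
        if key in self._data:
            raise KeyError(f"{key!r} already exists")
        self._data[key] = value

    def read(self, key: str) -> Optional[bytes]:
        # Returns None when the key is absent.
        return self._data.get(key)

    def update(self, key: str, value: bytes) -> None:
        if key not in self._data:
            raise KeyError(f"{key!r} does not exist")
        self._data[key] = value

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)
```

Note how little the store knows about its values: there is no schema and no integrity checking, so that responsibility falls to the caller, as the excerpt points out.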

Migrating or expanding a Cassandra cluster

“Recently I was tasked with migrating an existing DataStax Cassandra cluster to a different availability zone (AZ) in AWS EC2. The existing cluster’s nodes were m1.xlarge instances running the DataStax Community AMI, with Cassandra 1.2.8-1 installed.

The migration strategy is fairly simple. The cluster can be migrated one node at a time, by setting up an equivalent number of nodes in the other AZ, adding them into the existing cluster, and then decommissioning each node in the original AZ one by one. This way, the migration is performed gradually while the cluster is online, and without any interruption in service…”

https://engineering.gosquared.com/expanding-a-cassandra-cluster
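
The ordering of that migration (add every new node first, then retire the old ones one at a time) can be modeled with a small Python simulation. It tracks cluster membership only: data streaming, token assignment and the actual Cassandra commands are not modeled, and the function and node names are hypothetical.

```python
from typing import List, Set


def migrate_cluster(old_nodes: List[str], new_nodes: List[str]) -> List[Set[str]]:
    """Return the cluster membership after each step of the migration.

    Assumes old and new node names are distinct.
    """
    if len(new_nodes) != len(old_nodes):
        raise ValueError("expected an equivalent number of new nodes")

    members = set(old_nodes)
    history = [set(members)]

    # Phase 1: join each new node to the existing, still-online cluster.
    for node in new_nodes:
        members.add(node)
        history.append(set(members))

    # Phase 2: decommission the original nodes one by one.
    for node in old_nodes:
        members.remove(node)
        history.append(set(members))

    return history
```

Because every new node joins before any old node leaves, the membership never drops below the original node count at any step, which is what lets the cluster stay online throughout.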

Getting Started with Time Series Data Modeling

“Cassandra’s data model works well with data in a sequence. That data can be variable in size, and Cassandra handles large amounts of data excellently. When writing data to Cassandra, data is sorted and written sequentially to disk. When retrieving data by row key and then by range, you get a fast and efficient access pattern, due to minimal disk seeks. Time series data is an excellent fit for this type of pattern. For these examples, we’ll use a weather station that is creating temperature data every minute. You will see how using the row key and sequence can be a powerful data modeling tool…”

http://www.planetcassandra.org/blog/post/getting-started-with-time-series-data-modeling
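
The access pattern described above (look up by row key, then scan a sorted range) can be imitated in plain Python. This is a sketch of the idea only, not Cassandra's storage engine or driver API; the class name, station ID and temperature values are invented for the example.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


class TimeSeriesTable:
    """Toy model: one row per weather station, kept sorted by timestamp."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[Tuple[int, float]]] = defaultdict(list)

    def insert(self, station_id: str, minute: int, temperature: float) -> None:
        # insort keeps each row ordered by (minute, temperature).
        bisect.insort(self._rows[station_id], (minute, temperature))

    def range_query(self, station_id: str, start: int, end: int) -> List[Tuple[int, float]]:
        # Row key lookup first, then a contiguous slice of the sorted row.
        row = self._rows.get(station_id, [])
        lo = bisect.bisect_left(row, (start, float("-inf")))
        hi = bisect.bisect_right(row, (end, float("inf")))
        return row[lo:hi]
```

Since each row is already sorted, a range query is two binary searches plus one contiguous slice, which mirrors the sequential read that keeps disk seeks low.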

From SimpleDB to Cassandra: Data Migration for a High Volume Web Application at Netflix

“There will come a time in the life of most systems serving data, when there is a need to migrate data to a more reliable, scalable and high performance data store while maintaining or improving data consistency, latency and efficiency. This document explains the data migration technique we used at Netflix to migrate the user’s queue data between two different distributed NoSQL storage systems…”

http://nosql.mypopescu.com/post/43387882910/from-simpledb-to-cassandra-data-migration-for-a-high

http://techblog.netflix.com/2013/02/netflix-queue-data-migration-for-high.html?m=1

Up and running with Cassandra

“Cassandra is a hybrid non-relational database in the same class as Google’s BigTable. It is more featureful than a key/value store like Riak, but supports fewer query types than a document store like MongoDB.

Cassandra was started by Facebook and later transferred to the open-source community. It is an ideal runtime database for web-scale domains like social networks.

This post is both a tutorial and a “getting started” overview. You will learn about Cassandra’s features, data model, API, and operational requirements—everything you need to know to deploy a Cassandra-backed service…”