Discord continues to grow faster than we expected and so does our user-generated content. With more users comes more chat messages. In July, we announced 40 million messages a day, in December we announced 100 million, and as of this blog post we are well past 120 million. We decided early on to store all chat history forever so users can come back at any time and have their data available on any device. This is a lot of data that is ever increasing in velocity, size, and must remain available. How do we do it? Cassandra!
- Write and Read
- Data Model
- Best Combos!
- How to get started
In an industry where so many people want to change the world, it’s fair to say that low cost object storage has done just that. Building a business that requires flexible low-latency storage is now affordable in a way we couldn’t imagine before.
When building the Exoscale public cloud offering, we knew that a simple object storage service, protected by Swiss privacy laws, would be crucial. After looking at the existing object storage software projects, we decided to build our own solution: Pithos.
Pithos is an open source S3-compatability layer for Apache Cassandra, the column database. In other words, it allows you to use standard S3 tools to store objects in your own Cassandra cluster. If this is the first time that you’ve looked at object storage software then you may wonder why Pithos is built on top of a NoSQL database but it’s not all that unusual.
At Spotify we have have over 60 million active users who have access to a vast music catalog of over 30 million songs. Our users have a choice to follow thousands of artists and hundreds of their friends and create their own music graph. On our service they also discover new and existing content by experiencing a variety of music promotions (album releases, artist promos), which get served over our ad platform. These options have empowered our users and made them really engaged. Over time they have created over 1.5 billion playlists and just last year they streamed over 7 billion hours worth of music.
But at times an abundance of options has also made our users feel a bit lost. How do you find that right playlist for your workout from over a billion playlists? How do you discover new albums which are relevant to your taste? We help our users discover and experience relevant content by personalizing their experience on our platform.
Personalizing user experience involves learning their tastes and distastes in different contexts. A metal genre listener might not enjoy an announcement for a metal genre album when they are trying to put their kid to sleep and playing kid’s music at night. Serving them a recommendation for a kid’s music album might be more relevant in that context. But this experience might not be relevant for another metal genre listener who doesn’t mind receiving metal genre album recommendations during any context. These two users with similar listening habits might have different preferences. Personalizing their experiences on Spotify according to their respective taste in different contexts helps us make them more engaged.
Given these product insights we set out to build a personalization system which could analyze both real-time and historic data to understand user’s context and behavior respectively. Over time we’ve evolved our personalization tech stack due to a flexible architecture and ensured we used the right tools to solve the problem at scale.
Titan is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. Titan is a transactional database that can support thousands of concurrent users executing complex graph traversalsin real time.
In addition, Titan provides the following features:
- Elastic and linear scalability for a growing data and user base.
- Data distribution and replication for performance and fault tolerance.
- Multi-datacenter high availability and hot backups.
- Support for ACID and eventual consistency.
- Support for various storage backends:
- Support for global graph data analytics, reporting, and ETL through integration with big data platforms:
- Support for geo, numeric range, and full-text search via:
- Native integration with the TinkerPop graph stack:
- Open source with the liberal Apache 2 license.
Deletes in Cassandra
Cassandra uses a log-structured storage engine. Because of this, deletes do not remove the rows and columns immediately and in-place. Instead, Cassandra writes a special marker, called a tombstone, indicating that a row, column, or range of columns was deleted. These tombstones are kept for at least the period of time defined by the gc_grace_seconds per-table setting. Only then a tombstone can be permanently discarded by compaction.
This scheme allows for very fast deletes (and writes in general), but it’s not free: aside from the obvious RAM/disk overhead of tombstones, you might have to pay a certain price when reading data back if you haven’t modelled your data well.
Specifically, tombstones will bite you if you do lots of deletes (especially column-level deletes) and later perform slice queries on rows with a lot of tombstones.
Symptoms of a wrong data model
To illustrate this scenario, let’s consider the most extreme case – using Cassandra as a durable queue, a known anti-pattern, e.g.
CREATE TABLE queues (
PRIMARY KEY (name, enqueued_at)
Having enqueued 10000 10-byte messages and then dequeued 9999 of them, one by one, let’s peek at the last remaining message using cqlsh with TRACING ON:
SELECT enqueued_at, payload FROM queues WHERE name = 'queue-1' LIMIT 1;