OpenStack Swift is Enterprise Ready! No more triple copies, EC Erasure Coding Brings it Home!

“The most exciting development in Swift since its start! Brings Swift to the Enterprise.

As you might already know, I enjoy discussing technology, both storage and data protection.

In this case I'd like to share with you my findings on Swift, one of the primary projects of OpenStack.

First, Swift is 100% Python, so if you are already using or learning Python, AWESOME! Good for you.

In the early days of Swift there was no erasure coding, so redundancy cost extra space: every object was stored as 3 copies, or 2.

So, why is this important?

Huge impact! With erasure coding we gain back space without risking data loss and without having to store every object 3 times.

With triple replication we store 3 copies of an object in separate locations.

With reduced replication we store 2 copies of the object in separate locations.

With erasure coding we store fragments of the object in N locations…”
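To make the space argument in the excerpt above concrete, here is a minimal sketch that compares the raw storage consumed per byte of user data under replication and under erasure coding. The function names and the 10 data + 4 parity scheme are assumptions chosen for illustration; they are not taken from the excerpt or from Swift's configuration.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data with N full copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return float(copies)


def erasure_coding_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per byte of user data with k data + m parity fragments."""
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("need at least 1 data fragment and 0 or more parity fragments")
    return (data_fragments + parity_fragments) / data_fragments


def usable_capacity(raw_capacity_tb: float, overhead: float) -> float:
    """User data that fits into a given raw capacity at a given overhead."""
    return raw_capacity_tb / overhead


if __name__ == "__main__":
    raw_tb = 300.0  # hypothetical cluster size
    schemes = {
        "triple replication": replication_overhead(3),
        "reduced replication": replication_overhead(2),
        "erasure coding 10+4 (illustrative)": erasure_coding_overhead(10, 4),
    }
    for name, overhead in schemes.items():
        print(f"{name}: {overhead:.2f}x raw, "
              f"{usable_capacity(raw_tb, overhead):.1f} TB usable of {raw_tb:.0f} TB")
```

Under these assumptions the 10+4 scheme stores 1.4 raw bytes per user byte, against 3 for triple replication, which is where the reclaimed space comes from.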

Havana Design Summit: Benchmarking Swift

“Depending on your goal, you may want a Realistic Benchmark or a Targeted Benchmark. Both approaches require benchmarking tools that scale to avoid any bottlenecks in the benchmarking code during load generation. Because of Swift’s fantastic horizontal scalability, avoiding bottlenecks in benchmarking code can be very challenging. Benchmarking Swift means generating tens of thousands of concurrent requests and utilizing many benchmarking servers to allow hundreds of gigabits per second of available client throughput. Both approaches to benchmarking also benefit from fine-grained collection of total request latency, time-to-first-byte latency, and Swift transaction IDs for every request. But they do have different goals, and that should inform load generation and results analysis.

Realistic Benchmarking asks, “What happens when the cluster sees a particular client load?” or “How many clients, operations per second, or how much throughput can my cluster really support?” You are more interested in simulating a production workload than you are in isolating a particular action. This kind of benchmarking can benefit from simulating parametric mixed client workloads (proportion of object sizes, operation types, etc.) or replaying a workload based on some kind of capture or “trace” from another cluster.

With Targeted Benchmarking, you want to generate a very specific, controlled load on the cluster to identify problems and test potential improvements. Data collected during a synthetic workload will be less noisy than a more realistic, mixed workload. This is useful for testing the effectiveness of tweaks to networking, node hardware, tuning/configuration, and Swift code…”
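As a rough illustration of the two ideas in the excerpt above (a parametric mixed workload and fine-grained latency collection), the sketch below draws a request plan from stated proportions and summarizes latency samples with a nearest-rank percentile. The function names, the operation mix, and the size mix are hypothetical; this is not the interface of any existing Swift benchmarking tool.

```python
import math
import random


def generate_workload(num_requests, op_mix, size_mix, seed=0):
    """Return a list of (operation, object_size_bytes) pairs.

    op_mix and size_mix map each choice to its relative weight.
    A fixed seed keeps the plan reproducible between runs.
    """
    rng = random.Random(seed)
    ops = rng.choices(list(op_mix), weights=list(op_mix.values()), k=num_requests)
    sizes = rng.choices(list(size_mix), weights=list(size_mix.values()), k=num_requests)
    return list(zip(ops, sizes))


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latency samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical mix: mostly reads, mostly small objects.
    plan = generate_workload(
        1000,
        op_mix={"GET": 70, "PUT": 25, "DELETE": 5},
        size_mix={4 * 1024: 60, 1024 * 1024: 30, 100 * 1024 * 1024: 10},
    )
    print(plan[:3])

    # Made-up latency samples (milliseconds), for illustration only.
    latencies_ms = [12, 15, 11, 40, 13, 14, 250, 16, 12, 13]
    print("p50:", percentile(latencies_ms, 50), "p95:", percentile(latencies_ms, 95))
```

A targeted benchmark would use the same helper with a degenerate mix, for example a single operation type and a single object size, so that the measured latencies isolate one behavior.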

Data Placement in Swift

“One of the hard problems that needs to be solved in a distributed storage system is to figure out how to effectively place the data within the storage cluster. Swift has a “unique-as-possible” placement algorithm which ensures that the data is placed efficiently and with as much protection from hardware failure as possible.

Swift places data into distinct availability zones to ensure both high durability and high availability. An availability zone is a distinct set of physical hardware with unique failure mode isolation. In a large deployment, availability zones may be defined as unique facilities in a large data center campus. In a single-DC deployment, the availability zones may be unique rooms, separated by firewalls and powered by different utility providers. A multi-rack cluster may choose to define availability zones as a rack and everything behind a single top-of-rack switch. Swift allows a deployer to choose how to define availability zones based on the particular details of the available infrastructure…”
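The “unique-as-possible” idea from the excerpt above can be sketched as a greedy choice: each replica goes to a device in whichever zone currently holds the fewest replicas of that object. This is a simplified illustration under stated assumptions, not Swift's actual ring-building algorithm, which also deals with partitions, device weights, and rebalancing. The `place_replicas` name and the zone and device labels are hypothetical.

```python
from collections import Counter


def place_replicas(devices, replicas):
    """Pick `replicas` devices, spreading them over zones as evenly as possible.

    devices: list of (zone, device_name) tuples.
    Returns the chosen devices in the order they were picked.
    """
    if replicas > len(devices):
        raise ValueError("not enough devices for the requested replica count")
    chosen = []
    used_per_zone = Counter()
    remaining = list(devices)
    for _ in range(replicas):
        # Prefer a device whose zone holds the fewest replicas so far;
        # ties are broken by the order of the input list.
        pick = min(remaining, key=lambda dev: used_per_zone[dev[0]])
        chosen.append(pick)
        used_per_zone[pick[0]] += 1
        remaining.remove(pick)
    return chosen


if __name__ == "__main__":
    cluster = [("z1", "d1"), ("z1", "d2"), ("z2", "d3"), ("z3", "d4")]
    print(place_replicas(cluster, 3))   # one replica per zone
    two_zones = [("z1", "d1"), ("z1", "d2"), ("z2", "d3")]
    print(place_replicas(two_zones, 3)) # a zone is reused only when unavoidable
```

With three zones available, three replicas land in three different zones; with only two zones, one zone is reused, but only after every zone already holds a replica.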

Implementing Encryption Architecture with Cisco Webex for OpenStack Swift object storage

“One of the requirements for data center security is protection of “at rest” data. Usually this protection is about encrypting client-generated contents, including objects stored in the Swift cluster. In most cases, clients themselves could carefully encrypt their data; however, this requires the client to establish and support encryption infrastructure. A cloud provider can create value by offering transparent server-side on-disk encryption.

We have been working on this design as a part of our current engagement with Cisco Webex. Their requirements include encryption of data stored on Swift devices, and clear separation of code to simplify code base maintenance. These are the requirements we aim to address with our proposal at the upcoming OpenStack design conference for the Grizzly release of OpenStack…”

Object Storage approaches for OpenStack Cloud: Understanding Swift and Ceph

“Many people confuse object storage with block-level storage such as iSCSI or FibreChannel (SAN), but there is a great deal of difference between them. While a SAN exposes only block devices to the system (the /dev/sdb Linux device name is a good example), object storage can be accessed only through a specialized client application.

Object storage is an important part of cloud infrastructure. Its main use cases are storing images for virtual machines or storing a user’s files (e.g., backups of all sorts, documents, pictures). The main advantage of object storage is very low implementation cost compared to enterprise-grade storage, while ensuring scalability and data redundancy. There seem to be a couple of widely recognizable implementations of object storage. Here we’ll compare two of them that can be interfaced with OpenStack…”
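To illustrate the access model contrasted in the excerpt above: a block device shows up as something like /dev/sdb, whereas a Swift object is addressed over HTTP by account, container, and object name. The sketch below only builds such a URL and does not contact a cluster. The `object_url` helper, the host name, and the account name are assumptions made for the example.

```python
from urllib.parse import quote


def object_url(storage_url, container, obj):
    """Build the URL of an object below an account's storage URL.

    storage_url: the account endpoint, e.g. https://host/v1/AUTH_account
    container:   container name (percent-encoded in full)
    obj:         object name (slashes are kept, since object names
                 commonly contain them as pseudo-directory separators)
    """
    return "/".join([
        storage_url.rstrip("/"),
        quote(container, safe=""),
        quote(obj, safe="/"),
    ])


if __name__ == "__main__":
    # Hypothetical endpoint and names, for illustration only.
    print(object_url("https://swift.example.com/v1/AUTH_demo",
                     "my photos", "2013/cat 1.jpg"))
```

A client would then issue PUT, GET, or DELETE requests against that URL with an authentication token, instead of reading and writing blocks on a mounted device.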

How swift is your Swift? Benchmarking OpenStack Swift

“The OpenStack Swift project has been developing at a tremendous pace. Version 1.6.0 was released in August, followed by 1.7.4 (Folsom) just two months later! These two recent releases implemented many important features, for example optimization for SSDs, object versioning, StatsD logging, and much more. Many of these features have significant implications for performance planning by cloud builders and operators.

As an integral part of deploying a cloud storage platform based on OpenStack Swift, benchmarking a Swift cluster implementation is essential before the cluster is deployed for production use. Preferably the benchmark should simulate the eventual workload that the cluster will be subjected to.

In this blog, we discuss the following Swift benchmarking concepts:
(1)    Benchmark Dimensions for Swift cluster: performance, scalability and degraded-mode performance (e.g. when hardware and software failures happen).
(2)    Sample workloads for Swift cluster…”

The Top 3 New Swift Features in OpenStack Folsom

“There has been a ton of activity in and around Swift throughout the Folsom release cycle. Swift has moved from version 1.4.8 in the Essex release to version 1.7.4 in the Folsom release. New features added in the Folsom release include the integration of Keystone middleware and the separation of the Swift CLI and client library, so that Glance can more easily integrate with Swift to store Nova images.

Swift has also added many new features to its core storage engine. Below I’ve described what I think are the three most significant new features in Swift in the Folsom release…”