“Consul has multiple components, but as a whole, it is a tool for discovering and configuring services in your infrastructure. It provides several key features:
- Service Discovery: Clients of Consul can provide a service, such as mysql, and other clients can use Consul to discover providers of a given service. Using either DNS or HTTP, applications can easily find the services they depend upon.
- Health Checking: Consul clients can provide any number of health checks, either associated with a given service (“is the webserver returning 200 OK”), or with the local node (“is memory utilization below 90%”). This information can be used by an operator to monitor cluster health, and it is used by the service discovery components to route traffic away from unhealthy hosts.
- Key/Value Store: Applications can make use of Consul’s hierarchical key/value store for any number of purposes including: dynamic configuration, feature flagging, coordination, leader election, etc. The simple HTTP API makes it easy to use.
- Multi Datacenter: Consul supports multiple datacenters out of the box. This means users of Consul do not have to worry about building additional layers of abstraction to grow to multiple regions.
Consul is designed to be friendly to both the DevOps community and application developers, making it perfect for modern, elastic infrastructures…”
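To make the "simple HTTP API" point concrete, here is a minimal Python sketch of reading from the key/value store. It assumes a local agent on the default port 8500 and relies on the fact that a KV read (`GET /v1/kv/<key>`) returns a JSON list whose `Value` fields are base64-encoded. The helper names `kv_url` and `decode_kv_response` and the key `config/feature-x` are made up for illustration, and the sample payload is hand-built so the snippet runs without a Consul agent.

```python
import base64
import json


def kv_url(key: str, agent: str = "http://127.0.0.1:8500") -> str:
    """Build the KV read URL for a key (agent address is an assumption)."""
    return f"{agent}/v1/kv/{key}"


def decode_kv_response(body: str) -> dict:
    """Map each key in a KV read response to its decoded string value."""
    decoded = {}
    for entry in json.loads(body):
        raw = entry.get("Value")
        # Keys stored without a value come back as null.
        if raw is None:
            decoded[entry["Key"]] = None
        else:
            decoded[entry["Key"]] = base64.b64decode(raw).decode("utf-8")
    return decoded


# Hand-built stand-in for what an agent would return for one key.
sample_body = json.dumps([
    {
        "Key": "config/feature-x",  # hypothetical key
        "Value": base64.b64encode(b"enabled").decode("ascii"),
        "Flags": 0,
    }
])

print(kv_url("config/feature-x"))
print(decode_kv_response(sample_body))
```

Against a real agent, the body would come from an HTTP GET on the URL that `kv_url` builds; everything else stays the same.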
“What is ACID anyway, and why is everyone so uptight about it? When can’t we have eventual consistency, and what do we sacrifice in exchange for stronger models? I’ve spent the last year trying to wrap my head around consistency in distributed systems, and testing databases to see how those consistency models play out in practice. In this talk we’ll explore linearizability–one of the strongest consistency models for a concurrent system–move from an academic definition to an intuitive understanding, and see the ways in which databases succeed–and fail–to live up to their consistency claims…”
“A key advantage of Spark is that its machine learning library (MLlib) and its library for stream processing (Spark Streaming) are built on the same core architecture for distributed analytics. This facilitates adding extensions that leverage and combine components in novel ways without reinventing the wheel. We have been developing a family of streaming machine learning algorithms in Spark within MLlib. In this post we describe streaming k-means clustering, included in the recently released Spark 1.2…”
Introducing streaming k-means in Apache Spark 1.2
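The excerpt does not spell out the algorithm, so the following is only a rough Python sketch of the per-batch center update as I understand it from the MLlib documentation: each center becomes a weighted average of its old position and the mean of the newly assigned points, with a decay factor that discounts older data. The function name `update_center` and its arguments are illustrative assumptions, not Spark's API, and the nearest-center assignment step is left out.

```python
def update_center(center, n, batch_points, decay=1.0):
    """Return (new_center, new_count) after absorbing one batch.

    center       -- current center coordinates (list of floats)
    n            -- number of points assigned to this center so far
    batch_points -- points from the current batch assigned to this center
    decay        -- 1.0 weighs all history equally, 0.0 forgets it entirely
    """
    m = len(batch_points)
    if m == 0:
        # Nothing new was assigned to this center; leave it untouched.
        return center, n

    dims = len(center)
    batch_mean = [sum(p[d] for p in batch_points) / m for d in range(dims)]

    old_weight = n * decay  # discounted weight of the historical points
    new_center = [
        (center[d] * old_weight + batch_mean[d] * m) / (old_weight + m)
        for d in range(dims)
    ]
    return new_center, n + m


# Two old points at the origin, two new points at (3, 3).
print(update_center([0.0, 0.0], 2, [[3.0, 3.0], [3.0, 3.0]], decay=1.0))
print(update_center([0.0, 0.0], 2, [[3.0, 3.0], [3.0, 3.0]], decay=0.0))
```

With `decay=1.0` the center lands halfway between old and new data; with `decay=0.0` it jumps straight to the mean of the latest batch.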
“After several days, a lot of Googling, reading of (mostly unhelpful) support mailing lists and much experimentation I felt I had accumulated a pretty solid understanding of how things had gotten the way they did in our customer’s database. What’s more, unlike most of the other blog articles and other web pages you’ll probably read on this, I felt I had discovered a relatively simple procedure for getting out of this situation. Since I wasn’t able to find any other authoritative source on this on the internet (and in fact, most of the other sources I’ve seen have said you really don’t want to be in this situation– while offering little help as to what to do about it if you’re already there), I thought writing a public document on the subject might help some other systems administrators out there who do find themselves unexpectedly in the middle of MySQL Character Set Hell…”
“Clojure is a great language that is continuing to improve itself and expand its user base year over year. The Clojure ecosystem has many great libraries focused on being highly composable. This composability allows developers to easily build impressive applications from seemingly simple parts. Once you have a solid understanding of how Clojure libraries fit together, integration between them can become very intuitive. However, if you have not reached this level of understanding, knowing how all of the parts fit together can be daunting. Fear not, this series will walk you through start to finish, building a tested compojure web app backed by a Postgres Database…”
“Python (and its libraries) are enormous. It is used for system automation, web applications, big data, analytics, and security software. This article aims to show off some lesser-known tricks to put you on the path to faster development, easier debugging, and general fun.”
“As I was browsing the web and catching up on some sites I visit periodically, I found a cool article from Tom Hayden about using Amazon Elastic Map Reduce (EMR) and mrjob in order to compute some statistics on win/loss ratios for chess games he downloaded from the millionbase archive, and generally have fun with EMR. Since the data volume was only about 1.75GB containing around 2 million chess games, I was skeptical of using Hadoop for the task, but I can understand his goal of learning and having fun with mrjob and EMR. Since the problem is basically just to look at the result lines of each file and aggregate the different results, it seems ideally suited to stream processing with shell commands. I tried this out, and for the same amount of data I was able to use my laptop to get the results in about 12 seconds (processing speed of about 270MB/sec), while the Hadoop processing took about 26 minutes (processing speed of about 1.14MB/sec)…”
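The article itself works with shell pipelines; as a rough Python equivalent of the same idea, the sketch below tallies game results in a single streaming pass, holding only the counters in memory. It assumes the games are in PGN, where each game carries a tag line such as `[Result "1-0"]`. The function name `tally_results` and the sample lines are made up for illustration, and no timing claims are implied.

```python
import re
from collections import Counter

# Matches PGN result tags such as [Result "1-0"] or [Result "1/2-1/2"].
RESULT_TAG = re.compile(r'^\[Result "([^"]+)"\]')


def tally_results(lines):
    """Count result tags from any iterable of lines, one line at a time."""
    counts = Counter()
    for line in lines:
        match = RESULT_TAG.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Tiny made-up sample; in practice `lines` could be an open file or sys.stdin.
sample_lines = [
    '[Event "Example"]\n',
    '[Result "1-0"]\n',
    '1. e4 e5 2. Nf3 1-0\n',
    '[Result "1/2-1/2"]\n',
    '[Result "1-0"]\n',
]

for result, count in tally_results(sample_lines).most_common():
    print(result, count)
```

Because the function accepts any iterable of lines, the same code handles a 1.75GB archive as comfortably as the five-line sample.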
(a side-by-side reference sheet)