Probabilistic Data Structures for Go

“Imagine you had access logs for a very high traffic website. How would you determine how many different IP addresses accessed your site? Or how many hits from a particular IP? Or which ones accessed it the most? Assuming IPv4 addresses, you could use a map[uint32]int to maintain the counts, but that could end up using a lot of memory. It’s certainly possible to have a map with 4 billion entries, and a real log server wouldn’t have accesses from every single valid IP address, but the problem still exists.

Luckily, there’s a class of algorithms that lets you trade memory usage for accuracy. In many cases the reduction in memory can be significant and the drop in accuracy is minimal. The go-probably library by Dustin Sallings implements a number of these basic data structures and algorithms. (It’s almost always better to process things exactly, if you have the memory though. These algorithms can be slower than exact answers for small data sets.)…”