Bounter: Efficient Counter that uses a limited (bounded) amount of memory regardless of data size

Bounter is a Python library, written in C, for extremely fast probabilistic counting of item frequencies in massive datasets, using only a small fixed memory footprint.

Why Bounter?

Bounter lets you count how many times an item appears, similar to Python’s built-in dict or Counter:

from bounter import bounter

counts = bounter(size_mb=1024)  # use at most 1 GB of RAM
counts.update([u'a', 'few', u'words', u'a', u'few', u'times'])  # count item frequencies

print(counts[u'few'])  # query the counts
2

However, unlike dict or Counter, Bounter can process huge collections where the items would not even fit in RAM. This commonly happens in Machine Learning and NLP, with tasks like dictionary building or collocation detection that need to estimate counts of billions of items (token ngrams) for their statistical scoring and subsequent filtering.

Bounter implements approximate algorithms using optimized low-level C structures, avoiding the overhead of Python objects. It lets you specify the maximum amount of RAM you want to use. In the Wikipedia example in the project README (linked below), Bounter uses 31x less memory than Counter.

Bounter is also marginally faster than the built-in dict and Counter, so wherever you can represent your items as strings (both byte strings and unicode are fine, and Bounter works in both Python 2 and Python 3), there’s no reason not to use Bounter instead.

https://github.com/RaRe-Technologies/bounter

Goroutines, Nonblocking I/O, And Memory Usage

I am generally a fan of Go’s approach to concurrency: writing code with goroutines is a lot easier than writing traditional nonblocking network servers in a language like C or C++. However, while working on a highly concurrent network proxy I came across an interesting realization about how the Go concurrency model makes it harder to write programs that do a lot of concurrent I/O with efficient memory usage.

The program in question is a network proxy akin to HAProxy or Envoy. Typically the proxy has a very large number of clients connected, but most of those clients are actually idle with no outstanding network requests. Each client connection has a read buffer and a write buffer. Therefore the naive memory usage of such a program is at least: #connections * (readbuf_sz + writebuf_sz).

There’s a trick you can do in a C or C++ program of this nature to reduce memory usage. Suppose that typically 5% of the client connections are actually active, and the other 95% are idle with no pending reads or writes. In this situation you can create a pool of buffer objects. When connections are actually active they acquire buffers to use for reading/writing from the pool, and when the connections are idle they release the buffers back to the pool. This reduces the number of allocated buffers to approximately the number of buffers actually needed by active connections. In this case using this technique will give a 20x memory reduction, since only 5% as many buffers will be allocated compared to the naive approach.

The reason this technique works at all is due to how nonblocking reads and writes work in C. In C you use a system call like select(2) or epoll_wait(2) to get a notification that a file descriptor is ready to be read/written, and then after that you explicitly call read(2) or write(2) yourself on that file descriptor. This gives you the opportunity to acquire a buffer after the call to select/epoll, but before making the read call…
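
To make the pattern from the last two paragraphs concrete, here is a minimal single-threaded C sketch where a small echo server stands in for the proxy: it keeps a fixed pool of I/O buffers and only attaches one to a connection between the epoll_wait(2) readiness notification and the read(2)/write(2) calls. This is an illustration under assumptions, not code from the article; the pool size, port, and helper names (buf_acquire, buf_release) are invented, and error handling is omitted.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define BUF_SZ     4096
#define POOL_SIZE  64      /* sized for the minority of connections active at once */
#define MAX_EVENTS 64

/* A trivial fixed pool of I/O buffers, handed out only while a fd is actually ready. */
static unsigned char pool[POOL_SIZE][BUF_SZ];
static int free_idx[POOL_SIZE];
static int free_top;

static void pool_init(void) {
    for (int i = 0; i < POOL_SIZE; i++) free_idx[i] = i;
    free_top = POOL_SIZE;
}

static unsigned char *buf_acquire(void) {
    return free_top > 0 ? pool[free_idx[--free_top]] : NULL;
}

static void buf_release(unsigned char *b) {
    free_idx[free_top++] = (int)((b - &pool[0][0]) / BUF_SZ);
}

int main(void) {
    pool_init();

    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);
    bind(lsock, (struct sockaddr *)&addr, sizeof addr);
    listen(lsock, SOMAXCONN);

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lsock };
    epoll_ctl(ep, EPOLL_CTL_ADD, lsock, &ev);

    for (;;) {
        struct epoll_event events[MAX_EVENTS];
        int n = epoll_wait(ep, events, MAX_EVENTS, -1);   /* readiness notification */

        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == lsock) {                            /* new connection: idle, so no buffer yet */
                int c = accept(lsock, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = c };
                epoll_ctl(ep, EPOLL_CTL_ADD, c, &cev);
                continue;
            }
            /* The fd is readable *now*: borrow a buffer just for this read/write. */
            unsigned char *buf = buf_acquire();
            if (buf == NULL) continue;                    /* pool exhausted; level-triggered epoll will re-report */
            ssize_t r = read(fd, buf, BUF_SZ);
            if (r <= 0) {
                epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL);
                close(fd);
            } else {
                write(fd, buf, (size_t)r);                /* echo the data back */
            }
            buf_release(buf);                             /* connection is idle again */
        }
    }
}

Because the pool is only consulted between the readiness notification and the read, the steady-state number of allocated buffers tracks the number of currently active connections rather than the number of open ones.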

https://eklitzke.org/goroutines-nonblocking-io-and-memory-usage

Basics of Making a Rootkit: From syscall to hook!

WARNING: This tutorial is for educational purposes only, and by NO MEANS should you actually be malicious when (or after) making a rootkit. I thought I’d share how to do this for any security-minded people who would like to learn more about how to prevent or look for rootkits. This will be done in C on Linux, probably using libraries and functions you’ve never seen. It is also advisable to do this in a VM to get the hang of compiling and loading modules. Messing with the kernel can cause things to go crazy, if not break entirely; you have been warned.
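
If you want to warm up on the compile-and-load cycle the warning mentions before touching anything rootkit-related, a bare-bones loadable kernel module is enough. This is a generic hello-world sketch, not code from the tutorial:

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Hello-world module for practicing insmod/rmmod in a VM");

static int __init hello_init(void)
{
    pr_info("hello: module loaded\n");
    return 0;               /* a nonzero return would abort loading */
}

static void __exit hello_exit(void)
{
    pr_info("hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);

Build it with a one-line kbuild Makefile (obj-m := hello.o) via make -C /lib/modules/$(uname -r)/build M=$(pwd) modules, load it with insmod hello.ko, watch the messages appear in dmesg, and remove it with rmmod hello, all inside the VM.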

Tutorial – Write a System Call

A while back, I wrote about writing a shell in C, a task which lets you peek under the covers of a tool you use daily. Underneath even a simple shell are many operating system calls, like read, fork, exec, wait, write, and chdir (to name a few). Now, it’s time to continue this journey down another level, and learn just how these system calls are implemented in Linux.

What is a system call?

Before we start implementing system calls, we’d better make sure we understand exactly what they are. A naive programmer—like me not that long ago—might define a system call as any function provided by the C library. But this isn’t quite true. Although many functions in the C library align nicely with system calls (like chdir), other ones do quite a bit more than simply ask the operating system to do something (such as fork or fprintf). Still others simply provide programming functionality without using the operating system, such as qsort and strtok.

In fact, a system call has a very specific definition. It is a way of requesting that the operating system kernel do something on your behalf. Operations like tokenizing a string don’t require interacting with the kernel, but anything involving devices, files, or processes definitely does.
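
One easy way to see this distinction for yourself is to run a tiny program under strace, which prints every system call a process makes. The snippet below is illustrative and not from the article: the strtok call never shows up in the trace, while the output eventually surfaces as a write(2) to file descriptor 1.

#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[] = "a few words";
    char *tok = strtok(s, " ");   /* pure library code: no kernel involvement */
    puts(tok);                    /* buffered, but ultimately flushed via a write(2) system call */
    return 0;
}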

System calls also behave differently under the hood than a normal function. Rather than simply jumping to some code from your program or a library, your program has to ask the CPU to switch into kernel mode, and then go to a predefined location within the kernel to handle your system call. This can be done in a few different ways, such as a processor interrupt or the special syscall and sysenter instructions. In fact, the modern way of making a system call in Linux is to let the kernel provide some code (called the VDSO) which does the right thing to make a system call. Here’s an interesting SO question on the topic.
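
To make that concrete, here is a small x86-64 Linux (glibc) sketch that asks the kernel for the current process ID in three ways: through the ordinary libc wrapper, through the generic syscall() wrapper, and through the raw syscall instruction. It illustrates the mechanism described above and is not code from the tutorial.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* getpid(), syscall() */

int main(void)
{
    /* 1. The usual way: the C library wrapper. */
    long a = getpid();

    /* 2. The generic wrapper: pass the system call number yourself. */
    long b = syscall(SYS_getpid);

    /* 3. The raw instruction: number goes in rax, the result comes back in rax. */
    long c;
    __asm__ volatile ("syscall"
                      : "=a"(c)                 /* output: rax */
                      : "a"((long)SYS_getpid)   /* input: rax = syscall number */
                      : "rcx", "r11", "memory"); /* syscall clobbers rcx and r11 */

    printf("%ld %ld %ld\n", a, b, c);   /* all three print the same PID */
    return 0;
}

All three lines print the same PID; the only difference is how much of the number-in-a-register, switch-to-kernel-mode dance is hidden from you.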

Thankfully, all that complexity is handled for us. No matter how a system call is made, it all comes down to looking up the particular system call number in a table to find the correct kernel function to call. Since all you need is a table entry and a function, it’s actually very easy to implement your own system call. So let’s give it a shot!
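
As a flavor of what the tutorial linked below ends up doing, the kernel-side half of a new system call is roughly a function declared with the SYSCALL_DEFINE* macros plus a row in the architecture’s syscall table. The name, argument, and placement here are invented for illustration and are not the tutorial’s exact code:

#include <linux/kernel.h>
#include <linux/syscalls.h>

/* SYSCALL_DEFINE1 expands into a properly typed sys_hello entry point
 * for a one-argument system call. */
SYSCALL_DEFINE1(hello, int, value)
{
        pr_info("hello syscall called with %d\n", value);
        return 0;
}

The other half is one line in the syscall table (arch/x86/entry/syscalls/syscall_64.tbl on recent x86-64 kernels) mapping an unused number to sys_hello, which is exactly the table lookup described above; from user space you would then invoke it with syscall(<that number>, 42).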

https://brennan.io/2016/11/14/kernel-dev-ep3/