The premise behind autoscaling in AWS is simple: you can maximize your ability to handle load spikes and minimize costs if you automatically scale your application out based on metrics like CPU or memory utilization. If you need 100 Docker containers to support your load during the day but only 10 when load is lower at night, running 100 containers at all times means that you’re using 900% more capacity than you need every night. With a constant container count, you’re either spending more money than you need to most of the time or your service will likely fall over during a load spike.
Linux has two well-known tracing tools:
- strace allows you to see what system calls are being made.
- ltrace allows you to see what dynamic library calls are being made.
Though useful, these tools are limited. What if you want to trace what happens inside a system call or library call? What if you want to do more than just logging calls, e.g. you want to compile statistics on certain behavior? What if you want to trace multiple processes and correlate data from multiple sources?
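As a concrete starting point, here are a few typical invocations of both tools (the traced command, `ls /tmp`, is just a stand-in for whatever program you want to inspect):

```shell
# Log every system call `ls` makes, truncating printed strings to 32 bytes:
strace -s 32 ls /tmp

# Restrict the trace to file-related calls and print a per-call summary
# (counts and time spent) instead of a line-by-line log:
strace -e trace=%file -c ls /tmp

# Follow forked children too, writing the trace to a file:
strace -f -o /tmp/ls.trace ls /tmp

# The library-call equivalent: log calls into shared libraries such as libc:
ltrace ls /tmp
```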
This article shows you how to set up bpftrace and teaches you its basic usage. I’ll also give an overview of what the tracing ecosystem looks like (e.g. “what’s eBPF?”) and how it came to be what it is today.
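As a preview of what that looks like, here are two classic bpftrace one-liners (both assume root privileges and a bpftrace-capable kernel; the available probe names can be listed with `bpftrace -l`):

```shell
# Count system calls by process name, printing the totals on Ctrl-C:
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

# Print every file opened on the system, with the process that opened it:
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'

# List the probes you can attach to (the answer to "what can I trace?"):
sudo bpftrace -l 'tracepoint:syscalls:*'
```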
In response to my last post about dd, a friend of mine noticed that GNU cp always uses a 128 KB buffer size when copying a regular file; this is also the buffer size used by GNU cat. If you use strace to watch what happens when copying a file, you should see a lot of 128 KB read/write sequences:
$ strace -s 8 -xx cp /dev/urandom /dev/null
...
read(3, "\x61\xca\xf8\xff\x1a\xd6\x83\x8b"..., 131072) = 131072
write(4, "\x61\xca\xf8\xff\x1a\xd6\x83\x8b"..., 131072) = 131072
read(3, "\xd7\x47\x8f\x09\xb2\x3d\x47\x9f"..., 131072) = 131072
write(4, "\xd7\x47\x8f\x09\xb2\x3d\x47\x9f"..., 131072) = 131072
read(3, "\x12\x67\x90\x66\xb7\xed\x0a\xf5"..., 131072) = 131072
write(4, "\x12\x67\x90\x66\xb7\xed\x0a\xf5"..., 131072) = 131072
read(3, "\x9e\x35\x34\x4f\x9d\x71\x19\x6d"..., 131072) = 131072
write(4, "\x9e\x35\x34\x4f\x9d\x71\x19\x6d"..., 131072) = 131072
...
As you can see, each copy is operating on buffers 131072 bytes in size, which is 128 KB. GNU cp is part of the GNU coreutils project, and if you go diving into the coreutils source code you’ll find this buffer size is defined in the file src/ioblksize.h. The comments in this file are really fascinating. The author of the code in this file (Jim Meyering) did a benchmark using dd if=/dev/zero of=/dev/null with different values of the block size parameter, bs. On a wide variety of systems, including older Intel CPUs, modern high-end Intel CPUs, and even an IBM POWER7 CPU, a 128 KB buffer size is fastest. I used gnuplot to graph these results, shown below. Higher transfer rates are better, and the different symbols represent different system configurations.
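You can reproduce a scaled-down version of that benchmark yourself. The sketch below pushes 1 GiB of zeroes through dd at several block sizes and prints the throughput figure from dd’s status line (the exact numbers will of course vary by machine):

```shell
# Copy 1 GiB from /dev/zero to /dev/null at various block sizes and
# extract the throughput figure that dd reports on stderr.
total=$((1024 * 1024 * 1024))   # 1 GiB per run
for bs in 1K 4K 32K 128K 512K 2M; do
    bytes=$(numfmt --from=iec "$bs")          # e.g. 128K -> 131072
    printf '%-6s ' "$bs"
    dd if=/dev/zero of=/dev/null bs="$bs" count=$((total / bytes)) 2>&1 |
        tail -n 1 | grep -oE '[0-9.,]+ [GMk]?B/s'
done
```

On most Linux machines you should see throughput climb steeply up to somewhere around the 128 KB mark and then flatten out, which is exactly the shape of the curves in the coreutils comments.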
BCC is a toolkit for creating efficient kernel tracing and manipulation programs, and includes several useful tools and examples. It makes use of extended BPF (Berkeley Packet Filters), formally known as eBPF, a new feature that was first added to Linux 3.15. Much of what BCC uses requires Linux 4.1 and above.
eBPF was described by Ingo Molnár as:
One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively.
BCC makes BPF programs easier to write, with kernel instrumentation in C (it includes a C wrapper around LLVM) and front-ends in Python and Lua. It is suited to many tasks, including performance analysis and network traffic control.
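Most distributions package BCC’s bundled tools; on Debian/Ubuntu the binaries carry a -bpfcc suffix, while other distros install them under /usr/share/bcc/tools. A couple of representative invocations (the names below assume the Ubuntu packaging):

```shell
# Print a line for every new process exec()ed, system-wide:
sudo execsnoop-bpfcc

# Trace file opens, with PID, process name, and path:
sudo opensnoop-bpfcc

# Summarize block-device I/O latency as a histogram over one 10-second interval:
sudo biolatency-bpfcc 10 1
```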
Go is a great technology stack for building scalable, web-based, back-end systems for web applications.
When you think about building web applications and web APIs, or simply building HTTP servers in Go, does your mind go straight to the standard net/http package? Then you have to deal with common situations like dynamic (a.k.a. parameterized) routing, security and authentication, real-time communication, and many other issues that net/http doesn’t solve.

The net/http package is not complete enough to quickly build well-designed back-end web systems. When you realize this, you might be thinking along these lines:
- Ok, the net/http package doesn’t suit me, but there are so many frameworks, which one will work for me?!
- Each one of them tells me that it is the best. I don’t know what to do!
I did some deep research and benchmarks with ‘wrk’ and ‘ab’ in order to choose which framework would suit me and my new project. The results, sadly, were really disappointing to me.
I started wondering whether Golang wasn’t as fast on the web as I had read… but before I let Golang go and went back to developing with Node.js, I told myself:
‘Makis, don’t lose hope, give at least a chance to Golang. Try to build something totally new without basing it off the “slow” code you saw earlier; learn the secrets of this language and make others follow your steps!‘.
These are the words I told myself that day [13 March 2016].
Later that same night, I was reading a book about Greek mythology. I saw an ancient goddess’ name and was immediately inspired to give a name to this new web framework (which I had already started writing): Iris.
Two months later, I’m writing this intro.
memleax attaches to a running process, hooks the memory allocate/free APIs, records all memory blocks, and reports in real time any blocks that live longer than 5 seconds (you can change this threshold with the -e option).

This makes it very convenient to use: there is no need to recompile the program or restart the target process. You run memleax to monitor the target process, wait for the real-time memory-leak report, and then kill it (e.g. with Ctrl-C) to stop monitoring.

memleax does not run for the whole life of the target process; it assumes that long-lived memory blocks are leaks. The downside is that you have to set the expiration threshold with the -e option according to your scenario; the upside, besides the convenience, is that memory allocated during process initialization is skipped.
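A typical session, then, looks like this (the PID 2466 is hypothetical; substitute the process you want to watch):

```shell
# Attach to PID 2466 and report any allocation still live after 10 seconds:
sudo memleax -e 10 2466

# While it runs, memleax prints suspected leaks with their call stacks;
# press Ctrl-C to detach -- the target process keeps running untouched.
```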
Dynamic tracing is a kind of advanced, “post-modern” debugging technology. It helps software engineers answer difficult questions about running software systems at very low cost and in very little time, so they can troubleshoot and resolve problems faster. It has risen to prominence against a backdrop of rapid growth: as engineers in the Internet age, we face challenges on two fronts. The first is scale: whether measured in users or in machines, our deployments are growing rapidly. The second is complexity: our business logic is ever more complex, and the software systems we run are layered many levels deep, from the operating system kernel, up through system software such as databases and web servers, then virtual machines, interpreters, and just-in-time (JIT) compilers for high-level scripting languages, and finally, on top of all these abstractions, the application-level business logic with its own large amount of complex code.
First off, let me start with a big thank you to all of you for your interest in sysdig! We have been overwhelmed by the positive response from the community, and by the quality of the comments, questions, and contributions we’re receiving.
For the uninitiated, sysdig is a system-level exploration and troubleshooting tool for Linux with native support for containers. In this post, I want to try to answer two important and recurring questions we’ve received:
- “How does sysdig work?”
- “How is this different from the plethora of tools already available to analyze a Linux system or the processes that run on top of it (SystemTap, LTTng, DTrace, strace, ktap to name few of them)?”
I’ll address both questions by providing a technical breakdown of sysdig’s architecture. But before doing that, let’s look at two very well-known tools: strace and DTrace.
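To make the comparison concrete, here is roughly what working with sysdig looks like in practice (the available filter fields for expressions like the one below can be listed with `sysdig -l`):

```shell
# Capture all system events into a trace file, then replay it offline:
sudo sysdig -w capture.scap
sudo sysdig -r capture.scap

# Live view of I/O performed by processes named nginx:
sudo sysdig proc.name=nginx and evt.is_io=true
```

The capture-to-file workflow is one of sysdig’s distinguishing features: you can record once on a production box and analyze the trace repeatedly somewhere else.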
“Improving web application performance is more critical than ever. The share of economic activity that’s online is growing; more than 5% of the developed world’s economy is now on the Internet (see Resources below for statistics). And our always-on, hyper-connected modern world means that user expectations are higher than ever. If your site does not respond instantly, or if your app does not work without delay, users quickly move on to your competitors.
For example, a study done by Amazon almost 10 years ago proved that, even then, a 100-millisecond decrease in page-loading time translated to a 1% increase in its revenue. Another recent study highlighted the fact that more than half of site owners surveyed said they lost revenue or customers due to poor application performance.
How fast does a website need to be? For each second a page takes to load, about 4% of users abandon it. Top e-commerce sites offer a time to first interaction ranging from one to three seconds, which offers the highest conversion rate. It’s clear that the stakes for web application performance are high and likely to grow.
Wanting to improve performance is easy, but actually seeing results is difficult. To help you on your journey, this blog post offers you ten tips to help you increase your website performance by as much as 10x. It’s the first in a series detailing how you can increase your application performance with the help of some well-tested optimization techniques, and with a little support from NGINX. This series also outlines potential improvements in security that you can gain along the way…”