Efficient File Copying On Linux

In response to my last post about dd, a friend of mine noticed that GNU cp always uses a 128 KB buffer size when copying a regular file; this is also the buffer size used by GNU cat. If you use strace to watch what happens when copying a file, you should see a lot of 128 KB read/write sequences:

$ strace -s 8 -xx cp /dev/urandom /dev/null
...
read(3, "\x61\xca\xf8\xff\x1a\xd6\x83\x8b"..., 131072) = 131072
write(4, "\x61\xca\xf8\xff\x1a\xd6\x83\x8b"..., 131072) = 131072
read(3, "\xd7\x47\x8f\x09\xb2\x3d\x47\x9f"..., 131072) = 131072
write(4, "\xd7\x47\x8f\x09\xb2\x3d\x47\x9f"..., 131072) = 131072
read(3, "\x12\x67\x90\x66\xb7\xed\x0a\xf5"..., 131072) = 131072
write(4, "\x12\x67\x90\x66\xb7\xed\x0a\xf5"..., 131072) = 131072
read(3, "\x9e\x35\x34\x4f\x9d\x71\x19\x6d"..., 131072) = 131072
write(4, "\x9e\x35\x34\x4f\x9d\x71\x19\x6d"..., 131072) = 131072
...
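
The read/write pairs in this trace are what a plain userspace copy loop produces. As a rough illustration (a minimal Python sketch, not the actual coreutils code, which is written in C), the same pattern looks like this. The names copy_file and BUFFER_SIZE are made up for this example.

```python
import os

BUFFER_SIZE = 128 * 1024  # 131072 bytes, the size seen in the strace output


def copy_file(src_path, dst_path, buffer_size=BUFFER_SIZE):
    """Copy src_path to dst_path with a read/write loop; return bytes copied."""
    total = 0
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                # One read(2) call asking for up to buffer_size bytes.
                chunk = os.read(src_fd, buffer_size)
                if not chunk:  # an empty read means end of file
                    break
                view = memoryview(chunk)
                while view:  # write(2) may accept fewer bytes than offered
                    written = os.write(dst_fd, view)
                    view = view[written:]
                total += len(chunk)
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
    return total
```

Running a script that calls copy_file under strace should show read and write calls with a 131072 byte count, much like the cp trace. Note that /dev/urandom never reports end of file, so this loop (like the cp command above) only stops when interrupted.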

Each read and write in the trace operates on a buffer of 131072 bytes, which is 128 KB. GNU cp is part of the GNU coreutils project, and if you go diving into the coreutils source code you’ll find this buffer size is defined in the file src/ioblksize.h. The comments in this file are really fascinating. The author of the code in this file (Jim Meyering) ran a benchmark using dd if=/dev/zero of=/dev/null with different values of the block size parameter, bs. On a wide variety of systems, including older Intel CPUs, modern high-end Intel CPUs, and even an IBM POWER7 CPU, a 128 KB buffer size was fastest. I used gnuplot to graph these results, shown below. Higher transfer rates are better, and the different symbols represent different system configurations.

[Figure: transfer rate versus buffer size, one symbol per system configuration]
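
To get a feel for this kind of measurement on your own machine, here is a small sketch in the spirit of the dd if=/dev/zero of=/dev/null benchmark. It is written in Python rather than using dd, so the per-call interpreter overhead means the absolute numbers will not match the ones in ioblksize.h. The function name measure_throughput, the 256 MiB default, and the list of block sizes are all arbitrary choices for this example.

```python
import time


def measure_throughput(block_size, total_bytes=256 * 1024 * 1024,
                       src_path="/dev/zero", dst_path="/dev/null"):
    """Copy up to total_bytes in block_size chunks; return bytes per second."""
    copied = 0
    # buffering=0 gives raw file objects, so each read() is one read(2) call.
    with open(src_path, "rb", buffering=0) as src, \
         open(dst_path, "wb", buffering=0) as dst:
        start = time.perf_counter()
        while copied < total_bytes:
            chunk = src.read(block_size)
            if not chunk:  # source ran out before total_bytes was reached
                break
            # Assumes the write is not short, which holds for /dev/null.
            dst.write(chunk)
            copied += len(chunk)
        elapsed = time.perf_counter() - start
    return copied / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    for kib in (4, 16, 64, 128, 256, 1024):
        rate = measure_throughput(kib * 1024)
        print(f"bs={kib:5d} KiB  {rate / 1e6:10.1f} MB/s")
```

Single runs are noisy, so it is worth repeating each block size a few times before reading much into the differences.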

https://eklitzke.org/efficient-file-copying-on-linux
