Today I noticed a nice thing about using “raw” hard disk images with QEMU: the images are sparse files. What does that mean? Sparse files are a useful feature of many file systems that saves space when a file contains long runs of zero bytes: regions that were never written are recorded as holes instead of being stored on disk. My host’s file system is ext4, which supports sparse files. The space a file really occupies can be checked with the “-s” option of the “ls” command; for example, right after creating the image:
$ qemu-img create -f raw debian-testing.img 10G
Formatting 'debian-testing.img', fmt=raw size=10737418240
$ ls -lsh debian-testing.img
0 -rw-r--r-- 1 francesco francesco 10G 2011-05-08 15:16 debian-testing.img
The first number (zero) is the space the file actually occupies on disk; it is still zero because nothing has been written to the image yet. I used this image to install the Debian “squeeze” testing distribution in a QEMU guest, and this is the output of the same command after a full installation:
$ ls -lsh debian-testing.img
4.2G -rw-r--r-- 1 francesco francesco 10G 2011-05-08 15:12 debian-testing.img
When the guest operating system writes to its emulated hard disk, real space is allocated on the host file system. If the guest reads a block of the emulated hard disk that has never been written, the read simply returns zeros.
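The same behaviour can be reproduced on a small scale without QEMU. The following is only a sketch, assuming GNU coreutils (truncate, stat, dd) and a host file system with sparse-file support; the scratch file and the helper name allocated_bytes are my own, made up for illustration.

```shell
#!/bin/sh
# Hypothetical scratch file standing in for a raw disk image.
img=$(mktemp)

# Bytes the file system has actually allocated for a file
# (%b = number of allocated blocks, %B = size in bytes of each of those blocks).
allocated_bytes() {
    echo $(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
}

# Give the file an apparent size of 10 MiB without writing any data.
truncate -s 10M "$img"
apparent=$(stat -c %s "$img")
allocated=$(allocated_bytes "$img")
echo "after create: apparent=$apparent allocated=$allocated"

# Write one 4 KiB block at offset 1 MiB, as a guest would when it writes to its disk.
dd if=/dev/urandom of="$img" bs=4096 count=1 seek=256 conv=notrunc 2>/dev/null
allocated_after=$(allocated_bytes "$img")
echo "after write:  allocated=$allocated_after"

# Read the 4 KiB block at offset 4 MiB, which was never written,
# and count how many of its bytes are not zero.
nonzero=$(dd if="$img" bs=4096 count=1 skip=1024 2>/dev/null | tr -d '\000' | wc -c)
echo "non-zero bytes in unwritten block: $nonzero"

rm -f "$img"
```

On a file system with sparse-file support, the allocated size should stay far below the 10 MiB apparent size even after the write, and the unwritten block should read back as all zeros.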
The used disk space inside the guest system is actually smaller than 4.2G because, over time, the guest operating system has written and removed many files on its own file system, and the blocks those files occupied stay allocated on the host even after the files are removed. Some of that space can be reclaimed with a trick that temporarily uses more space on the host. In the guest operating system, run as root:
# dd if=/dev/zero of=/tmp/tmpzeros
It will run for a while and fill the free space of the emulated hard disk with zeros, so the real size of the image file on the host file system will grow towards its maximum. dd stops with a “No space left on device” error when the guest disk is full, but it can also be interrupted earlier with Ctrl-C. Then run:
# sync
# rm /tmp/tmpzeros
Shut down the guest, then on the host run:
$ qemu-img convert -f raw -O raw debian-testing.img debian-testing-0.img
It will create a new sparse file that will be smaller than the original because the all-zeros blocks will not be written in the new image. The guest system can now use “debian-testing-0.img” as its hard disk image without noticing any change, and the original “debian-testing.img” can be removed.
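The idea behind this step, copying a file while turning all-zero blocks into holes, can be tried on a small file first. The sketch below uses “cp --sparse=always” from GNU coreutils as a stand-in for “qemu-img convert”; it is not what QEMU does internally, and the scratch files and the helper allocated_bytes are hypothetical names chosen for the example.

```shell
#!/bin/sh
# Hypothetical scratch files standing in for the old and the new image.
src=$(mktemp)
dst=$(mktemp)

# Bytes the file system has actually allocated for a file.
allocated_bytes() {
    echo $(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
}

# 1 MiB of random data followed by 8 MiB of zeros that are really written to disk,
# like freed guest blocks that have been overwritten with zeros.
dd if=/dev/urandom of="$src" bs=1M count=1 2>/dev/null
dd if=/dev/zero of="$src" bs=1M count=8 seek=1 conv=notrunc 2>/dev/null

# Copy the file, writing runs of zeros as holes instead of data.
cp --sparse=always "$src" "$dst"

src_size=$(stat -c %s "$src")
dst_size=$(stat -c %s "$dst")
src_alloc=$(allocated_bytes "$src")
dst_alloc=$(allocated_bytes "$dst")
echo "source: apparent=$src_size allocated=$src_alloc"
echo "copy:   apparent=$dst_size allocated=$dst_alloc"

# The copy must still have exactly the same contents as the source.
if cmp -s "$src" "$dst"; then same=yes; else same=no; fi
echo "identical contents: $same"

rm -f "$src" "$dst"
```

The same kind of check can be run on the real images before deleting the original: “cmp debian-testing.img debian-testing-0.img” should report no difference, and “ls -lsh” should show a smaller first column for the new file.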
The tradeoff between the qcow2 format and the raw format is usually framed as image size versus performance, but I had never taken into account the fact that raw images are sparse files.