OpenStack: unexplained high CPU load on compute nodes

Recently I encountered a weird issue: the CPU load of my hosts was quite high, with a load average of 20 and peaks at 40. Since the high load was generated by the guests, I first had a look at the guest systems. Unfortunately the VMs were not doing anything: no IOPS and no CPU activity.

I. Issue

The setup is based on a huge LUN mapped to the compute nodes. Nova uses the libvirt_image_type = lvm disk driver, so the VM disks are backed by LVM logical volumes. The same goes for Cinder (LVM + iSCSI driver).
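
In nova.conf this corresponds to something along these lines (option name as given above; newer releases spell it images_type under the [libvirt] section):

[DEFAULT]
# LVM-backed instance disks: each guest disk is a logical volume
libvirt_image_type = lvm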

OpenStack guest disks (through libvirt) are configured with the following default options:

type='raw' cache='none'

Thus all the guests' I/Os go through O_DIRECT to the host disks.
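
Concretely, the disk section of the generated libvirt domain XML looks roughly like this (the device path and target are illustrative for an LVM-backed instance):

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/nova-vg/instance-0000002a_disk'/>
  <target dev='vda' bus='virtio'/>
</disk>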

I assumed it was a storage issue. The only significant lead left was to take a look at the interrupts; indeed, the number of interrupts was quite high.

In the end, I did not find anything particularly relevant. However, while digging into the libvirt/KVM options, I came across an option called io, which defines the I/O behavior, and decided to give it a shot. The KVM (and thus OpenStack) default is io='threads', so I opted for the other available value, io='native', which according to Google seems to bring better performance and less I/O overhead.

Surprisingly, this option did the trick: it dramatically decreased the load on the compute nodes.
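
A quick way to try this on a single guest, before touching Nova at all, is to edit its libvirt domain directly with virsh edit (the instance name is up to your deployment) and add the io attribute to the driver element, then power-cycle the guest:

<driver name='qemu' type='raw' cache='none' io='native'/>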


II. Quick and dirty hack for OpenStack

Around line 450 in /usr/lib/python2.7/dist-packages/nova/virt/libvirt/config.py, modify the LibvirtConfigGuestDisk class as follows (the self.io attribute and the block that sets the io attribute on the driver element are the additions):


        self.disk_total_iops_sec = None
        # Added: force the libvirt io mode for every guest disk
        self.io = "native"

    def format_dom(self):
        dev = super(LibvirtConfigGuestDisk, self).format_dom()

        dev.set("type", self.source_type)
        dev.set("device", self.source_device)
        if (self.driver_name is not None or
                self.driver_format is not None or
                self.driver_cache is not None):
            drv = etree.Element("driver")
            if self.driver_name is not None:
                drv.set("name", self.driver_name)
            if self.driver_format is not None:
                drv.set("type", self.driver_format)
            if self.driver_cache is not None:
                drv.set("cache", self.driver_cache)
            # Added: emit the io attribute in the generated <driver> element
            if self.io is not None:
                drv.set("io", self.io)
            dev.append(drv)
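
After restarting nova-compute, newly spawned guests pick up the new attribute. As a self-contained illustration of what the patched logic produces (outside of Nova; the attribute values are made up), the same element-building pattern with lxml gives:

from lxml import etree

# Stand-alone sketch mimicking the patched format_dom() above;
# the attribute values below are illustrative, not read from Nova.
dev = etree.Element("disk")
dev.set("type", "block")
dev.set("device", "disk")

drv = etree.Element("driver")
drv.set("name", "qemu")
drv.set("type", "raw")
drv.set("cache", "none")
drv.set("io", "native")
dev.append(drv)

print(etree.tostring(dev, pretty_print=True).decode())
# <disk type="block" device="disk">
#   <driver name="qemu" type="raw" cache="none" io="native"/>
# </disk>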

As a rule of thumb, each option has its recommended use case (a per-disk variant of the quick hack is sketched after this list):

  • io=native for block device based VMs.
  • io=threads for file-based VMs.
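
A sketch of that per-disk variant, reusing the attribute names from the snippet above (this is not the submitted patch):

            # Inside format_dom(), instead of hard-coding self.io = "native":
            # block-backed disks (LVM, attached Cinder volumes) get "native",
            # file-backed images fall back to "threads". Sketch only.
            io_mode = "native" if self.source_type == "block" else "threads"
            drv.set("io", io_mode)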

Important note from Red Hat:

Direct Asynchronous IO (AIO) that is not issued on filesystem block boundaries, and falls into a hole in a sparse file on ext4 or xfs filesystems, may corrupt file data if multiple I/O operations modify the same filesystem block. Specifically, if qemu-kvm is used with the aio=native IO mode over a sparse device image hosted on the ext4 or xfs filesystem, guest filesystem corruption will occur if partitions are not aligned with the host filesystem block size. Generally, do not use aio=native option along with cache=none for QEMU. This issue can be avoided by using one of the following techniques:

  • Align AIOs on filesystem block boundaries, or do not write to sparse files using AIO on xfs or ext4 filesystems.
  • KVM: Use a non-sparse system image file or allocate the space by zeroing out the entire file.
  • KVM: Create the image using an ext3 host filesystem instead of ext4.
  • KVM: Invoke qemu-kvm with aio=threads (this is the default).
  • KVM: Align all partitions within the guest image to the host’s filesystem block boundary (default 4k).
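
Note that in the setup described here the guest disks are LVM logical volumes, i.e. raw block devices rather than sparse files on ext4/xfs, so this particular corruption scenario does not apply. For file-backed guests, one way to follow the "non-sparse image" advice is simply to allocate the whole file up front (path and size are illustrative):

# Fully allocate (zero out) the image so AIO never hits a hole in a sparse file
dd if=/dev/zero of=/var/lib/libvirt/images/guest.img bs=1M count=20480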


We submitted a patch upstream: https://review.openstack.org/#/c/117442/
