OpenStack High Availability 1/??

First article of a long series about building a highly available OpenStack platform. This one is more of a state of the art of OpenStack HA.

The main idea is to build a clustered cloud. A clustered cloud? This can be really useful, particularly if you don’t have a lot of servers at your disposal, and it helps to keep the KISS principle in mind. A clustered cloud is simply an infrastructure based on a cloud operating system (like OpenStack) that relies on cluster management software such as Pacemaker and on a replication layer such as DRBD, which mirrors block devices between nodes over the network. Pacemaker and Corosync are two amazing solutions for building clusters: Pacemaker is the cluster resource manager, while Corosync provides the communication layer.
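Cluster size matters with this stack because Corosync and Pacemaker reason in terms of quorum, a strict majority of votes. A two-node cluster, the typical "not many servers" case, loses quorum as soon as one node dies, which is why such setups usually tell Pacemaker to keep running resources anyway (`no-quorum-policy=ignore`). The following is a minimal sketch of the majority rule, assuming one vote per node (the default); `has_quorum` and `survives_single_failure` are illustrative helpers, not part of any cluster API.

```python
"""Sketch of majority-based quorum as used by cluster stacks like Corosync.

Assumption: every node carries exactly one vote.
"""


def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """A partition is quorate only if it holds a strict majority of votes."""
    return votes_present > expected_votes // 2


def survives_single_failure(nodes: int) -> bool:
    """True if the remaining nodes still have quorum after losing one node."""
    return has_quorum(nodes - 1, nodes)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, "nodes -> quorate after one failure:", survives_single_failure(n))
```

Under these assumptions, the two-node case prints `False` while three, four and five nodes print `True`.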

#I. Nova components

Bringing high availability to the nova components is not an easy task, especially because some of them are really critical.

nova-api
nova-scheduler
nova-consoleauth
nova-cert

Since there is currently no resource agent available for these services, I thought about starting with the LSB agents (the services’ init scripts) and maybe writing dedicated resource agents later.
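Managing an init script through the LSB class only works if the script reports its state correctly: Pacemaker relies on `status` returning 0 when the service runs and 3 when it is stopped. The sketch below maps the LSB status exit codes and checks that contract. The helper names are hypothetical, and the script path you would pass to `probe` (for example `/etc/init.d/nova-api`) is an assumption that depends on your distribution’s packaging.

```python
import subprocess

# Exit codes of an LSB init script's "status" action (LSB Core specification).
LSB_STATUS = {
    0: "running",
    1: "dead, pid file exists",
    2: "dead, lock file exists",
    3: "not running",
    4: "unknown",
}


def interpret_status(code: int) -> str:
    """Translate an LSB status exit code into a readable state."""
    return LSB_STATUS.get(code, "reserved/unknown code")


def is_cluster_friendly(running_code: int, stopped_code: int) -> bool:
    """Pacemaker expects exactly 0 while the service runs and 3 once it is
    stopped; an init script returning anything else is not safe to cluster."""
    return running_code == 0 and stopped_code == 3


def probe(script: str) -> str:
    """Run '<script> status' and interpret its exit code.

    The script path (e.g. /etc/init.d/nova-api) is an assumption.
    """
    result = subprocess.run([script, "status"], capture_output=True)
    return interpret_status(result.returncode)
```

A quick manual test is to call `probe` once with the service started and once with it stopped, then feed both codes to `is_cluster_friendly`.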

There are three remaining components:

nova-compute: for now, the main idea is to set up at least two compute nodes and use live migration. This is not high availability, but since the cloud is designed for failure, simply trust this mechanism.
nova-network: ideally hosted on the same node as the nova-compute service.
nova-volume: see the table below

nova-volume

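| Back-end | Object storage | Block storage | POSIX filesystem | HA | Scale-in | Scale-out | OpenStack driver | Production ready |
|---|---|---|---|---|---|---|---|---|
| Local LVM | | | | | | | | |
| Nexenta | | | | | | | | |
| NFS | | | | | | | | |
| SAN | | | | | | | | |
| Sheepdog | | | | | | | | |
| Swift | | | | | | | | |
| Ceph | | | | | | | | |
| GlusterFS | | | | | | | | |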

#II. Identity service: Keystone

The company hastexo provides a Pacemaker-compatible resource agent for Keystone.
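Unlike an LSB script, an OCF resource agent reports its state through the return codes defined by the OCF resource agent API, where 0 means running and 7 means cleanly stopped. The following is a minimal sketch of what a monitor action for an API service could check. It is not hastexo’s agent, and the two-stage logic (process alive, then API answering) is an assumption about what such an agent typically does. `process_running` and `api_responds` are hypothetical callables you would supply.

```python
import os
from typing import Callable

# Return codes from the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def pid_alive(pid_file: str) -> bool:
    """True if pid_file names a live process (signal 0 only checks existence)."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False
    try:
        os.kill(pid, 0)
    except PermissionError:
        return True  # the process exists but belongs to another user
    except OSError:
        return False
    return True


def monitor(process_running: Callable[[], bool],
            api_responds: Callable[[], bool]) -> int:
    """Hypothetical two-stage check: is the daemon up, and does its API answer?"""
    if not process_running():
        return OCF_NOT_RUNNING
    if not api_responds():
        return OCF_ERR_GENERIC  # running but unhealthy: Pacemaker applies its recovery policy
    return OCF_SUCCESS
```

For example, `monitor(lambda: pid_alive("/var/run/keystone.pid"), check_token)` would combine both checks, where the pid file path and `check_token` are placeholders for your own setup.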

#III. Dashboard

The Horizon dashboard is based on the Django framework and is usually served by Apache. Pacemaker provides a resource agent for Apache.

#IV. Glance

My first recommendation here is to set up a two-node active/passive Pacemaker cluster with the available resource agents. Those RAs are provided by hastexo, many thanks.

I haven’t tried them yet, but I will soon!
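In an active/passive setup, the usual approach is to put a floating IP and the services into one Pacemaker group. Group members are kept on the same node, started in the order they are listed, and stopped in reverse order. The sketch below models a planned switchover of such a group; the resource ids and their order are hypothetical. On a hard node failure, the dead node is fenced rather than cleanly stopped, so only the start half of the plan applies.

```python
from typing import List, Tuple

# Hypothetical resource ids: floating IP first, then the Glance daemons.
GLANCE_GROUP = ["p_vip_glance", "p_glance_registry", "p_glance_api"]


def failover_plan(group: List[str], active: str,
                  passive: str) -> List[Tuple[str, str, str]]:
    """Model a group switchover: stop members in reverse order on the
    active node, then start them in declared order on the passive node."""
    plan = [("stop", rsc, active) for rsc in reversed(group)]
    plan += [("start", rsc, passive) for rsc in group]
    return plan


if __name__ == "__main__":
    for action, rsc, node in failover_plan(GLANCE_GROUP, "node1", "node2"):
        print(f"{action:5} {rsc:18} on {node}")
```

Listing the IP first means clients can only reach Glance once the address has moved, and the API is stopped before anything it depends on.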

#V. Queues

RabbitMQ offers native, built-in active/active clustering that is really easy to set up. For more information, take a look at the RabbitMQ article. I will release an article about RabbitMQ HA soon; I have already tested it on bare metal.
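A broker cluster only helps if clients can reach another node when the one they use disappears. The following is a minimal sketch of client-side failover across cluster nodes. `connect_any` is a hypothetical helper, and `connect` stands for whatever AMQP client call you use. The sketch assumes that `connect` raises `ConnectionError` (or a subclass) for an unreachable node.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def connect_any(hosts: Sequence[str], connect: Callable[[str], T]) -> T:
    """Try each RabbitMQ node in turn; return the first working connection.

    Assumption: `connect` raises ConnectionError when a node is down.
    """
    errors: List[str] = []
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as exc:
            errors.append(f"{host}: {exc}")  # remember why this node failed
    raise ConnectionError("no RabbitMQ node reachable: " + "; ".join(errors))
```

Trying the nodes in a fixed order keeps the example simple. Shuffling the list per client would spread connections more evenly across the cluster.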

#VI. Databases

I’m a pretty big fan of the Galera replicator. It’s also supported by Percona, and I’m using it for most of my setups. I think Galera is currently the best master-master replication solution. Check my previous article about it.
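With master-master replication, a load balancer or client should only send queries to nodes that are actually part of the healthy cluster. Galera exposes its state through `wsrep_%` status variables. The sketch below uses three of them to decide whether a node is usable. The function name is hypothetical, and treating a donor node as unusable is a conservative choice that some setups relax.

```python
from typing import Mapping


def galera_node_usable(status: Mapping[str, str]) -> bool:
    """Decide whether a Galera node should receive traffic.

    `status` holds the output of SHOW STATUS LIKE 'wsrep_%' as name -> value.
    """
    return (
        status.get("wsrep_ready") == "ON"                        # node accepts queries
        and status.get("wsrep_cluster_status") == "Primary"      # member of the quorate component
        and status.get("wsrep_local_state_comment") == "Synced"  # not joining or acting as donor
    )
```

A node in the middle of a state transfer reports a state other than `Synced`, so this check takes it out of rotation until it catches up.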

I think it’s a good way to start!

Sébastien Han
