A gentle introduction to the erasure coding
The erasure coding is currently a very hot topic for distributed storage systems. It has been part of the Ceph roadmap for almost a year and Swift guys recently brought the discussion to the table. Both of them have planned to implement the erase code functionality so I thought it might be interesting to give a high level overview about the erasure coding principles. Before we start, I’d like to point out that I don’t take any credit for this article. I just read the wonderful white paper “Erasure Coding vs. Replication: A Quantitative Comparison” written by Hakim Weatherspoon and John D. Kubiatowicz from Computer Science Division University of California, Berkeley. Many thanks to them. While ready the paper, I found their introduction to the erasure code easy to understand for a novice like me :).
An erasure code provides redundancy without the overhead of strict repli-cation.
Erasure codes divide an object into
m fragments and recode them into
n fragments, where
r = m/n < 1 the rate of encoding.
r code increases the storage cost by a factor of
The key property of erasure codes is that the original object can be reconstructed from any
For example, using an
r = 1/4 encoding on a block divides the block into
m = 16 fragments and encodes the original
m fragments into
n = 64 fragments; in-creasing the storage cost by a factor of four.
Erasure codes are a superset of replicated and RAID systems.
For example, a system that creates four replicas for each block can be described by an (
m = 1,
n = 4) erasure code.
RAID level 1, 4, and 5 can be described by an (
m = 1,
n = 2, (
m = 4,
n = 5) and (m = 4
,n = 5) erasure code, respectfully.
Erasure coding in a malicious environment requires the precise identification of failed or corrupted fragments.
Without the ability to identify try to reconstruct the block; that is,
(n, m) combinations.
As a result, the system corrupted fragments, there is potentially a factorial combination of fragments to needs to detect when a fragment has been corrupted and discard it.
A secure ver- ification hashing scheme can serve the dual purpose of identifying and verifying each fragment.
It is necessarily the case that any
m correctly verified fragments can be used to reconstruct the block.
Such a scheme is likely to increase the bandwidth and storage requirements, but can be shown to still be many times less than replication.