31 October, 2022
Explaining SymplyPERIFERY: Elastic Content Protection
Welcome to Part 4 of our short series explaining SymplyPERIFERY.
Hello! And welcome. This is the fourth video in our short series explaining the workings of our S3-native, scalable storage solution: SymplyPERIFERY. Let’s have a quick recap: SymplyPERIFERY is a physical appliance that you install in your own facility. It uses DataCore’s Perifery software to run a hyper-efficient storage architecture that’s resilient to ransomware and can be scaled as needed.
In this video we’re going to talk about Elastic Content Protection and how it works to protect your data.
Elastic Content Protection is a feature that combines automated management of data replication and erasure coding, with continuous integrity checks and fast volume recovery. An installation of SymplyPERIFERY comprises as many nodes as needed to meet your required capacity. And since all those nodes work together using distributed algorithms, Elastic Content Protection gets faster and more efficient the larger your cluster grows. Essentially you have more computerised brains looking through your data.
So what’s the difference between data replication and erasure coding? And why support both?
Let’s start with the easy one: Replication. Also known as Copy-Based Protection.
Roll back the clock to the very early days of object storage systems, and replication is the simplest form of protection for data redundancy.
Replication protects data by maintaining two or more copies - replicas - of every object on different nodes in a cluster. For instance, if your replication policy says that three replicas of data must exist, any two drives or nodes can fail at any time without loss of data or even availability. Which is good, but costly. It means that to have three replicas of your data you need three times the storage capacity. So if you want to store a petabyte of content you would need three petabytes of storage! Replication is a secure, high-performance way of protecting data, but it comes at a high capacity cost.
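To put numbers on that trade-off, here is a minimal Python sketch of the replication arithmetic described above. The function names are made up for illustration and are not part of SymplyPERIFERY or any DataCore API.

```python
def replication_raw_capacity(usable_pb: float, replicas: int) -> float:
    """Raw capacity (in PB) needed to keep `replicas` full copies of `usable_pb` of content."""
    if replicas < 1:
        raise ValueError("at least one copy of the data must exist")
    return usable_pb * replicas


def replication_failures_tolerated(replicas: int) -> int:
    """Copies that can be lost while at least one replica still survives."""
    if replicas < 1:
        raise ValueError("at least one copy of the data must exist")
    return replicas - 1


# One petabyte of content under a three-replica policy.
raw_pb = replication_raw_capacity(1.0, 3)
lost_ok = replication_failures_tolerated(3)
print(f"raw capacity: {raw_pb} PB, failures tolerated: {lost_ok}")
```

With three replicas, one petabyte of content needs three petabytes of raw storage and survives the loss of two copies, which matches the example above.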
And this is where Erasure Coding comes in.
You might be familiar with RAID - Redundant Array of Inexpensive Disks. This is block-based parity protection (as in blocks on the disk), where data is distributed across an array of drives so that one or more of them can fail without data loss. Erasure Coding is similar to RAID, but works on a per-object level.
The advantage of Erasure Coding is that it provides enterprise-grade data protection at a lower storage footprint. Using Erasure Coding, SymplyPERIFERY breaks a file into several parts and computes parity segments that relate to the data. The resulting total number of segments uses less capacity and fewer operational resources than creating a replica or multiple replicas of a file. So there’s a capacity saving over replication and, depending on the scheme you choose, the added benefit of a higher level of data protection.
So files are split into segments and then redundant parity segments are calculated based on the content. All the resulting segments are distributed to different disks on different nodes. Should any drives or even nodes fail then the remaining nodes work together to heal the data - providing full protection once again. The cluster can be configured to meet any requirements for uptime, with as much redundancy (or “durability” in object speak) as required.
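The split, parity and heal cycle can be illustrated with a deliberately simplified Python sketch. It uses a single XOR parity segment (a "k+1" scheme), which can only repair one lost segment; schemes with two or more parity segments need more advanced codes, which are not shown here. The function names are hypothetical, and this is a toy model rather than the algorithm SymplyPERIFERY actually runs.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length segments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-length segments (zero-padded) plus one XOR parity segment."""
    seg_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(seg_len * k, b"\0")
    segments = [padded[i * seg_len:(i + 1) * seg_len] for i in range(k)]
    parity = reduce(xor_bytes, segments)
    return segments + [parity]


def heal(segments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing segment (marked None) by XOR-ing the survivors."""
    missing = [i for i, seg in enumerate(segments) if seg is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost segment")
    healed = list(segments)
    if missing:
        survivors = [seg for seg in segments if seg is not None]
        healed[missing[0]] = reduce(xor_bytes, survivors)
    return healed


def decode(segments: List[bytes], original_len: int) -> bytes:
    """Join the data segments (all but the last, the parity) and strip the padding."""
    return b"".join(segments[:-1])[:original_len]


obj = b"hello object storage"
stored = encode(obj, 4)   # 4 data segments + 1 parity segment
stored[2] = None          # simulate a failed drive or node
restored = decode(heal(stored), len(obj))
print(restored == obj)
```

Because the XOR of all the segments, parity included, is zero, any one missing segment equals the XOR of the others. That is what lets the surviving segments heal the lost one.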
With SymplyPERIFERY there’s a wide choice of Erasure Coding schemes for any object in the cluster. These include 4+2, 6+3, 5+2, and 7+3 - pretty much any scheme you want. But those are just numbers. What do they mean?
Let’s take 5+2 or a “5 out of 7” scheme. Your object is broken into five data segments with two calculated parity segments. This data is then distributed across seven nodes. If drives fail or data gets damaged, any five segments can be used to rebuild the original object. So two simultaneous disk or node failures can be survived without data loss.
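As a rough sketch of what those numbers mean in practice, the Python below works out the properties of a generic "k+m" scheme: how many segments are stored, how many are needed to rebuild, and how much raw capacity is used per unit of content. The helper name `ec_profile` is an illustrative assumption, not a product API.

```python
from fractions import Fraction


def ec_profile(data_segments: int, parity_segments: int) -> dict:
    """Summarise a k+m erasure coding scheme."""
    if data_segments < 1 or parity_segments < 0:
        raise ValueError("need at least one data segment and no negative parity count")
    total = data_segments + parity_segments
    return {
        "total_segments": total,
        "segments_needed_to_rebuild": data_segments,
        "failures_tolerated": parity_segments,
        # Raw capacity consumed per unit of content stored.
        "raw_per_usable": Fraction(total, data_segments),
    }


# The schemes mentioned in the text above.
for k, m in [(4, 2), (6, 3), (5, 2), (7, 3)]:
    p = ec_profile(k, m)
    print(f"{k}+{m}: {p['total_segments']} segments, "
          f"survives {p['failures_tolerated']} failures, "
          f"{float(p['raw_per_usable']):.2f}x raw capacity")
```

For 5+2 the ratio is 7/5, so a petabyte of content needs 1.4 petabytes of raw storage and still survives two failures, compared with three petabytes for three replicas.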
So having made an excellent case for how great Erasure Coding is, why do we also support replicas as part of our Elastic Content Protection method?
Erasure Coding is less efficient at working with small files. Typically, for anything under 1 MB in size, the CPU overhead for calculating parity information is high, and so is the storage capacity required to keep object segments. So for small files it’s more efficient to create multiple copies: Replication.
But we can also use replication to create an offsite disaster recovery cluster or even create multi-way replication for collaboration and data locality. With SymplyPERIFERY data is replicated on a domain-by-domain basis; you can choose what data to replicate and to where - you can even replicate to public cloud services like SymplyNEBULA.
So Elastic Content Protection gives you the flexibility to use Replication and Erasure Coding side by side to protect data. SymplyPERIFERY can set a threshold for object size, below which small objects will automatically be replicated rather than erasure coded. This maintains a suitable degree of durability and space utilisation regardless of object size. Replication and Erasure Coding policies in SymplyPERIFERY can be freely selected cluster-wide at initialization, can be set on a per-bucket or per-object basis, and can automatically change over time.
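The size threshold idea can be sketched in a few lines of Python. Everything here is hypothetical: the `ProtectionPolicy` class, its field names and the default values (a 1 MB threshold, three replicas, a 5+2 scheme) are assumptions chosen to mirror the examples in this video, not SymplyPERIFERY configuration settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionPolicy:
    """Hypothetical policy: replicate small objects, erasure code the rest."""
    threshold_bytes: int = 1_000_000  # assumed cut-off, roughly 1 MB
    replicas: int = 3                 # copies kept for small objects
    ec_data: int = 5                  # data segments for large objects
    ec_parity: int = 2                # parity segments for large objects


def choose_protection(size_bytes: int, policy: ProtectionPolicy = ProtectionPolicy()) -> str:
    """Return the protection method an object of this size would get under the policy."""
    if size_bytes < 0:
        raise ValueError("object size cannot be negative")
    if size_bytes < policy.threshold_bytes:
        return f"replicate x{policy.replicas}"
    return f"erasure-code {policy.ec_data}+{policy.ec_parity}"


for size in (200_000, 50_000_000):
    print(size, "->", choose_protection(size))
```

Keeping the rule in one small policy object means a different threshold or scheme can be passed in for a particular bucket or object, in the spirit of the per-bucket and per-object policies mentioned above.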
In summary, Elastic Content Protection is the name we give to the combined use of Erasure Coding and Replication. You can break apart large files and store them across nodes for redundancy, and at the same time make copies of smaller files for the same reason. It’s all about using the right tool for the right job.
Next up we’ll talk about how SymplyPERIFERY meets key compliance regulations.