31 October, 2022
Explaining SymplyPERIFERY: Object Storage
Welcome to Part 2 of our short series explaining SymplyPERIFERY.
Object storage. It’s just… the best. Welcome to the second video in our series explaining SymplyPERIFERY - our S3 native scalable storage system.
Within SymplyPERIFERY our unit of content is the object. But what is object storage? In traditional file systems data is stored in a hierarchical structure. You put a file in a folder inside another folder inside another folder and your applications have to go through that structure to find what they need. With object storage the structure is flat. Think of it as a table. Onto our table we place objects - these can be videos, photos, animations, music, documents, Premiere Pro projects, Maya files - basically anything you work with on your computer. These objects are kept with a set of metadata - like a Post-it note that describes the object. Along with that, each object is assigned a unique identifier so you know its name. So we have a table with a bunch of things on it with labels. Great. At the end of the table is a journal - this is a small file that contains a list of the unique IDs and corresponding metadata. Using the journal you can look for some metadata (for example, “man jumping off a boat”) and find the IDs of all the objects that carry it. Then you just grab those numbered objects from the table. This is what makes object storage so powerful. Descriptive items in a flat structure so they’re not going to get lost. Slow for a few pieces of data, incredibly fast at scale.
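To make the table-and-journal picture concrete, here is a minimal sketch in Python. It is a toy in-memory model, not SymplyPERIFERY code: the class name FlatObjectStore and its put, find and get methods are invented for illustration, and the metadata keys are made up.

```python
import uuid


class FlatObjectStore:
    """Toy model of a flat object store: a 'table' of objects plus a 'journal'."""

    def __init__(self):
        self.table = {}    # object ID -> raw bytes (the things on the table)
        self.journal = {}  # object ID -> metadata dict (the list at the end of the table)

    def put(self, data, **metadata):
        """Place an object on the table and record its label in the journal."""
        object_id = uuid.uuid4().hex  # unique identifier assigned at write time
        self.table[object_id] = data
        self.journal[object_id] = dict(metadata)
        return object_id

    def find(self, **criteria):
        """Return the IDs of all objects whose metadata matches every criterion."""
        return [
            object_id
            for object_id, meta in self.journal.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]

    def get(self, object_id):
        """Grab an object straight off the table by its ID; no folders to walk."""
        return self.table[object_id]


store = FlatObjectStore()
clip = store.put(b"...video bytes...", description="man jumping off a boat", kind="video")
store.put(b"...photo bytes...", description="sunset", kind="photo")

# Search the journal by metadata, then fetch the matching objects by ID.
matches = store.find(description="man jumping off a boat")
```

There is no path and no directory tree anywhere in this model: every lookup is either "search the journal" or "fetch by ID".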
In SymplyPERIFERY our table is a disk. Many disks make up a node and many nodes form a cluster to create one, massive, flat table on which we put our objects.
There are no traditional file systems inside of SymplyPERIFERY. File systems just aren’t robust or fast enough to scale into the hundreds of billions of objects. Physically, an object stored in SymplyPERIFERY is a contiguous sequence of bytes on a raw disk, not dissimilar to how data is written to LTO tape, i.e. in a linear fashion.
Data streams written to the disks - either segments of erasure coded data or replicas - are stored in similar ways. Each consists of two parts, the data and the metadata, which are written strictly once and always remain physically encapsulated together. The metadata part contains two classes of headers: those that drive SymplyPERIFERY behaviour and custom ones that are inserted by an application. Objects are written to the disk on a single “writing front” of free space. Deleted objects are simply marked as such and space is reclaimed asynchronously afterwards by the health checking process.
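The write-once, single writing front idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the real on-disk format: the AppendOnlyDisk class, the 4-byte length prefix in front of the metadata, and the reclaimable_bytes helper are all hypothetical, and the asynchronous reclaim step itself is left out.

```python
class AppendOnlyDisk:
    """Toy raw disk: records are appended at one writing front and never rewritten."""

    def __init__(self, capacity):
        self.raw = bytearray(capacity)  # stands in for the raw disk surface
        self.write_front = 0            # next free byte; only ever moves forward
        self.records = {}               # object ID -> offset, length, deleted flag

    def write(self, object_id, metadata, data):
        """Store metadata and data together as one contiguous record."""
        # Assumed layout: 4-byte metadata length, then metadata, then data.
        record = len(metadata).to_bytes(4, "big") + metadata + data
        end = self.write_front + len(record)
        if end > len(self.raw):
            raise IOError("disk full")
        offset = self.write_front
        self.raw[offset:end] = record
        self.write_front = end
        self.records[object_id] = {"offset": offset, "length": len(record), "deleted": False}
        return offset

    def read(self, object_id):
        """Read one contiguous record and split it back into metadata and data."""
        rec = self.records[object_id]
        if rec["deleted"]:
            raise KeyError(object_id)
        blob = bytes(self.raw[rec["offset"]:rec["offset"] + rec["length"]])
        meta_len = int.from_bytes(blob[:4], "big")
        return blob[4:4 + meta_len], blob[4 + meta_len:]

    def delete(self, object_id):
        """Deleting only marks the record; the bytes stay where they are."""
        self.records[object_id]["deleted"] = True

    def reclaimable_bytes(self):
        """Space a background process could reclaim later."""
        return sum(r["length"] for r in self.records.values() if r["deleted"])


disk = AppendOnlyDisk(capacity=1024)
first = disk.write("a", b'{"kind":"video"}', b"AAAA")
second = disk.write("b", b'{"kind":"photo"}', b"BB")
disk.delete("a")
```

Because metadata and data travel in the same record, a single sequential read returns both, and a delete never has to touch the data on disk.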
When objects are written, a record is also added to a small journal in the front section of the disk, which makes the disk totally self describing. At boot-up time, that journal serves to repopulate an extremely fast RAM index that holds pointers to where a given object is located. Using our table example we have a big directory at the front and can look for any object in it. That big directory is made up of the journals kept on all the individual tables. This explains one of the most surprising and appealing characteristics of the SymplyPERIFERY architecture: the capability of knowing, within a couple of milliseconds, on which node, disk, and sector a given object is located among hundreds of billions of others in the cluster.
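Here is a rough sketch of how per-disk journals could be replayed at boot to rebuild an in-memory index. The journal entry shape (object ID, sector, deleted flag), the Location type and the rebuild_index function are assumptions made for this example, and it keeps a single location per object ID rather than tracking replicas or erasure coded segments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    node: str
    disk: str
    sector: int


def rebuild_index(journals):
    """Replay every disk's journal into one RAM index: object ID -> Location.

    journals maps (node, disk) to an ordered list of
    (object_id, sector, deleted) entries. This shape is hypothetical.
    """
    index = {}
    for (node, disk), entries in journals.items():
        for object_id, sector, deleted in entries:
            if deleted:
                index.pop(object_id, None)  # a later delete entry removes the pointer
            else:
                index[object_id] = Location(node, disk, sector)
    return index


journals = {
    ("node-1", "disk-a"): [
        ("obj-1", 0, False),
        ("obj-3", 2048, False),
        ("obj-3", 2048, True),   # obj-3 was written, then marked deleted
    ],
    ("node-1", "disk-b"): [("obj-2", 4096, False)],
}

index = rebuild_index(journals)
where = index["obj-2"]  # one dictionary lookup gives node, disk and sector
```

The lookup cost in this model is that of a hash table lookup, so it does not grow with the number of disks or objects, which is the property the journal-plus-index design is after.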
Armed with this knowledge SymplyPERIFERY can position the right disk drive arm at the front of the object and read it from the disk, often in one single I/O operation. Executing a comparable operation on a traditional file system typically takes tens of I/O operations. This way, importantly, the response time to access an object is entirely independent of the number of nodes or the object count in the cluster, which is truly exceptional.
This unique approach combined with our parallel architecture means that performance from only a handful of nodes can scale rapidly into gigabytes per second. But what do we mean by parallel architecture?
Think of it like straws in a glass: the more straws you add the more data can flow into and out of the glass. In a traditional file system there’s only one straw. The client side has to break up the files and send them in lots of smaller parts to the storage. SymplyPERIFERY, on the other hand, uses many straws to suck up all the parts at the same time and stitch them back together. In this manner hundreds of terabytes per day can easily be uploaded to the cluster.
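The straws analogy maps onto a familiar pattern: split a payload into parts, send the parts concurrently, and reassemble them in order. The sketch below shows that pattern with Python threads and an in-memory stand-in for the cluster. InMemoryCluster, parallel_upload and the straws parameter are hypothetical names for this example; nothing here is the SymplyPERIFERY or S3 API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryCluster:
    """Stand-in for the storage side: accepts numbered parts in any order."""

    def __init__(self):
        self._parts = {}
        self._lock = threading.Lock()

    def put_part(self, number, chunk):
        with self._lock:
            self._parts[number] = chunk

    def assemble(self):
        """Stitch the parts back together in part-number order."""
        with self._lock:
            return b"".join(self._parts[n] for n in sorted(self._parts))


def parallel_upload(cluster, payload, part_size, straws):
    """Send the payload as fixed-size parts over several workers at once."""
    parts = [payload[i:i + part_size] for i in range(0, len(payload), part_size)]
    with ThreadPoolExecutor(max_workers=straws) as pool:
        futures = [
            pool.submit(cluster.put_part, number, chunk)
            for number, chunk in enumerate(parts)
        ]
        for future in futures:
            future.result()  # re-raise any error from a worker
    return len(parts)


cluster = InMemoryCluster()
payload = bytes(range(256)) * 40  # 10,240 bytes of sample data
part_count = parallel_upload(cluster, payload, part_size=1024, straws=4)
restored = cluster.assemble()
```

Parts may arrive in any order; the part number is what lets the receiving side put them back in sequence.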
So SymplyPERIFERY is an object store with a parallel, linearly scalable architecture. Add more nodes and you get more storage and more speed. This is a feature that’s pretty unique to our solution.
We’re nearly done with our little mini series here - just one more video to go. Next time we’ll talk about multi tenancy and how SymplyPERIFERY can operate within your storage management conventions.