31 October, 2022
Explaining SymplyPERIFERY: How Nodes Work
Welcome to Part 3 of our short series explaining SymplyPERIFERY.
Welcome to the third video in our short series explaining SymplyPERIFERY - our S3 native in-facility answer to scalable storage. Over the next few minutes we’re going to talk about what makes SymplyPERIFERY totally secure, and how it maximises usable capacity. Let’s get started.
SymplyPERIFERY was designed from day one to include only the fundamental components needed to perform its tasks. Those tasks are things like transferring data, finding asset locations, and moving files. To make the storage nodes faster we load software to perform those tasks into the RAM of each node - not onto its disks. This software image is loaded at boot time onto every node in your storage cluster through a process of network booting. This is where an instance of the software running the nodes is taken from a central location and loaded when you turn everything on. It also means that you’ll never have to patch or upgrade the software thanks to that booting process. So once you’ve installed a new node, that’s it. You’re ready to use its capacity.
After booting up, the nodes are locked down so they only understand the system’s RESTful HTTP subset commands and SNMP for management. There are no accounts on a storage node, so no SSH or other remote logins that could be used to compromise data security.
Because we load node software into RAM it leaves the entire raw disk space available for storage. Let’s pause here and take a step back to look at what makes up an individual node. The term “node” is what we use to describe a single rack-mounted server chassis full of drives, memory, and networking ports. The disks make up your storage capacity, the memory is used to run software, and the ports - naturally - are used to connect nodes to each other and to your wider network.
When you combine multiple nodes together you get a cluster. If we were to use disk space to store software for running the nodes (which are essentially computers) we would lose overall space for your data. That’s not great. Which is why running software from RAM is so important. And it’s efficient, too.
It also means that there’s no file system to corrupt, limit performance, or compromise storage efficiency. This may seem like an obvious thing to do, but many storage manufacturers use file systems in their object solutions, which wastes space and adds complexity, including databases and the need for expensive flash storage. With SymplyPERIFERY, holding hundreds of millions of small objects on a single disk is no problem at all because the information about them is held in RAM. Let’s go back to our table example from the last video: a large, flat surface full of labelled items. Your storage software knows what objects are on which part of the table and where to go looking for them. When you include a file system it adds boxes to the table for us to put objects in - and that takes up space. With SymplyPERIFERY, your software just picks up the object you’re looking for straight from the table.
Right. Information in a storage cluster can be located in microseconds using a system called Zero IOPS, a process by which a distributed RAM-based index is populated at boot time from journals held on the disks. That sounds complicated so let’s break it down.
Objects are stored with their metadata. This means that our disks are totally self-describing, rather like an LTFS tape, and disks can actually be moved between clusters if required with all the data remaining intact. Each disk also carries a journal: a partition that holds the data about the objects on that disk. When you turn on a node, the software loaded into its RAM reads the journal on each of its disks and uses that data to populate the RAM index. The node then shares this with the rest of the nodes in your cluster to create a full index of what objects are stored where. This means you can ask your storage system for an object and it will immediately know where it is - no asking every file system on every node to have a look around.
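To make that a little more concrete, here is a minimal Python sketch of the idea: read each disk's journal, build a per-node index in memory, then merge the per-node indexes into one cluster-wide map. It is an illustration only, not the product's actual code. The function names, the journal layout (a plain list of object IDs per disk), and the node and disk names are all assumptions made for the example.

```python
from collections import defaultdict

def build_node_index(node_name, disk_journals):
    """Read every disk journal on one node and return {object_id: (node, disk)}.

    Assumption: a journal is just a list of object IDs, and an object
    appears at most once per node.
    """
    index = {}
    for disk_name, journal in disk_journals.items():
        for object_id in journal:
            index[object_id] = (node_name, disk_name)
    return index

def merge_cluster_index(node_indexes):
    """Combine per-node indexes into one map of object_id -> list of locations."""
    cluster_index = defaultdict(list)
    for node_index in node_indexes:
        for object_id, location in node_index.items():
            cluster_index[object_id].append(location)
    return dict(cluster_index)

# Hypothetical two-node cluster; obj-2 has a copy on each node.
node_a = build_node_index("node-a", {"disk0": ["obj-1", "obj-2"], "disk1": ["obj-3"]})
node_b = build_node_index("node-b", {"disk0": ["obj-2"]})
cluster_index = merge_cluster_index([node_a, node_b])

# A lookup is a single in-memory dictionary access, with no disk I/O.
print(cluster_index["obj-2"])  # [('node-a', 'disk0'), ('node-b', 'disk0')]
```

The point of the sketch is the last line: once the index is in RAM, finding an object costs no disk operations at all.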
Nodes themselves are interconnected with high-performance 25 Gigabit Ethernet to form the cluster. No configuration is required as the nodes feature auto-discovery, auto-provisioning, auto-loading and capacity self-balancing that’s governed by a patented smart algorithm. So scaling large clusters is a breeze. Nodes can be added to a cluster in a matter of minutes, with the system automatically bringing the extra space into use.
Now we should talk a bit about the form factor of those nodes. SymplyPERIFERY is available in 1U, 2U, and 5U configurations. All of which can be mixed and matched to provide the best storage strategy for your data needs. The cluster you build can support a mixture of these nodes as required and can grow organically, re-balancing data on the fly across all the nodes in the cluster.
And since all the nodes are running the same code and fulfilling the same tasks, any node can be asked to perform read, write, delete, or info operations on any object in the cluster. Back to our tables - lots of tables storing lots of objects and each knowing where everything is. Every object has a universally unique identifier (a UUID). So you can ask Node A for Object X and it will find out for you that Node B is holding the most accessible instance of X, redirect your request to B and get out of the way. There’s no proxy stuck in the middle adding latency and complexity.
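The redirect described above can be sketched in a few lines of Python. This is a toy in-memory model, not the real protocol: the `Node` class, the `client_read` helper, and the status strings are all hypothetical names chosen for the example, and a real cluster would do this over HTTP rather than with function calls.

```python
import uuid

class Node:
    def __init__(self, name, shared_index):
        self.name = name
        self.shared_index = shared_index  # cluster-wide map: object UUID -> holder node name
        self.objects = {}                 # objects held locally: object UUID -> bytes

    def read(self, object_id):
        """Serve the object if it is held locally, otherwise point at the holder."""
        if object_id in self.objects:
            return ("ok", self.objects[object_id])
        holder = self.shared_index.get(object_id)
        if holder is None:
            return ("not_found", None)
        return ("redirect", holder)

def client_read(nodes, first_node, object_id):
    """Ask any node; follow one redirect so the data comes straight from the holder."""
    status, payload = nodes[first_node].read(object_id)
    if status == "redirect":
        status, payload = nodes[payload].read(object_id)
    return status, payload

# Hypothetical cluster of two nodes sharing one index.
index = {}
nodes = {"A": Node("A", index), "B": Node("B", index)}

object_x = str(uuid.uuid4())          # every object gets a UUID
nodes["B"].objects[object_x] = b"payload"
index[object_x] = "B"

print(nodes["A"].read(object_x))          # ('redirect', 'B')
print(client_read(nodes, "A", object_x))  # ('ok', b'payload')
```

Notice that Node A never touches the object's bytes. It only answers "go to B", which is what keeps a proxy out of the data path.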
Write operations happen in a similar way: talk to node A and it finds a convenient node on which to store your object based on available resources.
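As a rough illustration of picking a node for a write, the Python sketch below simply chooses the node with the most free space that can fit the object. That rule is an assumption made for the example: the actual placement logic is governed by the patented algorithm mentioned earlier and is not described here. The function name and the capacity figures are hypothetical.

```python
def choose_write_target(free_bytes_by_node, object_size):
    """Pick the node with the most free space that can still fit the object.

    Assumption: free capacity is the only resource considered.
    """
    candidates = {
        node: free for node, free in free_bytes_by_node.items() if free >= object_size
    }
    if not candidates:
        raise RuntimeError("no node has room for this object")
    return max(candidates, key=candidates.get)

# Hypothetical free capacity per node, in bytes.
free = {"node-a": 500, "node-b": 2_000, "node-c": 1_200}

target = choose_write_target(free, object_size=800)
free[target] -= 800  # account for the space the new object now occupies

print(target)  # node-b
```

Whichever node you talk to can run this kind of decision, because every node sees the same picture of the cluster.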
Essentially, SymplyPERIFERY does the hard work for you. It’ll keep an eye on every single object stored in your cluster and place new objects so as to make the best use of available space. Each node runs its own instance of DataCore’s Perifery software in its RAM rather than taking up space on the valuable disks that store your data. Which is what you buy storage for in the first place. And because there’s no file system, there’s even more storage space available, and that space simply can’t be infected by malware or ransomware: there’s nothing for it to infect or encrypt.
Next time we’ll talk about Elastic Content Protection and why it’s good for object storage.