Nuts and Bolts: Storage Protocols
Lead image: Photo by Johannes Plenio on Unsplash

Storage protocols for block, file, and object storage

Evolutionary Theory

The future of flexible, performant, and highly available storage. By Norbert Deuschle

Current developments such as computational storage and storage class memory as future high-performance storage are receiving a great deal of attention, but you still need to understand whether, and to what extent, block and file storage, storage area networks (SANs), network-attached storage (NAS), object storage, or global clustered filesystems continue to provide the basis for the development of new technologies – especially to assess the possible implications for your own IT environment correctly.

In the storage sector in particular, experts and manufacturers bandy about abbreviations and technical terms, and the pace of development seems undiminished. Above all, the speed at which innovations enter the market is surprising. On the other hand, the storage protocols that guarantee data access at the block or file level are decades old, yet they still provide the technological basis for making sensible use of data storage at all. At the same time, a number of new questions are emerging: Will there be a radical break in the transport layer at some point? Will we have to deal with more and more technology options that exist in parallel? To find an answer, an assessment of further developments in storage protocols is helpful.

Block Storage as SAN Foundation

Classic block-level storage protocols are used for data storage on storage networks – typically Fibre Channel (FC) storage area network (SAN) – or cloud-based storage environments – typically Internet SCSI (iSCSI). Nothing works in the data center without block storage, but how does block storage itself work?

Storage data is divided into blocks that are assigned specific identifiers as separate units. The storage network then deposits the data blocks where it is most efficient for the respective application. When users subsequently request their data from a block storage system, the underlying storage system reassembles the blocks and presents them to the user and the application.
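
A minimal Python sketch, purely conceptual and not tied to any particular storage product, illustrates the principle of splitting data into addressable blocks and reassembling them on request:

# Conceptual sketch of block storage: split data into fixed-size
# blocks, address each block by an ID, and reassemble on request.
BLOCK_SIZE = 4096  # bytes; typical block sizes are 512 B or 4 KiB

def split_into_blocks(data: bytes) -> dict[int, bytes]:
    """Return a mapping of block ID -> block contents."""
    return {
        block_id: data[offset:offset + BLOCK_SIZE]
        for block_id, offset in enumerate(range(0, len(data), BLOCK_SIZE))
    }

def reassemble(blocks: dict[int, bytes]) -> bytes:
    """Rebuild the original data by concatenating blocks in ID order."""
    return b"".join(blocks[block_id] for block_id in sorted(blocks))

payload = b"some application data" * 1000
blocks = split_into_blocks(payload)   # blocks could live on different devices
assert reassemble(blocks) == payload  # the storage system hides this from the user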

The blocks can be stored on different systems, and each block can be configured to work with different operating systems (e.g., Linux, Windows); one block can be formatted for NFS and another for SMB. Block storage thus decouples data from user environments, adding a layer of abstraction. Because the information can be distributed flexibly across multiple environments with this method, multiple data access paths exist that allow users to retrieve data more quickly. In principle, however, this approach is more complex than a relatively easy-to-configure NAS system.

Direct-attached storage (DAS) has some advantages but also limitations, depending on the application profile. Depending on the implementation, the advantages include reduced latency thanks to direct block-level access, uncomplicated operation, and relatively low costs because of limited management overhead. The disadvantages relate to limited scaling of capacity and performance, as well as limited application availability.

To improve on this, a second host must be connected. Data availability at the JBOD (Just a Bunch of Disks) or array level can be improved by RAID. In addition to SCSI, SATA and serial-attached SCSI (SAS) are the common protocols in the DAS environment. With DAS, the server always controls access to storage. Server-based storage is a growing trend, which you can observe in combination with non-volatile memory express (NVMe) flash, big data apps, NoSQL databases, in-memory computing, artificial intelligence (AI) applications, and software-defined storage.

The SAN, unlike DAS, is a specialist high-speed network that provides access to connected storage devices and their data from block-level storage. Current SAN implementations comprise servers and hosts, intelligent switches, and storage elements interconnected by specialized protocols such as Fibre Channel or SCSI. SANs can span multiple sites to improve business continuity for critical application environments.

A SAN uses virtualization to present storage to the connected server systems as if it were connected locally. A SAN array provides a consolidated storage resource pool, typically based on virtual LUNs, which are shared by multiple hosts in cluster environments.
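
As a purely conceptual illustration of LUN masking in such shared cluster environments, the following Python sketch maps initiator WWPNs to the LUNs they are allowed to see; all WWPNs and LUN numbers are invented for this example:

# Conceptual LUN-masking sketch: which hosts (identified by initiator
# WWPN) may see which virtual LUNs in the shared pool. The WWPNs and
# LUN IDs are invented for illustration only.
lun_masking = {
    "10:00:00:90:fa:aa:bb:01": {0, 1, 2},   # cluster node A
    "10:00:00:90:fa:aa:bb:02": {0, 1, 2},   # cluster node B shares the same LUNs
    "10:00:00:90:fa:aa:bb:03": {5},         # backup server sees a dedicated LUN
}

def visible_luns(initiator_wwpn: str) -> set[int]:
    """Return the LUNs a given initiator is allowed to access."""
    return lun_masking.get(initiator_wwpn, set())

print(visible_luns("10:00:00:90:fa:aa:bb:01"))  # {0, 1, 2}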

SANs are mostly still based on the Fibre Channel protocol. However, Fibre Channel over Ethernet (FCoE) and convergence of storage and IP protocols over one connection are also options. With SANs, you can use gateways to move application data between different storage network technologies as needed.

Evergreen iSCSI, Newcomer NVMe

iSCSI executes the SCSI storage protocol over an Ethernet network connection with TCP. Mostly, iSCSI is used locally or in the private cloud environment for secondary block storage applications that are not very business critical. Really critical applications typically use robust and low-latency FC SANs that are consistently separated from the application network – or they already use NVMe for particularly performance-intensive I/O workload profiles, but more on this later.
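
What travels inside those TCP segments is ultimately ordinary SCSI commands. As a hedged illustration, the following Python sketch builds a SCSI READ(10) command descriptor block (CDB), the 10-byte request for a range of blocks; the LBA and block count are arbitrary example values, and a real iSCSI initiator additionally wraps the CDB in iSCSI PDU framing before sending it over TCP:

import struct

# Build a SCSI READ(10) command descriptor block (CDB): 10 bytes that
# ask a block device for <transfer_length> blocks starting at <lba>.
READ_10_OPCODE = 0x28

def build_read10_cdb(lba: int, transfer_length: int) -> bytes:
    return struct.pack(
        ">BBIBHB",          # big-endian: opcode, flags, LBA, group, length, control
        READ_10_OPCODE,     # operation code
        0,                  # flags (RDPROTECT/DPO/FUA) left at zero
        lba,                # 32-bit logical block address
        0,                  # group number
        transfer_length,    # number of blocks to read
        0,                  # control byte
    )

cdb = build_read10_cdb(lba=2048, transfer_length=8)
print(cdb.hex())  # 28000000080000000800 (10 bytes in total)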

High data integrity, low-latency transmission, and features such as buffer-credit flow control enable Fibre Channel to meet critical business objectives and consistently deliver defined quality of service levels. The protocol is also suitable as an NVMe transport layer, supporting both SCSI and NVMe traffic on a fabric simultaneously. Existing Gen5 (16Gbps) and Gen6 (32Gbps) FC SANs can run FC NVMe over existing SAN fabrics with little change, because NVMe meets all specifications, according to the Fibre Channel Industry Association (FCIA).

The situation is different in the hyperscaling data centers of large cloud providers, which for cost reasons alone (standardization, capacities, etc.) are currently (still) relying on iSCSI block storage and Ethernet protocols with 25, 50, or 100Gbps, although NVMe is also becoming more attractive for more performance and new service offerings. In the context of software-defined infrastructures, Ethernet will remain the first choice for the foreseeable future in the breadth of all installations for reasons of standardization and cost.

In the highly specialized HPC environment, on the other hand, InfiniBand is often used on premises; it is significantly more powerful in terms of latency times and scalable throughput, but also costs more. Additionally, support for hypervisors and operating systems, as well as drivers and firmware, is limited. iSCSI as block-level storage runs most frequently over Ethernet with TCP but can also be set up over InfiniBand.

iSCSI runs on standard network cards or special host bus adapters, either with iSCSI extensions for remote direct memory access (iSER) or with the help of a TCP offload engine that implements not only the IP protocol but also parts of the SCSI protocol stack to accelerate data transfer. To boost I/O performance, iSCSI workload support has been extended by network adapters with an iSCSI (hardware) offload engine, a TCP offload engine, or both. In the first case, the host bus adapter offloads all iSCSI initiator functions from the host CPU; in the second case, the adapter offloads TCP processing from the server kernel and CPU. The most important advantage of iSCSI in practice is that all common operating systems, hypervisor implementations, and storage systems support it, which is currently not yet the case for NVMe over fabrics (NVMeOF).

NVMeOF and Block-Level Storage

As I/O protocols, NVMe and NVMeOF are significantly leaner than SCSI or iSCSI in terms of overhead and are therefore also faster. If significantly more performance in the form of lowest I/O latencies is required, NVMe is the optimized protocol for the server connection with native PCIe flash storage with DAS.
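
On a Linux host with locally attached NVMe flash, the controllers and namespaces can be inspected through sysfs. The following sketch assumes the standard Linux sysfs layout under /sys/class/nvme and simply prints whatever it finds there:

from pathlib import Path

# List locally attached NVMe controllers and their namespaces via Linux
# sysfs. Assumes the standard layout under /sys/class/nvme; namespace
# sizes are reported by the kernel in 512-byte sectors.
base = Path("/sys/class/nvme")
if base.is_dir():
    for ctrl in sorted(base.glob("nvme*")):
        model_file = ctrl / "model"
        model = model_file.read_text().strip() if model_file.exists() else "unknown"
        print(f"{ctrl.name}: {model}")
        for ns in sorted(ctrl.glob(f"{ctrl.name}n*")):
            size_file = ns / "size"
            if size_file.exists():
                sectors = int(size_file.read_text())
                print(f"  {ns.name}: {sectors * 512 / 1e9:.1f} GB")
else:
    print("No NVMe devices found (or not a Linux host)")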

NVMeOF as a scalable network variant enables data to be transferred between hosts and flash storage over a storage network based on Ethernet (the corresponding protocols are called RoCE and iWARP), Fibre Channel, or InfiniBand (Table 1). Currently, as with iSER, NVMeOF Ethernet remote direct memory access (RDMA) end nodes can only interoperate with other NVMeOF Ethernet end nodes that support the same Ethernet RDMA transport. NVMeOF end nodes are not able to interoperate with iSCSI or iSER end nodes.

Table 1: Storage Protocol Performance Criteria

Protocol       | Latency  | Scalability | Performance | Distribution
Fibre Channel  | Low      | Yes         | High        | Common
RoCEv2*        | Very low | Yes         | High        | Insignificant
iWARP          | Medium   | Yes         | Medium      | Insignificant
TCP            | High     | Yes         | Medium      | Sometimes (with iSCSI)
InfiniBand     | Very low | Restricted  | High        | Rare

*RoCE, remote direct memory access over converged Ethernet.

NVMe(OF) eliminates SCSI as a protocol and has lower latencies than iSCSI. Although hard disk and SSD arrays often still use the common SCSI protocol, performance improves dramatically without the SCSI overhead. For example, command queuing in SCSI supports only one queue for I/O commands, whereas NVMe allows up to 64,000. Each queue, in turn, can service up to 64,000 commands simultaneously. Additionally, NVMe simplifies processing with a lean set of 13 required commands designed to meet the unique requirements of NVM devices.
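
The following Python sketch is a purely conceptual model of this multi-queue design, not an NVMe implementation; the queue count and depth are freely chosen parameters:

from collections import deque

# Purely conceptual model of NVMe multi-queue submission: many
# independent queue pairs, each able to hold thousands of outstanding
# commands, versus the single command queue of classic SCSI.
class QueuePair:
    """One submission/completion queue pair, e.g., pinned to a CPU core."""
    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.submission = deque()
        self.completion = deque()

    def submit(self, command: str) -> None:
        if len(self.submission) >= self.depth:
            raise RuntimeError("queue full; caller must wait for completions")
        self.submission.append(command)

# A host might create one queue pair per CPU core (here: 8 of them),
# far below the protocol limits of up to 64,000 queues of 64,000 commands.
queues = [QueuePair(depth=1024) for _ in range(8)]
for i in range(32):
    # Commands are spread across the queues and serviced in parallel,
    # avoiding the single point of serialization of classic SCSI queuing.
    queues[i % len(queues)].submit(f"READ lba={i * 8}")

print(sum(len(q.submission) for q in queues))  # 32 commands in flight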

NVMe latency was already about 200µs lower than that of 12Gbps SAS when the technology was introduced. Additionally, the more efficient command set made it possible to reduce CPU load by more than 50 percent compared with SCSI. The situation is similar for sequential reads and writes: Because of the high bandwidth, six to eight times higher I/O performance values can usually be achieved compared with SATA SSDs.

Block storage based on NVMeOF can be implemented over Ethernet TCP/IP, Fibre Channel, Ethernet RDMA, or InfiniBand fabrics. The RDMA option provides the fastest performance, but all versions of NVMeOF are already faster than iSCSI, which is why flash storage vendors are increasingly starting to move to NVMeOF. Ultimately, it remains to be seen which technology options gain widespread acceptance over time. NVMeOF transport options still under development include iWARP, InfiniBand, NVMe/TCP, and RoCEv2 ("Rocky").

Future of Block-Level Storage

At first glance, questions relating to the future of block storage may sound strange in the context of what has been said so far, but they are justified in the longer term considering the innovation dynamics mentioned at the beginning. The reason is simple: If you analyze the extremely rapid growth of semi-structured and unstructured data, you find that Internet of Things (IoT), artificial intelligence, and machine learning data; video files; images; audio files; and the like are growing disproportionately.

This development is just the beginning of an almost explosive trend. According to IDC, more than 80 percent of all data stored worldwide could be in archives by 2024 [1]. If file and object storage systems grow significantly faster than iSCSI, Fibre Channel, and NVMe-based block-level storage, this will not be without consequences for the further development and potential market growth of those block-based systems.

On the other hand, experience to date shows that, with few exceptions, complementary technologies coexist on the market for a long or even very long time, not least to avoid radical breaks in critical IT infrastructure. By now, the probability that NVMeOF will displace the iSCSI protocol for high-performance block storage access to flash media is relatively high, whereas the increasing importance of file and object storage will negatively affect both block storage-based FC SANs and iSCSI storage networks. This trend is already visible today.

Plus and Minus of NAS and Object Storage

File-based systems configured as a DAS solution or as NAS are easy to implement and operate, which explains their great popularity for decades. However, file storage (filers) scales only by adding more systems, not simply by adding more capacity. The inherent disadvantage of NAS approaches is the almost linear increase in complexity and cost as unstructured data volumes increase sharply.

One reason is that, with common NAS systems, data is organized within a fixed folder hierarchy whose paths quickly become complex and long. The architecture is therefore not a very good match for rapidly growing unstructured data volumes in the double-digit multi-petabyte or exabyte range. For this reason, development in the direction of clustered scale-out filesystems has been going on for some time. In general, high-performance filesystems are the platform for scalable storage infrastructures, whether as software-defined storage such as Ceph on an open source basis or as a manufacturer-specific implementation, of which many are on the market today.

In contrast to NAS, object-based storage has a flat structure for data management. Files are divided into individual units and distributed across server systems (nodes). These objects are not stored as files in folders or as blocks on servers, but in a repository, and they are linked with the associated metadata (i.e., a global namespace architecture), which allows scaling to very large data volumes and is ideal for storing unstructured data formats. However, object storage is not suitable for classic database environments because writes take far too long compared with block storage. Native programming of a cloud-based application in conjunction with Amazon Simple Storage Service (S3) as the object storage API can also be far more complex than using file storage.
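
As a minimal illustration of the S3 object model, the following sketch uses the boto3 client to store and retrieve an object with user-defined metadata; bucket name, object key, and metadata are placeholder values, and credentials are assumed to be configured in the environment:

import boto3

# Minimal sketch of object storage access via the S3 API with boto3.
# Bucket name, object key, and metadata are placeholder values.
s3 = boto3.client("s3")

# Store an object: no folder hierarchy, just a key in a flat namespace,
# with user-defined metadata attached to the object itself.
s3.put_object(
    Bucket="example-archive",
    Key="sensor-42/2024-06-01.json",   # '/' is only a naming convention
    Body=b'{"temperature": 21.5}',
    Metadata={"source": "iot-gateway-7", "retention": "10y"},
)

# Retrieve the object and its metadata again.
response = s3.get_object(Bucket="example-archive", Key="sensor-42/2024-06-01.json")
print(response["Body"].read())
print(response["Metadata"])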

Unified Storage

Global filesystem implementations go one step further and are designed as hybrid cloud storage per se. Although the cloud is used as a central data repository, the system is logically presented as if it were an on-premises NAS system. This approach offers cost advantages over traditional on-premises solutions, as well as compared with popular public cloud storage. Today, powerful global filesystems are already capable of storing file data in the cloud as objects. This approach allows users to access files as they would on a standard NAS while the majority of the data resides on a cost-optimized, highly scalable object backend.
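
Conceptually, such a gateway only needs to map POSIX-style paths onto object keys plus metadata. The following Python sketch is purely illustrative and not modeled on any specific product:

# Simplified sketch of how a global filesystem gateway might map
# POSIX-style file paths onto objects in a flat backend store.
object_store: dict[str, tuple[bytes, dict]] = {}

def write_file(path: str, data: bytes) -> None:
    """Store file contents as an object; the path becomes the object key."""
    key = path.lstrip("/")
    object_store[key] = (data, {"size": len(data), "version": 1})

def read_file(path: str) -> bytes:
    """Read a file back through the same NAS-like path interface."""
    data, _metadata = object_store[path.lstrip("/")]
    return data

write_file("/projects/report.docx", b"quarterly numbers")
print(read_file("/projects/report.docx"))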

The limitations of traditional NAS and SAN systems for the aforementioned workloads are increasingly driving enterprises to look for object-based storage solutions that support (global) filesystem capabilities. An object-based storage solution with integrated native file management capabilities therefore makes the transition from NAS to object storage interesting and opens up a wealth of new application possibilities, from backup and disaster recovery to regulatory-compliant archiving and highly secure, centrally consolidated cloud services for global file access and file synchronization. Compared with the file services offered by large cloud providers, such solutions are then a genuine economic alternative and ideally lead to greater IT acceptance in the company.

Conclusions

Today, developers and users demand more flexibility, performance, and availability from the IT infrastructure. The reason is the increasing dynamism in the area of new applications, data formats, users, and changing workload profiles as they emanate from IoT or AI projects in the course of digitalization. Because of the strong growth of semi-structured and unstructured datasets at multi-petabyte orders of magnitude, the classic SAN and NAS architecture is reaching its technical and economic limits.

The storage protocols mentioned here will of course be affected in different ways, and choices are also likely to depend significantly on the dominant use cases in the long term. However, one thing is clear: Consolidation on system architectures with integrated "unified" cloud protocol capabilities (file, block, object) on software-defined platforms and with clustered filesystems will continue at an increasing pace.