
Storage Spaces Direct with different storage media
Colorful Mix
In Windows Server 2016, Storage Spaces Direct (S2D) can comprise not only several hard disks but also several servers, connected in a cluster to increase data storage flexibility. S2D is certainly the most important innovation in the Windows Server 2016 storage universe. The system lets you combine locally attached storage from the cluster nodes to create shared virtual storage in a cluster. The storage can then be used as a shared data carrier (e.g., for storing data from virtual servers).
The local volumes of the cluster nodes are used as data storage in a cluster with S2D. S2D requires a cluster with at least three hosts, which supports mirror resiliency; if you want parity-based resiliency, at least four hosts are required. S2D is protected against host failure by default. Depending on the configuration and the number of servers in the cluster, the technology can even cope with the failure of an entire rack, including the servers it houses.
Three Storage Technologies
In Windows Server 2016, three storage tiers can be used in storage spaces: NVMe (NVM Express) solid-state drives, conventional SSDs, and hard disk drives (HDDs). Windows Server 2012 R2 supports only two storage tiers in storage pools and storage spaces. NVMe devices are used to cache data, whereas SSDs and HDDs are used for traditional data storage and archiving. You can also create different combinations of storage tiers with these three data carrier types.
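To find out which media types Windows actually detects on a server before you plan the tiers, you can group the physical disks by media type in PowerShell. This is just a quick sanity check; the output depends entirely on your hardware:
> Get-PhysicalDisk | Group-Object MediaType
> Get-PhysicalDisk | Select-Object FriendlyName, MediaType, BusType, Size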
S2D sees Microsoft competing with VMware's Virtual SAN, in which the drives of several servers in a cluster are used as shared data storage. The focus is primarily on virtualization environments. For example, the storage locations of the virtual hard disks belonging to the virtual machines (VMs) in a Hyper-V cluster no longer represent a single point of failure if they are located on S2D, whose hard disks also replicate to different servers. If a server fails, the data is still available. On this basis, VMs remain protected, and companies can use Hyper-V Replica in combination with S2D and volume replication. In this case, the data is not just kept highly available in the S2D; the entire storage pool is replicated to another data center. As a filesystem, you will want to use ReFS rather than NTFS, because it is more stable and is already set up for storage spaces.
Basics of S2D
As mentioned before, S2D lets you combine conventional data carriers and bundle different storage technologies to create more storage space with higher speeds. NVMe memory can be mixed with traditional SSDs and HDDs in S2D. Windows Server 2016 distributes the data optimally – in some cases even automatically. If you also use a scale-out file server (SOFS) as a cluster service, you can store shares on S2D, manage them within the SOFS, and make them available on the network. Microsoft's Storage Replica [1] can in turn replicate the data from S2D to other data centers and other clusters.
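If you want to publish S2D storage through an SOFS, the cluster role and the share can be created in PowerShell. The following is a minimal sketch; the role name, share name, path, and group are hypothetical placeholders, and the path assumes a cluster shared volume already exists:
> Add-ClusterScaleOutFileServerRole -Name "SOFS01"
> New-Item -Path "C:\ClusterStorage\Volume1\Shares\VMStore" -ItemType Directory
> New-SmbShare -Name "VMStore" -Path "C:\ClusterStorage\Volume1\Shares\VMStore" -FullAccess "contoso\Hyper-V-Admins"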
To create an S2D, Windows Server 2016 merges the physical hard disks into a virtual storage pool. On the basis of this cross-device storage pool, virtual hard disks can be created and used for data storage in the cluster. The installation therefore involves several steps.
S2D is initially based on a cluster whose nodes have different physical data carriers. It allows SSDs, NVMe disks, and traditional HDDs to work together across servers. Communication between the data carriers of the various cluster nodes is handled by the SMB protocol, including SMB Multichannel and SMB Direct. The connection is established by the Software Storage Bus in Windows Server 2016. Storage pools, which in turn combine the physical hard disks of the individual cluster nodes into one or more virtual stores, build on this.
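To verify that the network side is ready for this SMB traffic, you can check whether the client network interfaces report RDMA capability and whether SMB Multichannel connections are in use. This is only a quick check with the standard SMB cmdlets:
> Get-SmbClientNetworkInterface
> Get-SmbMultichannelConnection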
The next layer is Storage Spaces, virtual disks that build on the storage pools, which in turn are based on the physical disks of the cluster nodes. The C:\ClusterStorage cluster shared volume is also connected to the S2D, and the data in this directory of the cluster nodes is stored in the S2D.
Adding Hard Disks
The internal hard disks of the cluster nodes form the elementary building blocks of the data storage. For S2D, servers should have at least two additional hard disks. You must install the File Server role and the failover clustering feature on all servers that are to become members of the S2D cluster. The easiest way to do this is in PowerShell:
> Install-WindowsFeature -Name File-Services, Failover-Clustering -IncludeManagementTools
Disks for S2D must also show as online and initialized in Disk Management; partitions should not be created. After installing the role and cluster feature and initializing the hard disks, the servers can be restarted, but they do not have to be. If a restart is required, PowerShell displays an appropriate message.
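To prepare the disks, you can bring them online and initialize them without creating partitions. This sketch assumes the data disks are still raw; check the output of Get-Disk first before running it:
> Get-Disk | Where-Object IsOffline | Set-Disk -IsOffline:$false
> Get-Disk | Where-Object PartitionStyle -eq 'RAW' | Initialize-Disk -PartitionStyle GPT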
If you enter the Get-PhysicalDisk command on a cluster node in PowerShell, it displays the disks of all cluster nodes and indicates whether they are suitable for pooling. However, this does not work until you have enabled S2D in the cluster. As mentioned before, the hard disks should not have their own partitions. As part of the configuration validation wizard in the Failover Cluster Manager, you will find a separate test that ensures that S2D can be used in the cluster (Figure 1). You will want to run this before proceeding with the setup to ensure that everything can be configured correctly. Alternatively, you can use PowerShell.
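A quick way to list only the disks that are actually eligible for pooling is the CanPool parameter:
> Get-PhysicalDisk -CanPool $true | Select-Object FriendlyName, MediaType, Size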

Another advantage of S2D is its support for Nano Server in Windows Server 2016, which you can use to build clusters and create very small but efficient Hyper-V clusters. Once the cluster is set up, the function can be activated in PowerShell:
> Enable-ClusterStorageSpacesDirect
As part of the setup, you can enter
> Test-Cluster -Node <Node1,Node2,Node3,Node4> -Include "Storage Spaces Direct",Inventory,Network,"System Configuration"
to test cluster nodes in PowerShell for cluster capability and support for S2D.
Automated Configuration
When you use the Enable-ClusterStorageSpacesDirect cmdlet [2], PowerShell creates an automated configuration based on the hardware grouped in S2D. For example, the cmdlet creates the storage pool and the appropriate storage tiers when NVMe devices, SSDs, and traditional HDDs are added to the system. In such a configuration, the NVMe devices are used to cache hot data, while the SSDs and HDDs are available for storing less frequently used data (cold data).
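The cmdlet also accepts parameters to steer this automatic configuration. As a hedged example – the pool name is merely a placeholder – you could name the resulting pool and explicitly enable the cache:
> Enable-ClusterStorageSpacesDirect -PoolFriendlyName "S2D-Pool" -CacheState Enabled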
To use S2D in production environments, you need network cards that support RDMA. In test environments, virtual servers, virtual hard disks, and virtual network cards can also be used without special hardware. Once the storage is set up, create one or more storage pools in the Failover Cluster Manager in the Storage | Pools area (Figure 2). These pools comprise the various disks of the cluster nodes. The PowerShell syntax is shown in Listing 1.
Listing 1: Creating New Storage Pools
> New-StoragePool -StorageSubSystemName <FQDN of the subsystem> -FriendlyName <StoragePoolName> -WriteCacheSizeDefault 0 -FaultDomainAwarenessDefault StorageScaleUnit -ProvisioningTypeDefault Fixed -ResiliencySettingNameDefault Mirror -PhysicalDisk (Get-StorageSubSystem -Name <FQDN of the subsystem> | Get-PhysicalDisk)

The cluster does not initially have a shared data store. Once you have created the cluster and enabled S2D, you first create the necessary storage pool and then use it to create virtual hard disks, also known as "storage spaces." The cluster manages the underlying storage structure; therefore, file servers and Hyper-V hosts do not need to know on which physical disks the data is actually stored.
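The New-Volume cmdlet bundles these steps: it creates the virtual disk, formats it, and adds it to the cluster in one pass. A minimal sketch – the pool name, volume name, and size are placeholders that must match your environment:
> New-Volume -StoragePoolFriendlyName "S2D-Pool" -FriendlyName "VDisk01" -FileSystem CSVFS_ReFS -Size 1TB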
Distributing Data by Storage Tiers
You can also create your own storage tiers within S2D. Windows automatically detects frequently used files and stores them in the SSD/NVMe area of the storage space; less frequently used files are offloaded to slower hard disks. Of course, you can also control manually what kind of files should be available on the fast disks.
In Windows Server 2016, you can use three storage tiers: NVMe, SSD, and HDD. However, you can also create different combinations of these three volume types and define corresponding storage tiers. The commands for this are:
> New-StorageTier -StoragePoolFriendlyName Pool -FriendlyName SSD-Storage -MediaType SSD
> New-StorageTier -StoragePoolFriendlyName Pool -FriendlyName HDD-Storage -MediaType HDD
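These tiers can then be referenced when you create a volume. In a hedged sketch – the volume name and tier sizes are examples only and must fit the capacity of your pool – a tiered volume could look like this:
> New-Volume -StoragePoolFriendlyName Pool -FriendlyName Tiered01 -FileSystem CSVFS_ReFS -StorageTierFriendlyNames SSD-Storage, HDD-Storage -StorageTierSizes 100GB, 900GB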
As soon as the storage pool is available in the cluster, you can create new virtual hard disks – the storage spaces – from the context menu of the pools in the Failover Cluster Manager. The wizards also let you specify the availability and storage layout of the new storage space, which is based on the created storage pool. On the basis of the storage space, you then create a new volume, just as in a conventional storage pool. In turn, you can add volumes to the cluster through the context menu (e.g., to store your VM data).
Reliability of S2D
S2D is protected against host failure. Given the right number of cluster nodes, several can fail without affecting S2D. Entire enclosures or racks – or even entire data centers – can survive failure if the data is replicated between a sufficient number of cluster nodes; in some scenarios, you can additionally rely on storage replication.
In Windows Server 2016, you can also use storage replication to replicate an entire S2D directly to other clusters and data centers. A TechNet blog post [3] explains how the fail-safe mechanism for S2D and the associated virtual hard disks work.
By default, high availability is enabled when creating a storage pool. The FaultDomainAwarenessDefault option and its StorageScaleUnit default value play an important role here. I will return to the "fault domain" concept later. You can display the value for each storage pool at any time by typing:
> Get-StoragePool -FriendlyName <Pool-Name> | FL FriendlyName, Size, FaultDomainAwarenessDefault
Virtual disks (i.e., the storage spaces in the storage pool of the S2D environment) inherit high availability from the storage pool in which they are created (Figure 3). You can also view the high availability value of storage spaces with PowerShell:
> Get-VirtualDisk -FriendlyName <VirtualDiskName> | FL FriendlyName, Size, FaultDomainAwareness, ResiliencySettingName

A virtual hard disk comprises extents of 1GB, so a hard disk with 100GB comprises 100 extents. If you create a virtual hard disk with the "mirrored" high availability setting, the individual extents of the virtual disk are copied and stored on different cluster nodes.
Depending on the number of nodes used, two or three copies of the extents can be distributed to the data stores of the various cluster nodes. If you protect a 100GB virtual disk by creating triple copies, the disk requires 300 extents. Windows Server 2016 tries to distribute the extents as evenly as possible. For example, if extent A is stored on nodes 1, 2, and 3, and extent B, on the same virtual hard disk, is copied to nodes 1, 3, and 4, then the virtual hard disk with its data and extents is distributed across all the nodes in the entire cluster.
Microsoft also offers the option of building an S2D environment with just three hosts, which is of interest for small businesses or test environments. Three hosts support mirror resiliency; if you want parity-based resiliency, four or more hosts are required. S2D is protected against host failure by default.
Windows Server 2016 works with "fault domains," which are groups of cluster nodes that share a single point of failure. A fault domain can be a single cluster node, the cluster nodes in a common rack or chassis, or all cluster nodes in a data center. You can manage the fault domains with the new Get-ClusterFaultDomain, Set-ClusterFaultDomain, New-ClusterFaultDomain, and Remove-ClusterFaultDomain cmdlets. For example, to display information on existing fault domains, you can use:
> Get-ClusterFaultDomain
> Get-ClusterFaultDomain -Type Rack
> Get-ClusterFaultDomain -Name "server01.contoso.com"
You can also work with different types. For example, the following commands are available to create your own fault domains:
> New-ClusterFaultDomain -Type Chassis -Name "Chassis 007"
> New-ClusterFaultDomain -Type Rack -Name "Rack A"
> New-ClusterFaultDomain -Type Site -Name "Shanghai"
You can also link fault domains or subordinate fault domains to other fault domains:
> Set-ClusterFaultDomain -Name "server01.contoso.com" -Parent "Rack A"
> Set-ClusterFaultDomain -Name "Rack A", "Rack B", "Rack C", "Rack D" -Parent "Shanghai"
In larger environments, you can also specify the fault domains in an XML file and then integrate them into the system. You can also use the new Get-ClusterFaultDomainXML cmdlet; for example, to save the current fault domain infrastructure in an XML file, enter:
> Get-ClusterFaultDomainXML | Out-File <Path>
You can also customize XML files and import them as a new infrastructure:
> $xml = Get-Content <Path> | Out-String
> Set-ClusterFaultDomainXML -XML $xml
Optimizing Storage Pools
When used over longer periods of time, individual hard disks in the storage pool can be exposed to more stress than others. Additionally, new physical hard disks added to the system need to be integrated appropriately. Microsoft provides new cmdlets that let you optimize storage pools so that the data is distributed more efficiently:
> Optimize-StoragePool <Pool-Name>
You can then query the current status of the action:
> Get-StorageJob | ? Name -eq Optimize
Conclusions
Thanks to S2D, you can combine the volumes of all cluster nodes into a single data store and work with three different storage tier types. Windows Server 2016 distinguishes between NVMe SSDs, traditional SSDs, and HDDs. With the use of these three media types, data access can be accelerated many times over.