Design the network for a Storage Spaces Direct cluster

In a Storage Spaces Direct cluster, the network is the most important part. If the network is not well designed or implemented, you can expect poor performance and high latency. All software-defined storage solutions rely on a healthy network, whether it is Nutanix, VMware vSAN or Microsoft S2D. When I audit S2D configurations, most of the time the issue comes from the network. This is why I wrote this topic about how to design the network for a Storage Spaces Direct cluster.

Network requirements

The following statements come from the Microsoft documentation:

Minimum (for small scale 2-3 node)

  • 10 Gbps network interface
  • Direct-connect (switchless) is supported with 2-nodes

Recommended (for high performance, at scale, or deployments of 4+ nodes)

  • NICs that are remote-direct memory access (RDMA) capable, iWARP (recommended) or RoCE
  • Two or more NICs for redundancy and performance
  • 25 Gbps network interface or higher

As you can see, for an S2D cluster of four nodes or more, Microsoft recommends a 25 Gbps network. I think it is a good recommendation, especially for an all-flash configuration or when NVMe devices are implemented. Because S2D uses SMB for communication between nodes, RDMA can be leveraged (SMB Direct).
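As a quick check before you go further, PowerShell can tell you whether your NICs are RDMA capable. This is a minimal sketch; the output columns depend on your adapters and drivers:

  # List the physical adapters and whether RDMA is enabled on them
  Get-NetAdapterRdma | Format-Table Name, Enabled

  # Show which interfaces SMB itself considers RDMA capable
  Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable, LinkSpeed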

RDMA: iWARP and RoCE

Do you remember DMA (Direct Memory Access)? This feature allows a device attached to a computer (such as an SSD) to access memory without going through the CPU. Thanks to this feature, we get better performance and lower CPU usage. RDMA (Remote Direct Memory Access) is the same thing but across the network: it allows a remote device to access local memory directly. Thanks to RDMA, CPU usage and latency are reduced while throughput is increased. RDMA is not mandatory for S2D, but it is recommended. Last year Microsoft stated that RDMA increases S2D performance by about 15% on average. So, I strongly recommend implementing it if you deploy an S2D cluster.
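Once the cluster is running, you can verify that SMB Direct is actually being used between the nodes. A hedged sketch; the exact columns and counter instances depend on your build and adapters:

  # Check that the SMB connections to the other nodes are RDMA capable
  Get-SmbMultichannelConnection | Format-Table ServerName, ClientIpAddress, ClientRdmaCapable

  # The "RDMA Activity" performance counters show live RDMA traffic per adapter
  Get-Counter -Counter "\RDMA Activity(*)\RDMA Inbound Bytes/sec"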

Two RDMA implementations are supported by Microsoft: iWARP (Internet Wide-Area RDMA Protocol) and RoCE (RDMA over Converged Ethernet). And I can tell you one thing about these implementations: this is war! Microsoft recommends iWARP while a lot of consultants prefer RoCE. Microsoft recommends iWARP because it requires less configuration than RoCE, which needs Data Center Bridging (DCB) and Priority Flow Control (PFC) configured end to end; misconfigured RoCE deployments generated a high number of Microsoft support cases. Consultants prefer RoCE because Mellanox is behind this implementation: Mellanox provides solid switches and network adapters with great firmware and drivers, and each time a new Windows Server build is released, a supported Mellanox driver / firmware is also released.
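To give an idea of the extra configuration RoCE requires, here is a hedged sketch of the DCB/PFC settings usually applied on each node. Priority 3, the 50% bandwidth reservation and the adapter names NIC1/NIC2 are assumptions (common conventions, not requirements), and the physical switches must be configured consistently:

  # Tag SMB Direct traffic (port 445) with priority 3
  New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

  # Enable Priority Flow Control only for the SMB priority
  Enable-NetQosFlowControl -Priority 3
  Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

  # Reserve bandwidth for SMB and apply DCB on the physical adapters
  New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
  Enable-NetAdapterQos -Name "NIC1","NIC2"

  # Do not accept DCB settings pushed by the switch (the host stays authoritative)
  Set-NetQosDcbxSetting -Willing $false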

If you want more information about RoCE and iWARP, I suggest this series of articles by Didier Van Hoye.

Switch Embedded Teaming

Before choosing the right switches, cables and network adapters, it is important to understand the software story. In Windows Server 2012 R2 and earlier, you had to create a team. When the team was implemented, a tNIC was created. The tNIC is a sort of virtual NIC connected to the team. Then you created the virtual switch on top of the tNIC, and after that the virtual NICs for management, storage, VMs and so on were added.

In addition to the complexity, this solution prevents the use of RDMA on virtual network adapters (vNICs). This is why Microsoft improved this part in Windows Server 2016. Now you can implement Switch Embedded Teaming (SET):
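With SET there is no separate team: the vSwitch is created directly on the physical adapters. A minimal sketch, where the switch name SW-SET and the adapter names NIC1/NIC2 are assumptions:

  # Create a vSwitch with Switch Embedded Teaming on two identical pNICs
  New-VMSwitch -Name "SW-SET" -NetAdapterName "NIC1","NIC2" -EnableEmbeddedTeaming $true -AllowManagementOS $false

  # Add a management vNIC on top of the SET
  Add-VMNetworkAdapter -ManagementOS -Name "Management" -SwitchName "SW-SET"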

This solution reduces the network complexity, and the vNICs can support RDMA. However, there are some limitations with SET:

  • Each physical network adapter (pNIC) must be identical (same model, same firmware, same drivers)
  • Maximum of 8 pNICs in a SET
  • Only the following load balancing modes are supported: Hyper-V Port (for specific cases) and Dynamic. This limitation is a good thing because Dynamic is the appropriate choice in most cases (see the sketch after this list).
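For reference, here is a hedged sketch of checking and setting the load balancing mode on an existing SET (the switch name is the one assumed earlier):

  # Check the current teaming settings of the SET
  Get-VMSwitchTeam -Name "SW-SET"

  # Dynamic is the appropriate choice for most deployments
  Set-VMSwitchTeam -Name "SW-SET" -LoadBalancingAlgorithm Dynamic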

For more information about load balancing modes, Switch Embedded Teaming and its limitations, you can read this documentation. Switch Embedded Teaming brings another great advantage: you can create an affinity between a vNIC and a pNIC. Think about a SET where two pNICs are members of the team. On this vSwitch, you create two vNICs for storage purposes. You can create an affinity between the first vNIC and the first pNIC, and another between the second vNIC and the second pNIC. This ensures that both pNICs are used.
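A hedged sketch of this affinity, continuing with the assumed names from above (SW-SET, NIC1/NIC2) and two storage vNICs called SMB1 and SMB2:

  # Create two vNICs dedicated to storage on top of the SET
  Add-VMNetworkAdapter -ManagementOS -Name "SMB1" -SwitchName "SW-SET"
  Add-VMNetworkAdapter -ManagementOS -Name "SMB2" -SwitchName "SW-SET"

  # Map each storage vNIC to one physical adapter so both pNICs are used
  Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "SMB1" -PhysicalNetAdapterName "NIC1"
  Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "SMB2" -PhysicalNetAdapterName "NIC2"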

The designs presented below are based on Switch Embedded Teaming.

Network design: VM traffic and storage separated

Some customers want to separate the VM traffic from the storage traffic. The first reason is that they want to connect VMs to a 1 Gbps network; because the storage network requires 10 Gbps, you need to separate them. The second reason is that they want to dedicate devices, such as switches, to storage. The following schema introduces this kind of design:

If you have 1 Gbps network ports for VMs, you can connect them to 1 Gbps switches while the network adapters for storage are connected to 10 Gbps switches.

Whatever you choose, the VMs will be connected to the Switch Embedded Teaming (SET), and you have to create a vNIC for management on top of it. So, when you connect to the nodes through RDP, you go through the SET. The physical NICs (pNICs) dedicated to storage (those on the right of the schema) are not in a team. Instead, we leverage SMB Multichannel, which allows the use of multiple network connections simultaneously. So, both network adapters will be used to establish SMB sessions.

Thanks to Simplified SMB Multichannel, both pNICs can belong to the same subnet and VLAN. Live Migration is configured to use this subnet and to leverage SMB.
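A hedged sketch of that configuration, run on each node. The interface aliases Storage1/Storage2 and the 10.10.10.0/24 subnet are assumptions; both standalone storage pNICs sit in the same subnet and Live Migration is switched to SMB:

  # Both storage pNICs in the same subnet; Simplified SMB Multichannel handles the rest
  New-NetIPAddress -InterfaceAlias "Storage1" -IPAddress 10.10.10.11 -PrefixLength 24
  New-NetIPAddress -InterfaceAlias "Storage2" -IPAddress 10.10.10.12 -PrefixLength 24

  # Use SMB for Live Migration and restrict it to the storage subnet
  Set-VMHost -VirtualMachineMigrationPerformanceOption SMB
  Add-VMMigrationNetwork "10.10.10.0/24"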

Network Design: Converged topology

The following picture introduces my favorite design: a fully converged network. For this kind of topology, I recommend at least a 25 Gbps network, especially with NVMe or all-flash configurations. In this case, only one SET is created with two or more pNICs. Then we create the following vNICs:

  • 1x vNIC for host management (RDP, AD and so on)
  • 2x vNICs for storage (SMB, S2D and Live Migration)

The vNICs for storage can belong to the same subnet and VLAN thanks to Simplified SMB Multichannel. Live Migration is configured to use this network and the SMB protocol. RDMA is enabled on these vNICs, as well as on the pNICs if they support it. Then an affinity is created between the vNICs and the pNICs.
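Continuing the earlier sketch (the names SMB1/SMB2 and VLAN 100 are assumptions; the affinity mapping shown earlier applies here as well), the remaining steps for this converged design look like this:

  # Put both storage vNICs in the same VLAN (Simplified SMB Multichannel allows one subnet)
  Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "SMB1" -Access -VlanId 100
  Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "SMB2" -Access -VlanId 100

  # Enable RDMA on the storage vNICs (the host exposes them as vEthernet adapters)
  Enable-NetAdapterRdma -Name "vEthernet (SMB1)","vEthernet (SMB2)"

  # Verify RDMA is now enabled on the vNICs and the underlying pNICs
  Get-NetAdapterRdma | Format-Table Name, Enabled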

I love this design because it is really simple. You have one network adapter for the BMC (iDRAC, iLO, etc.) and only two network adapters for S2D and VMs. So, the physical installation in the datacenter and the software configuration are easy.

Network Design: 2-node S2D cluster

Because both nodes can be direct-attached in a 2-node configuration, you don't need switches for storage. However, the virtual machines and the host management vNIC still require a connection, so switches are required for these usages. They can be 1 Gbps switches, which drastically reduces the cost of the solution.
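For the direct-attached storage links, a common approach (a hedged sketch; the interface aliases and subnets are assumptions) is to give each back-to-back cable its own small subnet, since each cable is its own segment:

  # Node 1: one address per direct-attached link
  New-NetIPAddress -InterfaceAlias "Direct1" -IPAddress 172.16.1.1 -PrefixLength 24
  New-NetIPAddress -InterfaceAlias "Direct2" -IPAddress 172.16.2.1 -PrefixLength 24

  # Node 2: the matching addresses on the other end of each cable
  New-NetIPAddress -InterfaceAlias "Direct1" -IPAddress 172.16.1.2 -PrefixLength 24
  New-NetIPAddress -InterfaceAlias "Direct2" -IPAddress 172.16.2.2 -PrefixLength 24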
