Updated 3/28/14: More changes
Updated 3/27/14: Clarified tweak applicability
Updated 12/17/15: Added Windows Server 2012 info
Last time, I took a look at 10GbE NAS performance, courtesy of QNAP's TS-470 Pro. In this installment, I'll be focusing on SMB3.
SMB (Server Message Block) is the core of Microsoft’s and, more recently, Apple’s networking. QNAP, like many NAS manufacturers, uses open source SAMBA code to provide SMB services on its NASes.
It is important to know that SAMBA’s implementation of SMB3 does not have all the bells and whistles found in the SMB3 implementation in Windows 8.1 and Windows Server 2012. The single most significant difference is that Microsoft’s SMB3 provides for multichannel communication. I’ll discuss this difference in more detail later in this series. For now, there are two key implications.
First, performance increases significantly as you move from SMB1 to SMB2 to SMB3. In my tests, this difference is quite obvious using 10GbE. The implication is that Windows 8.1 or Server 2012 clients will see faster transfer speeds from NASes like QNAP's TS-870 Pro and TS-470 Pro running QTS 4.1 (which supports SAMBA’s SMB3 implementation) than clients running previous versions of Windows. For now, MacOS Mavericks appears to be using SMB2, so it will not benefit from SMB3 enhancements.
Second, Microsoft’s SMB3 multichannel is perhaps the most significant feature added to the SMB protocol in over 25 years. In simple terms, an older SMB2 connection in Windows 7 can be compared to a single-lane highway, with the data traveling on it loaded into a 1995 half-ton pick-up truck. A video file uploaded over 10GbE to a server is limited by this single-lane highway and the older truck.
SAMBA’s SMB3 gives us the 2013 turbo-charged truck that can travel faster, carry a larger load and use less fuel doing it. Microsoft’s SMB3 multichannel takes this up a notch by adding multiple lanes, so now we can have more of our turbo-charged trucks (threads) on the highway simultaneously. Our video file upload carried by these trucks moves much quicker as a result.
The number of channels increases with the computer’s number of CPU cores and network connections. If we upgrade from a four-core i5 processor to an i7 with 8 logical cores, we effectively increase from 4 to 8 lanes on our single 10GbE link. If we add a second network link, Microsoft’s SMB3 implementation detects this and automatically adds more lanes. You can read more about SMB3 multichannel here.
In an impressive example of Microsoft’s SMB3 multichannel, the screen grab below shows 1480 MB/s in a real-world copy/paste between two Windows 8.1 workstations, each sharing an 8 GB RAM drive. How was this possible? In the hands-down easiest network configuration ever, I simply connected the workstations with two 10GbE cables using Intel X540-T2 dual-port 10GbE network interface cards. Zero configuration was required on either the workstations or the 10GbE switch.
Wowza! 1480 MB/s transfer
For anyone who has configured NIC teaming or LACP link aggregation, the result above should amaze you. For really inexpensive high-speed networking, add two eBay quad-port 1GbE cards plus four CAT5e LAN cables and you’re at 480 MB/s with zero configuration! In my 20-plus years of networking, I’ve never seen something just work like this.
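To put those two numbers in context, here is a rough back-of-the-envelope sketch in Python. It compares the throughput figures quoted above against the raw aggregate line rate of the links involved. The function names are my own, and the calculation assumes decimal units (1 Gb/s = 125 MB/s) and ignores all protocol overhead, so treat the percentages as approximate.

```python
def line_rate_mb_per_s(links: int, gbit_per_link: float) -> float:
    """Raw aggregate line rate in MB/s for `links` parallel links.

    Uses decimal units (1 Gb/s = 125 MB/s) and ignores protocol overhead.
    """
    return links * gbit_per_link * 1000 / 8


def utilization(observed_mb_per_s: float, links: int, gbit_per_link: float) -> float:
    """Fraction of the raw aggregate line rate that a transfer achieved."""
    return observed_mb_per_s / line_rate_mb_per_s(links, gbit_per_link)


if __name__ == "__main__":
    # (label, quoted MB/s, number of links, Gb/s per link)
    cases = [
        ("2 x 10GbE, RAM drive copy", 1480, 2, 10),
        ("4 x 1GbE, quad-port cards", 480, 4, 1),
    ]
    for label, observed, links, speed in cases:
        raw = line_rate_mb_per_s(links, speed)
        pct = 100 * utilization(observed, links, speed)
        print(f"{label}: {observed} MB/s of {raw:.0f} MB/s raw ({pct:.0f}%)")
```

By this estimate, the four 1GbE links run close to their raw line rate, while the dual 10GbE result still leaves headroom on the wire.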
Setting up Windows SMB3 10GbE
First of all, while testing, uninstall antivirus software. Don’t just disable it, uninstall it. Add it back in only once you are done with all other testing. Bitdefender, for example, in its default configuration will dramatically slow down 10GbE connections. Microsoft Security Essentials seems to have little effect, but it’s not the AV product I’d recommend either.
Jumbo frames and driver tuning should be on your list of things to do for best 10GbE performance. I rarely touched these settings in 1GbE environments. However, the images below make it very clear that jumbo frames and driver tuning are well worth the effort to configure. In test one, I used default network driver settings and standard frames.
Performance with no tweaks
The second result clearly shows the performance increase to be had by tweaking network driver settings and enabling jumbo frames.
Performance with all tweaks
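One reason jumbo frames help at 10GbE can be sketched with simple arithmetic. The Python snippet below estimates, for a given MTU, how much of each frame on the wire is payload and how many frames per second a saturated 10 Gb/s link has to handle. It assumes a 9000-byte jumbo MTU (a common choice, not a value taken from my test setup), plain IPv4 and TCP headers without options, and standard Ethernet framing overhead. It says nothing about the driver tweaks, which also contributed to the results above.

```python
# Per-frame Ethernet overhead on the wire, in bytes:
# preamble + start delimiter (8), header (14), FCS (4), inter-frame gap (12).
ETH_OVERHEAD = 38
# IPv4 header (20) + TCP header (20), assuming no options.
IP_TCP_HEADERS = 40


def payload_efficiency(mtu: int) -> float:
    """Fraction of on-the-wire bytes that carry TCP payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def frames_per_second(mtu: int, gbit: float = 10.0) -> float:
    """Full-size frames per second needed to saturate a link of `gbit` Gb/s."""
    return gbit * 1e9 / 8 / (mtu + ETH_OVERHEAD)


if __name__ == "__main__":
    for mtu in (1500, 9000):  # standard MTU vs. assumed jumbo MTU
        print(
            f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload, "
            f"{frames_per_second(mtu):,.0f} frames/s at 10 Gb/s"
        )
```

The payload gain is only a few percent. The bigger effect is that the NIC and CPU handle several times fewer frames per second, which matters far more at 10GbE than at 1GbE.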
As for driver tweaks on Intel 10GbE NICs in the Windows environment, you may want to look at RSS queues, which, according to Intel, should be set to match your computer’s logical core count. This may make a difference in SMB3 multichannel performance in Windows 8 and Server 2012. This Intel page describes Intel's 10GbE adapter advanced settings.
SMB3 has default values that can be tuned using PowerShell commands. By default, each NIC will get four TCP/IP sessions, up to a maximum of 8 channels per client/server connection. RSS tuning in turn will affect how this workload is distributed over the available CPU cores. I would suggest that you set up a few RAM drive shares and experiment with file copies in your environment to see what works best. If you are just setting up a 10GbE NAS in a MacOS environment, start with jumbo frames.
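The defaults described above can be captured in a tiny model. This is only a sketch in Python of the rule as I've stated it (four sessions per NIC, capped at eight per client/server pair), not Windows' actual channel selection logic. The function and parameter names are hypothetical. On a real system, check the live values with PowerShell's Get-SmbClientConfiguration rather than trusting the numbers hard-coded here.

```python
def smb_channels(nics: int, per_nic: int = 4, max_per_server: int = 8) -> int:
    """Estimate SMB3 multichannel connections for one client/server pair.

    Simplified model: each NIC contributes `per_nic` TCP sessions, and the
    total is capped at `max_per_server`. Defaults follow the text above and
    are assumptions, not values read from a live system.
    """
    if nics < 0 or per_nic < 0 or max_per_server < 0:
        raise ValueError("counts must be non-negative")
    return min(nics * per_nic, max_per_server)


if __name__ == "__main__":
    for nics in (1, 2, 4):
        print(f"{nics} NIC(s): {smb_channels(nics)} channels")
```

Under this model, going from two NICs to four adds no channels unless the per-server maximum is raised, which is one of the things worth experimenting with on your RAM drive shares.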
I also increased the transmit and receive buffers to their maximums, and enabled most performance options. These tweaks generally will use more RAM and CPU resources, but increase 10GbE transfer speeds. On a heavily loaded server, you may want to dial your settings back.