Updated 3/28/14: More changes
Updated 3/27/14: Clarified tweak applicability
Updated 12/17/15: Added Windows Server 2012 info
Last time, I took a look at 10GbE NAS performance, courtesy of QNAP's TS-470 Pro. In this installment, I'll be focusing on SMB3.
SMB (Server Message Block) is the core of Microsoft’s and, more recently, Apple’s networking. QNAP, like many NAS manufacturers, uses open source SAMBA code to provide SMB services on its NASes.
It is important to know that SAMBA’s implementation of SMB3 does not have all the bells and whistles found in the SMB3 implementation in Windows 8.1 and Windows Server 2012. The single most significant difference is that Microsoft’s SMB3 provides for multichannel communication. I’ll discuss this difference in more detail later in this series. For now, there are two key implications.
First, performance increases significantly as you move from SMB1 to SMB2 to SMB3. In my tests, this difference is quite obvious using 10GbE. The implication is that Windows 8.1 or Server 2012 clients will see faster transfer speeds from NASes like QNAP's TS-870 Pro and TS-470 Pro running QTS 4.1 (which supports SAMBA’s SMB3 implementation) than clients running previous versions of Windows. For now, MacOS Mavericks appears to be using SMB2, so it will not benefit from SMB3 enhancements.
Second, Microsoft’s SMB3 multichannel is perhaps the most significant feature added to the SMB protocol in over 25 years. In simple terms, an older SMB2 connection in Windows 7 can be compared to a single-lane highway, with the data traveling on it loaded into a 1995 half-ton pick-up truck. A video file uploaded over 10GbE to a server is limited by this single-lane highway and the older truck.
SAMBA’s SMB3 gives us the 2013 turbo-charged truck that can travel faster, carry a larger load and use less fuel doing it. Microsoft’s SMB3 multichannel takes this up a notch by adding multiple lanes, so now we can have more of our turbo-charged trucks (threads) on the highway simultaneously. Our video file upload carried by these trucks moves much quicker as a result.
The number of channels increases with the computer’s number of CPU cores and network connections. If we upgrade from a four-core i5 processor to an i7 with 8 logical cores, we effectively increase from 4 to 8 lanes on our single 10GbE Ethernet link. If we add a second network link, Microsoft’s SMB3 implementation detects this and automatically adds more lanes. You can read more about SMB3 multichannel here.
In an impressive example of Microsoft’s SMB3 multichannel, the screen grab below shows 1480 MB/s in a real world copy/paste between two Windows 8.1 workstations, each sharing an 8 GB RAM drive. How was this possible? In a hands-down easiest network configuration ever, I simply connected the workstations with two 10GbE Ethernet cables using Intel X540-T2 dual-port 10GbE network interface cards. Zero configuration was required on either the workstations or 10GbE switch.
Wowza! 1480 MB/s transfer
For anyone who has configured NIC teaming or LACP link aggregation, the result above should amaze you. For really inexpensive high speed networking, add two eBay quad-port 1GbE cards plus four CAT5e LAN cables and you’re at 480 MB/s with zero configuration! In my 20 plus years of networking, I’ve never seen something just work like this.
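To put those two numbers in context, here is a rough Python sketch that compares a measured transfer rate against the raw line rate of the links involved. The function names are my own, and the calculation ignores protocol overhead entirely, so treat the percentages as ballpark figures only.

```python
def line_rate_mb_per_s(link_gbps: float, links: int = 1) -> float:
    """Raw line rate in decimal MB/s for a set of identical links."""
    return link_gbps * 1000 / 8 * links


def utilization(measured_mb_per_s: float, link_gbps: float, links: int) -> float:
    """Fraction of the raw line rate that a measured transfer actually used."""
    return measured_mb_per_s / line_rate_mb_per_s(link_gbps, links)


if __name__ == "__main__":
    # Transfer rates quoted in the article; link counts follow the setups described.
    print(f"2 x 10GbE: {line_rate_mb_per_s(10, 2):.0f} MB/s raw, "
          f"{utilization(1480, 10, 2):.0%} used at 1480 MB/s")
    print(f"4 x 1GbE:  {line_rate_mb_per_s(1, 4):.0f} MB/s raw, "
          f"{utilization(480, 1, 4):.0%} used at 480 MB/s")
```

The four-link 1GbE case runs close to line rate, while the dual 10GbE case leaves headroom, which is where the tuning discussed in the rest of this article comes in.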
Setting up Windows SMB3 10GbE
First of all, while testing, uninstall antivirus software. Don’t just disable it, uninstall it. Add it back in only once you are done with all other testing. Bitdefender, for example, in default configuration will dramatically slow down 10GbE connections. Microsoft Security Essentials seems to have little effect, but it’s not my recommended AV product either.
Jumbo frames and driver tuning should be on your list of things to do for best 10GbE performance. I typically never touched these settings in 1GbE environments; however, the images below make it very clear that jumbo frames and driver tuning are well worth the effort to configure. In the first test, I used default network driver settings and standard frames.
Performance with no tweaks
The second result shows clearly the performance increase to be had by tweaking network driver settings, and enabling jumbo frames.
Performance with all tweaks
As far as driver tweaks on Intel 10GbE NICs in the Windows environment, you may want to look at RSS queues, which, according to Intel, should be set to match your computer's logical core count. This may make a difference in SMB3 multichannel performance in Windows 8 and Server 2012. This Intel page describes Intel's 10GbE adapter advanced settings.
SMB3 has default values that can be tuned using PowerShell commands. By default, each NIC will get four TCP/IP sessions, up to a maximum of 8 channels per client/server connection. RSS tuning in turn will affect how this workload is distributed over the available CPU cores. I would suggest that you set up a few RAM drive shares and experiment with file copies in your environment to see what works best. If you are just setting up a 10GbE NAS in a MacOS environment, start with jumbo frames.
I also increased the transmit and receive buffers to their maximums, and enabled most performance options. These tweaks generally will use more RAM and CPU resources, but increase 10GbE transfer speeds. On a heavily loaded server, you may want to dial your settings back.
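As a quick way to reason about the session defaults mentioned above, this Python sketch estimates how many channels a client/server pair would end up with for a given number of NICs. It only encodes the two figures quoted in this article (four sessions per NIC, a cap of eight per connection); the constant and function names are hypothetical, and the real values on your system may differ if the defaults have been changed.

```python
SESSIONS_PER_NIC = 4   # per-NIC default quoted in this article (assumption)
MAX_CHANNELS = 8       # per client/server cap quoted in this article (assumption)


def estimated_channels(nic_count: int) -> int:
    """Estimate SMB3 multichannel sessions for a client/server pair."""
    if nic_count < 0:
        raise ValueError("nic_count must not be negative")
    # Each NIC contributes its sessions until the overall cap is reached.
    return min(nic_count * SESSIONS_PER_NIC, MAX_CHANNELS)


if __name__ == "__main__":
    for nics in (1, 2, 4):
        print(f"{nics} NIC(s): about {estimated_channels(nics)} channels")
```

With these figures, a second NIC doubles the channel count, while anything beyond two NICs runs into the cap.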
10GbE Tweaking Details
The following tweaks can increase 10GbE throughput by up to 200 MB/s. Start with the adjustments on Intel's X540-T1 NIC. If you have a -T2 NIC and are using both channels, be sure to make the changes on both sets of properties.
First, set jumbo packets (frames) to the maximum 9014 Bytes.
Intel X540 NIC tweak - jumbo frame
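To see why jumbo frames help, the Python sketch below compares standard and jumbo frames on a 10GbE link. It assumes the driver's 9014 byte setting includes the 14 byte Ethernet header (so the MTU is 9000), and that each frame carries IPv4 and TCP headers without options. Those assumptions, and all of the names used, are mine and not taken from Intel's documentation.

```python
ETH_HEADER = 14                           # destination MAC, source MAC, EtherType
WIRE_OVERHEAD = 8 + ETH_HEADER + 4 + 12   # preamble/SFD, header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20                  # IPv4 + TCP headers, no options (assumption)


def payload_efficiency(mtu: int) -> float:
    """Share of on-the-wire bytes that is actual TCP payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + WIRE_OVERHEAD)


def frames_per_second(mtu: int, link_gbps: float = 10.0) -> float:
    """Full-size frames per second needed to saturate the link."""
    return link_gbps * 1e9 / 8 / (mtu + WIRE_OVERHEAD)


if __name__ == "__main__":
    jumbo_mtu = 9014 - ETH_HEADER  # assumes the driver value includes the header
    for label, mtu in (("standard", 1500), ("jumbo", jumbo_mtu)):
        print(f"{label:8s} MTU {mtu}: {payload_efficiency(mtu):.1%} payload, "
              f"{frames_per_second(mtu):,.0f} frames/s at line rate")
```

The efficiency gain is only a few percent. The bigger win is that the sender and receiver handle several times fewer frames per second, which takes load off the CPU.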
Then set Receive Side Scaling (RSS) queues to match the CPU logical core count. On an i7 based computer with hyper-threading enabled (you may have to turn this on via the computer’s BIOS), you should see 8 logical cores.
Intel X540 NIC tweak - RSS queues
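If you are not sure how many logical cores your machine reports, a small Python sketch like the one below will tell you. The helper name is my own; it simply wraps the standard library's os.cpu_count(), which counts logical (not physical) processors.

```python
import os


def logical_core_count() -> int:
    """Return the number of logical CPUs, falling back to 1 if it is unknown."""
    # os.cpu_count() may return None when the count cannot be determined.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"Logical cores reported by the OS: {logical_core_count()}")
```

The driver may only offer a fixed list of queue counts, so pick the entry that matches this number or the closest one available.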
And make sure RSS is enabled.
Intel X540 NIC tweak - RSS enable
Next, disable Virtual Machine Queues if you see this option (Server 2012 only).
Intel X540 NIC tweak - Virtualization
Then increase receive buffers to the maximum (4096).
Intel X540 NIC tweak - RCV buffers
And also increase transmit buffers. Their maximum is 16384.
Intel X540 NIC tweak - XMIT buffers
Once you have made these changes, you should see some impressive speeds. Remember to create and share RAM disks on both your target and host machines while testing, to eliminate disk speed limitations.
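If you would rather have a number than watch the copy dialog, a minimal Python sketch along these lines can time a single file copy and report MB/s. The function name is hypothetical, and the demo at the bottom only copies a temporary file locally so that the script runs anywhere; for a real test you would point it at a file on your RAM disk and a path on the remote RAM disk share.

```python
import os
import shutil
import tempfile
import time


def copy_throughput_mb_per_s(src: str, dst: str) -> float:
    """Copy src to dst and return the achieved rate in decimal MB/s."""
    size_bytes = os.path.getsize(src)
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    # Guard against a zero reading from the timer on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size_bytes / 1e6 / elapsed


if __name__ == "__main__":
    # Local demo only: replace these paths with your RAM disk and share paths.
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "source.bin")
        dst = os.path.join(workdir, "copy.bin")
        with open(src, "wb") as handle:
            handle.write(os.urandom(16 * 1024 * 1024))  # 16 MiB of test data
        print(f"{copy_throughput_mb_per_s(src, dst):.0f} MB/s")
```

Use a test file of several gigabytes and repeat the run a few times, since operating system caching can inflate the result for small files.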
As a teaser, your 10GbE workstation will also work quite nicely as a 4K player, just in case you have a 4K display handy. For many of you, just purchasing a 10GbE-enabled NAS unit is all you’ll need for a server. Just remember to enable jumbo frames and check that SMB3 is enabled! On QNAP’s latest firmware, you’ll find those settings here:
QNAP Jumbo Frame setting
QNAP SMB3 enable
In the next installment of this series, I’ll show you how to affordably build a Windows 8.1 video editing workstation and a Windows Server 2012 shared storage server capable of maxing out a 10GbE connection.
Dennis Wood is Cinevate’s CEO, CTO, as well as Chief Cook and Bottle Washer. When not designing products, he’s likely napping quietly in the LAN closet.