In my last post, I described some reasonably priced hardware to use for your first 10GbE network. But before you build a 10GbE network, you’ll want to assemble a toolkit of programs to help test, configure and verify your 10GbE performance.
NTttcp runs on two workstations, one in sender mode and one in receiver mode, to quickly test 10GbE throughput. Once you have downloaded the tool and copied it to a directory (my example uses c:\nettest), you can use the commands below on each workstation.
On both machines, open a command prompt with administrative privileges. Run this command on the “sender”:
c:\nettest\ntttcp.exe -s -m 16,*,192.168.0.140 -l 128k -a 2 -t 20
And this command on the “receiver”:
c:\nettest\ntttcp.exe -r -m 16,*,192.168.0.140 -l 128k -a 2 -t 20
Note that the receiver machine in this case has an IP address of 192.168.0.140. Both commands specify the receiver’s IP address, including the one run on the sender. NTttcp has many more options, but the commands above will produce output like that shown below.
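If you run this test often, a small script can keep the two command lines identical between runs. The sketch below is a hypothetical Python helper (the function name and defaults are mine, not part of NTttcp) that rebuilds exactly the commands shown above, using only the switches they already contain: -s or -r for sender or receiver, -m for the threads,CPU,receiver-IP mapping, -l for buffer length, -a for outstanding asynchronous buffers, and -t for test duration in seconds.

```python
# Hypothetical helper that rebuilds the NTttcp command lines shown above.
# Only switches already used in the article appear: -s/-r, -m, -l, -a, -t.
NTTTCP = r"c:\nettest\ntttcp.exe"


def ntttcp_command(role: str, receiver_ip: str, threads: int = 16,
                   buffer: str = "128k", async_buffers: int = 2,
                   seconds: int = 20) -> str:
    """Return the NTttcp command for the 'sender' or 'receiver' machine."""
    flags = {"sender": "-s", "receiver": "-r"}
    if role not in flags:
        raise ValueError("role must be 'sender' or 'receiver'")
    # -m threads,CPU,address: '*' leaves the CPU unpinned, and both
    # sides name the receiver's IP address.
    mapping = f"{threads},*,{receiver_ip}"
    return (f"{NTTTCP} {flags[role]} -m {mapping} -l {buffer} "
            f"-a {async_buffers} -t {seconds}")


if __name__ == "__main__":
    for role in ("receiver", "sender"):
        print(role, "->", ntttcp_command(role, "192.168.0.140"))
```

Pasting the printed lines into each machine's elevated prompt keeps thread count, buffer size and duration matched on both ends, which matters when you compare runs after a driver change.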
NTttcp test result
As the output shows, we’re using jumbo frames, average throughput is 1181MB/s (the maximum I have observed for a single 10GbE connection), and CPU usage stayed around 10% during the test.
If you see numbers well below this, a network cable, connection or NIC is the likely culprit. NTttcp is also an effective way to check whether a Cat5e cable run can sustain 10GbE speeds.
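To put 1181MB/s in context, it helps to convert it to bits and compare it with the 10Gb/s line rate. This sketch assumes NTttcp’s MB/s means 10^6 bytes per second (if it means 2^20 bytes, the figure is about 5% higher), and the 90% cutoff is a hypothetical rule of thumb of my own, not an NTttcp feature.

```python
# Sketch: compare an NTttcp throughput figure with the 10GbE line rate.
# Assumption: "MB/s" is read as 10**6 bytes per second.
LINE_RATE_GBPS = 10.0


def to_gbps(mb_per_s: float) -> float:
    """Convert decimal megabytes per second to gigabits per second."""
    return mb_per_s * 8 / 1000


def link_looks_healthy(mb_per_s: float, min_fraction: float = 0.9) -> bool:
    """Hypothetical rule of thumb: flag results under 90% of line rate."""
    return to_gbps(mb_per_s) >= min_fraction * LINE_RATE_GBPS


print(f"{to_gbps(1181):.2f} Gb/s")  # 1181 * 8 / 1000 = 9.448
print(link_looks_healthy(1181))     # True: about 94.5% of line rate
print(link_looks_healthy(600))      # False: 4.8 Gb/s
```

The gap between 9.45Gb/s and 10Gb/s is mostly Ethernet, IP and TCP framing overhead, which jumbo frames reduce but cannot remove.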
10GbE networks can move a lot of data quickly. Consequently, unless you are running a large hard disk RAID array (8 or more disks) or an SSD RAID (4 or more disks), your computer’s storage will likely be the bottleneck when you test 10GbE transfer rates. Ramdisk software lets you create a small but extremely fast “hard disk” from some of your computer’s RAM. On a computer with 16GB of RAM, for example, you can assign 12GB to a ramdisk, format it, and then share it like any other drive.
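The RAID rule of thumb above comes down to simple division. This sketch assumes perfectly linear RAID 0 scaling and an illustrative 150MB/s of sustained sequential throughput per hard disk (a placeholder, not a measured figure); under those assumptions it arrives at the same eight-disk figure.

```python
import math

# Sketch: how many drives must stream in parallel to keep up with 10GbE?
# Assumptions: RAID 0 scales linearly; per-drive rates are placeholders.
TARGET_MB_S = 1181  # best single-link NTttcp result quoted above


def drives_needed(per_drive_mb_s: float, target_mb_s: float = TARGET_MB_S) -> int:
    """Smallest drive count whose combined throughput reaches the target."""
    return math.ceil(target_mb_s / per_drive_mb_s)


print(drives_needed(150))  # ceil(1181 / 150) = ceil(7.87) = 8
```

Real arrays rarely scale perfectly, and small-file or random workloads fall far short of sequential rates, so treat the result as a minimum.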
SoftPerfect RAM Disk
With ramdisks configured on both workstations, you can accurately assess 10GbE performance without the limitations of local hard drives. I found that simply copying files between the host and target ramdisks was a quick, easy test for tuning network drivers. A ramdisk on a workstation is also an easy way to expose the physical disk performance limits of a target NAS or server RAID over 10GbE.
SoftPerfect RAM Disk is easy to use and fast. Note, however, that not all ramdisk software is created equal, and performance varies. For example, I found ImDisk to be about 20% slower than SoftPerfect’s free offering.
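Copy/paste in Explorer gives you a rate by eye; if you want a number you can log between driver tweaks, a few lines of Python will do. This is a minimal sketch: the R: ramdisk letter and the \\target\ramdisk share in the usage line are hypothetical, and shutil.copyfile may not move data exactly the way Explorer does, so compare runs of the script against each other rather than against Explorer.

```python
import os
import shutil
import sys
import time


def timed_copy(src, dst):
    """Copy src to dst; return throughput in MB/s (10**6 bytes per second)."""
    size = os.path.getsize(src)
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start
    return size / elapsed / 1e6


if __name__ == "__main__" and len(sys.argv) == 3:
    # Source and destination paths come from the command line.
    rate = timed_copy(sys.argv[1], sys.argv[2])
    print(f"{rate:.0f} MB/s")
```

Usage, with a large test file already on the local ramdisk:
python timed_copy.py R:\test.bin \\target\ramdisk\test.bin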
I found HWiNFO worked best for checking the actual speed and version of a motherboard’s PCIe slots. Why is this important? PCIe is the highway to your computer’s CPU. Plugging a 10GbE card into the wrong slot could significantly limit its performance and drive you a bit crazy trying to figure out why. If you are adding a 10GbE card to a workstation, or building a new one, the information HWiNFO provides is important!
PC motherboards have a limited number of PCIe slots, and these run at different speeds, often depending on what else is plugged in. Video cards, PCIe SSDs, RAID cards and 10GbE cards all need PCIe slots, and each typically uses x4, x8, or x16 PCIe lanes to get data to the CPU.
Depending on your motherboard chipset, you may only have 16 total “lanes” of PCIe highway available to your CPU. If you plan on running dual video cards and a 10GbE card, you should consider an X79 chipset motherboard, like Asus’ P9X79 WS board, which has 32 PCIe lanes available.
You will notice that this board has six PCIe slots and requires an LGA 2011 processor. These processors have more pins on their underside than the typical LGA 1150/1155 chip, which lets them support 32 PCIe lanes vs. the typical 16.
PCIe slots can be a bit confusing, as several specifications (PCIe 1.x, 2.0, and 3.0) dictate how fast each lane runs, and slots come in different widths. To add insult to injury, slots of the same width (like the x16 slots used by most video cards these days) can, and often do, run at different speeds when multiple slots are populated. Note that any x1, x2, x4, or x8 card will work fine plugged into an x16 slot.
Based on testing and research, here are a few helpful tidbits:
- Most video cards are quite happy running in a PCIe v2 or v3 slot at x8 speed, even though they are sold as x16 cards
- A single port X540 PCIe NIC requires a PCIe v2 slot running at minimum x4 speed
- A dual port X540 PCIe NIC requires a PCIe v2 slot running at minimum x8 speed
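The x4 and x8 minimums follow from per-lane bandwidth. PCIe 1.x and 2.0 use 8b/10b encoding at 2.5 and 5 GT/s, while PCIe 3.0 uses 128b/130b encoding at 8 GT/s. The sketch below multiplies that out and finds the narrowest slot that carries every port at the full 10Gb/s line rate. It ignores PCIe packet overhead, so real headroom is somewhat smaller. The X540 is itself a PCIe 2.x device, so it links at 2.0 rates even in a 3.0 slot; the Gen 3 entry is included for comparison only.

```python
# Per-lane, per-direction data rate after line encoding, in Gb/s.
# PCIe packet/protocol overhead is ignored, so real headroom is smaller.
LANE_GBPS = {
    1: 2.5 * 8 / 10,     # PCIe 1.x: 2.5 GT/s, 8b/10b   -> 2.0
    2: 5.0 * 8 / 10,     # PCIe 2.0: 5 GT/s, 8b/10b     -> 4.0
    3: 8.0 * 128 / 130,  # PCIe 3.0: 8 GT/s, 128b/130b  -> ~7.88
}
WIDTHS = (1, 2, 4, 8, 16)


def slot_gbps(gen: int, lanes: int) -> float:
    """One-direction bandwidth of a slot, before packet overhead."""
    return LANE_GBPS[gen] * lanes


def min_lanes(gen: int, ports: int, port_gbps: float = 10.0) -> int:
    """Narrowest standard width that carries every port at line rate."""
    needed = ports * port_gbps
    for width in WIDTHS:
        if slot_gbps(gen, width) >= needed:
            return width
    raise ValueError("no standard slot width is wide enough")


print(min_lanes(2, ports=1))  # 4: x4 = 16 Gb/s, x2 = only 8 Gb/s
print(min_lanes(2, ports=2))  # 8: x8 = 32 Gb/s, x4 = only 16 Gb/s
```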
So why all the techno-babble above? Suppose your computer has only 16 PCIe lanes available to the CPU and two x16 PCIe slots. Your video card sits in one slot, normally running quite happily at x16 speed. Now add an X540 card to the second slot, and your motherboard will dutifully cut the video card slot to x8 to give the 10GbE NIC eight lanes of its own.
You might despair at the thought of your fancy video card running at x8 instead of x16. However, many gaming and rendering tests online show little to no performance difference. On the other hand, if you already have two video cards and a RAID card installed, you may not be able to add a 10GbE network card without running into PCIe speed issues.
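Whether a build fits is ultimately lane arithmetic. This is a deliberately simplified sketch: the x8 widths assigned to each card are illustrative assumptions, and real boards wire lanes to slots in fixed patterns (like the x16 to x8/x8 split described above) and may connect some slots through the chipset instead of the CPU, so your motherboard manual has the final word.

```python
from typing import Dict


def lane_budget(cards: Dict[str, int], cpu_lanes: int) -> int:
    """Spare CPU lanes after every card gets its width (negative = over budget)."""
    return cpu_lanes - sum(cards.values())


# Illustrative widths only; check your own cards and board manual.
build = {"video card 1": 8, "video card 2": 8, "RAID card": 8, "X540 NIC": 8}

print(lane_budget(build, cpu_lanes=16))  # -16: will not fit
print(lane_budget(build, cpu_lanes=32))  # 0: fits with nothing to spare
```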
Your motherboard manual should document all of this, and HWiNFO will report your actual slot speeds and capabilities. Keep in mind that you will often need to enter the computer’s BIOS to set slot speeds, particularly with shared PCIe slot configurations. Make sure you check your PCIe slot speeds!
Parkdale is a simple utility for measuring drive write and read performance, and it works on mapped network drives. Because it copies data back and forth between targets, it is not a synthetic benchmark. However, it is limited to fixed file sizes. I found it quick and useful for comparative testing.
Here’s a look at a local SSD RAID 0 drive array using Parkdale.
Parkdale test result
ATTO Disk Benchmark
ATTO Disk Benchmark is commonly used for benchmarking hard disks, so why are we talking about it here? The program can be fooled into treating a network drive as local, which gives a good idea of how a RAID array accessed over 10GbE will perform.
The SUBST command can create a drive mapping that ATTO will see as local. Assuming drive letter M: is not already assigned, the command below (run from a command prompt) will do the trick:
subst m: \\NAS_name\disk_share
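If you script your benchmark runs, a hypothetical helper like the one below can pick an unused drive letter before mapping the share. On Windows it treats any letter whose root path exists as taken (a disconnected network mapping may not show up, in which case SUBST simply reports an error); \\NAS_name\disk_share is the same placeholder used above.

```python
import os
import string
from typing import Iterable, Set


def used_letters() -> Set[str]:
    """Letters whose root (e.g. C:\\) exists are treated as taken (Windows)."""
    return {c for c in string.ascii_uppercase if os.path.exists(f"{c}:\\")}


def first_free_letter(used: Iterable[str], preferred: str = "M") -> str:
    """Return preferred if free, else the first free letter from D to Z."""
    taken = {u.upper() for u in used}
    if preferred.upper() not in taken:
        return preferred.upper()
    for letter in string.ascii_uppercase[3:]:  # skip A-C
        if letter not in taken:
            return letter
    raise RuntimeError("no free drive letters")


def subst_command(letter: str, share: str) -> str:
    """Build a SUBST command line like the one shown above."""
    return f"subst {letter.lower()}: {share}"


if __name__ == "__main__":
    letter = first_free_letter(used_letters())
    print(subst_command(letter, r"\\NAS_name\disk_share"))
```

When you are finished testing, subst m: /d removes the mapping.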
The screengrab below shows a six-disk RAID 0 array analyzed by ATTO Disk Benchmark over a 10GbE connection.
ATTO test result
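ATTO’s value here is its sweep across transfer sizes. For a rough, scriptable cross-check, the Python sketch below writes and then reads a test file at several block sizes and reports MB/s for each. It is not ATTO’s method: the block sizes and the 256MB total are arbitrary choices, and reads may be served from the client’s file cache, so trust the write numbers more than the reads.

```python
import os
import sys
import time

# Transfer sizes to test (arbitrary picks spanning small to large I/O).
SIZES = [4 * 1024, 64 * 1024, 1024 * 1024, 8 * 1024 * 1024]


def write_all(f, data):
    """Unbuffered writes may be partial, so loop until every byte is out."""
    view = memoryview(data)
    while view:
        view = view[f.write(view):]


def sweep(path, total_bytes=256 * 1024 * 1024, sizes=SIZES):
    """Write, then read, about total_bytes at each transfer size.

    Returns {size: (write_mb_s, read_mb_s)} using 10**6-byte megabytes.
    """
    results = {}
    for size in sizes:
        block = os.urandom(size)  # random data, so compression can't shrink it
        count = max(1, total_bytes // size)

        start = time.perf_counter()
        with open(path, "wb", buffering=0) as f:  # one write call per block
            for _ in range(count):
                write_all(f, block)
            os.fsync(f.fileno())  # flush to the target before stopping the clock
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb", buffering=0) as f:
            while f.read(size):  # read back in blocks of the same size
                pass
        read_s = time.perf_counter() - start

        mb = count * size / 1e6
        results[size] = (mb / write_s, mb / read_s)
    os.remove(path)
    return results


if __name__ == "__main__" and len(sys.argv) == 2:
    for size, (w, r) in sweep(sys.argv[1]).items():
        print(f"{size // 1024:>6} KB  write {w:8.1f} MB/s  read {r:8.1f} MB/s")
```

Pointed at the SUBST drive, for example python sweep.py M:\sweep_test.bin, it exercises the NAS share over 10GbE.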
Intel NAS Performance Toolkit (NASPT)
Many published NAS tests (including those here on SmallNetBuilder) use Intel’s NASPT. It uses real-world (although dated) application traces to evaluate network storage performance. Here’s an analysis of QNAP’s TS-470 Pro over 10GbE:
Intel NASPT test result
One problem with this tool is that client system RAM over 2 GB will skew some benchmark results considerably, because the extra memory lets Windows cache test data locally. If you want realistic NASPT results, limit your system RAM to 2 GB with a tool such as msconfig (type “msconfig” at the Run prompt, then set Maximum memory under Boot > Advanced options), as illustrated below:
Setting Windows memory limit
Adobe PPBM5 and PPBM6
If you are interested in evaluating how well video editing and encoding might work over a 10GbE network using Adobe products, Bill Gehrke and Harm Millaard’s Premiere Pro Benchmark for CS5 (PPBM5) and Premiere Pro Benchmark for CS6 (PPBM6) tools are worth checking out. I’ll cover my results testing with these tools later in this series.
You can download the older benchmark at http://ppbm5.com/Instructions.html or the newer version (which takes longer to run) at http://ppbm7.com/index.php/homepage/instructions
That’s it for now. Next time, we’ll look at how QNAP’s TS-470 Pro does as a 10GbE-connected NAS.
Dennis Wood is Cinevate’s CEO, CTO, as well as Chief Cook and Bottle Washer. When not designing products, he’s likely napping quietly in the LAN closet.