2/27/2009: Revisions from vendor feedback
If you have or are thinking of buying a BYOD (Bring Your Own Drive) NAS, you might think that you can throw in any ol’ drive that fits. But NAS vendors spend a lot of time and effort to come up with lists of qualified or approved drives for their products.
But these lists don’t include every possible drive. So you are left with the question: Should I stick with my NAS vendor’s list of approved drives?
I queried three vendors who put their products’ reputations on the line by producing only BYOD NASes: Synology, Thecus and QNAP. I also contacted NETGEAR, whose ReadyNAS line depends on a mix of factory- and user-installed drives.
I asked them all three questions:
- Why do vendors need to qualify drives for NASes?
- What does their drive qualification consist of?
- What risk do users take from not using vendor qualified drives?
The main reasons given for the need to qualify drives were compatibility, reducing troubleshooting and support variables, reliability and eliminating known problem drives. Of these, compatibility and reliability are probably the chief concerns. The next section describing the qualification procedures will clarify these further.
NETGEAR and QNAP also pointed out the need to ensure a match between a drive’s intended application and its use. NETGEAR noted that its ReadyNAS home (Duo, NV+ and Pro Pioneer Edition) and business versions carry three- and five-year warranties respectively, so they choose their factory-installed drives accordingly. This can result in noisier business systems that run hotter, but that’s the price paid for performance.
A visit to the ReadyNAS Hard Disk Compatibility List page, however, doesn’t break out the drives according to product line. NETGEAR said the list is applicable to both product lines, but typically is used by people buying the empty chassis "home" products. They plan to establish separate home and business approved drive lists as they broaden their "business" product line and qualify new drives, and as drive vendors continue to separate their own product lines.
QNAP said that drive vendors’ application recommendations do affect which drives make it onto their approved drive lists. They didn’t get into specifics, citing confidentiality issues.
Thecus cited four main reasons for qualification: drive firmware bugs, power consumption, thermal issues and SATA spec compliance. They mentioned two notable cases of problems with Western Digital and Seagate drives that had to be addressed via drive firmware upgrades. For power, startup draw is a bigger concern than steady-state draw, since the power spike produced during drive spin-up can keep drives from coming up in time to be mounted, resulting in RAID failure.
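To make the startup-versus-steady-state point concrete, here is a minimal sketch of the arithmetic. The current figures, the supply rating and the names `DrivePower` and `rail_load` are all made up for illustration; none of them come from Thecus or from any drive data sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrivePower:
    model: str
    spinup_amps: float  # peak 12 V current during spin-up (hypothetical figure)
    steady_amps: float  # 12 V current once at speed (hypothetical figure)


def rail_load(drives, supply_amps):
    """Compare total 12 V draw at startup and at steady state with the supply rating.

    Assumes every drive spins up at the same moment, which is the worst case.
    """
    startup = sum(d.spinup_amps for d in drives)
    steady = sum(d.steady_amps for d in drives)
    return {
        "startup_amps": startup,
        "steady_amps": steady,
        "startup_ok": startup <= supply_amps,
        "steady_ok": steady <= supply_amps,
    }


# Four identical drives on a supply assumed to deliver 6 A on the 12 V rail.
drives = [DrivePower("example-drive", spinup_amps=2.0, steady_amps=0.5)] * 4
print(rail_load(drives, supply_amps=6.0))
```

With these assumed numbers the four drives sit comfortably within the supply once they are spinning (2 A total), but ask for 8 A if they all start at once. That is the kind of gap that only shows up at power-on.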
The SATA spec compliance concern stems from host-controller workarounds for non-compliant drives. According to Thecus, while all SATA drives claim SATA I or II compatibility, "almost all" SATA host controller device drivers implement workarounds for some drives and don’t work at all with other "black list" drives.
If one of the black list drives is used, it may fail to work or be detected by some SATA host controllers. And even if a drive is in a controller’s "workaround table", it may not be able to operate at full speed with that controller. So to avoid possible data loss or performance degradation, these drives aren’t included in approved drive lists.
Qualification processes tended to reflect the reasons given for the need to qualify. Synology was the least forthcoming with specifics, saying that the information is proprietary. But they said they test "several behavioral and physical attributes such as RAID, hibernation, stress and power behavior, thermal generation and length of time the drive lasts under a data burn".
When I asked if they could provide specifics about problems or disqualifiers, Synology said that they have "yet to find a specific characteristic that disqualifies a drive" and that drives may have "new behaviors" including power requirements, disk parameter changes, and "green drive" spin up/down rates.
2/27/2009: Since no other vendor provided as much qual process detail as NETGEAR, I have revised the description at NETGEAR’s request.
NETGEAR, on the other hand, was the most forthcoming with information about their qual process. They provided a detailed list of the 14 tests in their process, which includes checks for proper XRAID and XRAID2 initialization and expansion, operating temperature and noise level, hot-swap, drive spin-down and restart, and long-term data integrity.
I asked for more detail about the XRAID and XRAID2 checks and found they include verifying that a secure erase command (done on previously-used drives) properly executes in a live RAID environment. They also check that partitions are created properly and that the RAID, LVM and file system layers are created correctly during the actual expansion process.
QNAP’s test process includes:
- Make a RAID array
- Checking HDD standby & resuming from standby
- Hot-swap check
- Checking HDD SMART and volume information/status
- File copy test
The SMART test mainly checks that the information is correctly passed from the hard drive. QNAP doesn’t do anything with the information in its NAS OS besides report it. They also use "some popular PC-based SMART applications" to verify that SMART information is reported properly. In one case, they said, SMART testing turned up a drive that unmounted during the test.
Other issues that QNAP’s qual process has found are abnormal volume status reports, drives that won’t go into standby or come out of standby and drives that aren’t consistently recognized during boot. The last issue is caused by drives that take longer to spin up than the mount delay built into the NAS.
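The spin-up versus mount-delay problem is easy to picture with a few numbers. The sketch below is only an illustration: the 10-second mount delay, the sample spin-up times and the function name `recognition_rate` are assumptions, not values or tools that QNAP described.

```python
def recognition_rate(spinup_times_s, mount_delay_s):
    """Fraction of boots on which the drive was ready before the NAS tried to mount it."""
    if not spinup_times_s:
        raise ValueError("need at least one spin-up measurement")
    in_time = sum(1 for t in spinup_times_s if t <= mount_delay_s)
    return in_time / len(spinup_times_s)


# Spin-up times in seconds over four boots (hypothetical measurements).
samples = {
    "drive_a": [7.5, 8.0, 7.8, 8.1],    # always well inside the delay
    "drive_b": [9.6, 10.4, 9.9, 10.8],  # hovers around the delay
}

for name, times in samples.items():
    print(name, recognition_rate(times, mount_delay_s=10.0))
```

A drive like the hypothetical `drive_b` would be seen on some boots and missed on others, which matches the intermittent behavior described above and explains why a single successful boot doesn’t prove much.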
Thecus’ main qualification process seems to focus on RAID creation, migration and rebuild. Drives are put into a system and the following checks are run:
- Create a RAID 5 array
- Perform RAID migration to stress every disk drive
- Force a degraded RAID (pull and reinsert a drive) and let it auto rebuild
- Power cycle the NAS 50 times and verify that RAID status comes up healthy
- Create RAID 0 with highest capacity HDD to check RAID capacity limit
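The power-cycle step in the list above lends itself to automation. The following is a rough sketch of such a loop, not Thecus’ actual tooling: the `power_cycle` and `raid_status` callables and the `FakeNas` stand-in are hypothetical, and a real harness would drive a switched power outlet and query the NAS itself.

```python
from typing import Callable, List


def power_cycle_test(power_cycle: Callable[[], None],
                     raid_status: Callable[[], str],
                     cycles: int = 50) -> List[int]:
    """Power cycle the NAS `cycles` times.

    Returns the 1-based cycle numbers on which the RAID status was not "healthy".
    """
    failures = []
    for n in range(1, cycles + 1):
        power_cycle()
        if raid_status() != "healthy":
            failures.append(n)
    return failures


class FakeNas:
    """Stand-in for real hardware so the sketch runs anywhere."""

    def __init__(self, degraded_on=()):
        self.boots = 0
        self.degraded_on = set(degraded_on)

    def power_cycle(self):
        self.boots += 1

    def raid_status(self):
        return "degraded" if self.boots in self.degraded_on else "healthy"


good = FakeNas()
flaky = FakeNas(degraded_on={17, 42})
print(power_cycle_test(good.power_cycle, good.raid_status))    # []
print(power_cycle_test(flaky.power_cycle, flaky.raid_status))  # [17, 42]
```

Recording which cycles failed, rather than stopping at the first failure, makes it easier to tell a drive that fails once from one that fails at random.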
Thecus also does an "eat your own dog food" test (my description, not Thecus’) to ensure RAID stability by fully loading up a NAS with the highest capacity drives available (1.5 TB currently) and using it as their main file server.
So that hot new 1.5 TB drive isn’t on your NAS’ approved drive list. But you bought it anyway, installed it and it seems to work fine. So, no problem, right? Or even worse, you stuck to the drives on the approved list, but still had a mysterious RAID array failure. So why bother with the list?
The main reason is that drives on the list have made it through a series of checks and tests, while drives not on the list have not. Although I have shown that there is a wide variation in qual processes, NAS vendors do know where to look for problems and have better access to drive vendor technical resources than you or I. So by using an approved drive you are taking advantage of that knowledge and vetting process and improving your odds of trouble-free operation.
The main risk of using unapproved drives, all vendors said, comes over the long term. Many drives can seem to work fine while they are new. But over time as they age or as volumes become more fragmented, unapproved drives can cause increased errors, slower performance and even RAID rebuilds if a drive drops out of an array. If you’re lucky, the errors will be recoverable. But if you’re not, and you don’t have a backup, your data may be gone.
The bottom line is that qualified or approved drive lists are not a guarantee of trouble-free operation. But they are a no-cost way of raising your odds of having a trouble-free NAS.