Data Recovery Tales: RAID Is Not Backup


Tim Higgins

RAID, as you certainly already know, stands for Redundant Array of Independent Disks. It is a technology designed to improve redundancy and performance of a storage subsystem by combining multiple disks into a single storage unit.

Nowadays, devices utilizing RAID are quite common. Intel ICHxR-based mainboards allow for cheap RAID in desktop computers. You can also get RAID in direct-attached external storage boxes and, of course, in networked storage devices, i.e. NASes.

Any of these devices can be configured to provide a certain degree of fault tolerance, starting with RAID 1 for a two-disk device, to RAID 5, 6, 10 and beyond as the number of drives increases.

Using "fault-tolerant" RAID, the very name of which contains the word "redundant", often brings with it the temptation to ignore backups. This is wrong. In fact, RAID doesn’t by itself provide reliable data storage; it primarily reduces downtime when a disk fails.

So why isn’t RAID acceptable as a backup strategy? Here are five reasons.

1. Human error

There are many ways that you can do something wrong and lose data, but here are two common ones:

Accidental file deletion: We have all accidentally deleted a file or two (or more). In this case, deleting a file from a RAID array is no different than deleting it from a single drive. If you really need to recover the file and don’t have a backup, you can try data recovery software and generally use the same approach as if it were a single drive. The success rate varies with the filesystem type and overall situation, but it is nowhere near 100%.

Making a mistake when working with a RAID: This can be as simple as pulling a good disk in a failed array by accident. Other failure "opportunities" arise during resync of a failed array, RAID level migration and/or RAID expansion. The latter is particularly error-prone since it involves multiple disk swaps and resyncs. One wrong step and your data is gone.

Even with products that provide "automatic" RAID recovery, success is not guaranteed. Poor documentation and badly designed user feedback mechanisms (status and progress displays) can cause users to do the wrong thing at the wrong time and mistakenly kill the recovery process.

2. RAID controller / software failure

RAID arrays can be managed by dedicated hardware RAID controllers, RAID software or a combination of both. Any of these can fail. Data can be recovered, but restoring from a backup is significantly faster.

For example, if a controller fails, you need to either purchase exactly the same controller and try to recover the array in its original configuration, or recover the array parameters using special RAID recovery software. In the latter case, you also need to provide storage to hold the recovered array data.

Keep in mind that in both cases, recovery takes from several days to several weeks. To repeat: recovering from a backup is significantly faster! Although you might say "Oh, that’s all right, we will wait as long as necessary", in practice, it always turns out that the data is very important and needed right away. Once the actual problem happens, no one will be willing to wait a week.

Of course, there are cases where a malfunctioning controller scrambles data so badly that it cannot be "cured" by data recovery software.

3. Fire, flood or other calamity

Your RAID can have redundancy, a hot spare disk, protection against controller failure, a connection to a UPS, and so on. Nevertheless, your RAID (or data on a single drive) can be destroyed by fire or other calamity. In such a case, only regular backups stored off-site can recover lost data.

We had a case where a flood did not directly affect the storage arrays, but created enough humidity in the room to force a controller to initialize the disks without a command. Unusual, perhaps, but the data was still gone.

4. Theft, hacker attack, or other offensive action

Anything can be stolen and RAID is not an exception. As modern data storage devices become smaller, they also become easier to steal. Modern encryption systems may prevent a thief from accessing confidential data, but encryption doesn’t help you get your data back. As in the case of fire, flood or other calamity, the only thing that helps you recover data is a backup.

If you have ever been hacked, or even caught someone messing with your computer or NAS, you are confronted with a choice. You can go through your files one by one looking for lost or modified data. Or simply recover the data from a backup and go on with life.

In this case, it’s important to have more than one backup, or use versioned backup, in case the hack is subtle and remains undiscovered for days, if not weeks.

5. Multiple disk failures and URE

A RAID 5 array protects your data against a single disk failure, while RAID 6 can withstand up to two disk failures. If the disks fail independently, the probability that the second (or third) disk fails before the RAID is restored is negligible. In real life, however, disks can have much more in common than it might seem.
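To put a number on "negligible", here is a minimal sketch of the independent-failure case. It assumes a constant failure rate (exponential lifetimes), and the figures used (a 3% annual failure rate, a 24-hour rebuild, three surviving disks) are illustrative assumptions, not measurements from this article.

```python
import math

HOURS_PER_YEAR = 8760


def second_failure_probability(remaining_disks, annual_failure_rate, rebuild_hours):
    """Chance that at least one surviving disk fails during the rebuild window.

    Assumes disks fail independently at a constant rate, which is exactly
    the assumption that real-world arrays tend to violate.
    """
    # Convert the annual failure probability into a per-hour failure rate.
    hourly_rate = -math.log(1.0 - annual_failure_rate) / HOURS_PER_YEAR
    # Probability that none of the remaining disks survives the whole window.
    return 1.0 - math.exp(-remaining_disks * hourly_rate * rebuild_hours)


if __name__ == "__main__":
    # Hypothetical four-disk RAID 5 with one disk already failed.
    p = second_failure_probability(remaining_disks=3,
                                   annual_failure_rate=0.03,
                                   rebuild_hours=24)
    print(f"Second failure during rebuild: {p:.4%}")
```

Under these assumptions the result is a small fraction of one percent. The number is only as good as the independence assumption behind it.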

Disks used in a RAID are usually the same model, often from the same manufacturing batch and sometimes even with sequential serial numbers. All these disks work under the same load and are subjected to the same environment – temperature, vibration, and power spikes. More than that, if a disk has some factory defect, as in the Seagate 7200.11 disks, the entire set is likely to develop the defect nearly simultaneously.

In a RAID 5, you can also encounter the so-called URE (Unrecoverable Read Error) problem: a noticeable probability that a read error occurs while the array is being rebuilt after a disk failure. However, modern drives are reliable enough that URE ranks no higher than third among the causes of RAID recovery cases, after human error and multiple disk failures.
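The URE concern is usually illustrated with a calculation like the sketch below. It assumes a per-bit error rate of 1 in 10^14, a figure often quoted on consumer drive spec sheets, and a hypothetical array with three surviving 2 TB disks. Spec-sheet rates are upper bounds, so the result is best read as a pessimistic estimate, in line with the observation above that URE is rarely the leading cause in practice.

```python
import math


def ure_probability(bytes_to_read, errors_per_bit=1e-14):
    """Chance of at least one unrecoverable read error while reading the data.

    Treats every bit read as an independent trial with the given error
    rate (an assumed spec-sheet value, not a measured one).
    """
    bits = bytes_to_read * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 for numerical accuracy.
    return -math.expm1(bits * math.log1p(-errors_per_bit))


if __name__ == "__main__":
    # A rebuild must read every surviving disk in full.
    surviving_disks = 3
    disk_bytes = 2 * 10**12  # 2 TB per disk (assumed)
    p = ure_probability(surviving_disks * disk_bytes)
    print(f"Chance of a URE during rebuild: {p:.1%}")
```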

Help From The Cloud

Full-blown off-site backups are always burdensome to maintain, and typically you do not have a dedicated person to task with this. Instead, you might consider using one of the many cloud backup services for your absolutely-can’t-afford-to-lose data. The primary benefit of these services is that, once set up, they are continuous and automatic.

However, like all backups, cloud backup must be periodically checked. You should be sure that the backups are actually being created, and you are really able to restore data from them. Only then will you know that your data is safe.
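One way to check that a restore really works is to restore a sample into a scratch folder and compare it with the live data. The sketch below does this with SHA-256 hashes. The function names and folder layout are hypothetical, and it assumes the live files have not changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """List (relative path, problem) pairs for files that did not restore intact."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            problems.append((relative.as_posix(), "missing"))
        elif file_digest(source_file) != file_digest(restored_file):
            problems.append((relative.as_posix(), "content differs"))
    return problems
```

An empty list means every file in the source folder was found in the restored copy with identical content. Anything else is worth investigating before you need the backup for real.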

Elena Pakhomova does both marketing and development for data recovery software company
