Why ZFS?: Pools & Deduplication

Tim Higgins

Editor’s Note: Infortrend is the only manufacturer we know of using ZFS in a commercially produced NAS. So we asked them if they would like to make the case for their selection of ZFS, especially since our testing showed the performance penalty that this filesystem is known for.

We realize that this is providing Infortrend a “bully pulpit” that could be viewed as self-promotion. So we welcome opposing fact-based opinions in the Forums or even as its own article(s).


This is part two of a series discussing ZFS and its applications for network attached storage servers. Previously, we gave a quick overview of ZFS, explaining its main strengths, such as ensuring total data integrity, as well as its weaknesses, such as high CPU and RAM requirements. In this article, we will discuss two specific features of ZFS: storage pools and data deduplication.

Storage Pools

Traditional file systems are built around single storage devices, with volume managers, partitioning and provisioning used to manage storage space across multiple devices. ZFS fundamentally changes this by using storage pools, which are composed of the hard drives, partitions, files and other storage devices connected to the system.

Within a storage pool (or "zpool", as it is also called) are "vdevs", which consist of files, hard drive partitions or entire hard drives (the last being the most recommended option). These vdevs can be configured in many different ways before being added to the zpool, such as without redundancy or as mirrors, depending on needs. Zpools can be expanded at any time by adding vdevs. Similar to adding RAM to a computer, additional vdevs become available automatically, with no further configuration needed.

The benefits of storage pools are that they maximize storage space, speed and availability while removing the complexity of volume managers. Storage pools can also contain hot spares that compensate for failing disks. However, individual vdevs should have redundancy, because if a single vdev is lost, the entire zpool is lost with it.
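The pool-of-vdevs idea can be illustrated with a short sketch. This is a toy model for illustration only, not actual ZFS code; the class names and the simplified mirror math are our own assumptions.

```python
# Toy model of a ZFS-style storage pool: a pool aggregates vdevs,
# and adding a vdev immediately grows the pool's usable capacity.
# Illustrative only -- real ZFS manages far more (metadata, redundancy
# groups, allocation) than this sketch shows.

class Vdev:
    def __init__(self, disks_gb, mirrored=False):
        self.disks_gb = disks_gb      # capacities of member disks, in GB
        self.mirrored = mirrored

    @property
    def capacity_gb(self):
        # A mirror's usable capacity is that of its smallest disk;
        # a non-redundant vdev exposes the sum of its disks.
        return min(self.disks_gb) if self.mirrored else sum(self.disks_gb)

class Zpool:
    def __init__(self):
        self.vdevs = []

    def add_vdev(self, vdev):
        # Like adding RAM, new capacity becomes available at once,
        # with no repartitioning or volume-manager reconfiguration.
        self.vdevs.append(vdev)

    @property
    def capacity_gb(self):
        return sum(v.capacity_gb for v in self.vdevs)

pool = Zpool()
pool.add_vdev(Vdev([1000, 1000], mirrored=True))  # 2-disk mirror -> 1000 GB usable
pool.add_vdev(Vdev([2000]))                       # single disk   -> 2000 GB usable
print(pool.capacity_gb)  # 3000
```

Note that the second vdev here has no redundancy; as described above, losing it would take the whole pool with it, which is why redundant vdevs are recommended.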


Data Deduplication

Data deduplication is a safe and efficient way to optimize storage capacity, making it one of the key features in ZFS. Before writing a block, ZFS generates a checksum of the data; if a block with a matching checksum already exists in the pool, ZFS stores a reference to the existing block instead of writing a duplicate copy. Avoiding duplicate data saves space and improves system performance.

ZFS’s deduplication is an inline process, occurring as the data is written rather than as a potentially time-consuming post-process. ZFS’s innate data integrity measures also greatly reduce the likelihood that non-duplicate data will be corrupted. Moreover, data deduplication scales with the total size of the ZFS pool.
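The checksum-based inline dedup described above can be sketched in a few lines. This is a minimal illustration of the idea, not ZFS's actual implementation; the real deduplication table (DDT) and on-disk format are far more sophisticated, and the class here is entirely hypothetical.

```python
import hashlib

# Minimal sketch of inline, block-level deduplication: each block is
# checksummed before it is "written"; if the checksum is already known,
# only a reference to the existing block is recorded.

class DedupStore:
    def __init__(self):
        self.blocks = {}  # checksum -> block data (unique blocks only)
        self.refs = []    # logical write order, recorded as checksums

    def write(self, block: bytes):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = block  # new data: store it once
        self.refs.append(digest)         # duplicates cost only a reference

store = DedupStore()
store.write(b"attachment.pdf chunk")
store.write(b"attachment.pdf chunk")  # duplicate: not stored a second time
store.write(b"unique chunk")
print(len(store.refs), len(store.blocks))  # 3 logical writes, 2 unique blocks
```

Because the check happens at write time, duplicate data never hits the disk, which is what distinguishes inline dedup from post-process schemes that scan and clean up afterwards.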

When used in an application with a high potential for duplicated data, such as a file sharing or email server, the savings in storage space and the gains in performance can be quite significant.

The downside to data deduplication is the same as that of ZFS itself: CPU and memory requirements. For data deduplication to run efficiently, some experts recommend between 1 and 2 GB of RAM for every 1 TB of storage. One way to ensure high-speed deduplication is to use SSDs to store the ZFS intent log and Adaptive Replacement Cache, as previously discussed in Part 1.
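The rule of thumb quoted above translates into a simple sizing calculation. The helper function below is our own illustration, assuming the 1-2 GB per TB guideline; actual RAM needs depend on block size and the size of the deduplication table.

```python
# Rough RAM budget for ZFS dedup, using the commonly cited rule of
# thumb of 1-2 GB of RAM per 1 TB of deduplicated storage.
# A sizing sketch only; real requirements vary with block size and
# the number of entries in the dedup table.

def dedup_ram_estimate_gb(storage_tb, gb_per_tb=(1, 2)):
    low, high = gb_per_tb
    return storage_tb * low, storage_tb * high

low, high = dedup_ram_estimate_gb(24)  # e.g. a hypothetical 24 TB pool
print(f"{low}-{high} GB of RAM recommended")  # 24-48 GB
```

When the dedup table outgrows RAM, every write can trigger extra disk reads, which is why offloading to fast SSDs, as mentioned above, helps keep performance acceptable.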

William Chen is Director of SMB Product Strategy for Infortrend, a Taiwan-based manufacturer of high-performance networked storage systems.
