Data Deduplication and Storage Spaces in Windows Server 2012

Data Deduplication and Storage Spaces are new features in Windows Server 2012 – so, what do they do?

As Microsoft continues to promote its mantra of “do more with less” in a difficult economic climate, Windows Server 2012 introduces significant new features, and improvements to existing capabilities, that will help organizations reduce costs. In this first article covering new features in Windows Server 2012, I’m going to take a closer look at two of them: Data Deduplication and Storage Spaces.

Data Deduplication

While many enterprise-class storage area networks (SANs) have included data deduplication for some time, Windows Server 2012 now includes data deduplication out of the box. This shouldn’t be confused with Single Instance Storage (SIS), found in earlier versions of Windows Server, which only consolidated whole files that were identical. Data Deduplication works at the block (sub-file) level, so it can also remove duplicate data from files that are not identical, making it a more efficient system.
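To make the difference between file-level and block-level deduplication concrete, here is a minimal Python sketch. It assumes fixed-size 4-byte blocks purely for readability (Windows Server 2012 actually splits files into larger, variable-size chunks), and the function names are hypothetical. It counts how many bytes each approach would need to keep for two files that differ only in their last block.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example stays readable


def file_level_bytes(files):
    """SIS-style: identical files are stored once; any difference means a full copy."""
    unique_files = {hashlib.sha256(data).hexdigest(): len(data) for data in files}
    return sum(unique_files.values())


def block_level_bytes(files, block_size=BLOCK_SIZE):
    """Block-level: each distinct block is stored once, whichever file it came from."""
    unique_blocks = {}
    for data in files:
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            unique_blocks[hashlib.sha256(block).hexdigest()] = len(block)
    return sum(unique_blocks.values())


file_a = b"AAAABBBBCCCCDDDD"
file_b = b"AAAABBBBCCCCEEEE"  # differs from file_a only in its last block
print(file_level_bytes([file_a, file_b]))   # 32
print(block_level_bytes([file_a, file_b]))  # 20
```

File-level consolidation gains nothing here because the two files aren’t identical, while the block-level approach stores the three shared blocks only once.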

Data Deduplication identifies chunks of data on a volume that exist multiple times and stores a single copy of each, replacing the duplicated data with stubs (NTFS reparse points) that transparently redirect users to the actual data. Windows Server 2012 scans volumes for duplicate data as a low-priority background job using minimal CPU resources, so the overall server workload isn’t noticeably affected.
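The single-copy-plus-stub arrangement can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Microsoft’s implementation: `ChunkStore`, `optimize` and `read` are hypothetical names, chunks are a fixed 4 bytes, and each “stub” is a plain list of chunk hashes standing in for the reparse point.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical; the real feature uses much larger, variable-size chunks


class ChunkStore:
    """Holds exactly one copy of each distinct chunk, keyed by its SHA-256 hash."""

    def __init__(self):
        self.chunks = {}

    def put(self, chunk: bytes) -> str:
        key = hashlib.sha256(chunk).hexdigest()
        self.chunks.setdefault(key, chunk)  # a duplicate chunk is not stored again
        return key


def optimize(data: bytes, store: ChunkStore) -> list:
    """Replace a file's contents with a stub: an ordered list of chunk references."""
    return [store.put(data[i:i + CHUNK_SIZE]) for i in range(0, len(data), CHUNK_SIZE)]


def read(stub: list, store: ChunkStore) -> bytes:
    """Follow the stub's references to reassemble the original bytes."""
    return b"".join(store.chunks[key] for key in stub)


store = ChunkStore()
report_stub = optimize(b"ABCDABCDABCDWXYZ", store)
backup_stub = optimize(b"ABCDWXYZ", store)
print(len(store.chunks))                                # 2
print(read(report_stub, store) == b"ABCDABCDABCDWXYZ")  # True
```

Reading through a stub returns the original bytes, so deduplication is invisible to whoever opens the file, while the store holds 8 bytes for 24 bytes of logical data.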

Enabling data deduplication in Windows Server 2012

Data Deduplication in Windows Server 2012 works with NTFS volumes; it cannot be enabled on ReFS (Resilient File System) volumes or on Cluster Shared Volumes (CSV). Because much of today’s data is stored in Exchange or SharePoint databases, which are not good candidates for deduplication, the biggest gain will come from reducing the storage costs of virtualization libraries (VHD files), for which Microsoft quotes a 20:1 ratio. That’s not to say that deduplication isn’t worth considering on file servers, where a 2:1 ratio is possible. Data Deduplication also integrates with BranchCache, reducing the bandwidth required to send data over wide area network (WAN) links.
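The quoted ratios translate into disk space as physical size = logical size ÷ ratio, which is a saving of 1 − 1/ratio. A small Python helper (hypothetical names, applied to an arbitrary 2,000 GB of logical data) shows what 20:1 and 2:1 mean in practice; real savings depend entirely on how much duplication your data contains.

```python
def physical_size(logical_gb: float, ratio: float) -> float:
    """Space actually consumed on disk for a given logical:physical ratio."""
    return logical_gb / ratio


def savings_percent(ratio: float) -> float:
    """Share of space saved, as a percentage: 1 - 1/ratio."""
    return (1 - 1 / ratio) * 100


for label, ratio in [("VHD library", 20), ("file server", 2)]:
    print(f"{label}: 2000 GB -> {physical_size(2000, ratio):.0f} GB, "
          f"{savings_percent(ratio):.0f}% saved")
# VHD library: 2000 GB -> 100 GB, 95% saved
# file server: 2000 GB -> 1000 GB, 50% saved
```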

Storage Spaces

Improving on the software RAID capabilities found in previous versions of Windows Server, Storage Spaces allows organizations to create resilient storage from just a bunch of disks (JBOD), potentially removing the need to purchase expensive hardware RAID controllers. The disks in a storage pool can be USB, SATA, SCSI or Serial Attached SCSI (SAS) drives, but remember that overall I/O will be limited by the hardware you use. Don’t expect a parity space built on USB 2.0 disks to perform like dedicated hardware RAID.

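Windows Server 2012 offers two resilience modes for a storage space: mirroring, which writes each block to more than one disk, and parity, which stores parity information so that lost data can be rebuilt, using less capacity than a mirror at the cost of slower writes. Other resilience features built into Storage Spaces include reserving disks as hot spares, intelligent error correction, and maintaining data integrity in the event of a power outage or cluster failover.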
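Parity resilience relies on a simple property: XOR-ing the surviving blocks of a stripe with its parity block reproduces the missing block. The Python sketch below illustrates that idea for a single-parity stripe, together with a rough usable-capacity comparison of a two-way mirror and single parity. The stripe layout, the helper names and the simple `(disks - 1) / disks` capacity formula are assumptions for illustration; Storage Spaces’ actual on-disk layout and capacity calculations differ in detail.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (hypothetical helper)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# A hypothetical stripe: three data blocks on three disks, parity on a fourth.
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(data)

# Simulate losing the disk holding data[1]: XOR everything that survived.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True


def usable_capacity(raw_tb, mode, disks):
    """Rough usable capacity; a simplification, not Storage Spaces' real formula."""
    if mode == "mirror":
        return raw_tb / 2  # two-way mirror: every block is written twice
    if mode == "parity":
        return raw_tb * (disks - 1) / disks  # one disk's worth of space holds parity
    raise ValueError(f"unknown mode: {mode}")


print(usable_capacity(4, "mirror", 4), usable_capacity(4, "parity", 4))  # 2.0 3.0
```

The trade-off shows in the numbers: with four 1 TB disks, a two-way mirror leaves 2 TB usable while single parity leaves 3 TB, but every parity write also means recalculating and writing the parity block.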

Thin provisioning allows system administrators to create storage spaces whose logical size exceeds the physical capacity of the pool; physical space is consumed only as data is actually written. This gives organizations the flexibility to make the best use of the available physical space and to expand storage pools by adding disks as required.
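Thin provisioning boils down to allocating physical capacity lazily, on first write. In the minimal Python sketch below, `Pool`, `ThinSpace` and the 1 GB `SLAB_GB` allocation unit are hypothetical: a 10 GB space is carved from a pool with only 2 GB of physical disk, writes consume a slab only when they touch a new region, and the pool refuses further allocations until another disk is added.

```python
SLAB_GB = 1  # hypothetical allocation unit


class Pool:
    """Tracks physical capacity and how much of it has been handed out."""

    def __init__(self, physical_gb: int):
        self.physical_gb = physical_gb
        self.allocated_gb = 0

    def allocate(self, gb: int) -> None:
        if self.allocated_gb + gb > self.physical_gb:
            raise RuntimeError("pool out of physical capacity: add disks")
        self.allocated_gb += gb

    def add_disk(self, gb: int) -> None:
        self.physical_gb += gb


class ThinSpace:
    """A logical space whose size may exceed the pool's physical capacity."""

    def __init__(self, pool: Pool, logical_gb: int):
        self.pool = pool
        self.logical_gb = logical_gb
        self.slabs = set()  # slabs that already have physical backing

    def write(self, offset_gb: int) -> None:
        if not 0 <= offset_gb < self.logical_gb:
            raise ValueError("offset beyond logical size")
        slab = offset_gb // SLAB_GB
        if slab not in self.slabs:
            self.pool.allocate(SLAB_GB)  # physical space consumed only on first write
            self.slabs.add(slab)


pool = Pool(physical_gb=2)
space = ThinSpace(pool, logical_gb=10)  # 10 GB promised from 2 GB of disk
space.write(0)
space.write(0)  # same slab: no new allocation
space.write(7)
print(pool.allocated_gb)  # 2
try:
    space.write(9)
except RuntimeError as err:
    print(err)  # pool out of physical capacity: add disks
pool.add_disk(4)
space.write(9)
print(pool.allocated_gb)  # 3
```

The failed write illustrates the flip side of the flexibility: an over-committed pool that runs out of physical capacity cannot accept new data, so pool usage needs watching.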