Stop redundancy with deduplication

One size does not fit all
Data deduplication is a hot new storage technology for managing explosive data growth and providing data protection. This backup technique eliminates redundant data from storage by saving a single copy of identical data and replacing any further instances with pointers back to that one copy.

A Simple Example

Here’s a simple example: Say 500 people receive a company-wide e-mail with a 1 megabyte attachment. If each recipient saves that attachment locally, it is replicated 500 times when those desktops are backed up—consuming 499 MB more backup space than necessary.

Data deduplication backs up just one instance of the attachment's data and replaces the other 499 instances with pointers back to that copy.
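
The e-mail example can be sketched in a few lines of Python. This is only an illustration of the single-copy-plus-pointers idea, not any product's implementation: the function name `backup_files`, the use of SHA-256 digests as pointers, and the file names are all assumptions made for the sketch.

```python
import hashlib

MB = 1024 * 1024

def backup_files(files):
    """Keep one copy of each distinct content; every file name gets a pointer to it."""
    store = {}     # digest -> content (the single stored copy)
    pointers = {}  # file name -> digest (the pointer back to that copy)
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # stored only the first time it is seen
        pointers[name] = digest
    return store, pointers

# Hypothetical stand-in for the 1 MB attachment saved by 500 recipients.
attachment = bytes(MB)
files = {f"user{i:03d}/attachment.ppt": attachment for i in range(500)}

store, pointers = backup_files(files)
logical_mb = sum(len(data) for data in files.values()) // MB
stored_mb = sum(len(data) for data in store.values()) // MB
print(logical_mb, stored_mb, logical_mb - stored_mb)  # prints: 500 1 499
```

The 500 pointers still take a little space of their own, which the sketch ignores.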

The technology also works at a second level: If a change is made to the original file, data deduplication saves only the block or blocks of data actually altered. (A block is typically tiny, somewhere between 2 KB and 10 KB of data.)

If the title of our 1 MB presentation changes, then deduplication would save only the new title, usually in a 4 KB data block, with pointers back to the first iteration of the file. Thus, only 4 KB of new backup data is retained.
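
A minimal sketch of that block-level behavior, assuming fixed 4 KB blocks and SHA-256 digests as pointers (real products may use variable-size blocks and other hashes). The helper names `split_blocks`, `backup`, and `restore` are hypothetical, and the title edit is assumed to overwrite bytes in place without shifting the rest of the file.

```python
import hashlib

BLOCK_SIZE = 4 * 1024  # assumed 4 KB block size

def split_blocks(data, size=BLOCK_SIZE):
    """Cut the data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def backup(data, store):
    """Store only unseen blocks; return the pointer list and the bytes newly stored."""
    recipe, new_bytes = [], 0
    for block in split_blocks(data):
        digest = hashlib.sha256(block).digest()
        if digest not in store:
            store[digest] = block
            new_bytes += len(block)
        recipe.append(digest)  # pointer to the stored block
    return recipe, new_bytes

def restore(recipe, store):
    """Rebuild a file version by following its pointers."""
    return b"".join(store[digest] for digest in recipe)

# Hypothetical 1 MB file made of 256 distinct 4 KB blocks.
version1 = b"".join(i.to_bytes(4, "big") * 1024 for i in range(256))
# The "title change": overwrite the first few bytes, same overall length.
version2 = b"New title" + version1[9:]

store = {}
recipe1, new1 = backup(version1, store)
recipe2, new2 = backup(version2, store)
print(new1, new2)  # prints: 1048576 4096
```

Both versions remain fully restorable from the shared store, even though the second backup added only one 4 KB block.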

When used in conjunction with other methods of data reduction, such as conventional data compression, data deduplication can cut data volume even further, helping you:

  • Save money with lower disk investments
  • Improve capacity utilization
  • Rely less on tape backup
  • Recover faster after an outage
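
To illustrate how deduplication and conventional compression stack, here is a rough sketch that deduplicates three identical backups and then compresses each unique block with `zlib`. The synthetic data, the 4 KB block size, and the helper name `make_block` are assumptions for the example; actual savings depend entirely on the data.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed block size

def make_block(i):
    """Hypothetical compressible 4 KB block of repeated text, distinct for each i."""
    return (f"record {i:06d} ".encode() * 400)[:BLOCK_SIZE]

one_backup = b"".join(make_block(i) for i in range(64))
backups = [one_backup] * 3  # three backups of unchanged data

# Step 1: deduplicate, keeping one copy of each distinct block.
store = {}
for data in backups:
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        store.setdefault(hashlib.sha256(block).digest(), block)

# Step 2: compress each unique block that remains.
raw_bytes = sum(len(data) for data in backups)
dedup_bytes = sum(len(block) for block in store.values())
compressed_bytes = sum(len(zlib.compress(block)) for block in store.values())

print("raw:", raw_bytes)
print("after deduplication:", dedup_bytes)
print("after deduplication + compression:", compressed_bytes)
```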

A Little Myth-busting

It might seem that squeezing more data into less space would mean there's more room to cram in new data, but that's not how data deduplication works. Because the technology replaces repeated data with pointers, the ratio of backed-up data to disk space consumed increases with each backup.

Adding more unique data, however, doesn't take advantage of the space-saving pointers. (See "Assessing Data Deduplication Efficiency" for more on ratios.) What deduplication does make possible is storing more backups for a longer time in the same amount of space.

That means a faster recovery when you need an older version of data (as opposed to retrieving a tape from a remote site). But it doesn't necessarily translate into freeing up room for more unique data.
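
The arithmetic behind this can be sketched as follows. The figures (a 1,000 GB full backup with 10 GB changing between backups) and the function name `dedup_ratio` are assumptions chosen for illustration, not measurements.

```python
def dedup_ratio(num_backups, full_gb, changed_gb):
    """Ratio of data backed up to disk space consumed after num_backups full backups.

    Assumes the first backup is stored in full and each later backup
    adds only the changed_gb of data that is new since the previous one.
    """
    logical_gb = num_backups * full_gb
    physical_gb = full_gb + (num_backups - 1) * changed_gb
    return logical_gb / physical_gb

# Mostly repeated data: the ratio climbs with every retained backup.
for n in (1, 10, 30):
    print(n, round(dedup_ratio(n, 1000, 10), 2))
# prints:
# 1 1.0
# 10 9.17
# 30 23.26

# Entirely unique data in every backup: pointers save nothing.
print(dedup_ratio(30, 1000, 1000))  # prints: 1.0
```

In the first case, 30 backups fit in 1,290 GB instead of 30,000 GB. In the second, no space is saved at all, which is why deduplication extends backup retention rather than freeing room for new unique data.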

Comparing Technologies

When it comes to data deduplication, one size does not fit all. That's why it is important to consider a solution’s approach on three levels before making a decision:

  • Where does data deduplication occur? Does it occur at the source, such as a server, or at the target, such as a virtual library or other storage device? A source-based approach results in less data being backed up, potentially shortening backup windows. A target-based approach is well-suited for a virtual tape device that can augment archived tape backups by keeping a longer period of backups immediately available, speeding up data retrieval.
  • When does deduplication happen? In target-based implementations, data can either be backed up first, then deduplicated (post-process); or deduplication can be executed during the backup process (inline). Each method has pros and cons. Post-process deduplication may result in a faster backup, but inline uses less disk space, since deduplication occurs before backup data is written to disk.
  • How does deduplication happen? Object-level differencing compares yesterday’s backup with today’s, saving only data that has changed. Hash-based chunking locates global redundancies among all files in a backup. Some technologies perform differencing only at the file level, a technique that yields smaller space savings.
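
The disk-space difference between inline and post-process deduplication can be sketched as follows. This is a simplified model, assuming fixed 4 KB blocks and an in-memory dictionary standing in for the target's disk; the function names `inline_backup` and `postprocess_backup` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size

def blocks(data):
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def stored_bytes(store):
    return sum(len(block) for block in store.values())

def inline_backup(data, store):
    """Deduplicate during the backup: only unseen blocks are ever written."""
    for block in blocks(data):
        store.setdefault(hashlib.sha256(block).digest(), block)
    return stored_bytes(store)  # peak disk usage equals the final usage

def postprocess_backup(data, store):
    """Write the whole backup first, then deduplicate it afterwards."""
    peak = stored_bytes(store) + len(data)  # raw backup sits on disk before dedup
    for block in blocks(data):
        store.setdefault(hashlib.sha256(block).digest(), block)
    return peak

# Hypothetical data: yesterday's 1 MB backup, and today's with one block changed.
yesterday = b"".join(i.to_bytes(4, "big") * 1024 for i in range(256))
today = b"\xff" * BLOCK_SIZE + yesterday[BLOCK_SIZE:]

inline_store = {}
inline_backup(yesterday, inline_store)
post_store = dict(inline_store)  # both targets start from the same state

inline_peak = inline_backup(today, inline_store)
post_peak = postprocess_backup(today, post_store)
print(inline_peak, post_peak)  # prints: 1052672 2097152
```

Both targets end up holding the same deduplicated data; the post-process target simply needs enough temporary space to hold the raw backup first. The sketch does not model backup speed, which is where post-process may have the advantage.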

Which Approach Is Best for My Business?

The best approach to data deduplication depends on your company size and backup needs.

For small and midsize businesses, hash-based chunking, or dynamic deduplication, is a superior method because it focuses on compatibility and cost. It delivers a low-cost, small-footprint, format-independent solution.

Many vendors offer only one method or the other—object-level differencing or hash-based chunking. However, the two methods offer different strengths and weaknesses in differing environments. So, be sure to choose a company that puts a range of deduplication options at your disposal.

© 2010 Hewlett-Packard Development Company, L.P.