Data deduplication is a hot new storage technology for managing explosive data growth and providing data protection.
This backup technique eliminates redundant data from storage by saving a single copy of identical data and replacing every further instance with a pointer back to that one copy.
Here’s a simple example: Say 500 people receive a company-wide e-mail with a 1 megabyte (MB) attachment. If each recipient saves that attachment locally, it is replicated 500 times when those desktops are backed up, consuming 499 MB more backup space than necessary.
Data deduplication backs up just one instance of the attachment's data and replaces the other 499 instances with pointers back to that copy.
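To make the pointer idea concrete, here is a minimal Python sketch of whole-file deduplication. It assumes SHA-256 content hashes serve as the "pointers" and treats 1 MB as 2**20 bytes; the class name and file paths are hypothetical and not taken from any product.

```python
import hashlib


class FileDedupStore:
    """Hypothetical store: one copy of each distinct file body, pointers for the rest."""

    def __init__(self):
        self.chunks = {}   # SHA-256 hex digest -> stored bytes (the single copy)
        self.catalog = {}  # file path -> digest (the pointer back to that copy)

    def save(self, path, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.chunks:
            self.chunks[digest] = data  # first instance: store the bytes
        self.catalog[path] = digest     # every instance: record a pointer

    def restore(self, path):
        return self.chunks[self.catalog[path]]

    def stored_bytes(self):
        return sum(len(d) for d in self.chunks.values())


attachment = b"\x00" * (1024 * 1024)  # stand-in for the 1 MB attachment
store = FileDedupStore()
for i in range(500):  # 500 desktops each saving the same attachment
    store.save(f"desktop{i:03d}/attachment.pdf", attachment)

logical = 500 * len(attachment)
physical = store.stored_bytes()
print(logical // 2**20, physical // 2**20)  # 500 1
print((logical - physical) // 2**20)        # 499 MB saved
```

Every desktop can still restore its own copy, because each catalog entry points at the single stored body.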
The technology also works at a second level: if a change is made to the original file, data deduplication saves only the block or blocks of data actually altered. (A block is typically tiny, somewhere between 2 kilobytes (KB) and 10 KB of data.)
If that 1 MB attachment is a presentation and its title changes, deduplication saves only the new title, usually in a single 4 KB data block, with pointers back to the first iteration of the file. Thus, only 4 KB of new backup data is retained.
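The block-level behavior can be sketched the same way. This minimal example assumes fixed-size 4 KB blocks identified by SHA-256 hashes and uses random bytes as a stand-in for the presentation (`random.Random.randbytes` requires Python 3.9 or later); the class is hypothetical, and real products may size and identify blocks differently.

```python
import hashlib
import random

BLOCK_SIZE = 4 * 1024  # assumed fixed 4 KB blocks, as in the example above


class BlockDedupStore:
    """Hypothetical store: each distinct block is kept once; a backup is a list of block hashes."""

    def __init__(self):
        self.blocks = {}    # SHA-256 hex digest -> block bytes
        self.versions = []  # one "recipe" (ordered list of digests) per backup

    def backup(self, data):
        """Back up data and return the number of NEW bytes actually written."""
        recipe, new_bytes = [], 0
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # unseen block: store it
                self.blocks[digest] = block
                new_bytes += len(block)
            recipe.append(digest)           # seen or not, keep a pointer
        self.versions.append(recipe)
        return new_bytes

    def restore(self, version):
        return b"".join(self.blocks[d] for d in self.versions[version])


rng = random.Random(0)
original = rng.randbytes(1024 * 1024)   # stand-in for the 1 MB presentation
edited = b"NEW TITLE " + original[10:]  # overwrite the first 10 bytes (the "title")

store = BlockDedupStore()
print(store.backup(original))  # 1048576: every block is new
print(store.backup(edited))    # 4096: only the block holding the title changed
```

The sketch works this neatly only because the new title overwrites bytes in place. If an edit inserted bytes and shifted everything after it, every later fixed-size block would change; this is one reason many hash-based chunking schemes use variable, content-defined block boundaries instead.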
When used in conjunction with other methods of data reduction, such as conventional data compression, data deduplication can cut data volume even further, helping you:
- Save money with lower disk investments
- Improve capacity utilization
- Rely less on tape backup
- Recover faster after an outage
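A rough way to see how the two techniques stack: deduplicate first, then compress only the unique blocks that remain. This sketch assumes fixed 4 KB blocks and uses Python's standard `zlib` module for the conventional compression step; the function name and the workload are hypothetical, and actual reduction ratios depend heavily on the data.

```python
import hashlib
import zlib


def measure_reduction(files, block_size=4096):
    """Return (logical, after_dedup, after_dedup_and_compression) byte counts.

    Deduplicates fixed-size blocks across all files first, then compresses
    each remaining unique block with zlib.
    """
    unique = {}  # SHA-256 digest -> block bytes
    logical = 0
    for data in files:
        logical += len(data)
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]
            unique.setdefault(hashlib.sha256(block).digest(), block)
    after_dedup = sum(len(b) for b in unique.values())
    after_compression = sum(len(zlib.compress(b)) for b in unique.values())
    return logical, after_dedup, after_compression


# Hypothetical workload: ten desktops backing up the same text report.
report = "".join(f"line {i}: quarterly figures unchanged\n" for i in range(5000)).encode()
logical, deduped, compressed = measure_reduction([report] * 10)
print(logical, deduped, compressed)
```

Deduplication removes the nine redundant copies; compression then shrinks the one copy that is left.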
When it comes to data deduplication, one size does not fit all. That’s why it is important to consider a solution’s approach at three levels before making a decision:
- Where does data deduplication occur? Does it occur at the source, such as a server, or at the target, such as a virtual tape library or other storage device? A source-based approach results in less data being sent to the backup target, potentially shortening backup windows. A target-based approach is well suited to a virtual tape device, which can augment archived tape backups by keeping a longer period of backups immediately available, speeding up data retrieval.
- When does deduplication happen? In target-based implementations, data can either be backed up first and then deduplicated (post-process), or deduplicated during the backup process (inline). Each method has pros and cons: post-process deduplication may result in a faster backup, but inline deduplication uses less disk space, since duplicates are removed before backup data is written to disk.
- How does deduplication happen? Object-level differencing compares yesterday’s backup with today’s, saving only the data that has changed. Hash-based chunking locates global redundancies among all files in a backup. Some technologies deduplicate only at the file level, a technique that yields smaller space savings.
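The article does not say how hash-based chunking picks chunk boundaries. The sketch below assumes content-defined chunking with a Gear-style rolling hash, a common technique but not necessarily the one any particular vendor uses: a boundary falls wherever the hash of the last 32 bytes has its low 12 bits equal to zero, so boundaries follow the content rather than fixed offsets. Chunk hashes go into one index shared by every file, which is what lets the method find redundancy across all files in a backup. The mask, the minimum and maximum chunk sizes, and the file names are illustrative assumptions.

```python
import hashlib
import random

# Assumed parameters: ~4 KB average chunks, bounded to 1-16 KB.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # fixed pseudo-random byte table
MASK = (1 << 12) - 1
MIN_CHUNK, MAX_CHUNK = 1024, 16 * 1024


def chunk(data):
    """Split bytes at content-defined boundaries using a Gear rolling hash.

    Each step shifts the 32-bit hash left by one bit, so bytes older than 32
    positions are masked off: the hash depends only on the last 32 bytes.
    """
    start, h = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        size = i + 1 - start
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]


def dedup_backup(files):
    """Chunk every file and keep one copy of each distinct chunk across ALL files."""
    index = {}    # SHA-256 digest -> chunk bytes (global, shared by every file)
    recipes = {}  # file name -> ordered list of digests (pointers)
    for name, data in files.items():
        recipe = []
        for c in chunk(data):
            digest = hashlib.sha256(c).digest()
            index.setdefault(digest, c)
            recipe.append(digest)
        recipes[name] = recipe
    return index, recipes


def restore(index, recipe):
    return b"".join(index[d] for d in recipe)


# Hypothetical example: a 256 KB file and a copy with 100 bytes inserted mid-file.
rng = random.Random(7)
base = rng.randbytes(256 * 1024)
mid = len(base) // 2
edited = base[:mid] + b"X" * 100 + base[mid:]
index, recipes = dedup_backup({"report_v1": base, "report_v2": edited})
stored = sum(len(c) for c in index.values())
print(len(base) + len(edited), stored)
```

Because boundaries resynchronize shortly after an edit, inserting 100 bytes in the middle of the file adds only a few new chunks to the index, rather than changing every block after the insertion point as fixed-size blocks would.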
The best approach to data deduplication depends on your company size and backup needs.
For small and midsize businesses, hash-based chunking, or dynamic deduplication, is a superior method because it focuses on compatibility and cost. It delivers a low-cost, small-footprint, format-independent solution.
Many vendors offer only one method or the other: object-level differencing or hash-based chunking. However, each method has strengths and weaknesses that vary from one environment to another. So be sure to choose a company that puts a range of deduplication options at your disposal.