Data Deduplication
Avamar differs from traditional backup and restore solutions by identifying and storing only unique, sub-file data objects. Redundant data is identified at the source, drastically reducing the amount of backup data that travels across the network to be stored and managed by the backup host. When storing data objects, Avamar takes maximum advantage of inherent hard-disk characteristics. Avamar also creates and stores “trees” that link all data objects from a single backup. These “trees” are used to re-create files for restore.
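The idea of storing each unique data object once and keeping a per-backup structure that links back to those objects can be sketched in a few lines. This is a minimal, hypothetical model, not Avamar's actual on-disk format. `DedupStore`, `put`, `backup`, and `restore` are illustrative names. The "tree" here is simplified to a one-level map from file name to an ordered list of SHA-256 digests. Files are cut into fixed 4-byte pieces only to keep the example short; Avamar itself uses variable-size objects.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content address of a data object (SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """Hypothetical content-addressed store: each unique data object is kept once."""

    def __init__(self):
        self.objects = {}  # digest -> bytes

    def put(self, data: bytes) -> str:
        key = digest(data)
        self.objects.setdefault(key, data)  # store only if not already present
        return key

    def backup(self, files: dict, chunk_size: int = 4) -> dict:
        """Return a simplified one-level 'tree': file name -> ordered object digests."""
        tree = {}
        for name, data in files.items():
            chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
            tree[name] = [self.put(c) for c in chunks]
        return tree

    def restore(self, tree: dict, name: str) -> bytes:
        """Re-create a file by following its digests back to the stored objects."""
        return b"".join(self.objects[k] for k in tree[name])


store = DedupStore()
files = {"a.txt": b"AAAABBBBCCCC", "b.txt": b"AAAABBBBDDDD"}
tree = store.backup(files)
print(len(store.objects))            # 4 unique objects: AAAA, BBBB, CCCC, DDDD
print(store.restore(tree, "b.txt"))  # b'AAAABBBBDDDD'
```

Backing up the same files a second time would add no new objects, only a new tree pointing at the existing ones.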
Data deduplication, or single instance storage, reduces storage needs by identifying duplicate or redundant data so that only unique data is stored on the storage media. The level at which deduplication is applied determines its granularity. The three levels of data deduplication are file level, fixed block, and variable block.
File level deduplication helps organizations reduce storage needs for file servers by identifying duplicate files within hard disk volumes and providing an efficient mechanism for consolidating them. The most common implementation of single instance storage is at the file level. With this method, even a single change in a file causes the entire file to be identified as unique. As shown in the example, if a backup environment contains 5 versions of a file, all 5 files are stored in their entirety.
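The behavior described above can be illustrated with a minimal sketch that keys each file by a hash of its full contents. The function name `file_level_store` and the choice of SHA-256 are assumptions for illustration only. Five versions that each differ by one byte are all stored whole, while an exact duplicate copy is stored only once.

```python
import hashlib


def file_level_store(files):
    """Keep one copy of each distinct file, keyed by the SHA-256 of its full contents."""
    store = {}
    for data in files:
        store.setdefault(hashlib.sha256(data).hexdigest(), data)
    return store


base = bytes(1000)  # a 1,000-byte file of zeros
# Five versions, each differing from the base by a single byte at a different offset
versions = [base[:i] + b"\x01" + base[i + 1:] for i in range(5)]
files = versions + [versions[0]]  # plus one exact duplicate copy

store = file_level_store(files)
print(len(store))                           # 5 unique files
print(sum(len(d) for d in store.values()))  # 5000 bytes stored
```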
Fixed block deduplication, also called fixed length deduplication, is commonly employed in snapshot and replication technologies. This method breaks a file into fixed length sub-objects. Because the block boundaries are set by byte offset rather than by content, inserting or deleting even a few bytes shifts every boundary after the change. As a result, all fixed length segments following the change can appear new, even though very little of the dataset has actually changed.
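A small sketch makes the boundary-shift problem concrete. It assumes a tiny, purely illustrative 8-byte block size (`BLOCK_SIZE`) and hashes blocks with SHA-256; the helper names are hypothetical. Overwriting one byte in place changes only one block, but inserting one byte at the front shifts every boundary, so none of the blocks match what was already stored.

```python
import hashlib

BLOCK_SIZE = 8  # tiny, illustrative block size; real systems use much larger blocks


def fixed_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-length blocks at offsets 0, size, 2*size, ..."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def new_blocks(old: bytes, new: bytes) -> int:
    """Count blocks of `new` whose digest is not already stored for `old`."""
    stored = {hashlib.sha256(b).digest() for b in fixed_blocks(old)}
    return sum(1 for b in fixed_blocks(new) if hashlib.sha256(b).digest() not in stored)


original = bytes(range(64))                            # 8 blocks of 8 bytes
overwritten = original[:10] + b"\xff" + original[11:]  # change 1 byte in place
inserted = b"\xff" + original                          # insert 1 byte at the front

print(new_blocks(original, overwritten))  # 1
print(new_blocks(original, inserted))     # 9
```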
Variable block level deduplication determines segment boundaries from the data itself, placing them at repeatable points in the content rather than at fixed offsets. This yields greater granularity in identifying duplicate data and avoids the main inefficiencies of file level and fixed block level deduplication. With variable block level deduplication, a change in a file typically causes only the variable-sized block or blocks containing the change to be identified as unique. As a result, more data is recognized as common, and in the case of backup there is less data to store, because only the unique blocks are backed up. This is the method used by Avamar.
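Content-defined chunking is one common way to find repeatable boundary points. The sketch below swaps in a Rabin-Karp polynomial rolling hash over a 16-byte window and ends a chunk whenever that hash is divisible by 64. It is an illustrative stand-in, not Avamar's actual algorithm. `WINDOW`, `AVG`, `BASE`, `MOD`, and `cdc_chunks` are hypothetical names and parameters. Because boundaries depend only on the bytes inside the window, inserting one byte disturbs at most the chunks near the insertion; the chunks before and after it are unchanged.

```python
import random

WINDOW = 16          # bytes in the sliding window (illustrative)
AVG = 64             # boundary when hash % AVG == 0, giving ~64-byte average chunks
BASE = 257
MOD = (1 << 31) - 1  # prime modulus for the polynomial hash


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a Rabin-Karp rolling hash.

    A position ends a chunk when the hash of the last WINDOW bytes is divisible
    by AVG, so boundaries depend on local content rather than on byte offsets.
    """
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # append the newest byte
        if h % AVG == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk
    return chunks


rng = random.Random(42)
original = bytes(rng.getrandbits(8) for _ in range(8192))
edited = original[:100] + b"\xff" + original[100:]  # insert one byte

stored = set(cdc_chunks(original))
new = [c for c in cdc_chunks(edited) if c not in stored]
print(len(stored), "unique chunks stored;", len(new), "new chunks after the insert")
```

Production implementations typically also enforce minimum and maximum chunk sizes; those limits are omitted here to keep the boundary logic easy to follow.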