Client-Side Deduplication
Deduplication is a technique that stores each unique part of the data only once and reuses it wherever the same data appears again.
The backup format uses client-side deduplication. This approach brings the following benefits:
- Client-side deduplication is much faster than server-side deduplication
- Deduplication is not affected by internet connection issues
- Reduced internet traffic
- The ability to purge unnecessary data
- A server-side deduplication database grows constantly, which can significantly increase costs. Client-side deduplication uses local resources only.
How It Works
Regardless of the backup type, the first backup is always a full backup. Because backup jobs run regularly and the source data changes between runs, subsequent backups are usually incremental and depend on the full backup and on the previous incremental backups.
The new backup format is designed so that backup plans are fully independent of one another: each backup plan has its own deduplication database. Moreover, each backup plan generation also has its own deduplication database.
When a backup plan runs, the application reads the backup data in batches that are a multiple of the block size. Each block that is read is compared with the records in the deduplication database. If the block is not found, it is delivered to storage and assigned a block ID, which becomes a new record in the deduplication database. Scanning then continues, and if a block matches an existing deduplication database record, the block with that ID is excluded from the backup.
This approach significantly decreases the backup size, especially in virtual environments with a large number of identical blocks.
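The block comparison described above can be sketched roughly as follows. This is a minimal illustration in Python, not the product's actual implementation: the SHA-256 fingerprint, the sequential block IDs, the in-memory dictionary used as the deduplication database, and the names backup_stream, dedup_db, and storage are all assumptions made for the example.

```python
import hashlib
import io


def backup_stream(stream, dedup_db, storage, block_size=4):
    """Read a stream block by block and store only blocks not seen before.

    dedup_db maps a block fingerprint to a block ID (assumed layout).
    storage is a list standing in for the backup storage (assumed).
    Returns the list of block IDs that make up the stream and the
    number of blocks that actually had to be stored.
    """
    manifest = []
    stored = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        fingerprint = hashlib.sha256(block).hexdigest()
        block_id = dedup_db.get(fingerprint)
        if block_id is None:
            # Unknown block: deliver it to storage and record its new ID.
            block_id = len(storage)
            storage.append(block)
            dedup_db[fingerprint] = block_id
            stored += 1
        # Known block: only its ID is referenced, the data is not stored again.
        manifest.append(block_id)
    return manifest, stored


dedup_db, storage = {}, []
print(backup_stream(io.BytesIO(b"AAAABBBBAAAACCCC"), dedup_db, storage))
# ([0, 1, 0, 2], 3)
print(backup_stream(io.BytesIO(b"AAAABBBBDDDD"), dedup_db, storage))
# ([0, 1, 3], 1)
```

In the second run only one new block is stored, because the first two blocks are already recorded in the deduplication database. The tiny block size is chosen only to keep the example readable.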
If the deduplication database is deleted or corrupted, the next backup is always a full backup.
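The fallback to a full backup, together with the per-plan and per-generation databases, could look like the following sketch. The JSON file format, the directory layout, and the function names dedup_db_path, load_dedup_db, and choose_backup_type are hypothetical and serve only to illustrate the rule.

```python
import json
import os
import tempfile


def dedup_db_path(root, plan_id, generation):
    # Assumed layout: one database file per backup plan and per generation.
    return os.path.join(root, plan_id, f"gen-{generation}", "dedup.json")


def load_dedup_db(path):
    """Return the database as a dict, or None if it is missing or corrupted."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            db = json.load(f)
    except (OSError, ValueError):
        # OSError covers a deleted file; ValueError covers unparsable content.
        return None
    return db if isinstance(db, dict) else None


def choose_backup_type(path):
    # Without a usable deduplication database, a full backup is required.
    return "full" if load_dedup_db(path) is None else "incremental"


with tempfile.TemporaryDirectory() as root:
    path = dedup_db_path(root, "plan-a", 1)
    print(choose_backup_type(path))  # full (no database yet)
    os.makedirs(os.path.dirname(path))
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"3a7bd3": 0}, f)
    print(choose_backup_type(path))  # incremental
```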
For the image-based backup type, the approach is slightly different. Instead of reading every cluster, Backup for Windows reads the Master File Table (MFT) and checks which files have been modified. This significantly reduces the amount of source data that must be read.