Client-Side Deduplication

Deduplication is a technique that stores each unique part of the data only once and reuses it wherever the same data appears again.

The backup format uses client-side deduplication. This approach brings the following benefits:

  • Client-side deduplication is much faster than server-side deduplication
  • No internet connection issues during deduplication
  • Reduced internet traffic
  • The ability to purge unnecessary data
  • A server-side deduplication database grows constantly, which can significantly increase costs. Client-side deduplication uses only local resources.

How It Works

Regardless of the backup type, the first backup is always a full backup. When backups run on a regular schedule, each run captures data changes, so subsequent backups are usually incremental and depend on the full backup as well as on the previous incremental backups.

The new backup format is designed for full backup plan independence, so each backup plan has its own deduplication database. Moreover, each backup plan generation also has its own deduplication database.

When a backup plan runs, the application reads the backup data in batches that are a multiple of the block size. Each block that is read is compared with the deduplication database records. If the block is not found, it is delivered to storage and assigned a block ID, which becomes a new deduplication database record. Block scanning continues, and if a block matches any of the deduplication database records, the block with that ID is excluded from the backup.

This approach significantly decreases the backup size, especially in virtual environments with a large number of identical blocks.
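
The block-matching steps described above can be illustrated with a minimal Python sketch. It is not the product's implementation. It assumes that blocks are matched by their SHA-256 digest, that the deduplication database is an in-memory dictionary, and that storage is another dictionary standing in for the backup destination. The names backup_stream, dedup_db, storage, and BLOCK_SIZE are hypothetical.

```python
import hashlib
import io

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example is easy to follow


def backup_stream(stream, dedup_db, storage, block_size=BLOCK_SIZE):
    """Read `stream` block by block and store only blocks missing from `dedup_db`.

    `dedup_db` maps a block digest to a block ID (assumed record layout).
    `storage` maps a block ID to block bytes and stands in for the backup storage.
    Returns the ordered list of block IDs that make up the stream.
    """
    layout = []
    while True:
        block = stream.read(block_size)
        if not block:
            break
        digest = hashlib.sha256(block).hexdigest()
        block_id = dedup_db.get(digest)
        if block_id is None:
            # Unknown block: deliver it to storage and record its new ID.
            block_id = len(dedup_db)
            dedup_db[digest] = block_id
            storage[block_id] = block
        # Either way the stream references the block by ID;
        # a block that is already known is never stored again.
        layout.append(block_id)
    return layout


dedup_db, storage = {}, {}
layout = backup_stream(io.BytesIO(b"AAAABBBBAAAACCCC"), dedup_db, storage)
print(layout)        # [0, 1, 0, 2]
print(len(storage))  # 3
```

In this run the source contains four blocks, but only three are stored, because the third block repeats the first one.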

If the deduplication database is deleted or corrupted, a full backup is always initiated.
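
The fallback rule above can be sketched as follows. This is an illustration only: the actual database format is not described here, so the sketch assumes a JSON file as a stand-in, and the function names load_dedup_db and choose_backup_type are hypothetical.

```python
import json
import os
import tempfile


def load_dedup_db(path):
    """Return the deduplication records, or None if the file is missing or unreadable."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            records = json.load(fh)
    except (OSError, ValueError):
        # OSError covers a missing file; ValueError covers content that fails to parse.
        return None
    # Assumption: a valid database is a JSON object mapping digests to block IDs.
    return records if isinstance(records, dict) else None


def choose_backup_type(path):
    """Pick "full" when no usable database exists, otherwise "incremental"."""
    return "full" if load_dedup_db(path) is None else "incremental"


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "dedup.json")
    print(choose_backup_type(db_path))  # full (database does not exist yet)

    with open(db_path, "w", encoding="utf-8") as fh:
        json.dump({"abc123": 0}, fh)
    print(choose_backup_type(db_path))  # incremental

    with open(db_path, "w", encoding="utf-8") as fh:
        fh.write("{broken")
    print(choose_backup_type(db_path))  # full (database is corrupted)
```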

For the image-based backup type, the approach is slightly different. Instead of reading clusters, Backup for Windows reads the Master File Table (MFT) and checks which files have been modified. This significantly reduces the amount of source data that has to be read.