Diskeeper and De-Duplication

Data deduplication can generally operate at the file or block level. File deduplication eliminates duplicate copies of entire files, but this is not a very efficient means of deduplication. Block deduplication looks within a file and saves only the unique blocks. Each chunk of data is processed using a hash algorithm such as MD5 or SHA-1. This process generates a unique number for each piece, which is then stored in an index. If a file is updated, only the changed data is saved. That is, if only a few bytes of a document or presentation change, only the changed blocks are stored; the edit does not create an entirely new copy of the file. This behavior makes block deduplication far more efficient. However, block deduplication takes more processing power and uses a much larger index to track the individual pieces.
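To illustrate the idea (this is only a minimal sketch, not Diskeeper's or Windows Server's actual de-duplication engine), the following Python example splits a file into fixed-size blocks, hashes each block, and stores only blocks whose hash is not already in the index. The 4 KiB chunk size, the use of SHA-1, and the file name report.docx are assumptions made for the example; real engines often use variable-size chunking and different hash functions.

    import hashlib

    CHUNK_SIZE = 4096  # fixed-size blocks; assumed for this sketch

    def dedupe_file(path, index):
        """Split a file into blocks, hash each, and store only unseen blocks.

        `index` maps block hash -> block bytes. The returned recipe is the
        ordered list of hashes needed to reconstruct the file.
        """
        recipe = []
        with open(path, "rb") as f:
            while True:
                block = f.read(CHUNK_SIZE)
                if not block:
                    break
                digest = hashlib.sha1(block).hexdigest()
                if digest not in index:
                    index[digest] = block   # new, unique block: store it once
                recipe.append(digest)       # duplicate block: reference only
        return recipe

    # Usage (hypothetical file): running this again after a small edit
    # adds only the blocks that actually changed to the index.
    index = {}
    recipe_v1 = dedupe_file("report.docx", index)
    # ... a few bytes of report.docx are edited ...
    recipe_v2 = dedupe_file("report.docx", index)
    # len(index) grows only by the number of changed blocks.

The key point for the discussion below is that any operation that rewrites data at the block level shows up as changed blocks that the de-duplication engine must process again.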

With this in mind, because De-Duplication acts on changes at the block level, certain activities can cause it to run more often or increase the number of stubs it must manage. Defragmentation is one of them: moving files around the volume generates a very large number of block-level changes, which in turn drives up the De-Duplication workload. So it is not necessarily a good idea to defragment such an environment, not because faster file access wouldn't be beneficial, but because of this side effect.

Any defragmenter can run into this problem; however, Diskeeper offers a solution that others do not. Its IntelliWrite feature can prevent upwards of 85% of new fragments from being created in the first place. Because IntelliWrite does not need to move file data after it has already been written to the file system, it does not trigger the same De-Duplication workload.