Playbook for Diffing Huge Files

Most diff tools cannot compare huge files. Either they run out of memory, or they abort because some variable in the program is limited to 32 bits. This screenshot shows Guiffy working on huge files beyond the 32-bit (2GB) limit:

Here’s a brief list of what it takes to diff huge files:

  1. 64-bit run-time I/O: The run-time I/O package must be able to access files larger than 2GB, which needs file offsets wider than 32 bits.
  2. Diff algorithm for huge files: Normal diff algorithms, which require both files to be in memory, won't work. They need too much memory and are far too slow.
  3. Context view: The compare view must be able to display the results in context; otherwise it needs too much memory and is far too slow.
  4. 64-bit program variables: The diff application must use 64-bit variables for values such as line numbers.
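The post does not say which diff algorithm Guiffy uses, so what follows is only a sketch of one general way to satisfy items 2 and 4, written in Python with hypothetical names (`window_diff`, `window`). It streams both files line by line, and when the lines disagree it looks ahead at most `window` lines in each file for the nearest line they share. Memory is bounded by the window rather than by file size, and line numbers are Python integers, which cannot wrap at 32 bits. The trade-off is that this heuristic can report a longer diff than a full LCS-based algorithm would, for example when a changed region is longer than the window.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Optional, Tuple

# (op, line number, text): op is "-" for a line only in A, "+" for a line only in B.
Edit = Tuple[str, int, str]


def _fill(buf: Deque[str], it: Iterator[str], size: int) -> None:
    """Read lines from it until buf holds `size` lines or the input runs out."""
    while len(buf) < size:
        try:
            buf.append(next(it))
        except StopIteration:
            return


def window_diff(a: Iterable[str], b: Iterable[str], window: int = 1000) -> Iterator[Edit]:
    """Hypothetical heuristic streaming diff holding at most `window` lines per file."""
    ita, itb = iter(a), iter(b)
    bufa: Deque[str] = deque()
    bufb: Deque[str] = deque()
    na = nb = 1  # 1-based line numbers of bufa[0] and bufb[0]; Python ints never overflow

    while True:
        _fill(bufa, ita, 1)
        _fill(bufb, itb, 1)
        if not bufa and not bufb:
            return
        if bufa and bufb and bufa[0] == bufb[0]:
            # Lines match: advance both files without emitting anything.
            bufa.popleft()
            bufb.popleft()
            na += 1
            nb += 1
            continue

        # Mismatch: look ahead up to `window` lines in each file for the
        # nearest resynchronization point, i.e. the smallest i + j with A[i] == B[j].
        _fill(bufa, ita, window)
        _fill(bufb, itb, window)
        first_in_b = {}
        for j, line in enumerate(bufb):
            first_in_b.setdefault(line, j)
        best: Optional[Tuple[int, int]] = None
        for i, line in enumerate(bufa):
            j = first_in_b.get(line)
            if j is not None and (best is None or i + j < best[0] + best[1]):
                best = (i, j)

        # With no match inside the window, report both windows as changed.
        skip_a, skip_b = best if best is not None else (len(bufa), len(bufb))
        for _ in range(skip_a):
            yield ("-", na, bufa.popleft())
            na += 1
        for _ in range(skip_b):
            yield ("+", nb, bufb.popleft())
            nb += 1
```

With real files, the open text file objects can be passed directly (for example inside `with open(path_a) as fa, open(path_b) as fb:`), since iterating a file yields one line at a time.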

A 64-bit OS and a 64-bit application are NOT needed unless the compare view's GUI components require more memory than 32 bits (2GB) can address. In that case, the GUI compare view components need a “paged” implementation; otherwise the GUI application would be so slow that it would be useless anyway. A “paged” implementation keeps only one “page” around the current view in memory. If the GUI compare view component itself is programmed with 32-bit variables, that becomes the limit: 2GB in each file's compare view.
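A rough Python illustration of the “paged” idea, assuming a line-oriented file: one pass records the byte offset where each line starts in a compact array of 64-bit integers, and the view then seeks straight to the page it needs instead of holding the file in memory. `build_line_index` and `read_page` are hypothetical helpers, not Guiffy's implementation. The `"q"` typecode stores each offset as a signed 64-bit value, so offsets past 2GB fit; a 32-bit typecode would reintroduce exactly the limit described above.

```python
import io
from array import array
from typing import BinaryIO, List


def build_line_index(f: BinaryIO, chunk_size: int = 1 << 20) -> array:
    """Return the byte offset where each line starts, as signed 64-bit integers.

    The file is scanned in fixed-size chunks, so memory use is one chunk
    plus 8 bytes per line for the index itself.
    """
    offsets = array("q", [0])  # typecode 'q' = signed long long (at least 64 bits)
    f.seek(0)
    pos = 0  # absolute byte position where the current chunk begins
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        nl = chunk.find(b"\n")
        while nl >= 0:
            offsets.append(pos + nl + 1)  # the next line starts right after '\n'
            nl = chunk.find(b"\n", nl + 1)
        pos += len(chunk)
    if offsets[-1] == pos:
        offsets.pop()  # no line starts at end-of-file (or the file is empty)
    return offsets


def read_page(f: BinaryIO, offsets: array, first_line: int, count: int) -> List[bytes]:
    """Load only lines [first_line, first_line + count), 0-based, by seeking."""
    first_line = max(0, first_line)
    last_line = min(first_line + count, len(offsets))
    if first_line >= last_line:
        return []
    f.seek(offsets[first_line])
    return [f.readline() for _ in range(last_line - first_line)]


if __name__ == "__main__":
    # In-memory stand-in for a real file opened with open(path, "rb").
    f = io.BytesIO(b"alpha\nbeta\ngamma\ndelta\n")
    index = build_line_index(f, chunk_size=4)  # tiny chunks exercise chunk boundaries
    print(list(index))                # [0, 6, 11, 17]
    print(read_page(f, index, 1, 2))  # [b'beta\n', b'gamma\n']
```

The index costs 8 bytes per line, which is far less than the file text for typical lines; a viewer could cut that further by recording only every Nth line start and reading forward from the nearest one.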

In a future post I’ll provide the tool for generating huge files such as those in the screenshot above.
