logo
  
Guiffy
Software   < The Advanced Cross-Platform Diff/Merge >  
  


Guiffy SureMerge
- A Trustworthy 3-Way Merge

Bill Ritcher
Guiffy Software
Rev. 1.0 September, 2004
Rev. 2.0 October, 2009: Hybrid Algorithm, See (4.3)
Rev. 3.0 March, 2011: "Eat a Closer" Avoidance, See (4.4)
Rev. 4.0 August, 2011: Studies find diff3 based merge tools not trustworthy, See the comments and references below
Rev. 5.0 November, 2011: Promote aware, See (4.5)

...

Abstract

SCM systems supporting concurrent development models have existed for a long time. Today, more than ever, concurrent development is crucial to productivity. Any paradigm supporting concurrent development depends upon a trustworthy 3-way merge.

Although 3-way merge tools have been available for a long time, their trustworthiness has been insufficient for a long time as well. This results in development environments with: a) concurrency forbidden, b) "gotta-do-a-merge" blues, and/or c) additional CM staff to manually review all merge results. All of these alternatives are productivity killers and feed the black hole where all theoretical ROI goes.

This paper describes an advanced 3-way merge solution. First, it discusses the many problems that have plagued 3-way merge implementations -- test case kits are provided to detect their existence. Then, it shows how solutions for these problems were designed and implemented in a working product, Guiffy SureMerge.

1. Introduction

Tools which support concurrent development are more important today than ever before. Open Source products, Offshore/Outsource development and maintenance, customer customization of packaged application's procedures and/or metadata are among the most challenging concurrent development environments. While a great deal of discussion about collaboration, XP, and Agility seems to assume concurrency is practical and productive, very little discussion has occurred regarding these other concurrent environments which are NOT collaborating. They are not collaborating in the sense that developers making changes are for the most part unaware of each other. And, another significant variance is the frequency of synchronization --- collaborating teams will tend to synchronize often while these other more challenging types of concurrent environments will tend to synchronize infrequently.

In collaborating environments, changes are sometimes shared and then improved upon. These parallel step-wise refinements can pose puzzling synchronization pattern challenges when they're "re-merged".

Any paradigm supporting concurrent development depends upon a trustworthy 3-way merge to automate the synchronization process. In collaborating environments, the frequency of synchronization and team awareness tend to minimize the concurrency problems which may occur. However, 3-way merges are used to automate the process, regardless of the Branching Model [6] environment.

3-way merges take the original version of a file and two separate versions, and merge both sets of changes back into a new file. The changes of each separate version are determined by comparing each version with the original. The 3-way merge will automatically apply all the changes (which are not overlapping) from each version. Then, if any changes were overlapping, those are brought to the attention of the merge operator for resolution.

3-way merge diagram

SCM systems such as SCCS [4] and RCS [5] (plus dozens more based upon these tools) supporting concurrent development models have existed for a long time. Although 3-way merge tools have been available for a long time, their trustworthiness has been insufficient for a long time as well. Almost every 3-way merge tool today is based on the diff3 algorithm. The research paper, "A Formal Investigation of Diff3" [2], published in 2007 found the diff3 algorithm to be surprisingly unpredictable. Another research project, "Testing Eclipse Compare's Algorithm for Three-way Merging" [1], completed in 2010 employed black box testing on the diff3 based merge tool in Eclipse and found it to be correct on only 86% of the test cases. These problems can cause development environments to forbid concurrency --- which is not always possible and counter-productive. When concurrency is supported in the environment, synchronization can become a dreadful, depressing, counter-productive process for everyone involved. In some environments the synchronization process is "managed" by the CM staff. This isolates the pain, but requires additional CM resources.

This paper describes an advanced 3-way merge solution. First, it discusses the many problems that have plagued 3-way merge implementations. Then, it shows how solutions for these problems were designed and implemented in a working product, Guiffy SureMerge. Test case kits are provided to detect the problems discussed and demonstrate the SureMerge solutions. It concludes with some observations on the benefits of a trustworthy merge tool.

2. "Conflict" Definition

3-way merges look for "conflicts" defined as changes that overlap. However, implementations vary on what is defined as an overlapping change. Some implementations do NOT treat lines inserted at the same point in both version files as a conflict. Most of the time this may not be a problem, but you would like to know and perhaps switch the order of inserted changes. On the other hand, if the inserted line(s) in both versions are defining a new indexed value (i.e., a function number, message type, or error code, etc.), both versions will be adding a new index definition with the same value. If these insertions are automatically merged, the merged code will probably build properly, but one of the new indexed items will NOT work. Because there is some central code like this in almost every product, it is important for any trustworthy 3-way merge to consider inserts as potential conflicts.

The most dangerous deficiency found in some 3-way merges results in losing one of the version's inserted changes.

The {1} Test Case Kit can be used to evaluate how a 3-way merge handles inserts at the same location.

3. Pathological Cases

The following scenarios pose challenging exceptions which are known to confuse some 3-way merges. Sometimes, the 3-way merge identifies the conflict and non-conflicts satisfactorily, but it just looks confusing. In the worst case scenario, the 3-way merge will become confused --- not identify a conflict, lose a change, or auto-merge the wrong change.

3.1. BOF/EOF Stingers

For a variety of "reasons", 3-way merges can get confused by conflicting changes at the beginning or end of the files. Some 3-way merges post-process their 3-way minimum lines of differences with heuristics in order to improve their appearance or avoid problems caused by shifting unique anchors. If they reach the beginning of the file while looking back or reach the end of file while looking forward, they can "give up".

The {2} Test Case Kit can be used to evaluate how a 3-way merge handles changes in both versions at the EOF.

3.2. Identical Twins

Most 3-way merges have been enhanced to recognize identical changes and not call them a conflict. If the tool's merge result view is in reverse, though, such changes are usually visible and can be distracting. 3-way reverse compare-merge result views may seem easier to understand in simple cases. However, these reverse views (displaying all changes by both versions to the original) become so confusing for complicated change patterns that users often quit the merge tool and do the job by hand.

wrong way 3-way diff view Using a 3-way compare reverse view for merging is like driving down an Exit ramp in reverse.

The {3} Test Case Kit can be used to evaluate how a 3-way merge handles identical changes and how they appear.

3.3. X-Tuplets

This change pattern usually started as identical twin changes shared in both versions. Then, some further revisions are made in one or both versions. As a result, the identical change block contains a (subset) change. Furthermore, it may even appear to be two or more subset changes within the area of the original identical change when comparing version 1 with version 2.

The {4} Test Case Kit can be used to evaluate how a 3-way merge handles such change patterns.

3.4. Hungry Blobs

This change pattern includes separate changes in each version which are close or adjacent to each other, resulting in one combined SureMerge "Attention". Such changes will usually not be detected as a "conflict" in other 3-way merges and will be auto-merged.

The {5} Test Case Kit can be used to evaluate how a 3-way merge handles such change patterns.

4. SureMerge "Attentions" --- Beyond "Conflicts"

Guiffy's SureMerge considers changes of any type (NOT just lines changed). Likewise, SureMerge's "Attention Focus" goes beyond looking for conflicts and catches changes which touch one another (but don't overlap) or changes which are in close proximity to one another.

4.1. Minimum Blocks of Difference

Guiffy's SureMerge is based upon a special proprietary algorithm designed for the problems inherent in smart merging. Other 3-way merges just apply the published compare algorithm [3] which they use 3 ways. Applying these "unique anchor" minimum lines of difference algorithms 3 ways is the fundamental cause of most of the pathological problems discussed above. Indeed, the authors of those algorithms have acknowledged that the "the 3-way merge is problematic and was not considered when designing the compare algorithm". In other words, it will not work sometimes, and any post-processing attempts will be "problematic" (not work in some cases).

4.2. 3-Way Auto Focus

SureMerge's Minimum Blocks of Difference algorithm tends to express the differences in fewer change blocks (by treating separate changes close to each other as one change block). This naturally results in expanded Attention Focus areas depending upon the size of the change. Moreover, a user control option is provided to expand the Attention Focus (if you want to be extra sure).

This screen shows how a typical 3-way merge would auto-merge two changes (one from each version) without seeing a "conflict".
merge no conflict

And, this screen shows how SureMerge's Auto Focus would identify the changes as an "Attention". Just one click of a button would keep the first two lines from the first version or the last two lines from the second version.
merge conflict

4.3. Hybrid Algorithm

As discussed in (4.1) above the SureMerge Minimum Blocks of Difference algorithm avoids the problems of other 3-way auto merge implementations. But, in some worse case scenarios, the Minimum Blocks of Difference algorithm would detect sequences it knew were confusing and avoid doing any damage by alerting the user that the merge would need to be done manually. That's better than blindly and silently producing a corrupt merge result (which is what can happen with other 3-way auto merge tools in these cases). But, "good enough" is NOT the idea behind SureMerge.

In 2008 we researched and developed an improved "Hybrid algorithm" which produces exceptional merge results in these worse case scenarios. The "Hybrid algorithm" was originally released as part of Guiffy SureMerge 8.4 in March, 2008. The implementation was further refined as part of Guiffy SureMerge 8.6 in March, 2009. The "Hybrid algorithm" continues to be driven by SureMerge's original "Minimum Blocks of Difference algorithm" and continues to include advanced self-checking techniques to detect and avoid making an auto-merge mistake (corruption). The "Hybrid algorithm" looks for unique anchors during the 3-way auto merge synchronization. The algorithm applies some simple rules such as unique anchors can not be part of a change. This simple idea implemented within the Minimum Blocks of Difference algorithm is what we came to call the "Hybrid algorithm".

Just how exceptional are SureMerge's Hybrid algorithm results? {6} Test Case Kit is a "real world", example of a 3-way merge involving what we call "shifted unique anchors". This "real world" example is from one of Guiffy Software's customers. At the customer's request the company and product names were changed to protect their privacy. This example involves XML project files generated by VisualStudio. The same issue could arise with any IDE or tool that keeps its information in XML files. The same issue could arise in non-XML files, in your code. Anytime changes are made which move unique anchors (lines unique in both files being compared), these auto merge mistakes can become an issue. This example should be completely auto merged (no conflicts). You'll see SureMerge does it right while most other 3-way merge tools will report a confusing conflict and/or make an auto merge (corruption) mistake.

4.4. "Eat a Closer" Avoidance

As discussed in (2) above, its important to treat inserts at the same point in the parent file as a "conflict". Sometimes the inserted changes in each version have a similar structure and the last line of each inserted change may match.

In 2010 we researched and developed an improvement for SureMerge's algorithm which detects these matching lines at the end of Attention inserts and avoids treating them as matching. The improved SureMerge algorithm was released as part of Guiffy SureMerge 9.4 in March, 2011.

What do these "Eat a Closer" merge cases look like? {7} Test Case Kit is an example of a 3-way merge involving what we call "Eat a Closer". This example was made using Guiffy's Release Notes html document. The matching closer lines contain </LI>. The same issue could arise in any type of structured XML files. And the same issue could arise in non-XML files, in your procedure code (with a last line "}" to close a function). You'll see SureMerge does it right while some other 3-way merge tools will show the conflict in a way which makes it too easy to loose one of the closer lines in the resolved merge result. With SureMerge one click on "Keep Both" and you're done.

4.5. Promote aware

As discussed in (2) above, its important to treat inserts at the same point in the parent file as a "conflict". And sometimes as discussed in (3.2) above the inserted changes in each version are identical. This occurs often when doing what is called a "promote" merge.

The following scenario describes another case which happens often in a Promote merge: Branch A(1st file) is being "promoted" to Branch B(2nd file). Branch B is a child of (originally based on) Branch A. Originally Branch A (and Branch B) inserted 2 lines of code after line 26. Branch A is updated adding another line to the lines previously added after line 26. Without awareness that the merge being performed is a "promote" such a change is treated as a "conflict".

In 2011 we researched and developed an improvement for SureMerge's algorithm as an option which detects these additional lines in an inserted change and automtically merges them. The SureMerge "Promote" option was released as part of Guiffy SureMerge 9.6 in November, 2011.

What do these "promote" merge cases look like? {8} Test Case Kit is a "real world", example of a 3-way merge involving what we call "promote". This "real world" example is from one of Guiffy Software's customers. This example involves header files for a Bug Reporting application. You'll see with SureMerge's "promote" option the merge is completed without any Attentions (conflicts), while 3-way merge tools without a promote option will treat the change as a conflict and require the user to manually resolve it.

5. Conclusion

The problems discussed in this paper have plagued 3-way merge tools for many years resulting in the general warning from experienced veterans to "be careful when you do that merge". To avoid these hazards, some folks resort to forbidding concurrent development completely or declare temporary periods of "code freezes", which can last for weeks or months while CM/QA/Development do the release "rain dance". In other environments concurrency is allowed, but everyone faces the dreaded merge process. In larger organizations, additional CM staff are sometimes tasked with performing and/or manually reviewing all the merges.

Guiffy SureMerge avoids all these hazards and provides a trustworthy 3-way merge capability. It can make the difference between forbidding or embracing concurrent development. It can eliminate the "dreaded merge blues".    It can shorten the time required for CM to package bug fix builds or synch up a new release.

In today's world, "merge projects" to upgrade customizations with a new Open Source or packaged product releases can take a team weeks to complete without a trustworthy 3-way merge. Teams using Guiffy SureMerge have completed merge projects in half the time (sometimes less).

6. Test Case Kits

Each test case has 3 files: the two versions and the parent (common ancestor or original). The files are named with extensions .1st, .2nd, and .parent. To aquaint yourself with each test case, begin by comparing the parent and 1st, then compare the parent and 2nd. Then, perform the 3-way merge.

{1} Conflict Definition and Inserts:  conins.zip
This kit includes two test cases for lines inserted in each version at the the same point. The FlownetController.cpp files are an example from the real world of two concurrent changes adding different code at the same locations. The ErrorMsg.java files are an example of two concurrent changes adding error message types with the same value.

{2} BOF/EOF Stingers:  eofsting.zip
This test case includes changes at the Beginning and End of File (BOF/EOF) in each version. In both the 1st and 2nd Misc.java.versions code is added at the beginning and end of the file.

{3} Identical Twins:  idtwins.zip
This test case includes identical changes in each version. In both the 1st and 2nd About.java versions line 7 in the parent was changed and a line was added after that. These two lines are identical in 1st and 2nd. The 2nd version has an additional change --- it //ed several lines near the end.

{4} X-Tuplets:  tuplets.zip
This test case includes a similar change in each version that was shared to begin with, then slightly modified, resulting in 2 conflict/Attentions. In both the 1st and 2nd getCLIArgs.java versions a change block added the code for processing the -fb and -fe arguments. Then, in the 2nd version, improvements are made within that code. Other changes are made in both versions to be realistic and provide addtional test validation points.

{5} Hungry Blobs:  blobs.zip
This test case includes separate changes in each version which are adjacent to each other, resulting in one combined "Attention". In the RearViewMirror 1st version line 7 is changed to add an if condition. In the RearViewMirror 2nd version line 6 is changed to add the sane condition in an exsiting if statement. SureMerge's Auto Focus identifies the changes as an "Attention". Just one click of a button would keep the 1st or 2nd version.

{6} Unique Anchor Moves:  moves.zip
This "real world" test case includes changes made in the 2nd version that move unique anchors (lines unique in all 3 versions). Compare each version to the Parent to become familiar with the changes made by that version. The test case should be completely auto merged (no conflicts). Save the merge result from Guiffy SureMerge. Compare the Parent with the Saved Merge Result to verify manually the correctness of the auto merge result. Try this test case with other merge tools. Most other 3-way merge tools will report a confusing conflict and/or make an auto merge (corruption) mistake. As a final step of your test, be sure to compare the merge result from other merge tools with the saved merge result from SureMerge to find any mistakes.

{7} Avoiding "Eat a Closer":  close.zip
This test case example was made using Guiffy's Release Notes html file. Version 9_4_1 inserts 4 lines after line 48 in the parent 9_4. Version 9_4_2 inserts 2 lines at the same point in the parent. The last line inserted by both versions (a line with </LI>) match. The test case should cause a "conflict". The "conflict" should be shown in a way so that the user can easily resolve the conflict, keep both inserts, with each insert including its last line. Is there a simple, one click button or shortcut keystroke which will do that (like SureMerge's "Keep Both")? Is there a simple, one click button or shortcut keystroke for reversing which version is placed first in the merge result (like SureMerge's Edit->Flip)?

{8} Promote aware:  promote.zip
This "real world" promote test case includes header files for a Bug Reporting application. Branch A(1st file) is being "promoted" to Branch B(2nd file). Branch B is a child of (originally based on) Branch A. Originally Branch A (and Branch B) inserted 2 lines of code after line 26. Branch A is updated adding another line to the lines previously added after line 26. Without selecting the promote option the change is treated as an Attention (conflict). Now, select Options -> Compare/Merge... Promote merge and rerun the merge - the merge is automatically completed without any Attentions (conflicts).

7. Bibliography

[1] Martin Fagereng Johansen, INF9290 Project Report: Testing Eclipse Compare's Algorithm for Three-way Merging. published online, Testing Eclipse Compare's Algorithm for Three-way Merging , June 2010.

[2] Sanjeev Khanna, Keshav Kunal, and Benjamin C. Pierce, A Formal Investigation of Diff3. Foundations of Software Technology and Theoretical Computer Science (FSTTCS) and published online, A Formal Investigation of Diff3. , December 2007.

[3] Eugene W. Meyers, An O(ND) Difference Algorithm and its Variations. Algorithmica, Vol. 1 No. 2, 1986.

[4] Mark Rochkind, The Source Code Control System. IEEE Transactions on Software Engineering, Vol. SE-1, December 1975.

[5] Walter Tichy, RCS -- A System for Version Control. Software -- Practice and Experience, July 1985.

[6] Chuck Walrad and Darrel Strom, The Importance of Branching Models in SCM. IEEE Computing Practices, Vol. 35, September 2002.





   Copyright 2014 Guiffy Software, Inc. All rights reserved.