public final class DiffUtil extends Object
Modifier and Type | Method and Description |
---|---|
static double |
diceCoefficient(String first,
String second)
Computes the dice coefficient between the two given String's bigrams.
|
static int |
findInsertionIndex(Comparison comparison,
Diff diff,
boolean rightToLeft)
This is the main entry point for
findInsertionIndex(Comparison, Iterable, List, List, Object) . |
static <E> int |
findInsertionIndex(Comparison comparison,
Iterable<E> ignoredElements,
List<E> source,
List<E> target,
E newElement)
This will try and determine the index at which a given element from the
source list should be
inserted in the target list. |
static <E> int |
findInsertionIndex(Comparison comparison,
List<E> source,
List<E> target,
E newElement)
This will try and determine the index at which a given element from the
source list should be
inserted in the target list. |
static Set<Diff> |
getAllAtomicRefiningDiffs(Diff diff)
The set of all the atomic diffs that refine the given diff or one of its refining diffs, recursively.
|
static Set<Diff> |
getAllRefiningDiffs(Diff diff)
The set of all the diffs (atomic or not) that refine the given diff or one of its refining diffs,
recursively.
|
static List<Diff> |
getRootRefinedDiffs(Diff diff)
Determines the root refined diff of a refining diff, i.e. a refined diff that is not refining another
diff.
|
static boolean |
isContainmentReference(EStructuralFeature feature)
|
static <E> List<E> |
longestCommonSubsequence(Comparison comparison,
Iterable<E> ignoredElements,
List<E> sequence1,
List<E> sequence2)
This will compute the longest common subsequence between the two given Lists, ignoring any object that
is included in
ignoredElements . |
static <E> List<E> |
longestCommonSubsequence(Comparison comparison,
List<E> sequence1,
List<E> sequence2)
This will compute the longest common subsequence between the two given Lists.
|
public static Set<Diff> getAllRefiningDiffs(Diff diff)
diff
- The diff for which all the refining diffs are seekedpublic static List<Diff> getRootRefinedDiffs(Diff diff)
diff
- The diff for which the root refined diffs are to be determinednull
.
Empty if the provided diff does not refine any diff.public static Set<Diff> getAllAtomicRefiningDiffs(Diff diff)
diff
- The diff for which all the refining diffs are seekedpublic static double diceCoefficient(String first, String second)
This implementation is case sensitive.
Note that this implementation handles two- (or less) character-long strings specifically,
degrading into a "char-by-char" comparison instead of using the bigram unit of the dice coefficient. We
want the similarity between "v1"
and "v2"
to be 0.5
and not
0
. However, we also want "v1"
and "v2"
to be "more similar" to
each other than "v"
and "v2"
and "v1"
and "v11"
to
be "more similar" than "v"
and "v11"
while this latter also needs to be "less
similar" than "v1"
and "v2"
. This requires a slightly different handling for
comparisons with a "single character"-long string than for "two character"-long ones. A set of
invariants we wish to meet can be found in the unit tests.
first
- First of the two Strings to compare.second
- Second of the two Strings to compare.public static <E> List<E> longestCommonSubsequence(Comparison comparison, Iterable<E> ignoredElements, List<E> sequence1, List<E> sequence2)
ignoredElements
. We will use
IEqualityHelper.matchingValues(Object, Object)
in order to try and match the values from both
lists two-by-two. This can thus be used both for reference values or attribute values. If there are two
subsequences of the same "longest" length, the first (according to the second argument) will be
returned.
Take note that this might be slower than longestCommonSubsequence(Comparison, List, List)
and
should only be used when elements should be removed from the potential LCS. This is mainly aimed at
merge operations during three-way comparisons as some objects might be in conflict and thus shifting
the computed insertion indices.
Please see longestCommonSubsequence(Comparison, List, List)
for a more complete description.
E
- Type of the sequences content.comparison
- This will be used in order to retrieve the Match for EObjects when comparing them.ignoredElements
- Specifies elements that should be excluded from the subsequences.sequence1
- First of the two sequences to consider.sequence2
- Second of the two sequences to consider.longestCommonSubsequence(Comparison, List, List).
public static <E> List<E> longestCommonSubsequence(Comparison comparison, List<E> sequence1, List<E> sequence2)
IEqualityHelper.matchingValues(Object, Object)
in order to try and match the values from both
lists two-by-two. This can thus be used both for reference values or attribute values. If there are two
subsequences of the same "longest" length, the first (according to the second argument) will be
returned.
For example, it the two given sequence are, in this order, {"a", "b", "c", "d", "e"}
and
{"c", "z", "d", "a", "b"}
, there are two "longest" subsequences : {"a", "b"}
and {"c", "d"}
. The first of those two subsequences in the second list is
{"c", "d"}
. On the other hand, the LCS of {"a", "b", "c", "d", "e"}
and
{"y", "c", "d", "e", "b"}
is {"c", "d", "e"}
.
The following algorithm has been inferred from the wikipedia article on the Longest Common Subsequence, http://en.wikipedia.org/wiki/Longest_common_subsequence_problem at the time of writing. It is decomposed in two : we first compute the LCS matrix, then we backtrack through the input to determine the LCS. Evaluation will be shortcut after the first part if the LCS is one of the two input sequences.
Note : we are not using Iterables as input in order to make use of the random access cost of
ArrayLists. This might also be converted to directly use arrays. This implementation will not play well
with LinkedLists or any List which needs to iterate over the values for each call to
List.get(int)
, i.e any list which is not instanceof RandomAccess or does not satisfy its
contract.
E
- Type of the sequences content.comparison
- This will be used in order to retrieve the Match for EObjects when comparing them.sequence1
- First of the two sequences to consider.sequence2
- Second of the two sequences to consider.public static <E> int findInsertionIndex(Comparison comparison, Iterable<E> ignoredElements, List<E> source, List<E> target, E newElement)
source
list should be
inserted in the target
list. We expect newElement
to be an element from the
source
or to have a Match that allows us to map it to one of the source
list's
elements.
The expected insertion index will always be relative to the Longest Common Subsequence (LCS) between
the two given lists, ignoring all elements from that LCS that have changed between the target list and
the common origin of the two. If there are more than one "longest" subsequence between the two lists,
the insertion index will be relative to the first that comes in the target
list.
Note : we are not using Iterables as input in order to make use of the random access cost of
ArrayLists. This might also be converted to directly use arrays. This implementation will not play well
with LinkedLists or any List which needs to iterate over the values for each call to
List.get(int)
, i.e any list which is not instanceof RandomAccess or does not satisfy its
contract.
E
- Type of the sequences content.comparison
- This will be used in order to retrieve the Match for EObjects when comparing them.ignoredElements
- If there are elements from target
that should be ignored when searching for an
insertion index, set them here. Can be null
or an empty list.source
- The List from which one element has to be added to the target
list.target
- The List into which one element from source
has to be added.newElement
- The element from source
that needs to be added into target
.newElement
should be inserted in target
.longestCommonSubsequence(Comparison, List, List)
public static <E> int findInsertionIndex(Comparison comparison, List<E> source, List<E> target, E newElement)
source
list should be
inserted in the target
list. We expect newElement
to be an element from the
source
or to have a Match that allows us to map it to one of the source
list's
elements.
The expected insertion index will always be relative to the Longest Common Subsequence (LCS) between
the two given lists. If there are more than one "longest" subsequence between the two lists, the
insertion index will be relative to the first that comes in the target
list.
For example, assume source
is {"1", "2", "4", "6", "8", "3", "0", "7", "5"}
and
target
is {"8", "1", "2", "9", "3", "4", "7"}
; I try to merge the addition of
"0"
in the right list. The returned "insertion index" will be 5
: just after
"3"
. There are two subsequence of the same "longest" length 4 :
{"1", "2", "3", "7"}
and {"1", "2", "4", "7"}
. However, the first of those
two in target
is {"1", "2", "3", "7"}
. The closest element before "0"
in
this LCS in source
is "3"
.
Note : we are not using Iterables as input in order to make use of the random access cost of
ArrayLists. This might also be converted to directly use arrays. This implementation will not play well
with LinkedLists or any List which needs to iterate over the values for each call to
List.get(int)
, i.e any list which is not instanceof RandomAccess or does not satisfy its
contract.
E
- Type of the sequences content.comparison
- This will be used in order to retrieve the Match for EObjects when comparing them.source
- The List from which one element has to be added to the target
list.target
- The List into which one element from source
has to be added.newElement
- The element from source
that needs to be added into target
.newElement
should be inserted in target
.longestCommonSubsequence(Comparison, List, List)
public static int findInsertionIndex(Comparison comparison, Diff diff, boolean rightToLeft)
findInsertionIndex(Comparison, Iterable, List, List, Object)
.
It will use default algorithms to determine the source and target lists as well as the list of elements
that should be ignored when computing the insertion index.comparison
- This will be used in order to retrieve the Match for EObjects when comparing them.diff
- The diff which merging will trigger the need for an insertion index in its target list.rightToLeft
- true
if the merging will be done into the left list, so that we should consider the
right model as the source and the left as the target.diff
's value should be inserted into the 'target' list, as
inferred from rightToLeft
.findInsertionIndex(Comparison, Iterable, List, List, Object)
public static boolean isContainmentReference(EStructuralFeature feature)
feature
- The feature to check.true
if the feature
is a containment reference, false
otherwise.
Copyright (c) 2006, 2015 Obeo and others. All rights reserved.