5
I ran into another interesting problem during an interview. Unfortunately I’m also not sure I found a great (or good enough) algorithm. I thought to share to see if anyone knows a classic solution or a more efficient variation of my implementation.
Context
One way to represent a source code repository (e.g., Git and Mercurial) is through a graph of commits:
Examples
Graph with commits linear
C1-C2-C3-C4
Graph with only one branch
C5-C6 / C1-C2-C3-C4
Graph with a branch that has been merged again (some better translation for merged?) on the main line (
C7
has two direct predecessors:C4
andC6
)C5-C6 / \ C1-C2-C3-C4-C7
The problem
Implement a function that finds the most recent common predecessor of any two commits graph. The Graph of commits is represented by a String[]
called commits containing all the ids of the commits sorted by date (from the latest to the oldest) and a String[][]
called relatives.
The ids of predecessors for a commit in position i
can be found in parentes[i]
. The last position in the array relatives is always null
since it represents the predecessors of the commit root. For example, the graph of example 3 is represented by:
String[] commits = {"C7", "C6", "C5", "C4", "C3", "C2", "C1"};
String[][] parentes ={{"C6","C4"}, {"C5"}, {"C2"}, {"C3"}, {"C2"}, {"C1"}, null};
If one of commits is the predecessor of the other, the same commit is the common predecessor.
Method signature
String predecessorComum(String[] commits, String[][] parentes, String commit1,
String commit2);
Example:
The call:
String predecessor = predecessorComum(commits, parentes, "C4", "C6");
Returns C2
Complexity
It is possible to write a solution O(n) to this problem.
A first solution
My solution attempt started with classic LCA algorithms (see that question of utluiz), Unfortunately linear-time solutions proved complex and not very easy to implement using list of predecessors.
I then implemented the following steps (in my view, not great).
Index ID map (complexity O(n)) so you can efficiently navigate the arrays:
final Map<String, Integer> idParaPosicao = new HashMap<String, Integer>(); for (int i = 0; i < commits.length; i++) { final String id = commitHashes[i]; idParaPosicao .put(id, i); }
Recursive auxiliary method for stacking all predecessors of a
commit
to the root, including itself (complexity O(p + n)):private Stack<Integer> predecessores(Stack<Integer> visitados, int id, String[][] parentes, Map<String, Integer> idParaPosicao) { visitados.push(id); final String[] parentesD = parentes[id]; if (parentesD != null) { for (String parente : parentesD) { predecessores(visitados, idParaPosicao.get(parente), parentes, hashToIndex); } } return visitados; }
Mount the battery for both commits:
final Stack<Integer> predsC1 = predecessores(new Stack<Integer>(), indiceC1, parentes, idParaPosicao); final Stack<Integer> predsC2 = predecessores(new Stack<Integer>(), indiceC2, parentes, idParaPosicao);
Pop the predecessors from the root as long as they are equal. The latter commit is the most recent predecessor (complexity O(n)):
int lca = -1; while (!predsC1.isEmpty() && !predsC2.isEmpty()) { final int a1 = predsC1.pop(); final int a2 = predsC2.pop(); if (a1 == a2) { lca = a1; } else { break; } }
Return the commit:
return commits[lca];
Self-criticism:
- While I think this is a timely solution
O(n + p)
, it has a strong constant in front, most likely better algorithms already exist. - Working: I have no mathematical proof or even confidence that this algorithm works in all cases (it worked for the base case and some other variations).
- Spatial complexity: I used a map and two stacks (in addition to recursion stacks). I’m sure this can be improved (in particular I believe the map should be unnecessary).
- Recursion: The auxiliary method is nothing more than a recursive preorder search. As Java has no tail optimization (I rewrote this code in Scala and the difference of performance for larger trees was absurd) any larger graph will result in stack overflow. I couldn’t implement an iterative solution.
- Finally I think the chosen data structure (graph represented by a list of ordered predecessors of the commit newer to older) must have properties that simplify the problem. I found, for example, several intelligent algorithms to find Lcas in binary search trees.
The parent array is right?
– Jorge B.
You are Jorge. Read in reverse: The country of
"C7"
sane{"C6","C4"}
, the father of"C6"
is"C5"
and so on ("C1"
no father).– Anthony Accioly