Common predecessor in a graph of commits

Asked

Viewed 169 times

5

I ran into another interesting problem during an interview. Unfortunately I’m also not sure I found a great (or good enough) algorithm. I thought to share to see if anyone knows a classic solution or a more efficient variation of my implementation.

Context

One way to represent a source code repository (e.g., Git and Mercurial) is through a graph of commits:

Examples

  1. Graph with commits linear

    C1-C2-C3-C4
    
  2. Graph with only one branch

           C5-C6
          /
     C1-C2-C3-C4
    
  3. Graph with a branch that has been merged again (some better translation for merged?) on the main line (C7 has two direct predecessors: C4and C6)

           C5-C6
          /     \
     C1-C2-C3-C4-C7
    

The problem

Implement a function that finds the most recent common predecessor of any two commits graph. The Graph of commits is represented by a String[] called commits containing all the ids of the commits sorted by date (from the latest to the oldest) and a String[][]called relatives.

The ids of predecessors for a commit in position i can be found in parentes[i]. The last position in the array relatives is always null since it represents the predecessors of the commit root. For example, the graph of example 3 is represented by:

String[] commits = {"C7", "C6", "C5", "C4", "C3", "C2", "C1"};
String[][] parentes ={{"C6","C4"}, {"C5"}, {"C2"}, {"C3"}, {"C2"}, {"C1"}, null}; 

If one of commits is the predecessor of the other, the same commit is the common predecessor.

Method signature

String predecessorComum(String[] commits, String[][] parentes, String commit1,
        String commit2);

Example:

The call:

 String predecessor = predecessorComum(commits, parentes, "C4", "C6");

Returns C2

Complexity

It is possible to write a solution O(n) to this problem.


A first solution

My solution attempt started with classic LCA algorithms (see that question of utluiz), Unfortunately linear-time solutions proved complex and not very easy to implement using list of predecessors.

I then implemented the following steps (in my view, not great).

  1. Index ID map (complexity O(n)) so you can efficiently navigate the arrays:

    final Map<String, Integer> idParaPosicao = new HashMap<String, Integer>();
    for (int i = 0; i < commits.length; i++) {
        final String id = commitHashes[i];
        idParaPosicao .put(id, i);
    }
    
  2. Recursive auxiliary method for stacking all predecessors of a commit to the root, including itself (complexity O(p + n)):

    private Stack<Integer> predecessores(Stack<Integer> visitados, int id, 
            String[][] parentes, Map<String, Integer> idParaPosicao) {
        visitados.push(id);
        final String[] parentesD = parentes[id];
        if (parentesD != null) {
            for (String parente : parentesD) {
                predecessores(visitados, idParaPosicao.get(parente), parentes, hashToIndex);
            }
        }
    
        return visitados;
    }
    
  3. Mount the battery for both commits:

    final Stack<Integer> predsC1 = predecessores(new Stack<Integer>(), indiceC1, parentes, 
            idParaPosicao);
    final Stack<Integer> predsC2 = predecessores(new Stack<Integer>(), indiceC2, parentes, 
            idParaPosicao);
    
  4. Pop the predecessors from the root as long as they are equal. The latter commit is the most recent predecessor (complexity O(n)):

    int lca = -1;
    
    while (!predsC1.isEmpty() && !predsC2.isEmpty()) {
        final int a1 = predsC1.pop();
        final int a2 = predsC2.pop();
        if (a1 == a2) {
            lca = a1;
        } else {
            break;
        }
    }
    
  5. Return the commit:

    return commits[lca]; 
    

Self-criticism:

  1. While I think this is a timely solution O(n + p), it has a strong constant in front, most likely better algorithms already exist.
  2. Working: I have no mathematical proof or even confidence that this algorithm works in all cases (it worked for the base case and some other variations).
  3. Spatial complexity: I used a map and two stacks (in addition to recursion stacks). I’m sure this can be improved (in particular I believe the map should be unnecessary).
  4. Recursion: The auxiliary method is nothing more than a recursive preorder search. As Java has no tail optimization (I rewrote this code in Scala and the difference of performance for larger trees was absurd) any larger graph will result in stack overflow. I couldn’t implement an iterative solution.
  5. Finally I think the chosen data structure (graph represented by a list of ordered predecessors of the commit newer to older) must have properties that simplify the problem. I found, for example, several intelligent algorithms to find Lcas in binary search trees.
  • The parent array is right?

  • 1

    You are Jorge. Read in reverse: The country of "C7" sane {"C6","C4"}, the father of "C6"is "C5" and so on ("C1" no father).

1 answer

4


I think your solution is not so great. The biggest issue is that it is not necessary to assemble two batteries of products.

I made an alternative solution just a little more optimized (in theory, since in a benchmark your can even be faster).

public class GraphPredecessor {

    static String predecessorComum(String[] commits, String[][] parentes, String commit1,
            String commit2) {

        //validação
        if (commit1 == null || commit2 == null || parentes == null || commits == null
                || commits.length != parentes.length) {
            return null;
        }

        //caso ótimo
        if (commit1.equals(commit2)) {
            return commit1;
        }

        //mapa (commit, parents[])
        Map<String, String[]> parentMap = new HashMap<String, String[]>();
        for (int i = 0; i < commits.length; i++) {
            if (parentes[i] != null) {
                parentMap .put(commits[i], parentes[i]);
            }
        }

        //adds all parents of commit1 into a Set
        Set<String> commit1Parents = new HashSet<String>();

        //iterate over parents without recursion
        LinkedList<String> q = new LinkedList<String>();
        q.push(commit1);
        commit1Parents.add(commit1);
        while (!q.isEmpty()) {
            String s = q.pop();
            String[] parentsArray = parentMap.get(s);
            if (parentsArray != null) {
                for (String p : parentsArray) {
                    //for each parent, if it's commit2, then return it!
                    if (p.equals(commit2)) {
                        return p;
                    } else {
                        //otherwise just push the node for the next loop
                        q.push(p);
                        //and adds to parent list
                        commit1Parents.add(p);
                    }
                }
            }
        }

        //finds the first commit2 parent in commit1Parents
        q.push(commit2);
        while (!q.isEmpty()) {

            String[] parentsArray = parentMap.get(q.pop());
            if (parentsArray != null) {
                for (String p : parentsArray) {
                    if (commit1Parents.contains(p)) {
                        return p;
                    }
                    q.push(p);
                }
            }

        }

            //no luck
        return null;

    }

    public static void displayResult(String expected, String returned) {
        System.out.println("---------- TEST ----------");
        System.out.println("Exptected " + expected);
        System.out.println("Result    " + returned);
        System.out.println("Status    " + (expected.equals(returned)?"OK":"ERROR"));
    }

    public static void main(String[] args) {
        String[] commits = {"C7", "C6", "C5", "C4", "C3", "C2", "C1"};
        String[][] parentes ={{"C6","C4"}, {"C5"}, {"C2"}, {"C3"}, {"C2"}, {"C1"}, null};

        displayResult("C2", predecessorComum(commits, parentes, "C4", "C6"));
        displayResult("C2", predecessorComum(commits, parentes, "C6", "C4"));
        displayResult("C1", predecessorComum(commits, parentes, "C1", "C7"));
        displayResult("C1", predecessorComum(commits, parentes, "C7", "C1"));
        displayResult("C7", predecessorComum(commits, parentes, "C7", "C7"));
        displayResult("C6", predecessorComum(commits, parentes, "C6", "C7"));
        displayResult("C6", predecessorComum(commits, parentes, "C7", "C6"));
        displayResult("C2", predecessorComum(commits, parentes, "C2", "C7"));
        displayResult("C2", predecessorComum(commits, parentes, "C7", "C2"));
        displayResult("C2", predecessorComum(commits, parentes, "C5", "C3"));
        displayResult("C2", predecessorComum(commits, parentes, "C3", "C5"));

    }

}

This algorithm basically does the following:

  1. Puts all commit1 parents in one set.
  2. Keep looking at all the commit2 parents until you find the first one that’s in the set.

Exemplifying with the question data:

      C5-C6
     /     \
C1-C2-C3-C4-C7

When searching for C4 and C6, the algorithm will first assemble a set with the children of C4. Namely:

C1, C2, C3, C4

Then he iterates on the parents of C6:

  • C5 is in the set? No, goes to the next.
  • C2 is in the set? Yes, so returns C2.

Note that in this algorithm it will never be necessary to read more than N nodes (except maybe in the last comparison), because when finding a node that has already been processed, the same will be the result of the algorithm.

  • 1

    Utluiz, always you to the rescue. Thank you! Really much better (I didn’t have the purpose of using a map of us for relatives), and the solution to find relatives was so simple, I can’t believe I didn’t think of it hehehe.

  • About benchmarks I won’t toast you this time no, believe you hehehe. And forgive me for posting a comment off topic but I’m still interested in your numbers for the period issue (I got to run the algorithm on a Windows machine and really got the same result as you, the module operation is expensive... You will understand).

  • @Anthonyaccioly I’ll see if tomorrow I turn the benchmark. ;)

Browser other questions tagged

You are not signed in. Login or sign up in order to post.