How can I optimize a recursive method for finding ancestors?

Asked

Viewed 1,546 times

10

I have a class Pessoa who owned relationships for his father and his mother (these at any time may be null).

On a particular piece of my code I need to find out if one person is an ancestor of the other.

I was able to arrive at a recursive solution more or less of the following nature:

public boolean isAncestor(Person p) {
    if (p == null)
        return false;
    if (this.father.equals(p) || this.mother.equals(p))
        return true;
    else if(isAncestor(p.father) || isAncestor(p.mother))
        return true;
    return false;
}

The problem is that these family trees can become fairly large. In addition to what this implementation is looking very much like the trivial recursive implementation of Fibonacci... In other words, it may be my impression, but I believe this is a method of nature O(2^n).

Can you think of a more efficient way to check for ancestors? Maybe there is no solution with loops + stacks or a nonexponential recursion to solve the problem?

  • In the Fibonacci sequence there are many repeating elements (ex.: X and X-1 both use X-2 in its calculation), which allows a linear time calculation. This is not the case in a family tree, where the set of ancestors may have little or no repetition.

4 answers

9


The problem is not exponential, it is linear, but it is linear in the number of people’s ancestors: there is no guarantee that John is not Mary’s ancestor without examining all of Mary’s ancestors.

You can even reduce this a little by eliminating already examined ancestors (the same person may appear more than once as an ancestor -- marriage between cousins, for example), at the expense of linear space in the number of ancestors.

The easiest is to use a queue to solve this:

import java.util.*;

public boolean isAncestor(Person p) {
  if (p == null) return false;

  final Queue<Person> notChecked = new LinkedList<Person>();
  final Set<Person> checked = new HashSet<Person>();

  if (father != null) notChecked.add(father);
  if (mother != null) notChecked.add(mother);

  while (notChecked.peek() != null) {
    final Person nextPerson = notChecked.remove();

    if (!checked.contains(nextPerson)) {
      if (nextPerson == p) return true;

      checked.add(nextPerson);
      if (nextPerson.father != null) notChecked.add(nextPerson.father);
      if (nextPerson.mother != null) notChecked.add(nextPerson.mother);
    }
  }

  return false;
}

Another possibility is to use a stack, based on ArrayList. The class ArrayList is much more efficient than the LinkedList in terms of space and speed, but the implementation gets slightly more complicated:

public boolean isAncestor(Person p) {
  if (p == null) return false;

  final List<Person> notChecked = new ArrayList<Person>();
  final Set<Person> checked = new HashSet<Person>();

  if (father != null) notChecked.add(father);
  if (mother != null) notChecked.add(mother);

  while (notChecked.size() > 0) {
    final int index = notChecked.size() - 1;
    final Person nextPerson = notChecked.remove(index);

    if (!checked.contains(nextPerson)) {
      if (nextPerson == p) return true;

      checked.add(nextPerson);
      if (nextPerson.father != null) notChecked.add(nextPerson.father);
      if (nextPerson.mother != null) notChecked.add(nextPerson.mother);
    }
  }

  return false;
}

Practically the same, but much faster.

The first version will more quickly find close ancestors -- the second version, for example, will search all the ancestors of the mother before searching for the father. The two versions consume the same effort if the person is not ancestral, and the second will be much faster and consume less memory.

  • 1

    Opa Sobral, thank you very much. Both versions are faster than mine (using micro-benchmarks). Unfortunately I don’t have the gigantic production trees for testing. You would say that in the average case the second version tends to be faster than the first?

  • 3

    @Anthonyaccioly I imagine that the second version is faster, but it prioritizes maternal or paternal ancestors, while the first version prioritizes close ancestors. If close parenthesis is much more common, it can become faster. If "false" (not ancestral) is more common, the second is much better.

5

You will hardly find an efficient solution, as the number of candidates in this case grows exponentially with each generation (i.e. one person has 2 parents, 4 grandparents, 8 great-grandparents, etc). The reverse search (starting with the ancestor and looking for their children) will potentially be even more costly, as a person can have more than 2 children. The use of dynamic programming may amortize somewhat in this case (e.g., if a person has the same ancestor on both parent and parent sides), but depending on the configuration of your dataset this may or may not make any significant difference.

You can eliminate recursion using an explicit stack, but it will still grow as much as the implicit stack created by your recursive solution. Therefore, unless you are facing memory exhaustion problems, there is no reason to change your current strategy (searching in depth through recursion).

4

One way around this problem is to use dynamic programming (or memoization).

The idea is to create a kind of "cache" of the function result, so you do not have to re-query the whole hierarchy if you have already solved the problem for the object. You can do this by creating an attribute HashMap in the class and consulting at the beginning of the method.

private Map<Person, Person> memoAncestors = new HashMap<Person, Person>();

// ....

public boolean isAncestor(Person p) {
    if (p == null)
        return false;
    if (memoAncestors.containsKey(p)) {
        return memoAncestors.get(p);
    }
    if (this.father.equals(p) || this.mother.equals(p)) {
        memoAncestors.put(p, true);
        return true;
    } else if(isAncestor(p.father) || isAncestor(p.mother)) {
        memoAncestors.put(p, true);
        return true;
    }
    return false;
}

This solution is somewhat more efficient as it will avoid several unnecessary "climbs" in the tree and does not greatly impair the readability of the code -- a good tradeoff. It’s also a known optimization for the Fibonacci problem, fits into what you’re calling nonexponential recursion.

  • I was reading a little bit about Memoisation And I guess it wouldn’t hurt to try. I believe that a Memcached-style distributed cache service can be used to prevent this cache from growing without restrictions (in addition to getting a higher level of optimization than a local cache since several people from the same family can use the application).

  • On the other hand I believe @mgibsonbr is right in the sense that, this optimization should reduce the time of two subsequent calls to the method using the same person or people from the same family, but it is not as efficient in this case as in the case of Fibonacci that has overlapping subproblems (I don’t know how to translate this).

  • @Anthonyaccioly Yeah, will depend on the amount and size of data you will have. Try to leave generic, start with an implementation in memory. Although you do not have intrinsically superimposed subproblems, you will probably search more than once for the same person -- just having a sibling in a family in the middle of the tree, which saves you several climbs.

2

This is a classic graph problem, and you can use some techniques for this, depending on the size of the problem you have. If your problem is only a few levels, then calculating the whole tree might not be a problem and your solution is good.

Again, it would be better to look at graph-related algorithms to see what is best for your situation, but let’s assume you have thousands of families with no connection to each other. For this situation, it is worth making a union of families in a data set via "Tree flatenning" (I don’t know the term in English), and check if the two objects are in the same set. As this is a quick operation, you can quickly dispose of people who are not from the same family and therefore one cannot be an ancestor of the other. So, if you determine that they are people from the same family, you can revert to your algorithm, since the tree will no longer be as big.

The page on Wikipedia about Disjoint-set data Structure can help you find a solution to your case.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.