How to implement the six degree separation theory algorithm?

Asked

Viewed 966 times

4

What is?

Theory that in the world it takes no more than six bonds of friendship for any two people to be connected.

Question:

I’ve been trying for some time to implement this algorithm, but I can’t imagine how I would make the calls, I want to do as a way of Poc, to better understand how one of the algorithms that social networks use works.

  • 2

    Is your question about how to retrieve the data required for such verification, or how to analyze existing data? (e.g.: you have a graph of people and friendship relationships, and you want to find the "longest minimum path between any vertices")

  • 1

    @mgibsonbr I thought the following: Object Person has a Person List which in case is his network of friends. My doubt is as follows: I have the Person "John", how do I get to "John" through the "Peter" who has friends in common with him? But to get to "Peter," I have to walk all over his Friends List and his Friends List and so on. That part that gave me a lock that I can’t go any further. I wanted to do the following: Pedro->Ana->Jorge->Rafael->João (What people I had to go through to find João, the degree would be easier to calculate from there).

  • That is, the doubt is in the same algorithm. See my answer. Your idea - choosing "John" and "Peter" and then calculating the distance between them - works, but is inefficient: you would have to do this for each pair (i.e. O(V^2)) and each check would have to go through all relations of friendship (O(E) - total: O(V^2 * E)). If instead you fix "João" and calculate his distance to everyone at once you can reduce it to O(V*E).

1 answer

5


This problem can not only be solved with classic algorithms from graph theory, how there are already ready implementations in several languages, including C#.

First, consider that your data mass contains people and relationships between people. Regardless of how this is represented (your comment suggests a adjacency list), that forms a graph where each person is one vertex and the relationship "so-and-so is a friend of beltrano" constitutes a edge. Since we simply want the "distance" between two people, one can consider that this graph has no "weights" (i.e. you are either friends with so-and-so or you are not) and - although this is not relevant in this case - the relationship of friendship is commutative (if A is friends with B, then B is friends with A). The result then is a undirected graph without weights.

The proposition to be tested:

It takes no more than six bonds of friendship for any two people to be connected.

in graph language is equivalent to:

For every vertex A and every vertex B, the shortest path between A and B is less than or equal to 6.

That is, testing the proposition boils down to solving the minor problem for every vertex V, and test whether all the resulting paths are less than or equal to 6. The simplest and most efficient algorithm for this type of graph is search in width.

In your particular case:

  1. Create a Person x Person matrix where every element is "not visited";
  2. For each Person P:
    1. Create a queue (FIFO) - in C#, Queue - and put P in it, with zero distance;
    2. Consider the number of people not visited NV as the total number of people;
    3. While NV > 0:
      1. If the queue is empty, stop - there are people who are not connected in any way.
      2. Remove the first person X from the queue, and check your distance to P;
      3. If X has been visited before, go back to the beginning of the loop (there is a shorter path between P and X, already discovered);
      4. Otherwise, assign the distance from P to X (and from X to P), and descend NV;
        • Here you can already test whether or not the distance is greater than 6.
      5. For every friend of X, add it in the queue, with the distance incremented in 1.

A search for "c Sharp breadth first" returns multiple results (such as that answer in Soen), but for the particularity of this test of yours, I suggest to try to implement according to the algorithm described above (perhaps using the linked code as reference) - since one can make some optimizations in relation to the more general problem (stop when all vertices have been visited, stop when the distance is greater than 6, etc.).

Browser other questions tagged

You are not signed in. Login or sign up in order to post.