I’m guessing this is some kind of social network, right? I can’t see how viable algorithms can compare users one to one. It would be very inefficient to perform this on a Rigger and too "time consuming" in a background process (how many minutes should the user wait to see the result?).
I believe that the ideal solution is a type of pre-classification that, applied to a given user, returns a value that can then be compared to other users.
I’ll try to illustrate that.
First, let us take as a basis the height. Imagine that we want to approach people with similar heights. We can establish ranges of heights, for example, range 1
for persons considered "low", 2 for "medium" and 3 for "high".
To the eye color, we could have the value 1
for bright eyes, 2
for brown and 3
dark.
To marital status, smoke, drinker and some other attributes that may have only two states, one can adopt the values 1
and 2
.
After this classification, one can think of an algorithm that, from a set of interests, returns the most suitable profiles.
The simplest could be a query that compares each attribute and returns first those with more similarities.
The query example below sorts the profiles by similarity, whose value is calculated by adding 1
for each common attribute, that is, the more common attributes, the higher the value of the column:
select c.*,
(
case when faixa_altura = :faixa_altura_interesse then 1 else 0 end +
case when tipos_olhos = :tipo_olhos_interesse then 1 else 0 end +
case when bebo = :bebo_interesse then 1 else 0 end +
(...)
) semelhanca
from caracteristicas c
order by semelhanca desc
Another numerical way to do this (which I found well explained in this SOEN question) is to consider all these features as multidimensional axes as in a Cartesian chart. Then, we find the similarity between interests and profiles through the "location" of the profile.
Consider the image below (source):
The characteristics of each profile are represented by a point, right? So, to find similar interests, just recover the points closer to the interest.
The difference is that in your case the graph would have N
dimensions, being N
the number of attributes.
Another factor to consider is to put weights in these characteristics. For example, the fact of drinking or not may be more important to match the profiles than the height. To do this in the query above, just use instead of 1
, a higher value according to the importance of the characteristic.
Going a little deeper, if you need more optimization and a query is not feasible, you can establish profiles of people. When a user fulfills its characteristics, the system classifies it into one of the registered types. This is just one more type of abstraction to simplify the data structure. The more profiles there are, the more refined the result will be. This is the most "thick" but more efficient method. The weakness is that if someone doesn’t fit a pre-established profile, the chances of not finding anyone compatible will be higher.
What is the bank’s technology?
– Leonel Sanches da Silva
mysql use - innoDB
– RodrigoBorth
An SP is a good idea when the execution of a query is recurring
– jean
BTW the idea of a table to relate the interests x characteristics of people does not seem a good idea because it will grow squared in relation to the number of people. For 1,000 people we will have 1,000,000 compatibilities
– jean
exactly why I’m looking for alternatives... but what exactly is a Stored Procedure?
– RodrigoBorth