How do I search for profiles with characteristics similar to the interests of another profile on a relationship site?

Viewed 765 times

7

I need to select several clients from a relationship site that I am developing and sort them by compatibility with a person's profile, i.e., the more characteristics match, the greater the compatibility. On top of that, I have to build a kind of "counter" to display the compatibility percentage.

My main problem is to always return results, even if no characteristic matches the person's interests, and to work out a way to display the "compatibility" between the two people.

The most important thing is the logic for reaching the result, not ready-made code.

Table Carateristica

id
nome
sobrenome
cidade
estadocivil
altura
peso
fisico
pele
olhos
filhos
querofilhos
bebo
fumo
denominacao
frequencia
importancia
igreja
sou
procuro

Table Interest

id
cidade
estadocivil
altura_min
altura_max
peso_min
peso_max
fisico
pele
olhos
filhos
quer_filhos
bebe
fuma
denominacao
frequencia
importancia

I have not yet worked out the logic for this, so I have no code to post that would help you guide me, but once I have some I will add it here.

  • 5

    Even if you are still at square one, it's best to post your initial attempts, the database structure, links from your initial research... The more information about your specific case, the easier it is to provide an adequate answer. Check out the guide [Ask].

  • I added what I could to help, @brasofilo

  • I believe you will depend mostly on whatever API the relationship site makes available to you. You could even get a system to authenticate and start scanning the site (a spider/web crawler), but the site will probably block you.

  • @Fláviogranato actually I developed the site myself, and this is an internal query on the site... I will not be making requests through an API; I will create a page on the site that displays this...

  • I found the question very interesting, but I don't think the tags are correct. The question seems to me to be much more about the method than about one technology or another.

  • @Luizvieira which tag do you think would best fit this question?

  • @Rodrigoborth True, I didn't even suggest one. My mistake. Well, looking at the tags that already exist, I think algoritmo fits well. But, anyway, you can also create a new tag (I haven't seen questions like this here). Maybe compatibilidade and/or métrica?

  • 1

    @Luizvieira I added the tag algoritmo, but I still can't create new tags, so I could not add métrica as a possibility...

1 answer

9


The first thing you need to define is how you will represent the compatibility between two individuals A and B. You yourself mention the term "percentage" in your question, so I think the most appropriate choice is to use values between 0 and 1, where 0 represents no compatibility and 1 represents total compatibility. Values between these limits define the degree of compatibility.

The simplest way to obtain this compatibility is to calculate a "distance" (or similarity) between individuals based on their characteristics. If we assume that two individuals A and B who are identical in all their characteristics are fully compatible, the expected result for them is 1. This abstraction can be applied to each characteristic individually. For example, consider age first.

Assuming that two completely compatible individuals would have the same age (and this is a rather debatable point - more details later), a possible measurement would be as follows:

$$\mathrm{similaridade}_{idade}(A, B) = 1 - \frac{|idadeA - idadeB|}{DIF\_MAX}$$

The variable DIF_MAX holds the difference between the highest and the lowest age in your database (it can be a pre-calculated value, refreshed as the database is updated). It normalizes the absolute difference between the ages of individuals A and B (the variables idadeA and idadeB, respectively), that is, it keeps the result between 0 and 1. The resulting similarity is 1 minus the normalized absolute difference, so individuals with exactly the same age have TOTAL similarity, equal to 1 (since 1 - (0 / DIF_MAX) = 1 - 0 = 1), and individuals with the maximum age difference have ZERO similarity, equal to 0 (since 1 - (DIF_MAX / DIF_MAX) = 1 - 1 = 0).

In the case of this field (age), being a numerical value helps, because this numerical approach can be applied directly. For fields with enumerated values (eye color, for example), you can represent each value as a distinct integer and apply the same approach (where DIF_MAX would be the largest possible difference between two codes, that is, the number of possible values minus one when the codes are consecutive), or simply apply a logical/boolean approach in which the similarity is 1 if and only if the individuals have exactly the same value and 0 otherwise (this also depends on how compatibility is interpreted, which I discuss below).
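The two per-field measures described above can be sketched in Python as follows. This is only an illustration under assumptions: the function names and the sample ages are hypothetical, and the clamping to 0 is an extra safeguard in case DIF_MAX is stale.

```python
def numeric_similarity(a: float, b: float, dif_max: float) -> float:
    """Return 1 - |a - b| / dif_max, never going below 0."""
    if dif_max <= 0:
        # Every stored value is identical, so any pair matches on this field.
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / dif_max)


def categorical_similarity(a: int, b: int) -> float:
    """Boolean approach: 1 only when both codes are exactly equal."""
    return 1.0 if a == b else 0.0


# Hypothetical ages standing in for the values stored in the database.
ages = [18, 25, 40, 68]
DIF_MAX = max(ages) - min(ages)  # spread between oldest and youngest

same_age = numeric_similarity(30, 30, DIF_MAX)   # identical ages
ten_apart = numeric_similarity(25, 35, DIF_MAX)  # 10 years out of a spread of 50
extremes = numeric_similarity(18, 68, DIF_MAX)   # maximum difference
same_eyes = categorical_similarity(2, 2)
other_eyes = categorical_similarity(2, 3)
```

Keeping the per-field functions separate makes it easy to choose, field by field, whether the numeric or the boolean interpretation is used.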

Once you have the similarity for each field, an overall compatibility can easily be obtained with a simple average, or a weighted average if certain fields are more important than others, of the per-field values (the variable NUM_CARAC represents the number of characteristics, that is, the number of fields of each individual):

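$$\mathrm{compatibilidade}(A, B) = \frac{\sum_{i=1}^{NUM\_CARAC} peso_i \cdot similaridade_i(A, B)}{\sum_{i=1}^{NUM\_CARAC} peso_i}$$

where similaridade_i(A, B) is the similarity for field i and peso_i is the weight given to that field (all weights equal to 1 gives the simple average).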
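A minimal sketch of this average in Python, assuming the per-field similarities have already been computed. The function name, the field values, and the weight of 3 for fumo are hypothetical choices for illustration only.

```python
from typing import Dict, Optional


def compatibility(similarities: Dict[str, float],
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-field similarities; missing weights count as 1."""
    if not similarities:
        return 0.0
    weights = weights or {}
    total_weight = sum(weights.get(field, 1.0) for field in similarities)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(value * weights.get(field, 1.0)
                       for field, value in similarities.items())
    return weighted_sum / total_weight


# Hypothetical similarities between two profiles, one entry per field.
sims = {"altura": 0.8, "olhos": 1.0, "fumo": 0.0}

simple = compatibility(sims)                   # every weight is 1
weighted = compatibility(sims, {"fumo": 3.0})  # smoking counts three times as much
percent = round(simple * 100)                  # value for the percentage "counter"
```

Multiplying the result by 100 gives the percentage to display, and because the result is always a number between 0 and 1, every pair of profiles gets a value even when nothing matches.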

This "compatibility" between individuals can very well be calculated in advance by the system rather than at query time. It should be easy to keep a table with the comparison of each pair of individuals, and its values can be used to sort the list of potential matches at query time (so even if an individual has low compatibility with everyone else in the database, there will always be results to display).
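The precomputed table and the sorted listing could look roughly like the sketch below. It assumes a symmetric score, so each pair is stored once, and it uses an in-memory dictionary in place of a real database table. The profile data and function names are hypothetical.

```python
from itertools import combinations


def share_of_equal_fields(p: dict, q: dict) -> float:
    """Toy score: fraction of fields on which both profiles agree."""
    return sum(1 for field in p if p[field] == q[field]) / len(p)


def build_pair_table(profiles: dict, score) -> dict:
    """Compute the score once per unordered pair of profile ids."""
    return {(a, b): score(profiles[a], profiles[b])
            for a, b in combinations(sorted(profiles), 2)}


def ranked_matches(user_id, table: dict) -> list:
    """List every other user, best score first; nobody is filtered out."""
    matches = []
    for (a, b), value in table.items():
        if a == user_id:
            matches.append((b, value))
        elif b == user_id:
            matches.append((a, value))
    return sorted(matches, key=lambda match: match[1], reverse=True)


# Hypothetical profiles with integer-coded fields.
profiles = {
    1: {"olhos": 2, "fumo": 0},
    2: {"olhos": 2, "fumo": 0},
    3: {"olhos": 1, "fumo": 1},
}
table = build_pair_table(profiles, share_of_equal_fields)
for_user_1 = ranked_matches(1, table)
for_user_3 = ranked_matches(3, table)  # still lists users 1 and 2, with score 0
```

Since the listing only sorts and never filters, a user with no matching characteristics still gets a full list, just with low percentages.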


Finally, I would like to discuss some important issues. Although the approach I have just suggested might work, I honestly don't know whether it is the best way to get a good result. Assessing compatibility between people through age difference alone is not very useful, but it can help in a larger context. Still, how much weight should this information carry? It seems obvious that a larger age difference somewhat reduces compatibility, but this is not necessarily true: there are numerous real-life examples that contradict this intuition.

I do not know in detail the methods employed by the companies/systems that provide this type of service (by the way, this is your most important homework! Try to understand how the competition does it), but my intuition tells me that checking compatibility is not as simple as measuring the similarity between individuals through their characteristics or preferences.

Compatibility depends mainly on personal preferences that are not exactly the preferences regarding partners (one could say that they are meta-preferences; for example, "eye color" may be more or less important for different individuals). In addition, the compatibility may also vary with the context (local or temporal): for example, famous people greatly influence the aesthetic and behavioral preferences of other individuals.

So, even if the weighted average approach can be useful in a first version of your system, surely you will need to use more elaborate methods to be able to compete with what already exists.

Useful compatibility metrics would certainly use a combination of individual compatibility (measured, for example, in the way suggested above) and other, more subjective metrics. You may have noticed that a "trendy" metric is social compatibility (i.e., resulting from the "community's" own evaluation of individuals), but this is both interesting and controversial (see the discussions about applications such as Lulu and Tinder). The future probably holds potential for applying approaches similar to Netflix's, that is, a recommendation system in which the history of choices and ratings is used to "predict" future tastes.

Anyway, if your system includes different weights for the characteristics, an option that may be easy to implement and give satisfactory results is to let users rate the results shown to them (in terms of like and dislike). Based on those ratings, the weights themselves could be adjusted dynamically for EACH USER INDIVIDUALLY.
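One possible way to adjust per-user weights from like/dislike feedback is sketched below. The update rule is an assumption made for illustration, not something specified above: after a like, fields that matched gain weight and fields that differed lose weight, and after a dislike the reverse happens. The names `adjust_weights`, `rate`, and `floor` are hypothetical.

```python
def adjust_weights(weights: dict, similarities: dict, liked: bool,
                   rate: float = 0.1, floor: float = 0.0) -> dict:
    """Return a new weight dict nudged by one like/dislike on a shown profile."""
    direction = 1.0 if liked else -1.0
    updated = {}
    for field, weight in weights.items():
        sim = similarities.get(field, 0.0)
        # Centre the similarity on 0: matching fields (sim > 0.5) move with
        # the feedback, mismatching fields (sim < 0.5) move against it.
        updated[field] = max(floor, weight + direction * rate * (sim - 0.5))
    return updated


# Hypothetical weights for one user, and the per-field similarities
# between that user and a profile they were shown.
user_weights = {"olhos": 1.0, "fumo": 1.0}
shown_profile_sims = {"olhos": 1.0, "fumo": 0.0}

after_like = adjust_weights(user_weights, shown_profile_sims, liked=True)
after_dislike = adjust_weights(user_weights, shown_profile_sims, liked=False)
```

Each user would keep their own weight dictionary, and a small `rate` keeps a single rating from changing the ranking too abruptly.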

  • 1

    The idea was excellent! And luckily for me, all the fields are integers (they represent predefined values). The idea of keeping a table with all the comparisons between users was the most interesting point for me, because having the system generate every comparison at query time would be the dumbest thing I could do... Although I did not understand the second formula you presented, the idea and the logic are exactly what I need, so later I can post the final result of this work here :D Thank you very much for the help and the time spent.

  • The "second formula" is just to illustrate the weighted average (if you choose to use weights - otherwise it’s as if all the weights were worth 1 and you just summed the values of the similarities of each field and divided by the total fields). After you have your result, post your own answer. :)

  • Well, now a harder question has come up: if I create a table in the database for this, how do I update it with each change to a user's profile? If I have many users I will have to update millions of records per second; the server only has 8 GB of RAM and won't cope...

  • 1

    @Rodrigoborth: When you have millions of users, it’s probably easier to increase your server capacity.

  • I think it would be one table for the account the user logs in with, another for the profile, and another for the "weights", all referencing the same account through an FK field conta_id. When searching profiles, use only the weights to filter. And if you know how, create a table for indexing the profiles.

  • @Miguelangelo then you would have to talk to my client, who complained about paying 150 for a VPS with backup on a physically separate hard drive to avoid crashes... It's a matter of explaining to someone who thinks he knows more than a programmer, and actually knows less than an ordinary user, that the server will not handle more than 10 thousand users with only 8 GB of RAM... Just so you know, for the site launch he prepared an email marketing campaign with more than 7 million registered addresses... if 1% of them come in simultaneously, the server goes down.

  • @Rodrigoborth, if you are a programmer, leave this problem with the server administrator.

  • @Marceloaymone the weights are the same for all columns. However, as Luiz Vieira said, the comparison can't be done at query time; it's better to save the comparisons in the database and update them. But wouldn't updating the comparisons every time a user registers or modifies their profile generate a lot of requests? (I'm thinking of opening a question about that.)

  • @Marceloaymone it would be that easy, but this is a small company; any and all problems with a site built here fall to whoever programmed it...

  • I think that makes a good question: how to index and update this system of comparisons... maybe a stored procedure? They are stored in the DB and run faster.

  • http://answall.com/questions/8318/comor-indexar-e-actualizr-um-sistema-de-users. I tagged it sql.

  • It will probably be more advantageous for you to keep a working version of the compatibility table during the day (to be used in searches) and run a procedure (a stored procedure, as suggested by @Marceloaymone) at night to update the compatibilities with the previous day's changes. With the volume of users you're considering, the fact that updates are not applied immediately may well go unnoticed.

  • I think so too, @Luizvieira...
