The first thing you need to define is how you will represent the compatibility between two individuals A and B. Since you yourself mention the term "percentage" in your question, I think the most appropriate choice is to use values between 0 and 1, where 0 represents no compatibility and 1 represents total compatibility. Values between these limits define the degree of compatibility.
The simplest way to obtain this compatibility is to calculate a "distance" (or similarity) between individuals from their characteristics. If we assume that two individuals A and B who are identical in all their characteristics are fully compatible, the expected result for them would be 1. The same abstraction can be applied to each characteristic individually. As a first example, consider age.
Assuming that two completely compatible individuals would have the same age (a rather debatable point; more details later), a possible measurement would be as follows:
`similarity = 1 - (|idadeA - idadeB| / DIF_MAX)`
The variable `DIF_MAX` contains the difference between the highest and lowest age available in your database (it can be a pre-calculated value, refreshed as the database is updated) and serves to normalize (i.e., keep between 0 and 1) the absolute difference between the ages of individuals A and B (the variables `idadeA` and `idadeB`, respectively). The resulting similarity is equal to 1 minus this normalized absolute difference, so individuals with exactly the same age have TOTAL similarity, that is, equal to 1 (since `1 - (0 / DIF_MAX) = 1 - 0 = 1`), and individuals with the maximum difference between ages have ZERO similarity, that is, equal to 0 (since `1 - (DIF_MAX / DIF_MAX) = 1 - 1 = 0`).
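As a minimal sketch of the age measure described above, in Python: the function name `age_similarity` and the sample ages are assumptions made for illustration, while `idadeA`, `idadeB` and `DIF_MAX` are the variables from the text. The guard for `DIF_MAX <= 0` is also an assumption, covering the case where every person in the database has the same age.

```python
def age_similarity(idadeA: float, idadeB: float, DIF_MAX: float) -> float:
    """Return 1 - |idadeA - idadeB| / DIF_MAX, never going below 0."""
    if DIF_MAX <= 0:
        # Assumption: if all ages in the database are equal, every pair is
        # treated as having identical age.
        return 1.0
    normalized_difference = abs(idadeA - idadeB) / DIF_MAX
    # Clamp at 0 in case an age falls outside the range used for DIF_MAX.
    return max(0.0, 1.0 - normalized_difference)


# Hypothetical ages standing in for the database contents.
ages = [18, 25, 31, 60]
DIF_MAX = max(ages) - min(ages)  # 60 - 18 = 42

print(age_similarity(25, 25, DIF_MAX))  # same age -> 1.0
print(age_similarity(18, 60, DIF_MAX))  # maximum difference -> 0.0
```

In a real system `DIF_MAX` would presumably be stored and refreshed when people are added or removed, rather than recomputed on every call.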
For this field (age), being a numerical value makes things easier, because the numerical approach applies directly. For enumerated fields (eye color, for example), you can represent each value as a distinct integer and apply the same approach (where `DIF_MAX` is the largest possible difference between two codes, that is, the number of possible values for the field minus 1), or simply apply a logical/boolean approach in which the similarity is 1 if and only if the individuals have exactly the same value and 0 otherwise (this also depends on how compatibility is interpreted, which I discuss below).
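The two options for enumerated fields could be sketched as below. The list `EYE_COLORS`, its ordering, and both function names are hypothetical. Note one assumption: the sketch takes the largest possible difference to be the number of values minus 1, so that the two values at opposite ends of the list get similarity 0.

```python
# Hypothetical enumeration; the order is arbitrary and chosen for illustration.
EYE_COLORS = ["brown", "hazel", "green", "blue"]


def ordinal_similarity(value_a: str, value_b: str, values: list) -> float:
    """Integer-coding approach: map each value to its index and normalize."""
    dif_max = len(values) - 1  # largest possible distance between two codes
    if dif_max == 0:
        return 1.0  # only one possible value, so everyone matches
    distance = abs(values.index(value_a) - values.index(value_b))
    return 1.0 - distance / dif_max


def exact_match_similarity(value_a: str, value_b: str) -> float:
    """Boolean approach: 1 if the values are identical, 0 otherwise."""
    return 1.0 if value_a == value_b else 0.0


print(ordinal_similarity("brown", "blue", EYE_COLORS))  # opposite ends
print(exact_match_similarity("green", "green"))         # identical values
```

The integer-coding variant implies that some colors are "closer" than others, which only makes sense if the chosen ordering is meaningful; the boolean variant makes no such claim.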
Once you have the similarity for each field, an overall compatibility can easily be obtained as a simple or weighted average (if certain fields are more important than others) of the per-field values (the variable `NUM_CARAC` represents the number of characteristics, that is, the number of fields of each individual). For the simple average:
`compatibility = (sum of the per-field similarities) / NUM_CARAC`
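A minimal sketch of this averaging step follows. The function name `compatibility` and the example numbers are assumptions; `NUM_CARAC` is the variable from the text. Passing no weights gives the simple average, and passing weights gives the weighted one.

```python
from typing import Optional, Sequence


def compatibility(similarities: Sequence[float],
                  weights: Optional[Sequence[float]] = None) -> float:
    """Simple or weighted average of per-field similarities (each in [0, 1])."""
    NUM_CARAC = len(similarities)  # number of characteristics (fields)
    if NUM_CARAC == 0:
        raise ValueError("at least one characteristic is required")
    if weights is None:
        weights = [1.0] * NUM_CARAC  # equal weights: plain average
    if len(weights) != NUM_CARAC:
        raise ValueError("one weight per characteristic is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(s * w for s, w in zip(similarities, weights))
    return weighted_sum / total_weight


# Hypothetical per-field similarities, e.g. age and eye color.
print(compatibility([1.0, 0.0]))              # simple average -> 0.5
print(compatibility([1.0, 0.0], [3.0, 1.0]))  # first field weighs 3x -> 0.75
```

Dividing by the sum of the weights keeps the result between 0 and 1 regardless of the scale chosen for the weights.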
This "compatibility" between individuals would most likely be calculated in advance by the system, not at query time. It should be easy to keep a table with a value for each pair of individuals, and those values can be used to sort the list of potential matches when a query is made (this way, even if an individual has low compatibility with everyone else in the database, there will always be results to display).
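The precomputed table and the sorted listing could look roughly like this. It is a sketch under assumptions: people live in an in-memory dictionary keyed by a sortable id, the table is a plain dictionary rather than a database table, and `build_compatibility_table`, `ranked_matches` and the `score` callback are hypothetical names.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def build_compatibility_table(
        people: Dict[int, dict],
        score: Callable[[dict, dict], float]) -> Dict[Tuple[int, int], float]:
    """Compute the score once per unordered pair, keyed as (smaller_id, larger_id)."""
    table = {}
    for id_a, id_b in combinations(sorted(people), 2):
        table[(id_a, id_b)] = score(people[id_a], people[id_b])
    return table


def ranked_matches(person_id: int,
                   people: Dict[int, dict],
                   table: Dict[Tuple[int, int], float]) -> List[int]:
    """Return every other person, most compatible first, using only the table."""
    def lookup(other_id: int) -> float:
        key = (min(person_id, other_id), max(person_id, other_id))
        return table[key]

    others = [other_id for other_id in people if other_id != person_id]
    return sorted(others, key=lookup, reverse=True)


# Hypothetical data and scoring function (age only, with DIF_MAX = 20).
people = {1: {"age": 20}, 2: {"age": 22}, 3: {"age": 40}}
table = build_compatibility_table(
    people, lambda a, b: 1 - abs(a["age"] - b["age"]) / 20)
print(ranked_matches(1, people, table))  # person 2 is listed before person 3
```

Because the listing is a sort rather than a threshold filter, every person always gets results, even when all scores are low. The table grows with the square of the number of people, so a large site would probably need to restrict which pairs are stored.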
Finally, I would like to discuss some important issues. Although the approach I have just suggested might work, I honestly don’t know if it is the best way to get a good result. Assessing compatibility between people only through age differences is not very useful on its own, but it can help in a larger context. However, how much weight should this information carry? It seems obvious that larger age differences reduce compatibility somewhat, but this is not necessarily true: there are numerous real-life examples that contradict this intuition.
I do not know in detail the methods used by the companies/systems that provide this type of service (by the way, this is your most important homework: try to understand how the competition does it!), but my intuition tells me that checking compatibility is not as simple as measuring the similarity between individuals through their characteristics or preferences.
Compatibility depends mainly on personal preferences that are not themselves preferences about partners, but preferences about which characteristics matter (one could call them meta-preferences; for example, "eye color" may be more or less important to different individuals). In addition, compatibility may also vary with context (place or time): for example, famous people greatly influence the aesthetic and behavioral preferences of other individuals.
So, even if the weighted-average approach is useful in a first version of your system, you will surely need more elaborate methods to compete with what already exists.
Useful compatibility metrics would certainly combine individual compatibility (measured, for example, in the way suggested above) with other, more subjective metrics. You may have noticed that a "trendy" metric is social compatibility (i.e., one resulting from the "community" itself evaluating individuals), but this is both interesting and controversial (see the discussions about applications such as Lulu and Tinder). The future probably lies in applying approaches similar to Netflix's, that is, a recommendation system in which the history of choices and ratings is used to "predict" future tastes.
Anyway, if your system includes different weights for the characteristics, one option that can be easy to implement and still produce satisfactory results is to let users rate the displayed results (as like or dislike). Based on these ratings, the weights themselves could be adjusted dynamically for each user INDIVIDUALLY.
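One possible way to adjust the weights from a like/dislike is sketched below. The update rule is purely an assumption for illustration, not something prescribed above: after a like, fields on which the two people were similar gain weight and fields on which they differed lose weight; after a dislike, the opposite happens. The name `adjust_weights`, the `learning_rate` value and the `0.01` floor are all hypothetical choices.

```python
from typing import List, Sequence


def adjust_weights(weights: Sequence[float],
                   similarities: Sequence[float],
                   liked: bool,
                   learning_rate: float = 0.1) -> List[float]:
    """Return one user's new per-field weights after one like/dislike."""
    direction = 1.0 if liked else -1.0
    updated = [
        # (s - 0.5) is positive for fields where the pair was similar and
        # negative where they differed; the floor keeps weights positive.
        max(0.01, w + learning_rate * direction * (s - 0.5))
        for w, s in zip(weights, similarities)
    ]
    total = sum(updated)
    return [w / total for w in updated]  # renormalize so the weights sum to 1


# Hypothetical user with two equally weighted fields (age, eye color).
user_weights = [0.5, 0.5]
# The user liked someone with the same age but a different eye color.
user_weights = adjust_weights(user_weights, [1.0, 0.0], liked=True)
print(user_weights)  # age now weighs more than eye color for this user
```

Each user would keep their own weight vector, so two users rating the same profiles differently would end up with different rankings.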
Even if you are still at square one, it’s best to post your initial attempts, the structure of your database, links from your initial research... The more information about your specific case, the easier it is to provide an adequate answer. Check out the guide [Ask].
– brasofilo
I added what I could to help, @brasofilo
– RodrigoBorth
I believe you will depend mostly on what the dating site's API makes available to you. You could even get a system to authenticate and start scanning the site (a spider/web crawler), but the website will probably block you.
– Flávio Granato
@Fláviogranato actually, I developed the site, and this is an internal query on the site... I won't be making requests through an API; I'll create a page on the site that displays this...
– RodrigoBorth
I found the question very interesting, but I don’t think the tags are correct. The question seems to me to be much more about the method than about any particular technology.
– Luiz Vieira
@Luizvieira which tag do you think would best fit this question?
– RodrigoBorth
@Rodrigoborth True, I didn’t even suggest one. My bad. Well, looking at the tags that already exist, I think `algoritmo` fits well. But, anyway, you can also create a new tag (I haven’t seen questions like this here). Maybe `compatibilidade` and/or `métrica`?
– Luiz Vieira
@Luizvieira I added the `algoritmo` tag, but I still can’t create new tags, so I couldn’t add a metric tag as an option...
– RodrigoBorth