How to search for relevance in Mysql (prioritize whole word)

Asked

Viewed 251 times

1

I have a recipes site in PHP and Mysql.

The consultation is:

SELECT receita FROM receitas WHERE receita LIKE '%mac%'

When I search for "mac", it brings "macaroni" and "big mac". I wanted the result first bring "big Mac" and then "macaroni" because "big mac" has the term "whole Mac" while "macaroni" is only part of the text.

Is it possible to do this? Search first for WHOLE word and then just the snippet.

  • 2

    Possible duplicate of How to sort by Like Relevance?

  • 2

    Search for full-text search. https://dev.mysql.com/doc/refman/8.0/en/fulltext-search.html

  • @Andersoncarloswoss in reality has no connection. In this cited case the relevance is by proximity to the beginning

  • @Andersoncarloswoss, as Sorack said, this one doesn’t work much, for example, apple pie came before, any term that starts with mac may appear before

  • Okay, I’ve removed my vote, but I’ll leave the comment because it’s intrinsically related. Perhaps the one I mentioned is useful to someone who comes to this question.

  • @anonym the only problem of using the MATCH is that it will return only the records that have mac whole, that is, it ignores macarrão for example. And using the 'mac*' IN BOOLEAN MODE, in this case, it acts as a LIKE normal not putting the full words first. But it’s QUITE interesting this alternative, I’m doing some tests to see how it behaves.

Show 1 more comment

1 answer

2

I managed using mysql regexp.

I created the following regular expression:

(^|\s)mac($|\s)

The expression evaluates the text by following the rules:

  • (^|\s) - Before the word there must be one or more spaces or it is the beginning of it
  • mac - After space there has to be the word mac
  • ($|\s) - After the word has to have one or more spaces or is the end of it

If by chance the title of the search goes through this evaluation, then this record has to be right at the beginning.

For testing purposes I created the following code:

    /*
    CREATE TABLE IF NOT EXISTS receitas (
        id         INT AUTO_INCREMENT,
        titulo     VARCHAR(255) NOT NULL,
        receita    VARCHAR(255) NOT NULL,
        PRIMARY KEY (id)
    )  ENGINE=INNODB  CHARACTER SET=utf8;

    INSERT INTO receitas VALUES 
    (null, 'Big Mac', '2 hamburguers, alface, queijo...'),
    (null, 'Miojo', 'ferva a água...'),
    (null, 'Macarrão', 'ferva a água...'),
    (null, 'Torta de Maçã', 'Vá até um Mac Donalds e pague no caixa uma torta...'),
    (null, 'Maç', 'Vá até um Mac Donalds e pague no caixa uma torta...');
    */

    SELECT titulo 
    FROM receitas
    WHERE titulo LIKE '%mac%'
    ORDER BY
        CASE 
            WHEN titulo REGEXP '(^|\s)mac($|\s)' THEN 1 
            # aqui é onde acontece a mágica
            ELSE 2
        END
    ;

In the ordering is made the evaluation. If it enters the evaluation the value assigned will be 1 while others who do not enter will have a value of 2.

The problem...

The result obtained is:

Big Mac       // correto
Macarrão      // correto
Torta de Maçã // correto
Maç           // < --- este deveria estar no inicio

Whereas the title Maç should (theoretically) be among the first... This happens because the regexp search exactly the typed word.

To solve this you can create a field of slug in the table where you keep the spaces and change the special characters every time you save a new record, so that the search is done there.

Other option

Another solution, indicated in the comments by @anonimo, is to use full text search functions with the function MATCH instead of LIKE in Mysql.

For this it is necessary to change the field of the table that will be done the search with:

ALTER TABLE receitas ADD FULLTEXT(titulo);

With this, you can perform the union of two different queries:

    SELECT id, titulo 
    FROM receitas
    WHERE MATCH(titulo) AGAINST('mac')
        UNION
    SELECT id, titulo 
    FROM receitas
    WHERE MATCH(titulo) AGAINST('*mac*' IN BOOLEAN MODE);

The first is the whole word query, it will only take the records with the full text. The second will take everything that contains 'mac'(works the same as the LIKE in this case). The UNION combine results without leaving duplicate records.

  • 1

    I liked the first solution, it assigns a weight to each type of query and sorts by these weights. It became very sophisticated.

  • @Augustovasques yes... But it has that side effect that for me is not a problem. But...

Browser other questions tagged

You are not signed in. Login or sign up in order to post.