If you want to search for the phonetics of the words adapted to Portuguese perform the function:
DROP FUNCTION IF EXISTS transformar_fonetica;
DELIMITER $
CREATE FUNCTION transformar_fonetica(ptexto TEXT)
RETURNS TEXT
BEGIN
DECLARE vtexto TEXT;
DECLARE vtexto_apoio TEXT;
DECLARE vposicao_atual INT;
DECLARE vcaracter_anterior VARCHAR(1);
DECLARE vcaracter_atual VARCHAR(1);
DECLARE vcaracter_seguinte VARCHAR(1);
DECLARE vsom VARCHAR(2);
DECLARE com_acentos VARCHAR(65);
DECLARE sem_acentos VARCHAR(65);
SET vtexto = UPPER(ptexto);
SET com_acentos = 'ŠšŽžÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝŸÞàáâãäåæçèéêëìíîïñòóôõöøùúûüýÿþƒ';
SET sem_acentos = 'SsZzAAAAAAACEEEEIIIINOOOOOOUUUUYYBaaaaaaaceeeeiiiinoooooouuuuyybf';
SET vposicao_atual = LENGTH(com_acentos);
-- Remove acentos
WHILE vposicao_atual > 0 DO
SET vtexto = REPLACE(vtexto, SUBSTRING(com_acentos, vposicao_atual, 1), SUBSTRING(sem_acentos, vposicao_atual, 1));
SET vposicao_atual = vposicao_atual - 1;
end while;
-- Remove caracteres inválido
SET vposicao_atual = 1;
WHILE vposicao_atual <= LENGTH(vtexto) DO
SET vcaracter_atual = SUBSTRING(vtexto, vposicao_atual, 1);
IF INSTR('ABCDEFGHIJKLMNOPQRSTUVWXYZ ', vcaracter_atual) <> 0 THEN
SET vtexto_apoio = CONCAT(IFNULL(vtexto_apoio, ''), vcaracter_atual);
END IF;
SET vposicao_atual = vposicao_atual + 1;
END WHILE;
SET vtexto = vtexto_apoio;
-- Substitui os mais simples
SET vtexto = REPLACE(vtexto, 'SS', 'S');
SET vtexto = REPLACE(vtexto, 'SH', 'X');
SET vtexto = REPLACE(vtexto, 'XC', 'S');
SET vtexto = REPLACE(vtexto, 'QU', 'K');
SET vtexto = REPLACE(vtexto, 'CH', 'X');
SET vtexto = REPLACE(vtexto, 'PH', 'F');
SET vtexto = REPLACE(vtexto, 'LH', 'LI');
-- Remove duplicados
SET vposicao_atual = 1;
SET vtexto_apoio = '';
WHILE vposicao_atual <= LENGTH(vtexto) DO
SET vcaracter_atual = SUBSTRING(vtexto, vposicao_atual, 1);
IF vposicao_atual < LENGTH(vtexto) THEN
SET vcaracter_seguinte = SUBSTRING(vtexto, vposicao_atual + 1, 1);
ELSE -- Último caracter não tem motivo para ser verificado
SET vcaracter_seguinte = '';
END IF;
IF vcaracter_atual <> vcaracter_seguinte THEN
SET vtexto_apoio = CONCAT(vtexto_apoio, vcaracter_atual);
END IF;
SET vposicao_atual = vposicao_atual + 1;
END WHILE;
SET vtexto = vtexto_apoio;
-- Troca caracteres pelo som
SET vposicao_atual = 1;
SET vtexto_apoio = '';
WHILE vposicao_atual <= LENGTH(vtexto) DO
SET vcaracter_atual = SUBSTRING(vtexto, vposicao_atual, 1);
IF vposicao_atual < LENGTH(vtexto) THEN
SET vcaracter_seguinte = SUBSTRING(vtexto, vposicao_atual + 1, 1);
ELSE
SET vcaracter_seguinte = '';
END IF;
-- "B" seguindo de qualquer caracter que não seja "A", "E", "I", "O", "U", "R" ou "Y"
IF vcaracter_atual = 'B' AND INSTR('AEIOURY', vcaracter_seguinte) = 0 THEN
SET vsom = 'BI';
-- "C" seguindo de "E", "I" ou "Y"
ELSEIF vcaracter_atual = 'C' AND INSTR('EIY', vcaracter_seguinte) <> 0 THEN
SET vsom = 'S';
ELSEIF vcaracter_atual = 'C' THEN
SET vsom = 'K';
ELSEIF vcaracter_atual = 'D' AND INSTR('AEIOURY', vcaracter_seguinte) = 0 THEN
SET vsom = 'DI';
ELSEIF vcaracter_atual = 'E' THEN
SET vsom = 'I';
ELSEIF vcaracter_atual = 'G' AND INSTR('EIY', vcaracter_seguinte) <> 0 THEN -- GE, GI OU GY
SET vsom = 'J';
ELSEIF vcaracter_atual = 'G' AND vcaracter_seguinte = 'T' THEN -- GT
SET vsom = '';
ELSEIF vcaracter_atual = 'H' THEN
SET vsom = 'H';
ELSEIF vcaracter_atual = 'N' THEN
SET vsom = 'M';
ELSEIF vcaracter_atual = 'P' AND INSTR('AEIOURY', vcaracter_seguinte) = 0 THEN
SET vsom = 'PI';
ELSEIF vcaracter_atual = 'Q' THEN
SET vsom = 'K';
-- QUA, QUE, QUI, QUO ou QUY
ELSEIF IFNULL(vcaracter_anterior, '') = 'Q' AND vcaracter_atual = 'U' AND INSTR('AEIOY', vcaracter_seguinte) <> 0 THEN
SET vsom = '';
ELSEIF vcaracter_atual = 'W' THEN
SET vsom = 'V';
ELSEIF vcaracter_atual = 'X' THEN
SET vsom = 'S';
ELSEIF vcaracter_atual = 'Y' THEN
SET vsom = 'I';
ELSEIF vcaracter_atual = 'Z' THEN
SET vsom = 'S';
ELSE
SET vsom = vcaracter_atual;
END IF;
SET vcaracter_anterior = vcaracter_atual;
SET vposicao_atual = vposicao_atual + 1;
SET vtexto_apoio = CONCAT(vtexto_apoio, vsom);
END WHILE;
SET vtexto = vtexto_apoio;
SET vtexto = CONCAT('%', replace(vtexto, ' ', '%'), '%');
RETURN vtexto;
END
$
The use is as follows::
SELECT *
FROM publicacao
WHERE transformar_fonetica(titulo) LIKE transformar_fonetica('$busca');
If there are many records to query
will have performance problems, so I advise you to create a speaker just for phonetics and a trigger
to update it.
Adding the new column
ALTER TABLE publicacao ADD fonetica TEXT;
Creation of Trigger for updating
DELIMITER $
CREATE TRIGGER publicacao_fonetica AFTER UPDATE ON publicacao
FOR EACH ROW
BEGIN
if NEW.titulo <=> OLD.titulo THEN
UPDATE publicacao
SET fonetica = transformar_fonetica(NEW.titulo)
WHERE id = OLD.id;
END IF;
END;
$
Data selection
SELECT *
FROM publicacao
WHERE fonetica LIKE transformar_fonetica('$busca');
Observing:
The pronunciation of some letter combinations varies according to the words, so there may be discrepancies. Yet I haven’t really found an example where it doesn’t work.
The search also works for first names.
If you give a like on
cia
it will bring, but will bring more things too... Have little information to give a better answer (at least for me :) )– Ricardo
then the word sought could be one of the 4 and always return the 4 lines or all q had pharmacy in one of these variations
– Jasar Orion
Search for FTS (full text search). The accent has no problem, Mysql does natively if you use an international collation (either in UTF-8, or Latin/ISO, works great, just need to choose one compatible with encoding). In the case of PH already changes a little.
– Bacco
I get it, I’ve worked in a place where we called this phonetic search, they registered one of the syllables and there was an engineering to locate based on that, but it’s not very simple
– Ricardo
@Ricardo in Portuguese is very boring, because there are combinations that change all the meaning depending on the division of the words. I find it more efficient to do a similar letter approximation instead of an exact search, but it has to be analyzed on a case-by-case basis. In my systems already adopted search for substring, so if one needs Luiz/Luis Augusto, she researches Lui Augusto and ready :) - For more complicated thing, I use a variant of Levenshtein
– Bacco
@Bacco, I agree 100% with you... It is not the best solution (I’m even reading about the FTS here), but it is a solution that worked and used there, as I’ve seen working I decided to talk.
– Ricardo
@Ricardo I think it’s nice that you spoke. I only mentioned because I’ve seen enough trying this way and getting complicated. I don’t even think it’s impossible no, I just didn’t see anything ready optimized for Portuguese, because people end up trying to reuse things made for English, and then it doesn’t work well. If someone stops to write an algorithm thinking about our particularities, even able to be cool. You just have to be careful with things that give more of a phonetic combination.
– Bacco
If you want something ready, you have this solution with Levenshtein here http://stackoverflow.com/a/4671557/916193
– Bacco
thank you very much, I will try to use I think q should solve my problem. I now have a question. How much this weighs the script?
– Jasar Orion