Is it possible to modify a field before performing a search? (PHP MYSQL)

1

"SELECT * FROM table WHERE content LIKE '%$search%'"

Is it possible to modify the values of the 'content' column before the search is performed?

Example:

"SELECT * FROM table WHERE strip_tags(content) LIKE '%$search%'"

Is it possible to accomplish something similar?

My content is full of code and HTML tags. I would like to filter those out and search only the text.

Example of Content:

   <p><b>A</b> hist&oacute;ria da vida.</p>

If $search is 'história', for example, it will not find any results, because the stored text contains hist&oacute;ria.

  • 1

    It is possible, but you should explain better what you are doing in order to get a more appropriate answer. Do you want to apply strip_tags to the field rather than to the search term? MySQL will not understand PHP functions, of course, but depending on what you want, you can either do the work in PHP before the query or replace the PHP functions with MySQL equivalents.

3 answers

2


I see three possibilities for your problem.

  • Use the REPLACE function, which would make the search extremely slow because of the number of substitutions needed for each record;
  • Write the text without HTML to another field in the table, which consumes more disk space because the data is duplicated;
  • Resort to information retrieval algorithms.
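
As a rough illustration of the second option, the plain-text copy could be produced in PHP before the row is saved. This is only a sketch: the helper name plainText is hypothetical, and it assumes the content is UTF-8 encoded.

```php
<?php
// Hypothetical helper: returns the searchable text of an HTML fragment.
// Assumes $html is UTF-8 encoded.
function plainText(string $html): string
{
    // Drop the tags first, so that escaped text such as "&lt;b&gt;" is kept as text.
    $text = strip_tags($html);
    // Turn entities such as &oacute; into the characters they stand for.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace left behind by the markup.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo plainText('<p><b>A</b> hist&oacute;ria da vida.</p>'), "\n";
```

The result would be written to a separate field of your choosing, and the LIKE search would then run against that field instead of content.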

There are several kinds of information retrieval algorithms, ranging from the very simple to the very complex, like Google's. The one I present here is widely used in SEO.

Below is an explanation of how to build a simple algorithm of this kind and improve your searches.

The basic process of information retrieval is to process the text before it is saved, extracting the relevant information that will help with querying and ranking the results. The text treatments are:

  • Tokenization: split the text into its individual words;
  • Normalization: convert all letters to lower case and remove symbols and accents;
  • Stopwords: remove words that are irrelevant to a search, such as "a", "e", "o", "from", "to", "by", etc.;
  • Stemming: reduce every word to its grammatical root. We will not cover this part because it requires a very complex dictionary.

Let’s use the sample string:

$string = "&Aacute; <strong>Oi</strong> aqui é um teste!!     \n Faça tudo para tirar simbolos e acentos deste teste.";

/**
 * Remove the HTML.
 */
$clean = html_entity_decode($string);
$clean = strip_tags($clean);

/**
 * Remove accents and symbols.
 */
setlocale(LC_ALL, 'pt_BR.UTF8');

// Remove extra spaces and line breaks.
$clean = trim($clean);
$clean = preg_replace('/\s(?=\s)/', '', $clean);
$clean = preg_replace('/[\n\r\t]/', ' ', $clean);

// Remove accents. Note that the iconv extension must be installed.
$clean = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $clean);

// Remove everything except letters, numbers, spaces and a few separators (/ _ | -).
$clean = preg_replace("/[^a-zA-Z0-9\/_| -]/", '', $clean);

// Convert everything to lower case.
$string = strtolower(trim($clean, '-'));

/**
 * Remove stopwords.
 */
// Build the stopword dictionary.
$stopwords = array(
    'a', 'agora', 'ainda', 'ali', 'alguem', 'algum', 'alguma', 'algumas',
    'alguns', 'ampla', 'amplas', 'amplo', 'amplos', 'ante', 'antes', 'ao',
    'aos', 'apos', 'aquela', 'aquelas', 'aquele', 'aqueles', 'aqui', 'aquilo',
    'as', 'ate', 'atraves', 'cada', 'coisa', 'coisas', 'com', 'como', 'contra',
    'contudo', 'da', 'daquele', 'daqueles', 'das', 'de', 'dela', 'delas', 'dele',
    'deles', 'depois', 'dessa', 'dessas', 'desse', 'desses', 'desta',
    'destas', 'deste', 'deste', 'destes', 'deve', 'devem', 'devendo',
    'dever', 'devera', 'deverao', 'deveria', 'deveriam', 'devia', 'deviam',
    'disse', 'disso', 'disto', 'dito', 'diz', 'dizem', 'do', 'dos', 'e',
    'ela', 'elas', 'ele', 'eles', 'em', 'enquanto', 'entre', 'era', 'essa',
    'essas', 'esse', 'esses', 'esta', 'esta', 'estamos', 'estao', 'estas',
    'estava', 'estavam', 'estavamos', 'este', 'estes', 'estou', 'eu',
    'fazendo', 'fazer', 'feita', 'feitas', 'feito', 'feitos', 'foi', 'for',
    'foram', 'fosse', 'fossem', 'grande', 'grandes', 'ha', 'isso', 'isto',
    'ja', 'la', 'lhe', 'lhes', 'lo', 'mas', 'me', 'mesma', 'mesmas',
    'mesmo', 'mesmos', 'meu', 'meus', 'minha', 'minhas', 'muita', 'muitas',
    'muito', 'muitos', 'na', 'nao', 'nas', 'nem', 'nenhum', 'nessa',
    'nessas', 'nesta', 'nestas', 'ninguem', 'no', 'nos', 'nos', 'nossa',
    'nossas', 'nosso', 'nossos', 'num', 'numa', 'nunca', 'o', 'os', 'ou',
    'outra', 'outras', 'outro', 'outros', 'para', 'pela', 'pelas', 'pelo',
    'pelos', 'pequena', 'pequenas', 'pequeno', 'pequenos', 'per', 'perante',
    'pode', 'pude', 'podendo', 'poder', 'poderia', 'poderiam', 'podia',
    'podiam', 'pois', 'por', 'porem', 'porque', 'posso', 'pouca', 'poucas',
    'pouco', 'poucos', 'primeiro', 'primeiros', 'propria', 'proprias',
    'proprio', 'proprios', 'quais', 'qual', 'quando', 'quanto', 'quantos',
    'que', 'quem', 'sao', 'se', 'seja', 'sejam', 'sem', 'sempre', 'sendo',
    'sera', 'serao', 'seu', 'seus', 'si', 'sido', 'so', 'sob', 'sobre',
    'sua', 'suas', 'talvez', 'tambem', 'tampouco', 'te', 'tem', 'tendo',
    'tenha', 'ter', 'teu', 'teus', 'ti', 'tido', 'tinha', 'tinham', 'toda',
    'todas', 'todavia', 'todo', 'todos', 'tu', 'tua', 'tuas', 'tudo',
    'ultima', 'ultimas', 'ultimo', 'ultimos', 'um', 'uma', 'umas', 'uns',
    'vendo', 'ver', 'vez', 'vindo', 'vir', 'vos', 'vos'
);
$string = preg_replace('/\b(' . implode('|', $stopwords) . ')\b/', '', $string);

/**
 * Get the tokens (optionally with the number of times each word repeats).
 */
// Split on whitespace, discarding the empty strings left by removed stopwords.
$tokens = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);
// Keep a single copy of each repeated word.
// If you want to move on to a more complex indexing algorithm, use
// 'array_count_values' instead of 'array_unique', so that you get each
// word together with the number of times it occurs.
// $tokens = array_count_values($tokens);
$tokens = array_unique($tokens);

// Prepare for storage.
$keywords = join(',', $tokens);

Done. Now we need to store this in the database to make searching easier. To do so, create a field called keywords in your table and run the algorithm above before inserting the content.

In your searches, build the query as follows:

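// $pdo is assumed to be an existing PDO connection, and $keywords the search
// term after it has been run through the algorithm above.
$sql = 'SELECT * FROM textos WHERE keywords LIKE :keywords';
$stmt = $pdo->prepare($sql);
$keywords = explode(',', $keywords);
$keywords = '%' . join('%', $keywords) . '%';
$stmt->execute( array( ':keywords' => $keywords ) );
$result = $stmt->fetchAll();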

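If your table is MyISAM (InnoDB also supports full-text indexes as of MySQL 5.6), it gets even better, because with a FULLTEXT index on the keywords field you can use a full-text search: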

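$sql = 'SELECT * FROM textos WHERE MATCH(keywords) AGAINST(:keywords IN BOOLEAN MODE)';
$stmt = $pdo->prepare($sql); // $pdo: an existing PDO connection (assumed)
// Full-text search expects the words separated by spaces, not by "," or "%".
$keywords = trim(str_replace(array(',', '%'), ' ', $keywords));
$stmt->execute( array( ':keywords' => $keywords ) );
$result = $stmt->fetchAll();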

Using IN BOOLEAN MODE lets you write searches such as "+test take symbol", which returns only rows that contain the word test and ranks them higher if they also contain the other words. See the MySQL documentation on boolean full-text searches to learn more.
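
To make the "+test take symbol" form concrete, a boolean-mode search string could be assembled roughly as below. The function name and its two parameters are hypothetical. Only the "+" prefix for required words comes from MySQL's boolean mode syntax, and the words are assumed to have already gone through the normalization above.

```php
<?php
// Hypothetical helper: builds a search string for
// MATCH ... AGAINST (... IN BOOLEAN MODE).
// Words in $required get a "+" prefix (they must be present); words in
// $optional are left bare (they only raise the relevance of matching rows).
// Assumes every word is already normalized (lower case, no symbols).
function booleanQuery(array $required, array $optional = array()): string
{
    $parts = array();
    foreach ($required as $word) {
        $parts[] = '+' . $word;
    }
    foreach ($optional as $word) {
        $parts[] = $word;
    }
    return implode(' ', $parts);
}

echo booleanQuery(array('test'), array('take', 'symbol')), "\n";
```

The returned string would be bound to the :keywords placeholder rather than concatenated into the SQL.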

For texts that already exist, write a script that goes through all the records and applies the treatment above.

0

Yes, you can use the REPLACE() function. Here is an example:

SELECT * FROM tabela WHERE REPLACE(conteudo, 'procura', 'substitui') LIKE '%$pesquisa%'
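
Because REPLACE() handles one substitution per call, removing several tags means nesting the calls. The sketch below shows one way such an expression might be generated in PHP. The helper name is hypothetical and the tag list is only an example.

```php
<?php
// Hypothetical helper: wraps $column in one REPLACE() call per string to remove.
// Returns the SQL expression (with ? placeholders) and the values to bind,
// in the same order as the placeholders appear in the expression.
function nestedReplace(string $column, array $remove): array
{
    $expr = $column;
    $params = array();
    foreach ($remove as $needle) {
        $expr = "REPLACE($expr, ?, '')";
        $params[] = $needle;
    }
    return array($expr, $params);
}

list($expr, $params) = nestedReplace('conteudo', array('<p>', '</p>'));
echo $expr, "\n";
```

The expression would take the place of the column in the WHERE clause, with the search pattern bound as one more parameter. Every distinct tag needs its own call, and the whole expression is evaluated for every row.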

I hope it helps

0

For something similar to what you want, MySQL supports a kind of search called full-text search, which works very differently from an ordinary comparison.

http://dev.mysql.com/doc/refman/5.0/en/fulltext-query-expansion.html

With it you can search variable content, and the results are ranked by how relevant each row is to the searched value in the searched columns.
