Regular expression to match a whole word, case-insensitively, in accented words

Viewed 1,013 times

2

I need to write a program that searches for a given word in a set of texts and highlights each occurrence of that word within the text.

For this I developed the following method:

 public void grifarTexto(Relato relato, String texto) {
     relato.setDescricaoRelato(relato.getDescricaoRelato()
         .replaceAll("(?i)(" + texto + ")", "<mark>$1</mark>"));
 }

But then I ran into two small problems...

1. I would like it to match only the whole word, but when I add the start (^) and end ($) anchor characters, it ends up not highlighting any part of the text.

Method used:

 public void grifarTexto(Relato relato, String texto) {
     relato.setDescricaoRelato(relato.getDescricaoRelato()
         .replaceAll("(?i)^(" + texto + ")$", "<mark>$1</mark>"));
 }

2. It ignores upper and lower case correctly, except for accented letters. For example, when I search for the word mão:

MÃO (not highlighted)
mão (highlighted)
mÃo (not highlighted)
MãO (highlighted)

That is, it does not ignore case for accented letters.

I tested these expressions on Rubular to check that they were correct, and the results there look fine. Links to the tests: http://rubular.com/r/YRcTJBn8eY and http://rubular.com/r/dh753n4mgl

Does anyone know which regular expression I should use to get the behavior I want?

2 answers

2


Since you are searching for a word inside a text, the delimiters to use are not ^ (beginning) and $ (end), because these refer to the entire string.

  • ^ - beginning of string
  • $ - end of string

To solve this, use \b (word boundary), which is meant for delimiting words.
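The difference between the anchors and \b can be seen in a small sketch. The class name `AnchorVsBoundary`, the helper `mark`, and the sample sentence are made up for illustration and are not part of the question's code.

```java
import java.util.regex.Pattern;

public class AnchorVsBoundary {
    // Hypothetical helper: wraps every match of group 1 in <mark> tags.
    static String mark(String text, String regex) {
        return Pattern.compile(regex).matcher(text).replaceAll("<mark>$1</mark>");
    }

    public static void main(String[] args) {
        String text = "a casa da casada";
        // ^ and $ anchor the whole input, so a word in the middle never matches.
        System.out.println(mark(text, "^(casa)$"));
        // \b only requires a word boundary on each side, so "casa" matches
        // as a word but not as a prefix of "casada".
        System.out.println(mark(text, "\\b(casa)\\b"));
    }
}
```

With the anchors, the text is only highlighted when it consists of exactly the searched word and nothing else.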

As for the accented words, the problem is that Java, like PHP, by default uses only the plain ASCII table for case-insensitive matching, which limits it to the first 128 positions (0 to 127) of the table.
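A minimal sketch of that behavior, using the word from the question. The class `AccentCaseFolding` and its members are hypothetical names; the accented letters are written as Unicode escapes (\u00e3 is ã, \u00c3 is Ã) so the example does not depend on the source file encoding, and it assumes the letters are in precomposed form.

```java
import java.util.regex.Pattern;

public class AccentCaseFolding {
    // Default case-insensitive matching: only US-ASCII letters are folded.
    static final int ASCII_ONLY = Pattern.CASE_INSENSITIVE;
    // Adding UNICODE_CASE makes the case folding Unicode-aware.
    static final int UNICODE = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;

    // Returns true when the whole candidate matches the word under the given flags.
    static boolean matches(String word, String candidate, int flags) {
        return Pattern.compile(word, flags).matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String word = "m\u00e3o"; // mão
        String[] candidates = {"M\u00c3O", "m\u00e3o", "m\u00c3o", "M\u00e3O"};
        for (String candidate : candidates) {
            System.out.println(candidate
                    + " asciiOnly=" + matches(word, candidate, ASCII_ONLY)
                    + " unicode=" + matches(word, candidate, UNICODE));
        }
    }
}
```

Under `ASCII_ONLY`, only the candidates whose accented letter is already lower case match, which is the pattern reported in the question.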

To solve this problem you need to use the flag:

Pattern.UNICODE_CHARACTER_CLASS

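You could do something like this:

// CASE_INSENSITIVE is still required; UNICODE_CHARACTER_CLASS makes the case folding Unicode-aware
Pattern p = Pattern.compile("\\b" + texto + "\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);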
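A fuller sketch of how such a pattern could be used for highlighting. It assumes plain strings instead of the asker's `Relato` class, and `WholeWordHighlighter` and `highlight` are hypothetical names. `Pattern.CASE_INSENSITIVE` is included because the question asks for case-insensitive matching, and `Pattern.quote` is an addition of this sketch so that regex metacharacters in the search term are treated literally.

```java
import java.util.regex.Pattern;

public class WholeWordHighlighter {
    // Hypothetical helper: wraps each whole-word occurrence of word in <mark> tags.
    static String highlight(String text, String word) {
        Pattern p = Pattern.compile(
                // Pattern.quote keeps characters such as '.' in the word literal.
                "\\b(" + Pattern.quote(word) + ")\\b",
                // UNICODE_CHARACTER_CLASS implies UNICODE_CASE, so together with
                // CASE_INSENSITIVE the accented letters are folded as well.
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
        return p.matcher(text).replaceAll("<mark>$1</mark>");
    }

    public static void main(String[] args) {
        // \u00c3 is Ã and \u00e3 is ã; only the standalone word is highlighted.
        System.out.println(highlight("A M\u00c3O e a m\u00e3ozinha", "m\u00e3o"));
    }
}
```

Note that \b only helps when the search term starts and ends with a word character; a term ending in punctuation would need a different delimiter.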

0

RESOLVED

Thank you William, it worked!!!

The only detail is that I ended up doing it this way:

relato.setDescricaoRelato(relato.getDescricaoRelato().replaceAll("(?i)(?u)(\\b" + texto + "\\b)", "<mark>$1</mark>"));
  • If his answer helped you solve it, you should mark it as accepted.

  • Sorry... Done!!

  • good @Brenocabral
