Problem with Regex and Java

My problem is to remove a substring from a string while ignoring accents, spaces, and any special characters. I'll paste the test scenarios, which make the problem easier to understand.

    import static org.junit.jupiter.api.Assertions.*;
    import org.junit.jupiter.api.Test;
    public class StringUtilTest {
        @Test
        public void deveTestarRemoverAcentosEspacos() {
            assertEquals("Maca", StringUtil.removerAcentosEspacos("Maçã"));
            assertEquals("Macadoamor", StringUtil.removerAcentosEspacos("Maçã-do amor"));
            assertEquals("MarcaTeste", StringUtil.removerAcentosEspacos("Marca Test'e"));
        }
        @Test
        public void deveTestarRemoverSubstring() {
            assertEquals("texto texto", StringUtil.removerSubstring("texto texto a ser removido texto", "texto a ser removido"));
            
            assertEquals("texto texto", StringUtil.removerSubstring("texto texto á ser removido texto", "texto a ser removido"));
            assertEquals("texto texto", StringUtil.removerSubstring("texto textoaserremovido texto", "texto a ser removido"));
            assertEquals("texto texto", StringUtil.removerSubstring("texto texto a s'er removido texto", "texto a ser removido"));
            assertEquals("texto texto", StringUtil.removerSubstring("texto texto a ser-removido texto", "texto a ser removido"));
            
            assertEquals("texto texto", StringUtil.removerSubstring("texto texto a ser removido texto", "texto á ser removido"));
            assertEquals("texto texto", StringUtil.removerSubstring("texto texto a ser removido texto", "textoaserremovido"));
            assertEquals("texto texto", StringUtil.removerSubstring("texto texto a ser removido texto", "texto a s'er removido"));
            assertEquals("texto texto", StringUtil.removerSubstring("texto texto a ser removido texto", "texto a ser-removido"));
            
            assertEquals("tex'tó texto texto-texto", StringUtil.removerSubstring("tex'tó texto texto á ser removido texto-texto", "texto a ser removido"));
        }
    }

My implementation fails the last test because I can't get back to the original text after stripping it. Any ideas?

    import java.text.Normalizer;
    
    public class StringUtil {
        public static String removerAcentosEspacos(String str) {
            String retorno = Normalizer.normalize(str, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");
            retorno = retorno.replaceAll("['\s-]", "");
            return retorno;
        }
    
        public static String removerSubstring(String texto, String textoASerRemovido) {
            texto = removerAcentosEspacos(texto).replaceAll(removerAcentosEspacos(textoASerRemovido), " ");
            return texto.replaceAll("\s+"," ");
        }
    }
  • Within removerAcentosEspacos, the replaceAll("['\\s-]", "") removes spaces, hyphens and apostrophes, so tex'tó becomes texto and texto-texto becomes textotexto, and the original text is lost. If you don't want that, don't remove these characters from the original text (perhaps only from the text to be removed; I'm not sure that's what you want).

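  • One detail: inside a Java string literal, the regex shorthand \s has to be written as \\s. I don't know if that was a typo (since Java 15, "\s" in a string literal compiles but is an escape for a single plain space, so the regex never sees \s).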
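
A minimal sketch of the idea in the first comment, under some assumptions: only the text to be removed is reduced to bare letters, and a regex built from those letters tolerates accents, apostrophes, whitespace and hyphens between them while matching against the original text. The class name AccentInsensitiveRemover and the methods toKey and removeSubstring are hypothetical, not from the question. The result is re-composed with NFC, so this assumes the input was in NFC form (or that NFC output is acceptable).

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper: class and method names are illustrative, not from the question.
final class AccentInsensitiveRemover {

    // Characters tolerated between two letters of a match: combining marks
    // (accents after NFD decomposition), apostrophes, whitespace and hyphens.
    private static final String IGNORABLE = "[\\p{M}'\\s-]*";

    // Same reduction as the question's removerAcentosEspacos,
    // but applied only to the search key, never to the text itself.
    static String toKey(String str) {
        String ascii = Normalizer.normalize(str, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
        return ascii.replaceAll("['\\s-]", "");
    }

    static String removeSubstring(String text, String toRemove) {
        String key = toKey(toRemove);
        if (key.isEmpty()) {
            return text; // nothing to search for
        }

        // Build: letter, IGNORABLE, letter, ..., letter, trailing accents.
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < key.length(); i++) {
            if (i > 0) {
                regex.append(IGNORABLE);
            }
            regex.append(Pattern.quote(String.valueOf(key.charAt(i))));
        }
        regex.append("\\p{M}*"); // accents on the last letter belong to the match

        // Decompose the text so accents become separate combining marks,
        // replace each match with a space, then collapse runs of whitespace.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String result = Pattern.compile(regex.toString())
                .matcher(decomposed)
                .replaceAll(" ")
                .replaceAll("\\s+", " ");

        // Assumption: the caller wants composed (NFC) output.
        return Normalizer.normalize(result, Normalizer.Form.NFC);
    }
}
```

Because only the matched span is replaced, the apostrophe in tex'tó and the hyphen in texto-texto from the last test scenario are left untouched. The match is case-sensitive and not anchored to word boundaries, which is the same behavior as the question's version.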

No answers
