REGEX - problem with validation

I need to do the following validation:

Letters a-z (upper and lower case), hyphen (-), apostrophe ('), space ( ) and digits (0-9)

For that I did the following:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    /** Validation pattern. */
    public static final String PATTERN = "^[A-Z|a-z|0-9| |Á-Ú|á-ú|Ã-Ũ|ã-ũ|'|-]+$";
    /** Positive tests. */
    public static String[] itens = { "á é í ó ú", "ã ẽ ĩ õ ũ", "Á È Ĩ Ã ó", "aeiou", "abc def ghi", "um 23 45",
                                     "Um - 2 - tres quatro", "Um' 2  três' quatro", "maçã", "Â Ê Î ô û", "á Ae Éi Ĩô O"};

    public static void main(String[] args) {
        for(final String s : itens) {
            boolean b = isValid(s);
            System.out.println(b+" : "+s);
        }
    }
    public static boolean isValid(final String string) {
        Pattern p = Pattern.compile(PATTERN);
        Matcher m = p.matcher(string);
        return m.matches();
    }
}

In principle, every item should return true.

However, the string "ã ẽ ĩ õ ũ" returns false.

How can I do this validation?

The code is also on Ideone.

1 answer

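What you need is a regex based on Unicode properties. A range in a character class is compared by code point, and accented characters are not laid out in an order that makes much sense to mere mortals, so ranges like á-ú don't cover what you expect. For example, the precomposed ẽ is U+1EBD, far outside ã-ũ (U+00E3 to U+0169), which is why "ã ẽ ĩ õ ũ" is rejected.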
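To make the code point ordering concrete, here is a minimal sketch. The class name CodePointCheck and the helper inRange are made up for illustration; the helper only mimics how a range such as ã-ũ decides membership, by comparing code points. The characters are written as Unicode escapes so that the source file encoding doesn't matter.

```java
public class CodePointCheck {

    /** Returns true if the first code point of s lies between those of lo and hi, inclusive. */
    public static boolean inRange(String s, String lo, String hi) {
        int cp = s.codePointAt(0);
        return cp >= lo.codePointAt(0) && cp <= hi.codePointAt(0);
    }

    public static void main(String[] args) {
        String lo = "\u00E3"; // a with tilde, U+00E3
        String hi = "\u0169"; // u with tilde, U+0169
        // a, i, u and e with tilde, all precomposed
        String[] samples = {"\u00E3", "\u0129", "\u0169", "\u1EBD"};
        for (String s : samples) {
            System.out.printf("U+%04X inside the range: %b%n",
                    s.codePointAt(0), inRange(s, lo, hi));
        }
    }
}
```

Only the last sample (U+1EBD) falls outside the range, and it is the character that makes the original pattern reject the string.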

The class \p{Letter} (or simply \p{L}) represents letters in general. However, it also encompasses non-Latin letters (Cyrillic, Greek, Hebrew, Chinese, Arabic, etc.)

The class \p{IsLatin} matches the characters of the Latin script. However, that script is not limited to letters: it also contains a few non-letter characters, such as the Roman numeral symbols (for example Ⅰ, U+2160).

Therefore, the solution is to use the intersection of these two sets with [\p{L}&&[\p{IsLatin}]] or with [\p{IsLatin}&&[\p{L}]].
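As a rough illustration of how the two classes and their intersection behave, here is a small sketch. The class name LatinLetterCheck, the helper isLatinLetter and the sample characters are my own choices, not part of the original code.

```java
import java.util.regex.Pattern;

public class LatinLetterCheck {

    // Letters of any script.
    static final Pattern LETTER = Pattern.compile("\\p{L}");
    // Characters of the Latin script, whether they are letters or not.
    static final Pattern LATIN = Pattern.compile("\\p{IsLatin}");
    // Intersection of the two: letters of the Latin script only.
    static final Pattern LATIN_LETTER = Pattern.compile("[\\p{L}&&[\\p{IsLatin}]]");

    /** Returns true if s is exactly one Latin-script letter. */
    public static boolean isLatinLetter(String s) {
        return LATIN_LETTER.matcher(s).matches();
    }

    public static void main(String[] args) {
        // e with tilde, Cyrillic zhe, Roman numeral one, asterisk
        String[] samples = {"\u1EBD", "\u0436", "\u2160", "*"};
        for (String s : samples) {
            System.out.printf("U+%04X letter=%b latin=%b both=%b%n",
                    s.codePointAt(0),
                    LETTER.matcher(s).matches(),
                    LATIN.matcher(s).matches(),
                    isLatinLetter(s));
        }
    }
}
```

Only the first sample passes the intersection: the Cyrillic letter is a letter but not Latin, the Roman numeral is Latin but not a letter, and the asterisk is neither.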

Here is the resulting code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTest {

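    /* Validation: Latin letters, digits, space, apostrophe and hyphen.
       Inside [...] the characters ( and | are literals, so they are left out. */
    public static final Pattern PADRAO = Pattern.compile(
            "^[[\\p{L}&&[\\p{IsLatin}]]0-9 '\\-]+$");

    /* Positive tests. */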
    public static String[] positivos = {
            "á é í ó ú",
            "ã ẽ ĩ õ ũ",
            "Á È Ĩ Ã ó",
            "aeiou",
            "abc def ghi",
            "um 23 45",
            "Um - 2 - tres quatro",
            "Um' 2  três' quatro",
            "maçã",
            "Â Ê Î ô û",
            "á Ae Éi Ĩô O",
            "O rato roeu a roupa do rei de Roma",
            "áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙäëïöüÄËÏÖÜâêîôûÂÊÎÔÛãẽĩñõũÃẼĨÑÕŨçÇ"
    };

    /* Negative tests. */
    public static String[] negativos = {
            ".",
            "*",
            "/",
            "<",
            "≃",
            "^",
            "~",
            "()",
            "#",
            "中国"
    };

    public static void main(String[] args) {
        for (final String s : positivos) {
            boolean b = isValid(s);
            System.out.println(b + (b ? " ok - " : " oops - ") + s);
        }
        for (final String s : negativos) {
            boolean b = isValid(s);
            System.out.println(b + (b ? " oops - " : " ok - ") + s);
        }
    }

    public static boolean isValid(final String string) {
        return PADRAO.matcher(string).matches();
    }
}

Here is the output:

true ok - á é í ó ú
true ok - ã ẽ ĩ õ ũ
true ok - Á È Ĩ Ã ó
true ok - aeiou
true ok - abc def ghi
true ok - um 23 45
true ok - Um - 2 - tres quatro
true ok - Um' 2  três' quatro
true ok - maçã
true ok - Â Ê Î ô û
true ok - á Ae Éi Ĩô O
true ok - O rato roeu a roupa do rei de Roma
true ok - áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙäëïöüÄËÏÖÜâêîôûÂÊÎÔÛãẽĩñõũÃẼĨÑÕŨçÇ
false ok - .
false ok - *
false ok - /
false ok - <
false ok - ≃
false ok - ^
false ok - ~
false ok - ()
false ok - #
false ok - 中国

See it working on Ideone.

Ah, one more detail: a Pattern object is costly to build, but it is immutable and thread-safe, and it can be reused at will once it is created. Therefore, always prefer to build it in a static field if possible, instead of creating the same Pattern over and over.
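A minimal sketch of that advice follows. The class name NameValidator and the field name ALLOWED are hypothetical. The Pattern is compiled once when the class is loaded, and each call only creates a cheap Matcher.

```java
import java.util.regex.Pattern;

public class NameValidator {

    // Compiled once at class-load time and shared by every caller.
    private static final Pattern ALLOWED =
            Pattern.compile("[[\\p{L}&&[\\p{IsLatin}]]0-9 '\\-]+");

    /** Returns true if s is non-empty and contains only the allowed characters. */
    public static boolean isValid(String s) {
        // A Matcher is not thread-safe, so a fresh one is created per call.
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Um - 2 - tres quatro"));
        System.out.println(isValid("()"));
    }
}
```

By contrast, calling string.matches(regex) compiles the expression again on every call, which is the cost this arrangement avoids.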

  • Perfect! Thank you very much!

  • Upvoted for the great explanation.
