Just for the record: to check whether a String contains a given character, you do not need to loop over all of its characters. Just use the contains method:
if (texto.contains("?")) {
    System.out.println("You cannot add a question mark to the text.");
} else if (texto.contains(" ")) {
    System.out.println("You cannot add spaces to the text.");
}
Note that here I had to use double quotes, because the method takes a String (more precisely, a CharSequence) and not a char.
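If all you have is a char, here is a minimal sketch of two ways to do the same check without writing the loop yourself. The class and method names (CharCheck, hasChar, hasCharViaContains) are hypothetical, chosen only for illustration; indexOf and String.valueOf are standard methods.

```java
class CharCheck {
    // indexOf(int) returns -1 when the character is not in the String
    static boolean hasChar(String texto, char c) {
        return texto.indexOf(c) >= 0;
    }

    // contains expects a CharSequence, so the char is converted to a String first
    static boolean hasCharViaContains(String texto, char c) {
        return texto.contains(String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(hasChar("abc?", '?'));            // true
        System.out.println(hasCharViaContains("abc", ' '));  // false
    }
}
```

Both versions give the same answer for a single char; which one reads better is mostly a matter of taste.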
The difference, of course, is that your loop goes through every character, so if the String has both a space and a ?, both messages are displayed (and if there is more than one occurrence, the message is displayed once for each). With the code above, only one of them is displayed, unless you remove the else, in which case both are displayed:
if (texto.contains("?")) {
    System.out.println("You cannot add a question mark to the text.");
}
if (texto.contains(" ")) {
    System.out.println("You cannot add spaces to the text.");
}
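To make the difference concrete, here is a rough sketch that counts how many messages each approach would produce instead of printing them. It assumes the loop in the question prints one message per matching character; the class and method names (MessageCount, loopMessages, ifElseMessages, twoIfsMessages) are made up for this example.

```java
class MessageCount {
    // Loop over every char: one message per occurrence of '?' or ' '
    static int loopMessages(String texto) {
        int messages = 0;
        for (int i = 0; i < texto.length(); i++) {
            char c = texto.charAt(i);
            if (c == '?' || c == ' ') {
                messages++;
            }
        }
        return messages;
    }

    // if / else if with contains: at most one message in total
    static int ifElseMessages(String texto) {
        if (texto.contains("?")) {
            return 1;
        } else if (texto.contains(" ")) {
            return 1;
        }
        return 0;
    }

    // Two independent ifs: at most one message per kind of character
    static int twoIfsMessages(String texto) {
        int messages = 0;
        if (texto.contains("?")) {
            messages++;
        }
        if (texto.contains(" ")) {
            messages++;
        }
        return messages;
    }

    public static void main(String[] args) {
        String texto = "a ? ?"; // two spaces and two question marks
        System.out.println(loopMessages(texto));   // 4
        System.out.println(ifElseMessages(texto)); // 1
        System.out.println(twoIfsMessages(texto)); // 2
    }
}
```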
Another difference is that contains is not limited to checking a single character:
System.out.println("abcdef".contains("cde")); // true
Going a little further: comparing char by char works well for texts in Portuguese (and several other languages), but it has its limitations, since nowadays it is possible to have code like this:
String texto = "a💩";
for (int i = 0; i < texto.length(); i++) {
    char c = texto.charAt(i);
    System.out.printf("%c - %06X\n", c, (int) c);
}
Yes, an emoji directly in the code. If your IDE does not support this, you can build the same String like this:
int[] codepoints = { 0x61, 0x1f4a9 };
String texto = new String(codepoints, 0, codepoints.length);
Although the String has two "characters" (the letter a and the emoji 💩), the output shows 3 chars:
a - 000061
? - 00D83D
? - 00DCA9
That is because a char in Java has 16 bits and can only store values up to 65535. Unicode, however, defines far more characters than that, so a character such as the emoji PILE OF POO, whose code point is U+1F4A9 (a value larger than a char can hold), is "split" in two: in this case 0xD83D and 0xDCA9, which together form what is called a "surrogate pair" (this happens because Java represents Strings as sequences of UTF-16 code units).
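To see where 0xD83D and 0xDCA9 come from, here is a small sketch that splits a code point into its surrogate pair, once with Character.toChars and once by hand using the UTF-16 arithmetic. The class and method names (SurrogateDemo, manualSurrogates) are hypothetical; the manual version assumes the code point is above 0xFFFF.

```java
class SurrogateDemo {
    // Manual UTF-16 encoding of a supplementary code point (assumes cp > 0xFFFF)
    static char[] manualSurrogates(int cp) {
        int offset = cp - 0x10000;                   // 20 bits remain
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int cp = 0x1F4A9;

        // The standard library does the same split
        char[] pair = Character.toChars(cp);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D83D DCA9

        char[] manual = manualSurrogates(cp);
        System.out.printf("%04X %04X%n", (int) manual[0], (int) manual[1]); // D83D DCA9

        // And the pair can be joined back into the original code point
        System.out.printf("%X%n", Character.toCodePoint(pair[0], pair[1])); // 1F4A9
    }
}
```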
In other words, if I want to search for the emoji, going through the chars one by one does not help. A not very good solution would be to check the next char as well, to see whether the two form the surrogate pair:
for (int i = 0; i < texto.length(); i++) {
    char c = texto.charAt(i);
    // check the surrogate pair (the next char must be checked too)
    if (c == 0xd83d && i < texto.length() - 1 && texto.charAt(i + 1) == 0xdca9) {
        System.out.println("has emoji");
    }
}
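The if above can also be written like this, as long as you only care that a low surrogate follows (note that this version is looser: it matches any code point whose first half is 0xD83D, that is, U+1F400 to U+1F7FF, and not only U+1F4A9):
// I do not check the value of the next char, only whether the two form a surrogate pair
if (c == 0xd83d && i < texto.length() - 1 && Character.isSurrogatePair(c, texto.charAt(i + 1))) {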
But in this case, I think it is better to use contains:
if (texto.contains("💩")) {
    System.out.println("has emoji");
}
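If you would rather not put the emoji literally in the source, a possible alternative is indexOf, which accepts an int code point and also handles values above 0xFFFF. This is just a sketch; the class and method names (EmojiIndexOf, hasCodePoint) are made up for the example.

```java
class EmojiIndexOf {
    // indexOf(int) accepts a full code point, including supplementary ones,
    // and returns -1 when it is not present
    static boolean hasCodePoint(String texto, int codePoint) {
        return texto.indexOf(codePoint) >= 0;
    }

    public static void main(String[] args) {
        int[] codepoints = { 0x61, 0x1f4a9 };
        String texto = new String(codepoints, 0, codepoints.length);

        System.out.println(hasCodePoint(texto, 0x1f4a9)); // true
        System.out.println(hasCodePoint(texto, 'a'));     // true
        System.out.println(hasCodePoint("abc", 0x1f4a9)); // false
    }
}
```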
Or iterate over the code points of the String:
for (int i = 0; i < texto.length(); ) {
    int cp = texto.codePointAt(i);
    if (cp == 0x1f4a9) {
        System.out.println("has emoji");
    }
    // comparing with a char literal still works for values up to 0xffff
    if (cp == 'a') {
        System.out.println("Has letter 'a'");
    }
    // codePointAt takes a char index, so advance by 1 or 2 chars
    i += Character.charCount(cp);
}
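Another option for walking the code points, assuming Java 8 or later, is the codePoints() stream, which takes care of the surrogate pairs for you. A minimal sketch, with hypothetical class and method names (CodePointStream, hasCodePoint, countCodePoint):

```java
class CodePointStream {
    // true if any code point of the String equals the one given
    static boolean hasCodePoint(String texto, int codePoint) {
        return texto.codePoints().anyMatch(cp -> cp == codePoint);
    }

    // number of occurrences of the given code point
    static long countCodePoint(String texto, int codePoint) {
        return texto.codePoints().filter(cp -> cp == codePoint).count();
    }

    public static void main(String[] args) {
        // emoji first, then the letter a, then the emoji again
        int[] codepoints = { 0x1f4a9, 0x61, 0x1f4a9 };
        String texto = new String(codepoints, 0, codepoints.length);

        System.out.println(hasCodePoint(texto, 0x1f4a9));   // true
        System.out.println(hasCodePoint(texto, 'a'));       // true
        System.out.println(countCodePoint(texto, 0x1f4a9)); // 2
    }
}
```

Since the stream yields whole code points, the letter a is still found even when it comes after an emoji.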