Regular expressions would give you a little more flexibility. Here is an example:
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordUpperCaseExample {
    public static final Locale pt_BR = new Locale("pt", "BR");

    public static void main(String[] args) {
        String[] examples = { "douglas léonardo.smc", "XGR - STS.SMC", "xpto.smc", "aRqUiVo_cOmPoStO.tXt", "aRqUiVo-cOmPoStO.tXt", "123nome.txt" };
        for (String e : examples) {
            System.out.println(upperCaseWords(e));
        }
    }

    // Finds each word (or file extension) and replaces it with its converted form.
    public static String upperCaseWords(String phrase) {
        Matcher m = Pattern.compile("\\.?[\\p{IsAlphabetic}][\\w\\d&&[^_]]*", Pattern.UNICODE_CHARACTER_CLASS).matcher(phrase);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps the converted text from being read as a group reference.
            m.appendReplacement(sb, Matcher.quoteReplacement(upperCaseFirst(m.group())));
        }
        return m.appendTail(sb).toString();
    }

    // Conversion rules applied to a single captured word.
    public static String upperCaseFirst(String word) {
        return word.isEmpty() ? word :
               word.length() == 1 ? word.toUpperCase(pt_BR) :
               word.startsWith(".") ? word.toLowerCase(pt_BR) :
               word.substring(0, 1).toUpperCase(pt_BR) + word.substring(1).toLowerCase(pt_BR);
    }
}
The regular expression \\.?[\\p{IsAlphabetic}][\\w\\d&&[^_]]* (written here as it appears in the Java string literal, with doubled backslashes) may seem complex, but it simply searches for words made of these parts:
- \\.?: the word optionally starts with a dot, which would be the file extension. upperCaseFirst checks whether the captured group starts with a dot and, if so, converts all of it to lowercase.
- [\\p{IsAlphabetic}]: forces the first character of the captured group to be an alphabetic character in the Unicode table. For example, 123nome becomes 123Nome, since the expression only starts capturing at the first letter. This restriction also keeps other word-separating characters out of the capture.
- [\\w\\d&&[^_]]*: captures the remaining letters and digits, while &&[^_] excludes the underscore. This is why abc_abc becomes Abc_Abc.
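To see what the expression actually captures, here is a small sketch that only lists the matches without converting anything. The class name WordMatchesSketch and the matches helper are mine, just for illustration; the pattern is the same one used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordMatchesSketch {
    // Same expression as in upperCaseWords.
    static final Pattern WORD = Pattern.compile(
            "\\.?[\\p{IsAlphabetic}][\\w\\d&&[^_]]*", Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helper: collects every group the expression captures, in order.
    static List<String> matches(String phrase) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD.matcher(phrase);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(matches("123nome.txt"));   // [nome, .txt]
        System.out.println(matches("abc_abc"));       // [abc, abc]
        System.out.println(matches("XGR - STS.SMC")); // [XGR, STS, .SMC]
    }
}
```

Note how the leading digits, the underscore and the hyphen never show up in a match, so they are copied to the output untouched.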
In addition, the Pattern.UNICODE_CHARACTER_CLASS flag makes the predefined character classes follow the Unicode table, so that, for example, \w matches accented characters such as á in addition to ASCII characters such as a.
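A quick way to check the effect of the flag is to test \w against a single character with and without it. This is only a sketch; UnicodeFlagSketch and matchesWordChar are names I made up for the demonstration.

```java
import java.util.regex.Pattern;

class UnicodeFlagSketch {
    // Hypothetical helper: does the whole string match \w, with or without the Unicode flag?
    static boolean matchesWordChar(String s, boolean unicode) {
        Pattern p = unicode
                ? Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS)
                : Pattern.compile("\\w");
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String aAcute = "\u00E1"; // the letter á, escaped to avoid source-encoding issues
        System.out.println(matchesWordChar("a", false));    // true
        System.out.println(matchesWordChar(aAcute, false)); // false
        System.out.println(matchesWordChar(aAcute, true));  // true
    }
}
```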
The upperCaseFirst method converts each word and holds any additional logic needed for a captured word. The rules in the example above are:
- An empty word is returned unchanged. This is only a precaution, since the regular expression never produces an empty match, but it lets the method be reused safely elsewhere.
- A one-character word is converted to uppercase. You may want to change this so that olhe a casa becomes Olhe a Casa rather than Olhe A Casa, as it does now.
- If the word starts with a dot, it is converted to lowercase. This handles the file extension, but it can cause side effects if the file name contains other dots. If you need to handle that, it is better to treat the extension separately, as in Rodrigo's answer.
- In all other cases, the first character is converted to uppercase and the rest to lowercase. Note that I always specify the locale for the conversions. This avoids inconsistencies if the program runs in environments where the JVM's default locale differs.
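If the side effect of the dot rule matters to you, one possible approach is to split the name at the last dot and only capitalize the part before it. The sketch below is my own assumption of how that could look, not the code from the other answer: titleCaseFileName and capitalizeWords are hypothetical names, and the expression is the one above without the optional leading dot.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtensionSeparatelySketch {
    static final Locale PT_BR = Locale.forLanguageTag("pt-BR");

    // Word expression without the optional leading dot.
    static final Pattern WORD = Pattern.compile(
            "[\\p{IsAlphabetic}][\\w&&[^_]]*", Pattern.UNICODE_CHARACTER_CLASS);

    // Uppercases the first letter of every word and lowercases the rest.
    static String capitalizeWords(String text) {
        Matcher m = WORD.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String w = m.group();
            String converted = w.substring(0, 1).toUpperCase(PT_BR)
                    + w.substring(1).toLowerCase(PT_BR);
            m.appendReplacement(sb, Matcher.quoteReplacement(converted));
        }
        return m.appendTail(sb).toString();
    }

    // Hypothetical helper: everything from the last dot onward is taken as the extension.
    static String titleCaseFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return capitalizeWords(fileName); // no extension at all
        }
        return capitalizeWords(fileName.substring(0, dot))
                + fileName.substring(dot).toLowerCase(PT_BR);
    }

    public static void main(String[] args) {
        System.out.println(titleCaseFileName("relatorio.final.TXT")); // Relatorio.Final.txt
        System.out.println(titleCaseFileName("XGR - STS.SMC"));       // Xgr - Sts.smc
    }
}
```

With the original rules, relatorio.final.TXT would come out as Relatorio.final.txt, because .final is also treated as an extension.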
The general idea of the implementation is that you can easily add or modify rules by changing the regular expression, within what the Pattern class documentation allows, and by changing the upperCaseFirst method.
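As an example of changing a rule, here is a sketch of the variation mentioned earlier, where one-letter words stay lowercase so that olhe a casa becomes Olhe a Casa. Only upperCaseFirst changes; the class name SingleLetterRuleSketch is just for illustration. Keep in mind this simple version also leaves a one-letter word lowercase at the start of the phrase.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleLetterRuleSketch {
    static final Locale PT_BR = Locale.forLanguageTag("pt-BR");
    static final Pattern WORD = Pattern.compile(
            "\\.?[\\p{IsAlphabetic}][\\w\\d&&[^_]]*", Pattern.UNICODE_CHARACTER_CLASS);

    static String upperCaseWords(String phrase) {
        Matcher m = WORD.matcher(phrase);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(upperCaseFirst(m.group())));
        }
        return m.appendTail(sb).toString();
    }

    // Modified rule: one-letter words and extensions are lowercased,
    // every other word gets an uppercase first letter.
    static String upperCaseFirst(String word) {
        if (word.length() <= 1 || word.startsWith(".")) {
            return word.toLowerCase(PT_BR);
        }
        return word.substring(0, 1).toUpperCase(PT_BR) + word.substring(1).toLowerCase(PT_BR);
    }

    public static void main(String[] args) {
        System.out.println(upperCaseWords("olhe a casa")); // Olhe a Casa
        System.out.println(upperCaseWords("xpto.SMC"));    // Xpto.smc
    }
}
```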
http://stackoverflow.com/questions/19828111/making-first-letter-capital-using-regex-like-in-ucwords
– Daniel Omine