Convert String to OffsetDateTime (dd 'de' MMMM 'de' yyyy 'às' HH:mm:ss)

I have this String:

17 de Outubro de 2008 às 11:35:04

And this pattern:

dd 'de' MMMM 'de' yyyy 'às' HH:mm:ss

And this code:

private OffsetDateTime getData(){
     return Optional.ofNullable(page.<HtmlTableCell>getFirstByXPath(String.format(DOMAIN_XPATH, "Data de Distribuição")))
            .map(HtmlTableCell::asText)
            .map(DATE_TIME_PATTERN::matcher)
            .filter(Matcher::find)
            .map(Matcher::group)
            .map(str -> LocalDateTime.parse(str, DateTimeFormatter.ofPattern(DATE_TIME_PATTERN.toString())).atOffset(ZoneOffset.ofHours(-3)))
            .orElse(null);
}

I want to convert that date to an OffsetDateTime, but it is not working with this pattern: it never gets past find().

  • I updated my answer to handle the month of March (março); I had missed the ç in the regular expression

1 answer


From the code, I’m assuming DATE_TIME_PATTERN is a java.util.regex.Pattern.


The pattern used in a java.util.regex.Pattern must be a regular expression (regex), which is quite different from the pattern used in a DateTimeFormatter.
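To make the difference concrete, here is a minimal sketch (the class and method names are made up for illustration). It compiles the date-format pattern as if it were a regex and shows that find() fails on the sample string, because in a regex the letters dd, MMMM and so on only match themselves literally.

```java
import java.util.regex.Pattern;

class PatternKindsSketch {
    // Sample input from the question; \u00e0 is a-grave, escaped so the file compiles under any source encoding.
    static final String INPUT = "17 de Outubro de 2008 \u00e0s 11:35:04";

    // The DateTimeFormatter pattern (mis)used as a regular expression:
    // it only matches the literal text "dd 'de' MMMM ...", never an actual date.
    static boolean formatPatternAsRegexFinds() {
        Pattern p = Pattern.compile("dd 'de' MMMM 'de' yyyy '\u00e0s' HH:mm:ss");
        return p.matcher(INPUT).find();
    }

    // A real regular expression describing the same shape (\u00e7 is c-cedilla).
    static boolean realRegexFinds() {
        Pattern p = Pattern.compile("\\d{2} de [a-zA-Z\u00e7]+ de \\d{4}.+\\d{2}:\\d{2}:\\d{2}");
        return p.matcher(INPUT).find();
    }

    public static void main(String[] args) {
        System.out.println(formatPatternAsRegexFinds()); // false
        System.out.println(realRegexFinds());            // true
    }
}
```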

A regular expression checks whether the string is in a certain format (such as "two digits, followed by a space, followed by several letters (the month), etc."). It could look like this: *

Pattern DATE_TIME_PATTERN = Pattern.compile("\\d{2} de [a-zA-Zç]+ de \\d{4}.+\\d{2}:\\d{2}:\\d{2}");

With this, you check whether the String is in that format:

  • \\d{2} means "two digits" and \\d{4} means "four digits"
  • [a-zA-Zç]+ means "one or more letters" (including ç, for março, the month of March), which is enough for the name of the month, since the DateTimeFormatter will check the name later
  • .+ is "one or more characters"
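As a rough illustration of what this regex accepts, the sketch below pulls the date-like chunk out of a longer text. The class name, the helper extract and the sample texts are all invented for the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateChunkSketch {
    // Same regex as above; \u00e7 is c-cedilla, escaped so the file compiles under any source encoding.
    static final Pattern DATE_TIME_PATTERN =
        Pattern.compile("\\d{2} de [a-zA-Z\u00e7]+ de \\d{4}.+\\d{2}:\\d{2}:\\d{2}");

    // Returns the first chunk of text that looks like a date, if there is one.
    static Optional<String> extract(String text) {
        Matcher m = DATE_TIME_PATTERN.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up inputs; \u00e0 is a-grave.
        System.out.println(extract("Distribuido em 17 de Outubro de 2008 \u00e0s 11:35:04 (Brasilia)"));
        System.out.println(extract("05 de mar\u00e7o de 2019 \u00e0s 08:00:00"));
        System.out.println(extract("2008-10-17 11:35:04")); // Optional.empty
    }
}
```

The first call yields only the date chunk, without the surrounding words; the last one finds nothing, since the text is not in the "dd de month de yyyy" shape.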

For the date parsing itself, I also suggest setting the java.util.Locale to Portuguese, because of the month name. If you do not specify a locale, the JVM default is used, and it is not guaranteed to be Portuguese:

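import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// regular expression to check whether the String looks like a date in the format you need
Pattern DATE_TIME_PATTERN = Pattern.compile("\\d{2} de [a-zA-Zç]+ de \\d{4}.+\\d{2}:\\d{2}:\\d{2}");

// formatter pattern that does the actual date parsing, with a Portuguese locale because of the month name
// (depending on the JDK's locale data, the month may need to be lower case, "outubro"; the case-insensitive parser further down avoids that)
DateTimeFormatter parser = DateTimeFormatter.ofPattern("dd 'de' MMMM 'de' uuuu 'às' HH:mm:ss", new Locale("pt", "BR"));

OffsetDateTime odt = Optional.ofNullable("17 de Outubro de 2008 às 11:35:04")
    .map(DATE_TIME_PATTERN::matcher)
    .filter(Matcher::find)
    .map(Matcher::group)
    .map(str -> LocalDateTime.parse(str, parser).atOffset(ZoneOffset.ofHours(-3)))
    .orElse(null);
System.out.println(odt); // 2008-10-17T11:35:04-03:00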

The date obtained will be 2008-10-17T11:35:04-03:00.
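To see why the locale matters, the following sketch tries the same text with a Portuguese and an English formatter. The class and helper names are invented; Locale.forLanguageTag("pt-BR") is used as an equivalent of new Locale("pt", "BR"), and the formatter is built case-insensitively so the example does not depend on how a given JDK capitalizes month names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class LocaleSketch {
    // \u00e0 is a-grave, escaped so the file compiles under any source encoding.
    static final String PATTERN = "dd 'de' MMMM 'de' uuuu '\u00e0s' HH:mm:ss";

    // Builds a case-insensitive formatter whose month names come from the given locale.
    static DateTimeFormatter formatterFor(Locale locale) {
        return new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern(PATTERN)
            .toFormatter(locale);
    }

    // True if the text can be parsed with month names taken from the given locale.
    static boolean parses(String text, Locale locale) {
        try {
            LocalDateTime.parse(text, formatterFor(locale));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "17 de Outubro de 2008 \u00e0s 11:35:04";
        System.out.println(parses(text, Locale.forLanguageTag("pt-BR"))); // true
        System.out.println(parses(text, Locale.ENGLISH));                 // false: "Outubro" is not an English month
    }
}
```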


* Regular expressions that validate dates are usually much more complicated, since they need to check whether the month has 28, 29, 30 or 31 days, whether the year is a leap year, etc. (and they could have other improvements, such as only accepting valid month names).

But in this case we are using the regex only to extract a chunk that looks like a date (it is in the given format, and therefore has the potential to be a valid date) and then validating it with DateTimeFormatter, so the regex can stay as simple as it is (the parse will already check all the details, such as the month name and the valid values for each field). If the date is invalid, the parse throws a DateTimeParseException.
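The division of labor described in this note can be sketched as follows: the regex lets through anything with the right shape, and the formatter rejects what is not a real date. The class name, the describe helper and its return strings are assumptions made for the example; the formatter is case-insensitive so the result does not depend on the JDK's month capitalization.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ValidateSketch {
    // \u00e7 is c-cedilla and \u00e0 is a-grave, escaped so the file compiles under any source encoding.
    static final Pattern REGEX =
        Pattern.compile("\\d{2} de [a-zA-Z\u00e7]+ de \\d{4}.+\\d{2}:\\d{2}:\\d{2}");

    static final DateTimeFormatter PARSER = new DateTimeFormatterBuilder()
        .parseCaseInsensitive()
        .appendPattern("dd 'de' MMMM 'de' uuuu '\u00e0s' HH:mm:ss")
        .toFormatter(Locale.forLanguageTag("pt-BR"));

    // Step 1: the regex only checks the shape. Step 2: the parser checks the actual values.
    static String describe(String text) {
        Matcher m = REGEX.matcher(text);
        if (!m.find()) {
            return "no match";
        }
        try {
            return LocalDateTime.parse(m.group(), PARSER).toString();
        } catch (DateTimeParseException e) {
            return "rejected by parser";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("17 de Outubro de 2008 \u00e0s 11:35:04")); // 2008-10-17T11:35:04
        System.out.println(describe("99 de Outubro de 2008 \u00e0s 11:35:04")); // rejected by parser (day 99)
        System.out.println(describe("17 de Xyz de 2008 \u00e0s 11:35:04"));     // rejected by parser (month name)
        System.out.println(describe("hello"));                                  // no match
    }
}
```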


You can also change the regex to:

Pattern.compile("\\d{2} de [a-zA-Zç]+ de \\d{4} às \\d{2}:\\d{2}:\\d{2}");

This uses às directly instead of .+, so that only strings containing exactly these characters are accepted (.+ is more permissive, since it accepts any run of one or more characters). Either way, use whichever best fits your use cases.

If you choose .+, I also suggest replacing it with .+?, in case there is more than one date in the same String. Example:

Pattern DATE_TIME_PATTERN = Pattern.compile("\\d{2} de [a-zA-Zç]+ de \\d{4}.+\\d{2}:\\d{2}:\\d{2}");

String s = "17 de Outubro de 2008 às 11:35:04" +
    "  blablabla " +
    "10 de Outubro de 2018 às 10:35:04";

In this case, the entire string will be passed to parse, resulting in an error. This happens because .+ is greedy and tries to grab as many characters as possible. To avoid this behavior, just replace .+ with .+?. The ? after the + makes the quantifier lazy, so only the first date is picked up by the regex and passed to parse.
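The difference between the two quantifiers can be checked with a small sketch like this one, which runs both variants over the two-date string from the example. The class name, the constants and the matches helper are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyLazySketch {
    // \u00e7 is c-cedilla and \u00e0 is a-grave, escaped so the file compiles under any source encoding.
    static final String GREEDY = "\\d{2} de [a-zA-Z\u00e7]+ de \\d{4}.+\\d{2}:\\d{2}:\\d{2}";
    static final String LAZY   = "\\d{2} de [a-zA-Z\u00e7]+ de \\d{4}.+?\\d{2}:\\d{2}:\\d{2}";

    static final String TWO_DATES = "17 de Outubro de 2008 \u00e0s 11:35:04"
        + "  blablabla "
        + "10 de Outubro de 2018 \u00e0s 10:35:04";

    // Collects every chunk the regex finds in the text.
    static List<String> matches(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Greedy: a single match spanning from the first date to the last time.
        System.out.println(matches(GREEDY, TWO_DATES));
        // Lazy: two matches, one per date.
        System.out.println(matches(LAZY, TWO_DATES));
    }
}
```

With the lazy version, calling find() in a loop also gives you every date in the text, not just the first one.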


PS: by default, DateTimeFormatter makes some adjustments to dates such as April 31st (which is adjusted to April 30th). If you want it to accept only valid dates, you can change the java.time.format.ResolverStyle to STRICT:

DateTimeFormatter parser = DateTimeFormatter
    .ofPattern("dd 'de' MMMM 'de' uuuu 'às' HH:mm:ss", new Locale("pt", "BR"))
    .withResolverStyle(ResolverStyle.STRICT);

This way, the April 31st adjustment is no longer made, and a DateTimeParseException is thrown instead. For more details, see this answer.
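A small sketch comparing the two resolver styles on April 31st follows (class and helper names are invented, and the formatter is case-insensitive so the month capitalization does not matter). Note that the pattern uses uuuu rather than yyyy: under STRICT, yyyy (year-of-era) cannot be resolved to a date without an era field, whereas uuuu works on its own.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Locale;

class ResolverStyleSketch {
    // \u00e0 is a-grave, escaped so the file compiles under any source encoding.
    static final String PATTERN = "dd 'de' MMMM 'de' uuuu '\u00e0s' HH:mm:ss";

    static DateTimeFormatter parser(ResolverStyle style) {
        return new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern(PATTERN)
            .toFormatter(Locale.forLanguageTag("pt-BR"))
            .withResolverStyle(style);
    }

    // Returns the parsed value as text, or a marker string if parsing fails.
    static String parse(String text, ResolverStyle style) {
        try {
            return LocalDateTime.parse(text, parser(style)).toString();
        } catch (DateTimeParseException e) {
            return "DateTimeParseException";
        }
    }

    public static void main(String[] args) {
        String april31 = "31 de abril de 2008 \u00e0s 11:35:04";
        System.out.println(parse(april31, ResolverStyle.SMART));  // 2008-04-30T11:35:04 (adjusted)
        System.out.println(parse(april31, ResolverStyle.STRICT)); // DateTimeParseException
    }
}
```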

If you want, you can also make the DateTimeFormatter case insensitive, in case the month name is all lower case (or upper case). For this we use a java.time.format.DateTimeFormatterBuilder:

DateTimeFormatter parser = new DateTimeFormatterBuilder()
    .parseCaseInsensitive()
    .appendPattern("dd 'de' MMMM 'de' uuuu 'às' HH:mm:ss")
    .toFormatter(new Locale("pt", "BR"))
    .withResolverStyle(ResolverStyle.STRICT);

Thus, the name of the month can be "outubro", "Outubro" or "OUTUBRO".
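As a quick check of the case-insensitive parser, this sketch feeds it the three spellings and converts each result to an OffsetDateTime with the -03:00 offset used earlier. The class and method names are assumptions for the example.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.ResolverStyle;
import java.util.Locale;

class CaseInsensitiveSketch {
    // \u00e0 is a-grave, escaped so the file compiles under any source encoding.
    static final DateTimeFormatter PARSER = new DateTimeFormatterBuilder()
        .parseCaseInsensitive()
        .appendPattern("dd 'de' MMMM 'de' uuuu '\u00e0s' HH:mm:ss")
        .toFormatter(Locale.forLanguageTag("pt-BR"))
        .withResolverStyle(ResolverStyle.STRICT);

    // Parses the text and attaches the fixed -03:00 offset.
    static String parse(String text) {
        return LocalDateTime.parse(text, PARSER)
            .atOffset(ZoneOffset.ofHours(-3))
            .toString();
    }

    public static void main(String[] args) {
        for (String month : new String[] {"outubro", "Outubro", "OUTUBRO"}) {
            // Each spelling prints 2008-10-17T11:35:04-03:00
            System.out.println(parse("17 de " + month + " de 2008 \u00e0s 11:35:04"));
        }
    }
}
```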
