Get date from a string


I am trying to create a regular expression that returns some dates contained in strings. The formats are:

  • dd/MM/yyyy
  • dd-MM-yyyy
  • dd/MM
  • dd-MM

Example:

public static void main(String[] args) {

    final String text1 = "Foo bar foo bar foo bar 29/01/2021 foo bar foo bar";

    final String text2 = "Foo bar foo bar 29-01-2021 foo bar foo bar";

    final String text3 = "Foo bar foo bar 29/01 foo bar foo bar";

    final String text4 = "Foo bar foo bar 29-01 foo bar foo bar";

    // result 29/01/2021
    final String date1 = getDate(text1);

    // result 29-01-2021
    final String date2 = getDate(text2);

    // result 29/01
    final String date3 = getDate(text3);

    // result 29-01
    final String date4 = getDate(text4);

}

private static String getDate(final String text) {
    return "Magic";

}

Answer


As explained here, a regex can help you find something that looks like a date, but you still need to validate it. Read the linked answer for more details; in short, dates follow rules that are too complex to validate with a regex alone (for example, the number of days varies by month, and February also depends on leap years).

So you can use a regex to extract the snippet that contains a possible date, and then validate it with a dedicated tool (in this case, a date API). Something like this:

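import java.time.LocalDate;
import java.time.MonthDay;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Candidate: dd, "-" or "/", MM, optionally "-" or "/" plus yyyy, with no word character around it
private static final Pattern POSSIBLE_DATE_REGEX = Pattern.compile("\\b\\d{2}[-/]\\d{2}([-/]\\d{4})?\\b");
// Optional sections: "dd-MM[-uuuu]" or "dd/MM[/uuuu]"; STRICT rejects impossible dates such as 31/04
private static final DateTimeFormatter DMY = DateTimeFormatter.ofPattern("[dd-MM[-uuuu]][dd/MM[/uuuu]]").withResolverStyle(ResolverStyle.STRICT);

static String getDate(String text) {
    Matcher matcher = POSSIBLE_DATE_REGEX.matcher(text);
    if (matcher.find()) { // found a candidate: validate the date
        String possivelData = matcher.group();
        try {
            // succeeds if the text is a valid LocalDate (with year) or MonthDay (without year)
            DMY.parseBest(possivelData, LocalDate::from, MonthDay::from);
        } catch (DateTimeParseException e) {
            // invalid date (if you want, print the error: System.out.println("Error: " + e.getMessage());)
            return null;
        }
        return possivelData;
    }

    // nothing found: return null
    return null;
}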

...
System.out.println(getDate(text1)); // 29/01/2021
System.out.println(getDate(text2)); // 29-01-2021
System.out.println(getDate(text3)); // 29/01
System.out.println(getDate(text4)); // 29-01

The idea is to capture something that looks like a date: 2 digits (\d{2}), followed by a slash or hyphen ([-/]) and 2 more digits, optionally followed by another slash or hyphen and 4 more digits (the ? after that group makes it optional). Around the expression I put \b (explained in detail here) to ensure that no other digit or letter comes immediately before or after it. This avoids false positives such as 123456/789: without the \b, the regex would match the snippet "56/78".
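To see what the \b anchors change, here is a small sketch that runs the same candidate regex with and without them on the 123456/789 example. The class and method names (WordBoundaryDemo, firstMatch) are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer
class WordBoundaryDemo {
    // The candidate regex from the answer, with and without the \b anchors
    static final Pattern WITH_BOUNDARY =
            Pattern.compile("\\b\\d{2}[-/]\\d{2}([-/]\\d{4})?\\b");
    static final Pattern WITHOUT_BOUNDARY =
            Pattern.compile("\\d{2}[-/]\\d{2}([-/]\\d{4})?");

    // Returns the first match of the pattern in the text, or null if there is none
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "Foo 123456/789 bar";
        System.out.println(firstMatch(WITHOUT_BOUNDARY, text)); // 56/78
        System.out.println(firstMatch(WITH_BOUNDARY, text));    // null
        System.out.println(firstMatch(WITH_BOUNDARY, "Foo 29/01/2021 bar")); // 29/01/2021
    }
}
```

Because digits are word characters, \b cannot match between two digits, so with the anchors the regex has no valid starting point inside "123456".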

But this regex also accepts things like "99/99/9999" (an invalid date), and, as the linked answer explains, a more accurate regex would be very hard to write. So I prefer to use the date API (in this case java.time, available since Java 8) to check whether the date is valid.
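A quick sketch of why the shape check alone is not enough: a regex restricted to dd/MM/yyyy (for brevity) accepts impossible dates that a strict java.time parse rejects. ResolverStyle.STRICT is what makes the parser reject values such as 31/04 instead of adjusting them (the default SMART style would turn 31/04/2021 into 30/04/2021). The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer
class RegexVsDateApi {
    // Shape only: two digits, slash, two digits, slash, four digits
    static final Pattern CANDIDATE = Pattern.compile("\\d{2}/\\d{2}/\\d{4}");
    // STRICT: out-of-range fields and impossible dates throw instead of being adjusted
    static final DateTimeFormatter STRICT_DMY =
            DateTimeFormatter.ofPattern("dd/MM/uuuu").withResolverStyle(ResolverStyle.STRICT);

    static boolean looksLikeDate(String s) {
        return CANDIDATE.matcher(s).matches();
    }

    static boolean isValidDate(String s) {
        try {
            LocalDate.parse(s, STRICT_DMY);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"29/01/2021", "99/99/9999", "31/04/2021", "29/02/2021", "29/02/2020"};
        for (String s : inputs) {
            System.out.println(s + " regex=" + looksLikeDate(s) + " valid=" + isValidDate(s));
        }
    }
}
```

All five inputs pass the regex, but only 29/01/2021 and 29/02/2020 (a leap year) survive the strict parse.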

For that I use optional sections in the pattern (delimited by [ ]), so [dd-MM[-uuuu]][dd/MM[/uuuu]] says the input can be "day-month" (with an optional "-year") or "day/month" (with an optional "/year").
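The sketch below checks two details of that pattern. First, the optional sections let one formatter accept both shapes (with or without a year) while still rejecting a mix of separators such as "29/01-2021", because the whole text must be consumed. Second, the year is written uuuu rather than yyyy: with ResolverStyle.STRICT, yyyy is the year-of-era, which cannot be resolved to a date without an era field (G), so a strict dd/MM/yyyy formatter fails even on a valid date. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

// Hypothetical demo class, not part of the original answer
class OptionalSectionsDemo {
    static final DateTimeFormatter DMY = DateTimeFormatter
            .ofPattern("[dd-MM[-uuuu]][dd/MM[/uuuu]]")
            .withResolverStyle(ResolverStyle.STRICT);

    // Reports which shape was recognized, or "no match" if the whole text cannot be parsed
    static String describe(String text) {
        try {
            TemporalAccessor parsed = DMY.parse(text);
            return parsed.isSupported(ChronoField.YEAR) ? "with year" : "without year";
        } catch (DateTimeParseException e) {
            return "no match";
        }
    }

    // True if a STRICT formatter with the given pattern can produce a LocalDate from the text
    static boolean strictLocalDate(String pattern, String text) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern(pattern)
                .withResolverStyle(ResolverStyle.STRICT);
        try {
            LocalDate.parse(text, f);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"29-01-2021", "29-01", "29/01/2021", "29/01", "29/01-2021"}) {
            System.out.println(s + " -> " + describe(s));
        }
        System.out.println(strictLocalDate("dd/MM/uuuu", "29/01/2021")); // true
        System.out.println(strictLocalDate("dd/MM/yyyy", "29/01/2021")); // false
    }
}
```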

Then I pass parseBest the types of objects it may create, in order of preference: LocalDate (day, month and year) and MonthDay (day and month only). If either of them can be built, the date is valid; otherwise, an exception is thrown.
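To make the role of parseBest concrete, this sketch reports which type it actually built for a few inputs. The queries are tried in the order given, so a full date becomes a LocalDate and a day/month pair falls back to MonthDay; if no query succeeds (or the text cannot be resolved at all), a DateTimeParseException is thrown. Note that MonthDay accepts 29/02, since February 29 exists in leap years. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.MonthDay;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.time.temporal.TemporalAccessor;

// Hypothetical demo class, not part of the original answer
class ParseBestDemo {
    static final DateTimeFormatter DMY = DateTimeFormatter
            .ofPattern("[dd-MM[-uuuu]][dd/MM[/uuuu]]")
            .withResolverStyle(ResolverStyle.STRICT);

    // Returns the simple name of the type parseBest produced, or "invalid"
    static String resultType(String text) {
        try {
            TemporalAccessor t = DMY.parseBest(text, LocalDate::from, MonthDay::from);
            return t.getClass().getSimpleName();
        } catch (DateTimeParseException e) {
            return "invalid";
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"29/01/2021", "29/01", "29/02", "31/02", "29/02/2021"}) {
            System.out.println(s + " -> " + resultType(s));
        }
    }
}
```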

If there is no date, or it is invalid, the method returns null.
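One consequence of checking only the first candidate: a text such as "99/99/9999 ... 29/01/2021" yields null even though a valid date appears later. If that matters, a variant could keep searching with while (matcher.find()). This is a sketch of that alternative, not the answer's code; the class name FirstValidDate and the method getFirstValidDate are hypothetical.

```java
import java.time.LocalDate;
import java.time.MonthDay;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant, not part of the original answer
class FirstValidDate {
    static final Pattern POSSIBLE_DATE_REGEX =
            Pattern.compile("\\b\\d{2}[-/]\\d{2}([-/]\\d{4})?\\b");
    static final DateTimeFormatter DMY = DateTimeFormatter
            .ofPattern("[dd-MM[-uuuu]][dd/MM[/uuuu]]")
            .withResolverStyle(ResolverStyle.STRICT);

    // Skips candidates that fail validation; returns null only if no candidate is a valid date
    static String getFirstValidDate(String text) {
        Matcher matcher = POSSIBLE_DATE_REGEX.matcher(text);
        while (matcher.find()) {
            String candidate = matcher.group();
            try {
                DMY.parseBest(candidate, LocalDate::from, MonthDay::from);
                return candidate;
            } catch (DateTimeParseException e) {
                // invalid candidate: keep searching
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(getFirstValidDate("Foo 99/99/9999 bar 29/01/2021 foo")); // 29/01/2021
        System.out.println(getFirstValidDate("Foo 31/02 bar"));                     // null
    }
}
```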


Java <= 7

In Java 7 and earlier, java.time is not available, so you have to use SimpleDateFormat. The idea is similar (POSSIBLE_DATE_REGEX is the same regex as above):

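import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;

// Note: SimpleDateFormat is not thread-safe, so these shared instances are only safe for single-threaded use
private static final List<SimpleDateFormat> FORMATS = Arrays.asList(createFormat("dd-MM"), createFormat("dd-MM-yyyy"), createFormat("dd/MM"), createFormat("dd/MM/yyyy"));

private static SimpleDateFormat createFormat(String format) {
    SimpleDateFormat sdf = new SimpleDateFormat(format);
    sdf.setLenient(false); // so that invalid dates are rejected
    return sdf;
}

static String getDate(String text) {
    Matcher matcher = POSSIBLE_DATE_REGEX.matcher(text);
    if (matcher.find()) { // found a candidate: validate the date
        String possivelData = matcher.group();
        for (SimpleDateFormat sdf : FORMATS) {
            // parse(String) accepts a matching prefix and ignores the rest ("dd/MM" would accept "29/01-2021"),
            // so parse with a ParsePosition and require the whole candidate to be consumed
            ParsePosition pos = new ParsePosition(0);
            if (sdf.parse(possivelData, pos) != null && pos.getIndex() == possivelData.length()) {
                return possivelData; // it worked, no need to test the other formats
            }
        }
        // every format failed: the candidate is not a valid date
        return null;
    }

    // nothing found: return null
    return null;
}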
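A caveat when using SimpleDateFormat for validation: DateFormat.parse(String) only needs a prefix of the text to match and silently ignores the rest, so a non-lenient "dd/MM" format accepts "29/01-2021". Parsing with a ParsePosition and checking that the whole string was consumed closes that gap. Also, when the pattern has no year, SimpleDateFormat fills in 1970, which is not a leap year, so "29/02" is rejected (unlike MonthDay in the java.time version). The sketch below shows both behaviors; the class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;

// Hypothetical demo class, not part of the original answer
class PartialParseDemo {
    // Same setup as createFormat: non-lenient, so impossible dates are rejected
    static SimpleDateFormat strictFormat(String pattern) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern);
        sdf.setLenient(false);
        return sdf;
    }

    // parse(String) succeeds as soon as a prefix of the text matches the pattern
    static boolean parsesPrefix(String pattern, String text) {
        try {
            strictFormat(pattern).parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    // parse(String, ParsePosition) returns null on failure; additionally require full consumption
    static boolean parsesWhole(String pattern, String text) {
        ParsePosition pos = new ParsePosition(0);
        return strictFormat(pattern).parse(text, pos) != null && pos.getIndex() == text.length();
    }

    public static void main(String[] args) {
        System.out.println(parsesPrefix("dd/MM", "29/01-2021"));     // true: "-2021" is ignored
        System.out.println(parsesWhole("dd/MM", "29/01-2021"));      // false
        System.out.println(parsesWhole("dd/MM/yyyy", "29/01/2021")); // true
        System.out.println(parsesWhole("dd/MM", "29/02"));           // false: year defaults to 1970
    }
}
```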
  • Thank you! You helped me a lot!

  • Excellent answer. A small addendum: when the patterns or parsing settings become complex, the DateTimeFormatterBuilder class can help build a DateTimeFormatter.

  • @Anthonyaccioly Yes, I also made a version with the Builder, but in this specific case I didn't see much advantage: https://ideone.com/k69yHi. When there are only optional sections, the brackets already handle it well; I usually use the Builder in other cases :-)
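For comparison, the same formatter written with DateTimeFormatterBuilder might look like this sketch, where optionalStart() and optionalEnd() play the role of [ and ]. It is not necessarily identical to the linked ideone version; the class name BuilderVersion and the isValid helper are hypothetical.

```java
import java.time.LocalDate;
import java.time.MonthDay;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

// Hypothetical sketch, not part of the original answer
class BuilderVersion {
    // Equivalent to ofPattern("[dd-MM[-uuuu]][dd/MM[/uuuu]]")
    static final DateTimeFormatter DMY = new DateTimeFormatterBuilder()
            // first alternative: day-month with an optional -year
            .optionalStart()
                .appendPattern("dd-MM")
                .optionalStart().appendPattern("-uuuu").optionalEnd()
            .optionalEnd()
            // second alternative: day/month with an optional /year
            .optionalStart()
                .appendPattern("dd/MM")
                .optionalStart().appendPattern("/uuuu").optionalEnd()
            .optionalEnd()
            .toFormatter()
            .withResolverStyle(ResolverStyle.STRICT);

    // True if the text is a valid LocalDate or MonthDay according to DMY
    static boolean isValid(String text) {
        try {
            DMY.parseBest(text, LocalDate::from, MonthDay::from);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"29/01/2021", "29-01", "29/01-2021", "31/04/2021"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

As the comment above points out, with only optional sections the bracket syntax is shorter; the builder becomes more useful when you need things like default values or case-insensitive parsing.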
