While it is possible to do this through regular expressions, I do not believe it is the best way in any programming language.
(I originally answered as if the question were about Python as well; apart from the exact code of the example, most of the answer still applies.)
Month names are much easier to check, validate and, above all, convert into a month number (so that you end up with a real date object) if you handle them outside the regular expression.
Also, if your application will ever need to work in a language other than English: there are frameworks for making programs multilingual, and in general they depend on you wrapping every string of your program in a function call (often with a name meant to be almost transparent, such as _()). This function then looks up its string in the translation database for the desired language. If the month names are hardcoded inside the regular expression, you would have to pass the entire regexp to the translation engine.
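As a rough illustration of that idea, here is a minimal sketch using Python's standard gettext module. The month_numbers helper and the English abbreviations are assumptions made for the example, and NullTranslations simply returns each string unchanged; a real application would load an actual catalog with gettext.translation().

```python
import gettext

# NullTranslations is the stdlib fallback used when no catalog is available:
# it returns every string unchanged. A real program would load a catalog
# for the user's language instead.
translations = gettext.NullTranslations()
_ = translations.gettext

def month_numbers():
    """Map each (possibly translated) month abbreviation to its number.

    Hypothetical helper: each abbreviation is marked on its own, so the
    translators see short plain strings, never a regular expression.
    """
    names = [_("jan"), _("feb"), _("mar"), _("apr"), _("may"), _("jun"),
             _("jul"), _("aug"), _("sep"), _("oct"), _("nov"), _("dec")]
    return {name.lower(): number for number, name in enumerate(names, start=1)}
```

With a Portuguese catalog loaded, the same function would return keys such as "fev" and "dez" without any change to the parsing code.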
Of course, it would be possible to assemble a regular expression template, with the names of the months in external variables, and to join everything using string interpolation, before calling the regular expression function - this is one of the advantages of Python regular expressions being usable through normal function calls without having a special syntax.
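For reference, a minimal sketch of what such a template could look like. The MONTHS table, the DATE_PATTERN name and the split_date helper are hypothetical names chosen for this example, with Portuguese abbreviations as the assumed month names.

```python
import re

# Hypothetical month table kept outside the pattern; it could just as well
# come from a translation catalog.
MONTHS = {"jan": 1, "fev": 2, "mar": 3, "abr": 4, "mai": 5, "jun": 6,
          "jul": 7, "ago": 8, "set": 9, "out": 10, "nov": 11, "dez": 12}

# Build the alternation "jan|fev|...|dez"; re.escape protects against
# month names that happen to contain regex metacharacters.
month_alternatives = "|".join(re.escape(name) for name in MONTHS)

# Doubled braces are literal braces inside an f-string.
DATE_PATTERN = re.compile(
    rf"(\d{{1,2}})/(\d{{1,2}}|{month_alternatives})/(\d{{2,4}})",
    re.IGNORECASE,
)

def split_date(text):
    """Return (day, month, year) as strings, or None if the text does not match."""
    match = DATE_PATTERN.fullmatch(text.strip())
    return match.groups() if match else None
```

This only splits the text into its three parts; whether the day actually exists in that month still has to be checked elsewhere.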
But regular expressions are hard enough to read and maintain as it is. Regular expressions assembled at runtime would be even harder to read.
My tip, as in the first paragraph, would be to use the regular expression only to capture the day, month and year groups, and then a calmer mechanism, with dictionaries and ifs, to extract the "real month".
Take the opportunity to validate the day of the month, the year and so on outside the regular expression as well. Here is an example in Python, which works well as pseudo-code for C++; it should give you an idea of the problem:
So instead of:
def validate_date(text):
    if re.search(super_complicated_auto_validating_regexp, text):
        return True
    return False
It is possible to write something like:
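import datetime
import re

# Portuguese three-letter month abbreviations
short_months = {"jan": 1, "fev": 2, "mar": 3, "abr": 4, "mai": 5, "jun": 6,
                "jul": 7, "ago": 8, "set": 9, "out": 10, "nov": 11, "dez": 12}

def days_per_month(month, year):
    data = {1: 31, 2: 28, 3: 31, 4: 30, 5: 31, 6: 30,
            7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31}
    # Leap year: divisible by 4, except century years not divisible by 400
    if month == 2 and year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
        return 29
    return data[month]

def parse_date(text):
    match = re.search(r"(\d{1,2})/(.{1,3})/(\d{2,4})", text)
    if not match:
        raise ValueError("Invalid date format")
    day, month, year = match.groups()
    day = int(day)
    if month.isdigit():
        month = int(month)
    else:
        # Unknown names yield None and are rejected just below
        month = short_months.get(month.lower())
    if month is None or not 1 <= month <= 12:
        raise ValueError("Invalid month")
    year = int(year)
    if year < 50:  # assume 2-digit years below 50 are in the 21st century
        year += 2000
    elif year <= 99:
        year += 1900
    if not 1 <= day <= days_per_month(month, year):
        raise ValueError(f"Invalid day {day} for month {month}")
    return datetime.date(year=year, month=month, day=day)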
Note that roughly 20 lines of ordinary code are needed to parse and validate the date. With the regular expression approach you have, you are trying to compress all of that logic into a single "line", which is actually a mini-program in a language that is not maintenance-friendly.
That being said, the usual way to do "real" date validation and parsing, across the various crazy formats that users can type or that show up in files, is to use a library specialized in this. In such a library, several people have already spent hundreds of hours thinking about how to make the thing friendlier and more error-proof. Otherwise you would have to duplicate that work in your own code, with a good chance of getting it wrong: consider how subtle it is to calculate leap years correctly, something even Microsoft got wrong in early versions of Excel, for example.
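To show how little code is left once the calendar rules are delegated, here is a small sketch that relies only on Python's standard library. The helper names is_valid_date and days_in_month are made up for this example.

```python
import calendar
import datetime

def is_valid_date(day, month, year):
    """Return True if the triple is a real calendar date.

    datetime.date raises ValueError for impossible dates, so the
    leap-year rules never have to be written by hand.
    """
    try:
        datetime.date(year, month, day)
    except ValueError:
        return False
    return True

def days_in_month(month, year):
    """Number of days in the given month, leap years included."""
    # calendar.monthrange returns (weekday of the 1st, number of days)
    return calendar.monthrange(year, month)[1]
```

For instance, is_valid_date(29, 2, 2018) is False while is_valid_date(29, 2, 2020) is True, and days_in_month(2, 1900) gives 28 because 1900 was not a leap year.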
In Python, we have the excellent dateparser, which lets you simply write:
>>> import dateparser
>>> dateparser.parse("25/fev/2018", languages=["pt"])
datetime.datetime(2018, 2, 25, 0, 0)
It handles many date formats besides the slash-separated one, including fully written-out dates in more than 20 languages, and is not prone to errors caused by "corner cases".
In C++, I would look for date-handling add-on modules in whatever framework you may already be using to extend the language; there must be "natural date parsers" built on Qt or Boost, for example.
Boost has some interesting libraries, and I suggest using them for validation: Format Date Parser. A regex would need to be considerably more complex to validate, for example, 29/02/2018 or 29/02/2020, where the first is invalid and the second is valid. – danieltakeshi
The comments made in the replies of your other question apply here, as Jsbueno himself pointed out
– Jefferson Quesado