Capture out-of-range years with a regex


4

I am working with a text file in Sublime Text and want to replace some strings.
I have several strings like this:

EMISSAO="2016-04-18 00:00:00"

I need a regex that finds the records where the year is invalid. For example, some records look like this:

EMISSAO="65321-04-18 00:00:00"

That is, the number 65321 represents the year, and it is a year that doesn't even exist (invalid). I need to find where the years are above 2017 so I can correct them.

  • 1

    What do you consider a "valid year"?

  • From 2017 down.

  • 1

    But 6532 is a valid year

  • Corrected in question.

  • But what language are you using? With regex alone it is difficult...

  • So the problem is that some years have extra characters, like 65321? Do you want to identify only those, or to know whether the year is 2017 or earlier?

  • What would be 65321?

  • Corrected the question again.

  • In what language? The regex flavor matters.

  • It's a text file and I'm trying to do it from Sublime's find feature.

  • Specify that in the question; it makes it better.

  • 1945 - 2017: / (194[5-9]|19[5-9]\d|200\d|201[0-7])$/

  • I believe this can help you: http://answall.com/questions/170218/erro-ao-passar-o-script-de-uma-tabela-do-banco-firebird-para-postgresql


7 answers

5

Since this is a one-off search in Sublime, I believe the problem is only to tell whether the year part is valid or invalid; from what was said, the rest of the date after the year will already be correct.

To check only for a valid year, you can use this regex:

^EMISSAO="((19\d{2}|20(0[0-9]|1[0-7]))-\d{2}-\d{2}\s(\d{2}:){2}\d{2})"$

Explaining:

(19\d{2}|20(0[0-9]|1[0-7])) - Matches years 19__ and 2000 through 2017.

-\d{2}-\d{2}\s - These are the next parts of the date: month and day. They are not validated because apparently only the year comes in wrong. The \s matches the whitespace before the time.

(\d{2}:){2}\d{2})" - This part is the structure of the time: two groups of 2 digits each followed by :, and then the last 2 digits.

I tested here with these cases:

EMISSAO="2017-04-18 00:00:00" // passes
EMISSAO="1990-04-18 00:00:00" // passes
EMISSAO="2016-04-18 00:00:00" // passes
EMISSAO="2015-04-18 00:00:00" // passes
EMISSAO="2017-04-18 00:00:00" // passes
EMISSAO="2018-04-18 00:00:00" // does not pass
EMISSAO="22000 00:00:00" // does not pass
EMISSAO="2016-04-18 00:00:00" // passes
EMISSAO="65321-04-18 00:00:00" // does not pass
EMISSAO="5069 00:00:00" // does not pass
EMISSAO="2018 00:00:00" // does not pass
EMISSAO="2019 00:00:00" // does not pass
EMISSAO="2020 00:00:00" // does not pass
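
The cases above can also be checked outside the editor. Below is a minimal Python sketch; it assumes a balanced form of the pattern in which the two year alternatives are grouped together, and the names VALID and is_valid are introduced here only for illustration.

```python
import re

# Assumed balanced form of the pattern: the year alternatives are grouped
# so that the anchors and the rest of the date apply to both of them.
VALID = re.compile(
    r'^EMISSAO="((19\d{2}|20(0[0-9]|1[0-7]))-\d{2}-\d{2}\s(\d{2}:){2}\d{2})"$'
)

def is_valid(line: str) -> bool:
    """Return True when the whole line is an EMISSAO attribute with a year in 1900-2017."""
    return VALID.match(line) is not None

samples = [
    'EMISSAO="2017-04-18 00:00:00"',
    'EMISSAO="2001-04-18 00:00:00"',
    'EMISSAO="2018-04-18 00:00:00"',
    'EMISSAO="65321-04-18 00:00:00"',
    'EMISSAO="22000 00:00:00"',
]
for s in samples:
    print(s, "->", "passes" if is_valid(s) else "does not pass")
```

Because the pattern is anchored with ^ and $, each record has to be tested as a whole line.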
  • 1

    test EMISSAO="2001-04-18 00:00:00"

  • That one didn’t work?

  • Now it’s probably working..

3

Using the answer from @Marlysson.

And adjusting it for the logic you need:

The number 65321 represents the year, and it is a year that doesn't even exist (invalid). I need to find where the years are above 2017 so I can correct them.

In other words, you want to match the invalid records, not the valid ones.

Adjusting the regex, it looks like this:

^EMISSAO="(?(?!(19\d{2}|20(0\d|1[0-7]))-\d{2}-\d{2}).*|)$

  • Note that I am disregarding what comes after the date, in this case the part where @Marlysson used \s(\d{2}:){2}\d{2}" to verify the entire string.

Logic

  • The logic used is inversion: I need to know the valid matches in order not to capture them. For this I used the negative lookahead (?!...).
  • To capture what is not valid I used the regex conditional (ternary) construct (?(condition)then|else), where the condition here is the lookahead.

Explanation

  • ^EMISSAO=" - literal string, anchored at the start of the line
  • (?!(19\d{2}|20(0\d|1[0-7]))-\d{2}-\d{2}) - The condition: true when a valid date does not follow.
  • (?(condition).*|) - When the condition is true it captures everything; when it is false it captures nothing.
  • $ - End of the line.
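
For engines without lookaround conditionals, the same inversion can be written with the negative lookahead alone. Python's re module, for instance, only accepts conditionals that test a group reference, so the sketch below drops the ternary. It is an adaptation, not the pattern above, and INVALID and find_invalid are hypothetical names.

```python
import re

# Negative lookahead only: match the line when what follows EMISSAO="
# is NOT a 1900-2017 year followed by -MM-DD.
INVALID = re.compile(
    r'^EMISSAO="(?!(?:19\d{2}|20(?:0\d|1[0-7]))-\d{2}-\d{2}).*$',
    re.MULTILINE,
)

def find_invalid(text: str) -> list:
    """Return every line whose EMISSAO year is outside 1900-2017."""
    return INVALID.findall(text)

text = "\n".join([
    'EMISSAO="2016-04-18 00:00:00"',
    'EMISSAO="65321-04-18 00:00:00"',
    'EMISSAO="2018-04-18 00:00:00"',
])
print(find_invalid(text))
```

Valid lines produce no match at all, which has the same practical effect as the empty "else" branch of the conditional.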

See it on regex101.

3

If an invalid year starts from 3000 you can use the following regex: [3-9]\d{3,}. It matches a number that starts with a digit from 3 to 9, followed by at least three more digits.

"414-10-12 17:04:29" // does not match
"6014-10-12 17:04:29" // matches
"8014-10-12 17:04:29" // matches
"85014-10-12 17:04:29" // matches
"2019" // does not match
"3000" // matches
  • He wants the years after the current year; those are the invalid ones. 2018 is invalid but your regex does not match it.

3

The regex below is not the most optimized for your situation, since there are some possibilities to explore that you did not raise in your question. Still, here is a possible regex:

/(1[0-9]{3}|20(0[0-9]|1[0-7]))-(0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-9]|3[0-1])\s+00:00:00/g
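
One thing to watch with an unanchored pattern like this: a five-digit year can still contain a valid four-digit year as a substring. The Python sketch below illustrates this and adds a (?<!\d) guard, which is an assumption of this example rather than part of the answer; DATE, LOOSE and GUARDED are illustrative names.

```python
import re

DATE = (r'(1[0-9]{3}|20(0[0-9]|1[0-7]))-(0[1-9]|1[0-2])-'
        r'(0[1-9]|1[0-9]|2[0-9]|3[0-1])\s+00:00:00')

LOOSE = re.compile(DATE)                 # pattern as given, unanchored
GUARDED = re.compile(r'(?<!\d)' + DATE)  # assumed guard: no digit right before the year

bad = 'EMISSAO="22000-04-18 00:00:00"'
good = 'EMISSAO="2016-04-18 00:00:00"'

# The loose pattern finds "2000-04-18 00:00:00" inside the invalid record;
# the guarded one rejects it because a digit precedes the year.
print(bool(LOOSE.search(bad)), bool(GUARDED.search(bad)))
print(bool(LOOSE.search(good)), bool(GUARDED.search(good)))
```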

3

Mark the suspicious years with the prefix "###", and then open the file to correct them manually:

perl -i.bak -pe 's/(?<=EMISSAO=")(\d{4,})/$1 < 2018 ? $1 : "###$1"/ge' ex.xml
sublime ex.xml
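
For anyone without Perl at hand, the same idea can be sketched in Python. This is only an approximation of the one-liner: it works on a string rather than editing the file in place or writing a .bak copy, and mark_suspicious and its limit parameter are hypothetical names.

```python
import re

# Year digits that directly follow EMISSAO=" (same lookbehind as the Perl one-liner).
YEAR = re.compile(r'(?<=EMISSAO=")(\d{4,})')

def mark_suspicious(text: str, limit: int = 2018) -> str:
    """Prefix every year >= limit with ### so it is easy to find by hand."""
    def repl(m):
        year = m.group(1)
        return year if int(year) < limit else "###" + year
    return YEAR.sub(repl, text)

sample = 'EMISSAO="2016-04-18 00:00:00"\nEMISSAO="65321-04-18 00:00:00"\n'
print(mark_suspicious(sample))
```

Comparing the year as an integer avoids encoding the numeric range in the regex itself.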

2

Try the following regular expression to find dates whose year is invalid:

EMISSAO=\"([0-9]{4,}(?<=0*2(0(1[8-9]|[2-9][0-9])|[1-9][0-9]{2})|[3-9][0-9]{3}))-(0?[1-9]|1[0-2])-(0?[1-9]|[1-2][0-9]|3[0-1]) (0?[0-9]|1[0-9]|2[0-3]):(0?[0-9]|[1-5][0-9]):(0?[0-9]|[1-5][0-9])\"

([0-9]{4,}(?<=0*2(0(1[8-9]|[2-9][0-9])|[1-9][0-9]{2})|[3-9][0-9]{3})) => Any year that is not between year 0 and 2017

(0?[1-9]|1[0-2]) => Any month from 1 to 12

(0?[1-9]|[1-2][0-9]|3[0-1]) => Any day up to 31

(0?[0-9]|1[0-9]|2[0-3]) => Any hour from 0 to 23

(0?[0-9]|[1-5][0-9]) => Any minute from 0 to 59

(0?[0-9]|[1-5][0-9]) => Any second from 0 to 59

0

I need to see where the years are above 2017

Regular expression

\b0*(?:(?:[12]\d|[3-9])\d{3,}|2(?:0(?:1[89]|[2-9]\d)|[1-9]\d{2}))(?=-\d{2}-\d{2} \d{2}:\d{2}:\d{2})


Test here: https://regex101.com/r/uejZaR/1
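
The expression can also be tried locally. The Python sketch below applies it unchanged to a few sample records; ABOVE_2017 and years_above_2017 are names chosen for this example.

```python
import re

ABOVE_2017 = re.compile(
    r'\b0*(?:(?:[12]\d|[3-9])\d{3,}|2(?:0(?:1[89]|[2-9]\d)|[1-9]\d{2}))'
    r'(?=-\d{2}-\d{2} \d{2}:\d{2}:\d{2})'
)

def years_above_2017(text: str) -> list:
    """Return the year strings that the pattern flags as greater than 2017."""
    return ABOVE_2017.findall(text)

text = '''EMISSAO="2016-04-18 00:00:00"
EMISSAO="2018-04-18 00:00:00"
EMISSAO="65321-04-18 00:00:00"'''
print(years_above_2017(text))
```

The pattern matches only the year itself; the lookahead checks the rest of the date without consuming it, which is convenient for a find-and-replace.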

