Regex to match a word only when it is not preceded or followed by dots or slashes

Question

Asked 5 years, 2 months ago

Viewed 66 times

I would like to match an occurrence only when it is not preceded or followed by a dot or a slash. These are my cases in the database:

Registro 1: VERSION01/VERSION01.5/VERSION01.5.5
Registro 2: VERSION01.5.5.5/VERSION02/VERSION02.5

When I use a regular expression to search for VERSION01.5, it should be found only in Record 1 (Registro 1), where it occurs in VERSION01/VERSION01.5/VERSION01.5.5. Is that possible?

1 answer

by hkotsubo • **55,826** points · Answer 1 · 2020-04-28T12:23:12+00:00

An alternative is:

(?<![\w.])VERSION01\.5(?![\w.])

The regex uses a negative lookbehind and a negative lookahead to check that something does not exist before and after the match.

Both contain the expression [\w.], a character class made up of \w (a shortcut that matches letters, digits, or the character _) and the character . (a dot). That is, this expression matches either a letter/digit/_ or a dot.
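As a quick check of what this character class accepts, here is a minimal sketch that runs it against a made-up string (the sample string and variable names are illustrative assumptions, not part of the question's data):

```python
import re

# The same character class used inside both lookarounds
char_class = re.compile(r'[\w.]')

# Hypothetical sample: a letter, an underscore, a digit, a dot,
# then characters the class should NOT match (slash, colon, space, hyphen)
sample = 'a_1./: -'

matched = char_class.findall(sample)
print(matched)  # ['a', '_', '1', '.']
```

The slash, colon, space and hyphen are not in the class, which is why they are allowed to sit next to the version in the records.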

This expression is placed inside the negative lookbehind and the negative lookahead, indicating that it must not appear before or after:

(?<![\w.]): negative lookbehind, checks that what comes immediately before is not a dot or \w
(?![\w.]): negative lookahead, checks that what comes immediately after is not a dot or \w

That is, the regex checks that "VERSION01.5" has no letter, digit, _ or dot immediately before or after it.
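To see each boundary condition in isolation, the following sketch runs the pattern against a few short strings. The strings and the expected True/False values are hypothetical test cases written for this illustration; only the pattern comes from the answer.

```python
import re

# Pattern from the answer: no word character or dot on either side
pattern = re.compile(r'(?<![\w.])VERSION01\.5(?![\w.])')

# Hypothetical inputs mapped to whether a match is expected
samples = {
    'VERSION01.5': True,       # nothing before or after
    'a/VERSION01.5/b': True,   # slashes are acceptable neighbours
    'VERSION01.5.5': False,    # followed by a dot
    'VERSION01.50': False,     # followed by a digit
    'XVERSION01.5': False,     # preceded by a letter
    '1.VERSION01.5': False,    # preceded by a dot
}

results = {text: bool(pattern.search(text)) for text in samples}
for text, found in results.items():
    print(f'{text!r}: {found}')
```

Note that the lookarounds consume no characters, so a match at the very start or end of the string also succeeds: there is simply nothing there to violate the condition.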

Another detail is that the dot in "VERSION01.5" must be escaped with \, since the dot has a special meaning in regex (by default it means "any character except a line break"). Interestingly, inside the brackets it does not need escaping, because there it is already interpreted as just the literal character . (a dot).
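The difference between an escaped and an unescaped dot can be sketched as follows. The input string is a made-up example with no dot in it, and the variable names are assumptions for illustration. The last lines use re.escape from the standard library, which adds the backslash for you when the version comes from a variable.

```python
import re

unescaped = re.compile(r'VERSION01.5')    # dot matches any character
escaped = re.compile(r'VERSION01\.5')     # dot matches only a literal dot
in_class = re.compile(r'VERSION01[.]5')   # inside brackets the dot is literal

text = 'VERSION01x5'  # hypothetical string that contains no dot at all

hits = {
    'unescaped': bool(unescaped.search(text)),
    'escaped': bool(escaped.search(text)),
    'in_class': bool(in_class.search(text)),
}
print(hits)  # {'unescaped': True, 'escaped': False, 'in_class': False}

# re.escape produces the escaped form of a literal string
escaped_version = re.escape('VERSION01.5')
print(escaped_version)  # VERSION01\.5
```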

Anyway, it would look like this:

import re

# Sample records; Registro 3 and 4 are extra cases beyond those in the question
textos = [
  'Registro 1: VERSION01/VERSION01.5/VERSION01.5.5',
  'Registro 2: VERSION01.5.5.5/VERSION02/VERSION02.5',
  'Registro 3: VERSION01.5/VERSION02/VERSION02.5',
  'Registro 4: VERSION01.5.5.5/VERSION01.50/VERSION01.5'
]

# Match VERSION01.5 only when no letter, digit, _ or dot is adjacent to it
r = re.compile(r'(?<![\w.])VERSION01\.5(?![\w.])')
for texto in textos:
    if r.search(texto):
        print(f'encontrado: {texto}')

Output:

encontrado: Registro 1: VERSION01/VERSION01.5/VERSION01.5.5
encontrado: Registro 3: VERSION01.5/VERSION02/VERSION02.5
encontrado: Registro 4: VERSION01.5.5.5/VERSION01.50/VERSION01.5

In this case, though, you can also use split:

r = re.compile(r'[ :/]')
for texto in textos:
    for s in r.split(texto):
        if s == 'VERSION01.5':
            print(f'encontrado: {texto}')
            break

The idea is to break the text into parts, using the regex [ :/] (a space, or :, or /) as the separator. Then just check whether any of the parts is the version you want. After all, split and match are two sides of the same coin: with split you say what you don't want (in this case, space, : or /), and with match you say what you want ("VERSION01.5", with no letter, digit, _ or dot before or after).

The output is the same as the previous code.
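To compare the two approaches side by side for an arbitrary version string, here is a minimal sketch. The function names find_with_lookaround and find_with_split are hypothetical helpers introduced for this illustration; the records are the ones used in the answer.

```python
import re

def find_with_lookaround(version, text):
    """True if version occurs with no letter, digit, _ or dot next to it."""
    pattern = r'(?<![\w.])' + re.escape(version) + r'(?![\w.])'
    return re.search(pattern, text) is not None

def find_with_split(version, text):
    """True if version is exactly one of the parts split on space, : or /."""
    return version in re.split(r'[ :/]', text)

textos = [
    'Registro 1: VERSION01/VERSION01.5/VERSION01.5.5',
    'Registro 2: VERSION01.5.5.5/VERSION02/VERSION02.5',
    'Registro 3: VERSION01.5/VERSION02/VERSION02.5',
    'Registro 4: VERSION01.5.5.5/VERSION01.50/VERSION01.5',
]

results = [
    (find_with_lookaround('VERSION01.5', t), find_with_split('VERSION01.5', t))
    for t in textos
]
for texto, (by_regex, by_split) in zip(textos, results):
    print(f'{texto}: lookaround={by_regex}, split={by_split}')
```

The two agree on these records, but they are not equivalent in general. If another separator shows up, for example a comma as in 'x,VERSION01.5', the lookaround version still matches while the split version does not, because the comma is not in [ :/]. Which behavior is right depends on the separators the data can actually contain.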