Regex for tokenizing positive and negative numbers, addition and subtraction


I have to write a lexical analyzer in Python using PLY. It uses regular expressions to recognize tokens.

Example:

t_FLUTUANTE = r'flutuante'

The expression above matches the word flutuante and returns a token for it (how PLY returns tokens is not relevant here, so I will not explain it).

The problem:
My difficulty is in getting the right tokens for negative numbers, positive numbers, addition and subtraction.

My test input is:

a := -1
y := +2
b := 2
c := 3+4
z := 20 + 42
funcao(-1)
funcao( -1)
funcao(a, -1)

If my regex is:

[+-]?\d+

The output is:

(Image: first regular expression attempt)

But note that for the variable c it matched +4, and that cannot happen, because the + there is actually an addition operator.
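The behaviour described above can be reproduced outside PLY with Python's built-in re module. This is only a minimal sketch, not the asker's lexer: the names SOURCE, FIRST_ATTEMPT and matches_per_line are made up for illustration, and the input is the test input from the question.

```python
import re

# Test input copied from the question.
SOURCE = """a := -1
y := +2
b := 2
c := 3+4
z := 20 + 42
funcao(-1)
funcao( -1)
funcao(a, -1)"""

# First attempt from the question: an optional sign followed by digits.
FIRST_ATTEMPT = re.compile(r"[+-]?\d+")


def matches_per_line(pattern, text):
    """Return, for each line of `text`, the list of full matches of `pattern`."""
    return [[m.group(0) for m in pattern.finditer(line)]
            for line in text.splitlines()]


first_attempt_matches = matches_per_line(FIRST_ATTEMPT, SOURCE)
for line, found in zip(SOURCE.splitlines(), first_attempt_matches):
    print(f"{line!r} -> {found}")

# The line 'c := 3+4' yields ['3', '+4']: the operator is absorbed as a sign.
```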

If I modify it a little, I get a better result:

((\D)[+-]\d+)|\d+

Its output is:

(Image: attempt number 2)

This is indeed a better result. But the matches in the function calls and in the variable a now include the space before the sign, and in one of the function calls the regex matched (-1.

How can I get this right?

I am using the site https://www.regextester.com/ to test my expressions, with the multiline (m) flag disabled on the site because I could not enable it in PLY.

1 answer

The problem is that the shorthand \D matches one character (any character that is not matched by \d), so that character also becomes part of the match. That is why the regex picks up the character before the - (in this case, a space or a ( ).
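To see the consumed character concretely, here is a small sketch using Python's re module on a few lines of the test input. The names SECOND_ATTEMPT, samples and second_attempt_matches are illustrative only.

```python
import re

# Second attempt from the question: \D consumes the character before the sign.
SECOND_ATTEMPT = re.compile(r"((\D)[+-]\d+)|\d+")

samples = ["a := -1", "c := 3+4", "funcao(-1)", "funcao( -1)"]

# Map each sample line to the full text of every match found in it.
second_attempt_matches = {
    line: [m.group(0) for m in SECOND_ATTEMPT.finditer(line)]
    for line in samples
}

for line, found in second_attempt_matches.items():
    print(f"{line!r} -> {found}")

# 'a := -1'    -> [' -1']   (leading space is part of the match)
# 'funcao(-1)' -> ['(-1']   (opening parenthesis is part of the match)
```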

One way to avoid this is to use a lookbehind:

((?<=\D)[+-]\d+)|\d+

The syntax (?<= defines a lookbehind, which checks whether something exists immediately before the current position. The difference is that a lookbehind only looks for something; its content is not part of the match (this is called a zero-width match, or an assertion).

That is, the part (?<=\D) only checks whether there is a character matching \D before the sign ([+-]), but this character will not be part of the match. Therefore, the regex no longer picks up the character that comes before the sign.
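A quick way to check this is to run the lookbehind version over the whole test input with Python's re module. This is a sketch under the assumption that the re results carry over to the PLY lexer; the names LOOKBEHIND, SOURCE and lookbehind_matches are illustrative.

```python
import re

# Regex from the answer: the lookbehind checks for a non-digit without consuming it.
LOOKBEHIND = re.compile(r"((?<=\D)[+-]\d+)|\d+")

# Test input copied from the question.
SOURCE = """a := -1
y := +2
b := 2
c := 3+4
z := 20 + 42
funcao(-1)
funcao( -1)
funcao(a, -1)"""

lookbehind_matches = [
    [m.group(0) for m in LOOKBEHIND.finditer(line)]
    for line in SOURCE.splitlines()
]

for line, found in zip(SOURCE.splitlines(), lookbehind_matches):
    print(f"{line!r} -> {found}")

# Edge case: a lookbehind needs a preceding character, so a signed number
# at the very start of the input only matches its digits.
at_start = [m.group(0) for m in LOOKBEHIND.finditer("-1")]
print(at_start)
```

Signed numbers now come out without the preceding space or parenthesis, and 3+4 is split into 3 and 4. Note the edge case at the end: a sign at position 0 has no character before it, so the lookbehind fails there.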

See it working on regex101.com.

  • It worked perfectly, thank you very much. I will study lookbehind further, because I did not fully understand it.
