I need a regex to parse a variable assignment

I need to analyze C code using regex, and I’m having trouble determining whether a variable is being assigned a float or an integer value.

Example:

valor_01 = ( 5 * 3 ) / 2.5 + ( 4 % 3 ) ^ 4 ;
  1. valor_01 can be any variable name, something like \w+
  2. I need to detect whether there is a decimal value between the = and the ; (2.5 in this example)
  3. So far I have arrived at the following expression:
(\w+\s?\=\s?).+

Problem: with this expression I get the variable name, the equals sign, and everything else on the line, and I can’t tell whether there is a (\d+.\d+) on the line followed by a ; at the end.

It’s as if I needed to capture only:

valor_01 = 2.5 ;
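
One way to get closer to this is to split the work in two steps instead of using a single expression: first capture the name and the right-hand side, then search only the right-hand side for a decimal literal. The sketch below assumes JavaScript and one assignment per line; the function name findDecimalAssignment is made up for illustration, and it only recognizes the digits-dot-digits form.

```javascript
// Hypothetical helper (not from the original post): splits "name = expr ;"
// and reports the first decimal literal found on the right-hand side.
function findDecimalAssignment(line) {
  // Group 1: variable name; group 2: everything between "=" and the final ";".
  const assignment = /^\s*([A-Za-z_]\w*)\s*=\s*(.+?)\s*;\s*$/.exec(line);
  if (assignment === null) return null; // not a simple "name = expr;" line

  const [, name, expression] = assignment;
  // Digits, a literal dot, digits. The dot is escaped on purpose:
  // an unescaped "." would match any character.
  const decimal = /\d+\.\d+/.exec(expression);
  return { name, decimal: decimal === null ? null : decimal[0] };
}

console.log(findDecimalAssignment("valor_01 = ( 5 * 3 ) / 2.5 + ( 4 % 3 ) ^ 4 ;"));
console.log(findDecimalAssignment("total = 4 % 3;"));
```

For the example line this yields the name valor_01 together with the literal 2.5, which is enough to rebuild something like valor_01 = 2.5 ; if that is what you need. It does not look at strings, comments, or the types of other variables.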

  • Although it is possible, I don’t know if regex is the best solution, because in addition to checking the number, it would have to check the context. For example, if the number is inside a string, as in valor = "abc 2.5";, it should not be accepted, since the variable is receiving a string and not a float. And what if there is a float variable in the expression? E.g. valor = 1 / x;, where x is a float variable declared earlier. You can also write valor = .5; or valor = 1e-2; (scientific notation), which are also floats. Anyway, there are too many variations and it may not be worth using regex.

  • Maybe a lexical analyzer is more suitable (I’ve never used one, but that is where I would start looking). Another tip: if you really are going to use regex, \w+ accepts things like 123ab, which is not a valid variable name, so change it to [a-zA-Z_]\w* (or [a-zA-Z_]\w{0,n} to limit names to n + 1 characters). Note that this still accepts _ on its own as a name, which I believe is valid, although strange.

  • Another complicated case: valor = func(2.5). The function func may well receive a float, but what if it returns an int (or anything else)? Besides the regex being complicated in itself (to deal with the cases I’ve already mentioned above), it would also have to analyze this kind of thing, and in the end you would be writing a mini C compiler in JavaScript (which is already complicated in itself, and even more so if it is based on regex).

  • What if the line is inside a comment? In that case it should be ignored, because the variable is not receiving any value (after all, it is commented out). Keep in mind that it’s not enough to check whether the line contains //, as C also has multi-line comments. Anyway, there are too many cases to analyze and the regex would be absurdly complex...
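
To make some of the cases from these comments concrete, here is a rough sketch (in JavaScript, since that is the language mentioned above) that first blanks out comments and string literals and only then looks for a float literal, including the .5 and 1e-2 forms. The names stripCommentsAndStrings, FLOAT_LITERAL, and hasFloatLiteral are invented for this example, and the lookbehind requires a reasonably recent JavaScript engine. It is not a lexer: it ignores hex floats and preprocessor tricks, and it still cannot tell that x in valor = 1 / x; is a float or what func(2.5) returns.

```javascript
// Hypothetical helpers, not taken from the original discussion.

// Replaces comments and string/char literals with a space so that numbers
// inside them are ignored. One combined pattern is used so that whichever
// construct starts first wins: a "//" inside a string stays part of the
// string, and a quote inside a comment stays part of the comment.
function stripCommentsAndStrings(source) {
  const pattern = /"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\/\/[^\n]*|\/\*[\s\S]*?\*\//g;
  return source.replace(pattern, " ");
}

// Matches 2.5, 1., .5 and 1e-2. The lookbehind rejects digits that are
// part of an identifier or a hex number (v1e5, 0x1e5).
const FLOAT_LITERAL = /(?<![\w.])(?:\d+\.\d*|\.\d+|\d+[eE][+-]?\d)/;

function hasFloatLiteral(source) {
  return FLOAT_LITERAL.test(stripCommentsAndStrings(source));
}

const samples = [
  'valor = "abc 2.5";',
  "valor = .5;",
  "valor = 1e-2;",
  "// valor = 2.5;",
  "/* valor = 2.5;\n still a comment */ valor = 3;",
  "valor = 1 / x;",
];
for (const code of samples) {
  console.log(hasFloatLiteral(code), JSON.stringify(code));
}
```

Even this small sketch needs two passes and a fairly dense pattern, which supports the point that a real lexical analyzer is probably the better tool.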

1 answer

I don’t see the need to validate the variable name. Variable names cannot contain dots, so if there is a dot anywhere in the string, it means there is a float.

Anyway, if you want to validate the whole expression:

\w+ letters, digits, or underscores

\s* optionally followed by spaces

= followed by an equals sign

[\d\+\-\*\/\s]* optionally followed by digits, arithmetic operators, and spaces

\.\d followed by a dot and a digit

[\d\+\-\*\/\s]* optionally followed by digits, arithmetic operators, and spaces

; followed by a semicolon

Putting it all together: \w+\s*=[\d\+\-\*\/\s]*\.\d[\d\+\-\*\/\s]*;

  • The problem is that it accepts totally invalid things, like 123 = +/- + .1-**- +;, and rejects the other cases I mentioned in the comments above. Although that line does not compile, it could appear inside a comment, for example (and in any case, it was not stated whether the code being analyzed compiles or not, so you can’t be too careful). See here a few examples of this regex at work.
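
A quick way to check this behavior yourself, assuming JavaScript: the snippet below runs the regex from the answer, copied unchanged, against a few lines taken from this thread. The variable names answerRegex and samples are just for illustration.

```javascript
// The regex exactly as given in the answer.
const answerRegex = /\w+\s*=[\d\+\-\*\/\s]*\.\d[\d\+\-\*\/\s]*;/;

const samples = [
  "valor_01 = 2.5 ;",                             // simple decimal assignment
  "valor_01 = ( 5 * 3 ) / 2.5 + ( 4 % 3 ) ^ 4 ;", // the line from the question
  "123 = +/- + .1-**- +;",                        // not valid C
  "valor = 1e-2;",                                // valid float, scientific notation
];

for (const line of samples) {
  console.log(answerRegex.test(line), line);
}
```

The first and third lines match, while the second and fourth do not: parentheses, %, ^ and the exponent form are not in the character class, so the line from the question itself is rejected while the invalid line is accepted.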
