Correct regex in C++ returning 0

Hi all,

I created a regular expression for MIPS instructions. It was a lot of work, but I got it working. At least it works in an online regex tester, but not in C++. Here is the code with the expression, plus some example strings with instructions that should be accepted:

Code:

bool regexOk(string str){
    regex express{"[a-z]{2,4}[\s](([\$][a|f|g|k|r|v|s|t][a|t|p|\d]\,\s[\d]*\([\$][(a|f|g|k|r|v|s|t)][a|t|p|[\d]\)|(\$szero\,\s[\d]*\([\$][(a|f|g|k|r|v|s|t)][a|t|p|[\d]\)|(-?[\d]*\,\s[\d]*\([\$][(a|f|g|k|r|v|s|t)][a|t|p|[\d]\))))|([\$][a|f|g|k|r|v|s|t][a|t|p|\d]\,\s|(\$szero\,\s|(-?[\d]*\,\s)))([\$][a|f|g|k|r|v|s|t][a|t|p|\d]\,\s|(\$szero\,\s|(-?[\d]*\,\s)))([\$][a|f|g|k|r|v|s|t][a|t|p|\d]|(\$szero|(-?[\d]*))))"};
    return regex_match(str, express);
}

Strings:

lw $t0, 0($t7)
srl $t0, $t0, 1
addi $t1, $t7, 28
sll $t0, $t0, 2
add $t1, $t1, $t0
lw $t1, 0($t1)
addi $t1, $t1, 1
lw $t0, 0($t7)
sll $t0, $t0, 2 
addi $t2, $t7, 28
add $t2, $t2, $t0
sw $t1, 0($t2)
lw $t0, 0($t7)
addi $t0, $t0, 1
sll $t0, $t0, 2
addi $t1, $t7, 28
add $t1, $t1, $t0
addi $t2, $szero, -1
sw $t2, 0($t1)

Yes, the expression grew huge and hard to read, but it works in the online tester, so I figured it would work in C++ too. It's not perfect, but it serves its current purpose. Basically, it looks for the following groups:

  • instruction register, address

or

  • instruction register, register, register

These registers can also be a constant ($szero) or an integer value.

How does C++ regex work? Is there something wrong with the expression? If not, what would be the reason for the error?

  • Hi, I think the problem is the backslashes: C++ is probably treating them as string escape sequences. See this example here on SOpt (mainly read the comments on the accepted answer), and this page on cppreference; note how the backslashes are used there.

  • @Diegoferreira, I had seen that answer while researching, but I don't think that's the case: all the backslashes I'm using are intentional, either to escape characters or for regex metacharacters (\s for whitespace, etc.). In the answer you linked, it seems the error was using / instead of \.

  • I rewrote your code so that every time it finds a match for your regular expression, it wraps it in [ ]. The output was [lw $t0, 0($t7)] for each line of the string. Is that roughly what you want?

1 answer

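Regarding your regular expression: inside an ordinary C++ string literal you should write \\ instead of \, because the compiler processes escape sequences before std::regex ever sees the pattern. Regarding the code: the error is in the use of std::regex_match. According to the documentation found here, it only considers complete matches, i.e. the whole input has to match the expression; see the case below, taken from the Notes section of that page: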

Because std::regex_match considers only complete matches, the same regular expression can give different matches with std::regex_match and std::regex_search:

std::regex re("Get|GetValue");
std::cmatch m;
std::regex_search("GetValue", m, re);  // returns true, and m[0] contains "Get"
std::regex_match ("GetValue", m, re);  // returns true, and m[0] contains "GetValue"
std::regex_search("GetValues", m, re); // returns true, and m[0] contains "Get"
std::regex_match ("GetValues", m, re); // returns false

Here is an example that applies your expression with std::regex_search:

#include <iostream>
#include <regex>
#include <string>
#include <iomanip>

using namespace std;

bool regexOk( const string&, const regex& );
void print_regex( const string&, const regex& );

int main()
{

    string str = "lw $t0, 0($t7)\nsrl $t0, $t0, 1\naddi $t1, $t7, 28\nsll $t0, $t0, 2\nadd $t1, $t1, $t0\nlw $t1, 0($t1)\naddi $t1, $t1, 1\nlw $t0, 0($t7)\nsll $t0, $t0, 2\naddi $t2, $t7, 28\nadd $t2, $t2, $t0\nsw $t1, 0($t2)\nlw $t0, 0($t7)\naddi $t0, $t0, 1\nsll $t0, $t0, 2\naddi $t1, $t7, 28\nadd $t1, $t1, $t0\naddi $t2, $szero, -1\nsw $t2, 0($t1)";

    regex expression( "[a-z]{2,4}[\\s](([\\$][a|f|g|k|r|v|s|t][a|t|p|\\d]\\,\\s[\\d]*\\([\\$][(a|f|g|k|r|v|s|t)][a|t|p|[\\d]\\)|(\\$szero\\,\\s[\\d]*\\([\\$][(a|f|g|k|r|v|s|t)][a|t|p|[\\d]\\)|(-?[\\d]*\\,\\s[\\d]*\\([\\$][(a|f|g|k|r|v|s|t)][a|t|p|[\\d]\\))))|([\\$][a|f|g|k|r|v|s|t][a|t|p|\\d]\\,\\s|(\\$szero\\,\\s|(-?[\\d]*\\,\\s)))([\\$][a|f|g|k|r|v|s|t][a|t|p|\\d]\\,\\s|(\\$szero\\,\\s|(-?[\\d]*\\,\\s)))([\\$][a|f|g|k|r|v|s|t][a|t|p|\\d]|(\\$szero|(-?[\\d]*))))" );

    cout << boolalpha << regexOk( str, expression ) << endl;
    print_regex( str, expression );
}

bool regexOk( const string& str, const regex& expression )
{
    // true if the expression matches anywhere in str, not necessarily all of it
    return regex_search( str, expression );
}

void print_regex( const string& str, const regex& expression )
{
    // wraps every match in [ ]; "$&" in the format string stands for the whole match
    string new_str = regex_replace( str, expression, "[$&]" );
    cout << new_str << endl;
}
  • Now I get it: I understand what you were saying the first time. I thought a single backslash was enough for the escape to be recognized. Thank you very much, my friend!

  • And also for the difference between regex_match and regex_search, which I didn't know about. Thanks a lot!
