Error validating with Regex

2

I am trying to clean up this text with a regex, but I am not getting the result I want:

From:

"ST STN, SET J, ? SHOPS T-40 /T41, - TER-REO, SHOPPING & BOULEVARD KM 28.5 VALUE 450.00 CENTRAL."

To:

"ST STN SET J STORES T40 T41 TERREO SHOPPING AND BOULEVARD KM 28.5 450.00 CENTRAL"

My code:

String padrao = @"(?i)(,|.)?[^A-Za-z0-9]\s";
String padrao = @"(?i)[^0-9a-z]\s]";

Regex rg = new Regex(_texto, " ");

var arrayTexto = resultado.Normalize(NormalizationForm.FormD).ToCharArray();
foreach (char letter in arrayTexto)
{
    if (CharUnicodeInfo.GetUnicodeCategory(letter) != UnicodeCategory.NonSpacingMark)
        sb.Append(letter);
}

What’s wrong with it?

  • Do you want to remove characters such as * and -?

  • Apparently so; a Replace will solve it.

  • @Ishmael good thinking.

  • Exactly! Remove the special characters, the periods, and the ampersand, but wherever there is a KM or a value, keep the comma and the period.

  • @Adrianosuv have you already tried Replace?

  • Replace would get very complex, because I have to eliminate all special characters and accents while preserving the period and comma in values and KM markers.

  • If I can strip the period and preserve the comma in this address, AV D.R MAURO L MONTEIRO KM 28,5, with a regex, I can do the rest with C#'s native Replace.


2 answers

0

I advise you to use Replace, as suggested by @Marconi.

It is worth remembering what makes your case hard: you want to remove several special characters (-, *, /, &) while keeping some of them in specific places, such as the digits after KM. That makes it very difficult to write one general rule that solves the problem.

But if you want to keep using regex, you can combine several alternatives with OR (|), putting the most specific cases first.

(\d+\.\d+|\d+,\d+|\w+|\s)

The regex above matches everything you want to keep. It first tries digits followed by . and more digits (the 450.00 case), then digits followed by , and more digits (the 28,5 case), then runs of word characters, whether lower or upper case, and finally whitespace. Anything it does not match, such as a stray comma, hyphen, or slash, can be dropped.
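
One way to apply an alternation like this is to concatenate the matches and discard everything in between. The following is only a minimal sketch: the class and method names are hypothetical, the pattern is written with + quantifiers so that a lone . or , never matches, and collapsing repeated spaces at the end is an assumption of mine rather than part of the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not taken from the original answer.
public static class AddressTokens
{
    // Most specific alternatives first: decimals with '.', decimals with ',',
    // then runs of word characters, then single whitespace characters.
    private const string KeepPattern = @"\d+\.\d+|\d+,\d+|\w+|\s";

    public static string KeepMatches(string input)
    {
        // Concatenate only what the pattern matched; everything else is dropped.
        string kept = string.Concat(
            Regex.Matches(input, KeepPattern).Cast<Match>().Select(m => m.Value));

        // Assumption: runs of whitespace left behind should become one space.
        return Regex.Replace(kept, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(KeepMatches("AV D.R MAURO L MONTEIRO KM 28,5"));
        Console.WriteLine(KeepMatches("T-40 /T41, - TER-REO, KM 28.5 VALUE 450.00 CENTRAL."));
    }
}
```

This sketch does not turn & into AND or drop the word VALUE as in the desired output; those substitutions would still need ordinary Replace calls before or after this step.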

0

Perhaps this regex will work for you:

string pattern = @"(?i)(,|\.)?[^a-z0-9]\s|(\/|\-)";

Using Regex.Replace() you can remove the special characters.

private static string PreprocessingText(string input)
{
    string pattern = @"(?i)(,|\.)?[^a-z0-9]\s|(\/|\-)";
    return Regex.Replace(input, pattern, " ");
}

See it working on .NET Fiddle.
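
The question also asks for accents to be removed, which the pattern above does not touch. A minimal sketch of that step, based on the Normalize loop from the question, might look like this. The class name, the method name, and the sample string are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper built around the loop shown in the question.
public static class AccentStripper
{
    public static string RemoveDiacritics(string text)
    {
        // FormD splits each accented letter into a base letter plus combining marks.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Keep everything except the combining (non-spacing) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }

        // Recompose whatever is left.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDiacritics("SÃO JOSÉ")); // SAO JOSE
    }
}
```

Running this before the regex replacement keeps the two concerns separate: one pass for accents, one for punctuation.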
