Edit to add a disclaimer: evidently, at some point your process scrapes the Receita page. As Maniero said in his answer and comment, that is not very reliable. My (incomplete) solution below searches for a CPF or CNPJ in any text, which may or may not have HTML mixed in. That consideration is the only reason I answered in the form below. In general, anyone who parses HTML with a regex either doesn't know what they're doing or is desperate. There, I said it.
If all you want is to extract a CNPJ, a regular expression can work. Just note that the expression helps only because you are not processing the HTML itself, merely extracting a number from the text.
The expression you’re looking for is something like:
[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+-[0-9]+
And to those who know regex: yes, I know my expression is somewhat lazy. I will upvote anyone who posts an answer with a more precise expression.
Explanation:
- Each `[0-9]` block means "one digit here";
- The `+` means that the element to its left must occur at least once, but may occur several times. A more correct and efficient way to capture a CPF or CNPJ would be to repeat the digit block the exact number of times, such as `[0-9][0-9][0-9]`. I leave that up to you;
- The backslash escapes characters that have a special meaning, so that their literal value is used (in this case, the `.` and the slash itself).
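As a rough sketch of the stricter variant suggested above, the repeated digit blocks can be written with counted quantifiers such as `[0-9]{3}`. The class and field names here (`DocumentPatterns`, `Cnpj`, `Cpf`) are made up for illustration, and the patterns assume the usual punctuated layouts 00.000.000/0000-00 for a CNPJ and 000.000.000-00 for a CPF. They check only the shape of the number, not its check digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class DocumentPatterns
{
    // Punctuated CNPJ: 2 digits, dot, 3 digits, dot, 3 digits, slash, 4 digits, hyphen, 2 digits.
    // \b keeps the match from starting or ending in the middle of a longer run of digits.
    // The slash has no special meaning in a .NET regex, so it is left unescaped here.
    public static readonly Regex Cnpj =
        new Regex(@"\b[0-9]{2}\.[0-9]{3}\.[0-9]{3}/[0-9]{4}-[0-9]{2}\b");

    // Punctuated CPF: 3 digits, dot, 3 digits, dot, 3 digits, hyphen, 2 digits.
    public static readonly Regex Cpf =
        new Regex(@"\b[0-9]{3}\.[0-9]{3}\.[0-9]{3}-[0-9]{2}\b");
}
```

Something like `DocumentPatterns.Cnpj.Match(text)` could then be used in place of the looser expression, assuming the numbers in your text are always written with their punctuation.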
Note that since the expression contains backslashes, you must also escape them when putting it in a string, or put an at sign (@) in front of the string to make it a verbatim string. You can use code similar to the following:
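using System.Text.RegularExpressions;

string input = "CNPJ: 12.345.678/0001-90"; // replace this with your input text
Regex foo = new Regex(@"[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+-[0-9]+");
Match m = foo.Match(input);
if (m.Success) {
    // Groups[0] is the whole match; .Value returns it as a string.
    string resultado = m.Groups[0].Value; // I assume a single CNPJ per input.
}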
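If the assumption of a single CNPJ per input does not hold, `Regex.Matches` returns every occurrence instead of only the first. The following is a minimal sketch; `CnpjExtractor` and `ExtractAll` are hypothetical names, and the pattern is the loose one, written to accept the 00.000.000/0000-00 layout.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class CnpjExtractor
{
    // Loose pattern: digit runs separated by dot, dot, slash and hyphen.
    private static readonly Regex Pattern =
        new Regex(@"[0-9]+\.[0-9]+\.[0-9]+/[0-9]+-[0-9]+");

    // Returns every substring of the input that has the CNPJ shape, in order of appearance.
    public static List<string> ExtractAll(string input)
    {
        var results = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

Calling `CnpjExtractor.ExtractAll(input)` yields an empty list when nothing matches, so the caller does not need a separate `Success` check.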
Good luck!
I don't know what the best approach is, but people commonly use an external library such as HtmlAgilityPack to parse the HTML and hand you everything reliably separated; after that it is easy to search the elements. Any attempt to reinvent the wheel may produce some result, but it takes work and will hardly be reliable, let alone future-proof. I'm not saying these libraries are fail-safe, but they are an improvement. Otherwise it will be complicated, laborious and unreliable.
– Maniero