I have some HTML from which I need to extract the values of a set of <li> elements.
Here is the relevant part of the HTML:
<ul id="minhas-tags">
<li><em>Tagged: </em></li>
<li><a href="/tags/tag1">tag1</a>, </li>
<li><a href="/tags/tag2">tag2</a>, </li>
<li><a href="/tags/tag3">tag3</a>, </li>
<li><a href="/tags/tag4">tag4</a>, </li>
I want to get the contents of the <li> elements, i.e. tag1, tag2, etc.
After a lot of reading here, I arrived at this regular expression:
tags/[a-zA-Z]+">[a-zA-Z]+<+
This isolates the part of the HTML I want from everything else, but I don’t know how to change the expression so that it finds the values and returns only the content of the <li>.
For example, this expression returns /tags/tag1">tag1<, but I want only tag1.
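What the pattern is missing is a capture group: parentheses around the piece you want to keep. The whole pattern still has to match, but Groups[1] then holds only the text matched inside the parentheses. Below is a minimal C# sketch, assuming the markup looks exactly like the sample above. The class and method names are hypothetical. [a-zA-Z0-9] replaces [a-zA-Z] because the sample tag names contain a digit, which [a-zA-Z]+ alone would not match, and the trailing <+ is simplified to a single <.

```csharp
using System.Text.RegularExpressions;

public static class CaptureGroupDemo
{
    // Same idea as the pattern in the question, but the link text is wrapped
    // in parentheses so it becomes capture group 1.
    // Assumption: tag names contain only ASCII letters and digits.
    private static readonly Regex TagPattern =
        new Regex(@"tags/[a-zA-Z0-9]+"">([a-zA-Z0-9]+)<");

    public static string FirstTag(string html)
    {
        Match m = TagPattern.Match(html);
        // m.Value is the whole match, e.g. tags/tag1">tag1<
        // m.Groups[1].Value is only the part inside the parentheses, e.g. tag1
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

For the first link in the sample, m.Value would be tags/tag1">tag1< while FirstTag returns just tag1.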
How can I do that? Could you also explain how the suggested expression works, please?
Update
Sorry, I didn’t mention the language: I’m using C#. My method looks like this:
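using System.Text.RegularExpressions;

public string retorna_Tags_HTML(string html)
{
    Regex ER = new Regex(@"tags?([\w]+)<\/a>", RegexOptions.None);
    Match m = ER.Match(html);
    // m.Value is e.g. "tag1</a>": the </a> is still part of the match
    return m.Value;
}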
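The method above only looks at the first match and returns the whole matched text. A sketch of a complete version, assuming the href="/tags/..." layout from the sample: Regex.Matches finds every occurrence, and Groups[1] excludes both the href part and the </a>. The class name TagParser and the choice to join the names with ", " are hypothetical, not something the question specifies.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class TagParser
{
    // Group 1 captures only the link text; the rest of the pattern must
    // match but is not returned. Assumes links of the form
    // <a href="/tags/NAME">TEXT</a> as in the sample HTML.
    private static readonly Regex TagLink =
        new Regex(@"/tags/[\w-]+"">([^<]+)</a>");

    public string retorna_Tags_HTML(string html)
    {
        var tags = new List<string>();
        // Matches returns every occurrence, unlike Match, which stops at the first.
        foreach (Match m in TagLink.Matches(html))
        {
            tags.Add(m.Groups[1].Value);
        }
        // Hypothetical return format: tag names separated by ", ".
        return string.Join(", ", tags);
    }
}
```

With the sample list this would produce "tag1, tag2, tag3, tag4". [^<]+ captures everything up to the next <, so link text containing spaces or hyphens would also come through; that is a design choice, not a requirement from the question.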
What is the language? It may be possible to use a parser. Otherwise, try this regex: /tags?([\w]+)<\/a>/g
– stderr
The language is C#. The link you sent also returns the </a>. I’ve added more information to the question.
– Ricardo
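The suggestion in the comments to use a parser could be sketched with LINQ to XML (System.Xml.Linq), which ships with .NET. This rests on a strong assumption: the fragment must be well-formed XML, including the closing </ul> that the sample above omits. Real-world HTML frequently is not well-formed, and XElement.Parse then throws an XmlException, so a dedicated HTML parser would be the more robust route. XmlTagReader and ReadTags are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlTagReader
{
    // Assumes the fragment is well-formed XML (every element closed, including </ul>).
    // Otherwise XElement.Parse throws System.Xml.XmlException.
    public static List<string> ReadTags(string ulFragment)
    {
        XElement ul = XElement.Parse(ulFragment);
        return ul.Descendants("a")
                 // Keep only links pointing under /tags/; a missing href yields null.
                 .Where(a => ((string)a.Attribute("href") ?? "")
                                 .StartsWith("/tags/", StringComparison.Ordinal))
                 // Value is the element's text content, e.g. "tag1".
                 .Select(a => a.Value)
                 .ToList();
    }
}
```

Unlike the regex versions, this does not depend on attribute order or quoting style, but it trades that for the well-formedness requirement.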