Get information from a div in C#

4

I have code that reads the page, but I need to extract the following:

<a href="/t848p15-teste">2</a>

The idea is to find the <a> tag whose text is 2 and return its link. In this case, it would return /t848p15-teste.

The code I have for reading the page is this:

WebRequest request = WebRequest.Create("site_aqui");
WebResponse response = request.GetResponse();
StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.ASCII);
string Texto = reader.ReadToEnd();

1 answer

2

You can use the Html Agility Pack, a library for parsing and manipulating HTML.

NuGet: Install-Package HtmlAgilityPack

Load the HTML text into the HtmlDocument class, and then use XPath to search for the desired element.

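// Requires: using System.Linq; using HtmlAgilityPack;
var doc = new HtmlDocument();
doc.LoadHtml(Texto);
// Select every <a> that has an href and whose text contains "2"
var links = doc.DocumentNode.SelectNodes("//a[contains(text(),'2')][@href]");

// SelectNodes returns null when nothing matches
if (links != null)
{
    // Option 1: take only the first match
    var primeiroLinkAchando = links.FirstOrDefault();

    if (primeiroLinkAchando != null)
    {
        var href = primeiroLinkAchando.Attributes["href"].Value;
        // now you can do whatever you want with the href
    }

    // Option 2: iterate over every match
    foreach (HtmlNode link in links)
    {
        var href = link.Attributes["href"].Value;
        // now you can do whatever you want with the href
    }
}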
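
Note that contains(text(),'2') also matches links whose text is "12" or "20". If only the link whose text is exactly "2" is wanted, and only the first one, a stricter XPath combined with SelectSingleNode may be enough. The following is a minimal sketch, not a definitive implementation; the class and method names (LinkFinder, FindHrefByText) are hypothetical, and it assumes the link text contains no apostrophe.

```csharp
using HtmlAgilityPack;

public static class LinkFinder
{
    // Hypothetical helper: returns the href of the first <a> whose trimmed
    // text is exactly linkText, or null when no such link exists.
    public static string FindHrefByText(string html, string linkText)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // normalize-space(.) trims surrounding whitespace, and '=' requires
        // an exact match, so "12" or "20" are not selected.
        // Assumption: linkText has no apostrophe (it is embedded in quotes).
        var node = doc.DocumentNode.SelectSingleNode(
            "//a[@href][normalize-space(.)='" + linkText + "']");

        // SelectSingleNode returns null when nothing matches.
        return node?.GetAttributeValue("href", string.Empty);
    }
}
```

With the HTML from the question, LinkFinder.FindHrefByText(Texto, "2") should return /t848p15-teste.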
  • It gave an error, but that was because ".Attributes" was missing on "link". How do I get only the first result? It is picking up all the results.

  • In that case, you can simply ignore all the other results in the list by using the LINQ method FirstOrDefault, for example.

  • I got it... the Attributes really was what I needed =) my bad!

  • Can you explain more about this LINQ method? I wanted to get ONLY the '2'.

  • LINQ is a set of methods for working with collections. FirstOrDefault returns the first element of a sequence, or the type's default value (null for reference types) if the sequence is empty.

  • But I think what you want is not about LINQ... to get the element text, "2", you can use the property link.InnerText.

  • I tried using InnerText, but the accented characters all come out as "??". I searched a lot online for a fix and it looks like there's no way. xD

  • So I looked for an alternative way to achieve that.

  • I did not have that accent problem here. Try setting the HtmlDocument encoding: doc.OptionDefaultStreamEncoding = Encoding.UTF8; and see if that resolves it.
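
A likely source of the "??" is the reader in the question, which decodes the response with Encoding.ASCII: ASCII has no accented characters, so every byte above 127 is replaced with "?" before the text ever reaches HtmlDocument. The sketch below reproduces the effect with the standard library only. It assumes the page is served as UTF-8, which is not stated in the question; EncodingDemo and Decode are hypothetical names.

```csharp
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Decodes raw response bytes the same way the question does,
    // but with the encoding passed in as a parameter.
    public static string Decode(byte[] body, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(body), encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For the UTF-8 bytes of "você", Decode(body, Encoding.ASCII) gives "voc??" (the "é" occupies two bytes, and each becomes "?"), while Decode(body, Encoding.UTF8) gives "você". Under the same assumption, replacing Encoding.ASCII with Encoding.UTF8 in the question's StreamReader should keep the accents intact.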
