Read the HTML of a web page from a WPF C# application

Good evening,

I have been asked to create a desktop application that reads the HTML of a page. After some searching I managed to write a method that reads the HTML and returns it as a string. Here is the code:

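// requires: using System.IO; using System.Net;
string strSiteUrl = "URL";

// Downloads the HTML exactly as the server sends it; no JavaScript is executed.
var request = (HttpWebRequest)WebRequest.Create(strSiteUrl);
using (var response = (HttpWebResponse)request.GetResponse())
using (var stream = response.GetResponseStream())
using (var streamReader = new StreamReader(stream))
{
    pereira.txbDescricaoPagina.Text = streamReader.ReadToEnd();
}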

My problem is that the page I need to read is loaded dynamically via JavaScript: the HTML is injected into the page at runtime, so my method never sees it.

Any help is welcome.

  • 1

    You have to add a WebBrowser control and wait for the page to load.

  • Hi Rovann, could you explain in more detail how I would use the WebBrowser inside a desktop application?

  • 1

    @Joaomartins has answered, and that's basically it. However, after DocumentCompleted fires you still have to wait for the JavaScript to finish loading everything; in this case, use a timer and a content check to see whether the page is ready.

1 answer

1


Add a WebBrowser control to your Form (or create it in code), then call its Navigate method with the URL you want:

WebBrowser webBrowser1 = new WebBrowser();

webBrowser1.DocumentCompleted += WebBrowser1_DocumentCompleted;
webBrowser1.Navigate("http://www.google.com");

Finally, in the DocumentCompleted event handler, read what you need:

private void WebBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser wb = sender as WebBrowser;

    // gets all the HTML of the body as a string
    string body = wb.Document.Body.InnerHtml;

    // iterates over every HTML element in the document
    foreach (HtmlElement elemento in wb.Document.All)
    {
        // ...
    }
}
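
The code above uses the Windows Forms WebBrowser. Since the question mentions WPF, here is a rough sketch of the same idea with the WPF control (System.Windows.Controls.WebBrowser), which raises LoadCompleted instead of DocumentCompleted and exposes Document as a plain object. The class name HtmlReaderWindow and the LastHtml property are made up for illustration, and reading the document through dynamic is an assumption about how you want to reach the underlying COM document, not the only way to do it.

```csharp
using System;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Navigation;

// Hypothetical window that hosts a WPF WebBrowser and captures the page HTML.
public class HtmlReaderWindow : Window
{
    private readonly WebBrowser browser = new WebBrowser();

    // Holds the most recently captured HTML (illustrative property).
    public string LastHtml { get; private set; }

    public HtmlReaderWindow(string url)
    {
        Content = browser;
        browser.LoadCompleted += OnLoadCompleted;
        browser.Navigate(new Uri(url));
    }

    private void OnLoadCompleted(object sender, NavigationEventArgs e)
    {
        // Document is the underlying COM HTML document; dynamic gives late-bound access.
        dynamic document = browser.Document;
        LastHtml = document.documentElement.outerHTML;
    }
}
```

As with DocumentCompleted, LoadCompleted only tells you the document has loaded; scripts that inject content afterwards may still be running.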
  • 1

    Don't forget to upvote the answer as well :)

  • João Martins, I ran a test as you suggested and it worked: it waited for the JavaScript to finish injecting the HTML and then read it. However, my case needs a bit more care. The method above waits for the page itself to load, but the JavaScript injects not only HTML but also further script calls, and it did not wait for those. I will read more about WebBrowser to understand the control better, but if you have a suggestion for this second problem I would be very grateful. Thanks.
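
For the follow-up about scripts that keep running after the page has loaded, the "timer and a content check" idea from the comments could be sketched roughly as below. This is only a sketch under assumptions: PageReadiness and WaitForContentAsync are hypothetical names (they are not part of WebBrowser), the "results" element id in the usage comment is made up, and what counts as "ready" depends entirely on your page.

```csharp
using System;
using System.Threading.Tasks;

public static class PageReadiness
{
    // Calls getHtml() every `interval` until isReady(html) returns true
    // or `timeout` elapses. Returns the HTML that passed the check,
    // or null if the timeout was reached first.
    public static async Task<string> WaitForContentAsync(
        Func<string> getHtml,
        Func<string, bool> isReady,
        TimeSpan interval,
        TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            string html = getHtml();
            if (html != null && isReady(html))
                return html;

            if (DateTime.UtcNow >= deadline)
                return null;

            await Task.Delay(interval);
        }
    }
}

// Hypothetical usage inside WebBrowser1_DocumentCompleted (declared "async void"):
//   string html = await PageReadiness.WaitForContentAsync(
//       () => wb.Document.Body?.InnerHtml,
//       h => h.Contains("id=\"results\""),      // made-up marker for "content is there"
//       TimeSpan.FromMilliseconds(250),
//       TimeSpan.FromSeconds(15));
```

When awaited from a UI event handler, the code continues on the UI thread after each delay, so reading wb.Document inside getHtml stays on the thread that owns the control. A null result means the marker never appeared within the timeout, which is worth handling explicitly.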
