Element <strong> returning empty - HtmlAgilityPack

I'm trying to get the contents of a <strong> element from the Submarino site. When I open the page in a browser I can see the content in the source; with HtmlAgilityPack, however, the element comes back empty.

Example:

 

    HtmlNodeCollection produtos = document.DocumentNode.SelectNodes("//article[@data-component='single-product']");
    foreach (HtmlNode produto in produtos)
    {
        produto.SelectSingleNode(".//span[@class='sale price']/strong").InnerText.Trim();
    }

Request for the page:

    HtmlWeb webGet = new HtmlWeb();
    HtmlDocument document = webGet.Load("http://busca.submarino.com.br/busca.php?q=eletrodomesticos+e+eletroportateis&page=1");

Do I need to send a POST, add parameters to the Load call, or use another method to get the content?

1 answer


    // Requires: using System.Diagnostics; using System.IO; using System.Windows.Forms; using HtmlAgilityPack;
    // plus a reference to Microsoft.mshtml. webBrowserControl is a WebBrowser field on the form.
    private void button1_Click(object sender, EventArgs e)
    {
        var uri = "http://busca.submarino.com.br/busca.php?q=eletrodomesticos+e+eletroportateis&page=1";
        webBrowserControl = new WebBrowser { ScriptErrorsSuppressed = true };
        webBrowserControl.Navigate(uri);

        // Block until the browser reports the page as fully loaded.
        waitTillLoad(this.webBrowserControl);

        // Hand the DOM as rendered by the browser (not the raw HTTP response) to HtmlAgilityPack.
        var doc = new HtmlAgilityPack.HtmlDocument();
        var documentAsIHtmlDocument3 = (mshtml.IHTMLDocument3)webBrowserControl.Document.DomDocument;
        StringReader sr = new StringReader(documentAsIHtmlDocument3.documentElement.outerHTML);
        doc.Load(sr);

        HtmlNodeCollection produtos = doc.DocumentNode.SelectNodes("//article[@data-component='single-product']");
        foreach (HtmlNode produto in produtos)
        {
            Debug.WriteLine("Price: " + produto.SelectSingleNode(".//span[@class='sale price']/strong").InnerText.Trim());
        }
        Debug.Print("");
    }


    private void waitTillLoad(WebBrowser webBrControl)
    {
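        WebBrowserReadyState loadStatus;
        int waittime = 100000;
        int counter = 0;
        // Phase 1: pump messages until the control is no longer in a finished state
        // left over from a previous page (or the iteration budget runs out).
        while (true)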
        {
            loadStatus = webBrControl.ReadyState;
            Application.DoEvents();
            if ((counter > waittime) || (loadStatus == WebBrowserReadyState.Uninitialized) || (loadStatus == WebBrowserReadyState.Loading) || (loadStatus == WebBrowserReadyState.Interactive))
            {
                break;
            }
            counter++;
        }

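        // Phase 2: pump messages until the document is complete and the control is idle.
        // Note that this loop has no timeout.
        counter = 0;
        while (true)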
        {
            loadStatus = webBrControl.ReadyState;
            Application.DoEvents();
            if (loadStatus == WebBrowserReadyState.Complete && webBrControl.IsBusy != true)
            {
                break;
            }
            counter++;
        }
    }

Result:

[Image: result]

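Indeed, with your code the desired field is not returned, most likely because the price is filled in by script after the page loads and HtmlWeb.Load does not run scripts. With the code above, which reads the HTML from a WebBrowser control after the page has loaded, I was able to get the value.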
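One thing to watch for in both snippets: SelectNodes and SelectSingleNode return null when nothing matches, so chaining .InnerText directly can throw a NullReferenceException if the markup changes. Below is a minimal sketch of a more defensive version of the loop. The class PriceExtractor, the helper ExtractPrices, and the sample HTML are made up for illustration; only the two XPath expressions come from the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class PriceExtractor
{
    // Hypothetical helper: returns the trimmed text of every non-empty price element.
    public static List<string> ExtractPrices(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var prices = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection produtos =
            doc.DocumentNode.SelectNodes("//article[@data-component='single-product']");
        if (produtos == null)
        {
            return prices;
        }

        foreach (HtmlNode produto in produtos)
        {
            // SelectSingleNode also returns null when the element is missing.
            HtmlNode strong = produto.SelectSingleNode(".//span[@class='sale price']/strong");
            string text = strong?.InnerText.Trim();
            if (!string.IsNullOrEmpty(text))
            {
                // Decode HTML entities such as &nbsp; before returning the text.
                prices.Add(HtmlEntity.DeEntitize(text));
            }
        }
        return prices;
    }

    public static void Main()
    {
        // Made-up markup that mimics the structure targeted by the XPath in the question.
        string html =
            "<article data-component='single-product'><span class='sale price'><strong> R$ 199,90 </strong></span></article>" +
            "<article data-component='single-product'><span class='sale price'><strong></strong></span></article>";

        foreach (string price in ExtractPrices(html))
        {
            Console.WriteLine(price);
        }
    }
}
```

With the sample markup, the second product has an empty <strong> and is skipped instead of producing an empty string, which also makes it easy to spot when the page was parsed before its content was filled in.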

  • Perfect. Thank you.

  • Rubens, do you have any idea how to do this in a console application?

  • I believe it is not possible, because of the following dependency: using System.Windows.Forms;
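On the console question: a console project can also reference System.Windows.Forms, so the dependency by itself may not be a blocker. What the WebBrowser control does need is an STA thread with a message loop. The following is only a sketch under those assumptions, not something taken from the answer; ConsoleRender and GetRenderedHtml are hypothetical names, there is no timeout, and content that scripts add after DocumentCompleted fires would still be missing.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

public static class ConsoleRender
{
    // Hypothetical helper: loads a URL in an off-screen WebBrowser and returns the rendered HTML.
    public static string GetRenderedHtml(string url)
    {
        string html = null;

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            {
                browser.DocumentCompleted += (s, e) =>
                {
                    // DocumentCompleted is also raised for frames; wait for the top-level document.
                    if (e.Url != browser.Url) return;

                    html = browser.Document.GetElementsByTagName("html")[0].OuterHtml;
                    Application.ExitThread(); // ends the message loop started below
                };

                browser.Navigate(url);
                Application.Run(); // message loop the control needs in order to load the page
            }
        });

        // WebBrowser wraps an ActiveX control and must live on an STA thread.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return html;
    }

    public static void Main(string[] args)
    {
        string html = GetRenderedHtml(args.Length > 0 ? args[0] : "about:blank");
        Console.WriteLine(html == null ? "no document" : html.Length + " characters of rendered HTML");
    }
}
```

The returned string can then be passed to HtmlAgilityPack with doc.LoadHtml(html) and queried with the same XPath expressions as in the answer.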
