Get data from a website to use in C#

0

Good evening,

I’m a beginner developing a C# program in Visual Studio for my TCC (final-year project). The piece of code I’m having trouble with is supposed to read the data from a table on a website and then store it in some component, maybe a DataGridView. The problem is that I’m not getting the data. I’m using the HTML Agility Pack, but I understand almost nothing about HTML. I’ve gone through some tutorials and the documentation, with no luck. The code below does run, but it returns an empty result, as if there were nothing in that section of the site. Is there something peculiar about this site that prevents the data from being extracted?

I inspected the page in the browser to find the information to use in the code. I haven’t pasted the HTML here because I couldn’t find an option in the browser to copy the entire source, but I have attached an image.

The data I need are 3 integers for the rainfall volumes and, optionally, the name of the city. I don’t need finished code, but any hint you can give me will be a great help. Thanks in advance for your attention.

public class HtmlProcessar
{
    public string ProcessHtmlCode()
    {
        // Loads the content from the site's URL
        const string html = "http://www.funceme.br/app-calendario/dia/municipios/media/1974/1/1";
        HtmlWeb web = new HtmlWeb();
        var htmlDoc = web.Load(html);

        var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");

        var node = htmlBody.Element("app-root");
        string saida = node.InnerHtml;
        return saida;
    }

}

HTML code of the Funceme site (the link is in the code above)

  • 2

    There is an API for this site, which is a much more reliable way to get the data: http://api.funceme.br/

2 answers

1

Hello, good afternoon!

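As Marcus said, the page is generated by JavaScript, so the HTML that the HTML Agility Pack downloads does not contain the table yet. You will have to use a tool that actually runs the page's scripts, such as Selenium.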

I ran a test here with this XPath:

/html/body/app-root/app-root/mat-sidenav-container/mat-sidenav-content/div/div/app-calendario-dia/mat-card/mat-card-content/div/div[2]/app-calendario-tabela/table/tbody/tr[1]/td[1]

and it returned the data.
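For reference, here is a minimal sketch of how that test could look in C# with Selenium. It assumes the Selenium.WebDriver and Selenium.Support NuGet packages, a local Chrome installation with a matching driver, and that 20 seconds is enough for the page to render; the class and method names are just illustrative, and the XPath is tied to the page's current layout, so it may stop working if the site changes.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

// Hypothetical helper: the names here are illustrative, not part of any library.
public static class RenderedPageReader
{
    // URL taken from the question.
    public const string Url =
        "http://www.funceme.br/app-calendario/dia/municipios/media/1974/1/1";

    // XPath from the test above: first cell of the first table row.
    public const string CellXPath =
        "/html/body/app-root/app-root/mat-sidenav-container/mat-sidenav-content" +
        "/div/div/app-calendario-dia/mat-card/mat-card-content/div/div[2]" +
        "/app-calendario-tabela/table/tbody/tr[1]/td[1]";

    public static string ReadFirstCell()
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless"); // run Chrome without a visible window

        using (var driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl(Url);

            // Assumption: 20 seconds is enough for the scripts to build the table.
            // Until() keeps retrying while the element is not found yet.
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(20));
            IWebElement cell = wait.Until(d => d.FindElement(By.XPath(CellXPath)));
            return cell.Text;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadFirstCell());
    }
}
```

To read the other values, the same call can be repeated with a different `td[...]` index, or `FindElements` can be used on the table rows once the first cell has appeared.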

Cheers!

0


In the site's HTML, note the <app-root></app-root> element: that is exactly where the page's main content goes. The page appears to be generated by JavaScript/WASM before it is displayed, so the HTML that the server sends has nothing inside that element. In this case the right approach is to use the API. If that is not possible, you will have to run an offscreen browser instance to open the site and render the HTML, and for that I would recommend CefSharp.
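A small sketch of this diagnosis, assuming the HtmlAgilityPack NuGet package. The HTML string is a made-up stand-in for what the server sends before any script runs (it is not copied from the real site), and the class name is only illustrative. It shows that the parser finds app-root without trouble and that the element simply has nothing in it.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper: illustrates the empty <app-root> seen by a static parser.
public static class StaticHtmlCheck
{
    // Assumed stand-in for the server's response, not the real page source.
    public const string SampleHtml =
        "<html><body><app-root></app-root>" +
        "<script src=\"main.js\"></script></body></html>";

    // Returns the markup inside <app-root>, or null if the element is missing.
    public static string AppRootInnerHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses the text only; no script is executed
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//app-root");
        return node == null ? null : node.InnerHtml;
    }

    public static void Main()
    {
        string inner = AppRootInnerHtml(SampleHtml);
        Console.WriteLine(string.IsNullOrEmpty(inner)
            ? "app-root is empty: the content is added later by scripts."
            : inner);
    }
}
```

The same check can be pointed at the string returned by the code in the question. If app-root comes back empty there too, the data has to come from the API or from a browser that runs the scripts.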
