C# - How to do simple web scraping

I want to read information from the HTML page of an online radio station. I tried HtmlAgilityPack without success, because the page I am working with does not use element IDs. I imagine that is not a real obstacle, but I do not know how to use the API, and the examples I found all relied on the GetElementbyId() method.

I need to get two pieces of information from this page (Playing Now and Playing Next) and assign each to its own variable. I would prefer native C# functions, but I have no problem using a library such as HtmlAgilityPack (especially if it simplifies the procedure).

Here is a screenshot of the page I want to scrape. (Image: screenshot of the page to extract the information from)

The code I currently have is like this:

using System;
using System.Net;

namespace Web_Scraping{
class SimplesWebScraping{
    static void Main()
    {
        // Download the page as a string
        var webClient = new WebClient();
        string pagina = webClient.DownloadString("http://hts01.painelstream.net:9074/index.html?sid=1");

        // Declare string variables to store the data extracted from the website.
        string playingNow = string.Empty;
        string playingNext = string.Empty;

        // Perform the web scraping
        // How do I do this?

        // Write the extracted data
        Console.WriteLine("Playing Now: " + playingNow);
        Console.WriteLine("Playing Next: " + playingNext);
    }
}
}

What is the best way to do this? Can anyone show me example code? Thanks in advance for the help!

  • The question was better answered in another topic: https://answall.com/questions/302606/comort-coletar-dados-de-uma-pagina-web

1 answer

Regular Expressions

This is still the best way to handle the kind of string processing you're looking for.

With Regex (regular expressions), you create a pattern and match it against a text. Every match that exists in the text is returned.

Example:

Pattern ([A-Z])\w+: a sequence that starts with one uppercase character from A to Z, followed by word characters (\w matches letters, digits and underscore), with at least one occurrence (+).

If you run this Pattern against the text below:

Welcome to Stack Overflow!

The return will be: [ "Welcome", "Stack", "Overflow" ], because those are the sequences that start with an uppercase letter and are followed by one or more word characters. The word "to" is skipped because it starts with a lowercase letter.
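To see how a pattern like this is applied in C#, here is a minimal sketch using Regex.Matches from System.Text.RegularExpressions. The class name RegexDemo, the method FindCapitalizedWords and the sample sentence are made up for illustration only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexDemo
{
    // Hypothetical helper: returns every match of ([A-Z])\w+ found in the text.
    public static string[] FindCapitalizedWords(string text)
    {
        return Regex.Matches(text, @"([A-Z])\w+")
                    .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
                    .Select(m => m.Value)   // keep only the matched text
                    .ToArray();
    }

    static void Main()
    {
        string[] words = FindCapitalizedWords("Welcome to Stack Overflow!");
        Console.WriteLine(string.Join(", ", words)); // prints: Welcome, Stack, Overflow
    }
}
```

Regex.Matches returns all non-overlapping matches in order, which is the "every match is returned" behavior described here.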

This is the regex tester I recommend: https://regexr.com/ It is very good for testing and learning regular expressions.
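Applied to the case in the question, the idea could look roughly like the sketch below. It is only an illustration: the actual markup of the radio page is not shown in the question, so the code assumes (hypothetically) that each value sits in the table cell right after a cell containing its label. The sample HTML, the class LabelScraper and the method ExtractAfterLabel are all invented for this example, and the pattern would need to be adapted to the real page source.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class LabelScraper
{
    // Hypothetical helper. Assumes markup shaped like <td>Label:</td><td>value</td>.
    // Returns the value cell's text, or an empty string when the label is not found.
    public static string ExtractAfterLabel(string html, string label)
    {
        string pattern = Regex.Escape(label) + @"\s*:?\s*</td>\s*<td[^>]*>(.*?)</td>";
        Match m = Regex.Match(html, pattern,
                              RegexOptions.IgnoreCase | RegexOptions.Singleline);
        if (!m.Success) return string.Empty;

        // Remove nested tags such as <b>, then decode entities such as &amp;.
        string text = Regex.Replace(m.Groups[1].Value, "<[^>]+>", "");
        return WebUtility.HtmlDecode(text).Trim();
    }

    static void Main()
    {
        // Made-up sample markup; the real page may be structured differently.
        string html =
            "<table><tr><td>Playing Now:</td><td><b>Artist A - Song 1</b></td></tr>" +
            "<tr><td>Playing Next:</td><td><b>Artist B &amp; C - Song 2</b></td></tr></table>";

        string playingNow = ExtractAfterLabel(html, "Playing Now");
        string playingNext = ExtractAfterLabel(html, "Playing Next");

        Console.WriteLine("Playing Now: " + playingNow);
        Console.WriteLine("Playing Next: " + playingNext);
    }
}
```

In a real program, the html string would come from the WebClient.DownloadString call in the question instead of the hard-coded sample.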

  • Thanks for the tip @Thiago Lunardi.

  • You're welcome. Don't forget to mark the answer as accepted. :)
