Turn each node from a site into an array element

I want to turn every headline on this site into an element of an array. I've tried several approaches but none of them worked, so I'd be grateful for any help.

I am using HtmlAgilityPack.

using System;
using System.Net;
using System.IO;
using HtmlAgilityPack;

namespace Teste1
{
    class Program
    {
        static void Main(string[] args)
        {
            // fetch the site's HTML
            WebRequest request = WebRequest.Create("https://omunicipio.com.br/noticias/");
            WebResponse response = request.GetResponse();
            StreamReader reader = new StreamReader(response.GetResponseStream());
            string Texto = reader.ReadToEnd();

            // 11 because there are 10 articles per page on the site, and i has to start at 1
            for (int i = 1; i < 11; i++)
            {
                // XPath of the title
                string nodes = "/html/body/div[6]/div[3]/div/div[2]/div[1]/div/div[" + i + "]/div[2]/h3/a";

                // select the node
                var doc = new HtmlDocument();
                doc.LoadHtml(Texto);
                var links = doc.DocumentNode.SelectNodes(nodes);

                // use HtmlNode to read the value of the title attribute, which is the article's headline
                foreach (HtmlNode node in links)
                {
                    var manchete = node.Attributes["title"].Value;

                    Console.WriteLine(manchete);
                }
                // this is where I got stuck
            }
        }
    }
}

From this point I can't access the variable manchete, and I don't know how to get its values into a string[] array. I can't manage to build the array inside the foreach, and even when I do build it there, outside the foreach the compiler says it "doesn't exist in the current context".

It could also be a List<> instead of an array, because the next step is to move to the next page and keep adding headlines, but the problem is the same.

  • Did you debug it to see what isn't working? Is something wrong with the variable nodes? Incidentally, that XPath is fragile: if anything along that path changes in the HTML, your code will stop working.

  • Actually, the nodes variable is right; that's exactly how I want to keep it for now. It even works: when you run it, it prints the headlines. What I can't do is take the values that go into manchete and put them in an array/List outside the foreach, to use in another class, for example.

1 answer


So, what I managed to do, after two days of trial and error, was the following:

    // declared a string outside the for
    string nome = "";
    for(...)
    {
        foreach (HtmlNode node in links)
        {
            var manchete = node.Attributes["title"].Value;
            // inside the foreach, "nome" receives a line break plus the value of "manchete"
            nome += "\n" + manchete;
        }
    }

From here, all that was left was to split the string nome with .Split, which turns it into an array with one headline at each index.
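
For comparison, here is a minimal sketch of the List<string> variant mentioned in the question, which avoids the string concatenation and the Split step. It is only an illustration under a few assumptions: the class and method names (Manchetes, Extrair), the inline HTML, and the shorter XPath "//h3/a[@title]" are made up for the example and are not taken from the real site. The key point is that the list is declared outside the loop, so it is still in scope after the loop ends.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

namespace Teste1
{
    public static class Manchetes
    {
        // Hypothetical helper: returns the title attribute of every matching link.
        public static List<string> Extrair(string html)
        {
            // Declared before the loop, so it stays in scope after the loop.
            var manchetes = new List<string>();

            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Assumed XPath for the example; SelectNodes returns null when nothing matches.
            var links = doc.DocumentNode.SelectNodes("//h3/a[@title]");
            if (links == null)
                return manchetes;

            foreach (HtmlNode node in links)
            {
                manchetes.Add(node.GetAttributeValue("title", ""));
            }
            return manchetes;
        }
    }

    class Program
    {
        static void Main()
        {
            // Made-up HTML standing in for the downloaded page.
            string html =
                "<div><h3><a title=\"First headline\" href=\"#\">a</a></h3>" +
                "<h3><a title=\"Second headline\" href=\"#\">b</a></h3></div>";

            // One list for all pages: call AddRange once per page downloaded.
            var todas = new List<string>();
            todas.AddRange(Manchetes.Extrair(html));

            // Convert only if a string[] is really needed elsewhere.
            string[] comoArray = todas.ToArray();
            foreach (string manchete in comoArray)
                Console.WriteLine(manchete);
        }
    }
}
```

If you keep the string approach instead, note that nome starts with "\n", so a plain Split produces an empty first element; passing StringSplitOptions.RemoveEmptyEntries, as in nome.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries), drops it.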
