Mechanize with Nokogiri: trying to get information from divs


Hello!

I am building a crawler for product information. For this I am using Mechanize and, consequently, Nokogiri. I have a URL (http://www.megamamute.com.br/brother%205652) that returns only one product, but I can't get a regular expression to match the price of that item. Here is an example HTML fragment:

HTML

                <div class="pager top" id="PagerTop_66064345"></div><div id="ResultItems_66064345" class="prateleira vitrine"><div class="prateleira vitrine n1colunas"><ul><li layout="45e718bf-51b0-49c4-8882-725649af0594" class="informatica--teclado-notebook-tablet-pen-drive-|-megamamute last">

    <input type="hidden" class="x-id" value="55492" />

    <div class="x-product">

        <div class="x-selos">
            <p class="flag desconto-10--off-no-boleto">Desconto 10% off no boleto</p>

            <p class="flag Informática" style="display:none;">Informática</p>
        </div>

        <div class="x-get-skuId x-hide"><div class="buy-button-normal" id="55492" name="55492"><a class="buy-button-normal-a55492" href="https://www.megamamute.com.br/checkout/cart/add?sku=55492&qty=1&seller=1&sc=1&price=224900&cv=254ca7d1b9d7fb34e47ca55ceec1b2c0_geral:0F62E16B17B76A6FE17EC7C23A655D8B&sc=1" title="Comprar">Comprar</a><input type="hidden" value="cart" class="buy-button-normal-go-to-cart-55492" /></div></div>

        <div class="x-departamento">
            Multifuncional Laser Monocromática
        </div>

        <div class="x-image">
            <a class="x-productImage" title="Impressora Multifuncional Brother DCP-L5652DN Laser Mono" href="http://www.megamamute.com.br/impressora-multifuncional-brother-dcp-l5652dn-laser-mono/p">
                <img src="http://megamamute.vteximg.com.br/arquivos/ids/6658677-500-500/55492_original.jpg" width="500" height="500" alt="55492_original" id="" />
            </a>
        </div>

        <h2 class="product-name">
            <a title="Impressora Multifuncional Brother DCP-L5652DN Laser Mono" href="http://www.megamamute.com.br/impressora-multifuncional-brother-dcp-l5652dn-laser-mono/p">
                Impressora Multifuncional Brother DCP-L5652DN Laser Mono
            </a>
        </h2>

        <div data-trustvox-product-code="55492"></div>

                    <div class="x-price">
                <a title="Impressora Multifuncional Brother DCP-L5652DN Laser Mono" href="http://www.megamamute.com.br/impressora-multifuncional-brother-dcp-l5652dn-laser-mono/p">

                                            <span class="oldPrice">
                             R$ 2.899,00
                        </span> 

                        <span class="x-bestPrice">
                            R$ 2.249,00 
                        </span>

                    <span class="x-installment">
                                                     10X de <strong>R$ 224,90</strong> sem juros
                                            </em> 
                </a>

            </div>

            <!--<div class="x-opiniao">-->
            <!--    <span class="rating-produto avaliacao0">0</span> <span class="navaliacao">(0)</span>-->
            <!--</div>-->



            <div class="x-info-product">
                <ul>
                    <li class="x-info"><a href="http://www.megamamute.com.br/impressora-multifuncional-brother-dcp-l5652dn-laser-mono/p"></a></li>
                    <li class="x-favorite"><a href="#"></a></li>
                    <li class="x-move"><a href="#"></a></li>
                    <li class="x-add"><a href="#"></a></li>
                </ul>

            </div>

            <div class="x-hover">
                <div class="x-buy"> <a class="x-productImage" title="Impressora Multifuncional Brother DCP-L5652DN Laser Mono" href="http://www.megamamute.com.br/impressora-multifuncional-brother-dcp-l5652dn-laser-mono/p"> Comprar </a></div>
                <a class="x-hoverHref" title="Impressora Multifuncional Brother DCP-L5652DN Laser Mono" href="http://www.megamamute.com.br/impressora-multifuncional-brother-dcp-l5652dn-laser-mono/p"></a>
                <ul>
                    <li class="x-info"><a href="http://www.megamamute.com.br/impressora-multifuncional-brother-dcp-l5652dn-laser-mono/p"></a></li>
                    <li class="x-favorite"><a href="#"></a></li>
                    <li class="x-move"><a href="#"></a></li>
                    <li class="x-add"><a href="#"></a></li>
                </ul>

            </div>


        <div class="x-brand"><p class="texto brand brother">brother</p></div>

</div>

I would also like to know how to handle several products. In that case there would be several "x-product" divs, and I could not figure out how to build an array with all of them and search for the information inside each one.

Thank you!

1 answer



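If you are using Nokogiri to extract information from HTML tags, I see no reason to use regular expressions: CSS selectors (or XPath) let you address the elements directly, for example the span with class "x-bestPrice" that holds the price.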

Here is an example using HTTParty (just adapt it to your situation):

require 'httparty'
require 'nokogiri'

# Path of the listing page; HTTParty needs an absolute URL, so prefix it with the site's scheme and host
link = "/questions/tagged/python"
response = HTTParty.get(link)
content = Nokogiri::HTML(response.body)

# Capture every <a> tag whose class list includes "question-hyperlink"
result = content.css('a.question-hyperlink')

# Loop over the matches and print them one by one
result.each do |question|
  puts question.text
end

Output:

Import python module

ValueError in a Learn Python the Hard Way book exercise

Error passing parameter to user.set_password function

Use of the set and for function in the same structure

Find the years, months, days, hours, etc. that have passed since a certain date

Find duplicate element in time O(n) and space O(1) [pending]

How to convert a string to a timestamp object?

Python says my function name does not exist [pending]

Error while recovering JSON and using Python API

Python does not return files inside a directory

[...]

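This should also solve the problem of getting multiple "x-product" divs: css returns every matching element, so you can loop over them and run further searches inside each one.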

Let me know whether I understood your problem and whether the solution helped you.

I look forward to hearing from you.

  • I'll test it. Unfortunately my schedule is busy, so I'll try tonight. Thank you very much!

  • It worked! Thank you very much!

  • I’m glad I could help you!
