Capture using XPath

I'm scraping a site using Python (Scrapy) and XPath.

How can I capture only 232.990 from the code below?

<div class="price-advantages-container">
    <div class="price-comparison">
        <div itemprop="price" class="price">
               <div>
                    <span>R$</span> 232.990
               </div>
        </div>
    </div>
</div>

I tried response.xpath('//div[contains(@class, "price")]/div/text()'), but it returned whitespace characters like this:

[<Selector xpath='//div[contains(@class, "price")]/div/text()' data='\r\n\t\t\t\t\t\t'>,
 <Selector xpath='//div[contains(@class, "price")]/div/text()' data='\r\n\t\t\t\t\t\t\t'>,
 <Selector xpath='//div[contains(@class, "price")]/div/text()' data='\r\n\t\t\t\t\t\t\t\t'>,
 <Selector xpath='//div[contains(@class, "price")]/div/text()' data=' 232.990\r\n\t\t\t\t\t\t\t'>,
 <Selector xpath='//div[contains(@class, "price")]/div/text()' data='\r\n\t\t\t\t\t\t'>,
 <Selector xpath='//div[contains(@class, "price")]/div/text()' data='\r\n\t\t\t\t\t'>]
  • Whitespace characters will always come through. Just call .strip() on the string to remove them.
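
A minimal sketch of cleaning the results after extraction, assuming the text nodes have already been pulled out as plain strings (for example with .extract()). The list below copies the values from the output shown in the question; clean_text_nodes is a hypothetical helper name, not part of Scrapy.

```python
# Text nodes copied from the selector output in the question
raw_nodes = [
    '\r\n\t\t\t\t\t\t',
    '\r\n\t\t\t\t\t\t\t',
    '\r\n\t\t\t\t\t\t\t\t',
    ' 232.990\r\n\t\t\t\t\t\t\t',
    '\r\n\t\t\t\t\t\t',
    '\r\n\t\t\t\t\t',
]


def clean_text_nodes(nodes):
    """Strip each string and drop the ones that were whitespace only."""
    return [text.strip() for text in nodes if text.strip()]


prices = clean_text_nodes(raw_nodes)
print(prices)  # ['232.990']
```

This works, but it still relies on a broad XPath that matches every div with "price" in its class, so a narrower expression is usually the better fix.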

1 answer

You can filter on the element's itemprop attribute rather than matching every div that has price in its class name. I use extract_first() to return only the first match, and then strip() to remove the surrounding whitespace from the text.

from scrapy import Selector

source = '''<div class="price-advantages-container">
    <div class="price-comparison">
        <div itemprop="price" class="price">
               <div>
                    <span>R$</span> 232.990
               </div>
        </div>
    </div>
</div>'''

selector = Selector(text=source)

price = selector.xpath('//div[@itemprop="price"]/div/span/following-sibling::node()').extract_first().strip()

print("[*] Price: {}".format(price))

Output:
[*] Price: 232.990
