How to collect text when there is no HTML reference class - Crawler Python

Question

Asked 8 years, 8 months ago

Viewed 788 times

I have the following situation:

I want to collect the text "Texto para crawler" from the element below. How do I navigate to it when it has no class or id?

<td>Texto para crawler</td>

1 answer

by dot.Py • **985** points · Answer 1 · 2017-04-14T22:35:30+00:00

You can use the BeautifulSoup library's .find_all() method to extract every td tag from a page without specifying a class, name, or id. For example:

Code:

from bs4 import BeautifulSoup
import requests

url = 'https://en.wikipedia.org/wiki/Web_scraping'

# Download the page and keep its HTML source as a string
html_page = requests.get(url)
html_source = html_page.text

soup = BeautifulSoup(html_source, 'html.parser')

# Find every <td> tag; no class, name, or id is needed
td_tags = soup.find_all('td')

for td in td_tags:
    print(td, '\n')

Output:

<td class="mbox-image"><div style="width:52px"><img alt="Globe icon." data-file-height="290" data-file-width="350" height="40" src="//upload.wikimedia.org/wikipedia/commons/thumb/b/bd/Ambox_globe_content.svg/48px-Ambox_globe_content.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/b/bd/Ambox_globe_content.svg/73px-Ambox_globe_content.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/b/bd/Ambox_globe_content.svg/97px-Ambox_globe_content.svg.png 2x" width="48"/></div></td> 

<td class="mbox-text"><span class="mbox-text-span">The examples and perspective in this article <b>deal primarily with the United States and do not represent a <a href="/wiki/Wikipedia:WikiProject_Countering_systemic_bias" title="Wikipedia:WikiProject Countering systemic bias">worldwide view</a> of the subject</b>. <span class="hide-when-compact">You may <a class="external text" href="//en.wikipedia.org/w/index.php?title=Web_scraping&amp;action=edit">improve this article</a>, discuss the issue on the <a href="/wiki/Talk:Web_scraping" title="Talk:Web scraping">talk page</a>, or <a href="/wiki/Wikipedia:Article_wizard" title="Wikipedia:Article wizard">create a new article</a>, as appropriate.</span> <small><i>(October 2015)</i></small> <small class="hide-when-compact"><i>(<a href="/wiki/Help:Maintenance_template_removal" title="Help:Maintenance template removal">Learn how and when to remove this template message</a>)</i></small></span></td>
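
The loop above prints each whole td tag, markup included. If you only want the text inside the tags, as the question asks, one option is to call .get_text() on each tag. Here is a minimal sketch of that. The html_source string is a made-up stand-in for a real page (only the first td comes from the question), so it runs without a network request:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
# Only the first <td> is taken from the question; the rest is illustrative.
html_source = """
<table>
  <tr><td>Texto para crawler</td></tr>
  <tr><td> Another cell </td></tr>
</table>
"""

soup = BeautifulSoup(html_source, 'html.parser')

# get_text(strip=True) returns the text inside a tag, without the markup
# and with surrounding whitespace removed.
td_texts = [td.get_text(strip=True) for td in soup.find_all('td')]

print(td_texts)
```

On a real page, find_all('td') will likely return many cells, so you would still need to pick the one you want, for example by its position in the list.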