How to scrape a page that uses JavaScript with Python?


Viewed 287 times


I need to scrape a page, but the page opens with a button (apparently JavaScript) that gives access to all of its content. Using the traditional libraries (urllib2, requests, BeautifulSoup) I can't "pull" the content I need. Has anyone run into something similar?

  • Post the code you tried without success and, preferably, the JavaScript code that prevents direct access to the page.

  • Does it need to be Python only?

  • To pull data from JavaScript-driven pages you need a library that actually runs the JavaScript, such as Selenium or dryscrape. I've used Selenium and recommend it; I haven't used dryscrape, but I've read good things about it. One more point: when asking about web scraping, post the link you are accessing... it helps a lot!
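Why a plain HTTP fetch is not enough can be seen without any network access. A minimal illustration, using the standard library's html.parser as a stand-in for BeautifulSoup; the page (RAW_HTML), its loadContent() script and the TextCollector class are all hypothetical. The text that JavaScript would insert after the click is simply not present in the downloaded HTML, so no HTML parser can find it.

```python
from html.parser import HTMLParser

# Hypothetical page: the real content is only created by JavaScript
# after the button is clicked, so it is absent from the raw HTML.
RAW_HTML = """
<html><body>
  <button id="enter" onclick="loadContent()">Enter</button>
  <div id="content"></div>
  <script>
    function loadContent() {
      document.getElementById("content").textContent = "Secret data";
    }
  </script>
</body></html>
"""


class TextCollector(HTMLParser):
    """Collects visible text nodes, skipping the contents of <script> tags."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Ignore script source and whitespace-only text between tags
        if not self.in_script and data.strip():
            self.texts.append(data.strip())


parser = TextCollector()
parser.feed(RAW_HTML)
parser.close()
print(parser.texts)  # ['Enter']: "Secret data" never appears
```

This is what requests plus BeautifulSoup "see": the button label, but not the content the button would reveal. Only something that executes the page's JavaScript, such as a browser driven by Selenium, gets the rendered result.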

1 answer


I usually use Selenium for web scraping on sites that rely heavily on JavaScript. I normally use Selenium with Java, but it works in Python too. Below is a simple but working example.

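from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Point Selenium at your local chromedriver binary (adjust the path).
# Recent Selenium 4 releases (4.6+) can also locate it with webdriver.Chrome().
driver = webdriver.Chrome(service=Service("/path/to/chromedriver"))
try:
    driver.get("http://www.python.org")
    assert "Python" in driver.title
    # Locate the search box by its name attribute
    elem = driver.find_element(By.NAME, "q")
    elem.clear()
    elem.send_keys("pycon")
    elem.send_keys(Keys.RETURN)
    assert "No results found." not in driver.page_source
finally:
    # quit() closes every window and stops the chromedriver process
    driver.quit()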

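Remember that to use the Chrome driver you need the ChromeDriver executable, which you can download from the ChromeDriver site; recent Selenium 4 releases can also download a matching driver automatically through Selenium Manager. The Selenium WebDriver documentation for Python is available in the official Selenium documentation.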
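For the situation in the question (an entry button that reveals the content), the example above can be extended: click the button, wait until an element that only exists after the click is present, then hand driver.page_source to BeautifulSoup. A minimal sketch, assuming you already have a Selenium driver; click_and_get_html and the locators are hypothetical names. To keep the sketch dependency-free it polls with a plain loop, whereas real code would more commonly use Selenium's WebDriverWait together with expected_conditions.presence_of_element_located.

```python
import time


def click_and_get_html(driver, button_locator, content_locator,
                       timeout=10.0, poll=0.5):
    """Click the entry button, wait for JS-rendered content, return the HTML.

    Locators are (by, value) tuples as accepted by driver.find_element,
    e.g. (By.ID, "enter"). content_locator should match an element that
    only exists AFTER the click, e.g. (By.CSS_SELECTOR, "#content p").
    """
    driver.find_element(*button_locator).click()
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns an empty list (instead of raising) when nothing matches
        if driver.find_elements(*content_locator):
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(f"content {content_locator!r} did not appear")
        time.sleep(poll)


# Usage with a real browser (not run here; URL and locators are hypothetical):
# from selenium import webdriver
# from selenium.webdriver.common.by import By
# from bs4 import BeautifulSoup
# driver = webdriver.Chrome()
# try:
#     driver.get("https://example.com/page-with-entry-button")
#     html = click_and_get_html(driver, (By.ID, "enter"),
#                               (By.CSS_SELECTOR, "#content p"))
#     soup = BeautifulSoup(html, "html.parser")
# finally:
#     driver.quit()
```

Waiting for a specific element, rather than sleeping a fixed number of seconds, returns as soon as the content is rendered and fails loudly with a TimeoutError if it never appears.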
