WebDriver error in Python 3.5: AttributeError: can't set attribute


I need to download the contents of a website, so I wrote a script in Python 3.5. When I run it against a single page it works fine, but when I put it in a loop or a function it raises an error.

The code as a function is as follows:

from bs4 import BeautifulSoup
from selenium import webdriver
import html2text


def getPEP(strg):
    driver = webdriver.Firefox()
    driver.page_source = driver.get(strg)
    html = driver.page_source
    driver.close()
    text=html2text.html2text(html)
    return(text);

def salva():
    peps = open('PEP.txt', 'r')
    lines = tuple(peps)
    peps.close()
    for i in range(1):
        strg=lines[i].replace('\n','') 
        print(strg + '\n')
        str(strg)
        getPEP(strg)
        start = '#  '
        end = ', \n\n[ ![Join us on'
        cleaned=(text.split(start))[1].split(end)[0]
        file = open(str(i)+'.txt', 'w')
        file.write(cleaned.replace(' ** ','').replace('**',''))
        file.close()
        print('arquivo ' + str(i) + 'gravado com sucesso' )
    return;

salva()

When I run the same statements one by one in the interactive interpreter, I get the following:

>>> strg='http://www.mtsamples.com/site/pages/sample.asp?type=3-Allergy%20/%20Immunology&sample=386-Allergic%20Rhinitis, Allergic Rhinitis'
>>> driver = webdriver.Firefox()
>>> driver.page_source = driver.get(strg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
>>> html = driver.page_source
>>> driver.close()
>>> text=html2text.html2text(html)

Even though it raises this error, I can still retrieve the text from the site. But when I wrap the same code in a function:

>>> def getPEP(strg):
...     driver = webdriver.Firefox()
...     driver.page_source = driver.get(strg)
...     html = driver.page_source
...     driver.close()
...     text=html2text.html2text(html)
...     return(text);
... 
>>> text=getPEP()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: getPEP() missing 1 required positional argument: 'strg'

Then the program stops with an error and does not return the text I need. Can anyone help me?


The getPEP() function worked, thank you! Now my code looks like this:

from bs4 import BeautifulSoup
from selenium import webdriver
import html2text



# driver.page_source = driver.get())#
def getPEP(strg):
    driver = webdriver.Firefox()
    driver.get(strg)
    html = driver.page_source
    driver.close()
    text=html2text.html2text(html)
    return(text);

def salva(arqv):
    peps = open(arqv, 'r')
    lines = tuple(peps)
    peps.close()
    for i in range(len(lines)):
        strg=lines[i].replace('\n','') 
        text=getPEP(strg)
        start = '#  '
        end = ', \n\n[ ![Join us on'
        cleaned=(text.split(start))[1].split(end)[0]
        file = open(str(i)+'.txt', 'w')
        file.write(cleaned.replace(' ** ','').replace('**',''))
        file.close()
        print('arquivo ' + str(i) + 'gravado com sucesso' )
    return;

getPEP('PEP.txt')

The getPEP(strg) function is working perfectly, thank you! But salva(arqv) is supposed to read the URLs I want to download, collect the text for each one through getPEP(strg), and write it to a file. When I run the script, I get the following error:

Traceback (most recent call last):
  File "crawlerPEP.py", line 35, in <module>
    getPEP('PEP.txt')
  File "crawlerPEP.py", line 10, in getPEP
    driver.get(strg)
  File "/home/angelica/Documents/PyEnv3/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 248, in get
    self.execute(Command.GET, {'url': url})
  File "/home/angelica/Documents/PyEnv3/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/home/angelica/Documents/PyEnv3/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Target URL PEP.txt is not well-formed.
Stacktrace:
    at FirefoxDriver.prototype.get (file:///tmp/tmpiag201lm/extensions/fxdriver@googlecode.com/components/driver-component.js:10636)
    at DelayedCommand.prototype.executeInternal_/h (file:///tmp/tmpiag201lm/extensions/fxdriver@googlecode.com/components/command-processor.js:12661)
    at DelayedCommand.prototype.executeInternal_ (file:///tmp/tmpiag201lm/extensions/fxdriver@googlecode.com/components/command-processor.js:12666)
    at DelayedCommand.prototype.execute/< (file:///tmp/tmpiag201lm/extensions/fxdriver@googlecode.com/components/command-processor.js:12608)

1 answer



>>> driver = webdriver.Firefox()
>>> driver.page_source = driver.get(strg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute

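The error means the attribute cannot be assigned. page_source is a read-only property that returns the content of the current page, so you cannot overwrite it. driver.get() only loads the page and returns None. Call get() first, then read page_source into a variable: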

>>> driver = webdriver.Firefox()
>>> driver.get(strg)
>>> conteudo = driver.page_source
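This behaviour is not specific to Selenium: any attribute declared as a property without a setter rejects assignment. The following is a minimal sketch that reproduces it with a hypothetical FakeDriver class (a stand-in written for this example, not Selenium's real class). It also suggests why the text could still be recovered after the error: the right-hand side, driver.get(...), runs and loads the page before the assignment fails.

```python
class FakeDriver:
    """Hypothetical stand-in for a WebDriver; not Selenium's real class."""

    def __init__(self):
        self._html = ""

    def get(self, url):
        # Like the real get(), this loads the page and returns None.
        self._html = "<html><body>" + url + "</body></html>"

    @property
    def page_source(self):
        # Read-only: no setter is defined, so assignment raises AttributeError.
        return self._html


driver = FakeDriver()
try:
    # get() runs first and loads the page; only the assignment fails.
    driver.page_source = driver.get("http://example.com")
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# The page was loaded anyway, so reading the property still works.
html = driver.page_source
```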

The second error:

>>> text=getPEP()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: getPEP() missing 1 required positional argument: 'strg'

The function getPEP expects one argument, which in this case is the URL whose content you want to fetch. Pass the URL to the function:

>>> strg = 'http://www.mtsamples.com/site/pages/......'
>>> text = getPEP(strg)

The function getPEP should look like this:

>>> def getPEP(strg):
...     driver = webdriver.Firefox()
...     driver.get(strg)
...     html = driver.page_source
...     driver.close()
...     text = html2text.html2text(html)
...     return (text)
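One thing to watch for in this version: if driver.get() raises (for example on a malformed URL), driver.close() is never reached and the browser stays open. A possible variant, offered as a sketch rather than the required way to write it, wraps the call in try/finally. The make_driver parameter is an assumption of this example, added so the function can be exercised without launching a browser.

```python
def get_page_html(url, make_driver):
    """Load url in a fresh browser and return the page HTML.

    make_driver is an assumption of this sketch: any zero-argument callable
    that returns a WebDriver-like object, such as selenium's webdriver.Firefox.
    """
    driver = make_driver()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Runs whether get() succeeded or raised, so the browser always exits.
        driver.quit()
```

With Selenium and html2text installed, it could be called as text = html2text.html2text(get_page_html(strg, webdriver.Firefox)).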

  • Thank you! Now it is raising another error, which I added to the question above. Could you help me again?

  • @user2535338 You are passing the file name to getPEP, which expects a URL. Didn't you mean to call the function salva instead, like this: salva('PEP.txt')?

  • The file 'PEP.txt' contains several URLs.

  • @user2535338 In the function salva, you call getPEP for each line (URL) in the file and save the result in a new file named after the line number.
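
The flow described in these comments can be sketched as follows. This is an illustration under stated assumptions, not the asker's final code: fetch_text is a hypothetical parameter standing in for getPEP, is_http_url is a helper introduced here to catch the "not well-formed" case before it reaches the browser, and the start/end text cleaning from the question is left out for brevity.

```python
from urllib.parse import urlparse


def is_http_url(value):
    """Return True if value looks like an absolute http(s) URL."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def salva(arqv, fetch_text):
    """Save the text of every URL listed in arqv to 0.txt, 1.txt, ...

    fetch_text is an assumption of this sketch: a callable such as getPEP
    that takes a URL and returns the page text.
    """
    with open(arqv, "r") as peps:
        # One URL per line; blank lines are dropped.
        urls = [line.strip() for line in peps if line.strip()]
    saved = []
    for i, url in enumerate(urls):
        if not is_http_url(url):
            print("skipping line that is not a URL: " + url)
            continue
        name = str(i) + ".txt"
        with open(name, "w") as out:
            out.write(fetch_text(url))
        saved.append(name)
    return saved
```

The script would then end with salva('PEP.txt', getPEP) rather than getPEP('PEP.txt').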
