PEP8 "invalid escape sequence": what’s wrong with the code?

I’m learning to use webscraping in Python (version 3.7).

I’m dealing with regular expressions and writing this code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html = urlopen('http://www.pythonscraping.com/pages/pages3.html')
bsObj = BeautifulSoup(html, 'html.parser')
images = bsObj.findAll('img', {'src': re.compile("\.\./img/gifts/img.*\.jpg")})
for image in images:
    print(image['src'])

On the line images = bsObj.findAll('img', {'src': re.compile("\.\./img/gifts/img.*\.jpg")}), PyCharm shows me the warning 'PEP8 invalid escape Sequence'.

Could you help me understand what the mistake is and how to fix it?

  • Try to format your code better so it is easier to read. I will post my solution as an answer below.

1 answer

Just escape the backslashes themselves, so that each \ in the pattern is written as \\ in the string literal, like this:

re.compile("\\.\\./img/gifts/img.*\\.jpg")

Or you can use a raw string literal, r"...", as explained in: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

re.compile(r"\.\./img/gifts/img.*\.jpg")
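As a quick check that the two spellings are interchangeable, here is a minimal sketch. The variable names and the sample path are only illustrative and are not taken from the page being scraped:

```python
import re

# Ordinary string literal: every backslash meant for the regex is doubled.
escaped = "\\.\\./img/gifts/img.*\\.jpg"

# Raw string literal: backslashes are kept exactly as written.
raw = r"\.\./img/gifts/img.*\.jpg"

# Both literals should produce the very same pattern text.
same_pattern = escaped == raw

# Illustrative path (an assumption), shaped like the src values in the question.
matches_sample = re.compile(raw).search("../img/gifts/img1.jpg") is not None

print(same_pattern, matches_sample)
```

Since both literals give the same string, the choice between them is only about readability. The raw string is usually easier to read for regular expressions.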

PS: in a regex, an unescaped dot (.) matches any character except a newline (so it is not the same as [\s\S]). That means simply dropping the backslashes, as in r"../img/gifts/img.*.jpg", would break the logic, since strings like these would be accepted:

ab/img/gifts/img1000ajpg
fo/img/gifts/img2000Zjpg

And that is probably not what you want.
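To make the difference concrete, here is a small sketch that compares the escaped pattern with the unescaped one on the two strings above plus one well-formed path. The names strict, loose and samples are illustrative, and the first sample path is an assumption:

```python
import re

# Dots escaped: each \. matches only a literal dot.
strict = re.compile(r"\.\./img/gifts/img.*\.jpg")

# Dots left unescaped: each . matches any character except a newline.
loose = re.compile(r"../img/gifts/img.*.jpg")

samples = [
    "../img/gifts/img1.jpg",     # well-formed path (illustrative)
    "ab/img/gifts/img1000ajpg",  # from the answer above
    "fo/img/gifts/img2000Zjpg",  # from the answer above
]

strict_hits = [s for s in samples if strict.search(s)]
loose_hits = [s for s in samples if loose.search(s)]

print(strict_hits)
print(loose_hits)
```

The escaped pattern keeps only the well-formed path, while the unescaped one lets all three strings through.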

You can see that it still works here: https://repl.it/@inphinit/regex-escape-python

  • Thank you, brother... The solution was very useful!

  • @Sthino_iv Cool. Please mark the answer as accepted. If you do not know how, read the instructions at: https://pt.meta.stackoverflow.com/a/1079/3635
