Good afternoon everyone. I am new to Python and still learning, and I would like to ask for your help. I have the following HTML source taken from a site:
<div class='numerando'>1 - 1</div><div class='episodiotitle'><a href='https://www.assistirseriesflix.com/episodios/assistir-mentes-criminosas-1x1-dublado-e-legendado-online-hd/'>Extreme Aggressor</a> <span class='date'>Sep. 22, 2005</span></div></li><li class='mark-2'><div class='imagen'><img src='https://image.tmdb.org/t/p/w154/d46a4eVjnECDSzKGJDNCSRQGrRo.jpg'></div><div class='numerando'>1 - 2</div><div class='episodiotitle'><a href='https://www.assistirseriesflix.com/episodios/assistir-mentes-criminosas-1x2-dublado-e-legendado-online-hd/'>Compulsion</a> <span class='date'>Sep. 28, 2005</span></div></li>
Based on that, I wrote this small script with the following regex:
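import re
import requests

site = "https://www.assistirseriesflix.com/series/mente-criminosa-dublado-hd/"
response = requests.get(site)
# decode() returns a new str; the result must be assigned, not discarded
data = response.content.decode('utf-8')
# Each match is a tuple: (episode number, link, title)
match = re.findall(r"<div class='numerando'>(.*?)</div><div class='episodiotitle'><a href='(.*?)'>(.*?)</a>", data)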
I have tried several ways to print the results in the following format:
Episode 1 - 1 Extreme Aggressor | Link : https://www.assistirseriesflix.com/episodios/assistir-mentes-criminosas-1x1-dublado-e-legendado-online-hd/
Episode 1 - 2 Compulsion | Link : https://www.assistirseriesflix.com/episodios/assistir-mentes-criminosas-1x2-dublado-e-legendado-online-hd/
where 1 - 1 comes from <div class='numerando'>1 - 1</div>
and the episode name and link come from <div class='episodiotitle'><a href='https://www.assistirseriesflix.com/episodios/assistir-mentes-criminosas-1x1-dublado-e-legendado-online-hd/'>Extreme Aggressor</a>
Calling print(match) correctly shows all the data I want, but I can't format it the way I described above.
I have tried code from several tutorials on the internet and from other questions here on Stack Overflow, but none of it worked. I also tried to understand Beautiful Soup, without much success. I honestly don't know what the best way to do this in Python is. Thank you in advance for your help!
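For reference, re.findall with three capture groups returns a list of tuples, so each tuple can be unpacked in a loop and formatted. The sketch below assumes the HTML has already been decoded to a str (with a bytes pattern, every group would be bytes instead). The sample HTML is copied from the question; the names html, pattern, matches and lines are illustrative only.

```python
import re

# Sample HTML copied from the question, used here instead of a live request.
html = (
    "<div class='numerando'>1 - 1</div><div class='episodiotitle'>"
    "<a href='https://www.assistirseriesflix.com/episodios/assistir-mentes-criminosas-1x1-dublado-e-legendado-online-hd/'>"
    "Extreme Aggressor</a> <span class='date'>Sep. 22, 2005</span></div></li>"
    "<li class='mark-2'><div class='imagen'>"
    "<img src='https://image.tmdb.org/t/p/w154/d46a4eVjnECDSzKGJDNCSRQGrRo.jpg'></div>"
    "<div class='numerando'>1 - 2</div><div class='episodiotitle'>"
    "<a href='https://www.assistirseriesflix.com/episodios/assistir-mentes-criminosas-1x2-dublado-e-legendado-online-hd/'>"
    "Compulsion</a> <span class='date'>Sep. 28, 2005</span></div></li>"
)

# Same pattern as in the question, written as a str pattern.
pattern = (
    r"<div class='numerando'>(.*?)</div>"
    r"<div class='episodiotitle'><a href='(.*?)'>(.*?)</a>"
)
matches = re.findall(pattern, html)

# Each match is a tuple in group order: (number, link, title).
lines = [
    f"Episode {number} {title} | Link : {link}"
    for number, link, title in matches
]

for line in lines:
    print(line)
```

The order of the names in the for clause follows the order of the groups in the pattern, which is why link comes before title when unpacking.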
Don't use regex to parse HTML: https://answall.com/a/440262/112052 <-- this link shows how a regex keeps getting more complicated, whereas the right tool, such as Beautiful Soup, handles the job much better
– hkotsubo
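As a rough illustration of the Beautiful Soup approach suggested in the comment, here is a minimal sketch. It assumes the beautifulsoup4 package is installed and that every numerando div is followed by a sibling episodiotitle div, as in the sample from the question. The real page may be structured differently, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Sample HTML copied from the question, used here instead of a live request.
html = (
    "<div class='numerando'>1 - 1</div><div class='episodiotitle'>"
    "<a href='https://www.assistirseriesflix.com/episodios/assistir-mentes-criminosas-1x1-dublado-e-legendado-online-hd/'>"
    "Extreme Aggressor</a> <span class='date'>Sep. 22, 2005</span></div></li>"
    "<li class='mark-2'><div class='imagen'>"
    "<img src='https://image.tmdb.org/t/p/w154/d46a4eVjnECDSzKGJDNCSRQGrRo.jpg'></div>"
    "<div class='numerando'>1 - 2</div><div class='episodiotitle'>"
    "<a href='https://www.assistirseriesflix.com/episodios/assistir-mentes-criminosas-1x2-dublado-e-legendado-online-hd/'>"
    "Compulsion</a> <span class='date'>Sep. 28, 2005</span></div></li>"
)

# "html.parser" is the parser bundled with Python, so no extra parser is needed.
soup = BeautifulSoup(html, "html.parser")

episodes = []
for number_div in soup.find_all("div", class_="numerando"):
    # The title div is assumed to be the next sibling with class 'episodiotitle'.
    title_div = number_div.find_next_sibling("div", class_="episodiotitle")
    if title_div is None or title_div.a is None:
        continue  # skip entries that do not match the expected layout
    number = number_div.get_text(strip=True)
    title = title_div.a.get_text(strip=True)
    link = title_div.a["href"]
    episodes.append((number, title, link))

for number, title, link in episodes:
    print(f"Episode {number} {title} | Link : {link}")
```

Unlike the regex, this does not depend on the quoting style of the attributes or on the two div elements being directly adjacent in the raw text.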