Python question: getting an image URL from the page's <img> tags

Hello, community. I'm a beginner in Python and wanted to build a tool, but I got stuck on one part.

I wrote this small piece of code that extracts the <img> tags from the page.

import requests
from bs4 import BeautifulSoup

t = input('Enter the movie title: ')
ano = int(input('Enter the movie year: '))

# Entering 1 as the year searches by title only
if ano == 1:
    url = 'https://www.themoviedb.org/search?query=' + t + '&language=pt-BR'
else:
    url = 'https://www.themoviedb.org/search?query=' + t + '%20y%3A' + str(ano) + '&language=pt-BR'

req = requests.get(url)
bs = BeautifulSoup(req.text, 'lxml')
print(bs.find_all('img'))

Then I wrote this other part, which takes an image link, saves the image, and displays it.

import io
import os
import requests
import tempfile
from PIL import Image
from matplotlib import pyplot as plt

img_url = 'https://image.tmdb.org/t/p/w500_and_h282_face/dKxkwAJfGuznW8Hu0mhaDJtna0n.jpg'

buffer = tempfile.SpooledTemporaryFile(max_size=10**9)
r = requests.get(img_url, stream=True)
if r.status_code == 200:
    downloaded = 0
    filesize = int(r.headers['content-length'])
    # Read in 8 KiB chunks; the default chunk size is 1 byte
    for chunk in r.iter_content(chunk_size=8192):
        downloaded += len(chunk)
        buffer.write(chunk)
        print(downloaded / filesize)  # download progress
    buffer.seek(0)
    i = Image.open(io.BytesIO(buffer.read()))
    i.save(os.path.join('.', 'image.jpg'), quality=85)
    # Display only after a successful download, so `i` is always defined
    plt.imshow(i)
    plt.show()
buffer.close()

So I would like to know how to make the variable img_url automatically receive the URL from the output of print(bs.find_all('img')), or whether there is a library for that.

1 answer

The function find_all() returns a list of elements, so you can read the attributes of each element as follows:

# Returns the value of the 'alt' attribute of the first element in the list
print(bs.find_all('img')[0]['alt'])
# Result: The Movie Database (TMDb)
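
As a small illustration of the attribute access described above, here is a sketch that runs on a one-tag HTML string instead of the real page. The markup and the variable name `first` are made up for the example; only `find_all()`, dictionary-style indexing, `.attrs` and `.get()` come from BeautifulSoup itself.

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document used only for illustration
bs = BeautifulSoup('<img alt="Example" src="/a.png">', 'html.parser')
first = bs.find_all('img')[0]

print(first['alt'])           # dictionary-style access to one attribute
print(first.attrs)            # all attributes of the tag as a dict
print(first.get('data-src'))  # .get() gives None when the attribute is missing
```

Using `.get()` is one way to read an attribute that may be absent without raising an exception.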

If you want to get all the links stored in data-src, you can do it as follows:

import requests
from bs4 import BeautifulSoup

req = requests.get('https://www.themoviedb.org/search?query=the%20flash&language=en-US')
bs = BeautifulSoup(req.text, 'html.parser')
res = bs.find_all('img')

for link in res:
    try:
        print(link['data-src'])
    except KeyError:
        # Raised when the <img> tag has no data-src attribute
        print('Element has no data-src attribute inside <img>')

Note that some <img> elements do not have the data-src attribute, and accessing a missing attribute raises a KeyError, which is why the try/except is necessary.
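
To connect this back to the img_url variable from the question, one possible approach is to collect the data-src values into a list and take the first one. The sketch below uses a hypothetical HTML string in place of the downloaded page, and the example URLs are invented; it assumes the links you want are in data-src, as in the loop above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded search page
html = """
<img alt="logo" src="/logo.svg">
<img alt="poster 1" data-src="https://image.tmdb.org/t/p/example1.jpg">
<img alt="poster 2" data-src="https://image.tmdb.org/t/p/example2.jpg">
"""

bs = BeautifulSoup(html, 'html.parser')

# attrs={'data-src': True} keeps only <img> tags that have the attribute
links = [img['data-src'] for img in bs.find_all('img', attrs={'data-src': True})]

# Use the first match as img_url, or None when nothing was found
img_url = links[0] if links else None
print(img_url)
```

Filtering inside find_all() avoids the try/except, and the resulting img_url can then be passed to requests.get() in the download code from the question.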
