Download images in Python from a .txt list of links

Question

Asked 6 years ago

Viewed 509 times

First I imported the packages and created a class and its settings:

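import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

class Scraper:
    def __init__(self):
        self.visited = set()  # URLs that have already been visited
        self.session = requests.Session()
        self.session.headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36"}
        # Requests are made with verify=False, so silence the resulting TLS warnings
        requests.packages.urllib3.disable_warnings()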

The method that visits a page:

    def visit_url(self, url, level):
        print(url)
        if url in self.visited:
            return
        self.visited.add(url)
        content = self.session.get(url, verify=False).content
        soup = BeautifulSoup(content, "lxml")
        # Download every image on the page, skipping inline data and javascript sources
        for img in soup.select("img[src]"):
            image_url = img["src"]
            if not image_url.startswith(("data:image", "javascript")):
                self.download_image(urljoin(url, image_url))
        # Follow links only while level > 0; a negative level disables this step
        if level > 0:
            # select() takes CSS selectors, not XPath, so anchors are matched by attribute
            for link in soup.select("a[href]"):
                self.visit_url(urljoin(url, link["href"]), level - 1)
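The filtering and URL-joining step in visit_url can be tried on its own, without any network access. This is only a sketch using the standard library; resolve_image_urls and the example.com addresses are made up for illustration and are not part of the class above.

```python
from urllib.parse import urljoin

# Prefixes that mark a src value as something other than a downloadable file.
SKIPPED_PREFIXES = ("data:image", "javascript")


def resolve_image_urls(page_url, srcs):
    """Return absolute URLs for the src values worth downloading.

    Hypothetical helper: it mirrors the check inside visit_url.
    """
    resolved = []
    for src in srcs:
        # Inline data URIs and javascript: pseudo-links cannot be fetched as files.
        if src.startswith(SKIPPED_PREFIXES):
            continue
        # urljoin resolves relative paths against the URL of the page.
        resolved.append(urljoin(page_url, src))
    return resolved


print(resolve_image_urls(
    "https://example.com/album/page.html",
    ["photo1.jpg", "/static/photo2.png", "data:image/png;base64,AAAA"],
))
# prints ['https://example.com/album/photo1.jpg', 'https://example.com/static/photo2.png']
```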

The download:

    def download_image(self, image_url):
        # File name: last path segment of the URL, without the query string
        local_filename = image_url.split('/')[-1].split("?")[0]
        r = self.session.get(image_url, stream=True, verify=False)
        # Stream the response to disk in 1 KiB chunks
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                f.write(chunk)
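One thing to watch in the file-name line: splitting on '/' before removing the query string picks the wrong segment when the query itself contains a slash, and it yields an empty name when the path ends in '/'. A possible alternative, offered only as a sketch, parses the URL first. filename_from_url and the default name image.bin are assumptions, not part of the original code.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(image_url, default="image.bin"):
    """Derive a local file name from the path part of a URL.

    Hypothetical helper: the query string and fragment are ignored, and
    `default` is returned when the path has no final segment.
    """
    path = urlsplit(image_url).path
    # Decode percent-escapes, then keep only the last path segment.
    name = posixpath.basename(unquote(path))
    return name or default


print(filename_from_url("https://example.com/a/b/photo.jpg?size=1/2"))  # prints photo.jpg
print(filename_from_url("https://example.com/gallery/"))  # prints image.bin
```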

The link:

if __name__ == '__main__':
    scraper = Scraper()
    scraper.visit_url('https://mbasic.facebook.com/story.php?story_fbid=2498834290232454&id=198123623636877&refid=17&_ft_=mf_story_key.2498834290232454%3Atop_level_post_id.2498834290232454%3Atl_objid.2498834290232454%3Acontent_owner_id_new.198123623636877%3Athrowback_story_fbid.2498834290232454%3Apage_id.198123623636877%3Aphoto_attachments_list.%5B2498828320233051%2C2498828993566317%2C2498829400232943%5D%3Astory_location.4%3Astory_attachment_style.album%3Apage_insights.%7B%22198123623636877%22%3A%7B%22page_id%22%3A198123623636877%2C%22actor_id%22%3A198123623636877%2C%22dm%22%3A%7B%22isShare%22%3A0%2C%22originalPostOwnerID%22%3A0%7D%2C%22psn%22%3A%22EntStatusCreationStory%22%2C%22post_context%22%3A%7B%22object_fbtype%22%3A266%2C%22publish_time%22%3A1574031070%2C%22story_name%22%3A%22EntStatusCreationStory%22%2C%22story_fbid%22%3A%5B2498834290232454%5D%7D%2C%22role%22%3A1%2C%22sl%22%3A4%2C%22targets%22%3A%5B%7B%22actor_id%22%3A198123623636877%2C%22page_id%22%3A198123623636877%2C%22post_id%22%3A2498834290232454%2C%22role%22%3A1%2C%...', -1)

However, instead of a single hard-coded link, I would like to pass a .txt file containing multiple links and process them in a loop. The code above downloads the images from the link it is given.

It looks like you have already done the hard part. If you want to read a list of URLs from a file, wouldn't it be enough to write a for linha in arquivo loop and pass linha as the argument to download_image? What is your difficulty?

– fernandosavio

2019/11/25 at 19:45
I wrote another piece of code, but when I put list.txt in place of the URL in scraper.visit_url(site here), the images are downloaded but come out corrupted. It downloads directly from each link, whereas each link needs to be visited first to find the image path, and only then can the image be downloaded.

– Hudson Souza

2019/11/25 at 20:06
@Hudsonsouza do not put the answer in the question. You can answer your own question, just like any other user.

– Augusto Vasques

2019/12/18 at 15:50

1 answer

by Hudson Souza • 53 points · Answer 1 · 2019-12-19T18:22:33+00:00

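if __name__ == '__main__':
    # url.txt holds one page URL per line
    with open(r'url.txt') as arq:
        for i in arq:
            url = i.strip()  # drop the trailing newline
            if not url:
                continue  # skip blank lines
            scraper = Scraper()
            scraper.visit_url(url, -1)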

This was the solution. Thanks for the help.
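For anyone adapting this, the file-reading part can be split into a small helper and checked without touching the network. The sketch below is one possible way to do it; read_urls is a hypothetical name, and the temporary file only stands in for url.txt.

```python
import os
import tempfile


def read_urls(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Demonstration with a throwaway file standing in for url.txt.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "url.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("https://example.com/a\n\n  https://example.com/b  \n")
    urls = read_urls(sample)

print(urls)  # prints ['https://example.com/a', 'https://example.com/b']
```

With the Scraper class from the question, the loop would then call scraper.visit_url(url, -1) for each url in read_urls('url.txt'). Creating the Scraper once, before the loop, would let every page share the same session and visited set, although a new instance per URL also works.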