Download images from a list of links in a .txt file using Python

First I imported the packages and created a class and its settings:

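import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

class Scraper:
    def __init__(self):
        self.visited = set()  # URLs already processed
        self.session = requests.Session()
        self.session.headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36"}
        # Silence the warnings caused by verify=False in the requests below.
        requests.packages.urllib3.disable_warnings()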

The method that visits each page:

    def visit_url(self, url, level):
        print(url)
        # Skip pages that were already processed.
        if url in self.visited:
            return
        self.visited.add(url)
        content = self.session.get(url, verify=False).content
        soup = BeautifulSoup(content, "lxml")
        # Download every image on the page, ignoring inline data: and javascript: sources.
        for img in soup.select("img[src]"):
            image_url = img["src"]
            if not image_url.startswith(("data:image", "javascript")):
                self.download_image(urljoin(url, image_url))
        # Follow links only while level is positive; with level=-1 this never runs.
        if level > 0:
            # select() takes CSS selectors, not XPath, so plain anchors are followed here.
            for link in soup.select("a[href]"):
                self.visit_url(urljoin(url, link["href"]), level - 1)
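The image-collection step of visit_url can be tried without any network access. The sketch below is only an illustration: it swaps BeautifulSoup for the standard library's html.parser so that it runs on its own, and the names ImgSrcCollector, collect_image_urls and the sample HTML are hypothetical, not part of the original code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def collect_image_urls(html, base_url):
    """Return absolute image URLs, skipping inline data: and javascript: sources."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return [
        urljoin(base_url, src)
        for src in parser.sources
        if not src.startswith(("data:image", "javascript"))
    ]


# Made-up HTML standing in for a downloaded page.
sample_html = (
    '<img src="/static/a.jpg">'
    '<img src="data:image/png;base64,AAAA">'
    '<img src="https://cdn.example.com/b.png?x=1">'
)
image_urls = collect_image_urls(sample_html, "https://example.com/post/1")
print(image_urls)
```

urljoin resolves the relative src against the page URL and leaves an already absolute src unchanged, which is why visit_url can pass both kinds straight to download_image.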

The download:

    def download_image(self, image_url):
        # Use the last path segment, without the query string, as the file name.
        local_filename = image_url.split('/')[-1].split("?")[0]
        r = self.session.get(image_url, stream=True, verify=False)
        # Write the response to disk in 1 KB chunks.
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                f.write(chunk)

The link:

if __name__ == '__main__':
    scraper = Scraper()
    scraper.visit_url('https://mbasic.facebook.com/story.php?story_fbid=2498834290232454&id=198123623636877&refid=17&_ft_=mf_story_key.2498834290232454%3Atop_level_post_id.2498834290232454%3Atl_objid.2498834290232454%3Acontent_owner_id_new.198123623636877%3Athrowback_story_fbid.2498834290232454%3Apage_id.198123623636877%3Aphoto_attachments_list.%5B2498828320233051%2C2498828993566317%2C2498829400232943%5D%3Astory_location.4%3Astory_attachment_style.album%3Apage_insights.%7B%22198123623636877%22%3A%7B%22page_id%22%3A198123623636877%2C%22actor_id%22%3A198123623636877%2C%22dm%22%3A%7B%22isShare%22%3A0%2C%22originalPostOwnerID%22%3A0%7D%2C%22psn%22%3A%22EntStatusCreationStory%22%2C%22post_context%22%3A%7B%22object_fbtype%22%3A266%2C%22publish_time%22%3A1574031070%2C%22story_name%22%3A%22EntStatusCreationStory%22%2C%22story_fbid%22%3A%5B2498834290232454%5D%7D%2C%22role%22%3A1%2C%22sl%22%3A4%2C%22targets%22%3A%5B%7B%22actor_id%22%3A198123623636877%2C%22page_id%22%3A198123623636877%2C%22post_id%22%3A2498834290232454%2C%22role%22%3A1%2C%...', -1)

However, I would like to pass a .txt file containing multiple links instead of a single link, using a loop. How can I do that? These are the images downloaded from the link in the code:

[Image: the images from a single link only]

[Image: the code being run]

  • Apparently you have already done the hard part. If you want to use a list of URLs from a file, wouldn't it be enough to write a for linha in arquivo loop and pass linha as the argument to download_image? What is the difficulty?

  • I tried that in another piece of code, but when I used list.txt in place of scraper.visit_url(site here), the images were downloaded corrupted. That is because it downloads directly from each link, whereas each link needs to be visited first to find the image path, and only then can the image be downloaded.

  • @Hudsonsouza Do not put the answer in the question. You can answer your own question, just like any other user.
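The corruption described in the comments is what one would expect when the bytes saved to disk are the HTML of a page rather than image data. One rough way to confirm that is to look at the first bytes of the saved content. The sketch below checks the well-known JPEG, PNG and GIF signatures; the names IMAGE_SIGNATURES and sniff_image_type and the sample payloads are hypothetical, and the check covers only those three formats.

```python
# Leading bytes ("magic numbers") of three common image formats.
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(first_bytes):
    """Return the image type if the bytes start with a known signature, else None."""
    for signature, kind in IMAGE_SIGNATURES.items():
        if first_bytes.startswith(signature):
            return kind
    return None


# Made-up payloads: an HTML page and the start of a JPEG file.
html_payload = b"<!DOCTYPE html><html><body>...</body></html>"
jpeg_payload = b"\xff\xd8\xff\xe0\x00\x10JFIF"

print(sniff_image_type(html_payload))
print(sniff_image_type(jpeg_payload))
```

If a downloaded file yields None here, opening it in a text editor will usually show whether it is actually an HTML page.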

1 answer

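if __name__ == '__main__':
    # url.txt holds one URL per line.
    with open(r'url.txt') as arq:
        for i in arq:
            url = i.strip()  # remove the trailing newline
            if url:          # skip blank lines
                scraper = Scraper()
                scraper.visit_url(url, -1)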

This was the solution. Thanks for the help.
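When a text file is iterated line by line, each line keeps its trailing newline character, so it can help to strip it and skip empty lines before passing the URL on. A minimal sketch of that reading step follows; the helper name read_urls is hypothetical, and the temporary file only stands in for url.txt.

```python
import os
import tempfile


def read_urls(path):
    """Return the non-empty lines of a text file with surrounding whitespace removed."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Demonstration with a temporary file standing in for url.txt.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "url.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("https://example.com/a\n\n  https://example.com/b  \n")
    urls = read_urls(demo_path)

print(urls)
```

Each entry of the returned list could then be passed to scraper.visit_url in a loop.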
