How to open multiple HTML files and save their content sequentially in a txt file

Question


Asked 6 years, 3 months ago

I need to open several HTML files, extract their text, and save it to a txt file in order, but I don't know how to do that.

I can do this with a single HTML file, but I need to do it with several, in sequence, because the files come from an EPUB and the text has to stay in the correct order.

Here is my code:

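from bs4 import BeautifulSoup

# Read one chapter file and extract its visible text
with open("index_split_001.html", encoding="utf8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")
link = soup.get_text()

# Write the text out; leaving the with-block closes and flushes the file
with open("pfv.txt", "w", encoding="utf8") as arquivo:
    arquivo.write(link)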

by Nicolas Pereira · Answer 1 · 2019-04-02T11:29:24+00:00

First of all, do the HTML file names follow a standard pattern? If so, write a loop that opens all the files. If not, you need to pass each page's name as a parameter. A tip: if there are many pages, create a .txt file listing all of them, then use a for loop to read that .txt and open each file in turn. Your logic is correct; all that remains is to iterate over all the HTML files.
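
Following the answer's suggestion, here is a minimal sketch that assumes bs4 is installed and that the chapter files either follow a zero-padded naming pattern such as index_split_001.html (so that sorting by name gives reading order) or are listed one per line in a plain-text file. The helper names files_by_pattern, files_from_list and merge_html_to_txt are hypothetical, not part of any library:

```python
from pathlib import Path

from bs4 import BeautifulSoup


def files_by_pattern(folder, pattern="index_split_*.html"):
    """Return the matching HTML files sorted by name.

    Assumes zero-padded names (index_split_001.html, index_split_002.html, ...)
    so that alphabetical order equals reading order.
    """
    return sorted(Path(folder).glob(pattern))


def files_from_list(list_path):
    """Read one file path per line from a plain-text list, keeping its order."""
    with open(list_path, encoding="utf8") as f:
        return [line.strip() for line in f if line.strip()]


def merge_html_to_txt(html_paths, out_path):
    """Extract the text of each HTML file and write it, in order, to out_path."""
    # Open the output once, before the loop: "w" truncates only at this point,
    # so each chapter is written after the previous one instead of replacing it.
    with open(out_path, "w", encoding="utf8") as out:
        for path in html_paths:
            with open(path, encoding="utf8") as f:
                soup = BeautifulSoup(f.read(), "html.parser")
            out.write(soup.get_text())
            out.write("\n")  # separate consecutive chapters
```

For example, merge_html_to_txt(files_by_pattern("."), "pfv.txt") covers the naming-pattern case, and merge_html_to_txt(files_from_list("lista.txt"), "pfv.txt") covers the list-file case, where lista.txt is a hypothetical file name. Without zero padding, name sorting puts index_split_10 before index_split_2, which is when the explicit list is the safer choice.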

Nicolas, they are local files. I can rename them to anything. Let me check that I understood you: write a for loop that opens every HTML file and, on each iteration, writes its text to the txt? How do I guarantee that what has already been written won't be overwritten by the next HTML file?

– user124673

2019/04/02 at 12:15
When inserting into the txt file, use line breaks so the text won't overwrite the same line! Have a look at this link: http://www.devfuria.com.br/python/manipulando-textfiles/

– Nicolas Pereira

2019/04/02 at 12:20
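
A line break keeps consecutive texts from running together, but what actually prevents the next chapter from replacing the previous one is the file mode: opening with "w" truncates the file every time, while "a" appends to its end. Opening the file once before the loop also works. A minimal standard-library sketch of the difference, where write_chunks and the demo file names are hypothetical:

```python
def write_chunks(path, chunks, mode):
    """Reopen path for every chunk with the given mode, then return the file's content."""
    for chunk in chunks:
        # Reopening with "w" truncates the file each time; "a" appends instead.
        with open(path, mode, encoding="utf8") as f:
            f.write(chunk + "\n")
    with open(path, encoding="utf8") as f:
        return f.read()


if __name__ == "__main__":
    chapters = ["Chapter 1 text", "Chapter 2 text"]
    print(repr(write_chunks("demo_w.txt", chapters, "w")))  # only the last chapter survives
    # "a" never truncates, so start from an empty file to avoid leftovers from earlier runs
    open("demo_a.txt", "w", encoding="utf8").close()
    print(repr(write_chunks("demo_a.txt", chapters, "a")))  # both chapters, in order
```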

Thanks man, it helped a lot.

– user124673

2019/04/08 at 13:46