How to open multiple HTML files and save their content sequentially to a txt file


Viewed 278 times

-1

I need to open several HTML files, extract their text, and save it to a single txt file in sequence, but I don't know how to do that.

I can do this with a single HTML file, but I need to do it with several, in order. The files come from an epub, so the text has to end up in the correct order.

Here is my code:

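from bs4 import BeautifulSoup

# Read one HTML file and extract its visible text
with open("index_split_001.html", encoding="utf8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")
text = soup.get_text()

# Write the extracted text to the output file (closed automatically)
with open("pfv.txt", "w", encoding="utf8") as arquivo:
    arquivo.write(text)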

1 answer

1


First, do the HTML file names follow a standard pattern? If so, create a loop that opens each file in turn. If not, you will need to pass the name of each page as a parameter. A hint: if there are many pages, create a .txt file that lists all of the file paths, then use a for loop to read that list and open each file. Your logic is correct; now you only need to iterate over all of the HTML files.
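The loop described above could look roughly like the sketch below. It assumes the files sit in one folder and share the zero-padded naming from the question (`index_split_001.html`, `index_split_002.html`, ...), so sorting by name gives the reading order. The function name `combine_html_to_txt` and its parameters are made up for illustration.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def combine_html_to_txt(folder, pattern, output_path):
    """Write the text of every matching HTML file to one txt file, in name order."""
    # sorted() orders the names lexicographically; zero-padded numbers
    # (001, 002, ..., 010) therefore come out in reading order.
    html_files = sorted(Path(folder).glob(pattern))
    # Open the output once, so each write continues where the last one ended.
    with open(output_path, "w", encoding="utf8") as out:
        for html_file in html_files:
            html = html_file.read_text(encoding="utf8")
            text = BeautifulSoup(html, "html.parser").get_text()
            out.write(text)
            out.write("\n")  # separate one file's text from the next
    return [f.name for f in html_files]


if __name__ == "__main__":
    # Assumed layout: the HTML files are in the current folder.
    combine_html_to_txt(".", "index_split_*.html", "pfv.txt")
```

If the numbers in the names are not zero-padded (`index_split_2.html`, `index_split_10.html`), a plain sort would put 10 before 2, so the files would need renaming or a numeric sort key.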

  • Nicolas, they are local files. I can rename them to anything I like. Let me see if I understood you: write a for loop that opens each HTML file and writes to the txt after each one? How do I guarantee that what has already been written will not be overwritten by the next HTML file?

  • When you insert into the txt file, add a line break, so each write goes on a new line and does not overwrite the previous one. Have a look at this link: http://www.devfuria.com.br/python/manipulando-textfiles/

  • 1

    Thanks man, it helped a lot.
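
On the overwriting question from the comments: as far as Python's `open` is concerned, whether earlier text survives depends on the mode the file is opened in. Mode `"w"` truncates the file every time it is opened, while mode `"a"` appends to the end. A small sketch, using a throwaway temporary file and made-up helper names (`write_chunks`, `read_all`):

```python
import os
import tempfile


def write_chunks(path, chunks, mode):
    """Reopen `path` in the given mode for each chunk and write it as one line."""
    for chunk in chunks:
        with open(path, mode, encoding="utf8") as f:
            f.write(chunk + "\n")


def read_all(path):
    """Return the whole file as a string."""
    with open(path, encoding="utf8") as f:
        return f.read()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    # Reopening with "w" each time truncates: only the last chunk remains.
    write_chunks(path, ["chapter 1", "chapter 2"], "w")
    print(repr(read_all(path)))
    # Reopening with "a" keeps what is there and adds to the end.
    write_chunks(path, ["chapter 3"], "a")
    print(repr(read_all(path)))
```

So either open the output file once with `"w"` and keep it open for the whole loop, or reopen it with `"a"` for each HTML file.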
