An alternative is to store the words in a set, a built-in structure that does not allow duplicates. So just open the file, go through its lines, split each line into words, and add them to a set:
palavras = set()
with open('arquivo.txt') as arq:
    for linha in arq:  # for each line of the file
        for palavra in linha.split(' '):  # for each word of the line
            palavras.add(palavra)  # add the word to the set
palavras_em_ordem = sorted(palavras)
print(palavras_em_ordem)
When a word is added, the set itself checks whether it already exists, so duplicate words are not added. After that, just use sorted to get the list of words in order.
Notice that I opened the file inside a with block, which ensures that the file is closed at the end, even if an error occurs while reading or processing the lines. The for linha in arq loop reads one line at a time (using readlines, as suggested by another answer, loads the entire file contents into memory at once, which may consume resources unnecessarily if the file is very large).
It is also worth remembering that the other answer's solution can become much slower as the word list grows, since each word is tested to see whether it is already in the list. On a list this test scans the elements one by one, while a set uses a hash lookup, so the test is slower on lists than on sets (take the test here).
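To get a rough feel for that difference, here is a minimal sketch using timeit. The data is made up for illustration (10,000 generated strings, not the contents of the original file), and the absolute numbers will vary by machine:

```python
import timeit

# Hypothetical data: 10,000 distinct "words" (not from the original file)
palavras_lista = [f"palavra{i}" for i in range(10_000)]
palavras_set = set(palavras_lista)

# Worst case for the list: the searched word is its last element
alvo = "palavra9999"

# Run each membership test 1,000 times and measure the total time
tempo_lista = timeit.timeit(lambda: alvo in palavras_lista, number=1_000)
tempo_set = timeit.timeit(lambda: alvo in palavras_set, number=1_000)

print(f"list: {tempo_lista:.4f}s")
print(f"set:  {tempo_set:.4f}s")
```

The list time should grow with the number of words, while the set time should stay roughly constant.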
Finally, it is worth remembering that splitting on spaces is a "naive" way to get the words. It was not clear what is in the file, but if it has a phrase like "Olá, tudo bem?", split(' ') will consider Olá, and bem? to be words (the comma and the question mark become part of the "word", so they are treated as different words from Olá and bem). If you want to handle more complex cases, removing commas and other punctuation marks and also handling compound words (such as "beija-flor") or words with an apostrophe (such as "gota d'água"), there are a few examples here, here and here.
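As one possible starting point, here is a small sketch with a regular expression. The definition of "word" is my assumption (runs of letters, optionally joined by a hyphen or an apostrophe), and the names PADRAO_PALAVRA and extrair_palavras are made up for this example. It is not necessarily what the linked examples do, and it may need adjusting for your data:

```python
import re

# Assumed definition of a "word": one or more letters, optionally followed
# by groups of a hyphen or apostrophe plus more letters.
# [^\W\d_] matches any Unicode letter (a word character that is neither
# a digit nor an underscore).
PADRAO_PALAVRA = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")

def extrair_palavras(texto):
    """Return the words found in texto, without surrounding punctuation."""
    return PADRAO_PALAVRA.findall(texto)

print(extrair_palavras("Olá, tudo bem?"))  # ['Olá', 'tudo', 'bem']
```

In the loop, you would then iterate over extrair_palavras(linha) instead of linha.split(' '). This also avoids the line break ending up attached to the last word of each line.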
Also, upper and lower case are treated as distinct: oi and Oi are considered different words. If you want both to count as the same word, just change the line that adds to the set to palavras.add(palavra.casefold()).
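A quick illustration of the effect, with a hard-coded list of words standing in for what would be read from the file:

```python
palavras = set()

# Hypothetical input: the same word written with different casing
for palavra in ["Oi", "oi", "OI"]:
    palavras.add(palavra.casefold())  # normalize before adding

print(palavras)  # {'oi'}
```

All three variants collapse into a single entry because they are normalized before being added.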
The observation about opening the file with the with statement was quite timely. Whenever we work with files, we must make sure they are properly opened as well as closed. Good observation. +1. – Solkarped