Remove whitespace from the Python list

Viewed 1,886 times

3

I'm reading a file, stopwords.txt, in which each stopword is on its own line. For example:

a


o


para

I store each stopword in a list as follows:

import sys

with open(sys.argv[1], "r") as file_entrada:
    lista_nome_base_docs = [line.rstrip() for line in file_entrada]

with open(sys.argv[2], "r") as file_stopwords:
    lista_stopwords = [line.strip() for line in file_stopwords]

After reading the file, I print the list to the screen, and the output looks like this:

[image: the printed list]

Empty strings show up among the stopwords, for example: ['a','','para']

How do I keep these empty entries out of the list?

1 answer

3


Just filter out the empty lines when you build the word list:

lista_stopwords = [line.strip() for line in file_stopwords if line.strip() != ""]

Note the condition if line.strip() != "" added at the end of the comprehension. It ensures that only lines with some content other than whitespace or a line break are included in the list.

For example:

words = ["a\n", "\n", "ok\n", "\n", ""]

print([word.strip() for word in words if word.strip() != ""])

# ['a', 'ok']
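The same filter can also be written so that strip() runs only once per line. This is just a sketch of a variation, not part of the original answer: the helper name read_stopwords is hypothetical, and io.StringIO stands in for the opened stopwords file so the snippet runs on its own.

```python
import io


def read_stopwords(lines):
    """Return the stripped, non-empty entries from an iterable of lines."""
    # Strip each line once, lazily, then keep only the non-empty results.
    stripped = (line.strip() for line in lines)
    return [word for word in stripped if word]


# io.StringIO simulates a file laid out like the stopwords.txt in the question.
fake_file = io.StringIO("a\n\n\no\n\n\npara\n")
stopwords = read_stopwords(fake_file)
print(stopwords)  # ['a', 'o', 'para']
```

An empty string is falsy in Python, so "if word" has the same effect as comparing against "".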


  • Understood, @Anderson Calos Woss. One more thing I don't understand: in the image in the question that shows the printed list, the first position of the list seems to contain some "garbage". Can you tell me what that is, or how to remove it so that the first position holds the actual first stopword?

  • @Williamhenrique That is the BOM (byte order mark) character (https://answall.com/q/9940/5878). This answer may help you: https://answall.com/a/217650/5878.

  • I got it working. Using what you showed, I added one more condition to that if: if line.strip() != "" and line.strip() != " " - and that removed these special characters from the list, leaving just the stopwords. Thank you very much for the help!
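
For the BOM mentioned in the comments, one possible approach is to let Python drop it while decoding. The sketch below assumes the stopwords file is UTF-8 with a leading BOM, which the question does not confirm. The helper name read_stopwords and the temporary sample file are made up for the illustration.

```python
import os
import tempfile


def read_stopwords(path):
    """Read stopwords from a UTF-8 file, ignoring a leading BOM if present."""
    # "utf-8-sig" decodes UTF-8 and silently skips a BOM at the start.
    with open(path, "r", encoding="utf-8-sig") as f:
        stripped = (line.strip() for line in f)
        return [word for word in stripped if word]


# Build a small sample file that starts with the UTF-8 BOM bytes.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbfa\n\n\no\n\n\npara\n")
    sample_path = tmp.name

# With plain "utf-8" the BOM stays glued to the first word; strip() does not
# remove it because U+FEFF is not treated as whitespace.
with open(sample_path, "r", encoding="utf-8") as f:
    first_plain = f.readline().strip()

stopwords = read_stopwords(sample_path)
os.remove(sample_path)

print(repr(first_plain))  # '\ufeffa'
print(stopwords)          # ['a', 'o', 'para']
```

If the file turns out to use a different encoding, the encoding argument would need to change accordingly.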
