Count how many times a word of a file appears in another file

Question

Asked 7 years, 10 months ago

I would like to count how many times each word from a word list (file 1, lexico.txt) appears in another text (file 2, corpus.txt). This is what I tried:

with open("corpus.txt", "r") as f1, open("lexico.txt", "r") as f2:
    file1 = f1.read()
    file2 = f2.read()

    corpus1 = file1.split(" ")

    for word in file2:
        print(word, corpus1.count(word))

corpus.txt file (archive2)

I am afraid to look for other options for the quality of light is quite recommendable very white light, but the duration of everything opposite I lasted less than months the two lamps and I put them in the assist light. The light I want is paler but strong enough to light the room.

lexico.txt file (archive1)

is

but

light

Output

is 2

0

m 0

to 2

s 0

0

l 0

u 0

z 0

0

1

In lexico.txt, are the words always separated by "\n" (a line break)? That is, one word per line?

– Miguel

2017/09/26 at 11:14

Always separated by a line break! I tried corpus2 = file2.split("\n"), but it didn't work.

– pitanga

2017/09/26 at 11:20
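
For reference, the per-character output in the question comes from looping over file2, which is a string: a for loop over a string yields one character at a time, line breaks included. A minimal sketch, using an in-memory string as a stand-in for the contents of lexico.txt (the names chars and words are illustrative, not from the question):

```python
# Stand-in for f2.read(); the real content would come from lexico.txt.
file2 = "is\nbut\nlight\n"

# Iterating over a string yields single characters, including "\n".
chars = [ch for ch in file2]
print(chars[:4])  # ['i', 's', '\n', 'b']

# splitlines() yields one entry per line, without the line breaks.
words = [line.strip() for line in file2.splitlines() if line.strip()]
print(words)  # ['is', 'but', 'light']
```

Looping over words instead of file2 would then compare whole words against the corpus.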

1 answer

by Miguel • **29,306** points · Answer 1 · 2017-09-26T11:28:31+00:00

Here’s what you can do:

count = {}

with open('corpus.txt') as f1, open('lexico.txt') as f2:
    corpus = f1.read().split()  # the text, split on whitespace
    for word in f2:  # words to count in the text
        w_strp = word.strip()  # remove line breaks
        if w_strp != '' and w_strp not in count:  # skip blank lines and words already counted
            count[w_strp] = corpus.count(w_strp)
print(count) # {'mas': 2, 'é': 2, 'luz': 4}

Or, more compactly, with a set and a dict comprehension:

with open('corpus.txt') as f1, open('lexico.txt') as f2:
    corpus = f1.read().split()
    lexico = set(word.strip() for word in f2 if word.strip() != '')  # set() to avoid repeated words

count = {l_word: corpus.count(l_word) for l_word in lexico}
print(count) # {'mas': 2, 'é': 2, 'luz': 4}

If you're sure there are no repeated words in lexico.txt, you can simply use a list:

...
lexico = [word.strip() for word in f2 if word.strip() != '']
...

Or even:

with open('corpus.txt') as f1, open('lexico.txt') as f2:
    corpus = f1.read().split()
    count = {l_word: corpus.count(l_word) for l_word in (word.strip() for word in f2 if word.strip() != '')}

print(count) # {'mas': 2, 'é': 2, 'luz': 4}
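
One caveat applies to all of the variants above: corpus.count(word) only matches tokens exactly, so "light," or "Light" is not counted as "light", and each call scans the whole list again. Below is a sketch of one possible alternative, assuming that case-insensitive matching with punctuation ignored is what is wanted. The function name count_lexicon, the regular expression, and the sample strings are illustrative choices, not part of the original answer:

```python
import re
from collections import Counter


def count_lexicon(corpus_text, lexicon_lines):
    """Count how often each lexicon word occurs in corpus_text.

    Assumption: matching is case-insensitive and punctuation is ignored.
    """
    # \w+ matches runs of word characters, so "light," becomes "light".
    tokens = re.findall(r"\w+", corpus_text.lower())
    freq = Counter(tokens)  # a single pass over the corpus

    counts = {}
    for line in lexicon_lines:
        word = line.strip().lower()
        if word:  # skip blank lines
            counts[word] = freq[word]  # Counter gives 0 for unseen words
    return counts


# In-memory stand-ins for corpus.txt and lexico.txt.
corpus_text = "The light is white light, but the light. Light is paler but strong."
lexicon_lines = ["is\n", "\n", "but\n", "light\n"]
print(count_lexicon(corpus_text, lexicon_lines))  # {'is': 2, 'but': 2, 'light': 4}
```

With the files from the question, this could be called as count_lexicon(f1.read(), f2) inside the same with block, since an open file iterates line by line. Note that \w+ also splits on apostrophes, so contractions would be broken into separate tokens.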