How to count the number of occurrences in columns?
File:
luz NC luz
mas ADV más
blanquita ADJ blanco
que CQUE que
las ART el
que CQUE que
traía VLfin traer
de PREP de
serie NC serie
mi PPO mi|mío
coche NC coche
Script:
from collections import Counter

with open("corpus_TreeTagger.txt", "r") as f:
    texte = f.read()
    colunas = texte.split("\n")

def frequencia(colunas):
    for linhas in colunas:
        lexema = linhas.split('\t')[0]
        pos = linhas.split('\t')[1]
        lema = linhas.split('\t')[2]
    return Counter(lexema)
    return Counter(pos)
    return Counter(lema)

print(frequencia(colunas))
Error:
Traceback (most recent call last):
  File "FINALV2.py", line 72, in <module>
    print(frequencia(colunas))
  File "FINALV2.py", line 23, in frequencia
    pos = linhas.split('\t')[1]
IndexError: list index out of range
Could someone help me?
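One likely cause of the IndexError above, assuming the file ends with a newline: split("\n") leaves an empty string as the last element, and splitting that on tabs yields a single field, so index [1] does not exist. The sketch below uses a hypothetical two-line sample in place of corpus_TreeTagger.txt, and malformed_lines is an illustrative helper name, not something from the original script.

```python
# Hypothetical sample standing in for corpus_TreeTagger.txt
# (tab-separated columns, file ending with a newline).
sample = "luz\tNC\tluz\nmas\tADV\tmás\n"

rows = sample.split("\n")
print(rows)  # note the empty string at the end of the list


def malformed_lines(rows, expected_fields=3):
    """Return (line_number, text) for lines lacking the expected tab-separated fields."""
    return [
        (number, row)
        for number, row in enumerate(rows, start=1)
        if len(row.split("\t")) != expected_fields
    ]


bad = malformed_lines(rows)
print(bad)
```

Running the same check on the real file would show whether the problem is only a trailing blank line or whether some lines in the corpus lack tabs altogether.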
What kind of file is this? What separates the columns? Isn't there a delimiter character between them? Do you create the file yourself or receive it from another source? Is the original extension really .txt?
– Sidon
It's the output of morphosyntactic tagging software. We give it a text and it performs the analysis, writing an output file with three columns: the word, the morphological tag and its lemma.
– pitanga
Okay! Did you develop it? If not, is there no way to configure it to write a delimiter character between the columns? As shown, at least visually, it is impossible to tell the columns apart; even fixed-width columns would already help. See this text (which is from a different context but covers the same subject) to understand what I mean.
– Sidon
It's actually a column, a tab, another column, another tab and a third column. I can print the entire second column, for example, like this: linhas.split('\t')[2]
– pitanga
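Given the tab-separated layout described above, the three counts can be sketched as follows. This is a minimal sketch, not a definitive fix: it assumes every valid line has exactly three tab-separated fields, silently skips anything else, and returns all three counters in one tuple, because a Python function stops at its first return statement. The sample lines are taken from the file excerpt in the question.

```python
from collections import Counter


def frequencia(linhas):
    """Count words, tags and lemmas in TreeTagger-style tab-separated lines."""
    lexemas, tags, lemas = Counter(), Counter(), Counter()
    for linha in linhas:
        campos = linha.rstrip("\n").split("\t")
        if len(campos) != 3:
            continue  # assumption: blank or malformed lines are simply skipped
        lexema, pos, lema = campos
        lexemas[lexema] += 1
        tags[pos] += 1
        lemas[lema] += 1
    return lexemas, tags, lemas


# A few lines from the question's excerpt, plus a trailing blank line.
sample = ["luz\tNC\tluz", "que\tCQUE\tque", "las\tART\tel", "que\tCQUE\tque", ""]

lexemas, tags, lemas = frequencia(sample)
print(lexemas.most_common())
print(tags.most_common())
print(lemas.most_common())
```

With the real corpus, the open file object can be passed directly, for example frequencia(f) inside the with block, since iterating over a file yields one line at a time.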
See if my answer meets the goal
– Sidon
Thank you @Sidon! It's called TreeTagger, a language tagger developed in Germany and widely used in linguistics. My goal is to compute statistics for a text, counting the lexemes, the tags and the lemmas. I'm also trying to write a simple parser. I imagine there are other ways to do it, but I'm a beginner. Thank you very much, I'll take a look at Pandas :)
– pitanga