Size of tuple lists in a df


I have the following DataFrame:

n_words     Words
   220     [('trabalho', 17), ('monitor', 17), ('via', 16... 
  3114     [('atend', 863), ('ortopedico', 863), ('proced... 
     5     [('anomalos', 2), ('feixes', 1), ('eletrofisio... 
     3     [('hr', 1), ('sistema', 1), ('fenotipagem'...

I need the number of different words, that is, the size of each tuple list.

I tried:

df['palvras_dif'] = ""
i = 0
for row in df['Words']:
    df['palvras_dif'][i] = len(df['Words'][i])
    i+=1
df

But it doesn't count correctly. Can someone help me?

Are you using pandas?
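One plausible cause, which the question does not confirm: if the DataFrame was loaded from a CSV file, each `Words` cell is a string such as "[('hr', 1), ...]", so `len()` counts characters instead of tuples. The loop also writes through chained assignment (`df[col][i] = ...`), which pandas may not apply to the original frame. The sketch below assumes that scenario; the sample rows are hypothetical (built from the words shown in the thread), and `ast.literal_eval` is used to turn the text back into lists before counting with `Series.map(len)`.

```python
import ast

import pandas as pd

# Hypothetical two-row sample; the second cell is stored as text,
# as it would be after reading the DataFrame back from a CSV file.
df = pd.DataFrame({
    "n_words": [5, 3],
    "Words": [
        [("anomalies", 2), ("electrophysiotherapy", 2), ("bundles", 1)],
        "[('hr', 1), ('sistema', 1), ('fenotipagem', 1)]",
    ],
})

# A str cell here means len() would count characters, not tuples.
print(df["Words"].map(type))

# Parse string cells back into lists of tuples; literal_eval only evaluates
# Python literals (unlike eval). Cells that are already lists are kept.
df["Words"] = df["Words"].map(
    lambda cell: ast.literal_eval(cell) if isinstance(cell, str) else cell
)

# One tuple per distinct word, so each list's length is the distinct count.
# Assigning the whole column avoids the chained assignment used in the loop.
df["palavras_dif"] = df["Words"].map(len)

print(df[["n_words", "palavras_dif"]])
```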

– Woss

2019/01/10 at 12:09
Yes, I am!

– Gisele Santos

2019/01/10 at 12:20
And what does the number in each tuple represent? Should it be considered too, or just the word?

– Woss

2019/01/10 at 12:21
It is how often the word appeared in another df. Example: on line 3 I had the list ['anomalies', 'electrophysiotherapy', 'bundles', 'anomalies', 'electrophysiotherapy'] and I built the list of tuples with each word and its frequency. I need to know how many words are different, so I wanted the size of the list of tuples...

– Gisele Santos

2019/01/10 at 12:25
But should it be considered or not? For example, if there are ('trabalho', 2) and ('trabalho', 14), should they count as the same word or as separate occurrences?

– Woss

2019/01/10 at 12:26
That doesn't happen in my data: I never have the same word twice, because the number is already the word's frequency.

– Gisele Santos

2019/01/10 at 12:29
Then wouldn't it be enough to sum the values in n_words?

– Woss

2019/01/10 at 12:30
No, because n_words holds the total number of words, including the repeated ones. I need the number of distinct words. As in the example on line 3: ['anomalies', 'electrophysiotherapy', 'bundles', 'anomalies', 'electrophysiotherapy'] has n_words = 5, and I need the number of different words, which would be 3.

– Gisele Santos

2019/01/10 at 12:34
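To make the distinction in these comments concrete, here is a small sketch that assumes the (word, frequency) tuples were produced with `collections.Counter` (the thread does not say how they were built). Summing the frequencies reproduces `n_words`, while the number of tuples is the distinct-word count the asker wants.

```python
from collections import Counter

# Word list from the line 3 example in the comments (translated words).
words = ["anomalies", "electrophysiotherapy", "bundles",
         "anomalies", "electrophysiotherapy"]

# Assumption: the (word, frequency) tuples come from Counter.most_common().
pairs = Counter(words).most_common()

n_words = sum(freq for _, freq in pairs)  # every word, repeats included
distinct = len(pairs)                     # one tuple per distinct word

print(n_words, distinct)  # 5 3
```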


by Woss • **73,416** points · Answer 1 · 2019-01-10T12:50:23+00:00

As discussed in:

You can use Python's set type, which by definition has no repeated elements.

Use p[0] for palavras in df['Words'] for p in palavras to get every word in the DataFrame. Then build a set from them and check its size:

num_palavras = len(set(p[0] for palavras in df['Words'] for p in palavras))

For example:

import pandas as pd

df = pd.DataFrame(data={
    'Words': [
        [('a', 1), ('b', 2)],
        [('c', 1), ('d', 2)],
    ]
})

num_palavras = len(set(p[0] for palavras in df['Words'] for p in palavras))

print(num_palavras)  # 4

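The set only changes the result when a word appears in more than one row. A minimal sketch with hypothetical data in which 'b' occurs in both rows shows the set-based count and the plain tuple count diverging:

```python
import pandas as pd

# Hypothetical data: 'b' appears in both rows.
df = pd.DataFrame(data={
    'Words': [
        [('a', 1), ('b', 2)],
        [('b', 1), ('c', 3)],
    ]
})

# Distinct words in the whole DataFrame: the set keeps 'b' only once.
distinct_total = len(set(p[0] for palavras in df['Words'] for p in palavras))

# Plain tuple count: 'b' is counted once per row it appears in.
tuple_total = sum(len(palavras) for palavras in df['Words'])

print(distinct_total, tuple_total)  # 3 4
```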

But, as mentioned in the comments, the words do not repeat across the different rows, so it is enough to count the tuples in the DataFrame:

num_palavras = sum(len(palavras) for palavras in df['Words'])
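Since the question asks for the size of each tuple list rather than a single total, a per-row variant may be closer to what is needed. This is a sketch on the answer's own two-row example; the column names `palavras_dif` and `palavras_dif_set` are illustrative. `Series.map(len)` counts the tuples in each row, and the set-based column deduplicates words within a row first, which gives the same numbers when, as the asker states, a word never appears twice in the same list.

```python
import pandas as pd

df = pd.DataFrame(data={
    'Words': [
        [('a', 1), ('b', 2)],
        [('c', 1), ('d', 2)],
    ]
})

# Number of tuples in each row, i.e. distinct words per row.
df['palavras_dif'] = df['Words'].map(len)

# Defensive variant: deduplicate each row's words before counting.
df['palavras_dif_set'] = df['Words'].map(lambda palavras: len({p[0] for p in palavras}))

# The total computed in the answer is the sum of the per-row counts.
total = df['palavras_dif'].sum()

print(df[['palavras_dif', 'palavras_dif_set']])
print(total)  # 4
```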