2
I have a df
as follows and I need to make a top 10 of the most repeated words in problem and within these selected words do the same within the solution.
Obs: the dataframe has more than 20,000 lines, this is just a sample.
Code:
df.query((pd.Series('' .join(df['Problema']). split ()). value_counts ()[:10]).apply(lambda x: " ".join('' .join(df['Solucao']). split ()). value_counts () [:10]))
Attributeerror: str Object has no attribute 'value_counts'
Expected:
Return the 10 most frequent words in the column "Problem" and in this return identify which are the 110 most frequent words in the column "Solution"
TOP 10 Problems | TOP 10 Troubleshooting
xlm | [extrair, bug, c, d, e, f, g, g, i, j]
impressao | [rede, parado, c, d, e, f, g, g, i, j]
c | [a, b, c, d, e, ..., j]
d | [a, b, c, d, e, ..., j]
e | [a, b, c, d, e, ..., j]
f | [a, b, c, d, e, ..., j]
g | [a, b, c, d, e, ..., j]
h | [a, b, c, d, e, ..., j]
i | [a, b, c, d, e, ..., j]
j | [a, b, c, d, e, ..., j]
Alex, thank you for answering me!! In the case of problems and solution I do not know the properties to mount the dictionary. I need the top 10 repeating problems, and when I do, I need to know which are the most repeating solutions. This is a dynamic and constantly changing data, for this reason I believe that it is not possible to create a dictionary, because it would need fixed words. In this case, I cannot feed a dictionary. Perchance, you know another solution?
– Beatriz Benz
It is very difficult to work with the
df
the way it is. I recommend doing a Pre-processing of the data to create the dataframe without lists. A suggestion is to use Isolated Problems as Index and each column as a solution. The values ofdf
would be the number of occurrences of the solution in the problem.– AlexCiuffa