2
I want to find in this Dataframe, the 3 best user_id for Nome prova. That is, those who have the highest number of values 1 in the correct column ( such column is composed of values 0 and 1 ):
                 Nome prova  user_id  correta
3      SABESP ASSESMENT RO       85        1
4      SABESP ASSESMENT RO       62        1
5      SABESP ASSESMENT RO       45        1
6      SABESP ASSESMENT RO       39        1
7      SABESP ASSESMENT RO       62        1
                   ...      ...      ...
24269       SICREDI UNID 7     1684        1
24271       SICREDI UNID 7     1695        1
24273       SICREDI UNID 7     1674        1
24275       SICREDI UNID 7     1681        1
24276       SICREDI UNID 7     1672        1
I tried to do it this way :
only_corrects = dataset[dataset.correta == 1]
best_participants = only_corrects[['Nome prova', 'user_id', 'correta']].groupby(['Nome prova', 'user_id']).sum()
best_participants.pivot_table(values='correta', index='user_id', columns='Nome prova')
but what I get are several values NaN among those user_id who did not take the test:
Nome prova  Provas Sabesp - RB  SABESP ASSESMENT RO  SICREDI UNID 7  Sabesp  \
user_id                                                                       
1                          NaN                  NaN             NaN     NaN   
2                          NaN                  NaN             NaN     NaN   
33                         NaN                  4.0             NaN     9.0   
34                         NaN                 11.0             NaN     NaN   
35                         NaN                  6.0             NaN     NaN   
                       ...                  ...             ...     ...   
1699                       NaN                  NaN             8.0     NaN   
1700                       NaN                  NaN             9.0     NaN   
1702                       NaN                  NaN            13.0     NaN   
1703                       NaN                  NaN             7.0     NaN   
1704                       NaN                  NaN            11.0     NaN 
I also just tried to finish the groupby:
only_corrects = dataset[dataset.correta == 1]
best_participants = only_corrects[['Nome prova', 'user_id', 'correta']].groupby(['Nome prova', 'user_id']).sum()
but also without success:
                               correta
Nome prova         user_id         
Disal              404           10
                   405            5
                   406           11
                   407            9
                   408            6
                            ...
prova alto uruguai 785            8
                   786           11
                   787            5
                   788           11
                   789            7

Could attach the question a [mcve] showing your attempt to solve the problem, the error message generated and highlight the line where the error was generated.
– Augusto Vasques
I tried to finalize the groupby function and the pivot_table to perform the sum and separation by user_id, but both do not return the expected result
– luizvale06
@Could Augustovasques please open my question to the answers ? I don’t see where she could differ from any others in the same category and who received answers.
– luizvale06
I’ve already given my vote to reopen, missing the other users.
– Augusto Vasques
I would like to warn you that I have quoted your question in META so as not to give the impression that I’m talking about you.
– Augusto Vasques
Votes of 5 experienced users are required to reopen. I just gave the fourth vote.
– Victor Stafusa