Stacked data: how do I work with this in pandas?

Question


Asked 6 years, 7 months ago


I have a table with "stacked" data: all of one customer's information occupies the first several rows, and once that customer's information is complete, the next customer takes the following rows, and so on. I'm not sure how to work with this in pandas. The header of each customer's block holds some identifying information, including the customer's ID, which is called Matricula1, Matricula2, Matricula3, ..., MatriculaN. One idea I had was to create a new column, copy the Matricula value into it, and repeat that value until the next Matricula appears. For example, in the case of the image below, Matricula1 would be repeated down to row B25. At row B26 the Matricula changes to Matricula2, and that value would then repeat until the next customer's Matricula. How can I do this? Thanks.

See if my answer helps.

– Sidon

2018/12/13 at 16:24

1 answer


by Sidon • **6,563** points · Answer 1 · 2018-12-13T16:19:42+00:00

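Use groupby. The column names in the example are in Portuguese: Time is the team, Ano is the year, and Pontos is the points.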

import pandas as pd

# Sample data: one row per team result
df = pd.DataFrame({'Time': ['Alpha', 'Alpha', 'Beta', 'Beta', 'Gama', 'Delta',
                            'Gama', 'Gama', 'Alpha', 'Delta', 'Delta', 'Alpha'],
                   'Rank': [2, 1, 3, 2, 3, 1, 4, 1, 2, 4, 1, 2],
                   'Ano': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
                   'Pontos': [976, 689, 963, 773, 845, 712, 866, 999, 684, 721, 794, 700]})

print(df)
     Time  Rank   Ano  Pontos
0   Alpha     2  2014     976
1   Alpha     1  2015     689
2    Beta     3  2014     963
3    Beta     2  2015     773
4    Gama     3  2014     845
5   Delta     1  2015     712
6    Gama     4  2016     866
7    Gama     1  2017     999
8   Alpha     2  2016     684
9   Delta     4  2014     721
10  Delta     1  2015     794
11  Alpha     2  2017     700

Grouping by the desired column (a list of columns is also accepted):

dfg = df.groupby('Time')
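
Since groupby also accepts a list of columns, here is a minimal sketch of grouping by two keys at once. It rebuilds the same sample data so it runs on its own; the names `by_team_year` and `counts` are only illustrative.

```python
import pandas as pd

# Same sample data as in the answer, rebuilt so this snippet is self-contained.
df = pd.DataFrame({'Time': ['Alpha', 'Alpha', 'Beta', 'Beta', 'Gama', 'Delta',
                            'Gama', 'Gama', 'Alpha', 'Delta', 'Delta', 'Alpha'],
                   'Rank': [2, 1, 3, 2, 3, 1, 4, 1, 2, 4, 1, 2],
                   'Ano': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
                   'Pontos': [976, 689, 963, 773, 845, 712, 866, 999, 684, 721, 794, 700]})

# Group by team and year together: each distinct (Time, Ano) pair is one group.
by_team_year = df.groupby(['Time', 'Ano'])

# size() counts the rows in each group and returns a Series with a MultiIndex.
counts = by_team_year.size()
print(counts)
```

Individual entries can then be looked up with a tuple key, for example `counts[('Delta', 2015)]`.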

Iterating over the groups:

for name, group in dfg:
  print(name, group, sep='\n')
Alpha
     Time  Rank   Ano  Pontos
0   Alpha     2  2014     976
1   Alpha     1  2015     689
8   Alpha     2  2016     684
11  Alpha     2  2017     700
Beta
   Time  Rank   Ano  Pontos
2  Beta     3  2014     963
3  Beta     2  2015     773
Delta
     Time  Rank   Ano  Pontos
5   Delta     1  2015     712
9   Delta     4  2014     721
10  Delta     1  2015     794
Gama
   Time  Rank   Ano  Pontos
4  Gama     3  2014     845
6  Gama     4  2016     866
7  Gama     1  2017     999

Selecting a group:

print(dfg.get_group('Alpha'))
     Time  Rank   Ano  Pontos
0   Alpha     2  2014     976
1   Alpha     1  2015     689
8   Alpha     2  2016     684
11  Alpha     2  2017     700

Aggregations:

print('Mean points per team', dfg['Pontos'].agg('mean'), sep='\n')
Mean points per team
Time
Alpha    762.250000
Beta     868.000000
Delta    742.333333
Gama     903.333333
Name: Pontos, dtype: float64
    
print('Total points per team', dfg['Pontos'].agg('sum'), sep='\n')
Total points per team
Time
Alpha    3049
Beta     1736
Delta    2227
Gama     2710
Name: Pontos, dtype: int64
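
Several aggregations can also be computed in one call by passing a list of function names to agg. This is a small sketch on the same sample data, rebuilt here so it runs on its own; `summary` is just an illustrative name.

```python
import pandas as pd

# Same sample data as in the answer, rebuilt so this snippet is self-contained.
df = pd.DataFrame({'Time': ['Alpha', 'Alpha', 'Beta', 'Beta', 'Gama', 'Delta',
                            'Gama', 'Gama', 'Alpha', 'Delta', 'Delta', 'Alpha'],
                   'Rank': [2, 1, 3, 2, 3, 1, 4, 1, 2, 4, 1, 2],
                   'Ano': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
                   'Pontos': [976, 689, 963, 773, 845, 712, 866, 999, 684, 721, 794, 700]})

# A list of aggregation names yields one output column per aggregation,
# with one row per team.
summary = df.groupby('Time')['Pontos'].agg(['mean', 'sum', 'count'])
print(summary)
```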

Filtering:

print('Teams that appear 4+ times in the dataset:',
      dfg.filter(lambda g: len(g) >= 4), sep='\n')
Teams that appear 4+ times in the dataset:
     Time  Rank   Ano  Pontos
0   Alpha     2  2014     976
1   Alpha     1  2015     689
8   Alpha     2  2016     684
11  Alpha     2  2017     700
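
For the layout described in the question, where the ID appears only in the header row of each block, groupby needs a key that is present on every row. One possible way to build it is to forward-fill the ID, as the question itself suggests. The sketch below is only an assumption about what the sheet looks like: the column names `campo`, `valor` and `matricula` and all the values are hypothetical, since the original image is not available.

```python
import pandas as pd

# Hypothetical stacked layout: each customer's block starts with a
# 'Matricula' row holding the ID, followed by that customer's detail rows.
raw = pd.DataFrame({
    'campo': ['Matricula', 'Nome', 'Saldo', 'Matricula', 'Nome', 'Saldo'],
    'valor': ['M001', 'Ana', '100', 'M002', 'Bruno', '250'],
})

# Keep the ID only on the header rows (NaN everywhere else), then
# forward-fill so every row carries the ID of the block it belongs to.
raw['matricula'] = raw['valor'].where(raw['campo'] == 'Matricula').ffill()

# With the ID on every row, each block can be handled as one group.
for matricula, block in raw.groupby('matricula'):
    print(matricula, block, sep='\n')
```

The real test for "this row is a block header" depends on how the header rows can be recognized in the actual data, so the `raw['campo'] == 'Matricula'` condition would need to be adapted.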

Your imagination is the only limit on what you can do with DataFrame.groupby :-)

See it running on repl.it.