How does a pivot table work in pandas?

I'm new to data science and I'm trying to use pandas' DataFrame.pivot() to create a heatmap, but it raises this error:

ValueError: Index contains duplicate entries, cannot reshape

I haven't been able to solve it. When I change the arguments passed to pivot() it runs, but most of the values come back as NaN.

I researched some topics about it but could not find a solution.

The heatmap should have the years as columns and the months as rows.

Structure of the DataFrame

data        usuarios    ano     mes      dia    ano-mes     mes-dia
2018-01-01  215         2018    01       01     2018-01     01-01
2018-01-02  167         2018    01       02     2018-01     01-02
2018-01-03  123         2018    01       03     2018-01     01-03
2018-01-04  193         2018    01       04     2018-01     01-04
2018-01-05  235         2018    01       05     2018-01     01-05
2018-01-06  241         2018    01       06     2018-01     01-06

Column types

data        datetime64[ns]
usuarios             int64
ano                 object
mes                 object
dia                 object
ano-mes             object
mes-dia             object

Attempts

# This snippet raises the error mentioned above
test = df.pivot("ano", "mes", "usuarios")

# Written this way it runs, but almost all values come back as NaN
test2 = df.pivot("data", "mes", "usuarios")

Output of df.pivot("data", "mes", "usuarios"), the call that ran without error:

mes 01  02  03  04  05  06  07  08  09  10  11  12
data                                                
2018-01-01  215.0   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-02  169.0   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

Why are the values being set to NaN, and how can I pivot using only the year and month without getting the error above?

  • Could you add more rows of the dataset so we can reproduce the error? If you cannot share the data, provide a minimal reproducible example. See the instructions here: https://answall.com/help/minimal-reproducible-example

  • What is your cross-service unit?

  • If possible, try to clarify what output you want. Which variables will your heatmap have in the rows and in the columns?

  • I edited the question with the information you requested

1 answer

There are two ways to pivot in pandas: pivot and pivot_table. Both return a DataFrame as output. The difference is that pivot does not accept aggregation, as can be seen in the documentation, so it raises the ValueError above whenever an index/column combination appears more than once.
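As a rough illustration of that difference, here is a minimal sketch on a made-up three-row frame (the values are hypothetical, chosen only so that the pair mes=2, ano=2018 appears twice). It assumes a reasonably recent pandas where the arguments are passed by keyword:

```python
import pandas as pd

# Hypothetical data: the (mes=2, ano=2018) pair appears twice.
df = pd.DataFrame({'ano': [2017, 2018, 2018],
                   'mes': [1, 2, 2],
                   'usuarios': [215, 167, 193]})

# pivot cannot decide which value belongs in the repeated cell, so it raises.
try:
    df.pivot(index='mes', columns='ano', values='usuarios')
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)
print(pivot_error)

# pivot_table combines the repeated rows using the aggregation function.
table = df.pivot_table(index='mes', columns='ano', values='usuarios', aggfunc='sum')
print(table)
```

Here the two 2018 rows for month 2 are summed into a single cell, while combinations that never occur in the data are left as NaN.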

Pivot

Getting back to your problem, it is not clear to me whether the ano-mes combination repeats in your dataset. If it does not repeat, you do not need aggregation and pivot solves your problem. Here is a reproducible example:

import pandas as pd

df = pd.DataFrame({'ano': [2017, 2018, 2017, 2018, 2017, 2018],
                   'mes': [1, 1, 2, 2, 3, 3],
                   'usuarios': [215, 167, 123, 193, 235, 241]})
print(df)

    ano  mes  usuarios
0  2017    1       215
1  2018    1       167
2  2017    2       123
3  2018    2       193
4  2017    3       235
5  2018    3       241

df.pivot(values = 'usuarios', index = 'mes', columns = 'ano')

Output:

ano 2017   2018
mes     
1   215     167
2   123     193
3   235     241
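If you are unsure whether the ano-mes combination repeats in your own data, one way to check before choosing between the two methods is DataFrame.duplicated. This is only a sketch: df_unique copies the frame above, and df_repeated is a hypothetical variant in which one pair appears twice.

```python
import pandas as pd

# Same frame as above: every (ano, mes) pair appears once.
df_unique = pd.DataFrame({'ano': [2017, 2018, 2017, 2018, 2017, 2018],
                          'mes': [1, 1, 2, 2, 3, 3],
                          'usuarios': [215, 167, 123, 193, 235, 241]})

# Hypothetical variant: (ano=2018, mes=2) appears twice.
df_repeated = pd.DataFrame({'ano': [2017, 2018, 2017, 2018, 2017, 2018],
                            'mes': [1, 2, 2, 2, 3, 3],
                            'usuarios': [215, 167, 123, 193, 235, 241]})

# True if any (ano, mes) pair occurs more than once.
unique_has_repeats = bool(df_unique.duplicated(subset=['ano', 'mes']).any())
repeated_has_repeats = bool(df_repeated.duplicated(subset=['ano', 'mes']).any())
print(unique_has_repeats, repeated_has_repeats)

# keep=False marks every occurrence of a repeated pair, not just the later ones.
repeated_rows = df_repeated[df_repeated.duplicated(subset=['ano', 'mes'], keep=False)]
print(repeated_rows)
```

If the check finds repeated pairs, pivot will fail on that frame and you need some form of aggregation.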

Pivot Table

On the other hand, if the ano-mes combination does repeat, you will have to aggregate the number of users in some way, which requires pivot_table. The example below uses sum as the aggregation function:

df = pd.DataFrame({'ano': [2017, 2018, 2017, 2018, 2017, 2018],
                   'mes': [1, 2, 2, 2, 3, 3],
                   'usuarios': [215, 167, 123, 193, 235, 241]})
df

    ano  mes  usuarios
0  2017    1       215
1  2018    2       167
2  2017    2       123
3  2018    2       193
4  2017    3       235
5  2018    3       241

df.pivot_table(values = 'usuarios', index = 'mes', columns = 'ano', aggfunc = 'sum')

Output:

ano 2017    2018
mes     
1   215.0   NaN
2   123.0   360.0
3   235.0   241.0

Note that in this case there is a NaN for month 1 of 2018. This is because the original dataset has no entry for that combination.
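The same reasoning likely explains both results in the question. With one row per day, every ano-mes pair repeats about thirty times, so pivot on ano and mes fails. With data as the index, each day belongs to exactly one month, so the other eleven month columns in that row are NaN. The sketch below uses made-up daily data (one user per day, so the sums are easy to check) and integer ano and mes columns, whereas the question stores them as strings:

```python
import pandas as pd

# Hypothetical daily counts shaped like the frame in the question.
dates = pd.date_range('2017-01-01', '2018-12-31', freq='D')
df = pd.DataFrame({'data': dates, 'usuarios': [1] * len(dates)})
df['ano'] = df['data'].dt.year
df['mes'] = df['data'].dt.month

# index='data' gives one row per day; each day belongs to a single month,
# so only one of the twelve month columns is filled in each row.
by_day = df.pivot(index='data', columns='mes', values='usuarios')
filled_per_row = by_day.notna().sum(axis=1)
print(filled_per_row.unique())

# Aggregating the days of each month gives the months x years table.
heat = df.pivot_table(index='mes', columns='ano', values='usuarios', aggfunc='sum')
print(heat)
```

A table like heat, with months as rows and years as columns, is the layout asked for in the question, and it can be passed to a plotting function such as seaborn.heatmap if seaborn is available. Whether sum or another function such as mean is the right aggregation depends on what the heatmap is meant to show.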
