How to use groupBy with pd. Cut?

Asked

Viewed 36 times

1

I’m trying to perform a groupby with the pd.cut() function, but I’m getting weird results. I have the following columns:

    jul/16        Ago/16

 R$ 34.252,00  R$ 52.000,00 
 R$ 33.850,00  R$ 33.850,00 
 R$ 34.850,00  R$ 44.590,00 
 R$ 35.850,00  R$ 45.590,00 
 R$ 36.850,00  R$ 46.590,00 
 R$ 37.850,00  R$ 47.590,00 
 R$ 38.850,00  R$ 48.590,00 
 R$ 39.850,00  R$ 49.590,00 
 R$ 40.850,00  R$ 50.590,00 
 R$ 41.850,00  R$ 51.590,00 
 R$ 42.850,00  R$ 52.590,00

And when I do pd.cut() in the column of Jul/16...

quartis = pd.cut(teste['Jul/16'], 4)

I have the following rooms:

[(33841.0, 36100.0] < (36100.0, 38350.0] < (38350.0, 40600.0] < (40600.0, 42850.0]]

But when I use the groupby with the quartiles found, to apply descriptive functions in the column Aug/16, I have strange results:

                      min    max     mean  count
Jul/16                                          
(33841.0, 36100.0]  33850  52000  44007.5      4
(36100.0, 38350.0]  46590  47590  47090.0      2
(38350.0, 40600.0]  48590  49590  49090.0      2
(40600.0, 42850.0]  50590  52590  51590.0      3

Like the maximum for the first group, which goes from 33841 to 36100, is holding a value of 52000? It’s like the groups are being ignored, they can help me?

  • 1

    The answer is: You are setting up the quartiles based on July, but using the August values. Within the first quarter of July (33841 to 36100), you have, in the August column, the value 53000. I hope to have been able to explain.

  • I get it now, he’s not filtering Aug/16 like I thought he would.. You know how to do that? For example, I have this categorical Jul/16 data with ranges of values, and I would like Aug/16 to follow these ranges, there is some way to do this?

  • I don’t understand what you want to do. You can update the post with the desired output for what you need?

  • For example, I found the intervals for Jul/16, correct? Now, I wanted to filter Aug/16 with the same intervals, that is, if the first interval is this: (33841.0, 36100.0], I wanted him to give me a minimum, maximum and counting, of the numbers of Aug/16, that fit the proposed interval. In this case, it would be only 1 number, the R $ 33.850,00, since all others are larger than the proposed intervention. In this case the count would be 1, and the minimum and maximum would be equal. I could explain better?

1 answer

0

I believe the solution may be using the method between

Creating Dataframe

df = pd.DataFrame({"jul": [34252, 33850, 34850, 35850, 36850, 37850, 38850, 39850, 40850, 41850, 42850], "ago": [52000, 33850, 44590, 45590, 46590, 47590, 48590, 49590, 50590, 51590, 52590]})

Generating quartiles

quartis = pd.cut(df['jul'], 4)

Printing Dataframe based if August is inside the quartile

for quartil in set(quartis):
    print(df[df['ago'].between(quartil.left, quartil.right)])
    print(80*"-")

For the example of data that was passed in question, the output would be:

Empty DataFrame
Columns: [jul, ago]
Index: []
--------------------------------------------------------------------------------
Empty DataFrame
Columns: [jul, ago]
Index: []
--------------------------------------------------------------------------------
     jul    ago
1  33850  33850
--------------------------------------------------------------------------------
Empty DataFrame
Columns: [jul, ago]
Index: []
--------------------------------------------------------------------------------

Browser other questions tagged

You are not signed in. Login or sign up in order to post.