Include the frequency table in the DataFrame (range of values)

Hello!

I have a DataFrame with several prices. I calculated the number of classes with Sturges' rule and would like to add a column showing which range each price falls into.

The expected output would be something like:

       house        price           range
        A       580000.0        67374.999 - 583333.333
        B       1700000.0       1600000.0 - 2108333.333
        C       600000.0        583333.333 - 1091666.667

This is my frequency table:

price                        frequency  percentage
(67374.999, 583333.333]        14883    68.861333
(583333.333, 1091666.667]      5499     25.443020
(1091666.667, 1600000.0]       805      3.724610
(1600000.0, 2108333.333]       238      1.101189
(2108333.333, 2616666.667]     106      0.490446
(2616666.667, 3125000.0]       44       0.203581
(3125000.0, 3633333.333]       18       0.083283
(3633333.333, 4141666.667]     9        0.041642
(4141666.667, 4650000.0]       3        0.013881
(4650000.0, 5158333.333]       2        0.009254
(5158333.333, 5666666.667]     3        0.013881
(5666666.667, 6175000.0]       0        0.000000
(6175000.0, 6683333.333]       0        0.000000
(6683333.333, 7191666.667]     2        0.009254
(7191666.667, 7700000.0]       1        0.004627

How did I calculate:

import numpy as np
import pandas as pd

## Sturges' rule: k = 1 + (10/3) * log10(n)

n = df.shape[0]
k = int(round(1 + (10/3) * np.log10(n)))
k ## number of classes (bins)

## k == 15
## Frequency table

## count
frequency = pd.cut(x = df.price, bins = k, include_lowest = True).value_counts(
    sort = False)

## percentage
percentage = pd.cut(x = df.price, bins = k, include_lowest = True).value_counts(
    sort = False, normalize = True) * 100

## Formatting Frequency Table
frequency_table = ({'frequency' : frequency, 'percentage' : percentage})

frequency_table = pd.DataFrame(frequency_table)

frequency_table.rename_axis('price', axis = 'columns', inplace = True)
frequency_table
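As a quick check of the Sturges step, the number of classes can be recomputed from the frequency table itself. This is only a sketch: it assumes the table above covers every row of df, so that n is the sum of the frequency column.

```python
import numpy as np

# Frequencies copied from the frequency table in the question.
frequencies = [14883, 5499, 805, 238, 106, 44, 18, 9, 3, 2, 3, 0, 0, 2, 1]

# Assumption: the table covers the whole DataFrame, so n is the total count.
n = sum(frequencies)

# Same formula as in the question: k = 1 + (10/3) * log10(n)
k = int(round(1 + (10 / 3) * np.log10(n)))

print(n, k)              # 21613 15
print(len(frequencies))  # 15 classes in the table
```

The result matches the 15 intervals shown in the table.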

What keeps me from using other methods is that I can't work out whether price is actually a column:

frequency_table.columns

Out[ ]:

Index(['frequency', 'percentage'], dtype='object', name='price')
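A minimal sketch of why price appears in that output without being a column: rename_axis with axis='columns' only gives a name to the columns axis, while the intervals produced by pd.cut sit in the index. The prices below are made-up sample data, not the data from the question.

```python
import pandas as pd

# Made-up prices, only for illustration.
prices = pd.Series([100.0, 200.0, 300.0, 400.0, 500.0, 600.0], name="price")

# Bin into 3 equal-width classes and count per class, keeping the bin order.
binned = pd.cut(prices, bins=3, include_lowest=True)
frequency = binned.value_counts(sort=False)

frequency_table = pd.DataFrame({"frequency": frequency})
frequency_table = frequency_table.rename_axis("price", axis="columns")

# 'price' is only the NAME of the columns axis, not a column itself.
print(frequency_table.columns.name)        # price
print("price" in frequency_table.columns)  # False

# The intervals are stored in the index as pandas Interval objects.
print(type(frequency_table.index[0]).__name__)  # Interval
```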

Sorry for the long question, I've been trying for a few days to find a way to do this. Thank you!




I found the answer by looking in the documentation. Here is my solution in case someone runs into the same situation:

I reset the index. When I turned the variables into a DataFrame, the intervals calculated by the cut function became the index, which is why I could not see them as a column (a careless oversight on my part, but ok).

frequency_table.reset_index(inplace=True)

Then I renamed the column:

frequency_table.rename(columns={'index' : 'range'}, inplace = True)
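One caveat: the column created by reset_index is called 'index' only when the index has no name. Depending on the pandas version, the index may already carry the name of the binned Series, and then the rename above would silently do nothing. A sketch that avoids depending on that by naming the index before resetting it (the sample prices are made up):

```python
import pandas as pd

# Made-up prices, only for illustration.
prices = pd.Series([100.0, 200.0, 300.0, 400.0, 500.0, 600.0], name="price")
binned = pd.cut(prices, bins=3, include_lowest=True)

frequency_table = pd.DataFrame({
    "frequency": binned.value_counts(sort=False),
    "percentage": binned.value_counts(sort=False, normalize=True) * 100,
})

# Name the index explicitly, then turn it into a regular column.
frequency_table = frequency_table.rename_axis(index="range").reset_index()

print(frequency_table.columns.tolist())  # ['range', 'frequency', 'percentage']
```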

Finally, I added the range to the original DataFrame.

The quickest and simplest way:

df['range_price'] = pd.cut(x = df.price, bins = k, include_lowest = True)

where k is the number of classes from the Sturges calculation.
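Putting it together, a small end-to-end sketch on made-up data. The column names house and price, and the extra columns range_frequency and range_label, are assumptions for illustration only. It assigns each price its interval, attaches the number of rows in the same interval, and builds a "lower - upper" text label like the one in the expected output.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the column names 'house' and 'price' are assumptions.
df = pd.DataFrame({
    "house": list("ABCDEFGH"),
    "price": [580000.0, 1700000.0, 600000.0, 75000.0,
              950000.0, 2400000.0, 310000.0, 7700000.0],
})

# Sturges' rule as used in the question: k = 1 + (10/3) * log10(n)
n = df.shape[0]
k = int(round(1 + (10 / 3) * np.log10(n)))

# Interval (class) that each price falls into.
df["range_price"] = pd.cut(x=df["price"], bins=k, include_lowest=True)

# Number of rows that fall into the same interval as each row.
df["range_frequency"] = (
    df.groupby("range_price", observed=True)["price"].transform("count")
)

# Optional: the interval as plain text, "lower - upper".
df["range_label"] = [f"{iv.left} - {iv.right}" for iv in df["range_price"]]

print(df)
```

With groupby and transform the frequency lands directly on every row, so no merge with the separate frequency table is needed.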
