Hello!
I have a DataFrame with several prices. I calculated the number of bins (Sturges' rule) and would like to add a column showing which range each price falls into.
The expected output would be something like:
       casa         preco               intervalo       
        A       580.000.000     67.374.999 - 583.333.333
        B       1.700.000.0     1.600.000.0 - 2.108.333.333
        C       600.000.000     67.374.999 - 583.333.333
This is my frequency table:
price                        frequency  percentage
(67374.999, 583333.333]        14883    68.861333
(583333.333, 1091666.667]      5499     25.443020
(1091666.667, 1600000.0]       805      3.724610
(1600000.0, 2108333.333]       238      1.101189
(2108333.333, 2616666.667]     106      0.490446
(2616666.667, 3125000.0]       44       0.203581
(3125000.0, 3633333.333]       18       0.083283
(3633333.333, 4141666.667]     9        0.041642
(4141666.667, 4650000.0]       3        0.013881
(4650000.0, 5158333.333]       2        0.009254
(5158333.333, 5666666.667]     3        0.013881
(5666666.667, 6175000.0]       0        0.000000
(6175000.0, 6683333.333]       0        0.000000
(6683333.333, 7191666.667]     2        0.009254
(7191666.667, 7700000.0]       1        0.004627
This is how I calculated it:
import numpy as np
import pandas as pd

## Sturges' rule: k = 1 + (10/3) * log10(n)
n = df.shape[0]
k = int(round(1 + (10/3) * np.log10(n)))
k  ## number of bins
## k == 15
## Frequency table
## count
frequency = pd.value_counts(
    pd.cut(x = df.price, bins = k, include_lowest = True), sort = False)
## percentage
percentage = pd.value_counts(
    pd.cut(x = df.price, bins = k, include_lowest = True), sort = False, normalize = True) * 100
## Formatting Frequency Table
frequency_table = ({'frequency' : frequency, 'percentage' : percentage})
frequency_table = pd.DataFrame(frequency_table)
frequency_table.rename_axis('price', axis = 'columns', inplace = True)
frequency_table
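As a quick sanity check of the bin count, here is a minimal sketch that applies the same formula to n = 21613. That value is the sum of the frequency column in the table above, so it is an inference from the table rather than a number stated in the post. The function name `sturges_bins` is hypothetical.

```python
import numpy as np


def sturges_bins(n: int) -> int:
    """Bin count from the base-10 form of Sturges' rule: 1 + (10/3) * log10(n)."""
    return int(round(1 + (10 / 3) * np.log10(n)))


# 21613 is the sum of the 'frequency' column shown above
# (assumed here to equal df.shape[0]).
k = sturges_bins(21613)
print(k)
```

Under that assumption the result agrees with the 15 intervals listed in the frequency table.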
What stops me from trying other approaches is that I can't tell whether price is actually a column:
frequency_table.columns
Out[ ]:
Index(['frequency', 'percentage'], dtype='object', name='price')
Sorry for the long question; I've been trying for a few days to find a way to do this. Thank you!
I found the answer by looking in the documentation. Here is my solution, in case someone runs into the same situation:
I reset the index. When I turned the variables into a DataFrame, the intervals calculated by the cut function became the index. That is why I could not see them as a column (a silly oversight, but fine).
frequency_table.reset_index(inplace=True)
I renamed the column:
frequency_table.rename(columns={'index' : 'range'}, inplace = True)
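To illustrate why the intervals were not reachable as a column, here is a small sketch on made-up prices. The data and the names `prices`, `table`, and `range` are illustrative only. It calls `rename_axis("range")` before `reset_index()` so that the new column gets a predictable name, since the name the index carries after `value_counts` can differ between pandas versions.

```python
import pandas as pd

# Made-up prices, purely for illustration.
prices = pd.Series([100.0, 150.0, 220.0, 480.0, 510.0, 990.0], name="price")

# Counting the binned values: the intervals become the index of the result.
binned = pd.cut(prices, bins=3, include_lowest=True)
frequency = binned.value_counts(sort=False)
table = pd.DataFrame({"frequency": frequency})

print(table.columns.tolist())  # the intervals are not among the columns
print(table.index.dtype)       # they live in the (categorical) index instead

# Name the index, then move it into a regular column.
table = table.rename_axis("range").reset_index()
print(table.columns.tolist())
```

After the last step the intervals can be selected like any other column, for example with `table["range"]`.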
To include the range in the original DataFrame, there is a quicker and simpler way:
df['range_price'] = pd.cut(x = df.price, bins = k, include_lowest = True)
where k is the number of bins from the Sturges calculation.
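Putting the pieces together, a minimal end-to-end sketch on made-up data might look like the following. The sample prices and the `intervalo` column are assumptions: the string formatting is just one way to approximate the "low - high" layout shown in the question, not something the original solution includes.

```python
import numpy as np
import pandas as pd

# Made-up sample, purely for illustration.
df = pd.DataFrame({
    "casa": ["A", "B", "C", "D"],
    "price": [580_000.0, 1_700_000.0, 600_000.0, 90_000.0],
})

# Sturges' rule in the base-10 form used in the question.
k = int(round(1 + (10 / 3) * np.log10(df.shape[0])))

# Each row receives the interval that contains its price.
df["range_price"] = pd.cut(x=df["price"], bins=k, include_lowest=True)

# Optional: a plain-text "low - high" label built from the interval bounds.
df["intervalo"] = [
    f"{iv.left:,.0f} - {iv.right:,.0f}" for iv in df["range_price"]
]
print(df)
```

Keeping `range_price` as intervals preserves the numeric bounds for later filtering or grouping, while the text column is only for display.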
Can you create a minimal, complete, and verifiable example? That greatly increases your chances of receiving an answer.
– Terry