Hello!
I have a DataFrame with several prices. I calculated the number of bins using Sturges' rule and would like to add a column showing which range each price falls into.
The expected output would be something like:
casa preco intervalo
A 580.000.000 67.374.999 - 583.333.333
B 1.700.000.0 1.600.000.0 - 2.108.333.333
C 600.000.000 67.374.999 - 583.333.333
This is my frequency table:
price                        frequency  percentage
(67374.999, 583333.333]          14883   68.861333
(583333.333, 1091666.667]         5499   25.443020
(1091666.667, 1600000.0]           805    3.724610
(1600000.0, 2108333.333]           238    1.101189
(2108333.333, 2616666.667]         106    0.490446
(2616666.667, 3125000.0]            44    0.203581
(3125000.0, 3633333.333]            18    0.083283
(3633333.333, 4141666.667]           9    0.041642
(4141666.667, 4650000.0]             3    0.013881
(4650000.0, 5158333.333]             2    0.009254
(5158333.333, 5666666.667]           3    0.013881
(5666666.667, 6175000.0]             0    0.000000
(6175000.0, 6683333.333]             0    0.000000
(6683333.333, 7191666.667]           2    0.009254
(7191666.667, 7700000.0]             1    0.004627
How I calculated it:
import numpy as np
import pandas as pd

## Sturges' rule: k = 1 + (10/3) * log10(n)
n = df.shape[0]
k = int(round(1 + (10/3) * np.log10(n)))
k  ## number of bins
## k == 15 (n = 21613, matching the 15 intervals in the table above)
## Frequency table
## count
frequency = pd.cut(
    x = df.price, bins = k, include_lowest = True).value_counts(sort = False)
## percentage
percentage = pd.cut(
    x = df.price, bins = k, include_lowest = True).value_counts(sort = False, normalize = True) * 100
## Building the frequency table
frequency_table = ({'frequency' : frequency, 'percentage' : percentage})
frequency_table = pd.DataFrame(frequency_table)
frequency_table.rename_axis('price', axis = 'columns', inplace = True)
frequency_table
What keeps me from using other methods is that I can't understand why price is not recognized as a column:
frequency_table.columns
Out[ ]:
Index(['frequency', 'percentage'], dtype='object', name='price')
Sorry for the length of the question. I've been trying for a few days to find a way to do this. Thank you!
I found the answer by reading the documentation. Here is my solution in case someone runs into the same situation:
I reset the index. When the two Series were combined into a DataFrame, the intervals computed by the cut function became the index. That is why I could not see them as a column (a careless oversight on my part, but OK).
frequency_table.reset_index(inplace=True)
I renamed the column:
frequency_table.rename(columns={'index' : 'range'}, inplace = True)
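The two steps above can be tried on a small table. This is only a sketch with made-up prices standing in for df.price. It names the index directly with rename_axis, which is an alternative to renaming afterwards, so it does not depend on what name reset_index gives the new column (that can differ between pandas versions).

```python
import pandas as pd

# Hypothetical sample data standing in for the real df.price
df = pd.DataFrame({"price": [100.0, 250.0, 400.0, 900.0, 950.0, 1000.0]})

binned = pd.cut(df["price"], bins=3, include_lowest=True)
frequency = binned.value_counts(sort=False)
percentage = binned.value_counts(sort=False, normalize=True) * 100

frequency_table = pd.DataFrame({"frequency": frequency, "percentage": percentage})

# The intervals live in the index, so they are not listed in .columns
print(list(frequency_table.columns))

# Name the index (not the columns axis), then turn it into a regular column
frequency_table = frequency_table.rename_axis("range").reset_index()
print(list(frequency_table.columns))
```

After the last step, range is an ordinary column and can be selected like any other.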
Then I added it to the original DataFrame.
A quicker and simpler way is to assign the result of cut directly:
df['range_price'] = pd.cut(x = df.price, bins = k, include_lowest = True)
where k is the number of bins from the Sturges calculation.
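For anyone who wants to run it end to end, here is a minimal sketch. The data and the column names casa, price and range_text are invented for illustration. The last step, which formats each interval as "lower - upper" text, is an optional addition and assumes no price is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real df has many more rows
df = pd.DataFrame({
    "casa": ["A", "B", "C", "D"],
    "price": [580_000.0, 1_700_000.0, 600_000.0, 67_500.0],
})

# Same Sturges calculation as in the question
n = df.shape[0]
k = int(round(1 + (10 / 3) * np.log10(n)))

# pd.cut returns one interval per row, aligned with df's index
df["range_price"] = pd.cut(x=df.price, bins=k, include_lowest=True)

# Optional: plain-text version of each interval (assumes no NaN prices)
df["range_text"] = [f"{iv.left} - {iv.right}" for iv in df["range_price"]]
print(df)
```

Because the intervals are assigned straight to df, no frequency table or index reset is needed for this part.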
Can you create a minimal, complete, and verifiable example? That greatly increases your chances of getting an answer.
– Terry