NOTE: The solution presented below will NOT work if a value is repeated (see the second example at the end).
Defining the DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame([['A',1,100],['B',2,None],['C',3,None],['D',4,182],['E',5,None]], columns=['A','B','C'])
Creating a new column D by copying column C and replacing each NaN with the last valid observation (forward fill):
>>> df['D'] = df['C'].ffill()
>>> df
A B C D
0 A 1 100.0 100.0
1 B 2 NaN 100.0
2 C 3 NaN 100.0
3 D 4 182.0 182.0
4 E 5 NaN 182.0
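In recent pandas versions fillna(method='ffill') is deprecated in favour of Series.ffill(), which does the same forward fill. A minimal self-contained check that it reproduces column D above (the DataFrame is rebuilt from a dict here, an assumption made only to keep the snippet standalone):

```python
import pandas as pd

# Same sample data as above, built column by column
df = pd.DataFrame({
    'A': list('ABCDE'),
    'B': [1, 2, 3, 4, 5],
    'C': [100, None, None, 182, None],  # None becomes NaN in a float column
})

# ffill() propagates the last valid value forward over the NaNs
df['D'] = df['C'].ffill()
print(df['D'].tolist())  # [100.0, 100.0, 100.0, 182.0, 182.0]
```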
Calculating the difference between each row and the previous one:
>>> df['diferenca'] = df['D'].diff()
Result:
>>> df
A B C D diferenca
0 A 1 100.0 100.0 NaN
1 B 2 NaN 100.0 0.0
2 C 3 NaN 100.0 0.0
3 D 4 182.0 182.0 82.0
4 E 5 NaN 182.0 0.0
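Series.diff() subtracts the previous row from each row, so the first row, which has no predecessor, is always NaN and every forward-filled row gives 0. A minimal check on the filled values from column D:

```python
import pandas as pd

# Column D after the forward fill
d = pd.Series([100.0, 100.0, 100.0, 182.0, 182.0])

# diff() computes d[i] - d[i - 1]; row 0 has nothing to subtract, so it is NaN
diffs = d.diff()
print(diffs.tolist())  # [nan, 0.0, 0.0, 82.0, 0.0]
```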
Keeping only the actual changes by replacing the zeros with NaN:
>>> import numpy as np
>>> df['diferenca'] = df['diferenca'].replace(0, np.nan)
>>> df
A B C D diferenca
0 A 1 100.0 100.0 NaN
1 B 2 NaN 100.0 NaN
2 C 3 NaN 100.0 NaN
3 D 4 182.0 182.0 82.0
4 E 5 NaN 182.0 NaN
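The three steps above (forward fill, diff, replace 0 with NaN) can be wrapped in one helper. This is a minimal sketch; the function name changes_only is hypothetical and not part of the original answer:

```python
import numpy as np
import pandas as pd


def changes_only(values: pd.Series) -> pd.Series:
    """Hypothetical helper: forward fill, diff, then hide zero differences."""
    filled = values.ffill()   # last valid observation carried forward
    diffs = filled.diff()     # difference to the previous row
    # A zero means "same as the previous valid value", so it is hidden as NaN
    return diffs.replace(0, np.nan)


df = pd.DataFrame({
    'A': list('ABCDE'),
    'B': [1, 2, 3, 4, 5],
    'C': [100, None, None, 182, None],
})
df['diferenca'] = changes_only(df['C'])
print(df)
```

Assigning the result back to the column (instead of calling replace with inplace=True on df['diferenca']) avoids relying on chained in-place modification, which newer pandas versions no longer propagate to the DataFrame.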
Returning to the initial note, with a repeated value (182 in rows 0 and 3):
>>> df = pd.DataFrame([['A',1,182],['B',2,None],['C',3,None],['D',4,182],['E',5,None]], columns=['A','B','C'])
>>> df
A B C
0 A 1 182.0
1 B 2 NaN
2 C 3 NaN
3 D 4 182.0
4 E 5 NaN
>>> df['D'] = df['C'].ffill()
>>> df
A B C D
0 A 1 182.0 182.0
1 B 2 NaN 182.0
2 C 3 NaN 182.0
3 D 4 182.0 182.0
4 E 5 NaN 182.0
>>> df['diferenca'] = df['D'].diff()
>>> df
A B C D diferenca
0 A 1 182.0 182.0 NaN
1 B 2 NaN 182.0 0.0
2 C 3 NaN 182.0 0.0
3 D 4 182.0 182.0 0.0
4 E 5 NaN 182.0 0.0
>>> df['diferenca'] = df['diferenca'].replace(0, np.nan)
>>> df
A B C D diferenca
0 A 1 182.0 182.0 NaN
1 B 2 NaN 182.0 NaN
2 C 3 NaN 182.0 NaN
3 D 4 182.0 182.0 NaN
4 E 5 NaN 182.0 NaN
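In this case there would be no results: the real difference in row 3 is also 0, so it is replaced with NaN just like the forward-filled rows.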
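One possible workaround, offered here as a suggestion rather than as part of the original answer: instead of replacing every 0, keep the difference only on rows where C originally had a value, using Series.where with a notnull() mask. A minimal sketch on the repeated-value data:

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('ABCDE'),
    'B': [1, 2, 3, 4, 5],
    'C': [182, None, None, 182, None],
})

# Difference against the last valid value, exactly as before
diffs = df['C'].ffill().diff()

# Keep a value only where C was not null; filled rows become NaN,
# so a genuine repeat shows up as 0.0 instead of disappearing
df['diferenca'] = diffs.where(df['C'].notnull())
print(df)
```

Row 3 now holds 0.0, which distinguishes "the value was observed again and did not change" from "no observation in this row".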
I believe using df['C'].notnull() would be more intuitive than negating isnull(). You could also avoid using merge by assigning diff_series directly to a new column of the DataFrame, since pandas aligns them on the index (which has been preserved), something like: df['C diff'] = diff_series. In fact, that could be an answer of its own :) – Terry