How to insert a row into a Pandas DataFrame between existing rows?


Viewed 7,854 times


I have an output of sensor data that has the following desired structure:

--- Beginning ---

$LAGM,Colar03,Yellow,32262,-31226,-5120,-104,40,190,1662.00,1670.00,236.00,MGAL
$GPGGA,113203.181,2026.6812,S,05443.4264,W,1,03,3.4,0.0,M,4.8,M,,0000*68
$GPGSA,A,2,07,23,30,,,,,,,,,,3.5,3.4,0.9*3E
$GPGSV,3,1,12,30,54,247,37,07,54,185,38,09,51,135,32,28,37,352,10*7E
$GPGSV,3,2,12,23,31,096,43,06,20,297,26,03,18,029,30,08,10,088,33*77
$GPGSV,3,3,12,02,10,263,24,05,05,218,28,16,00,146,,27,00,118,*70
$GPRMC,113203.181,A,2026.6812,S,05443.4264,W,000.0,000.0,241017,,,A*63
$GPVTG,000.0,T,,M,000.0,N,000.0,K,A*0D

--- End ---

The data is stored sequentially in a Pandas DataFrame, as a table of records. However, in some blocks the sensor simply failed to record part of the data, leaving gaps, as in the following block:

--- Beginning ---

$LAGM,Colar03,Yellow,6,27904,6144,332,-172,-216,1536,109,24,MGAL
$GPGGA,120025,0,N,0,E,0,0,0,0,M,0,MŽÆF¦F&Ö
$GPRMC,120025,V,0,N,0,E,0,0,280606,,,N*78
$GPVTG,0,T,,M,0,N,0,K,N*02

--- End ---

Note that information is missing and that non-ASCII characters appear (handling those is a separate problem). The main contribution of my master's work is the preprocessing, which consists of restoring these missing lines and filling them with the average of the neighboring values, so that the lost information is recovered.

In Excel this can be done by hand, as shown in the animated GIF below:

[Animated GIF: inserting a row]

There, the existing records are shifted down so that new data can be entered in the rows that were created, but I could not find a way to do the same thing in Pandas.

Is it possible to insert a row with new data in Pandas using some function, or to work around it in plain Python?

1 answer



Assuming we have this DataFrame:

import pandas as pd

d = {'nome': ['maria', 'Pedro', 'Mario'], 'idade': [30, 45, 36], 'estado': ['SP', 'BA', 'RJ']}
df = pd.DataFrame(data=d)

>>> print(df)
    nome  idade estado
0  maria     30     SP
1  Pedro     45     BA
2  Mario     36     RJ

Using this function, which I copied from here: https://stackoverflow.com/questions/24284342/insert-a-row-to-pandas-dataframe

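def inserir_linha(idx, df, df_inserir):
    # Split the DataFrame at position idx
    dfA = df.iloc[:idx]
    dfB = df.iloc[idx:]

    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([dfA, df_inserir, dfB]).reset_index(drop=True)

    return df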

This function splits the DataFrame in two (dfA and dfB) at a given position (idx) and then joins dfA, df_inserir and dfB, in that order, renumbering the index from 0.

Using the function:

d_iserido = {'nome': ['nome_iserido1', 'nome_2'], 'idade': [0, 100], 'estado': ['EXEMPLO', 'olaa']}
df_iserido = pd.DataFrame(data=d_iserido)
df = inserir_linha(1, df, df_iserido)

>>> print(df)
            nome  idade   estado
0          maria     30       SP
1  nome_iserido1      0  EXEMPLO
2         nome_2    100     olaa
3          Pedro     45       BA
4          Mario     36       RJ
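
For the use case in the question (filling a gap with the average of the neighboring records), a minimal sketch could look like the one below. It assumes every column is numeric and that the gap lies strictly inside the table. The helper name `inserir_media` and the sample values are hypothetical, made up for illustration only.

```python
import pandas as pd


def inserir_media(idx, df):
    """Insert, at position idx, a row holding the mean of rows idx-1 and idx.

    Hypothetical helper: assumes 0 < idx < len(df) and numeric columns only.
    """
    # Column-by-column mean of the two neighboring rows
    media = df.iloc[idx - 1:idx + 1].mean()
    # Turn the resulting Series into a one-row DataFrame
    nova_linha = media.to_frame().T
    # Same split-and-join idea as inserir_linha, written with pd.concat
    partes = [df.iloc[:idx], nova_linha, df.iloc[idx:]]
    return pd.concat(partes).reset_index(drop=True)


# Made-up sensor-like values, for illustration only
leituras = pd.DataFrame({"x": [10.0, 30.0, 50.0], "y": [1.0, 3.0, 5.0]})
resultado = inserir_media(1, leituras)
print(resultado)
```

Here the new row lands between the first and second records and holds x = 20.0 and y = 2.0, the midpoint of its neighbors. Text columns would need separate handling before taking the mean.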
  • Thank you Alexciuffa, that solved my problem very efficiently. Now I just need to write a loop that goes through all the data and calls this function. Thanks again!
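
The loop mentioned in the comment could be sketched roughly as follows. One detail to watch is that every insertion shifts the rows after it, so this sketch processes the positions from the highest to the lowest, which keeps the remaining positions valid. The names `inserir_varias` and `insercoes` are hypothetical, and the sketch assumes the positions refer to the original, unmodified DataFrame.

```python
import pandas as pd


def inserir_linha(idx, df, df_inserir):
    # Same idea as the function in the answer, written with pd.concat
    partes = [df.iloc[:idx], df_inserir, df.iloc[idx:]]
    return pd.concat(partes).reset_index(drop=True)


def inserir_varias(df, insercoes):
    """Apply several insertions.

    `insercoes` maps a position in the ORIGINAL df to the rows to insert there
    (hypothetical structure, chosen for illustration).
    """
    # Highest position first, so earlier insertions do not shift
    # the positions that are still waiting to be processed.
    for idx in sorted(insercoes, reverse=True):
        df = inserir_linha(idx, df, insercoes[idx])
    return df


df = pd.DataFrame({"nome": ["maria", "Pedro", "Mario"], "idade": [30, 45, 36]})
insercoes = {
    1: pd.DataFrame({"nome": ["novo_1"], "idade": [0]}),
    2: pd.DataFrame({"nome": ["novo_2"], "idade": [100]}),
}
resultado = inserir_varias(df, insercoes)
print(resultado["nome"].tolist())
```

For a large number of gaps, calling `pd.concat` once per insertion copies the data each time; collecting all the pieces in a list and concatenating once at the end would likely scale better.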
