Faster ways to fill in one column based on the value of another

I have a DataFrame with numerous columns, but for this question the important ones are:

  • ID (number) of the report
  • Product
  • Event

Example:

id_relato  event       product
456        edema       medication1
456        itching
456        sleepiness
789        erythema    medication2
789        dizziness

A single product report may contain more than one event and may therefore span more than one row. However, the product is filled in only on the first row of each report, so I wrote a for loop to copy the product name into the remaining rows of the product column.

# Slow: filters the whole DataFrame again for every row's id
for i in relatos['id_relato']:
    relatos.loc[relatos['id_relato'] == i, 'produto'] = list(relatos.loc[relatos['id_relato'] == i]['produto'].unique())[0]

Result:

id_relato  event       product
456        edema       medication1
456        itching     medication1
456        sleepiness  medication1
789        erythema    medication2
789        dizziness   medication2

I get the expected result, but on a larger DataFrame the processing is very slow. Are there better-performing alternatives to the for loop?

1 answer

Assuming the missing product fields are empty strings (""), follow the steps below:

Creating a DataFrame for testing

import pandas as pd

df = pd.DataFrame({
    "id_relato": [456, 456, 456, 789, 789],
    "evento": ["edema", "prurido", "sonolência", "eritema", "tontura"],
    "produto": ["medicamento1", "", "", "medicamento2", ""],
})

df

   id_relato      evento       produto
0        456       edema  medicamento1
1        456     prurido
2        456  sonolência
3        789     eritema  medicamento2
4        789     tontura

Replacing the empty strings with NaN

import numpy as np

df["produto"] = df["produto"].replace("", np.nan)

df

   id_relato      evento       produto
0        456       edema  medicamento1
1        456     prurido           NaN
2        456  sonolência           NaN
3        789     eritema  medicamento2
4        789     tontura           NaN
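
If some of the "empty" cells might actually hold only whitespace (an assumption; the question does not say so), a regular-expression replace would catch both cases. A minimal sketch on a hypothetical frame named df_ws:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one cell is "" and another contains only spaces.
df_ws = pd.DataFrame({
    "id_relato": [456, 456, 456],
    "produto": ["medicamento1", "", "   "],
})

# ^\s*$ matches empty strings as well as whitespace-only strings.
df_ws["produto"] = df_ws["produto"].replace(r"^\s*$", np.nan, regex=True)

# Count how many cells are now treated as missing.
n_missing = int(df_ws["produto"].isna().sum())
```

Cells that contain real text are left untouched, so this should be a safe substitute for the exact-match replace when the data quality is uncertain.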

Forward-filling the missing values with ffill

# fillna(method="ffill") is deprecated in recent pandas versions; ffill() is the direct equivalent
df = df.ffill()

df
   id_relato      evento       produto
0        456       edema  medicamento1
1        456     prurido  medicamento1
2        456  sonolência  medicamento1
3        789     eritema  medicamento2
4        789     tontura  medicamento2
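
One caveat: a plain forward fill does not look at id_relato, so a report with no product on any of its rows would inherit the product of the report above it. If that case can occur in your data (an assumption; the example does not show it), the fill can be restricted to each report with groupby. A sketch on a hypothetical frame named df_g, in which report 789 has no product at all:

```python
import numpy as np
import pandas as pd

# Hypothetical data: report 789 never names a product.
df_g = pd.DataFrame({
    "id_relato": [456, 456, 789, 789, 999, 999],
    "evento": ["edema", "prurido", "eritema", "tontura", "edema", "tontura"],
    "produto": ["medicamento1", "", "", "", "medicamento3", ""],
})

# Mark empty strings as missing, as in the steps above.
df_g["produto"] = df_g["produto"].replace("", np.nan)

# Forward-fill inside each report only; values never cross report boundaries.
df_g["produto"] = df_g.groupby("id_relato")["produto"].ffill()
```

Here the rows of report 789 stay NaN instead of being filled with medicamento1. If the product may appear on a row other than the first one of a report, df_g.groupby("id_relato")["produto"].transform("first") is another option, since the "first" aggregation skips missing values by default.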
