How do I add a column derived from another column in an Excel spreadsheet using pandas?

Viewed 125 times

4

I have a spreadsheet containing import data from several sources. The problem is always the format in which the data arrives. For example, I have a column called "Imported Goods" with values like the one below:

1 X 40 CONTAINERS CONTAINING 40 BAGS OF FLUTRIAFOL TECNICO SINON FLUTRIAFOL 97% TECH

From this description, I know that the imported product in this case is Flutriafol. In Excel, I used the formula below to read the values of this column and return only the product in another column:

IF(ISNUMBER(SEARCH("FLUTRIAFOL", colunaX)), "Flutriafol", "Não identificado")

It worked well, but I now want to do this in Python, because it seems more suitable when there are many possible values (and it is also the only tool I have available today).

I already know how to load my dataset in Python and import pandas to help with the processing, but I have no idea which function or command I could use to find a value in one column and return it in another.

3 answers

1

To create a column based on another, follow the steps below:

Create the DataFrame

>>> import pandas as pd

>>> df = pd.DataFrame({"compras": ["1 X 40 CONTAINERS CONTAINING 40 BAGS OF FLUTRIAFOL TECNICO SINON FLUTRIAFOL 97% TECH", "OUTRA COMPRA QUALQUER"]})

>>> df
                                             compras
0  1 X 40 CONTAINERS CONTAINING 40 BAGS OF FLUTRI...
1                              OUTRA COMPRA QUALQUER

New column

>>> df["nova_coluna"] = df.apply(lambda x: "FLUTRIAFOL" if "FLUTRIAFOL" in x["compras"] else "Não identificado", axis=1)

>>> df
                                             compras       nova_coluna
0  1 X 40 CONTAINERS CONTAINING 40 BAGS OF FLUTRI...        FLUTRIAFOL
1                              OUTRA COMPRA QUALQUER  Não identificado

The lambda function could be replaced by an explicitly defined function, as below:

>>> def eh_fluriafol(row):
...     if "FLUTRIAFOL" in row["compras"]:
...         return "FLUTRIAFOL" 
...     else:
...         return "Não identificado"

>>> df["nova_coluna1"] = df.apply(eh_fluriafol, axis=1)

>>> df
                                             compras       nova_coluna      nova_coluna1
0  1 X 40 CONTAINERS CONTAINING 40 BAGS OF FLUTRI...        FLUTRIAFOL        FLUTRIAFOL
1                              OUTRA COMPRA QUALQUER  Não identificado  Não identificado
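The question mentions that there are many possible values. Here is a minimal sketch of how the explicit function above might be extended to several products. The list `PRODUTOS` and the function name `identifica_produto` are hypothetical, and every product name other than FLUTRIAFOL is made up for illustration. The function is applied to the column itself rather than row by row:

```python
import pandas as pd

# Hypothetical product list; only FLUTRIAFOL comes from the question.
PRODUTOS = ["FLUTRIAFOL", "GLIFOSATO", "ATRAZINA"]

def identifica_produto(descricao):
    """Return the first known product found in the description."""
    texto = str(descricao).upper()
    for produto in PRODUTOS:
        if produto in texto:
            return produto
    return "Não identificado"

df = pd.DataFrame({"compras": [
    "1 X 40 CONTAINERS CONTAINING 40 BAGS OF FLUTRIAFOL TECNICO SINON FLUTRIAFOL 97% TECH",
    "OUTRA COMPRA QUALQUER",
]})

# Series.apply calls the function once for each value in the column.
df["produto"] = df["compras"].apply(identifica_produto)
print(df["produto"].tolist())
```

If a description mentions more than one listed product, this sketch returns whichever comes first in `PRODUTOS`, so the order of the list matters.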

I hope it helps

0

@imonferrari, this is the code I’m using:

import pandas as pd

import numpy as np

#import the data source

database = pd.read_csv("https://raw.githubusercontent.com/CaiqueBarrreto/agroci/main/basePY.csv?token=ALHIP5DQQC23APAHTA2VKNTADKLSK")

#isolate the column of interest in the csv spreadsheet

database_goodshipped = database.iloc[:, [26]]

#database_goodshipped

#turn the isolated column into a list

content = [database_goodshipped]

#run np.where

df = pd.DataFrame({"dados_de_compra": content})

df['variavel_resposta'] = np.where(df['dados_de_compra'].str.contains('FLUTRIAFOL'), 'FLUTRIAFOL', 'NID')

df

In the output, I end up with only one row.

Thanks for the help :)

  • database = pd.read_csv("https://raw.githubusercontent.com/CaiqueBarrreto/agroci/main/basePY.csv?token=ALHIP5DQQC23APAHTA2VKNTADKLSK")
database['variavel_resposta'] = np.where(database['Goods Shipped'].str.contains('FLUTRIAFOL'), 'FLUTRIAFOL', 'NID')
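A likely explanation for the single row is that `[database_goodshipped]` wraps the whole DataFrame in a list with one element, instead of producing one element per row. The sketch below shows the difference on a small made-up DataFrame; the data and the column position `0` are assumptions that stand in for the real CSV:

```python
import pandas as pd

# Hypothetical stand-in for the CSV; the real file and column position differ.
database = pd.DataFrame({
    "Goods Shipped": ["40 BAGS OF FLUTRIAFOL", "OUTRA COMPRA", "MAIS UMA COMPRA"],
})

coluna = database.iloc[:, [0]]   # a list indexer returns a DataFrame, not a Series
content = [coluna]               # a list holding ONE object: the whole DataFrame
print(type(coluna).__name__, len(content))

# Converting the column's values gives one list item per row instead.
valores = database["Goods Shipped"].tolist()
print(len(valores))
```

In practice the intermediate list is not needed at all: the new column can be assigned directly on `database`, as in the comment above.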

0

Importing the libs

import pandas as pd
import numpy as np

Creating the test df

conteudo = ['1 X 40 CONTAINERS 40 BAGS OF FLUTRIAFOL TECNICO SINON FLUTRIAFOL 97% TECH',
            '1 X 20 CONTAINERS 20 BAGS OF BLABLABLA TECNICO SINON BLABLABLA TECH',
            '1 X 10 CONTAINERS 10 BAGS OF BLABLABLA TECNICO SINON BLABLABLA TECH']


df = pd.DataFrame({"Dados_de_compra": conteudo })

Using np.where: rows where the word FLUTRIAFOL appears get 'FLUTRIAFOL', and all other rows get 'Não identificado'.

df['Variavel_Resposta'] = np.where(df['Dados_de_compra'].str.contains('FLUTRIAFOL'), 
                                   'FLUTRIAFOL','Não identificado')

Output

                                     Dados_de_compra    Variavel_Resposta
0   1 X 40 CONTAINERS 40 BAGS OF FLUTRIAFOL TECNIC...   FLUTRIAFOL
1   1 X 20 CONTAINERS 20 BAGS OF BLABLABLA TECNICO...   Não identificado
2   1 X 10 CONTAINERS 10 BAGS OF BLABLABLA TECNICO...   Não identificado
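One difference from the Excel formula is worth noting: Excel's SEARCH ignores letter case, while `str.contains` is case-sensitive by default. Depending on the pandas version and column dtype, a missing description can also produce a missing result rather than False. The sketch below is a variation on the code above that sets `case=False` and `na=False` explicitly; the extra sample rows are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Dados_de_compra": [
    "40 BAGS OF FLUTRIAFOL TECNICO",
    "20 bags of flutriafol 97% tech",   # lower-case description
    None,                                # missing description
    "10 BAGS OF BLABLABLA",
]})

# case=False ignores letter case; na=False treats missing values as "no match".
mask = df["Dados_de_compra"].str.contains("FLUTRIAFOL", case=False, na=False)
df["Variavel_Resposta"] = np.where(mask, "FLUTRIAFOL", "Não identificado")
print(df)
```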

Update

import pandas as pd
import numpy as np

database = pd.read_csv("https://raw.githubusercontent.com/CaiqueBarrreto/agroci/main/basePY.csv?token=ALHIP5DQQC23APAHTA2VKNTADKLSK")
database['variavel_resposta'] = np.where(database['Goods Shipped'].str.contains('FLUTRIAFOL'), 'FLUTRIAFOL', 'NID')
  • @imonferrari, I followed the np.where approach and adapted it to my situation, but for some reason it does not behave like yours. In my case, I have a CSV file: I load it into a DataFrame, extract one of its columns and turn it into a list, then continue as you indicated. The output shows only one row: dados_de_compra variavel_resposta 0 Good... FLUTRIAFOL

  • @Caiquebarreto, are you turning the column into a list? Could you post the code you used here?

  • @imonferrari, if I post the code as an answer to my own question, can you see it? In a comment the code becomes unreadable, haha. I have posted it as an answer to the question.
