np.where output returns only a single value

I am developing code to help me analyze import spreadsheets faster and better than the tools Excel offers.

Basically, I take a spreadsheet, convert it to CSV, and use pandas and numpy to analyze the data.

My analysis relies on np.where:

Libraries used

import pandas as pd
import numpy as np 

Loading the CSV dataset and isolating the column I'm interested in working with

database = pd.read_csv("https://raw.githubusercontent.com/CaiqueBarrreto/agroci/main/basePY.csv?token=ALHIP5CKNZG5LXZUVMQPAMLADLQGM")
database_goodshipped = database.iloc[:,[26]]

The critical step now is to turn column 26 of this dataset into a list and check the string in each row, returning a conditional value in another column:

conteudo = [database_goodshipped]
df = pd.DataFrame({"dados_de_compra": conteudo})
df['variavel_resposta'] = np.where(df['dados_de_compra'].str.contains('FLUTRIAFOL'),
                               'FLUTRIAFOL', 'NID')
df

Basically, I want it to return 'FLUTRIAFOL' in another column whenever that string appears in a row of the column I'm analyzing (which in this case was converted into the list 'conteudo').

I'm confused and don't know why the code isn't working for all rows: the list 'conteudo' I created has almost 3k items, and I'm sure more than one row contains 'FLUTRIAFOL'.

  • The link to the CSV isn't working.

  • For some reason it expires after a short time and I have to keep updating it. Here is another one: https://raw.githubusercontent.com/CaiqueBarrreto/agroci/main/basePY.csv?token=ALHIP5GRZSSWEF5TWSX4AX3ADLSPW

  • No access again.

  • Oops, sorry. I don't know yet how to keep it from expiring, Paulo. Thanks for your patience. I think it should work now; I made an adjustment on GitHub: https://raw.githubusercontent.com/CaiqueBarrreto/agroci/main/basePY.csv?token=ALHIP5HTMLLMRGZFZRHVIYTADLT6C

  • The error is 404: Not Found. Is the repository private or public?

  • It was private. I've made it public now. It should work this time: https://raw.githubusercontent.com/CaiqueBarrreto/agroci/main/basePY.csv

1 answer

Here is one way to do it.

Load the libraries

import pandas as pd
import numpy as np 

Create the DataFrame

database = pd.read_csv("https://raw.githubusercontent.com/CaiqueBarrreto/agroci/main/basePY.csv?token=ALHIP5CKNZG5LXZUVMQPAMLADLQGM")
database_goodshipped = database.iloc[:,[26]]

Create the new column using np.where

database_goodshipped["variavel_resposta"] = np.where(database_goodshipped['Goods Shipped'].str.contains('FLUTRIAFOL'), 'FLUTRIAFOL', 'NID')

Note: you may get a warning like the one below, but the code still works.

<stdin>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
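
If the warning is a concern, one common way to avoid it is to take an explicit copy of the selected column before adding to it. The sketch below is only an illustration: it uses a tiny made-up DataFrame in place of the real CSV, and the row texts and the "Other" column are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV; only 'Goods Shipped' matches the real data.
database = pd.DataFrame({
    "Goods Shipped": [
        "1 X 40 CONTAINERS CONTAINING 40 BAGS OF FLUTRIAFOL",
        "4 X 40 CONTAINERS CONTAINING 160 BAGS OF PRODUCT A",
    ],
    "Other": [1, 2],
})

# .copy() returns an independent DataFrame, so adding a column to it
# is unambiguous and pandas has nothing to warn about.
database_goodshipped = database[["Goods Shipped"]].copy()

# regex=False treats the pattern as a plain substring.
mask = database_goodshipped["Goods Shipped"].str.contains("FLUTRIAFOL", regex=False)
database_goodshipped["variavel_resposta"] = np.where(mask, "FLUTRIAFOL", "NID")

result = database_goodshipped["variavel_resposta"].tolist()
print(result)
```

The original `database` is left untouched here; the new column exists only on the copy.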

Another option

database_goodshipped["nova_coluna"] = database_goodshipped['Goods Shipped'].apply(lambda row: 'FLUTRIAFOL' if 'FLUTRIAFOL' in row else 'NID')
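
One caveat with the lambda version: if the column has empty cells, pandas stores them as NaN (a float), and `'FLUTRIAFOL' in row` then raises a TypeError. I don't know whether the real file has such cells, so take this as a hedged sketch on made-up data. It uses the `na`, `case` and `regex` parameters of `str.contains` to treat missing cells as "no match" and to ignore letter case.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, including a missing cell and a lowercase match.
goods = pd.Series(
    ["40 BAGS OF FLUTRIAFOL", np.nan, "10 drums of flutriafol", "5 BAGS OF PRODUCT B"],
    name="Goods Shipped",
)

# na=False makes missing cells count as False instead of propagating NaN;
# case=False ignores letter case; regex=False does a plain substring search.
mask = goods.str.contains("FLUTRIAFOL", case=False, regex=False, na=False)

labels = np.where(mask, "FLUTRIAFOL", "NID")
print(labels.tolist())
```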

The result

>>> database_goodshipped
                                          Goods Shipped variavel_resposta nova_coluna
0     15 X 20 CONTAINERS CONTAINING 300 BAGS OF 2,4-...               NID         NID
1     4 X 40 CONTAINERS CONTAINING 160 BAGS OF CARBE...               NID         NID
2     1 X 40 CONTAINERS CONTAINING 40 BAGS OF FLUTRI...        FLUTRIAFOL  FLUTRIAFOL
3     1 X 40 CONTAINERS CONTAINING 4032 BAGS OF HEXA...               NID         NID
4     2 X 20 CONTAINERS CONTAINING 160 DRUMS OF GALI...               NID         NID
...                                                 ...               ...         ...
3332  4 X 20 CONTAINERS CONTAINING 4 TANK OF DIMETIL...               NID         NID
3333  5 X 40 CONTAINERS CONTAINING 5040 CARTONS OF G...               NID         NID
3334  5 X 40 CONTAINERS CONTAINING 5040 CARTONS OF G...               NID         NID
3335  1 X 20 & 1 X 20 CONTAINERS CONTAINING 2 TANK O...               NID         NID
3336  3 X 20 CONTAINERS CONTAINING 240 DRUMS OF ENVI...               NID         NID

[3337 rows x 3 columns]
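
As for why the code in the question gave back a single value: `conteudo = [database_goodshipped]` builds a list with exactly one element (the whole DataFrame), not a list of its rows, so the DataFrame built from it most likely ends up with a single row. The sketch below shows the difference on a small made-up frame; the row texts are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the column selected from the CSV.
database_goodshipped = pd.DataFrame({
    "Goods Shipped": [
        "40 BAGS OF FLUTRIAFOL",
        "160 BAGS OF PRODUCT A",
        "300 BAGS OF PRODUCT B",
    ]
})

# Wrapping the DataFrame in brackets gives a list of length 1.
conteudo = [database_goodshipped]
wrapped_len = len(conteudo)

# Selecting the column by label gives a Series with one entry per row;
# .tolist() turns it into a real list of strings if a list is needed.
goods = database_goodshipped["Goods Shipped"]
as_list = goods.tolist()

print(wrapped_len, len(as_list))
```

Working directly on the column, as in the answer above, avoids the intermediate list altogether.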

I hope it helps

  • Thank you, Paulo! You helped a lot!
