Dataframe - Pandas. Creating new column after comparison between columns

Question

Dataframe - Pandas. Creating new column after comparison between columns

Asked 6 years, 4 months ago

Viewed 14,334 times

0

I have the following df

I need to add the code of the CODIGO column in the cod_city column in cases where the value of the ds_city column equals the LOCALITY column.

I’ve tried that way, but it didn’t roll:

def acha_codigo(registro):
    if registro['ds_cidade'] == registro['LOCALIDADE']:
        return registro['CODIGO']
    else:
        return 'UNKNOWN'

Expected result:

You say you need to "add the COLUMN code to the cod_city column," can you explain that further? I didn’t see a column called COLUMN. :-) It would be interesting to print the df and put the result instead of the image. It is easier to help.

– Sidon

2019/04/04 at 01:05
Oops, my mistake. And the CODE column. This image is a print of df.head()

– Shadowmonster

2019/04/04 at 01:10
So... Always put the print result (contrl-c/control-v) instead of the image, so those who try to help can take advantage to do tests.

– Sidon

2019/04/04 at 01:13
It would be interesting too, that you put as you want it to be at the end.

– Sidon

2019/04/04 at 01:14
Possible duplicate of Dataframe Pandas - Calculate column based on other

– AlexCiuffa

2019/04/04 at 01:42
Ah, got it @Sidon. I’ll do it.

– Shadowmonster

2019/04/04 at 02:34
@Alexciuffa I think this related, but I don’t think it’s duplicate.

– Sidon

2019/04/04 at 11:46
Now that the problem has become clearer, I really saw that it is not duplicate.

– AlexCiuffa

2019/04/04 at 12:44

Show 3 more comments

1 answer

Browser other questions tagged python pandas

You are not signed in. Login or sign up in order to post.

by Sidon • **6,563** points · Answer 1 · 2019-04-04T02:00:16+00:00

TL;DR

In the image of your example, there is no conicity between the columns LOCALIDADEand ds_cidade, so I created a DataFrame where one of the rows has these conicised columns, then I create a new column (nova_coluna), iteration on df to check if there are coincidences in the values of the cited columns, if yes, change the value of the nova_coluna for the value of the column CODIGO.

import pandas as pd

data = [(5200050, 'ABADIA DE GOIAS', 'GO', 'NAN', 'ABAETETUBA'), 
        (3100104, 'ABADIA DOS DOURADOS', 'MG', 'NAN', 'ABAETETUBA'),
        (5200100, 'ABADIANA', 'GO', 'NAN', 'ABADIANA'),
        (3100203, 'ABAETE', 'MG', 'NAN', 'ABAETETUBA'),
        (1500107, 'ABETETUBA', 'GO', 'NAN', 'ABAETETUBA')]


df = pd.DataFrame(data, columns = ['CODIGO', 'LOCALIDADE', 'UF', 'cod_cidade', 'ds_cidade'])

print(df)

Exit:

    CODIGO           LOCALIDADE  UF cod_cidade   ds_cidade
0  5200050      ABADIA DE GOIAS  GO        NAN  ABAETETUBA
1  3100104  ABADIA DOS DOURADOS  MG        NAN  ABAETETUBA
2  5200100             ABADIANA  GO        NAN    ABADIANA
3  3100203               ABAETE  MG        NAN  ABAETETUBA
4  1500107            ABETETUBA  GO        NAN  ABAETETUBA

Creating a new column in the DataFrame.

# Criando nova coluna
df['nova_coluna'] = 'NaN'

Iterating:

for index, row in df.iterrows():
    if row['LOCALIDADE'] == row['ds_cidade']:
        df.loc[index,'nova_coluna'] =  str(df.loc[index,'CODIGO'])  

# Apresentando o novo df:
print(df)

Exit, to the df modified:

    CODIGO           LOCALIDADE  UF cod_cidade   ds_cidade nova_coluna
0  5200050      ABADIA DE GOIAS  GO        NAN  ABAETETUBA         NaN
1  3100104  ABADIA DOS DOURADOS  MG        NAN  ABAETETUBA         NaN
2  5200100             ABADIANA  GO        NAN    ABADIANA     5200100
3  3100203               ABAETE  MG        NAN  ABAETETUBA         NaN
4  1500107            ABETETUBA  GO        NAN  ABAETETUBA         NaN

See working on repl.it.