I have a Pandas dataframe to which I need to add a new column called codprojeto. To do this, I created the new column and filled it with zeros so that it has the type int64, as follows:
df['codprojeto'] = 0
This new column should be filled with the codprojeto values from another dataframe. That column is of type int64, which is why I took the previous step.
To match the rows, I'm using the CNPJ column, which has values in both dataframes. Where the values are equal, df['codprojeto'] should receive the value of df2['codprojeto'].
Attempt 1:
for i in range(len(df['CNPJ'])):
    for j in range(len(df2['CNPJ'])):
        if df2.loc[j, 'CNPJ'] == df.loc[i, 'CNPJ']:
            df.loc[i, 'codprojeto'] == df2.loc[j, 'codprojeto']
Returns:
Error: KeyError: 4
Attempt 2:
for index, row in df.iterrows():
    for index2, row2 in df2.iterrows():
        if str(row['CNPJ']) == row2['CNPJ']:
            df.loc[index, 'codprojeto'] = df2.loc[index2, 'codprojeto']
Result: it runs indefinitely and never finishes processing.
Data set (sample):
DF:
CNPJ,DATA,codprojeto
00000000000123,2020-12-02 00:00:00 UTC,0
99900000000123,2020-12-02 00:00:00 UTC,0
00000000000123,2020-12-02 00:00:00 UTC,0
00000000000123,2020-12-02 00:00:00 UTC,0
00000000000145,2020-12-02 00:00:00 UTC,0
00000000000123,2020-12-02 00:00:00 UTC,0
00000000000167,2020-12-02 00:00:00 UTC,0
00000000000167,2020-12-02 00:00:00 UTC,0
00000000000167,2020-12-02 00:00:00 UTC,0
00000000000167,2020-12-02 00:00:00 UTC,0
00000000000101,2020-12-02 00:00:00 UTC,0
00000000000122,2020-12-02 00:00:00 UTC,0
00000000000144,2020-12-02 00:00:00 UTC,0
00000000000123,2020-12-02 00:00:00 UTC,0
00000000000155,2020-12-02 00:00:00 UTC,0
00000000000155,2020-12-02 00:00:00 UTC,0
00000000000155,2020-12-02 00:00:00 UTC,0
00000000000166,2020-12-02 00:00:00 UTC,0
99900000000123,2020-12-02 00:00:00 UTC,0
99900000000123,2020-12-02 00:00:00 UTC,0
DF2:
"codcliente";"nome";"CNPJ";"codprojeto"
1;"CLIENTE 1";"00000000000123";1234
2;"CLIENTE 1";"00000000000145";5678
3;"CLIENTE 1";"00000000000167";9012
4;"CLIENTE 1";"00000000000189";3456
5;"CLIENTE 1";"00000000000101";7890
6;"CLIENTE 1";"00000000000122";11
7;"CLIENTE 1";"00000000000133";22
8;"CLIENTE 1";"00000000000144";33
9;"CLIENTE 9";"00000000000155";44
10;"CLIENTE 10";"00000000000166";55
The original DF and DF2 have 635939 and 1054 rows, respectively.
Where there is no matching CNPJ, codprojeto should stay 0.
How can I fix this?
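One possible approach, offered as a sketch rather than a definitive answer: with the row counts given above, the nested loops visit on the order of 670 million row pairs, whereas a lookup table built from df2 needs only one pass over df. The example below uses small stand-in frames built from the sample data. It assumes that CNPJ may have been read as a number in one of the frames (so both sides are normalised to 14-character zero-padded strings) and that each CNPJ maps to a single codprojeto in df2. The name lookup is illustrative.

```python
import pandas as pd

# Small stand-ins for the real frames (values taken from the sample data).
df = pd.DataFrame({
    "CNPJ": ["00000000000123", "99900000000123", "00000000000145", "00000000000167"],
    "DATA": ["2020-12-02 00:00:00 UTC"] * 4,
})
df2 = pd.DataFrame({
    "CNPJ": ["00000000000123", "00000000000145", "00000000000167"],
    "codprojeto": [1234, 5678, 9012],
})

# Assumption: CNPJ might be numeric in one frame, so normalise both sides
# to 14-character, zero-padded strings before comparing.
df["CNPJ"] = df["CNPJ"].astype(str).str.zfill(14)
df2["CNPJ"] = df2["CNPJ"].astype(str).str.zfill(14)

# Assumption: one codprojeto per CNPJ in df2; keep the first row otherwise.
lookup = df2.drop_duplicates("CNPJ").set_index("CNPJ")["codprojeto"]

# map() does one lookup per row of df; CNPJs without a match become NaN,
# which is then replaced by 0 so the column can be cast back to int64.
df["codprojeto"] = df["CNPJ"].map(lookup).fillna(0).astype("int64")

print(df)
```

If df2 can hold several codprojeto values for the same CNPJ, the drop_duplicates step silently keeps only the first one, so that case would need a different rule.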
Marlos, good afternoon! Can you make the data set available? Cheers!
– lmonferrari
Without the data it is difficult to answer, but I can tell you that: 1) this solution is extremely slow; use np.where instead of nested loops, which are O(n^2); 2) on the last line the correct operator is = and not ==; 3) the error is possibly occurring because 4 is not in the index of df or df2
– Lucas
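To illustrate point 3, here is a minimal sketch of how .loc can raise KeyError: 4 even when no value 4 is being searched for. It assumes, hypothetically, that the dataframe's index has a gap (for example after rows were filtered out); the question does not confirm this. The frame and the variable names are made up for the demonstration.

```python
import pandas as pd

# Hypothetical frame whose index has a gap, e.g. after filtering out a row.
df = pd.DataFrame({"CNPJ": ["a", "b", "c", "d", "e", "f"]})
df = df[df["CNPJ"] != "e"]      # index labels are now 0, 1, 2, 3, 5

try:
    df.loc[4, "CNPJ"]           # .loc looks up the *label* 4, which no longer exists
    raised = False
except KeyError:
    raised = True

# Position-based access still works on the filtered frame...
by_position = df["CNPJ"].iloc[4]

# ...and label-based access works again once the index is rebuilt.
df = df.reset_index(drop=True)
by_label = df.loc[4, "CNPJ"]

print(raised, by_position, by_label)
```

In other words, range(len(df)) produces positions, while .loc expects index labels; the two only coincide when the index is the default 0..n-1.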
I've included the dataset in the question. About the '==' in the last line, I have already made the correction. About KeyError 4, it's strange, because I'm not looking for the value 4, but comparing two separate values.
– marloswn
A tip for always getting objective answers that help you: provide data samples that cover all the problems you are running into. With the data you provided, you received 2 different answers that pointed out 2 different errors. There is a dedicated topic showing how to create a minimal, complete and verifiable example
– Terry
No problem, thanks for the tip, Terry.
– marloswn