Python error in an if statement

I am getting an error in an if statement and don't know how to fix it. I am using Python 3.6 and pandas for reading, writing and analyzing data.

import pandas as pd

df1 = pd.read_csv("JonnyTheBoy10.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])
df2 = pd.read_csv("JonnyTheBoyFull.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])

if df1["CP4"] == df2["CP4"] and df1["CP3"] == df2["CP3"]:
    ...  # rest of the if block (not shown)

I get this error:

Traceback (most recent call last):
  File "C:/Users/User01/Desktop/Normmm/Norm.py", line 11, in <module>
    if df1["CP4"] == df2["CP4"] and df1["CP3"] == df2["CP3"]:
  File "C:\anaconda\lib\site-packages\pandas\core\ops.py", line 818, in wrapper
    raise ValueError(msg)
ValueError: Can only compare identically-labeled Series objects
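A minimal reproduction may help isolate this. The sketch below uses made-up values rather than the real CSV data; it assumes only that the two Series have different lengths (as the 10-row file and the full file do), which gives them different index labels, and that alone is enough for == to raise this ValueError.

```python
import pandas as pd

# Hypothetical postal codes; the two Series differ in length, just as the
# 10-row CSV and the full CSV do, so their index labels differ (0..2 vs 0..3)
cp4_small = pd.Series([1000, 2000, 3000])
cp4_large = pd.Series([1000, 2000, 3000, 4000])

try:
    cp4_small == cp4_large  # element-wise == requires identically labeled Series
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```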

Okay, after our chat in the comments, and now that you've sent me the CSV files, I "solved your problem" (don't let that happen again, hahaha), so let's go. The problem is that you need to compare the dataframes row by row, but you are comparing entire columns, which doesn't work, especially since the dataframes have different sizes (hence the error). I ran tests with the CSV files you sent me: no row of the smaller CSV is fully identical to the corresponding row (same index) of the larger CSV. See the code:

import pandas as pd
df1 = pd.read_csv("JonnyTheBoy10.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])
df2 = pd.read_csv("JonnyTheBoyFull.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])

all_equals=[]
cp3_equal=[]
cp4_equal=[]

# Compare each row of df1 with the row that has the same index label in df2
for index, row in df1.iterrows():
    if str(row.CP4) == str(df2.CP4[index]) and str(row.CP3) == str(df2.CP3[index]):
        all_equals.append(row)

    if str(row.CP3) == str(df2.CP3[index]):
        cp3_equal.append(row)

    if str(row.CP4) == str(df2.CP4[index]):
        cp4_equal.append(row)


print('Matches in both: ', len(all_equals))
print('Matches in CP3: ', len(cp3_equal))
print('Matches in CP4: ', len(cp4_equal))

Output:

Matches in both:  0
Matches in CP3:  9
Matches in CP4:  0
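The same positional comparison can also be written without iterrows. This is an alternative sketch, not the code that produced the counts above: the helper name count_positional_matches and the sample data are hypothetical, and it assumes both frames have the default 0..n-1 index from read_csv and that df2 has at least as many rows as df1 (as in the question, where df1 is the 10-row file).

```python
import pandas as pd


def count_positional_matches(small, large):
    """Count rows whose CP4/CP3 values agree between row i of `small`
    and row i of `large` (hypothetical helper, mirrors the loop above)."""
    n = len(small)
    # Trim the larger frame to the same number of rows and renumber both,
    # so the two sides share identical index labels 0..n-1
    a = small.reset_index(drop=True)
    b = large.iloc[:n].reset_index(drop=True)
    # Compare as strings, like the str(...) calls in the loop
    cp4_eq = a["CP4"].astype(str) == b["CP4"].astype(str)
    cp3_eq = a["CP3"].astype(str) == b["CP3"].astype(str)
    return {
        "both": int((cp4_eq & cp3_eq).sum()),
        "CP3": int(cp3_eq.sum()),
        "CP4": int(cp4_eq.sum()),
    }


# Made-up sample data standing in for the two CSV files
df1 = pd.DataFrame({"CP4": [1000, 2000, 3000], "CP3": [1, 2, 3]})
df2 = pd.DataFrame({"CP4": [1000, 2500, 3000, 4000], "CP3": [1, 2, 9, 4]})
print(count_positional_matches(df1, df2))  # {'both': 1, 'CP3': 2, 'CP4': 2}
```

Trimming with iloc[:n] is what makes the labels line up; without it, the two columns would still have different lengths and == would raise the same error as in the question.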

I'll leave my initial explanation below, since it may help other people in other contexts.

Take this example:

df1 = pd.DataFrame([['A', 'B'], ['C', 'D']])
df2 = pd.DataFrame([['C', 'D'], ['A', 'B']])
df1==df2
       0      1
0  False  False
1  False  False

Now the same example, but with df2 indexed in reverse order (index=[1, 0]):

df1 = pd.DataFrame([['A', 'B'], ['C', 'D']])
df2 = pd.DataFrame([['C', 'D'], ['A', 'B']], index=[1,0])
df1==df2
....
raise ValueError('Can only compare identically-labeled '
...

Note that an exception is raised with the same error you reported (I trimmed the full message to keep it short).

Solution 1: drop the indices:

df1.reset_index(drop=True) == df2.reset_index(drop=True)
       0      1
0  False  False
1  False  False

Solution 2: sort on axis 0 (the index):

df1.sort_index()==df2.sort_index()
      0     1
0  True  True
1  True  True

Note that == is sensitive to label order: the index and the columns must both carry the same labels, in the same order.
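To make the column part concrete, here is a small sketch with hypothetical column names x and y. Both frames hold the same data with the columns swapped, so == raises; selecting df2's columns in df1's order (df2[df1.columns]) makes the labels identical and the comparison works.

```python
import pandas as pd

df1 = pd.DataFrame({"x": ["A", "C"], "y": ["B", "D"]})
df2 = pd.DataFrame({"y": ["B", "D"], "x": ["A", "C"]})  # same data, columns swapped

try:
    df1 == df2
    raised = False
except ValueError:
    raised = True  # columns are not identically labeled (different order)

# Reordering df2's columns to match df1 makes the labels identical
aligned = df1 == df2[df1.columns]
print(raised)
print(aligned)
```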

Applying this to your question:

In your case, just try changing the if to:

if (df1.reset_index(drop=True)["CP4"] == df2.reset_index(drop=True)["CP4"] and
        df1.reset_index(drop=True)["CP3"] == df2.reset_index(drop=True)["CP3"]):
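A small sketch with made-up data shows why this change alone may not be enough when the two files have different row counts: reset_index(drop=True) only renumbers the labels to 0..n-1, so Series of different lengths still have different labels. It also shows a second issue: even with identical labels, == returns a Series of booleans, and if/and need a single True/False, so pandas raises "The truth value of a Series is ambiguous". Combining the conditions element-wise with & and reducing explicitly with .all() or .any() avoids that.

```python
import pandas as pd

# 1) reset_index(drop=True) renumbers labels but does not change lengths
short = pd.Series([1000, 2000])        # stands in for the 10-row file
full = pd.Series([1000, 2000, 3000])   # stands in for the full file
try:
    short.reset_index(drop=True) == full.reset_index(drop=True)
    length_error = False
except ValueError:
    length_error = True  # labels 0..1 vs 0..2 are still not identical

# 2) With identical labels, == works, but `if`/`and` need a single bool
left = pd.DataFrame({"CP4": [1000, 2000], "CP3": [1, 2]})
right = pd.DataFrame({"CP4": [1000, 2500], "CP3": [1, 2]})
try:
    if left["CP4"] == right["CP4"] and left["CP3"] == right["CP3"]:
        pass
    ambiguous_error = False
except ValueError:
    ambiguous_error = True  # "The truth value of a Series is ambiguous..."

# Combine element-wise with & (parentheses required), then reduce explicitly
row_matches = (left["CP4"] == right["CP4"]) & (left["CP3"] == right["CP3"])
every_row_matches = bool(row_matches.all())  # False: row 1 differs in CP4
some_row_matches = bool(row_matches.any())   # True: row 0 matches on both
```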
  • I didn't understand any of that.

  • I edited the answer and changed your if, try it and see if it works.

  • I got the same error: Traceback (most recent call last): File "C:/Users/User01/Desktop/Normmm/Norm.py", line 12, in <module> if df1.reset_index(drop=True)["CP4"] == df2.reset_index(drop=True)["CP4"] and df1.reset_index(drop=True)["CP3"] == df2.reset_index(drop=True)["CP3"]: File "C:\anaconda\lib\site-packages\pandas\core\ops.py", line 818, in wrapper raise ValueError(msg) ValueError: Can only compare identically-labeled Series objects

  • All the code I have written so far: https://pastebin.com/DTcTUV6M

  • Now you need to dig in a bit. First, try converting both dataframes by dropping the indices, e.g. df11 = df1.reset_index(drop=True), df22 = df2.reset_index(drop=True) ... Then compare the new dataframes one by one, preferably in the console; that may show you what's causing the error.

  • I still get the same error.

  • If you want, send me the CSV files and I'll test it for you.

  • Sure. I'm using PyCharm and Anaconda.

  • What you use doesn't matter; I just need the CSV files.

  • https://pastebin.com/DTcTUV6M https://drive.google.com/file/0BxdEKGO_S6HDVUZoa05BZ2hEQkE/view?usp=sharing https://drive.google.com/filed/0BxdEKGO_S6HDeXQzSWxbWdpOWs/view?usp=sharing

  • You sent only one; how do I make the comparison? To run the test I need both CSV files.

  • There are two links: 1 - https://drive.google.com/file/d/0BxdEKGO_S6HDeXQzSWxEbWdpOWs/view and 2 - https://drive.google.com/filed/0BxdEKGO_S6HDVUZoa05BZ2hEQkE/view

  • Okay, I’ll test it later.

  • Explain something to me: what is your goal with that if? Could you edit the question and add the complete if block?

  • The purpose of this if is: when it finds an address in df2 with the same CP4 and CP3, it writes it to another table, and then I analyze the matching rows to find the correct address.

  • So... your if is wrong. Put the whole if block in the question so it's clearer what you want, and I'll correct it.

  • if df1['CP4'] == df2['CP4'] and df1["CP3"] == df2["CP3"]: # compares the CP4 and CP3 columns of df1 (incomplete and wrong addresses) to df2 (correct addresses) # writes all matches to a temporary CSV # if it finds only one match, writes it to Norm.csv # if not, computes the similarity, and the one with the highest match is normalized

  • You haven't written the code inside the if yet?

  • Okay, I edited the answer and put the specific solution to your problem, now just make the adjustments to your context.
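Given the goal described in the comments (find every df2 address with the same CP4 and CP3 as a df1 address, keep the single matches, and send the rest to a similarity step), a join on those two columns may fit better than comparing rows by position. This is a swapped-in technique (DataFrame.merge), not the approach from the answer; the helper name candidate_matches and the sample data are hypothetical, and it assumes CP4 and CP3 are read with the same dtype from both files (passing dtype=str to pd.read_csv is one way to keep leading zeros and avoid int/float mismatches).

```python
import pandas as pd


def candidate_matches(wrong, correct, keys=("CP4", "CP3")):
    """Pair every row of `wrong` (df1) with every row of `correct` (df2)
    that has the same CP4 and CP3 (hypothetical helper)."""
    # Remember which df1 row each match came from
    wrong = wrong.assign(row_id=range(len(wrong)))
    # Inner join on the postal-code columns: row positions and file
    # lengths no longer matter, only the key values do
    return wrong.merge(correct, on=list(keys), how="inner",
                       suffixes=("_df1", "_df2"))


# Made-up sample data; the real files would come from pd.read_csv(...)
df1 = pd.DataFrame({"ART_DESIG": ["Rua A", "Av B"],
                    "CP4": [1000, 2000], "CP3": [1, 2]})
df2 = pd.DataFrame({"ART_DESIG": ["Rua Alfa", "Rua Alva", "Avenida Beta"],
                    "CP4": [1000, 1000, 2000], "CP3": [1, 1, 2]})

matches = candidate_matches(df1, df2)
# Number of df2 candidates found for each df1 row
n_candidates = matches["row_id"].map(matches["row_id"].value_counts())
single = matches[n_candidates == 1]    # exactly one candidate address
multiple = matches[n_candidates > 1]   # would still need a similarity check
print(single)
```

Following the plan in the comment, single.to_csv("Norm.csv", index=False) would write the unambiguous matches, and the rows in multiple are the ones left for the similarity step.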
