Python Error in if

Question

Asked 8 years, 2 months ago

I am getting an error in an if statement and do not know how to fix it. I am using Python 3.6 and pandas for reading, writing, and analyzing data.

df1 = pd.read_csv("JonnyTheBoy10.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])
df2 = pd.read_csv("JonnyTheBoyFull.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])

if df1["CP4"] == df2["CP4"] and df1["CP3"] == df2["CP3"]:

I get this error:

Traceback (most recent call last):
  File "C:/Users/User01/Desktop/Normmm/Norm.py", line 11, in <module>
    if df1["CP4"] == df2["CP4"] and df1["CP3"] == df2["CP3"]:
  File "C:\anaconda\lib\site-packages\pandas\core\ops.py", line 818, in wrapper
    raise ValueError(msg)
ValueError: Can only compare identically-labeled Series objects

1 answer

by Sidon • **6,563** points · Answer 1 · 2017-06-09T09:33:03+00:00

Okay, after our chat in the comments, and now that you have sent me the CSVs, I have "solved your problem" (don't let that happen again, hahaha). Let's go. The problem is that you need to compare the dataframes row by row, but you are comparing entire columns. That cannot work, not least because the dataframes have different sizes, which is what raises the error. I ran tests with the CSVs you sent me: no row of the smaller CSV is fully identical to the corresponding row (same index) of the larger CSV. See the code:

import pandas as pd
df1 = pd.read_csv("JonnyTheBoy10.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])
df2 = pd.read_csv("JonnyTheBoyFull.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])

all_equals=[]
cp3_equal=[]
cp4_equal=[]

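# Walk the smaller dataframe and compare each row with the row of df2 at the same index
for index, row in df1.iterrows():
    # Both CP4 and CP3 match
    if str(row.CP4) == str(df2.CP4[index]) and str(row.CP3) == str(df2.CP3[index]):
        all_equals.append(row)

    # Only CP3 is checked
    if str(row.CP3) == str(df2.CP3[index]):
        cp3_equal.append(row)

    # Only CP4 is checked
    if str(row.CP4) == str(df2.CP4[index]):
        cp4_equal.append(row)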


print('Matches in both: ', len(all_equals))
print('Matches in CP3: ', len(cp3_equal))
print('Matches in CP4: ', len(cp4_equal))

Matches in both:  0
Matches in CP3:  9
Matches in CP4:  0
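
The row-by-row loop above can also be written without iterrows. This is only a minimal sketch: it assumes both dataframes use the default 0..n-1 index, and it uses small made-up dataframes in place of the CSVs, which are not reproduced here. It compares only the rows that exist in both dataframes, position by position.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files (not the real data).
df1 = pd.DataFrame({"CP4": [1000, 2000, 3000], "CP3": [100, 200, 300]})
df2 = pd.DataFrame({"CP4": [1000, 9999, 3000, 4000], "CP3": [100, 200, 999, 400]})

# Keep only the rows both dataframes have, and give them identical labels.
n = min(len(df1), len(df2))
a = df1.iloc[:n].reset_index(drop=True).astype(str)
b = df2.iloc[:n].reset_index(drop=True).astype(str)

# Each comparison yields a boolean Series with one value per row.
cp4_equal = a["CP4"] == b["CP4"]
cp3_equal = a["CP3"] == b["CP3"]
both_equal = cp4_equal & cp3_equal

print("Matches in both:", int(both_equal.sum()))
print("Matches in CP3:", int(cp3_equal.sum()))
print("Matches in CP4:", int(cp4_equal.sum()))
```

Because both slices have the same length and the same labels after reset_index(drop=True), the comparison no longer raises the ValueError.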

I will leave my initial explanation below, because it may help other people in other contexts.

Take this example:

df1 = pd.DataFrame([['A', 'B'], ['C', 'D']])
df2 = pd.DataFrame([['C', 'D'], ['A', 'B']])
df1==df2
       0      1
0  False  False
1  False  False

Now the same example, but with df2 given an explicit index:

df1 = pd.DataFrame([['A', 'B'], ['C', 'D']])
df2 = pd.DataFrame([['C', 'D'], ['A', 'B']], index=[1,0])
df1==df2
....
raise ValueError('Can only compare identically-labeled '
...

Note that an exception was raised with the same error that you reported. I trimmed the full message to keep it short.
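
The same thing happens with two Series of different lengths, which is the situation in the question. A minimal sketch with made-up values, catching the exception so that the message can be inspected:

```python
import pandas as pd

# Hypothetical columns of different lengths, so their labels cannot be identical.
s1 = pd.Series([1000, 2000])
s2 = pd.Series([1000, 2000, 3000])

try:
    s1 == s2
    message = ""
except ValueError as exc:
    message = str(exc)

print(message)
```

The printed message should mention identically-labeled objects, as in the traceback from the question.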

Solution 1: Drop the indices:

df1.reset_index(drop=True) == df2.reset_index(drop=True)
       0      1
0  False  False
1  False  False

Solution 2: Sort by index (axis=0):

df1.sort_index()==df2.sort_index()
      0     1
0  True  True
1  True  True

Note that == is also 'sensitive' to column order: the column labels must appear in the same order in both dataframes.
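
To illustrate the column-order point, here is a small sketch with made-up data. The two dataframes hold the same values, but the columns are swapped in the second one. Selecting df2's columns in df1's order is one way to line them up before comparing.

```python
import pandas as pd

# Hypothetical data: same values, columns in a different order.
df1 = pd.DataFrame({"CP4": [1000, 2000], "CP3": [100, 200]})
df2 = pd.DataFrame({"CP3": [100, 200], "CP4": [1000, 2000]})

try:
    df1 == df2
    raised = False
except ValueError:
    raised = True

# Reorder df2's columns to match df1, then compare.
result = df1 == df2[df1.columns]

print(raised)
print(bool(result.all().all()))
```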

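Applying this to the example in your question.

For your question, try changing the if as shown below. This only works when both dataframes have the same number of rows. Each comparison returns a boolean Series, so it has to be reduced with .all() (or .any()) before it can be used in an if:

a = df1.reset_index(drop=True)
b = df2.reset_index(drop=True)
if (a["CP4"] == b["CP4"]).all() and (a["CP3"] == b["CP3"]).all():
    pass  # every row matches in both columns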
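
One caveat when using such a comparison in an if: comparing two Series yields a boolean Series, and Python cannot decide on its own whether a whole Series is true or false. The sketch below uses made-up data with the same number of rows in both dataframes. It shows the comparison being reduced with .all() or .any() first.

```python
import pandas as pd

# Hypothetical data; df1 has a non-default index to show why reset_index is needed.
df1 = pd.DataFrame({"CP4": [1000, 2000], "CP3": [100, 200]}, index=[5, 6])
df2 = pd.DataFrame({"CP4": [1000, 2000], "CP3": [100, 999]})

a = df1.reset_index(drop=True)
b = df2.reset_index(drop=True)

# Boolean Series with one value per row.
same = (a["CP4"] == b["CP4"]) & (a["CP3"] == b["CP3"])

try:
    if same:  # a Series has no single truth value
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

all_rows_match = bool(same.all())   # True only if every row matches
any_row_matches = bool(same.any())  # True if at least one row matches

print(ambiguous, all_rows_match, any_row_matches)
```

Whether .all() or .any() is the right choice depends on what the if is meant to check.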