Okay, after the long chat in the comments, and now that you've sent me the CSVs, "problem solved" (don't let that happen again, hahaha), let's go. The problem is that you need to compare the dataframes row by row, whereas you are comparing entire columns. That cannot work, not least because the dataframes have different sizes (hence the error). I ran tests with the CSVs you sent me: no row of the smaller CSV is fully identical to the corresponding row (same index) of the larger CSV. See the code:
import pandas as pd

df1 = pd.read_csv("JonnyTheBoy10.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])
df2 = pd.read_csv("JonnyTheBoyFull.csv", usecols=['ART_TIPO', 'ART_DESIG', 'PORTA', 'CP4', 'CP3', 'LOCALIDADE'])

all_equals = []
cp3_equal = []
cp4_equal = []

# Compare each row of df1 (the smaller file) with the df2 row that has the same index
for index, row in df1.iterrows():
    if str(row.CP4) == str(df2.CP4[index]) and str(row.CP3) == str(df2.CP3[index]):
        all_equals.append(row)
    if str(row.CP3) == str(df2.CP3[index]):
        cp3_equal.append(row)
    if str(row.CP4) == str(df2.CP4[index]):
        cp4_equal.append(row)

# The labels below are Portuguese for "Matches in both / in CP3 / in CP4"
print('Igualdades em ambos: ', len(all_equals))
print('Igualdades em CP3: ', len(cp3_equal))
print('Igualdades em CP4: ', len(cp4_equal))
Igualdades em ambos: 0
Igualdades em CP3: 9
Igualdades em CP4: 0
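For reference, the same row-by-row counts can be obtained without an explicit loop. This is only a minimal sketch: it assumes df1 is the smaller table and that rows are matched by position, and the two small dataframes are made-up stand-ins for the CSVs, not your data.

```python
import pandas as pd

# Made-up stand-ins for the two CSV files (assumption: not the real data).
df1 = pd.DataFrame({"CP4": [1000, 2000, 3000], "CP3": [100, 200, 300]})
df2 = pd.DataFrame({"CP4": [1000, 2999, 3000, 4000],
                    "CP3": [100, 200, 999, 400]})

# Keep only as many rows of df2 as df1 has, and give both sides the same
# 0..n-1 index so the comparison is purely positional.
left = df1.reset_index(drop=True)
head = df2.iloc[:len(df1)].reset_index(drop=True)

# Compare as strings, as the loop does; each result holds one boolean per row.
cp4_equal = left["CP4"].astype(str) == head["CP4"].astype(str)
cp3_equal = left["CP3"].astype(str) == head["CP3"].astype(str)
both_equal = cp4_equal & cp3_equal

counts = {
    "both": int(both_equal.sum()),
    "CP3": int(cp3_equal.sum()),
    "CP4": int(cp4_equal.sum()),
}
print(counts)
```

If the aim is to find matching CP4/CP3 pairs regardless of row position, a merge on those two columns may be a better fit than a positional comparison.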
I'll leave the initial explanation below, because it may be useful to other people in other contexts.
Take this example:
df1 = pd.DataFrame([['A', 'B'], ['C', 'D']])
df2 = pd.DataFrame([['C', 'D'], ['A', 'B']])
df1==df2
       0      1
0  False  False
1  False  False
Now the same example, but with df2 given an explicit index:
df1 = pd.DataFrame([['A', 'B'], ['C', 'D']])
df2 = pd.DataFrame([['C', 'D'], ['A', 'B']], index=[1,0])
df1==df2
....
raise ValueError('Can only compare identically-labeled '
...
Note that an exception was raised with the same error you reported; I trimmed the full message to keep it short.
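The same exception also appears when the two objects simply have different lengths, which is the situation with your CSVs. A minimal sketch with made-up Series (not your data) that triggers the error and then compares only the common length by position:

```python
import pandas as pd

# Two made-up Series of different lengths (assumption: illustrative data only).
short = pd.Series([1, 2, 3])
long = pd.Series([1, 2, 3, 4])

try:
    short == long  # the labels differ (0..2 vs 0..3), so pandas refuses
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Comparing plain arrays over the common length ignores the labels entirely.
n = min(len(short), len(long))
same = short.iloc[:n].to_numpy() == long.iloc[:n].to_numpy()
print(same)
```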
Solution 1: Drop the indices:
df1.reset_index(drop=True) == df2.reset_index(drop=True)
       0      1
0  False  False
1  False  False
Solution 2: Sort the index (axis=0):
df1.sort_index()==df2.sort_index()
      0     1
0  True  True
1  True  True
Note that == is 'sensitive' to label order: the two dataframes must have the same index and column labels, in the same order.
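When the labels are the same but in a different order, the flexible comparison method DataFrame.eq may be an alternative, since it aligns both sides by label before comparing. A small sketch with the same toy data as above:

```python
import pandas as pd

df1 = pd.DataFrame([['A', 'B'], ['C', 'D']])
df2 = pd.DataFrame([['C', 'D'], ['A', 'B']], index=[1, 0])

# Unlike ==, eq() aligns on index and column labels first,
# so row 0 of df1 is compared with the row labelled 0 in df2.
aligned = df1.eq(df2)
print(aligned)
```

Keep in mind that this compares by label, not by position, so it answers the same question as Solution 2 rather than Solution 1.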
Applying this to the example in your question:
In your case, just try changing the if to:
if df1.reset_index(drop=True)["CP4"] == df2.reset_index(drop=True)["CP4"] and df1.reset_index(drop=True)["CP3"] == df2.reset_index(drop=True)["CP3"]:
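One caveat about an if like this: comparing two Series gives a boolean Series (one value per row), not a single True/False, so pandas cannot use it directly as an if condition even when the labels match. A minimal sketch, with made-up data of equal length, that combines the two comparisons with & and uses the result as a row filter instead:

```python
import pandas as pd

# Made-up tables with identical labels (assumption: illustrative data only).
a = pd.DataFrame({"CP4": [1000, 2000, 3000], "CP3": [100, 200, 300]})
b = pd.DataFrame({"CP4": [1000, 2000, 3999], "CP3": [100, 299, 300]})

# Using a Series directly in an if raises
# "The truth value of a Series is ambiguous".
try:
    if a["CP4"] == b["CP4"]:
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

# Element-wise "and": one boolean per row, True where both columns match.
mask = (a["CP4"] == b["CP4"]) & (a["CP3"] == b["CP3"])
matching_rows = a[mask]
print(matching_rows)
```

To get a single yes/no answer, the mask can be reduced with mask.all() or mask.any(), depending on what the condition is meant to express.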
I didn’t understand a thing
– JoãoGraça
I edited the answer and changed your if, try it and see if it works.
– Sidon
I got the same error: Traceback (most recent call last): File "C:/Users/User01/Desktop/Normmm/Norm.py", line 12, in <module> if df1.reset_index(drop=True)["CP4"] == df2.reset_index(drop=True)["CP4"] and df1.reset_index(drop=True)["CP3"] == df2.reset_index(drop=True)["CP3"]: File "C:\anaconda\lib\site-packages\pandas\core\ops.py", line 818, in wrapper raise ValueError(msg) ValueError: Can only compare identically-labeled Series objects
– JoãoGraça
All the code I have written so far: https://pastebin.com/DTcTUV6M
– JoãoGraça
Now you'll have to feel your way through it. First convert all the dfs by dropping the indices, e.g. df11 = df1.reset_index(drop=True), df22 = df2.reset_index(drop=True) ... Then compare the new dfs one by one, preferably in the console; that way you may find out what's causing the error.
– Sidon
I get the same error.
– JoãoGraça
If you want, send me the CSVs and I'll test it.
– Sidon
Sure. I'm using PyCharm and Anaconda.
– JoãoGraça
What you use doesn’t matter, I just need the csv files.
– Sidon
https://pastebin.com/DTcTUV6M https://drive.google.com/file/0BxdEKGO_S6HDVUZoa05BZ2hEQkE/view?usp=sharing https://drive.google.com/filed/0BxdEKGO_S6HDeXQzSWxbWdpOWs/view?usp=sharing
– JoãoGraça
You only sent one; how am I supposed to do the comparison? To test it I need both CSV files.
– Sidon
There are two links: 1 - https://drive.google.com/file/d/0BxdEKGO_S6HDeXQzSWxEbWdpOWs/view and 2 - https://drive.google.com/filed/0BxdEKGO_S6HDVUZoa05BZ2hEQkE/view
– JoãoGraça
Okay, I’ll test it later.
– Sidon
Explain something to me: what is your goal with that if? Could you edit your post and add the complete if block?
– Sidon
The purpose of this if: when it finds an address in df2 with matching CP4 and CP3, it writes it to another table, and those matching rows are then analyzed in order to find the correct address.
– JoãoGraça
So... your if is wrong. Put the if block in your post so that it is clearer what you want, and I'll correct it.
– Sidon
if df1['CP4'] == df2['CP4'] and df1["CP3"] == df2["CP3"]: # compares the CP4 and CP3 columns of df1 (incomplete and wrong addresses) with df2 (correct addresses) # writes all matches to a temporary csv # if only one match is found, writes it to Norm.csv # otherwise computes the similarity, and the one with the highest match is normalized
– JoãoGraça
You haven't written the code inside the if yet?
– Sidon
Okay, I edited the answer and put the specific solution to your problem, now just make the adjustments to your context.
– Sidon