I have two tab-separated CSV files. Both have the same number of rows and columns, and the first column, POS, holds the same unique values in both DataFrames. The differences, if any, occur in the string values of columns col1 through col4.
I thought I'd run a query, something like query = subset_pl[subset_pl.isin(subset_ad)], and continue with the code from there, but I got stuck...
import pandas as pd
subset_ad = pd.read_csv('subset_ad.csv', sep='\t')
subset_ad = subset_ad.set_index('POS')  # set_index returns a new DataFrame, so assign it back
subset_ad
POS col1 col2 col3 col4
28355991 A A A A
28356037 A A A A
28356130 A A A A
28356246 A A A A
subset_pl = pd.read_csv('subset_pl.csv', sep='\t')
subset_pl = subset_pl.set_index('POS')  # set_index returns a new DataFrame, so assign it back
subset_pl
POS col1 col2 col3 col4
28355991 A B A A
28356037 B B B A
28356130 A B A A
28356246 A A B A
What I intend to achieve is: compare subset_ad with subset_pl and update subset_ad with the value from subset_pl, keeping the original subset_ad value and separating the two with a comma (A,B for example) wherever the values differ. I also want to count these differences by row and by column, adding an extra column (cont_row) and an extra row (cont_col) that show how many cells changed...
The output would be something like:
subset_ad
POS col1 col2 col3 col4 cont_row
28355991 A A,B A A 1
28356037 A,B A,B A,B A 3
28356130 A A,B A A 1
28356246 A A A,B A 1
cont_col 1 3 2 0
Any direction will be welcome!
Colleague, your question is not at all clear. To begin with, your values are strings in the "x/x" format, where x always seems to be 0, 1 or ".". How are they compared? Is it a string comparison? If so, what was the difficulty? Second, that soma_rows column, for example: where does its value come from? Is it a sum? If so, a sum of what?! Why does the final result have a comma? Anyway, I suggest you provide a simple example of your problem, perhaps with two rows and two columns, and explain it in detail. Otherwise no one will be able to help you. – Luiz Vieira
Hello friend. I followed your suggestion and changed the example... To begin with, all the data are of the same type, string! I was reading up on this and wondered whether df.merge could work in this case, so that merging would add the values of subset_pl to subset_ad only where they differ, as shown in the desired output above at subset_ad[1,1].
– guidebortoli
Now yes! : ) I will answer.
– Luiz Vieira
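For reference, here is a minimal sketch of one way the desired output could be produced. It assumes both frames are indexed by POS and share the same column names. The data are rebuilt inline instead of being read from the two files, and the names changed, merged, total_row and result are only illustrative. It uses an element-wise comparison rather than df.merge, which is just one possible choice.

```python
import pandas as pd

cols = ["col1", "col2", "col3", "col4"]
pos = pd.Index([28355991, 28356037, 28356130, 28356246], name="POS")

# Inline copies of the two example tables (assumption: same shape, same POS values).
subset_ad = pd.DataFrame([list("AAAA")] * 4, index=pos, columns=cols)
subset_pl = pd.DataFrame(
    [list("ABAA"), list("BBBA"), list("ABAA"), list("AABA")],
    index=pos,
    columns=cols,
)

# Boolean mask: True wherever the two frames disagree.
changed = subset_ad.ne(subset_pl)

# Keep the subset_ad value where nothing changed; otherwise write "ad,pl".
merged = subset_ad.where(~changed, subset_ad + "," + subset_pl)

# Number of changed cells per row, stored as an extra column.
merged["cont_row"] = changed.sum(axis=1)

# Number of changed cells per column, appended as an extra row.
total_row = changed.sum(axis=0).to_frame().T
total_row.index = ["cont_col"]
result = pd.concat([merged, total_row])

print(result)
```

The cont_row cell of the cont_col row has no value in this sketch, so pandas shows it as NaN. If a grand total is wanted there, it could be filled with changed.sum().sum().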