Fetch the non-NaN value of each row and move it into a new dataset

I have the following dataset:

Dataset_base

I’m trying to create a dataset based on the dataset above.

I am trying to go through the dataset row by row, find the values that are not NaN, and export each such value to a new dataset, linked to the 'CO_OCDE' column.

For example, at index 0 the only notnull() value is 2.5, so the new dataset I want to get is:

Dataset_resultado

But I found another "problem": some rows contain duplicate values, and I also need to remove them to get the result above.
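A minimal sketch of the transformation described above, assuming pandas and a small made-up frame (the CO_OCDE labels, every number except 2.5, and the result column name NU_INTEGRALIZACAO are hypothetical; the NU_INTEGRALIZACAO_* column names follow the question). melt() turns each row's shift columns into separate rows, dropna() keeps only the non-NaN values, and drop_duplicates() collapses the repeated equal values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: column names follow the question, the data is made up.
df = pd.DataFrame({
    "CO_OCDE": ["A", "B", "C"],
    "NU_INTEGRALIZACAO_MATUTINO": [np.nan, 4.0, 5.0],
    "NU_INTEGRALIZACAO_NOTURNO": [2.5, 4.0, np.nan],
    "NU_INTEGRALIZACAO_EAD": [np.nan, np.nan, 5.0],
})
value_cols = [c for c in df.columns if c.startswith("NU_INTEGRALIZACAO_")]

resultado = (
    # one row per (CO_OCDE, shift column) pair
    df.melt(id_vars="CO_OCDE", value_vars=value_cols, value_name="NU_INTEGRALIZACAO")
    .drop(columns="variable")              # the source column name is not needed
    .dropna(subset=["NU_INTEGRALIZACAO"])  # keep only the non-NaN values
    .drop_duplicates()                     # repeated equal values collapse to one row
    .sort_values("CO_OCDE")
    .reset_index(drop=True)
)
print(resultado)
```

If a row ever held two different values, this approach would produce two rows for the same CO_OCDE instead of one, so such conflicts stay visible rather than being silently merged.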

What I tried:

https://stackoverflow.com/questions/41337477/select-non-null-rows-from-a-specific-column-in-a-dataframe-and-take-a-sub-select

ex4_desc_y.loc[ex4_desc_y[['NU_INTEGRALIZACAO_VESPERTINO', 'NU_INTEGRALIZACAO_NOTURNO', 'NU_INTEGRALIZACAO_INTEGRAL', 'NU_INTEGRALIZACAO_MATUTINO', 'NU_INTEGRALIZACAO_EAD']].notnull().any(axis=1)]

  • And what if a row has two different values? What should happen then?

  • I checked: all repeated values are equal. Sorry I didn't mention that above.
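One way to run the check mentioned in the comment above is to count the distinct non-null values in each row. A minimal sketch, assuming pandas and made-up data (the CO_OCDE labels and numbers are hypothetical; row "D" is deliberately inconsistent so the check has something to find):

```python
import numpy as np
import pandas as pd

# Hypothetical sample; row "D" has two different values on purpose.
df = pd.DataFrame({
    "CO_OCDE": ["A", "B", "D"],
    "NU_INTEGRALIZACAO_MATUTINO": [np.nan, 4.0, 3.0],
    "NU_INTEGRALIZACAO_NOTURNO": [2.5, 4.0, 3.5],
})
value_cols = ["NU_INTEGRALIZACAO_MATUTINO", "NU_INTEGRALIZACAO_NOTURNO"]

# nunique() skips NaN by default, so this is the number of distinct non-null values per row
distinct_per_row = df[value_cols].nunique(axis=1)
all_equal = bool((distinct_per_row <= 1).all())
conflicts = df.loc[distinct_per_row > 1, "CO_OCDE"].tolist()
print(all_equal, conflicts)
```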

1 answer


Given that the following condition holds:

  1. For each row, the columns contain either the same value or NaN

it is enough to average them, because the mean method ignores NaN (skipna=True by default).

Example

>>> import pandas as pd

>>> df = pd.DataFrame({"col1": [10, 20, 30], "col2": [None,20,30], "col3": [None,None,30]})

>>> df
   col1  col2  col3
0    10   NaN   NaN
1    20  20.0   NaN
2    30  30.0  30.0

>>> df['media'] = df.mean(axis=1)

>>> df
   col1  col2  col3  media
0    10   NaN   NaN   10.0
1    20  20.0   NaN   20.0
2    30  30.0  30.0   30.0
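Applied to the question's data, the same idea might look like the sketch below. It assumes ex4_desc_y has a CO_OCDE column plus the five NU_INTEGRALIZACAO_* columns listed in the question; the sample rows and the result column name NU_INTEGRALIZACAO are hypothetical.

```python
import numpy as np
import pandas as pd

cols = [
    "NU_INTEGRALIZACAO_VESPERTINO",
    "NU_INTEGRALIZACAO_NOTURNO",
    "NU_INTEGRALIZACAO_INTEGRAL",
    "NU_INTEGRALIZACAO_MATUTINO",
    "NU_INTEGRALIZACAO_EAD",
]
# Hypothetical stand-in for the question's ex4_desc_y (made-up codes and values).
ex4_desc_y = pd.DataFrame({
    "CO_OCDE": ["A", "B", "C"],
    "NU_INTEGRALIZACAO_VESPERTINO": [np.nan, np.nan, 5.0],
    "NU_INTEGRALIZACAO_NOTURNO": [2.5, 4.0, np.nan],
    "NU_INTEGRALIZACAO_INTEGRAL": [np.nan, np.nan, np.nan],
    "NU_INTEGRALIZACAO_MATUTINO": [np.nan, 4.0, 5.0],
    "NU_INTEGRALIZACAO_EAD": [np.nan, np.nan, np.nan],
})

# mean(axis=1) skips NaN, so each row collapses to its (repeated) non-null value
dataset_resultado = pd.DataFrame({
    "CO_OCDE": ex4_desc_y["CO_OCDE"],
    "NU_INTEGRALIZACAO": ex4_desc_y[cols].mean(axis=1),
})
print(dataset_resultado)
```

A row whose five columns are all NaN would simply get NaN in the result rather than raising an error.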

I hope this helps.

  • But where in the question is the OP using the "mean" function?

  • @jsbueno, good morning! As I understand it, since the values in each row are equal, taking the average is enough to recover the missing value for that row. Cheers!

  • OK. Since in this specific case the values in a row are apparently equal whenever there is more than one, the "mean" is a hack that can work.

  • @jsbueno, yes, it looks like a 'hack', and I would not trust that kind of solution, because different values could show up at some point. But the asker said they checked the data and the values are the same.
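If the "hack" concern matters, a more defensive variant (a sketch, not the answer's method) takes the first non-null value of each row with bfill(axis=1) and raises an error, instead of silently averaging, when a row holds different values. The helper name first_non_null and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def first_non_null(df: pd.DataFrame, cols: list) -> pd.Series:
    """Hypothetical helper: first non-NaN value per row, refusing conflicting rows."""
    block = df[cols]
    # nunique() skips NaN, so > 1 means the row has two different real values
    conflicting = block.nunique(axis=1) > 1
    if conflicting.any():
        raise ValueError(f"rows with different values: {conflicting[conflicting].index.tolist()}")
    # back-filling along the columns moves each row's first non-null value into column 0
    return block.bfill(axis=1).iloc[:, 0]


cols = ["NU_INTEGRALIZACAO_MATUTINO", "NU_INTEGRALIZACAO_NOTURNO"]
ok = pd.DataFrame({
    "CO_OCDE": ["A", "B"],
    "NU_INTEGRALIZACAO_MATUTINO": [np.nan, 4.0],
    "NU_INTEGRALIZACAO_NOTURNO": [2.5, 4.0],
})
print(first_non_null(ok, cols).tolist())

# Same data with a conflicting value in row 1
bad = ok.copy()
bad.loc[1, "NU_INTEGRALIZACAO_NOTURNO"] = 3.5
try:
    first_non_null(bad, cols)
except ValueError as exc:
    print(exc)
```

Unlike the mean, this fails loudly if the data ever stops satisfying the "same value or NaN" condition.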

  • Good morning, and thank you very much for the answer! In this case the values will never differ, because they refer to the duration of a course, and the course codes are only split between day and night shifts. So this problem won't come up here. I was trying a more complicated solution, but this one is excellent! Thank you very much.

  • @jsbueno, before giving my answer I had asked (see the comments on the question) what to do if the numbers were different. I was told there would be no such case.
