How do I convert string values in an object column into binary values with pandas?

I’m trying to convert a column whose values are 'negative' and 'positive' into binary numeric values (0 and 1), but I don’t know if I’m doing it right with the pandas library.

pd.cut(data.Class, bins=['negative','positive'],labels=['0','1'])

but the following error appears:

ValueError: could not convert string to float: 'negative' 
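pd.cut is meant for splitting numeric values into intervals, so it expects numbers both in the column and in bins; that is why it fails when it tries to treat 'negative' as a float. To turn two labels into 1 and 0, a plain comparison is enough. A minimal sketch, with a few made-up rows standing in for the real data:

```python
import pandas as pd

# Made-up rows standing in for the question's data (not the real dataset)
data = pd.DataFrame({
    "v8": [0.0, 0.6615, 1.71045],
    "v9": [0.1224, 0.0, 0.0],
    "Class": ["negative", "negative", "positive"],
})

# The comparison yields True/False per row; astype(int) turns it into 1/0
data["Class"] = (data["Class"] == "positive").astype(int)
print(data["Class"].tolist())  # [0, 0, 1]
```

Any label that is not exactly 'positive' (including one with stray spaces) becomes 0, so it is worth checking data['Class'].unique() first.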

I have also tried a Boolean mask, as follows:

mask = data['Class'] == 'negative'

data.loc[mask, 'Class'] = 0
data.loc[~mask, 'Class'] = 1

But then it turns all of them into the same value!
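One plausible cause, and the one identified in the comments below, is that the labels carry a leading space (' negative'), so the comparison is False on every row and ~mask then sets everything to 1. A sketch of the mask approach with the whitespace stripped first, assuming such padded labels:

```python
import pandas as pd

# Assumed: labels were read with a leading space, as in the KEEL files
# discussed in the comments
data = pd.DataFrame({"Class": [" negative", " negative", " positive"]})

# No value equals 'negative' exactly, so the original mask is all False
print((data["Class"] == "negative").any())  # False

# Strip surrounding whitespace before building the mask
mask = data["Class"].str.strip() == "negative"

# Start from 1 everywhere (the ~mask case), then set the masked rows to 0
data["Class"] = 1
data.loc[mask, "Class"] = 0
print(data["Class"].tolist())  # [0, 0, 1]
```

Replacing the whole column with 1 first gives it an integer dtype, so no cast is needed afterwards.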

A sample of the dataset I’m using:

          v8      v9      Class  
0    0.00000  0.1224   negative  
1    0.00000  0.0000   negative  
2    0.00000  0.0000   negative  
3    0.00000  0.0000   negative  
4    0.00000  0.0561   negative  
..       ...     ...        ...  
166  0.66150  0.0000   negative  
167  1.06155  0.0000   negative  
168  1.62855  0.0000   negative  
169  1.71045  0.0000   positive  
170  1.54980  0.0000   positive

I would like the result to be as follows:

          v8      v9   Class  
0    0.00000  0.1224   0  
1    0.00000  0.0000   0  
2    0.00000  0.0000   0  
3    0.00000  0.0000   0  
4    0.00000  0.0561   0  
..       ...     ...    ...  
166  0.66150  0.0000   0  
167  1.06155  0.0000   0  
168  1.62855  0.0000   0  
169  1.71045  0.0000   1  
170  1.54980  0.0000   1  
  • This is a conversion of a categorical variable into a numerical one. I didn’t find a question about this for pandas, so I answered yours. But it would be good to change the title to better reflect the content of the question. See this analogous question for R: https://answall.com/questions/41889/converter-variables-qualitatives-em-factores-no-r
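The categorical-to-numeric conversion mentioned in this comment can also be done with pandas' categorical dtype. A sketch that fixes the category order explicitly (choosing 'negative' as 0 is an assumption matching the desired output above):

```python
import pandas as pd

data = pd.DataFrame({"Class": ["negative", "negative", "positive", "negative"]})

# Fix the category order explicitly so 'negative' -> 0 and 'positive' -> 1;
# any value outside these categories would get the code -1
class_type = pd.CategoricalDtype(categories=["negative", "positive"])
data["Class"] = data["Class"].astype(class_type).cat.codes
print(data["Class"].tolist())  # [0, 0, 1, 0]
```

The -1 code makes unexpected labels (for example ones with extra spaces) easy to spot with (data["Class"] == -1).any().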

1 answer

A solution using map:

data['Class'].map({'positive':1,'negative':0})
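Note that map does not modify data in place; the result has to be assigned back. It also returns NaN for any value that is not a key of the dictionary. A sketch that assigns the result and fails loudly on unmapped labels (the error check is an addition, not part of the answer):

```python
import pandas as pd

data = pd.DataFrame({"Class": ["negative", "positive", "negative"]})

mapping = {"positive": 1, "negative": 0}
encoded = data["Class"].map(mapping)

# map returns NaN for any value that is not a key in the dict
# (e.g. ' negative' with a leading space), so check before assigning
unmatched = data.loc[encoded.isna(), "Class"].unique()
if len(unmatched) > 0:
    raise ValueError(f"Unmapped labels: {list(unmatched)!r}")

data["Class"] = encoded.astype(int)
print(data["Class"].tolist())  # [0, 1, 0]
```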

Note that this solution also works with more than two categories. For binary variables, there is also get_dummies. In your case, the command would look like this:

pd.get_dummies(data['Class'])

The output will have two columns, one indicator column per class. Pick the one you want to use.
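A sketch of the get_dummies route that keeps only the 'positive' indicator. In pandas 2.x the dummy columns are bool by default, so dtype=int is passed to get 0/1:

```python
import pandas as pd

data = pd.DataFrame({"Class": ["negative", "negative", "positive"]})

# One indicator column per class; dtype=int gives 0/1 instead of bool columns
dummies = pd.get_dummies(data["Class"], dtype=int)
print(dummies.columns.tolist())  # ['negative', 'positive']

# Keep the 'positive' indicator as the binary target
data["Class"] = dummies["positive"]
print(data["Class"].tolist())  # [0, 0, 1]
```

Passing drop_first=True instead would drop the 'negative' column and leave only 'positive'.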

  • Thanks for the help, but I had tried map before and I don’t know why the values come out as null (NaN). I tried again with your tip and got the same result. I didn’t know about get_dummies, and I will try using it with the classifier, since I’m trying to apply a One-Class SVM.

  • It should have worked. Can you make the data available?

  • The data comes from the classic KEEL repository (https://sci2s.ugr.es/keel/imbalanced.php), and you can see that in most of its datasets the classes are named 'Positive' and 'Negative'.

  • Lucas, I got it! Most of these datasets have values with leading spaces, so they recommend passing the skipinitialspace=True parameter to the read_csv function.

  • OK. If everything is resolved and no doubts remain, please mark the answer as accepted by clicking the "✅".

  • I couldn’t find the button, so I added the tag Solved.

  • The button is below the vote counter. See the example in this image: https://i.stack.Imgur.com/uqJeW.png
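The fix found in the comments can be reproduced with a small hypothetical CSV string (not the real KEEL file) that has a space after each comma:

```python
import io

import pandas as pd

# Hypothetical CSV content with a space after each comma (not the real dataset)
raw = "v8, v9, Class\n0.0, 0.1224, negative\n1.71045, 0.0, positive\n"

# Without skipinitialspace the column names and labels keep the leading space
plain = pd.read_csv(io.StringIO(raw))
print(plain.columns.tolist())  # ['v8', ' v9', ' Class']

# skipinitialspace=True drops spaces right after each delimiter
data = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
data["Class"] = data["Class"].map({"positive": 1, "negative": 0})
print(data["Class"].tolist())  # [0, 1]
```

This also explains the NaN values from map: with the leading space, ' negative' is not a key of the dictionary.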
