How do I convert string values or items from an object column into binary with pandas?

Question

Asked 4 years, 5 months ago

Viewed 38 times

0

I’m trying to convert a column in a data set that contains 'negative' and 'positive' into binary or numeric values such as 0 and 1, but I don’t know if I’m doing it right with the pandas library.

pd.cut(data.Class, bins=['negative','positive'],labels=['0','1'])

but the following error appears:

ValueError: could not convert string to float: 'negative'

I also tried a boolean mask, as follows:

mask = data['Class'] == 'negative'

data.loc[mask, 'Class'] = 0
data.loc[~mask, 'Class'] = 1

But then it converts all of them into just one of the values!

A sample of the data set I’m using is:

          v8      v9      Class  
0    0.00000  0.1224   negative  
1    0.00000  0.0000   negative  
2    0.00000  0.0000   negative  
3    0.00000  0.0000   negative  
4    0.00000  0.0561   negative  
..       ...     ...        ...  
166  0.66150  0.0000   negative  
167  1.06155  0.0000   negative  
168  1.62855  0.0000   negative  
169  1.71045  0.0000   positive  
170  1.54980  0.0000   positive

I would like the result to be as follows:

          v8      v9   Class  
0    0.00000  0.1224   0  
1    0.00000  0.0000   0  
2    0.00000  0.0000   0  
3    0.00000  0.0000   0  
4    0.00000  0.0561   0  
..       ...     ...    ...  
166  0.66150  0.0000   0  
167  1.06155  0.0000   0  
168  1.62855  0.0000   0  
169  1.71045  0.0000   1  
170  1.54980  0.0000   1

1

This is a conversion of a categorical variable into a numerical one. I didn’t find an existing question about this for pandas, so I answered yours. It would be worth changing the title to better reflect the content of the question, though. See this analogous question for R: https://answall.com/questions/41889/converter-variables-qualitatives-em-factores-no-r

– Lucas

2021/02/19 at 21:00

1 answer

by Lucas • **3,858** points · Answer 1 · 2021-02-19T20:56:49+00:00

2

A solution using map:

data['Class'].map({'positive':1,'negative':0})
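A minimal sketch of the map approach, using a small hypothetical frame rather than the asker's real data. It illustrates two things worth keeping in mind: map returns a new Series, so the result has to be assigned back, and any label that is not a key of the dictionary (for example one with a stray leading space) comes back as NaN.

```python
import pandas as pd

# Hypothetical sample mirroring the question's 'Class' column.
# Row 1 deliberately has a leading space in its label.
data = pd.DataFrame({
    "v8": [0.0, 0.6615, 1.71045],
    "Class": ["negative", " negative", "positive"],
})

mapping = {"positive": 1, "negative": 0}

# Labels that are not keys of the dict (here ' negative') become NaN.
raw = data["Class"].map(mapping)

# Stripping surrounding whitespace first lets every label match a key.
clean = data["Class"].str.strip().map(mapping)

# map does not modify the frame in place, so assign the result back.
data["Class"] = clean
```

If NaN values show up after map, comparing `data["Class"].unique()` against the dictionary keys is a quick way to spot labels that differ by whitespace or capitalization.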

Note that this solution works for any number of categories. For the specific case of binary variables, there is also get_dummies. In your case, the command would look like this:

pd.get_dummies(data['Class'])

The output will have two columns, each one using one of the classes as the reference (the class encoded as 1). Choose the one you prefer.
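As an illustration of picking one of the two columns, here is a short sketch on a hypothetical three-row frame. The choice of 'positive' as the class encoded as 1 is an assumption based on the desired output shown in the question.

```python
import pandas as pd

# Hypothetical sample with the two labels from the question.
data = pd.DataFrame({"Class": ["negative", "negative", "positive"]})

# One indicator column per class: 'negative' and 'positive'.
dummies = pd.get_dummies(data["Class"])

# Keep the column where 'positive' is the class marked as 1.
# astype(int) normalizes the dtype, which differs between pandas versions.
data["Class"] = dummies["positive"].astype(int)

# Alternative: drop_first=True discards the first class (alphabetically),
# leaving a single 'positive' indicator column.
single = pd.get_dummies(pd.Series(["negative", "positive"]), drop_first=True)
```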

Thanks for the help, but I had already tried map and, I don’t know why, the values come out null (NaN). I tried again following your tip and got the same result. I didn’t know about get_dummies and will try to use it with the classifier, since I’m trying to apply a One-Class SVM.

– StaLLoNe_CoBRa

2021/02/20 at 01:00
It was supposed to work. Can you make the data available?

– Lucas

2021/02/20 at 01:01
The data is from the classic KEEL repository, https://sci2s.ugr.es/keel/imbalanced.php, and you can see that in most of its datasets the classes are named 'Positive' and 'Negative'.

– StaLLoNe_CoBRa

2021/02/20 at 01:06
Lucas, I got it! Most of these datasets have values padded with blank spaces, so they advise using the skipinitialspace=True parameter in the read_csv function.

– StaLLoNe_CoBRa

2021/02/20 at 02:14
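A small sketch of the fix described in the comment above, assuming a CSV whose fields carry a space after each comma. The CSV text here is hypothetical and only stands in for a KEEL-style file.

```python
import io
import pandas as pd

# Hypothetical CSV content with a space after every delimiter.
csv_text = "v8, v9, Class\n0.0, 0.1224, negative\n1.71045, 0.0, positive\n"

# Default parsing keeps the leading space in string fields.
plain = pd.read_csv(io.StringIO(csv_text))

# skipinitialspace=True drops spaces that follow the delimiter.
trimmed = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# With clean labels, the dictionary keys match and no NaN appears.
labels = trimmed["Class"].map({"positive": 1, "negative": 0})
```

This would also explain why the boolean mask in the question put every row into a single value: a comparison against 'negative' never matches ' negative'.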
OK. If everything is working and you have no further questions, please mark the question as resolved by clicking on "✅".

– Lucas

2021/02/20 at 03:06
There is no button, so I added the tag Solved.

– StaLLoNe_CoBRa

2021/02/20 at 18:23
The button is below the vote counter. See the example in this image: https://i.stack.Imgur.com/uqJeW.png

– Lucas

2021/02/20 at 18:31
