Question about converting str to int with Python/pandas (dataset)


-1

I have a column of scores stored as strings. They look like this: 432432.0, always ending in .0. I wanted to convert them straight to int rather than to float, but given this format I found that I first had to convert to float and then to int. The problem is that some rows contained the string "Unknown", which I had previously replaced with None. So when I convert from float to int, I get this error:

ValueError: Cannot convert non-finite values (NA or inf) to integer

I can get around the error by changing this:

data['Score-9'] = data['Score-9'].astype(int)

to this:

data['Score-9'] = data['Score-9'].fillna(0).astype(int)

But I don't want the missing (None) values to become 0, because that would change the averages, medians and so on. How can I do the conversion while keeping the missing values, without getting an error?
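A minimal sketch of one way to get integers while keeping the gaps, assuming pandas 1.0 or later: pandas' nullable integer dtype `"Int64"` (capital I) stores missing values as `<NA>`, so the cast does not need `fillna(0)`. The sample values are made up; only the column name `Score-9` comes from the question. `pd.to_numeric(..., errors="coerce")` turns anything it cannot parse, including a leftover "Unknown", into NaN before the cast.

```python
import pandas as pd

# Hypothetical sample data in the format described: integers stored as "NNN.0",
# with missing entries (None) and one leftover "Unknown".
data = pd.DataFrame({"Score-9": ["432432.0", "10.0", None, "Unknown", "20.0"]})

# Parse the strings as numbers; unparseable entries ("Unknown") and None become NaN.
as_float = pd.to_numeric(data["Score-9"], errors="coerce")

# Cast to the nullable integer dtype: NaN becomes <NA> instead of raising ValueError.
data["Score-9"] = as_float.astype("Int64")

print(data["Score-9"].dtype)  # Int64
# Missing values are skipped, so only 432432, 10 and 20 are used here.
print(data["Score-9"].mean())
print(data["Score-9"].median())
```

Since the missing entries stay missing, the mean and median are computed only over the real scores.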

  • I don't believe NaN will affect the mean or the mode. The real question is what to do with "Unknown". If those values should be ignored, replace them with np.nan (for this, import numpy: import numpy as np). With that, the series 1, NaN, 3 has a mean of 2.
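A small sketch checking the comment's example: pandas reductions such as `mean()` skip NaN by default (`skipna=True`); passing `skipna=False` makes the NaN propagate instead.

```python
import numpy as np
import pandas as pd

# The series from the comment: 1, NaN, 3.
s = pd.Series([1, np.nan, 3])

# Default skipna=True: only 1 and 3 are averaged.
print(s.mean())  # 2.0

# With skipna=False the missing value propagates and the result is NaN.
print(s.mean(skipna=False))
```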

3 answers

1

When calculating the mean, median and mode, NaN values are ignored.

Prefer the float type.

Take this example:

Creating a test DataFrame

import pandas as pd

df = pd.DataFrame({"valores": ["1", "1", "Unknown", "4"]})

df
   valores
0        1
1        1
2  Unknown
3        4

Converting "Unknown" to NaN

import numpy as np

df = df.replace("Unknown", np.nan)

If you try to convert to int at this point, an exception is raised:

df["valores"] = df["valores"].astype(int)
(...)
ValueError: cannot convert float NaN to integer

Converting to float

df["valores"] = df["valores"].astype(float)

df
   valores
0      1.0
1      1.0
2      NaN
3      4.0

Mean, median, mode

df["valores"].mean()
2.0

df["valores"].median()
1.0

df["valores"].mode()
0    1.0
dtype: float64
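To make the point about 0 concrete, here is a small sketch (an illustration, not part of the original answer) using the same four values as above: filling the gap with 0, as in the question's `fillna(0)` workaround, pulls the mean down, while NaN is simply left out of the calculation.

```python
import numpy as np
import pandas as pd

# Same values as the example above, after "Unknown" became NaN and the cast to float.
valores = pd.Series([1.0, 1.0, np.nan, 4.0])

# With NaN: statistics are computed over 1, 1 and 4 only.
mean_nan = valores.mean()      # (1 + 1 + 4) / 3
median_nan = valores.median()

# With the gap filled by 0: a fake 0 score now takes part in the statistics.
filled = valores.fillna(0)
mean_zero = filled.mean()      # (1 + 1 + 0 + 4) / 4
median_zero = filled.median()

print(mean_nan, median_nan)
print(mean_zero, median_zero)
```

Here the median happens to stay the same, but the mean drops from 2.0 to 1.5.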

0

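Since your numbers are integers, you can use a lambda like this to remove the trailing ".0" (values that are not strings, such as None, are left as they are):

data['Score-9'] = data['Score-9'].apply(lambda x: x[:-2] if isinstance(x, str) and x.endswith('.0') else x)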

-1

What you can do is replace the "Unknown" values with 0, so that converting to int no longer raises this error.

If you don't want the values to stay 0, change the 0s back to None after the conversion.
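A quick sketch of what this round trip does, assuming a made-up column that also contains a genuine 0 score: after `fillna(0).astype(int)` a real 0 can no longer be told apart from a former gap, so turning the 0s back into missing values marks both as missing, and the column goes back to float. `np.nan` is used here as the missing marker instead of `None`.

```python
import numpy as np
import pandas as pd

# Hypothetical scores: one genuine 0, one missing value, one ordinary score.
scores = pd.Series([0.0, np.nan, 5.0])

# The answer's approach: fill the gaps with 0 so the int conversion succeeds...
as_int = scores.fillna(0).astype(int)

# ...then turn the 0s back into missing values.
restored = as_int.replace(0, np.nan)

# Both the real 0 and the former gap are now missing, and the dtype is float again.
print(restored.isna().sum())
print(restored.dtype)
```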

  • The question says explicitly that the asker does not want to use 0 (zero), so as not to change the average, etc.

  • Please don't include greetings or thanks in posts. See https://answall.com/help/behavior
