There are two scenarios:
- Columns that hold integers that should be strings
- Columns that hold strings that should be integers
For the first case (for example, the integer 1 has to become the string "1"), the solution is simple: just use astype.
Example
df["coluna"] = df["coluna"].astype(str)
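To see the effect of the line above, here is a minimal sketch. The sample data is made up for illustration; only the column name "coluna" comes from the answer.

```python
import pandas as pd

# Hypothetical sample data: a column of integers
df = pd.DataFrame({"coluna": [1, 2, 3]})
df["coluna"] = df["coluna"].astype(str)

# Every value is now a Python str
print(df["coluna"].tolist())
print(all(isinstance(v, str) for v in df["coluna"]))
```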
For the second case we have two possibilities:
a. All values that have to be converted from string to int (or float) can be converted
b. Some (or several, or all) of the values that have to be converted from string to int (or float) cannot be converted
In case (a), where all values can be converted, just use the same astype solution already described:
df["coluna"] = df["coluna"].astype(int)
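A small sketch of why the distinction between (a) and (b) matters: astype(int) works when every string is a valid integer and raises ValueError as soon as one is not. The series names ok and bad are made up for this illustration.

```python
import pandas as pd

# Case (a): every string is a valid integer, so astype(int) succeeds
ok = pd.Series(["1", "2", "3"])
converted = ok.astype(int)
print(converted.tolist())

# Case (b): one value cannot be parsed, so the whole conversion fails
bad = pd.Series(["1", "a", "3"])
try:
    bad.astype(int)
    failed = False
except ValueError as exc:
    failed = True
    print("conversion failed:", exc)
```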
For case (b), see the example:
Create a function that converts a value to int, or returns NaN if it cannot:
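import numpy as np
import pandas as pd

def to_int(row):
    # Despite the name, `row` is a single cell value passed in by Series.apply
    try:
        return int(row)
    except (ValueError, TypeError):
        # Not convertible (e.g. "a" or None): mark as missing
        return np.nan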
Using the function in a dataframe:
df = pd.DataFrame({"A": [1, "a", 3, "4"]})
print(df)
   A
0  1
1  a
2  3
3  4
df["A"] = df["A"].apply(to_int)
print(df)
     A
0  1.0
1  NaN
2  3.0
3  4.0
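As an alternative to applying a function value by value, pandas has pd.to_numeric with errors="coerce", which gives the same result on this data. This is a sketch, not the method used in the answer, and it is not an exact replacement: to_numeric also accepts strings such as "1.5", which int() would reject. The name as_int is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, "a", 3, "4"]})

# errors="coerce" turns anything that cannot be parsed into NaN
df["A"] = pd.to_numeric(df["A"], errors="coerce")
print(df["A"].tolist())

# Optional: the nullable "Int64" dtype keeps whole numbers as integers
# and shows the gaps as <NA> instead of forcing the column to float
as_int = df["A"].astype("Int64")
print(as_int)
```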
EDITED 10/08/2021 - reason: comment below
If the 10 is an integer, do this:
df["A"] = df["A"].apply(lambda x: x if isinstance(x, str) else np.nan)
If the 10 is a string, do something like:
def only_strings(row):
    try:
        # int() is called only to test whether the value is numeric
        int(row)
        return np.nan
    except ValueError:
        return row
and call it with:
df["A"] = df["A"].apply(only_strings)
End of edit
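For comparison, the same "keep only the non-numeric strings" idea can be sketched without a Python-level loop by combining pd.to_numeric with Series.where. This is an assumption about what a vectorized version could look like, not part of the original answer, and it treats strings such as "1.5" as numeric, unlike int(). The name numeric is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, "a", 3, "4"]})

# NaN wherever the value could NOT be parsed as a number
numeric = pd.to_numeric(df["A"], errors="coerce")

# Keep the original value where parsing failed; everything else becomes NaN
df["A"] = df["A"].where(numeric.isna())
print(df["A"].tolist())
```

Because the mask is aligned on the index labels, this does not depend on the index being numeric or sequential.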
I did not understand this code. Why is it casting the value to an integer if it then replaces the value with nan? It seems the program would do the same thing without the line int(row). Why not use np.where for that purpose? – Lucas
The int(row) is only there to test whether row can be converted to an integer. If it cannot, it raises an exception. The big problem with this code, besides iterating item by item, is that it assumes the index is numeric, starts at 0 (zero), and is sequential, which is not always true. – Paulo Marques