Is it possible to replace certain values with NA in pandas without the use of loops?

Question

Asked 3 years, 9 months ago

I was studying data cleaning, and I saw that sometimes there can be int values in columns that should be strings and vice versa. The solution given by the author of the publication I was reading uses a for loop to replace those values with NaN, as follows:

# Detecting numbers 
cnt=0
for row in df['OWN_OCCUPIED']:
    try:
        int(row)
        df.loc[cnt, 'OWN_OCCUPIED']=np.nan
    except ValueError:
        pass
    cnt+=1

But aren't loops too slow for large volumes of data? Is there another way to do it?

I did not understand this code. Why does it cast the value to an integer if it then replaces the value with NaN? It seems the program would do the same thing without the line int(row). Why not use np.where for that purpose?

– Lucas

2021/08/10 at 11:42
The int(row) call is only there to test whether row can be converted to an integer. If it cannot, it raises an exception. The big problem with this code, besides iterating item by item, is that it assumes the index is numeric, starts at 0 (zero), and is sequential, which is not always true.

– Paulo Marques

2021/08/11 at 17:04

1 answer

by Paulo Marques • **3,739** points · Answer 1 · 2021-08-10T17:02:25+00:00

We have two scenarios:

Columns that have integers that should be strings
Columns that have strings that should be integers

For the first case, where for example the integer 1 has to become the string "1", the solution is simple: just use astype

Example

df["coluna"] = df["coluna"].astype(str)

For the second case we have two possibilities:

a. All values that have to be converted from string to int (float) can be converted

b. Some (or several, or all) values that have to be converted from string to int (float) cannot be converted

In case all values can be converted, just use the same solution already described:

df["coluna"] = df["coluna"].astype(int)
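As a rough illustration of why case a and case b need different handling, the sketch below (with made-up sample data, not taken from the question) shows `astype(int)` succeeding when every string is convertible and raising `ValueError` as soon as one value is not:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
clean = pd.Series(["1", "2", "3"])
converted = clean.astype(int)  # every string is a valid integer
print(converted.tolist())

mixed = pd.Series(["1", "a", "3"])
try:
    mixed.astype(int)
    failed = False
except ValueError as exc:
    # "a" cannot be parsed as an integer, so the whole conversion fails.
    failed = True
    print("conversion failed:", exc)
```

Because a single bad value aborts the whole conversion, case b needs an approach that handles failures value by value.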

For case b, where some values cannot be converted, see the example:

Create a function that converts to `int` or returns `nan`

import numpy as np
import pandas as pd

def to_int(row):
    try:
        return int(row)
    except ValueError:
        return np.nan

Using the function in a dataframe

df = pd.DataFrame({"A": [1, "a", 3, "4"]})

print(df)

   A
0  1
1  a
2  3
3  4

df["A"] = df["A"].apply(to_int)

print(df)

     A
0  1.0
1  NaN
2  3.0
3  4.0
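Since the question asks about avoiding loops, a possible alternative to `apply` is `pd.to_numeric` with `errors="coerce"`, which turns unparseable values into NaN in a single call. This is a sketch and not an exact equivalent of `to_int`: for instance, a string such as "4.5" would be kept as 4.5 here, whereas `int("4.5")` raises `ValueError` and would yield NaN.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, "a", 3, "4"]})

# Values that cannot be parsed as numbers become NaN; no Python-level loop.
df["A"] = pd.to_numeric(df["A"], errors="coerce")

print(df)
```

On this sample the result matches the `apply(to_int)` output above: a float column with NaN in place of "a".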

EDITED 10/08/2021 - reason: comment below

If 10 is an integer, do this:

df["A"].apply(lambda x: x if isinstance(x, str) else np.nan)

If 10 is a string, do something like:

def only_strings(row):
    try:
        int(row)
        return np.nan
    except ValueError:
        return row

and call with

df["A"] = df["A"].apply(only_strings)
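A loop-free variant of the same idea could combine `pd.to_numeric` with `Series.mask`. The sketch below uses made-up sample data and is only approximately equivalent to `only_strings`: it also blanks values such as "4.5", which `int()` would reject and `only_strings` would therefore keep.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"A": [1, "a", 3, "4", "Y"]})

# NaN wherever the value does not look like a number.
numeric = pd.to_numeric(df["A"], errors="coerce")

# Replace every value that parsed as a number with NaN, keep the rest.
df["A"] = df["A"].mask(numeric.notna())

print(df)
```

This also does not depend on the index being numeric and sequential, since the mask is aligned with the column by label rather than by a manual counter.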
