How to filter rows where columns meet consecutive conditions in Python?

Question

Asked 4 years, 7 months ago

I’m trying to flag rows in which consecutive columns satisfy a pair of conditions. That is, if a row has a column containing L or I and the next column contains A or S, return 1 in a new column (otherwise, return 0).

Input:

       RFA RFB RFC RFD       
    0   S   S   S   S   
    1   A   I   A   A       
    2   A   A   L   A       
    4   S   S   L   A

Output:

       RFA RFB RFC RFD  promo
    0   S   S   S   S     0
    1   A   I   A   A     1 
    2   A   A   L   A     1
    4   S   S   L   A     1

Script:

      def promo_behaviour(x):
          for i in range(0,95411):
             for j in data_rfa_r.columns:
                 if (x[j][i] == 'L' or x[j][i] == 'I') and (x[j][i+1] == 'A' or x[j][i+1] == 'S'):
                    return 1
                 else:
                    return 0
      data_rfa_r['promo'] = data_rfa_r.apply(promo_behaviour)

I wrote this function, but without success (95411 is the number of observations/rows).

I forgot to mention that, in the context of the problem, the first column (position 0) is the most recent one! That is, the columns should be read from right to left.

EDIT:

Output:

       RFA promo2 RFB promo1 RFC RFD    
    0   S    0    S     0     S   S   
    1   A    1    I     0     A   A     
    2   A    0    A     1     L   A   
    4   S    0    S     0     L   A

Good afternoon! Not in the actual database, but there are more than 25 variables (that is, more than 25 columns)...

– zoramind

2020/12/16 at 20:18

2 answers

One Line Solution:

df['promo']=pd.Series([bool(re.search(r'(L|I)(?=[AS])',k)) for k in df.sum(axis=1)])

My idea was to collapse the columns into a single column by concatenating them row by row. On this new column, I applied the logical test using a regex. I used a positive lookahead to check whether there is an A or S immediately after an I or L. Whole code:

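import pandas as pd
import re

# Read the sample data (comma-separated file with the RFA..RFD columns)
df=pd.read_csv("stack.txt",sep=",")

# df.sum(axis=1) concatenates the string columns of each row into one string;
# the regex flags rows where an L or I is immediately followed by an A or S
df['promo']=pd.Series([bool(re.search(r'(L|I)(?=[AS])',k)) for k in df.sum(axis=1)]).map({True:1, False:0})

print(df)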

Returns:

  RFA RFB RFC RFD  promo
0   S   S   S   S      0
1   A   I   A   A      1
2   A   A   L   A      1
3   S   S   L   A      1
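
To see why the lookahead pattern flags these rows, it can help to run it directly on the concatenated row strings. The sketch below hard-codes the four strings from the sample, which is an assumption made for illustration; in the answer they come from `df.sum(axis=1)`.

```python
import re

# Concatenated row strings from the sample input (hard-coded for illustration).
rows = ["SSSS", "AIAA", "AALA", "SSLA"]

# (L|I) matches an L or an I; (?=[AS]) is a positive lookahead that requires
# the next character to be A or S without consuming it.
pattern = re.compile(r"(L|I)(?=[AS])")

flags = [int(bool(pattern.search(s))) for s in rows]
print(flags)  # [0, 1, 1, 1]
```

Only "SSSS" has no L or I at all, so it is the only row that stays at 0.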

Although I don’t think it will work in my particular case because of missing values (null), I think it would otherwise be completely correct!

– zoramind

2020/12/16 at 22:21
Well, I guess you could just replace NULL with any string other than A, S, L or I, no?

– Lucas

2020/12/16 at 22:23
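
The replacement suggested in the comment above might look roughly like the sketch below. The sample frame with missing values, the `PLACEHOLDER` character, and the use of `str.contains` instead of a list comprehension are all assumptions for illustration, not part of the original answer. The nulls are replaced only in a temporary copy, so the original columns keep their missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with missing values; column names follow the question.
df = pd.DataFrame(
    {
        "RFA": ["S", "A", np.nan, "S"],
        "RFB": ["S", "I", "L", np.nan],
        "RFC": ["S", "A", "A", "L"],
        "RFD": ["S", "A", "A", np.nan],
    },
    index=[0, 1, 2, 4],
)

PLACEHOLDER = "_"  # assumed never to occur in the real data

# Fill nulls in a temporary copy, then join each row into a single string.
joined = df.fillna(PLACEHOLDER).astype(str).apply("".join, axis=1)

# str.contains keeps the original index, so the result aligns with df
# even when the index is not 0..n-1.
df["promo"] = joined.str.contains(r"[LI](?=[AS])", regex=True).astype(int)
print(df)
```

Because the placeholder is neither A nor S, an L or I followed by a missing value is not counted as a match.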
In another situation that would be fine, but in this case the null values are important because they are not really null values (they are simply moments when the customer was not yet a partner and so had no access to the promotion). But replacing them should work for sure! I will try it, and if it works, I will mark this as correct too! Thank you very much!

– zoramind

2020/12/16 at 22:26
OK. Avoid the nested for loop, because it is an O(n²) algorithm and therefore extremely inefficient.

– Lucas

2020/12/16 at 22:30
I’ll take your advice! The sample is only 100k rows, but still, it always pays off!

– zoramind

2020/12/16 at 22:32
After deciding to replace the nulls with another letter, it worked correctly and ran much faster! By the way, do you happen to know whether the promo column can be placed where the condition is met? I mean, next to the promotion that went from L to A!

– zoramind

2020/12/16 at 23:33
Perhaps it would be better to open another question.

– Lucas

2020/12/17 at 01:32
Okay! I was just trying to avoid opening another topic!

– zoramind

2020/12/17 at 01:40

by lmonferrari • **3,550** points · Answer 1 · 2020-12-17T02:09:55+00:00

You can use isin by creating a list of possible combinations.

vl = ['LA','IA','LS','IS']
dados['promo'] = (dados.shift(axis = 1) + dados).isin(vl).any(axis = 1).astype(int)

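dados.shift(axis = 1) shifts the data frame one column to the right, so adding dados to it concatenates each value with its left-hand neighbor
isin checks whether each concatenated pair occurs in the list
any(axis = 1) checks whether there is at least one True in each row
astype(int) returns 0 or 1 instead of True or False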
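
The steps above can be followed on the sample input from the question. This is a minimal sketch: the frame is rebuilt by hand, and the intermediate names `pairs` and `hits` are introduced here only for illustration.

```python
import pandas as pd

# Sample input from the question (index 0, 1, 2, 4 as shown there).
dados = pd.DataFrame(
    {
        "RFA": list("SAAS"),
        "RFB": list("SIAS"),
        "RFC": list("SALL"),
        "RFD": list("SAAA"),
    },
    index=[0, 1, 2, 4],
)

vl = ["LA", "IA", "LS", "IS"]

# shift(axis=1) moves every value one column to the right, so the sum holds
# "left neighbor + current value" in each cell (missing in the first column).
pairs = dados.shift(axis=1) + dados

# True wherever an adjacent pair is one of the listed combinations.
hits = pairs.isin(vl)

dados["promo"] = hits.any(axis=1).astype(int)
print(hits)
print(dados)
```

The intermediate `hits` frame also shows which adjacent pair triggered the flag, which may be a starting point for the per-pair columns asked about in the question's edit. If the columns should be read from right to left, as the question notes, one option is to reverse the column order first (for example with `dados.iloc[:, ::-1]`) before applying the same steps.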