Pandas - Problems with ".loc[]" and multiple optional inputs

My project has four user-supplied inputs, stored in the variables a, b, c and d.

I use these inputs to filter a huge .csv file, which I read into a DataFrame. The filter uses ".loc[]", as follows:

arquivo = chunk.loc[(chunk['Coluna1'] == a) & (chunk['Coluna2'] == b) & (chunk['Coluna3'] == c) & (chunk['Coluna4'] == d)]

The problem is that the user may fill in any subset of the four inputs. I need ".loc[]" to filter correctly even when, for example, the user fills in only the variable "a" and leaves the others blank.

As it stands, the variables the user did not fill in receive an empty string (""). ".loc[]" then looks for rows where the corresponding column is equal to "".
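A minimal reproduction of the behaviour described above, assuming a small hypothetical DataFrame with the same column names and integer values: comparing a numeric column with "" is False for every row, so combining it with & removes everything.

```python
import pandas as pd

# Small example DataFrame (hypothetical data, same columns as the question)
chunk = pd.DataFrame({
    'Coluna1': [1, 1],
    'Coluna2': [2, 2],
    'Coluna3': [3, 3],
    'Coluna4': [5, 6],
})

# The user filled in only "a"; the GUI delivers "" for the other fields
a, b, c, d = 1, "", "", ""

arquivo = chunk.loc[(chunk['Coluna1'] == a) & (chunk['Coluna2'] == b) &
                    (chunk['Coluna3'] == c) & (chunk['Coluna4'] == d)]

# No cell equals "", so every row is filtered out
print(arquivo.empty)  # True
```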

I thought that one possible solution would be to assign a "wildcard value" to the unfilled variables, but I do not know if this is possible.
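pandas has no wildcard value that makes `==` match everything, but the same effect can be emulated by replacing the condition itself. A sketch, assuming a hypothetical helper `condicao` and that filled-in values were already converted to the column's type (e.g. int): a blank input yields an all-True mask, so that column is effectively ignored.

```python
import pandas as pd

def condicao(coluna, valor):
    """Return the mask `coluna == valor`, or an all-True mask when the
    field was left blank, which acts as a "wildcard" for that column."""
    if valor is None or valor == "":
        # Blank input: every row passes this condition
        return pd.Series(True, index=coluna.index)
    return coluna == valor

# Example DataFrame (hypothetical data)
chunk = pd.DataFrame({
    'Coluna1': [1, 1],
    'Coluna2': [2, 2],
    'Coluna3': [3, 3],
    'Coluna4': [5, 6],
})

# Only "a" was filled in; assume it was already converted to int
a, b, c, d = 1, "", "", ""

arquivo = chunk.loc[condicao(chunk['Coluna1'], a) &
                    condicao(chunk['Coluna2'], b) &
                    condicao(chunk['Coluna3'], c) &
                    condicao(chunk['Coluna4'], d)]
print(arquivo)  # both rows, since only Coluna1 is actually filtered
```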

Answer

I think this is best handled not in the call to loc, but when you read the variables from the user. First, an input the user did not provide should be represented as None; currently it is an empty string (''). So we need to convert it:

def retorna_variavel(nome):
    # Insert here the code that reads the user's input from the graphical
    # interface. I am assuming the input comes from a text box, for example,
    # so it will be a string. `retorna_variavel_de_caixa_de_texto` is a
    # placeholder for that code.
    texto = retorna_variavel_de_caixa_de_texto(nome)

    # Check whether the user typed nothing. This check is equivalent to
    # `if texto is None or len(texto) == 0:`
    if not texto:
        return None

    # Convert the value typed by the user to an integer. You can also
    # convert it to a floating-point number with `float`, or keep it as a
    # string if the column holds text.
    return int(texto)
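Because `retorna_variavel` depends on the GUI, here is a hypothetical GUI-free variant, `converte_entrada`, that receives the raw text directly so the conversion can be tried in isolation. It also treats whitespace-only text as blank, which is an extra assumption beyond the original check.

```python
from typing import Optional

def converte_entrada(texto: Optional[str]) -> Optional[int]:
    """Hypothetical GUI-free version of retorna_variavel: takes the raw
    text instead of reading it from a text box."""
    # Treat None, "" and whitespace-only text as "not provided"
    if texto is None or not texto.strip():
        return None
    # int() tolerates surrounding whitespace, e.g. int(" 7 ") == 7
    return int(texto)

print(converte_entrada(""))   # None
print(converte_entrada("5"))  # 5
```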

Then you create the variables a, b, c and d:

a = retorna_variavel('a')
b = retorna_variavel('b')
c = retorna_variavel('c')
d = retorna_variavel('d')

Finally, you can use the following code to filter the pd.DataFrame:

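import pandas as pd

# An example DataFrame
chunk = pd.DataFrame({
    'Coluna1': [1, 1],
    'Coluna2': [2, 2],
    'Coluna3': [3, 3],
    'Coluna4': [5, 6],
})

# Each condition is either True (input omitted, column ignored) or a boolean
# Series. Starting from an all-True Series keeps the result a Series even
# when every input was omitted (a bare True would make .loc fail).
arquivo = chunk.loc[
    pd.Series(True, index=chunk.index) &
    (a is None or chunk['Coluna1'] == a) &
    (b is None or chunk['Coluna2'] == b) &
    (c is None or chunk['Coluna3'] == c) &
    (d is None or chunk['Coluna4'] == d)]

print(arquivo)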
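If the number of inputs grows, the same idea can be written without repeating one line per column. A sketch, assuming a hypothetical helper `filtra` that receives a dict mapping column names to values, where None means "not provided":

```python
import pandas as pd

def filtra(df, filtros):
    """Keep only the rows of `df` that match every provided filter.

    `filtros` maps a column name to the desired value; a value of None
    means "not provided", so that column is not filtered at all.
    """
    # Start by keeping every row; this also covers "no filter provided"
    mascara = pd.Series(True, index=df.index)
    for coluna, valor in filtros.items():
        if valor is not None:
            mascara = mascara & (df[coluna] == valor)
    return df.loc[mascara]

chunk = pd.DataFrame({
    'Coluna1': [1, 1],
    'Coluna2': [2, 2],
    'Coluna3': [3, 3],
    'Coluna4': [5, 6],
})

# d was omitted, so both rows are returned
print(filtra(chunk, {'Coluna1': 1, 'Coluna2': 2, 'Coluna3': 3, 'Coluna4': None}))
```

Skipping None values inside the loop plays the role of the "wildcard": an omitted column simply never contributes a condition.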

In this example,

  • if a == 1, b == 2, c == 3 and d == 4, the output will be
Empty DataFrame
Columns: [Coluna1, Coluna2, Coluna3, Coluna4]
Index: []
  • if a == 1, b == 2, c == 3 and d == 5, the output will be
   Coluna1  Coluna2  Coluna3  Coluna4
0        1        2        3        5
  • if a == 1, b == 2, c == 3 and d == 6, the output will be
   Coluna1  Coluna2  Coluna3  Coluna4
1        1        2        3        6
  • if a == 1, b == 2, c == 3 and d is None (the user left d blank), the output will be
   Coluna1  Coluna2  Coluna3  Coluna4
0        1        2        3        5
1        1        2        3        6

Since d was omitted, rows with any value in Coluna4 were returned.
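Since the question reads a huge .csv in chunks, the same filter can be applied to each chunk and the pieces joined at the end. A sketch, assuming `pd.read_csv(..., chunksize=...)`, a small in-memory CSV as a stand-in for the real file, and hypothetical names `partes` and `resultado`. Note that read_csv infers these columns as integers, so the inputs must be converted (e.g. with `int`) for `==` to match.

```python
import io

import pandas as pd

# Stand-in for the huge CSV (hypothetical data). With a real file you would
# pass its path to read_csv instead of this StringIO object.
csv = io.StringIO(
    "Coluna1,Coluna2,Coluna3,Coluna4\n"
    "1,2,3,5\n"
    "1,2,3,6\n"
    "7,8,9,10\n"
)

# Inputs already converted: d was left blank, so it is None
a, b, c, d = 1, 2, 3, None

partes = []
# chunksize=2 is only for the demo; use a much larger value in practice
for chunk in pd.read_csv(csv, chunksize=2):
    arquivo = chunk.loc[
        pd.Series(True, index=chunk.index) &  # keeps the mask a Series
        (a is None or chunk['Coluna1'] == a) &
        (b is None or chunk['Coluna2'] == b) &
        (c is None or chunk['Coluna3'] == c) &
        (d is None or chunk['Coluna4'] == d)]
    partes.append(arquivo)

# Join the filtered pieces from every chunk into a single DataFrame
resultado = pd.concat(partes)
print(resultado)
```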

  • These inputs come from a graphical user interface. I always receive values for all four, but the ones that were not filled in arrive as "". In my logic, I need ".loc[]" to ignore the unfilled variables, but I can't think of a way to do that... The wildcard I meant would be a value that ".loc[]" would search for in the respective column but that would match anything, you know?

  • I see, sorry for the misunderstanding. I have updated my answer; see if it solves your problem.
