Select Dataframe lines based on list of values for a column

Asked

Viewed 172 times

3

I have a Dataframe with more than 5 thousand lines and more than 20 columns.

I want to select only a few rows, second column match criteria with specific values

SELECT *
FROM dataframe
WHERE colume_name = lista_de_valores

that is to say

import pandas as pd
df=pd.read_csv('arquivo.csv', encoding = "utf-8-sig", sep=",")  
print(df.columns)

## Index(['idx', 'prod', 'number', 'date', ..., 'stockN'], dtype='object')

What I need is to create a new df2, with all lines on which 'stockN' is equal to a value, keeping some columns

lista_de_valores = {'stockN':['169', ..., '1390', '1464', '157', '3833']} # mais de 20 valores
df2 = pd.DataFrame(lista_de_valores, columns= ['idx', 'date', 'stockN'])

But despite the 'stockN' come correct, the Dataframe comes with nan in the other selected columns. What I’m missing here?

Thanks in advance!

2 answers

2

From what I understand you want to make a filter. Here you can consult the documentation better pandas.DataFrame.filter

There are great examples, for example, this is exactly what you need:

>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> df
        one  two  three
mouse     1    2      3
rabbit    4    5      6

So it can also be more useful for you.

>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
         one  two  three
rabbit    4    5      6

2

Another solution is to use the method isin()

Creating Dataframe for Testing

>>> import pandas as pd

>>> df = pd.DataFrame({'prod':["a","b","c","d","e","f","g","h","i","j"], 'number':[1,1,1,5,5,5,5,1,1,1], 'stockN':[1,2,3,4,5,6,7,8,9,10]})

>>> df
  prod  number  stockN
0    a       1       1
1    b       1       2
2    c       1       3
3    d       5       4
4    e       5       5
5    f       5       6
6    g       5       7
7    h       1       8
8    i       1       9
9    j       1      10

Creating query

>>> query = [4,5,6,7]

>>> df['stockN'].isin(query)   # Somente para verificar o resultado
0    False
1    False
2    False
3     True
4     True
5     True
6     True
7    False
8    False
9    False
Name: stockN, dtype: bool

Assigning to a new Dataframe

>>> df2 = df[df['stockN'].isin(query)]

>>> print(df2)

  prod  number  stockN
3    d       5       4
4    e       5       5
5    f       5       6
6    g       5       7

Browser other questions tagged

You are not signed in. Login or sign up in order to post.