Get range of values in pandas object. Dataframe

Asked

Viewed 2,450 times

3

I have an object pandas. Dataframe with a python date field. How can I get a set of records according to date range?

An example of the data set (last 10 records):

         datpre codneg     nomres modref  preabe  premax  premin  preult  \
153  2017-08-14  PETR4  PETROBRAS     R$   12.98   13.22   12.91   13.08   
154  2017-08-15  PETR4  PETROBRAS     R$   13.08   13.24   13.05   13.15   
155  2017-08-16  PETR4  PETROBRAS     R$   13.30   13.38   13.10   13.13   
156  2017-08-17  PETR4  PETROBRAS     R$   13.12   13.26   13.02   13.05   
157  2017-08-18  PETR4  PETROBRAS     R$   13.16   13.60   13.16   13.60   
158  2017-08-21  PETR4  PETROBRAS     R$   13.64   13.66   13.28   13.34   
159  2017-08-22  PETR4  PETROBRAS     R$   13.70   13.87   13.64   13.79   
160  2017-08-23  PETR4  PETROBRAS     R$   13.78   13.93   13.65   13.76   
161  2017-08-24  PETR4  PETROBRAS     R$   13.78   13.89   13.68   13.80   
162  2017-08-25  PETR4  PETROBRAS     R$   13.87   13.94   13.81   13.88   
       quatot        codisi       data    porvar  prevar  
153  31607400  BRPETRACNPR6 2017-08-14  1.003861    0.13  
154  24737300  BRPETRACNPR6 2017-08-15  0.535168    0.07  
155  44946600  BRPETRACNPR6 2017-08-16 -0.152091   -0.02  
156  23503300  BRPETRACNPR6 2017-08-17 -0.609292   -0.08  
157  66263500  BRPETRACNPR6 2017-08-18  4.214559    0.55  
158  40485900  BRPETRACNPR6 2017-08-21 -1.911765   -0.26  
159  55240700  BRPETRACNPR6 2017-08-22  3.373313    0.45  
160  47679700  BRPETRACNPR6 2017-08-23 -0.217549   -0.03  
161  32300600  BRPETRACNPR6 2017-08-24  0.290698    0.04  
162  24185600  BRPETRACNPR6 2017-08-25  0.579710    0.08 

The date column contains dates in date format. I needed to get the data set within a date range.

  • You can add a small example of the data set you have?

  • I am separating a small set and shortly edit the question.

  • I found a solution in the English version of stackoverflow: "select-dataframe-Rows-between-two-Dates"

2 answers

1

Hello, you can use the iloc() pandas function that follows the same notations as the Numpy lib.dataframe.iloc[20:50,2] that is, index 20 to index 49 and only column of index 2

1


The way to filter lines from a dataframe is to use the syntax of [ ] to select lines, but within the same place an expression that results in a series of the same size as the dataframe, with True or False values.

And to do this, just compare the elements of the date column with the desired values - in this case, datetime objects with the lowest and highest date. Unfortunately the nature of the operations to create these series, which somewhat abuses the overload of Python operators, does not allow the use of Python syntax that works for scalar values "start < end value", as we use in mathematical notation - then we have to combine the two operations with the "&"-operator which, in turn, requires a set of parentheses on account of precedence

import datetime
...

data_inicio = datetime.datetime(2017, 8, 5)
data_final = datetime.datetime(2017, 8, 15)

df_filtrado = df[(data_inicio <= df["datpre"]) &  (data_final > df["datpre"]) ]

(of course the dates do not need to be in variables, I only did to improve readability)

Example -

It is always interesting to leave complete and reproducible code examples so that other people can learn and reproduce what is happening without having to reproduce everything from scratch. Then below the lines that, in a terminal session, can create a test dataframe and select a few lines:

import pandas as pd, datetime
# Variável para abreviar a chamada de `datetime.datetime` mais abaixo
D = datetime.datetime

# Criando um DF com uma coluna de dias, um dia de maio de 2019 por linha:

df = pd.DataFrame({"dias":[(D(2019, 5, 1) + datetime.timedelta(days=i)) for i in range(32)    ]})

print(df)
# selecionando um subconunto de datas
df[(D(2019, 5, 5)< df["dias"]) & (df["dias"] < D(2019, 5, 10)) ]

Browser other questions tagged

You are not signed in. Login or sign up in order to post.