Pandas iterrows: how to start a second loop from the current index

Question

Asked 6 years, 7 months ago

for index, row in candles.iterrows():
    if (row['Twintower'] == 1):

I would like to start a second loop from the moment this condition is met, that is, from this index (row) downward. I tried several options; one of them gave this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-161-e34b05ba34a8> in <module>
      1 for index, row in candles.iterrows():
      2     if (row['Twintower'] == 1):
----> 3         for row in range(index, candles):
      4             print(1)

TypeError: 'Timestamp' object cannot be interpreted as an integer

@jsbueno, could you help? I really can't get the second loop to start from the moment the condition is found. I tried range(Row, Candles), but it did not work.

– Jair Miranda

2018/12/22 at 22:26

1 answer

by jsbueno • **30,668** points · Answer 1 · 2018-12-22T22:33:21+00:00

You don't seem to need a second loop here. You still only have to go through the rows of the dataframe once, although for two different purposes: first you walk the rows until your condition is true for the first time, and from there you walk the remaining rows, performing some other action.

You could also do it in two steps: the first only records the value of "index" when your condition is true, and the second starts from that value and goes down. That is what you are trying to do, and the efficiency of the program would be the same, since each row would still be visited only once. However, iterrows does not accept a starting row. That is why you tried to use range to get a row number, which is wrong on several levels: for one, the second argument you pass is the dataframe itself rather than a number. The level at which it actually raises the error is that the index of your dataframe is not made of integers but of pandas "Timestamp" objects, hence the TypeError when calling range.
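
To make the Timestamp point concrete, here is a minimal sketch. The sample data is made up for illustration (only the "Twintower" column name comes from the question). It reproduces the TypeError from range and shows that Index.get_loc can translate an index label into an integer row position, assuming the index labels are unique:

```python
import pandas as pd

# Hypothetical sample data: a daily Timestamp index, as in the question's traceback.
candles = pd.DataFrame(
    {"Twintower": [0, 0, 1, 0, 1]},
    index=pd.date_range("2018-12-01", periods=5, freq="D"),
)

first_label = None
for index, row in candles.iterrows():
    if row["Twintower"] == 1:
        first_label = index  # a pandas Timestamp, not an int
        break

# range() needs integers, so passing the Timestamp fails as in the traceback.
try:
    range(first_label, len(candles))
    range_error = None
except TypeError as exc:
    range_error = str(exc)

# Index.get_loc translates a label into its integer position
# (a plain int here because the index labels are unique).
position = candles.index.get_loc(first_label)
print(type(first_label).__name__, position, range_error)
```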

So, since iterrows does not take a start index, one neat way to handle this is to keep another variable that indicates whether you have reached your point of interest, and only then perform the actions that would go in your second loop. The key is to skip part of the loop body with the continue statement: it simply jumps to the next iteration of the loop.

So what you’re trying to do can be written as:

region_of_interest = False
for index, row in candles.iterrows():
    if row['Twintower'] == 1:
        region_of_interest = True
    if not region_of_interest:
        # Until the condition above is true for the first time,
        # go back to the start of the loop from here
        continue

    # Here goes the code you wanted to put
    # in the "second loop".
    ...
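
As a runnable sketch of the same flag pattern, the version below uses made-up sample data (the "close" column and the visited list are hypothetical, added only to have something to observe). The rows collected in visited are the ones the "second loop" body would see:

```python
import pandas as pd

# Hypothetical sample data; only "Twintower" comes from the question.
candles = pd.DataFrame(
    {"Twintower": [0, 0, 1, 0, 1], "close": [10.0, 10.5, 11.0, 10.8, 11.2]},
    index=pd.date_range("2018-12-01", periods=5, freq="D"),
)

visited = []  # index labels seen after the condition first became true
region_of_interest = False
for index, row in candles.iterrows():
    if row["Twintower"] == 1:
        region_of_interest = True
    if not region_of_interest:
        # Still before the first match: skip the rest of the loop body.
        continue

    # Stand-in for the "second loop" body.
    visited.append(index)

print(visited)
```

Note that the row where the condition first holds is itself included in the region of interest.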

If you really want to "crop" the dataframe from the row where the condition is true, that is also possible. In that case, probably the best approach is to create a new dataframe containing only the rows of interest, and then run iterrows again on it:

for row_number, (index, row) in enumerate(candles.iterrows()):
    if row['Twintower'] == 1:
        # End this loop at this point
        break
else:
    # The else clause of the for statement: this block only runs if the
    # break above never happens.
    raise ValueError("The dataframe has no row where Twintower == 1")

# The ".iloc" attribute of the dataframe is addressed by integer row
# position using the "[]" syntax, and slicing it gives a new dataframe.
# (".loc" looks rows up by label, which here would be a Timestamp,
# not a row number.)
candles2 = candles.iloc[row_number:]

for index, row in candles2.iterrows():
    # this is your second loop, only over the region of interest.
    ...
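
As an alternative that is not part of the answer above, the first loop can probably be replaced by a boolean mask to find the starting row position. This is only a sketch on made-up sample data; the names mask and start are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; only "Twintower" comes from the question.
candles = pd.DataFrame(
    {"Twintower": [0, 0, 1, 0, 1]},
    index=pd.date_range("2018-12-01", periods=5, freq="D"),
)

mask = candles["Twintower"] == 1
if not mask.any():
    raise ValueError("The dataframe has no row where Twintower == 1")

# argmax on a boolean array returns the position of the first True value.
start = int(mask.to_numpy().argmax())

# Positional slice: keep the first matching row and everything below it.
candles2 = candles.iloc[start:]

for index, row in candles2.iterrows():
    # Stand-in for the second loop, only over the region of interest.
    print(index, row["Twintower"])
```

The explicit mask.any() check matters because argmax would return 0 on an all-False mask, which would silently select the whole dataframe.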