Python iloc[] function (Pandas)

0

I saw some code that used iloc like this:

x = dados.iloc[0:-1, d:]
y = dados.iloc[d].values[0]

I read that iloc selects rows and columns, but what is the d there?

2 answers

2

Adapted from the documentation:

DataFrame.iloc
Purely integer-location based indexing for selection by position.

iloc[] is primarily based on integer positions (from 0 to the axis length - 1), but it can also be used with a boolean array.

Permitted entries:

  • An integer, for example 5
  • A list or array of integers, e.g. [4, 3, 0]
  • A slice object with integers, e.g. 1:7
  • A boolean array
  • A callable that takes one argument (the calling Series or DataFrame) and returns valid output for indexing (one of the options above). This is useful in method chains, when you do not have a reference to the calling object but would like to base the selection on some value.

.iloc raises IndexError if a requested position is out of bounds, except for slices, which allow out-of-bounds values (following Python/NumPy slice semantics).
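To make the out-of-bounds rule concrete, here is a minimal sketch. The small DataFrame and the variable names are made up for illustration and are not part of the documentation.

```python
import pandas as pd

# Hypothetical three-row frame, used only for this illustration.
df = pd.DataFrame({"a": [1, 100, 1000], "b": [2, 200, 2000]})

# A scalar position past the end raises IndexError.
try:
    df.iloc[5]
    scalar_raised = False
except IndexError:
    scalar_raised = True

# A slice past the end is truncated silently, as with Python lists.
sliced = df.iloc[1:60]

# A boolean list needs one entry per row of the axis being indexed.
masked = df.iloc[[True, False, True]]

print(scalar_raised)
print(len(sliced))
print(masked.index.tolist())
```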

Examples:

>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
...           {'a': 100, 'b': 200, 'c': 300, 'd': 400},
...           {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
>>> df = pd.DataFrame(mydict)
>>> df
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

Accessing the first row of df with iloc[0]:

>>> df.iloc[0]
a    1
b    2
c    3
d    4
Name: 0, dtype: int64

# Inspecting the type of the object returned by the call above:
>>> type(df.iloc[0])
<class 'pandas.core.series.Series'>

Note that a call with iloc[scalar] does not return a DataFrame but an object of type pandas.Series.

Now let's access the first two rows through a list of integers:

>>> df.iloc[[0, 1]]
     a    b    c    d
0    1    2    3    4
1  100  200  300  400

# Inspecting the type of the object returned by the call above:
>>> type(df.iloc[[0, 1]])
<class 'pandas.core.frame.DataFrame'>

Note that the returned object is now a pandas.DataFrame, something like a "sub-DataFrame".
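The same distinction applies to a single row: a list with one element keeps the result two-dimensional. This is a small sketch with a made-up frame, and the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical two-row frame, used only for this illustration.
df = pd.DataFrame({"a": [1, 100], "b": [2, 200]})

row_as_series = df.iloc[0]    # scalar position: one row as a Series
row_as_frame = df.iloc[[0]]   # one-element list: a DataFrame with one row

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__)
print(row_as_frame.shape)
```

This can matter when later code expects a DataFrame, for example when concatenating the selected rows.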

Accessing with slices:

# Going past the upper bound
>>> df.iloc[:60]
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

# Starting at position 2 and going to the end
>>> df.iloc[2::]
      a     b     c     d
2  1000  2000  3000  4000

With a callable, which is useful in method chains: in the example below, the DataFrame is passed to the lambda, which returns a boolean array that is True for the rows with an even index.

>>> df.iloc[lambda x: x.index % 2 == 0]
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000

Note that the result is a DataFrame containing the rows whose index labels are even in the original df.
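The callable form is most useful when the object being indexed has no name, as in the middle of a method chain. The sketch below is my own illustration (the `total` column and the threshold are invented). The mask is converted with to_numpy() because iloc expects a plain boolean array rather than a boolean Series.

```python
import pandas as pd

# Hypothetical frame, used only for this illustration.
df = pd.DataFrame({"a": [1, 100, 1000], "b": [2, 200, 2000]})

# The frame returned by assign() is never bound to a variable,
# so the lambda receives it as `x` and builds the mask from it.
result = (
    df.assign(total=lambda x: x["a"] + x["b"])
      .iloc[lambda x: (x["total"] > 100).to_numpy()]
)

print(result)
```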

Indexing the 2 axes:

We can also combine row and column positions. In the example below we first access the value in the first row and second column, and then the values in the first and third rows for the second and fourth columns.

# Accessing the value in the first row, second column
>>> df.iloc[0, 1]
2

# Accessing the values in the first and third rows, for the second and fourth columns.
>>> df.iloc[[0, 2], [1, 3]]
      b     d
0     2     4
2  2000  4000

Answering the code in the question specifically:

To explain the code in the question, let's define a value for the variable d. This code was probably taken from a context in which that variable was already defined.

x = dados.iloc[0:-1, d:]:

# Copy the `df` defined above into a new DataFrame called `dados`
>>> dados = df.copy()

# Define the variable d with the value 2
>>> d = 2

# Run the first statement from the question
>>> x = dados.iloc[0:-1, d:]

# Show the result
>>> print(x)
     c    d
0    3    4
1  300  400

The result is the third and fourth columns (c and d) of rows 0 and 1. The command does what was described under "Indexing the 2 axes", here with slices that follow Python/NumPy semantics: in dados.iloc[0:-1, d:], the row slice 0:-1 starts at position 0 (row 0) and stops before the last row (-1), so the last row is excluded. The column slice d: selects columns according to the value of the variable d (= 2), that is, from column 2 to the last one (from c to d).
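One way to check this reading is to write the same slices with explicit bounds and compare the results. This is only a sketch: the frame is rebuilt here so the snippet runs on its own, d = 2 is still the assumed value, and `x_explicit` is a name I made up.

```python
import pandas as pd

# Same data as the example above, rebuilt so the snippet is self-contained.
dados = pd.DataFrame({"a": [1, 100, 1000], "b": [2, 200, 2000],
                      "c": [3, 300, 3000], "d": [4, 400, 4000]})
d = 2  # assumed value; the question does not say what d is

x = dados.iloc[0:-1, d:]

# Negative and open-ended bounds spelled out explicitly.
n_rows, n_cols = dados.shape
x_explicit = dados.iloc[0:n_rows - 1, d:n_cols]

print(x.equals(x_explicit))
print(x.shape)
```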

y = dados.iloc[d].values[0]:

This line assigns to the variable y the first value of the third row of the DataFrame (since the value of the variable d is 2), i.e., 1000:

# Recalling the values in the DataFrame
>>> print(dados)
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

# Array of the values in the row at the position given by the variable `d`:
>>> dados.iloc[d].values
array([1000, 2000, 3000, 4000])

# Run the second statement from the question
>>> y = dados.iloc[d].values[0]

# Show the result:
>>> print(y)
1000
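For this data the same value can also be read directly by giving both positions to iloc. This is a sketch, not the original author's code: `y_direct` is a name I made up, and d = 2 is still an assumption.

```python
import pandas as pd

# Same data as the example above, rebuilt so the snippet is self-contained.
dados = pd.DataFrame({"a": [1, 100, 1000], "b": [2, 200, 2000],
                      "c": [3, 300, 3000], "d": [4, 400, 4000]})
d = 2  # assumed value; the question does not say what d is

y_original = dados.iloc[d].values[0]  # whole row first, then its first value
y_direct = dados.iloc[d, 0]           # row position d, column position 0

print(y_original, y_direct)
```

The direct form avoids building the intermediate row. With columns of different dtypes it also avoids the type conversion that can happen when a whole row is turned into a single array.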

0

I have never used pandas, so I have not used this method myself. From what I read, iloc is a data selection method and, according to this site, its structure is as follows:

data.iloc[<row selection>, <column selection>]
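As a rough illustration of that structure (the frame and the variable names are made up), either selection can be a slice, and the column selection can be left out entirely:

```python
import pandas as pd

# Hypothetical frame, used only for this illustration.
data = pd.DataFrame({"a": [1, 100, 1000], "b": [2, 200, 2000],
                     "c": [3, 300, 3000]})

first_rows = data.iloc[0:2]     # row selection only; all columns are kept
second_col = data.iloc[:, 1]    # all rows, one column: returns a Series
block = data.iloc[0:2, 1:3]     # rows 0-1 and columns 1-2

print(first_rows.shape)
print(second_col.tolist())
print(block.columns.tolist())
```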

So in this case I believe it is retrieving the values from the first row up to, but not including, the last one, from column d onwards.

dados.iloc[0:-1, d:]  # 0 is the first row, -1 stops before the last row, and d: means from column d onwards.

I don't do data science with Python, but from what I read, I think that's it. If anyone knows that what I said is wrong, please say so here.
