What happens is this: iloc receives two parameters, rows and columns: iloc[rows, columns]. Rows can be a single row or more than one (the same goes for columns). If it is just one, you simply give its position; if it is more than one, you can use a list or a range, which in Python is written with a colon (:).
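As a rough illustration of those three ways of selecting, here is a minimal sketch. The DataFrame df and its column names are made up for the example and are not from your code.

```python
import pandas as pd

# Hypothetical toy table, only for illustration.
df = pd.DataFrame({"a": [10, 20, 30, 40], "b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

single = df.iloc[1, 2]            # one row and one column: a single value
picked = df.iloc[[0, 2], [0, 1]]  # lists of positions: rows 0 and 2, columns 0 and 1
sliced = df.iloc[0:2, :]          # a range of rows (0 and 1) and all columns

print(single)        # 6
print(picked.shape)  # (2, 2)
print(sliced.shape)  # (2, 3)
```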
How this range works: start:stop:step represents a range that starts at start, ends before stop, and advances step positions at a time. For example, 1:10:2 uses the numbers from 1 up to 10, not including 10, going two by two, i.e. 1, 3, 5, 7, 9. When you do not set the step, Python assumes it is 1, so 0:5 is 0, 1, 2, 3, 4. The step can also be negative: 4:1:-1 is 4, 3, 2. When you do not set the start, Python assumes it is 0 (for a positive step), so 0:14 is the same thing as :14. Finally, if you do not set the stop, Python goes all the way to the end of the sequence.
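The same rules can be checked on a plain Python list, with no pandas involved. This is just a sketch; the list numbers is an arbitrary example.

```python
numbers = list(range(15))  # 0, 1, ..., 14

odd = numbers[1:10:2]        # start 1, stop before 10, step 2
first_five = numbers[0:5]    # step omitted, so it is 1
backwards = numbers[4:1:-1]  # negative step walks backwards, stopping before 1
no_start = numbers[:14]      # start omitted, so it is 0
from_ten = numbers[10:]      # stop omitted, so it runs to the end

print(odd)         # [1, 3, 5, 7, 9]
print(first_five)  # [0, 1, 2, 3, 4]
print(backwards)   # [4, 3, 2]
print(no_start == numbers[0:14])  # True
print(from_ten)    # [10, 11, 12, 13, 14]
```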
In your code there are two uses of iloc. The first is dados.iloc[:,0:14], where the range for the rows is :, that is, neither the start, nor the stop, nor the step is defined, which means pandas will use all rows, from row 0 to the last, inclusive. For the columns, 0:14 selects every column from column 0 up to column 14, not including column 14 (that is, 0 to 13). So you have multiple rows and 14 columns, a two-dimensional table. In the second case you have iloc[:,14], which again means all rows, but only column 14, so you have only one dimension.
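To see the difference in dimensions, here is a small sketch. The dados built below is only a stand-in with 3 rows and 15 columns of made-up numbers, not your real dataset.

```python
import numpy as np
import pandas as pd

# Stand-in for dados: 3 rows, 15 columns (positions 0 to 14), invented values.
dados = pd.DataFrame(np.arange(45).reshape(3, 15))

features = dados.iloc[:, 0:14]  # all rows, columns 0 to 13
target = dados.iloc[:, 14]      # all rows, only column 14

print(features.shape)  # (3, 14)
print(target.shape)    # (3,)
print(features.ndim, target.ndim)  # 2 1
```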
For pandas, dados.iloc[:,14] is a Series (a single column; you would get a one-column DataFrame with dados.iloc[:,[14]] instead), and .values asks pandas to return the underlying values as a NumPy array. Since a Series has a single dimension, the array you get back is one-dimensional.
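A quick way to check what .values gives back in each case is sketched below. Again, dados is a made-up stand-in with 3 rows and 15 columns. Note that selecting one column by a single integer position yields a pandas Series, while passing a list with that position keeps a one-column DataFrame.

```python
import numpy as np
import pandas as pd

# Stand-in for dados: 3 rows, 15 columns, invented values.
dados = pd.DataFrame(np.arange(45).reshape(3, 15))

X = dados.iloc[:, 0:14].values       # 2-D NumPy array
y = dados.iloc[:, 14].values         # 1-D NumPy array (from a Series)
y_col = dados.iloc[:, [14]].values   # 2-D NumPy array with a single column

print(type(dados.iloc[:, 14]).__name__)    # Series
print(type(dados.iloc[:, [14]]).__name__)  # DataFrame
print(type(y).__name__)                    # ndarray
print(X.shape, y.shape, y_col.shape)       # (3, 14) (3,) (3, 1)
```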
André, good morning! It's not a matter of right or wrong. Some algorithms require a NumPy array; to others you can pass the DataFrame itself. In your example, dados['income'].values also gives you the NumPy array, regardless of whether you use loc or iloc. With iloc or slicing you usually don't need to worry about the column names. – lmonferrari
Got it, thank you very much. In my case I have to use OneHotEncoder, so I need arrays!
– André