A Pandas dataframe lets you create a "sub-dataframe" containing only some of its columns: just pass a list of the desired column names as the item inside the brackets.
That is, if your dataframe is in the variable df and you want a separate variable with only the columns "nome" (name) and "endereco" (address), just do:
variavel = df[["nome", "endereco"]]
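The key detail is that the brackets receive a list. The sketch below uses small hypothetical data (the values and the extra "idade" column are made up for illustration) to contrast a list of names, which returns a DataFrame, with a single name, which returns a Series.

```python
import pandas as pd

# Hypothetical data, using the column names from the text above
df = pd.DataFrame({
    "nome": ["Ana", "Bruno"],
    "endereco": ["Rua A, 1", "Rua B, 2"],
    "idade": [30, 25],
})

# A list inside the brackets selects several columns and returns a DataFrame
variavel = df[["nome", "endereco"]]
print(type(variavel).__name__)      # DataFrame
print(list(variavel.columns))       # ['nome', 'endereco']

# A single name (not a list) returns a Series instead
serie = df["nome"]
print(type(serie).__name__)         # Series

# A one-element list still returns a DataFrame
print(type(df[["nome"]]).__name__)  # DataFrame
```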
The object returned by this operation is itself a dataframe, with all the methods and functionality a dataframe has. Depending on the situation, however, the data in the new dataframe may be either a view of the original dataframe or a stand-alone copy. When in doubt, if you are going to change the data in the new dataframe, it is better to make an explicit copy with the .copy() method, so that the original df is guaranteed not to change.
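A minimal sketch of that advice, again with hypothetical data: the selection is copied explicitly, then modified, and the original dataframe keeps its value. Without the .copy(), some pandas versions may emit a SettingWithCopyWarning on the assignment, because they cannot tell whether the change is meant to reach df.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"nome": ["Ana", "Bruno"], "endereco": ["Rua A, 1", "Rua B, 2"]})

# .copy() guarantees an independent object, whatever pandas does internally
variavel = df[["nome", "endereco"]].copy()

# Changing the copy ...
variavel.loc[0, "nome"] = "Carla"

# ... leaves the original untouched
print(df.loc[0, "nome"])        # Ana
print(variavel.loc[0, "nome"])  # Carla
```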
Here is a complete example of selecting a dataframe's columns in the interactive interpreter:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([(1,2,3,4)] * 4, columns=["col1", "col2", "col3", "col4"])
In [3]: df
Out[3]:
   col1  col2  col3  col4
0     1     2     3     4
1     1     2     3     4
2     1     2     3     4
3     1     2     3     4
In [4]: recorte = df[["col2", "col3"]]
In [5]: recorte
Out[5]:
   col2  col3
0     2     3
1     2     3
2     2     3
3     2     3
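The same session can be written as a plain script to check that the selection really behaves like any other dataframe: it keeps the original row index, has only the chosen columns, and supports the usual methods. The column sum and the reordered selection are just illustrative choices; note that the columns come out in the order given in the list.

```python
import pandas as pd

# Same data as the interactive session above
df = pd.DataFrame([(1, 2, 3, 4)] * 4, columns=["col1", "col2", "col3", "col4"])
recorte = df[["col2", "col3"]]

# The selection is a full DataFrame: same row index, only the chosen columns
print(recorte.shape)            # (4, 2)
print(list(recorte.index))      # [0, 1, 2, 3]

# Any DataFrame method is available, e.g. column sums
print(recorte.sum().tolist())   # [8, 12]

# Columns appear in the order given in the list
print(list(df[["col3", "col2"]].columns))  # ['col3', 'col2']
```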