Pandas: Create multiple columns in Dataframe

Question

Pandas: Create multiple columns in Dataframe

Asked 7 years, 10 months ago

Viewed 1,784 times

1

I have a spreadsheet with the following information:

    NUM_LEGISLACAO  DSC_URL    ...      COD_TIPO   DSC_TIPO
0         323/1895      NaN    ...           2.0   CONCRETO
1       2.269/1896      NaN    ...           2.0   CONCRETO
2       2.397/1896      NaN    ...           2.0   CONCRETO
3       2.380/1896      NaN    ...           2.0   CONCRETO
4       2.489/1897      NaN    ...           2.0   CONCRETO
5       2.997/1898      NaN    ...           2.0   CONCRETO

And I developed this code, in order to create two other columns with separate numbers and years:

def truncus01():
    xlsx = pd.ExcelFile(file)
    df = xlsx.parse(xlsx.sheet_names[0])

    print(len(df.NUM_LEGISLACAO))
    print(df.columns)
#    print(df[df['NUM_LEGISLACAO']])

    #df['NUM'], df['ANO'] = (df.NUM_LEGISLACAO)
    for i in df.NUM_LEGISLACAO:
        df['NUM'] = i.split('/')[0]
        df['ANO'] = i.split('/')[1]
    print(df)

But I did not succeed, year and number come repeated, as shown below:

    NUM_LEGISLACAO  DSC_URL  COD_SITUACAO  ...    DSC_TIPO   NUM   ANO
0         323/1895      NaN          11.0  ...    CONCRETO  5475  1905
1       2.269/1896      NaN           2.0  ...    CONCRETO  5475  1905
2       2.397/1896      NaN           2.0  ...    CONCRETO  5475  1905
265    90.396/1984      NaN          11.0  ...   NORMATIVO  5475  1905
266    90.804/1985      NaN           2.0  ...    CONCRETO  5475  1905
267     81195/1978      NaN           NaN  ...         NaN  5475  1905
268      5475/1905      NaN           NaN  ...         NaN  5475  1905

[269 rows x 8 columns]

What’s the right way to do it?

1 answer

Browser other questions tagged python python-3.x pandas

You are not signed in. Login or sign up in order to post.

by Gomiero • **5,286** points · Answer 1 · 2018-05-27T01:25:03+00:00

The problem of repetition happens because, within the looping:

for i in df.NUM_LEGISLACAO:
        df['NUM'] = i.split('/')[0]
        df['ANO'] = i.split('/')[1]

every iteration, the columns NUM and ANO are filled (fully) with the cycle value (i) current, "killing" previous update, so at the end of the run, these columns will contain only the last read values.

One possible way to create the two columns is through the command pandas.Series.str.split(), that will divide the die NUM_LEGISLACAO through the separator / and store the results in the columns NUM and ANO, respectively:

dt['NUM'], dt['ANO'] = dt.NUM_LEGISLACAO.str.split('/').str

Example (with question data):

dt
Out[30]:
  NUM_LEGISLACAO  COD_TIPO  DSC_TIPO
0       323/1895       2.0  CONCRETO
1     2.269/1896       2.0  CONCRETO
2     2.397/1896       2.0  CONCRETO
3     2.380/1896       2.0  CONCRETO
4     2.489/1897       2.0  CONCRETO
5     2.997/1898       2.0  CONCRETO

# AQUI: Cria as 2 colunas
dt['NUM'], dt['ANO'] = dt.NUM_LEGISLACAO.str.split('/').str

dt
Out[32]:
  NUM_LEGISLACAO  COD_TIPO  DSC_TIPO    NUM   ANO
0       323/1895       2.0  CONCRETO    323  1895
1     2.269/1896       2.0  CONCRETO  2.269  1896
2     2.397/1896       2.0  CONCRETO  2.397  1896
3     2.380/1896       2.0  CONCRETO  2.380  1896
4     2.489/1897       2.0  CONCRETO  2.489  1897
5     2.997/1898       2.0  CONCRETO  2.997  1898