Pandas: How to make a Serie fragment in the field (Column) of the Dataframe

Asked

Viewed 227 times

0

I have this excel below:

NÚMERO  "URL NÚMERO 16571 SICAN"    "URL DECRETOS 2011 PRINCIPAL"

1    CCIVIL_03/Atos/decretos/1991/D00001.html     CCIVIL_03/decreto/1990-1994/D0001.htm
4    CCIVIL_03/Atos/decretos/1889/D00004.html     CCIVIL_03/decreto/1851-1899/D0004.htm
5    CCIVIL_03/Atos/decretos/1889/D00005.html     CCIVIL_03/decreto/1851-1899/D0005.htm
5    CCIVIL_03/Atos/decretos/1934/D00005.html    
5    CCIVIL_03/Atos/decretos/1991/D00005.html     CCIVIL_03/decreto/1990-1994/D0005.htm
7    CCIVIL_03/Atos/decretos/1991/D00007.html     CCIVIL_03/decreto/1990-1994/D0007.htm
8    CCIVIL_03/Atos/decretos/1991/D00008.html     CCIVIL_03/decreto/1990-1994/D0008.htm
9    CCIVIL_03/Atos/decretos/1991/D00009.html     CCIVIL_03/decreto/1990-1994/D0009.htm
12   CCIVIL_03/Atos/decretos/1934/D00012.html  

and developed the following code to treat values:

import pandas as pd
import numpy as np
from time import time

def truncus04(filein='../brito procv.xlsx' ):
    df = pd.read_excel(filein, names=['num', 'url1', 'url2'], sheet_name='Plan7')
    df2 = df.dropna()
    df2['num'] = df['num'].dropna().apply(np.int64)
    result = (df2.url1.str.split('/'))
    print(type(result))
    # print(result)
    #df2['ano'] = df2.url1.str.split('/')[:][6]
    df2['ano'] = df2.url1.str.split('/').loc[:, (6)]

    # df2.to_csv('../brito_procv_{}.css'.format(int(time())), index=False)
    df2 = df2.sort_values(['ano', 'num'])
    df2[['num', 'ano', 'url1', 'url2']].to_csv('../brito_procv_{}.css'.format(int(time())), index=False)
    return True

My goal is to create a new listing with the ['num', 'ano', 'url1', 'url2'] fields, without the empty fields.

num,ano,url1,url2
4,1889, CCIVIL_03/Atos/decretos/1889/D00004.html, CCIVIL_03/decreto/1851-1899/D0004.htm
5,1889, CCIVIL_03/Atos/decretos/1889/D00005.html, CCIVIL_03/decreto/1851-1899/D0005.htm
291,1890, CCIVIL_03/Atos/decretos/1890/D00291.html, CCIVIL_03/decreto/1851-1899/D291.htm
456,1890, CCIVIL_03/Atos/decretos/1890/D00456.html, CCIVIL_03/decreto/1851-1899/D00456.html

How to Fix My Code to Create the Year Column?

1 answer

1

Instead of using the Pandas, how about using the library xlrd to read the file .xlsx and the library csv to save the file CSV outgoing, check it out:

import xlrd
import csv

# Abre arquivo XLSX de entrada para leitura
workbook = xlrd.open_workbook('entrada.xlsx')

# Seleciona a Work Sheet "Plan 7"
worksheet = workbook.sheet_by_name('Plan7')

# Abre arquivo CSV de saida para escrita
with open( 'saida.csv', 'w' ) as fcsv:

    # Cria um gravador de .CSV usando virgula como delimitador
    gravador = csv.writer( fcsv, delimiter=',' )

    # Grava cabecalho como primeiro registro do arquivo de saida
    gravador.writerow( [ 'num', 'ano', 'url1', 'url2' ] )

    # Itera nos registros do arquivo de entrada ignorando o cabecalho
    for i in range( 1, worksheet.nrows ):

        # Recupera registro do arquivo de entrada
        row = worksheet.row(i)

        # Realiza as devidas conversoes
        num = int(row[0].value)
        ano = row[1].value.split('/')[3]
        url1 = row[1].value
        url2 = row[2].value

        # Grava registro no arquivo de saida se url2 nao estiver vazia
        if( url2 ):
            gravador.writerow([ num, ano, url1, url2 ]);

Exit (saida.csv):

num,ano,url1,url2
1,1991,CCIVIL_03/Atos/decretos/1991/D00001.html,CCIVIL_03/decreto/1990-1994/D0001.htm
4,1889,CCIVIL_03/Atos/decretos/1889/D00004.html,CCIVIL_03/decreto/1851-1899/D0004.htm
5,1889,CCIVIL_03/Atos/decretos/1889/D00005.html,CCIVIL_03/decreto/1851-1899/D0005.htm
5,1991,CCIVIL_03/Atos/decretos/1991/D00005.html,CCIVIL_03/decreto/1990-1994/D0005.htm
7,1991,CCIVIL_03/Atos/decretos/1991/D00007.html,CCIVIL_03/decreto/1990-1994/D0007.htm
8,1991,CCIVIL_03/Atos/decretos/1991/D00008.html,CCIVIL_03/decreto/1990-1994/D0008.htm
9,1991,CCIVIL_03/Atos/decretos/1991/D00009.html,CCIVIL_03/decreto/1990-1994/D0009.htm

Browser other questions tagged

You are not signed in. Login or sign up in order to post.