Read more than one. xls file in python

Asked

Viewed 2,574 times

3

I posted a very similar question a few days ago, but this time, I’m trying to do something much more generic. I want to get the information from line 0 and 1. The code below is complete, and is generating the output of the input.xls file correctly.

import xlrd
import xlwt

# Open the input worksheet 
inputWorkbook = xlrd.open_workbook('entrada.xls', formatting_info=True)

# select the correct sheet for input
inputSheet = inputWorkbook.sheet_by_name('idade')

# Create the output worksheet
outputWorkbook = xlwt.Workbook(encoding="utf-8")

# Add a new sheet to the output worksheet
outputSheet = outputWorkbook.add_sheet('idade')

# Output sheet header's style
headerStyle = xlwt.easyxf('font: height 200, bold 1, color white;''pattern: pattern solid, fore_color black; ')

for coluna in range(inputSheet.ncols):
    outputSheet.write(0, coluna, inputSheet.cell_value(1, coluna), headerStyle);
    outputSheet.write(1, coluna, inputSheet.cell_value(2, coluna));

outputWorkbook.save("resultado.xls")

The problem: Read more than 10. xls files at once. I don’t want the information.xls input only, but also 1.xls input, 2.xls input, 3.xls input ...

  • 1

    It is very difficult to answer your question like this in the abstract, because I do not know exactly what you want to do with these files, whether you need to combine your results or not, etc. I tried to answer as best I could, but if this does not clarify your question, please [Dit] your question by asking the maximum of context possible (i.e. say what you want to do, how you are trying to do it, any relevant chunk of code, etc).

  • Hi, I edited the question to improve what I want to do.

  • 1

    Improved a lot, but still left some doubts: 1) all files entradaN.xls have equal structure? i.e. a page called idade, where you intend to extract the lines 1 and 2 of all columns (as you are doing in this example); or not? 2) Where will the information of the other entries be placed? The first was for the lines 0 and 1 from the exit, the next will go to the lines 2 and 3? Or is it all in the first two rows, and the next one will simply add more columns? Or even: only the first will have headers, the following will not... Etc.

  • 1

    That’s good. 1) Yes, all files entered in N.xls have the same structure. For example, it has 10 columns and 2 rows. 2) That’s right, I need to take the information from all columns and rows 0 and 1 of the input file, and for the pulled row it will be added on row 0 and 1, and then 2 and 3 and so on. I am studying on how to create equal role you taught below, but I am still struggling. But I am attentive to your teachings. Thank you

  • 1

    I updated the answer with a full example. In this case, the zip was unnecessary, but the overall structure of the solution remained the same.

2 answers

3


I suggest you take a look at the function zip and - if you do not yet have this concept - in list comprehensions (list comprehensions). This will help you a lot to structure your code.

If you have a ready code that handles a single file:

workbook_r = open_workbook('arquivo.xls', formatting_info=True)
worksheet_r = workbook_r.sheet_by_index(0)
...

The "natural" way to move to another file is the use of functions:

def ler_arquivo(nome):
    workbook_r = open_workbook(nome, formatting_info=True)
    worksheet_r = workbook_r.sheet_by_index(0)
    ...
    return resultados # Aquilo que você quer extrair do arquivo

ler_arquivo('arquivo.xls')

If you have N files then you will call this function N times by extracting the individual results from each one (if the files have different structures then you will have to do different functions for each "file type", of course):

arquivos = ['arquivo1.xls', 'arquivo2.xls', 'arquivo3.xls']
resultados = [ler_arquivo(nome) for nome in arquivos]

Okay, so far I probably didn’t say anything you didn’t already know. The problem now is - how combine the processing results of each file? That’s where the function comes in zip. Let’s say part of your results are values from a column in the archive:

def ler_arquivo(nome):
    ...
    coluna_que_me_interessa = []
    ... # Extrai os valores de uma coluna que te interesse e coloque na lista
    return {
        "minha_coluna":coluna_que_me_interessa,
        ... # outros dados
    }

For example, let’s say that the Excel A file contains "weights" and the B file contains "height measurements" - and you want to use both to make a calculation.

resultados1 = ler_arquivo('pessoas_pesos.xls')
resultados2 = ler_arquivo('pessoas_alturas.xls')

pesos = resultados1['minha_coluna']
alturas = resultados2['minha_coluna']

for i in range(len(pesos)):
    imc = pesos[i] / alturas[i]*alturas[i]

Instead of making this "weird" loop, you can use the function zip to combine the elements of the lists pesos and alturas two-two:

pesos = [50,70,60]
alturas = [160,180,165]

zip(pesos, alturas) # [(50,160), (70,180), (60,165)]

So that you can use the pair in a list comprehension:

def calcular_imc(peso, altura): # Exemplo; na prática, suas funções serão mais complexas
    ...

imcs = [calcular_imc(x,y) for (x,y) in zip(pesos, alturas)]

Finally, to answer your question: what if I have a number arbitrary files? In this case (and assuming the files are homogeneous, i.e. with the same structure) you can pass a list of pro arguments zip dynamically. Example:

arquivos = ['arquivo1.xls', 'arquivo2.xls', 'arquivo3.xls']
resultados = [ler_arquivo(nome) for nome in arquivos]

minha_coluna = [r['minha_coluna'] for r in resultados]
calculo = [meu_calculo(x) for x in zip(*minha_coluna)]

This will make it meu_calculo receive a list containing the values of each column [of each file] corresponding to row 1, then a list for row 2, etc.


Updating: then adapting the technique described for your specific case (in this case, the zip was unnecessary):

def ler_arquivo(nome):
    inputWorkbook = xlrd.open_workbook(nome, formatting_info=True)
    inputSheet = inputWorkbook.sheet_by_name('idade')
    return {
        "ncols":inputSheet.ncols,
         # As duas linhas que interessam
        "linha1":[inputSheet.cell_value(1, coluna) for coluna in range(inputSheet.ncols)],
        "linha2":[inputSheet.cell_value(2, coluna) for coluna in range(inputSheet.ncols)],
    }

arquivos = ["entrada{}.xls".format(i+1) for i in range(10)]
resultados = [ler_arquivo(nome) for nome in arquivos]

outputWorkbook = xlwt.Workbook(encoding="utf-8")
outputSheet = outputWorkbook.add_sheet('idade')
headerStyle = xlwt.easyxf('font: height 200, bold 1, color white;''pattern: pattern solid, fore_color black; ')

# Percorre os resultados de cada arquivo, incrementando a linha em 2
linha = 0
for resultado in resultados:
    for coluna in range(resultado["ncols"]):
        outputSheet.write(linha+0, coluna, resultado["linha1"][coluna], headerStyle)
        outputSheet.write(linha+1, coluna, resultado["linha2"][coluna])
    linha = linha + 2

outputWorkbook.save("resultado.xls")
  • 1

    It worked! I only changed the position of the outputWorkbook and outputSheet line to before the function, as it is these lines that "open" the file(s) to be read. Now!

-1

Hi, I’d like to do something like that. but instead of him taking two-line information, I want him to take two whole columns of the input file and write it in the output file. rotten help me?

  • 1

    Tiago, welcome to stackoverflow in English, make a tour and see how to create a question, do not do this in existing questions.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.