My dataframe skips a line in excel when Columns receives more than one line (pandas)

Asked

Viewed 54 times

1

I’m trying to make dataframes and pass them to excel, but when I do a dataframe with more than one header line, it skips a line before plotting the data in the spreadsheet (leaves an empty line). It seems that the problem is tied to the fact of my argument Columns be a list of lists, because it doesn’t happen in dataframes where Columns is a single list. Does anyone have any suggestions on how to resolve this, take away these empty lines?

import pandas as pd
import numpy as np

def multiple_dfs(df_list, sheets, file_name, spaces):
    writer = pd.ExcelWriter(file_name,engine='xlsxwriter')
    row = 0
    for dataframe in df_list:
        dataframe.to_excel(writer, sheet_name=sheets, startrow=row, startcol=0)
        row = row + len(dataframe.index) + spaces + 1
    writer.save()
    
array1 = [['one', 'one', 'one', 'two', 'two', 'two'],['a', 'b', 'c', 'a', 'b', 'c']]
df1 = pd.DataFrame(np.random.randn(5, 6),columns=array1)

array2 =[[1, 2, 3], ['Sharon', 'Nick', 'Bailey'],[22,23,24]]
midx = pd.MultiIndex.from_arrays(array2, 
            names =('Number', 'Names', 'Ages')) 
df2 = pd.DataFrame(np.random.randn(5, 3),columns=midx)

array3 = ['one', 'one', 'one', 'two', 'two', 'two']
df3 = pd.DataFrame(np.random.randn(5, 6),columns=array3)

df_lista = [df1, df2, df3]
multiple_dfs(df_lista,'Frames','Frames.xlsx',4)

inserir a descrição da imagem aqui

  • Change anything if you put row = 1?

  • He would start printing the dataframes from the second line of the spreadsheet

  • Yes, exactly. I understood that the problem is that you are skipping a line between the header and the data. But, you tried row = 1?

  • I just tried, it didn’t work

1 answer

0


Did you find or knot that needs to be untied by the people of .

It is a matter of their choice. Although it is not a bug, the result is as if it were.

This is only perceived when the column name is the type MultiIndex.

See below for why this happens:

Case 1 - visually a bug

m = pd.MultiIndex.from_tuples([(1,1),(1,2)])
df = pd.DataFrame([[1,2],[3,4]], columns=m)
df.to_excel('Frames.xls', sheet_name="Frames", startrow=0, startcol=0)

Resultado Caso 1

Case 2 - starting to understand the problem

m = pd.MultiIndex.from_tuples([(1,1),(1,2)], names=['linha 1','linha 2'])
df = pd.DataFrame([[1,2],[3,4]], columns=m)
df.to_excel('Frames.xls', sheet_name="Frames", startrow=0, startcol=0)

The modification was the inclusion of names for the columns

Resultado do caso 2

Case 3 - The reason for the problem

m = pd.MultiIndex.from_tuples([(1,1),(1,2)], names=['linha 1','linha 2'])
df = pd.DataFrame([[1,2],[3,4]], columns=m)
df.to_excel('Frames.xls', sheet_name="Frames", startrow=0, startcol=0, index_label="Indices")

The modification was the inclusion of names for the columns and labels for the indexes.

Resultado Caso 3

Note that that blank line in Case 1 is actually a line reserved for the index label.

It seems that they beat the hammer to when the columns nay are of the type MultiIndex, the index label is in the same row as the column name and ignores the column label.

Case 4 - Described above

m = pd.Index([1,2], name="coluna")
df = pd.DataFrame([[1,2],[3,4]], columns=m)
df.to_excel('Frames.xls', sheet_name="Frames", startrow=0, startcol=0, index_label="Indices")

Resultado Caso 4

Browser other questions tagged

You are not signed in. Login or sign up in order to post.