How to read an excel table in pandas by skipping the first lines without losing information?

Asked

Viewed 3,450 times

0

Hello, I have a problem to solve a little complicated, I have several xls tables in which I have to remove some data, these tables are with their content organized strangely...

inserir a descrição da imagem aqui

TEXTE TEXTO TEXTO TEXTO TEXTO TEXTO TEXTO
TEXTE TEXTO TEXTO TEXTO TEXTO TEXTO TEXTO
TEXTE TEXTO TEXTO TEXTO TEXTO TEXTO TEXTO

Alt. 280m
Lat. 1°1'S
Lon. 4°1'W

                    DADO 1          DADO 2
    HORA UTC        0000                0100
    18-dez-2004     23,0                24,0
    19-dez-2004     24,9                24,9
    20-dez-2004     26,1                26,1
    21-dez-2004     26,6                26,1
    22-dez-2004     22,3                22,4
    23-dez-2004     25,9                26,0

This table has a large title at the top, below appear data in red that are important for my research, then a row with all the titles of the columns in blue in which I need to read, in yellow and the time that that data was collected, in green the day.

When I try to read these tables in a conventional way with python the columns appear in an embarrassing way because of the title and the data in red, I wish I could read this data.

I thought of making a script to delete the lines that will not be useful to me and then transport the data in red to two separate columns at the end, still do not know how to do this, but for some reason the first line naps and erased with my df.drop(line) of read_excel pandas.

I caught up in this problem and I don’t know how to turn around, if I should clean the data or if I can treat them like this, thank you very much to those who are willing to help.

  • Tales, could you provide the first rows of the table?(can copy and paste the first rows with any separator). From the picture, it is very difficult to see the problem

  • @Hartnäcking the first three rows are merged from the first 6 columns and present 3 sentences, are confidential information from where came the table.

  • This being the case (data secrecy), I recommend that you make a minimum replicable example. Instructions on this link: https://answall.com/help/minimal-reproducible-example

1 answer

4


A simple way around this problem is by using the function read_excel of pandas, passing the parameter skiprows with the number of rows you want to ignore before the table starts in the excel sheet:

import pandas as pd

df = pd.read_excel("file.xlsx",
                   sheet_name = "Sheet1",
                   skiprows = range(0, 10) # ignora as primeiras 10 linhas do excel
                   )
  • Thank you very much!!! I could get that data from (alt. lat. Lon.) before?

  • 1

    @14 To get this data, I think it would be better if you made a separate program to open as a string each spreadsheet and take the data that interests you using regex. There is a tutorial for this module at this link: https://www.vooo.pro/insights/tutorial-sobre-expressoes-regulares-para-beginners-em-python/

Browser other questions tagged

You are not signed in. Login or sign up in order to post.