Extract specific information from an excel cell using pandas in Python

Viewed 365 times

0

I need to extract information from an Excel cell using pandas and pyautogui. The cell contains a date, and I need to pull out the numbers without the "/" separators. Example: 25/12/2000.

At the moment I look at what the user typed in the date-of-birth cell and manually split the numbers 25, 12 and 2000 into three other cells (day, month and year). I then use pandas to read those three cells and get the desired effect. How can I read the date-of-birth cell directly with pandas instead?

Today I split the date into 3 cells by hand and look up the information in those cells.

My Spreadsheet:

          A           B                  C                 D     E     F # I turn cell C into D, E and F
1        nome     sobrenome      data de nascimento       dia   mes   ano
2        Joao       Mendes          25/12/2000            25    12    2000

I want a function that reads the day, month and year from the date-of-birth cell and returns them one at a time rather than all at once.

import pyautogui
import pandas as pd

formulario = pd.read_excel(r'C:\formulario.xlsx', sheet_name='Planilha1')
## Sample data for testing...
#formulario = pd.DataFrame({
#  'nome':['João'], 
#  'sobrenome':['Mendes'] , 
#  'data de nascimento': ['25/12/2000']
#})

# Type the day, month and year of the first row, read from the manually created columns
for i in range(1):
    pyautogui.write(str(formulario['dia'][0]))
    pyautogui.write(str(formulario['mes'][0]))
    pyautogui.write(str(formulario['ano'][0]))

Solved:

import pandas as pd
df = pd.DataFrame({'data de nascimento': ['25/12/2000']})

# Slice the fixed-width dd/mm/yyyy string by position
data = df['data de nascimento'][0]
dia = data[:2]
mes = data[3:5]
ano = data[6:]

print(dia)
print(mes)
print(ano)
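
A possible variation on the slicing above, as a minimal sketch: splitting on "/" does not depend on the day and month always having two digits. The helper name split_date is hypothetical and not part of the original post. It assumes the cell value is text; if the cell is stored as a real Excel date, pandas may return a Timestamp instead of a string, and string operations would not apply.

```python
def split_date(text):
    """Split a 'dd/mm/yyyy' string into (day, month, year) strings.

    Hypothetical helper for illustration; assumes the value is text
    with exactly two "/" separators.
    """
    day, month, year = text.strip().split("/")
    return day, month, year


dia, mes, ano = split_date("25/12/2000")
print(dia)
print(mes)
print(ano)
```

Each part stays a string, so it can be passed straight to a function that expects text.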

1 answer

1


To split a column of birth dates stored as strings in dd/mm/yyyy format into separate day, month and year columns:

  1. Start from a DataFrame with a structure similar to the one in the question.
import pandas as pd
df = pd.DataFrame({
  'nome':['João', 'Maria', 'Daniel'], 
  'sobrenome':['Mendes','Tavares', 'Cunha'] , 
  'data de nascimento': ['25/12/2000', '21/03/1999', '22/10/1981']
})
  2. Because the values in the data de nascimento column are strings, create a temporary column data that holds them converted to datetimes with pandas.to_datetime().
df['data'] = pd.to_datetime(df['data de nascimento'], format='%d/%m/%Y')
  3. Build the dia, mês and ano columns in a second DataFrame using the Series.dt.day, Series.dt.month and Series.dt.year properties, and join it to the original DataFrame with DataFrame.join().
df = df.join(pd.DataFrame({
  'dia': df['data'].dt.day, 
  'mês': df['data'].dt.month, 
  'ano': df['data'].dt.year
}))
  4. Remove the temporary column data with DataFrame.drop().
df = df.drop(columns=['data'])
  5. The result:
print(df)

     nome sobrenome data de nascimento  dia  mês   ano
0    João    Mendes         25/12/2000   25   12  2000
1   Maria   Tavares         21/03/1999   21    3  1999
2  Daniel     Cunha         22/10/1981   22   10  1981
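
If the goal is only to read the day, month and year without adding columns to the DataFrame, the same conversion can be kept in a separate Series. This is a minimal sketch under the assumption that the column holds dd/mm/yyyy strings; the variable names datas and primeira are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "nome": ["João", "Maria"],
    "data de nascimento": ["25/12/2000", "21/03/1999"],
})

# Parse the strings once; the result is a separate Series and df is not modified.
datas = pd.to_datetime(df["data de nascimento"], format="%d/%m/%Y")

# Pick the first row; each element is a pandas Timestamp with integer attributes.
primeira = datas.iloc[0]
dia, mes, ano = primeira.day, primeira.month, primeira.year

print(dia)
print(mes)
print(ano)
```

Here dia, mes and ano are integers, so wrap them in str() before passing them to anything that expects text.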

EDIT:

It seems the user is having difficulty reading the data from the columns of a DataFrame. There are several options; one of them is DataFrame.iterrows, which iterates over the DataFrame rows as (index, row) pairs.

for i, row in df.iterrows():
  print(f'Registro: {i + 1}')
  print(f'Nome: {row["nome"]}')
  print(f'Sobrenome: {row["sobrenome"]}')
  print(f'Dia : {row["dia"]}')
  print(f'Mês : {row["mês"]}')
  print(f'Ano : {row["ano"]}')
  print('-'*30)

Output:

Registro: 1
Nome: João
Sobrenome: Mendes
Dia : 25
Mês : 12
Ano : 2000
------------------------------
Registro: 2
Nome: Maria
Sobrenome: Tavares
Dia : 21
Mês : 3
Ano : 1999
------------------------------
Registro: 3
Nome: Daniel
Sobrenome: Cunha
Dia : 22
Mês : 10
Ano : 1981
------------------------------
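
To tie this back to the pyautogui use in the question, here is a rough sketch that produces the day, month and year one at a time as strings. The helper partes_da_data and the list digitado are hypothetical; the list only stands in for the keystrokes, so the example runs without a display. Zero-padding the day and month is an assumption about what the target form expects.

```python
import pandas as pd


def partes_da_data(valor):
    """Return [day, month, year] as zero-padded strings for a dd/mm/yyyy value.

    Hypothetical helper for illustration only.
    """
    ts = pd.to_datetime(valor, format="%d/%m/%Y")
    return [f"{ts.day:02d}", f"{ts.month:02d}", f"{ts.year:04d}"]


df = pd.DataFrame({"data de nascimento": ["25/12/2000", "21/03/1999"]})

digitado = []  # stands in for what would be typed
for valor in df["data de nascimento"]:
    for parte in partes_da_data(valor):
        # In the real script this could be pyautogui.write(parte)
        digitado.append(parte)

print(digitado)
```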


  • I want to use the 'date of birth' data directly, not create 'day', 'month' and 'year' columns. I want a command that reads only the day, then the month, then the year, so I can print each one in the program.

  • Take another look at the example; I made a modification, so see if that is what you want to do.

  • I was able to solve it like this:
import pandas as pd
df = pd.DataFrame({'data de nascimento': ['25/12/2000']})

data = df['data de nascimento'][0]
dia = data[:2]
mes = data[3:5]
ano = data[6:]

print(dia)
