Date conversion problem (DD/MM/YYYY to YYYY-MM-DD) with 'pd.to_datetime()' - Python

Asked

Viewed 1,618 times

-3

I’m starting in the world of Python and I’m trying to develop a simple code to identify the expiration period of digital certificates where I work (it’s a personal project, but taking advantage of a real problem). I was able to extract the data from the key server and turn it into a dataframe, but when I try to convert the "valido_ate" column to a date format (pd.to_datetime()), in order to perform calculations with it, I noticed that in some cases the format looks different (It is not in every row of this column that this happened).

Note that in the "valido_de" field, which has the same code and the same data source this did not happen.

Does anyone have any tips?

Note: this is basically my first code, I do not work in the development area, so please review the issue of "clean code" for now, one day I get there hehe :)

#Imports
import requests
import pandas as pd
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
import re
import xlsxwriter
from datetime import date, timedelta, datetime
from dateutil.parser import parse

#URL
url = requests.get("url_interna", verify=False)
url.content
soup = BeautifulSoup(url.content, 'html.parser')
html = list(soup.children)[3]

#Variáveis
certificates = soup.find(id="tblCertificates")
title_name = soup.find('td', class_='tdTitle tdTitleName').text
title_subject = soup.find('td', class_='tdTitle tdTitleSubject').text
title_valid_from = soup.find('td', class_='tdTitle tdTitleValidFrom').text
title_valid_to = soup.find('td', class_='tdTitle tdTitleValidTo').text
title_status = soup.find('td', class_='tdTitle tdTitleStatus').text
body_name = soup.find(class_="tdBodyName")
body_subject = soup.find(class_="tdBodySubject").get_text()
body_valid_from = soup.find(class_="tdBodyValidFrom").get_text()[14:24]
body_valid_to = soup.find(class_="tdBodyValidTo").get_text()[14:24]
body_status = soup.find(class_="tdBodyStatus").get_text()
lista_names = certificates.select(".tdBodyName")
lista_subject = certificates.select(".tdBodySubject")
lista_valid_from = certificates.select(".tdBodyValidFrom")
lista_valid_to = certificates.select(".tdBodyValidTo")
lista_status = certificates.select(".tdBodyStatus")
names = [pt.get_text() for pt in lista_subject]
valido_de = [pt.get_text()[14:24] for pt in lista_valid_from]
valido_de = pd.to_datetime(valido_de)
valido_ate = [pt.get_text()[14:24] for pt in lista_valid_to]
valido_ate = pd.to_datetime(valido_ate)
status = pd.to_datetime('today') - valido_ate

data = {
'nome': names,
'valido_de': valido_de,
'valido_ate': valido_ate,
'status': status
}

df = pd.DataFrame(data)

cert_ativos = df['status'] <= '1 days'
cert_expirados = df['status'] >= '1 days'

def corrigir_nomes(nome): #Função para remover o CPF do campo "nome"
    nome = nome.replace('0', '').replace('1', '').replace('2', '').replace('3', '').replace('4','').replace('5','').replace('6','').replace('7','').replace('8', '').replace('9', '').replace(':','')
    return nome
def true_false(data): #Função para transformar True/False em Expirado/Ativo
    data = data.replace('True', 'Expirado').replace('False', 'Ativo')
    return data
df['nome'] = df['nome'].apply(corrigir_nomes)
df['status'] = df['status'].apply(true_false)
  • 3

    For the "down votes", as it is my first publication, it would be interesting a feedback to try to understand what I am doing wrong so I can correct.

2 answers

2


After a few attempts, I was able to solve the problem by forcing the format during the conversion with pd.to_datetime():

Previous: valido_ate = pd.to_datetime(valido_ate)

Correct: valido_ate = pd.to_datetime(valido_ate, format='%d/%m/%Y')

Note: it worked in my case because the format of the date where I was looking for this information was dd/mm/yyyy, so he "understood" that I should keep this structure for the conversion.

Thank you!!!

-1

I had this same problem today and the solution was similar to yours.
However, in my case I was using format = '%d/%m/%y'.
By changing to format = '%d/%m/%Y', the code worked correctly.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.