Import CSV and convert date in dd/mm/yyyy format to yyyy-mm-dd


I'm a complete beginner in programming, in both Python and SQL, so some of my questions may seem basic. I receive a .CSV file with a lot of data that is not compatible with SQL, so I always have to fix it manually and then import it with the following command:

BULK INSERT clientes
FROM 'C:\import\clientes.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n',
ERRORFILE = 'C:\import\clientesErro.csv',
TABLOCK
)

(I took this example from the internet, but it is what I use.) The problem is that some columns contain dates, and I would like to convert the dates that come as dd/mm/yyyy to yyyy-mm-dd. I also need to convert numbers whose decimals are separated with a comma, for example 1001,10 to 1001.10, but I think that by solving the date problem I can solve this one too.

  • The statement "would like to convert the dates that come as dd/mm/yyyy to yyyy-mm-dd, which is the standard accepted by SQL as a date format" is not correct. SQL Server accepts several input formats and stores dates internally in a completely different format, which even varies with the data type used (date, datetime, datetime2, etc.). To supply the date in dd/mm/yyyy format, one option is CONVERT(date, input_column, 103).

  • Hello, I removed the part "which is the standard accepted by SQL as a date format" from my question; I think it reads better now.

2 answers

2


I managed to solve it this way:

(Here I am assuming you have already set up the connection to your database.)

Import these modules:

# __init__ is the file where I saved the database connection objects
from __init__ import mycursor, connection
# csv is the module used to read the CSV file
import csv
# the datetime module is used to parse strings as dates
from datetime import datetime

After the imports, I stored the file path in a variable. This is optional; I did it only because my code reads the same file several times, so it is convenient to keep the path in one place:

caminho_arquivo = './arquivos/seu_arquivo.txt'

Now let's read the file. Pay attention to the encoding: I used latin-1 because my file was saved in that encoding. Yours may use a different one, and a mismatch can garble accented characters such as 'ç' or raise errors:

def ler_e_inserir_arquivo():
    # The query is the same for every row, so it is built once, outside the loop
    query_com_insert = (
        "INSERT INTO nome_tabela (cd_cred, operacao, sigla_assessoria, data_atual) "
        "VALUES (%(cd_cred)s, %(operacao)s, %(sigla_assessoria)s, %(data_atual)s)"
    )
    with open(caminho_arquivo, newline='', encoding="latin-1") as arquivo:
        conteudo = csv.reader(arquivo, delimiter=';')
        # If your file has a header row (like FIRSTROW = 2 in BULK INSERT), skip it with:
        # next(conteudo, None)
        for coluna in conteudo:  # each iteration yields one row as a list of fields
            tratamento_dos_dados = {
                'cd_cred': coluna[0],
                'operacao': coluna[1],
                'sigla_assessoria': coluna[2],
                # dd/mm/yyyy -> yyyy-mm-dd
                'data_atual': datetime.strptime(coluna[3], "%d/%m/%Y").strftime("%Y-%m-%d"),
            }
            mycursor.execute(query_com_insert, tratamento_dos_dados)
            connection.commit()
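Because the code above opens the file with encoding="latin-1", it may help to see what goes wrong when the encoding does not match. This is a small illustration using only the standard library; the word "ação" is just an example string:

```python
texto = "ação"

# 'ç' and 'ã' take one byte each in latin-1 but two bytes each in UTF-8
print(texto.encode("latin-1"))  # b'a\xe7\xe3o'
print(texto.encode("utf-8"))    # b'a\xc3\xa7\xc3\xa3o'

# Reading UTF-8 bytes as latin-1 does not fail; it silently garbles the accents
print(texto.encode("utf-8").decode("latin-1"))  # aÃ§Ã£o

# Reading latin-1 bytes as UTF-8 raises an error instead
try:
    texto.encode("latin-1").decode("utf-8")
except UnicodeDecodeError:
    print("these latin-1 bytes are not valid UTF-8")
```

So garbled accents usually mean the file is UTF-8 but is being read as latin-1, while a UnicodeDecodeError usually means the opposite.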

Now let's go through it step by step:

This part says that I want to insert data (it is the same statement you would write in SQL):

INSERT INTO nome_tabela(cd_cred,operacao,sigla_assessoria,data_atual)

This part says where the values go. Note the %(name)s placeholders: the name inside the parentheses is the key I defined in the data dictionary.

VALUES (%(cd_cred)s, %(operacao)s, %(sigla_assessoria)s, %(data_atual)s)

tratamento_dos_dados is my dictionary: in it, I assign each key its value.

When the query looks up the key 'cd_cred' in the dictionary, it gets the value from coluna[0] of my file (remember that Python lists are indexed from 0).
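To make the index-to-key mapping concrete, here is a sketch that parses a single line the way csv.reader does inside the loop. The sample line is made up and only assumes the same four-column, ';'-separated layout:

```python
import csv
import io

# Hypothetical line in the file's layout: cd_cred;operacao;sigla_assessoria;data_atual
linha = "123;compra;ABC;20/12/2017\n"

# csv.reader yields each line as a list of strings, split on ';'
coluna = next(csv.reader(io.StringIO(linha), delimiter=';'))
print(coluna)  # ['123', 'compra', 'ABC', '20/12/2017']

# Each key must match a %(key)s placeholder in the query
tratamento_dos_dados = {
    'cd_cred': coluna[0],
    'operacao': coluna[1],
    'sigla_assessoria': coluna[2],
    'data_atual': coluna[3],
}
print(tratamento_dos_dados['cd_cred'])  # 123
```

The driver, not Python string formatting, fills in each placeholder, which also takes care of quoting the values safely.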

Now comes the part everyone was looking for, how to transform the date:

'data_atual': datetime.strptime(coluna[3], "%d/%m/%Y").strftime("%Y-%m-%d")

I used the datetime module to "tell" Python that the string in coluna[3] should be parsed as a date in dd/mm/yyyy format (20/12/2017, for example). Then, chained right after it, .strftime asks Python to format that date in the yyyy-mm-dd pattern (2017-12-20, for example).
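The same conversion can be wrapped in a small helper so it is easy to reuse for every date column. converte_data is a hypothetical name, not part of the original code. A side benefit over plain text substitution is that strptime also validates the date:

```python
from datetime import datetime

def converte_data(texto):
    """Convert 'dd/mm/yyyy' to 'yyyy-mm-dd'; raises ValueError if the text is not a valid date."""
    return datetime.strptime(texto.strip(), "%d/%m/%Y").strftime("%Y-%m-%d")

print(converte_data("20/12/2017"))  # 2017-12-20

# An impossible day such as 31 February is rejected instead of being copied through
try:
    converte_data("31/02/2017")
except ValueError as erro:
    print("invalid date:", erro)
```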

After that, the cursor defined in the __init__ file executes the query with the dictionary, and the connection commits it:

        mycursor.execute(query_com_insert, tratamento_dos_dados)
        connection.commit()

That's it. Now, at the end of your file, outside the function, call it:

ler_e_inserir_arquivo()

This 'calls' the function defined in the previous steps. I hope this helps.
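To try the whole flow without a database server, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. Assumptions: sqlite3 stands in for the original connection (it uses :name placeholders instead of %(name)s), the sample rows in DADOS_EXEMPLO are made up, io.StringIO replaces the real file, and ler_e_inserir is a hypothetical variant of the function above:

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical sample data standing in for './arquivos/seu_arquivo.txt'
DADOS_EXEMPLO = "123;compra;ABC;20/12/2017\n456;venda;XYZ;01/02/2018\n"

def ler_e_inserir(arquivo, conn):
    """Read ';'-separated rows from a file-like object and insert them into nome_tabela."""
    # sqlite3 uses the named style :name instead of %(name)s
    query_com_insert = (
        "INSERT INTO nome_tabela (cd_cred, operacao, sigla_assessoria, data_atual) "
        "VALUES (:cd_cred, :operacao, :sigla_assessoria, :data_atual)"
    )
    conteudo = csv.reader(arquivo, delimiter=';')
    with conn:  # commits once at the end, or rolls back if any row fails
        for coluna in conteudo:
            conn.execute(query_com_insert, {
                'cd_cred': coluna[0],
                'operacao': coluna[1],
                'sigla_assessoria': coluna[2],
                'data_atual': datetime.strptime(coluna[3], "%d/%m/%Y").strftime("%Y-%m-%d"),
            })

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nome_tabela (cd_cred TEXT, operacao TEXT, sigla_assessoria TEXT, data_atual TEXT)"
)
ler_e_inserir(io.StringIO(DADOS_EXEMPLO, newline=''), conn)
print(conn.execute("SELECT * FROM nome_tabela").fetchall())
# [('123', 'compra', 'ABC', '2017-12-20'), ('456', 'venda', 'XYZ', '2018-02-01')]
```

Committing once after the loop, rather than after every row, is usually much faster for large files and means a bad row does not leave a half-imported table.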

0

It helps to be familiar with regular expressions (Python's re module):

import re

The statement below converts the date format:

re.sub(r'(\d{2})/(\d{2})/(\d{4})', r"\3-\2-\1", '19/05/2017')

Result:

>>> re.sub(r'(\d{2})/(\d{2})/(\d{4})', r"\3-\2-\1", '19/05/2017')
'2017-05-19'
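Applied to a whole file, the third argument of re.sub is the full text instead of a single date. The sketch below uses made-up sample rows; the \b word boundaries are an addition that keeps the pattern from matching inside longer digit runs. Unlike strptime, this only rearranges text, so an impossible date such as 31/02/2017 passes through unchecked:

```python
import re

# dd/mm/yyyy not glued to other digits; groups are day, month, year
PADRAO_DATA = re.compile(r'\b(\d{2})/(\d{2})/(\d{4})\b')

# Hypothetical file contents
conteudo = "123;compra;ABC;20/12/2017\n456;venda;XYZ;01/02/2018\n"
print(PADRAO_DATA.sub(r"\3-\2-\1", conteudo))

# No validation: the invalid date is simply reordered
print(PADRAO_DATA.sub(r"\3-\2-\1", "31/02/2017"))  # 2017-02-31
```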

In the case of financial values, it would be something like this:

re.sub(r'(\d+),(\d{2})', r"\1.\2", '1001,10')

Result:

>>> re.sub(r'(\d+),(\d{2})', r"\1.\2", '1001,10')
'1001.10'
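One caveat with this pattern: if the values use a dot as thousands separator, as in 1.001,10, the substitution produces 1.001.10. An alternative for a single field, assuming '.' is only ever a thousands separator and ',' the decimal separator, is to strip the dots, swap the comma and parse the result with decimal.Decimal, which avoids floating-point rounding for money. converte_valor is a hypothetical helper name:

```python
from decimal import Decimal

def converte_valor(texto):
    """Convert '1.001,10' or '1001,10' to Decimal('1001.10').

    Assumes '.' is only a thousands separator and ',' is the decimal separator.
    """
    return Decimal(texto.strip().replace('.', '').replace(',', '.'))

print(converte_valor("1001,10"))   # 1001.10
print(converte_valor("1.001,10"))  # 1001.10
```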
  • Hello Gypsy, thank you very much for the comment. This solution would be applied in Python, right? So I would need to write something in Python to edit the .csv and add the lines you suggested? I was thinking of doing it via SQL for now and maybe migrating the editing to Python in the future.

  • Yes, it's for Python. You need to create a .py script and run it, passing the name of the SQL file as a parameter. Note that in re.sub, the third argument should be replaced by the contents of your file.
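Following that suggestion, a minimal script could look like the sketch below. The file name converte_csv.py, the helper names and the output-file argument are assumptions for illustration. It reads the CSV, applies both substitutions to the full contents and writes a new file that could then be loaded with the BULK INSERT from the question. It assumes ';' as the field separator (a comma-separated file would clash with the decimal pattern) and latin-1 encoding:

```python
import re
import sys

def converte_conteudo(conteudo):
    """Rewrite dd/mm/yyyy dates as yyyy-mm-dd and 1001,10-style decimals as 1001.10."""
    conteudo = re.sub(r'\b(\d{2})/(\d{2})/(\d{4})\b', r"\3-\2-\1", conteudo)
    conteudo = re.sub(r'\b(\d+),(\d{2})\b', r"\1.\2", conteudo)
    return conteudo

def main(argv):
    # Hypothetical usage: python converte_csv.py entrada.csv saida.csv
    if len(argv) != 3:
        print("usage: python converte_csv.py entrada.csv saida.csv")
        return
    entrada, saida = argv[1], argv[2]
    with open(entrada, encoding="latin-1") as f:
        conteudo = f.read()
    with open(saida, "w", encoding="latin-1") as f:
        f.write(converte_conteudo(conteudo))

if __name__ == "__main__":
    main(sys.argv)
```

Writing to a separate output file keeps the original CSV untouched in case a pattern matches something it should not.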
