How to handle an address database

I have a database in which the CPF is the primary key, along with address information (ZIP code, street, number, etc.). I would like to find people who live at the same residence by comparing street and number. However, some street names were entered in similar but not identical ways, for example "Avenida Paulista" and "Av. Paulista", which makes the data hard to clean. Does anyone have any suggestions? Thanks in advance! =)

  • Is the ZIP code one of the fields in the database? If it is, I can suggest an answer using fuzzywuzzy. Another option, if you can change the database, is to search for all occurrences of ["Avenida", "Av.", ...] and replace them with whichever form you prefer. The same goes for ["Rua", "R.", ...].

  • What are you using? Pure Python? No framework?
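The fuzzy-matching idea from the first comment can be sketched without installing fuzzywuzzy by using the standard library's difflib.SequenceMatcher instead (a swapped-in technique, not the commenter's code). It treats records that share a ZIP code and number as candidates and then compares their street names. The records, the helper names (similaridade, mesma_residencia) and the 0.75 threshold are all hypothetical and would need tuning on real data.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records: (CPF, CEP, street, number)
registros = [
    ('123.456.789-10', '01310-100', 'Av. Paulista', 99),
    ('123.456.789-11', '01310-100', 'Avenida Paulista', 99),
    ('123.456.789-12', '01310-100', 'Av. Distante', 99),
]


def similaridade(a, b):
    """Return a ratio in [0, 1]; 1.0 means the lowercased strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mesma_residencia(registros, limiar=0.75):
    """Pair CPFs that share ZIP code and number and have similar street names.

    limiar (threshold) is an assumed value, not a recommended one.
    """
    pares = []
    for r1, r2 in combinations(registros, 2):
        # Only compare names when ZIP code and number already agree
        if r1[1] == r2[1] and r1[3] == r2[3]:
            if similaridade(r1[2], r2[2]) >= limiar:
                pares.append((r1[0], r2[0]))
    return pares


print(mesma_residencia(registros))
```

Comparing every pair is quadratic, so on a large base you would first group the records by ZIP code and number and only compare street names inside each group.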

2 answers

If you have pandas installed (pandas is a Python library for data analysis), you can do this with the following code:

```python
import pandas as pd

# Each key is a CPF (the primary key); each value holds that person's address
Dicionario_BancoDeDados = {'123.456.789-10': {'CEP': '11.111-000', 'Logradouro': 'Av. Paulista', 'Numero': 99},
                           '123.456.789-11': {'CEP': '11.111-001', 'Logradouro': 'Av. Paulista', 'Numero': 99},
                           '123.456.789-12': {'CEP': '11.111-002', 'Logradouro': 'Av. Distante', 'Numero': 99}}

# .T is the transpose operation: it turns each CPF into a row
BancoDeDados = pd.DataFrame.from_dict(Dicionario_BancoDeDados).T

BancoDeDados
```

You can find people who live in the same residence by executing:

```python
# keep=False marks every row whose (Logradouro, Numero) pair appears more than once
VizinhosDePredio = BancoDeDados[BancoDeDados.duplicated(['Logradouro', 'Numero'], keep=False)]

VizinhosDePredio
```

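duplicated only flags the rows; if you would rather get each shared address together with the list of CPFs living there, a groupby does that. This sketch rebuilds the same sample data with from_dict(..., orient='index'), which yields the same rows and columns as the .T transpose above; the names dados, grupos and enderecos_compartilhados are hypothetical.

```python
import pandas as pd

# Sample data from the answer; orient='index' turns each CPF key into a row
dados = {'123.456.789-10': {'CEP': '11.111-000', 'Logradouro': 'Av. Paulista', 'Numero': 99},
         '123.456.789-11': {'CEP': '11.111-001', 'Logradouro': 'Av. Paulista', 'Numero': 99},
         '123.456.789-12': {'CEP': '11.111-002', 'Logradouro': 'Av. Distante', 'Numero': 99}}
BancoDeDados = pd.DataFrame.from_dict(dados, orient='index')

# Name the index 'CPF', move it into a column, and collect the CPFs per (street, number)
grupos = (BancoDeDados.rename_axis('CPF')
          .reset_index()
          .groupby(['Logradouro', 'Numero'])['CPF']
          .apply(list))

# Keep only the addresses shared by two or more people
enderecos_compartilhados = grupos[grupos.map(len) > 1]
print(enderecos_compartilhados)
```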

To deal with the variations in street names, I suggest normalizing them before searching for neighbors. Here is one way to do it:

```python
Dicionario_BancoDeDados = {'123.456.789-10': {'CEP': '11.111-000', 'Logradouro': 'Av. Paulista',     'Numero': 99},
                           '123.456.789-11': {'CEP': '11.111-001', 'Logradouro': 'Avenida Paulista', 'Numero': 99},
                           '123.456.789-12': {'CEP': '11.111-002', 'Logradouro': 'Av. Distante',     'Numero': 99}}

BancoDeDados = pd.DataFrame.from_dict(Dicionario_BancoDeDados).T


def AbreviarLogradouro(Logradouro):
    # Replace the full street-type words with their abbreviations.
    # Note: str.replace matches the substring anywhere in the name.
    Logradouro = Logradouro.replace('Avenida', 'Av.')
    Logradouro = Logradouro.replace('Rua', 'R.')
    return Logradouro


# Apply the normalization to every street name
BancoDeDados['Logradouro'] = BancoDeDados['Logradouro'].map(AbreviarLogradouro)

BancoDeDados
```

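Because str.replace rewrites the substring wherever it appears (a street named "Rua Ruanda" would become "R. R.nda"), a slightly more defensive variant normalizes accents, case and punctuation, and only canonicalizes the first word (the street type) when it matches a known form exactly. This is a sketch under assumptions: the abbreviation table TIPOS and the helper normalizar_logradouro are hypothetical and should be extended with the variants found in your data. It keeps the original column and stores the normalized name in a new one before running the same duplicated check.

```python
import re
import unicodedata

import pandas as pd

# Hypothetical table: every accepted spelling of a street type -> one canonical form
TIPOS = {'avenida': 'av', 'av': 'av', 'rua': 'r', 'r': 'r', 'travessa': 'tv', 'tv': 'tv'}


def normalizar_logradouro(nome):
    # Strip accents: 'São João' -> 'Sao Joao'
    sem_acento = unicodedata.normalize('NFKD', nome).encode('ascii', 'ignore').decode('ascii')
    # Lowercase, turn punctuation into spaces, and split into words
    palavras = re.sub(r'[^\w\s]', ' ', sem_acento.lower()).split()
    # Canonicalize only the first word, and only on a whole-word match
    if palavras and palavras[0] in TIPOS:
        palavras[0] = TIPOS[palavras[0]]
    return ' '.join(palavras)


dados = {'123.456.789-10': {'CEP': '11.111-000', 'Logradouro': 'Av. Paulista',     'Numero': 99},
         '123.456.789-11': {'CEP': '11.111-001', 'Logradouro': 'Avenida Paulista', 'Numero': 99},
         '123.456.789-12': {'CEP': '11.111-002', 'Logradouro': 'Av. Distante',     'Numero': 99}}
BancoDeDados = pd.DataFrame.from_dict(dados, orient='index')

# Keep the original name and compare on the normalized one
BancoDeDados['LogradouroNorm'] = BancoDeDados['Logradouro'].map(normalizar_logradouro)
VizinhosDePredio = BancoDeDados[BancoDeDados.duplicated(['LogradouroNorm', 'Numero'], keep=False)]
print(VizinhosDePredio.index.tolist())
```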

  • (Disregard the inconsistency of different ZIP codes for identical addresses in the sample data :)

I believe what you need is the LIKE operator in your SQL query. Since the text in the street (logradouro) column was entered in different forms, you can use it as in the example below:

```sql
SELECT campo1, campo2, ... FROM tabela WHERE UPPER(logradouro) LIKE UPPER('%Av%paulista%')
```

Or

```sql
SELECT campo1, campo2, ... FROM tabela WHERE LOWER(logradouro) LIKE LOWER('%Av%paulista%')
```
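To check that the pattern catches both spellings, here is a small sketch using Python's built-in sqlite3 with an in-memory database. The table name pessoas and its columns are hypothetical stand-ins for your schema, and the query also filters on the number, since the question compares street and number. Passing the pattern as a query parameter avoids building SQL strings by hand. Keep in mind that SQLite's UPPER only folds ASCII letters, and that a LIKE pattern targets one street at a time rather than finding all shared addresses at once.

```python
import sqlite3

# In-memory database with a hypothetical table standing in for yours
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE pessoas (cpf TEXT PRIMARY KEY, logradouro TEXT, numero INTEGER)')
con.executemany('INSERT INTO pessoas VALUES (?, ?, ?)', [
    ('123.456.789-10', 'Av. Paulista', 99),
    ('123.456.789-11', 'Avenida Paulista', 99),
    ('123.456.789-12', 'Av. Distante', 99),
])

# Same LIKE pattern as the answer, passed as a parameter
linhas = con.execute(
    'SELECT cpf, logradouro FROM pessoas '
    'WHERE UPPER(logradouro) LIKE UPPER(?) AND numero = ? ORDER BY cpf',
    ('%Av%paulista%', 99),
).fetchall()
print(linhas)
con.close()
```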