Extract names and import them into an Excel spreadsheet

Viewed 245 times

I need to read TXT files and extract from them the names of people and their respective "roles" in the text. The files are minutes of hearings, in which I need to find the names of the parties (Plaintiff and Respondent) and the names of the parties' attorneys. I defined find_between to find a string between two substrings, and then stored the result in objeto_processo. Just by eyeballing the files, I noticed that most of them have a passage that follows a pattern: the name of the party is cited, followed by the name of the lawyer who represents it.

I put here in the drive some examples of text I’m using: https://drive.google.com/open?id=1YvDdPeZdESvwyag_7jQ-doda6PievBsX

import os

caminho = 'temp'
lista_de_nomes = os.listdir(caminho)
objeto_processo = {}


def find_between(text, first, last):
    # Return the text between `first` and `last`, or "" if either is missing.
    try:
        start = text.index(first) + len(first)
        end = text.index(last, start)
        return text[start:end]
    except ValueError:
        return ""

for txt in lista_de_nomes:
    with open(os.path.join(caminho, txt), "r") as content:
        text = content.read()
        partes = find_between(text, "preposto", "/sp")
        # Note: the same key is overwritten for every file.
        objeto_processo["Partes&Avogados"] = partes
        print(objeto_processo)

What I want to know is how I can extract this information from objeto_processo and transfer it to an Excel spreadsheet containing, for example, these columns:

Plaintiff | Plaintiff’s Adv | Plaintiff’s Adv | Plaintiff’s Adv

PS: Sometimes there are company names instead of people's names.
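To make the goal concrete, here is a minimal sketch of the export step, not a tested solution. It assumes one output row per TXT file and writes a CSV file, which Excel opens directly; if pandas and an Excel writer such as openpyxl are available, DataFrame.to_excel would be an alternative. The helper names collect_rows and write_rows, the "file" column, and the .txt filter are hypothetical. Splitting the extracted passage into separate party and attorney columns is left out, because that depends on the exact wording of the minutes.

```python
import csv
import os


def find_between(text, first, last):
    """Return the text between `first` and `last`, or "" if either is missing."""
    try:
        start = text.index(first) + len(first)
        end = text.index(last, start)
        return text[start:end]
    except ValueError:
        return ""


def collect_rows(folder):
    """Build one row per TXT file: the file name and the extracted passage."""
    rows = []
    for name in sorted(os.listdir(folder)):
        # Assumption: only files ending in .txt are hearing minutes.
        if not name.lower().endswith(".txt"):
            continue
        with open(os.path.join(folder, name), "r") as content:
            text = content.read()
        passage = find_between(text, "preposto", "/sp").strip()
        rows.append({"file": name, "Partes&Avogados": passage})
    return rows


def write_rows(rows, out_path):
    """Write the rows to a CSV file; utf-8-sig helps Excel detect the encoding."""
    with open(out_path, "w", newline="", encoding="utf-8-sig") as out:
        writer = csv.DictWriter(out, fieldnames=["file", "Partes&Avogados"])
        writer.writeheader()
        writer.writerows(rows)
```

With the folder from the question, this could be called as write_rows(collect_rows('temp'), 'partes.csv'), where partes.csv is a placeholder output name. Keeping one dictionary per file avoids overwriting the same key on each loop iteration.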

  • Which of the two file types is really the standard one? Some have escaped \t, \n and \r characters and some do not.

No answers
