I need to read TXT files and extract from them the names of people and their respective roles in the text. The files are hearing minutes, in which I need to find the names of the parties (Plaintiff and Respondent) and the names of the parties' attorneys. I defined find_between to find a string between two substrings, and I store the results in objeto_processo. Skimming the files by eye, I noticed that most have a passage that follows a pattern: the name of the party is cited, followed by the name of the lawyer who represents it.
I uploaded some examples of the texts I'm using to Drive: https://drive.google.com/open?id=1YvDdPeZdESvwyag_7jQ-doda6PievBsX
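import os

caminho = 'temp'
lista_de_nomes = os.listdir(caminho)
objeto_processo = {}


def find_between(text, first, last):
    # Return the text between `first` and the next `last`, or "" if either is missing.
    try:
        start = text.index(first) + len(first)
        end = text.index(last, start)
        return text[start:end]
    except ValueError:
        return ""


for txt in lista_de_nomes:
    # Build the file path from the folder in `caminho`.
    with open(os.path.join(caminho, txt), "r") as content:
        text = content.read()
    partes = find_between(text, "preposto", "/sp")
    # Key by file name so each file's result is kept instead of overwritten.
    objeto_processo[txt] = {"Partes&Avogados": partes}

print(objeto_processo)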
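To show what find_between returns, here is a small self-contained example. The sample sentence and the names in it are made up for illustration and are not taken from the real minutes. One thing it makes visible: str.index is case-sensitive, so "/sp" does not match "/SP".

```python
def find_between(text, first, last):
    """Return the text between `first` and the next `last`, or "" if either is missing."""
    try:
        start = text.index(first) + len(first)
        end = text.index(last, start)
        return text[start:end]
    except ValueError:
        return ""


# Hypothetical sample, invented for illustration only.
sample = "presente o preposto Fulano de Tal, advogado Beltrano, OAB 00000/sp, ausente"

found = find_between(sample, "preposto", "/sp")          # both markers present
missing = find_between(sample, "preposto", "/rj")        # closing marker absent
upper = find_between(sample.upper(), "preposto", "/sp")  # case differs, no match

print(repr(found))
print(repr(missing))
print(repr(upper))
```

The result still contains the party and the attorney in one string, with a leading space, so it needs a further split before it can go into separate columns.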
What I would like to know is how I can extract this information from objeto_processo and transfer it to an Excel spreadsheet that contains, for example, the following columns:
Plaintiff | Plaintiff's Attorney | Respondent | Respondent's Attorney
PS: Sometimes the names are company names instead of people's names.
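A minimal sketch of the kind of output I have in mind, using the standard csv module to produce a file that Excel can open. It assumes the names have already been separated into one dict per hearing. The column names, the rows_to_csv helper, and the sample rows are hypothetical placeholders, not real data.

```python
import csv
import io

# Assumed column layout; these names are placeholders.
COLUMNS = ["Plaintiff", "Plaintiff's Attorney", "Respondent", "Respondent's Attorney"]


def rows_to_csv(rows, out):
    """Write a header plus one line per hearing; missing fields become empty cells."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical rows, invented for illustration only.
rows = [
    {
        "Plaintiff": "Fulano de Tal",
        "Plaintiff's Attorney": "Beltrano",
        "Respondent": "Empresa X Ltda",
        "Respondent's Attorney": "Sicrano",
    },
    # A hearing where no attorney was found for either party.
    {"Plaintiff": "Ciclano", "Respondent": "Empresa Y S.A."},
]

buffer = io.StringIO()  # in-memory stand-in for a real file
rows_to_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of memory, the same function could be given a file opened with open("partes.csv", "w", newline="", encoding="utf-8-sig"), where the file name is again only an example.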
Which of the two file types is actually the standard one? Some have escaped \t, \n and \r characters and some do not.
– Franklin Timóteo