I have a .txt file with 2000 lines (a WhatsApp chat) from which I need to extract the date, time and sender of each message into a pandas DataFrame. For a single line, I can do this with the function below:
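    import re

    def parse(file):
        # "file" holds the text of a single chat line
        data = re.search(r'\d{2}/\d{2}/\d{4}', file)         # date, dd/mm/yyyy
        hora = re.search(r'\d{2}:\d{2}', file)               # time, hh:mm
        pessoa = re.search(r'(?<=\-)(.*?)(?=\:)', file)      # sender, between "-" and ":"
        # strip() drops the space that follows the "-" separator
        return data.group(0), hora.group(0), pessoa.group(0).strip()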
which works perfectly for a line of the type:
    file = '20/05/2020 20:35 - Rodrigo Toledo:'
    parse(file)
But I want a way to apply the parse function to every line of the .txt file and then turn the results into a DataFrame.
Could you give an example of an error? Or of another type of data that your code should work with?
– Evilmaax
The code should always work with a .txt file whose lines follow the pattern '20/05/2020 20:35 - Rodrigo Toledo:'. The file has 2000 lines, so the parse function needs to go through all 2000 of them, saving the result for each line to another file that will serve as the basis for creating a DataFrame.
– StatsPy
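One possible way to do what is described above, given as a minimal sketch rather than a definitive answer: loop over the lines, collect the parsed fields, and pass the list to pandas. It assumes every message line starts with the pattern shown in the question. The names `LINE_PATTERN`, `parse_lines` and `sample` are made up for illustration, and the single combined regex replaces the three separate searches of the original `parse`. Lines that do not match (for example, the continuation of a multi-line message) are skipped instead of raising an error. No intermediate file is needed in this version.

```python
import re

import pandas as pd

# Assumed line layout, taken from the example in the question:
# "20/05/2020 20:35 - Rodrigo Toledo: message text"
LINE_PATTERN = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}) - (?P<sender>[^:]+):"
)


def parse_lines(lines):
    """Return one dict per line that matches LINE_PATTERN; skip the rest."""
    rows = []
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match:  # lines that do not start with the pattern are ignored
            rows.append(match.groupdict())
    return rows


# Hypothetical sample standing in for the contents of the chat file.
sample = [
    "20/05/2020 20:35 - Rodrigo Toledo: hello",
    "this line continues the previous message",
    "20/05/2020 20:36 - Ana: hi",
]

# Build the DataFrame directly from the list of dicts.
df = pd.DataFrame(parse_lines(sample), columns=["date", "time", "sender"])
print(df)
```

With a real file, the open file object can be passed in place of `sample`, since iterating over it yields one line at a time: `with open("chat.txt", encoding="utf-8") as f: df = pd.DataFrame(parse_lines(f))`. Here `chat.txt` is a placeholder name and the encoding is an assumption about how the chat was exported.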