Python - extract data from a .txt file with regex

Viewed 1,771 times

0

Hello, I am trying to write a program, but I am not able to extract the data from the .txt file. What I need is roughly the numbers after "Hand #" and "Tournament #" in a line like this:

PokerStars Hand #135235596385: Tournament #1228747530, $0.23+$0.02 USD Hold'em No Limit - Level I (10/20) - 2015/05/14 3:30:05 BRT [2015/05/14 2:30:05 ET]

Guys, I need to create a 'Hand' key holding the data that comes after '#' and ends at ':', and do the same for 'Tournament', taking the data that comes after '#' and ends at ',', then save both in a dictionary:

d = {'Hand': 135235596385, 'Tournament': 1228747530}

It would be more or less like that.

  • 2

    What have you tried? Is there a pattern to the data or content you want to extract?

  • I don’t understand regex, haha. The pattern would be: on the first line, after "Hand #" capture "135235596385", and after "Tournament #" capture "1228747530". I think that alone is enough for me to work out how to get the other data.

  • 1

    Add an example to the question (click the edit button) showing how the "extracted" data should look inside the array.

  • Adelson, the question is still very vague. Take a look here: https://regex101.com/r/lZ4vZ9/1. You can also edit the question to add more details and clarify it.

  • @Adelsoninácio, do you only need the regex to get this number?

  • I don’t think so; however, it is best to start by getting this data.

  • @Adelsoninácio, well, I don’t know if this is what you need, but to always get what comes after "Tournament #", it would look like this: ((?<=Tournament\s#)\d*). The same idea works for the Hand. Is that what you need?


2 answers

2

Assuming that the file can contain several lines like this one and that the value of Hand always comes before the value of Tournament, you can do it this way:

>>> import re
>>> pokerstars = []
>>> with open('my_file.txt', 'r') as f:
...     for line in f:
...         data = re.findall(r'#(\d+)[:,]', line)
...         pokerstars.append({'hand': data[0], 'tournament': data[1]})
... 
>>> pokerstars
[{'tournament': '1228747530', 'hand': '135235596385'}]

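The most Pythonic way to read a text file is through a context manager (the with statement, in this case).

The findall function from the re (regex) module returns a list of all the matches for the specified regex. Because the pattern here contains a single capturing group, each item in the list is just the text captured by that group.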
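
As a minimal sketch of how findall behaves with this pattern, the snippet below runs it on the sample line from the question. The helper name parse_ids is hypothetical (it is not part of the answer), and it assumes you would rather skip lines that do not contain both numbers than get an IndexError:

```python
import re

# Sample line taken from the question (shortened for readability).
line = ("PokerStars Hand #135235596385: Tournament #1228747530, "
        "$0.23+$0.02 USD Hold'em No Limit - Level I (10/20)")

# With a single capturing group, findall returns only the captured text.
ids = re.findall(r'#(\d+)[:,]', line)
print(ids)  # ['135235596385', '1228747530']


def parse_ids(line):
    """Hypothetical helper: return the two IDs, or None if the line lacks them."""
    found = re.findall(r'#(\d+)[:,]', line)
    if len(found) < 2:
        # Not a hand header line (for example, a "Seat 1: ..." line).
        return None
    return {'hand': found[0], 'tournament': found[1]}
```

Inside the loop from the answer, you could then append only the results that are not None.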

  • Cool, the answer is correct, but there is no reason to use two for loops: you can perfectly well take the data and put it in a dictionary inside the first for. The other tip is to avoid using just l as a variable name, because it hurts readability: it is hard to distinguish between l, I and 1 (the uppercase "i" in the font where I am typing this comment, for example, is identical to the lowercase "l"; the comment display font shows them differently).

  • You’re right, I improved the code (Y).

0

Just search for the sequence of digits after the keywords. In your example:

>>> import re
>>> text = "PokerStars Hand #135235596385: Tournament #1228747530, $0.23+$0.02 USD Hold'em No Limit - Level I (10/20) - 2015/05/14 3:30:05 BRT [2015/05/14 2:30:05 ET]"

>>> d = {'Hand':       re.search(r'Hand #(\d+)', text).group(1),
...      'Tournament': re.search(r'Tournament #(\d+)', text).group(1)}

>>> d
{'Tournament': '1228747530', 'Hand': '135235596385'}

The regex is explained in detail at http://regexr.com/3b2e2
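
The question shows the values as integers rather than strings. A possible variation, sketched here under the assumption that every header follows the "Hand #...: Tournament #...," layout of the sample, uses one compiled pattern with named groups and converts the captures with int. The names HEADER_RE and parse_header are made up for illustration:

```python
import re

# Hypothetical pattern: assumes "Hand #<digits>: Tournament #<digits>," as in the sample.
HEADER_RE = re.compile(r'Hand #(?P<hand>\d+): Tournament #(?P<tournament>\d+),')


def parse_header(text):
    """Return the IDs as integers, or None when the text is not a hand header."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    return {'Hand': int(match.group('hand')),
            'Tournament': int(match.group('tournament'))}


header = ("PokerStars Hand #135235596385: Tournament #1228747530, "
          "$0.23+$0.02 USD Hold'em No Limit")
print(parse_header(header))  # {'Hand': 135235596385, 'Tournament': 1228747530}
```

Checking for None before calling group avoids the AttributeError that re.search(...).group(1) raises when a line does not match.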
