Problem reading a file and turning it into a key/value dictionary


I have a problem parsing a text file that I am not able to solve. I believe the solution is simple, but I need help getting to it. I have the following code:

import re



dic = {}

line = "rspamd_task_write_log: id: <CAP7Ane7t3GqwhbdnkcRiRD4vTR8wRVt=6yWTe7XYt6UC9yzjAQ@mail.test.com>, qid: <48H2js00X4zRj01>, ip: 209.85.208.65, from: <[email protected]>,\
       (default: F (no action): [-2.43/15.00] [IP_SCORE(-2.93){ip: (-2.32), ipnet: 123.123.123.0/17\
        (-4.14), asn: 15169(-3.28), country: US(-0.04);},SUSPICIOUS_RECIPS(1.50){},DMARC_POLICY_ALLOW(-0.50){test.com;none;},R_DKIM_ALLOW(-0.20){test.com:s=20161025;},R_SPF_ALLOW(\
        -0.20){+ip4:123.123.123.0/17;},MIME_GOOD(-0.10){multipart/mixed;multipart/alternative;text/plain;},ARC_NA(0.00){},ASN(0.00){asn:15169, ipnet:123.123.123.0/17, country:US;},DKI\
        M_TRACE(0.00){test.com:+;},FROM_EQ_ENVFROM(0.00){},FROM_HAS_DN(0.00){},HAS_ATTACHMENT(0.00){},MIME_TRACE(0.00){0:+;1:+;2:+;3:~;4:~;},PREVIOUSLY_DELIVERED(0.00){test@\
        test.com;},RCPT_COUNT_GT_50(0.00){174;},RCVD_COUNT_TWO(0.00){2;},RCVD_TLS_ALL(0.00){},TAGGED_RCPT(0.00){},TO_DN_SOME(0.00){},TO_MATCH_ENVRCPT_SOME(0.00){}]), len: 65925\
        08, time: 379.997ms real, 201.985ms virtual, dns req: 23, digest: <3ea7e074fbea648462253b1522858d71>, rcpts: <[email protected]>, mime_rcpts: <test@test\
        l.com,[email protected],[email protected],...>"


regexp = r'(\]\ \[).*(\]\))'
pontuacao = r'(\[-?\d*\.\d*\/)'
status = r'(\:\s.\s\(.*\)\:\s)'
_id = r'(\w{3}\:\s<\w*>,\s\w{2}:)'

score = re.findall(pontuacao, line)[0].strip('[').strip('/')
action = re.findall(status, line)[0].split('(')[1].strip('): ')
qid = re.findall(_id, line)[0].split('<')[1].split('>')[0]
signature = re.search(regexp, line).group().split(',')


dic[qid] = {'score': score, 'action': action, 'metrics': signature}

print(dic)

This returns the following output, which is exactly what I need:

{'48H2js00X4zRj01': {'score': '-2.43', 'action': 'no action', 'metrics': ['] [IP_SCORE(-2.93){ip: (-2.32)', ' ipnet: 123.123.123.0/17        (-4.14)', ' asn: 15169(-3.28)', ' country: US(-0.04);}', 'SUSPICIOUS_RECIPS(1.50){}', 'DMARC_POLICY_ALLOW(-0.50){test.com;none;}', 'R_DKIM_ALLOW(-0.20){test.com:s=20161025;}', 'R_SPF_ALLOW(        -0.20){+ip4:123.123.123.0/17;}', 'MIME_GOOD(-0.10){multipart/mixed;multipart/alternative;text/plain;}', 'ARC_NA(0.00){}', 'ASN(0.00){asn:15169', ' ipnet:123.123.123.0/17', ' country:US;}', 'DKI        M_TRACE(0.00){test.com:+;}', 'FROM_EQ_ENVFROM(0.00){}', 'FROM_HAS_DN(0.00){}', 'HAS_ATTACHMENT(0.00){}', 'MIME_TRACE(0.00){0:+;1:+;2:+;3:~;4:~;}', 'PREVIOUSLY_DELIVERED(0.00){test@        test.com;}', 'RCPT_COUNT_GT_50(0.00){174;}', 'RCVD_COUNT_TWO(0.00){2;}', 'RCVD_TLS_ALL(0.00){}', 'TAGGED_RCPT(0.00){}', 'TO_DN_SOME(0.00){}', 'TO_MATCH_ENVRCPT_SOME(0.00){}])']}}

But as I understand it, this only works for a single string. It does not work for a multi-line file, because the file becomes a list if I use 'readlines'.

Can someone help me make it work on a multi-line file and save the result in JSON format?

Note: All lines will follow the same pattern as the string in line.

Thanks in advance!

1 answer


Well, I think you just need to iterate over the list that readlines returns. Like this:

import re

# Raw strings keep the regex backslashes intact.
regexp = r'(\]\ \[).*(\]\))'
pontuacao = r'(\[-?\d*\.\d*\/)'
status = r'(\:\s.\s\(.*\)\:\s)'
_id = r'(\w{3}\:\s<\w*>,\s\w{2}:)'

# Read every line into a list; the with block closes the file for us.
with open('stack.txt', 'r') as f:
    lines = f.readlines()

dic = {}
for line in lines:
    # Same extraction as in the question, applied to each line in turn.
    score = re.findall(pontuacao, line)[0].strip('[').strip('/')
    action = re.findall(status, line)[0].split('(')[1].strip('): ')
    qid = re.findall(_id, line)[0].split('<')[1].split('>')[0]
    signature = re.search(regexp, line).group().split(',')

    dic[qid] = {'score': score, 'action': action, 'metrics': signature}

print(dic.keys())

Output:

dict_keys(['48H2js00X4zRj01', '60H2js00X4zRj01'])
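For the JSON part of the question, the dictionary built by the loop can be written out with the standard json module. This is only a minimal sketch: the helper name save_as_json and the output file name are assumptions, not something from the question.

```python
import json


def save_as_json(dic, path):
    """Write the qid -> fields dictionary to path as indented JSON."""
    with open(path, 'w') as f:
        json.dump(dic, f, indent=2)


def load_json(path):
    """Read the dictionary back, e.g. to check what was saved."""
    with open(path) as f:
        return json.load(f)
```

After the loop, a call such as save_as_json(dic, 'result.json') would write the file. Note that score is stored as a string here. If you want a number in the JSON, convert it with float(score) when building the dictionary.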

NOTE: I created a txt file and copied in the line you used as an example twice, changing only the qid in the second copy. The file I used is shown below (no spacing, saved as stack.txt):

rspamd_task_write_log: id: <CAP7Ane7t3GqwhbdnkcRiRD4vTR8wRVt=6yWTe7XYt6UC9yzjAQ@mail.test.com>, qid: <48H2js00X4zRj01>, ip: 209.85.208.65, from: <[email protected]>,\(default: F (no action): [-2.43/15.00] [IP_SCORE(-2.93){ip: (-2.32), ipnet: 123.123.123.0/17\(-4.14), asn: 15169(-3.28), country: US(-0.04);},SUSPICIOUS_RECIPS(1.50){},DMARC_POLICY_ALLOW(-0.50){test.com;none;},R_DKIM_ALLOW(-0.20){test.com:s=20161025;},R_SPF_ALLOW(\ -0.20){+ip4:123.123.123.0/17;},MIME_GOOD(-0.10){multipart/mixed;multipart/alternative;text/plain;},ARC_NA(0.00){},ASN(0.00){asn:15169, ipnet:123.123.123.0/17, country:US;},DKI\M_TRACE(0.00){test.com:+;},FROM_EQ_ENVFROM(0.00){},FROM_HAS_DN(0.00){},HAS_ATTACHMENT(0.00){},MIME_TRACE(0.00){0:+;1:+;2:+;3:~;4:~;},PREVIOUSLY_DELIVERED(0.00){test@\test.com;},RCPT_COUNT_GT_50(0.00){174;},RCVD_COUNT_TWO(0.00){2;},RCVD_TLS_ALL(0.00){},TAGGED_RCPT(0.00){},TO_DN_SOME(0.00){},TO_MATCH_ENVRCPT_SOME(0.00){}]), len: 65925\08, time: 379.997ms real, 201.985ms virtual, dns req: 23, digest: <3ea7e074fbea648462253b1522858d71>, rcpts: <[email protected]>, mime_rcpts: <test@test\l.com,[email protected],[email protected],...>

rspamd_task_write_log: id: <CAP7Ane7t3GqwhbdnkcRiRD4vTR8wRVt=6yWTe7XYt6UC9yzjAQ@mail.test.com>, qid: <60H2js00X4zRj01>, ip: 209.85.208.65, from: <[email protected]>,\(default: F (no action): [-2.43/15.00] [IP_SCORE(-2.93){ip: (-2.32), ipnet: 123.123.123.0/17\(-4.14), asn: 15169(-3.28), country: US(-0.04);},SUSPICIOUS_RECIPS(1.50){},DMARC_POLICY_ALLOW(-0.50){test.com;none;},R_DKIM_ALLOW(-0.20){test.com:s=20161025;},R_SPF_ALLOW(\ -0.20){+ip4:123.123.123.0/17;},MIME_GOOD(-0.10){multipart/mixed;multipart/alternative;text/plain;},ARC_NA(0.00){},ASN(0.00){asn:15169, ipnet:123.123.123.0/17, country:US;},DKI\M_TRACE(0.00){test.com:+;},FROM_EQ_ENVFROM(0.00){},FROM_HAS_DN(0.00){},HAS_ATTACHMENT(0.00){},MIME_TRACE(0.00){0:+;1:+;2:+;3:~;4:~;},PREVIOUSLY_DELIVERED(0.00){test@\test.com;},RCPT_COUNT_GT_50(0.00){174;},RCVD_COUNT_TWO(0.00){2;},RCVD_TLS_ALL(0.00){},TAGGED_RCPT(0.00){},TO_DN_SOME(0.00){},TO_MATCH_ENVRCPT_SOME(0.00){}]), len: 65925\08, time: 379.997ms real, 201.985ms virtual, dns req: 23, digest: <3ea7e074fbea648462253b1522858d71>, rcpts: <[email protected]>, mime_rcpts: <test@test\l.com,[email protected],[email protected],...>
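One caveat: indexing with [0] raises an IndexError as soon as a line does not match, for example a blank line in a real log. The sketch below is one possible way to skip such lines. It uses the same patterns, but the helper names parse_line and parse_file are made up for illustration, and it iterates over the file object directly instead of calling readlines.

```python
import re

# Same patterns as above, written as raw strings.
regexp = r'(\]\ \[).*(\]\))'
pontuacao = r'(\[-?\d*\.\d*\/)'
status = r'(\:\s.\s\(.*\)\:\s)'
_id = r'(\w{3}\:\s<\w*>,\s\w{2}:)'


def parse_line(line):
    """Return (qid, fields) for one log line, or None if it does not match."""
    score = re.search(pontuacao, line)
    action = re.search(status, line)
    qid = re.search(_id, line)
    signature = re.search(regexp, line)
    if not (score and action and qid and signature):
        return None
    return (
        qid.group().split('<')[1].split('>')[0],
        {
            'score': score.group().strip('[').strip('/'),
            'action': action.group().split('(')[1].strip('): '),
            'metrics': signature.group().split(','),
        },
    )


def parse_file(path):
    """Build the qid -> fields dictionary from every matching line in path."""
    dic = {}
    with open(path) as f:
        for line in f:  # a file object yields one line at a time
            parsed = parse_line(line)
            if parsed is not None:
                qid, fields = parsed
                dic[qid] = fields
    return dic
```

With this, dic = parse_file('stack.txt') gives the same dictionary as the loop above for lines that match, and silently ignores the rest. If you would rather know about bad lines, log them instead of skipping them.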
  • Thanks a lot for the help, friend, it really worked. I think I got the code indentation wrong when I tried it, but this helped me very much. Thank you!
