I ran into a problem while parsing a text file and have not been able to solve it. I believe the solution is simple, but I need help getting there. I have the following code:
import re
dic = {}
line = "rspamd_task_write_log: id: <CAP7Ane7t3GqwhbdnkcRiRD4vTR8wRVt=6yWTe7XYt6UC9yzjAQ@mail.test.com>, qid: <48H2js00X4zRj01>, ip: 209.85.208.65, from: <[email protected]>,\
(default: F (no action): [-2.43/15.00] [IP_SCORE(-2.93){ip: (-2.32), ipnet: 123.123.123.0/17\
(-4.14), asn: 15169(-3.28), country: US(-0.04);},SUSPICIOUS_RECIPS(1.50){},DMARC_POLICY_ALLOW(-0.50){test.com;none;},R_DKIM_ALLOW(-0.20){test.com:s=20161025;},R_SPF_ALLOW(\
-0.20){+ip4:123.123.123.0/17;},MIME_GOOD(-0.10){multipart/mixed;multipart/alternative;text/plain;},ARC_NA(0.00){},ASN(0.00){asn:15169, ipnet:123.123.123.0/17, country:US;},DKI\
M_TRACE(0.00){test.com:+;},FROM_EQ_ENVFROM(0.00){},FROM_HAS_DN(0.00){},HAS_ATTACHMENT(0.00){},MIME_TRACE(0.00){0:+;1:+;2:+;3:~;4:~;},PREVIOUSLY_DELIVERED(0.00){test@\
test.com;},RCPT_COUNT_GT_50(0.00){174;},RCVD_COUNT_TWO(0.00){2;},RCVD_TLS_ALL(0.00){},TAGGED_RCPT(0.00){},TO_DN_SOME(0.00){},TO_MATCH_ENVRCPT_SOME(0.00){}]), len: 65925\
08, time: 379.997ms real, 201.985ms virtual, dns req: 23, digest: <3ea7e074fbea648462253b1522858d71>, rcpts: <[email protected]>, mime_rcpts: <test@test\
l.com,[email protected],[email protected],...>"
regexp = r'(\]\ \[).*(\]\))'
pontuacao = r'(\[-?\d*\.\d*\/)'
status = r'(\:\s.\s\(.*\)\:\s)'
_id = r'(\w{3}\:\s<\w*>,\s\w{2}:)'
score = re.findall(pontuacao, line)[0].strip('[').strip('/')
action = re.findall(status, line)[0].split('(')[1].strip('): ')
qid = re.findall(_id, line)[0].split('<')[1].split('>')[0]
signature = re.search(regexp, line).group().split(',')
dic[qid] = {'score': score, 'action': action, 'metrics': signature}
print(dic)
This returns the following output, which is exactly what I need:
{'48H2js00X4zRj01': {'score': '-2.43', 'action': 'no action', 'metrics': ['] [IP_SCORE(-2.93){ip: (-2.32)', ' ipnet: 123.123.123.0/17 (-4.14)', ' asn: 15169(-3.28)', ' country: US(-0.04);}', 'SUSPICIOUS_RECIPS(1.50){}', 'DMARC_POLICY_ALLOW(-0.50){test.com;none;}', 'R_DKIM_ALLOW(-0.20){test.com:s=20161025;}', 'R_SPF_ALLOW( -0.20){+ip4:123.123.123.0/17;}', 'MIME_GOOD(-0.10){multipart/mixed;multipart/alternative;text/plain;}', 'ARC_NA(0.00){}', 'ASN(0.00){asn:15169', ' ipnet:123.123.123.0/17', ' country:US;}', 'DKI M_TRACE(0.00){test.com:+;}', 'FROM_EQ_ENVFROM(0.00){}', 'FROM_HAS_DN(0.00){}', 'HAS_ATTACHMENT(0.00){}', 'MIME_TRACE(0.00){0:+;1:+;2:+;3:~;4:~;}', 'PREVIOUSLY_DELIVERED(0.00){test@ test.com;}', 'RCPT_COUNT_GT_50(0.00){174;}', 'RCVD_COUNT_TWO(0.00){2;}', 'RCVD_TLS_ALL(0.00){}', 'TAGGED_RCPT(0.00){}', 'TO_DN_SOME(0.00){}', 'TO_MATCH_ENVRCPT_SOME(0.00){}])']}}
However, as I understand it, this only works for a single string. It does not work for a multi-line file, because the file becomes a list of lines when I use 'readlines'.
Can someone help me make it work on a multi-line file and save the result in JSON format?
Note: all lines in the file follow the same pattern as the string in 'line'.
Thanks in advance!
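One possible way to do this, as a rough sketch rather than a definitive solution: move the per-line extraction into a function, loop over the file object (which yields one line at a time, so 'readlines' is not needed), and write the resulting dictionary with json.dump. The regular expressions are the ones from the question; the function names parse_line and parse_log and the file paths are hypothetical. The sketch assumes one log entry per physical line, skips lines where any pattern fails to match, and lets a repeated qid overwrite the earlier entry.

```python
import json
import re

# Patterns copied from the question; only the raw-string prefix is new.
SCORE_RE = re.compile(r'(\[-?\d*\.\d*\/)')
STATUS_RE = re.compile(r'(\:\s.\s\(.*\)\:\s)')
QID_RE = re.compile(r'(\w{3}\:\s<\w*>,\s\w{2}:)')
METRICS_RE = re.compile(r'(\]\ \[).*(\]\))')


def parse_line(line):
    """Return (qid, record) for one log line, or None if any pattern is missing.

    Hypothetical helper: it applies the same string clean-up as the question.
    """
    matches = [p.search(line) for p in (SCORE_RE, STATUS_RE, QID_RE, METRICS_RE)]
    if not all(matches):
        return None
    score_m, status_m, qid_m, metrics_m = matches
    score = score_m.group().strip('[').strip('/')
    action = status_m.group().split('(')[1].strip('): ')
    qid = qid_m.group().split('<')[1].split('>')[0]
    metrics = metrics_m.group().split(',')
    return qid, {'score': score, 'action': action, 'metrics': metrics}


def parse_log(in_path, out_path):
    """Parse every line of in_path and write the result to out_path as JSON."""
    dic = {}
    with open(in_path, encoding='utf-8') as log_file:
        for line in log_file:  # iterating the file yields one line at a time
            parsed = parse_line(line.rstrip('\n'))
            if parsed is not None:
                qid, record = parsed
                dic[qid] = record  # a repeated qid overwrites the earlier entry
    with open(out_path, 'w', encoding='utf-8') as json_file:
        json.dump(dic, json_file, indent=2)
    return dic
```

It could then be called as parse_log('rspamd.log', 'rspamd.json'), where both file names are placeholders for your own paths.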
Thanks a lot for the help, friend, it really worked. I think I had gotten the code indentation wrong, but this helped me very much. Thank you!
– Léo B