First, load the CVE list from the file lista.txt:
# load the CVE list
with open('lista.txt') as arq:
    # strip the trailing line break from each line
    cve_list = [linha.rstrip() for linha in arq]
Note the use of with to ensure that the file is closed at the end.
Then walk through the XML and, for each CVE, check whether it is already in the list. If it is not, add it to a list of newly found CVEs:
import xml.etree.ElementTree as ET

novos = []  # holds the new CVEs found
tree = ET.parse('cve.xml')
root = tree.getroot()  # root (ExploitPackList)
for canvas in root:  # for each CANVASExploitPack
    exploits = canvas[0]  # get the Exploits tag
    for exploit in exploits:  # for each Exploit
        cve = exploit.attrib['cve']
        if cve not in cve_list and cve not in novos:
            print('Novo CVE encontrado:', cve)  # "New CVE found:"
            novos.append(cve)
Here I am assuming that the structure is exactly the one in the file:
<ExploitPackList>
    <CANVASExploitPack date="Fri Jul 5 11:03:08 2013" name="White_Phosphorus">
        <Exploits>
            several <Exploit> tags containing the CVE...
        </Exploits>
    </CANVASExploitPack>
    <CANVASExploitPack date="Fri Jul 5 11:03:08 2013" name="CANVAS">
        <Exploits>
            <Exploit cve="CVE-2019-5056" desc="Open-Realty &lt;= 2.4.3 Remote Code Execution" name="openrealty_exec"/>
            several <Exploit> tags containing the CVE...
That is, <ExploitPackList> may contain several <CANVASExploitPack> elements, each of which has exactly one <Exploits> element, which in turn contains multiple <Exploit> tags.
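If the layout turns out to be less regular than assumed (for example, an <Exploit> without a cve attribute, or extra nesting), a slightly more defensive traversal might look like the sketch below. The helper name iter_cves and the inline sample XML are assumptions made for illustration; they are not part of the original files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the structure above (not the real cve.xml)
SAMPLE = """<ExploitPackList>
  <CANVASExploitPack name="A">
    <Exploits>
      <Exploit cve="CVE-2019-5056" name="x"/>
      <Exploit name="no_cve_attribute"/>
    </Exploits>
  </CANVASExploitPack>
  <CANVASExploitPack name="B">
    <Exploits>
      <Exploit cve="CVE-2016-0915" name="y"/>
    </Exploits>
  </CANVASExploitPack>
</ExploitPackList>"""


def iter_cves(root):
    """Yield the cve attribute of every <Exploit>, at any depth."""
    for exploit in root.iter('Exploit'):
        cve = exploit.get('cve')  # returns None instead of raising KeyError
        if cve:
            yield cve


root = ET.fromstring(SAMPLE)
found = list(iter_cves(root))
print(found)
```

The trade-off is that iter('Exploit') no longer verifies the exact nesting, so it would also pick up <Exploit> tags that appear elsewhere in the document.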
I am also checking that the CVE is neither in the list obtained from lista.txt nor in the list of newly found CVEs (I do not know whether this file contains repetitions, so checking the list of new ones may be redundant).
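For large files, every "not in" test against a list scans the whole list. A possible variation, sketched here with made-up data, uses sets for the membership tests and keeps a list only to preserve the order of discovery. The names conhecidos, vistos and candidatos are assumptions introduced for this example.

```python
# Made-up data standing in for lista.txt and for the CVEs read from the XML
cve_list = ['CVE-2019-5056', 'CVE-2016-0915']
candidatos = ['CVE-2020-2240', 'CVE-2019-5056', 'CVE-2020-2240', 'CVE-2019-4467']

conhecidos = set(cve_list)  # set lookups take constant time on average
vistos = set()              # new CVEs already recorded
novos = []                  # keeps the order in which new CVEs were found

for cve in candidatos:
    if cve not in conhecidos and cve not in vistos:
        vistos.add(cve)
        novos.append(cve)

print(novos)
```

For a few hundred entries the difference is unlikely to matter, so the plain lists are perfectly reasonable there.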
Then take the list of new CVEs and append it to the end of the file (or print a message saying that none were found, if the list is empty):
if novos:  # if the list of new CVEs is not empty: https://docs.python.org/3/library/stdtypes.html#truth-value-testing
    with open('lista.txt', 'a') as arq:  # mode "a" appends content to the end of the file
        for cve in novos:
            arq.write(f'\n{cve}')
else:
    print('Sem CVE Novo')  # "No new CVE"
Of course you could check with if len(novos) > 0:, but since an empty list is considered False, I can simply write if novos to find out whether the list novos has any elements.
I saw that the file lista.txt does not end with a line break, so I put the \n before each CVE. This way the last CVE is never followed by a line break, and successive runs of the program keep the file with one CVE per line.
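If you cannot be sure whether lista.txt ends with a line break, one option is to look at its last byte before appending. The sketch below runs against a temporary file and uses a hypothetical helper named append_cves; it assumes that lines are separated by \n.

```python
import os
import tempfile


def append_cves(path, novos):
    """Append one CVE per line, whether or not the file ends with a newline."""
    needs_newline = False
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, 'rb') as arq:
            arq.seek(-1, os.SEEK_END)  # jump to the last byte
            needs_newline = arq.read(1) != b'\n'
    with open(path, 'a') as arq:
        for cve in novos:
            if needs_newline:
                arq.write('\n')
            arq.write(cve)
            needs_newline = True  # separate this entry from the next one


# Demo on a temporary file that does not end with a line break
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'lista.txt')
    with open(path, 'w') as arq:
        arq.write('CVE-2019-5056')
    append_cves(path, ['CVE-2020-2240', 'CVE-2019-4467'])
    with open(path) as arq:
        linhas = arq.read().split('\n')

print(linhas)
```

Like the version above, this leaves the file without a trailing line break, so repeated runs stay consistent.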
One detail is that the XML contains several CVEs that do not start with "CVE-". If you also want to validate the format, you can use a regex:
import xml.etree.ElementTree as ET
import re

# "CVE-", 4 digits, "-", then 4 to 7 digits
r = re.compile(r'^CVE-\d{4}-\d{4,7}$')
tree = ET.parse('cve.xml')
root = tree.getroot()
for canvas in root:
    exploits = canvas[0]
    for exploit in exploits:
        cve = exploit.attrib['cve']
        # only accept well-formed CVEs that have not been seen yet
        if r.match(cve) and cve not in cve_list and cve not in novos:
            print('Novo CVE encontrado:', cve)
            novos.append(cve)
In this case, the markers ^ and $ indicate, respectively, the beginning and the end of the string, thus ensuring that it contains only what the regex specifies.
The shortcut \d matches digits, and the quantifiers {4} and {4,7} mean, respectively, "exactly 4" and "at least 4, at most 7".
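To see the pattern in action, here is a small sketch with made-up inputs. One subtlety worth knowing: in Python, $ also matches just before a trailing line break, so if that matters to you, fullmatch is stricter. The sample strings and the names exemplos, com_match and com_fullmatch are illustrative only.

```python
import re

r = re.compile(r'^CVE-\d{4}-\d{4,7}$')
estrito = re.compile(r'CVE-\d{4}-\d{4,7}')  # same pattern, to be used with fullmatch

exemplos = [
    'CVE-2019-5056',      # valid
    'CVE-2019-505',       # only 3 digits at the end
    'CVE-2019-12345678',  # 8 digits at the end
    'cve-2019-5056',      # lowercase prefix
    'CVE-2019-5056\n',    # valid text followed by a line break
]

com_match = [bool(r.match(s)) for s in exemplos]
com_fullmatch = [bool(estrito.fullmatch(s)) for s in exemplos]

print(com_match)
print(com_fullmatch)
```

The two approaches differ only on the last sample: match with $ accepts it, while fullmatch rejects it. Since the CVEs here come from XML attributes and from lines already passed through rstrip(), the original pattern should be enough in practice.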
I would use regex only for this validation. I know many people will think it could also be used to extract the data from the XML, with something like:
r = re.compile(r'<Exploit cve="(CVE-\d{4}-\d{4,7})"')
with open('cve.xml') as arq:
    for linha in arq:
        m = r.search(linha)
        if m:  # a CVE in the expected format was found
            cve = m.group(1)
            # check whether it is in the list, etc...
For simple cases it may even work, but regex is not the right tool for this task. A small change to the XML is enough to break it. For example, what if there is a commented-out snippet:
<!--
<Exploit cve="CVE-2020-2240" desc="IBM Lotus Domino Web Server Accept-Language HTTP Header Buffer Overflow Vulnerability" name="d2sec_lotus_domino_http"/>
<Exploit cve="CVE-2016-0915" desc="IBM Lotus Domino iCalendar Meeting Request Stack Overflow Vulnerability" name="d2sec_lotuscal2"/>
<Exploit cve="CVE-2019-4467" desc="Oracle JInitiator ActiveX Buffer Overflow" name="d2sec_jinitiator"/>
-->
xml.etree.ElementTree correctly detects and ignores the tags above, but the regex does not. This is because the parser analyzes the context in which each tag appears, whereas the regex only evaluates the excerpt we tell it to look for (<Exploit cve="etc...).
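The difference can be checked with a small sketch. The XML below is a made-up sample with one real <Exploit> and one inside a comment; the variable names via_parser and via_regex are assumptions for this example.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample: one real <Exploit> and one inside a comment
XML = """<ExploitPackList>
  <CANVASExploitPack name="CANVAS">
    <Exploits>
      <Exploit cve="CVE-2019-5056" name="openrealty_exec"/>
      <!--
      <Exploit cve="CVE-2020-2240" name="d2sec_lotus_domino_http"/>
      -->
    </Exploits>
  </CANVASExploitPack>
</ExploitPackList>"""

# Parser: comments are dropped by default, so only the real tag is seen
root = ET.fromstring(XML)
via_parser = [e.get('cve') for e in root.iter('Exploit')]

# Regex: scans the raw text and has no notion of what a comment is
r = re.compile(r'<Exploit cve="(CVE-\d{4}-\d{4,7})"')
via_regex = r.findall(XML)

print(via_parser)
print(via_regex)
```

The parser returns only CVE-2019-5056, while the regex also returns the commented-out CVE-2020-2240.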
It is even possible to write a regex that checks whether the excerpt is inside a comment, but is it worth it, when with the parser the code stays the same (besides being much simpler)?
And this is just one case: several other situations will require changing the regex, which will become more and more complicated (see some examples here and here; although these links deal with HTML, the same explanations apply to XML).
Regex is cool, and I like it a lot, but it is not always the best solution.