There is no magic: tshark was smart and read the file in pieces, using pointers.
tshark is written in C and certainly performs better in loops than Python, but the real point is that tshark allocates buffers in memory to read the file piece by piece, keeping only the data within the range of interest.
Look at what happens after the line pcap = dpkt.pcap.Reader(f)
If the script ends up keeping every packet read through pcap in memory (for example by collecting them all in a list before filtering), Python has to allocate around 5 GB of data :-(
The smart way to do it is to move the read pointer through the file, so that you always read from the position it points to.
In Python it is possible to do this:
from scapy.all import *
import dpkt
# open in binary mode: a pcap file is not text
f = open("capture21dez2016.pcap", "rb")
pcap = f.read(4096)
while pcap:
    # process each chunk here
    pcap = f.read(4096)
f.close()
Look at the line pcap = f.read(4096)
Here we are reading the file in pieces, exactly 4096 bytes at a time. f.read() uses a pointer to remember the last position read, so each call continues from where the previous one stopped. You can choose how many bytes to read at a time; I used 4096 just as an example.
You can build on the code of this answer to find your range of interest. Convert the date and time of interest into timestamps to make the comparison easier, and remember: once you have found the data within the desired range you can exit the loop, since there is no need to read the rest of the file :-)
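To make the idea concrete, here is a minimal sketch of reading only the range of interest. It is not the code from the question: it assumes the capture is in the classic pcap format (not pcapng) and that packets are stored in chronological order, and the names to_timestamp and packets_in_range are made up for this example. It reads each 16-byte record header, skips packets before the range with seek(), and leaves the loop as soon as a packet is past the end of the range.

```python
import os
import struct
from datetime import datetime, timezone

GLOBAL_HEADER_LEN = 24  # fixed-size header at the start of a classic pcap file
RECORD_HEADER_LEN = 16  # per-packet header: ts_sec, ts_frac, incl_len, orig_len

# First 4 bytes of the file -> (struct byte-order prefix, fractions per second)
MAGIC = {
    b"\xd4\xc3\xb2\xa1": ("<", 1_000_000),      # little-endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1_000_000),      # big-endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1_000_000_000),  # little-endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1_000_000_000),  # big-endian, nanoseconds
}


def to_timestamp(text):
    """Convert 'YYYY-MM-DD HH:MM:SS' to a Unix timestamp (assumption: text is UTC)."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def packets_in_range(path, start_ts, end_ts):
    """Yield (timestamp, raw_bytes) for packets with start_ts <= timestamp <= end_ts."""
    with open(path, "rb") as f:
        header = f.read(GLOBAL_HEADER_LEN)
        if len(header) < GLOBAL_HEADER_LEN:
            return  # empty or truncated file
        if header[:4] not in MAGIC:
            raise ValueError("not a classic pcap file")
        byte_order, fractions_per_second = MAGIC[header[:4]]
        record = struct.Struct(byte_order + "IIII")

        while True:
            raw = f.read(RECORD_HEADER_LEN)
            if len(raw) < RECORD_HEADER_LEN:
                break  # end of file
            ts_sec, ts_frac, incl_len, _orig_len = record.unpack(raw)
            ts = ts_sec + ts_frac / fractions_per_second

            if ts > end_ts:
                break  # past the range: stop (assumes chronological order)
            if ts < start_ts:
                # before the range: move the pointer over the packet bytes
                f.seek(incl_len, os.SEEK_CUR)
                continue

            data = f.read(incl_len)
            if len(data) < incl_len:
                break  # truncated last packet
            yield ts, data
```

You would use it with something like: for ts, data in packets_in_range("capture21dez2016.pcap", to_timestamp("2016-12-21 10:00:00"), to_timestamp("2016-12-21 10:05:00")), where the two times are only example values. Because it is a generator, only one packet is held in memory at a time.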
One of the problems I'm facing in this project is precisely memory usage. When I ran the script, RAM consumption went above 95% (the notebook has 8 GB). I will test this here. Thank you very much
– Ed S