How to filter a PCAP file with Python?


The PCAP file was generated on 12/21/2016 and is 5 GB, so it is impractical to open it in the Wireshark graphical user interface.

I installed tshark on Ubuntu and, after reading the manual, came up with the following filter:

 tshark -r capture21dez2016.pcap -Y '((frame.time >= "2016-12-21 11:15:00") && (frame.time <= "2016-12-21 12:14:00.000000000"))'  -w 11h15_12h14_semAtaques.pcap

It worked. How can I apply the same filter in the Python code below?

from scapy.all import *
import dpkt

# pcap files are binary, so open the file in "rb" mode
f = open("capture21dez2016.pcap", "rb")
pcap = dpkt.pcap.Reader(f)
f.close()

1 answer



There is no magic: tshark is smart and reads the file sequentially, one piece at a time, instead of loading it all into memory.

tshark is written in C and will certainly loop faster than Python, but the key point is that it only needs small buffers in memory: it reads the file piece by piece and keeps only the data within the range of interest.

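What you must avoid in Python is anything that loads the whole capture at once (for example scapy's rdpcap() or dpkt's pcap.Reader.readpkts()), because that means allocating roughly 5 GB of data :-( As far as I know, the line pcap = dpkt.pcap.Reader(f) by itself only reads the file header, and the packets are read as you iterate over pcap.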

The smart way to do it is to advance the read pointer through the file, so that each read continues from the position it points to.

In Python you can do it like this:

from scapy.all import *
import dpkt

# Open in binary mode; the with block closes the file when the loop ends
with open("capture21dez2016.pcap", "rb") as f:
    pcap = f.read(4096)
    while pcap:

        # process each chunk here

        pcap = f.read(4096)

Look at the line pcap = f.read(4096): we read the file in pieces, 4096 bytes at a time. f.read() keeps a pointer to the last position read, so each call continues from where the previous one stopped. You choose how many bytes to read per call; I used 4096 just as an example. Keep in mind that these chunks are raw bytes and will not line up with packet boundaries, so you still have to parse the pcap record headers to get each packet's timestamp.

You can build on the code in this answer to find your range of interest. Convert your start and end date and time into timestamps to make the comparison easier, and remember that once you have passed the end of the desired range (assuming the packets are stored in chronological order) you can exit the loop; there is no need to read the rest of the file :-)
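To make the idea concrete, here is a minimal sketch that reads the capture one record at a time and copies only the packets inside a time window. It uses only the standard library and assumes the file is in the classic pcap format (not pcapng) and that its packets are in chronological order. The function name filter_pcap_by_time and its parameters are hypothetical, chosen for illustration; they are not part of dpkt or scapy.

```python
import struct

GLOBAL_HEADER_LEN = 24   # classic pcap file header
RECORD_HEADER_LEN = 16   # per-packet header: ts_sec, ts_frac, incl_len, orig_len

# Magic bytes as stored on disk -> (struct byte order, fraction divisor)
MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", 1e6),  # little-endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1e6),  # big-endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1e9),  # little-endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1e9),  # big-endian, nanoseconds
}


def filter_pcap_by_time(src_path, dst_path, start, end):
    """Copy packets with start <= timestamp <= end (epoch seconds).

    Returns the number of packets written. Only one packet is held in
    memory at a time, so the size of the capture does not matter.
    """
    kept = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        global_header = src.read(GLOBAL_HEADER_LEN)
        if global_header[:4] not in MAGICS:
            raise ValueError("not a classic pcap file")
        byte_order, divisor = MAGICS[global_header[:4]]
        dst.write(global_header)  # the output reuses the same file header

        while True:
            record_header = src.read(RECORD_HEADER_LEN)
            if len(record_header) < RECORD_HEADER_LEN:
                break  # end of file
            ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(
                byte_order + "IIII", record_header
            )
            timestamp = ts_sec + ts_frac / divisor
            if timestamp > end:
                break  # assumption: chronological order, nothing more to find
            payload = src.read(incl_len)
            if len(payload) < incl_len:
                break  # truncated final record
            if timestamp >= start:
                dst.write(record_header)
                dst.write(payload)
                kept += 1
    return kept


# Possible call for the window in the question (times are interpreted in the
# local time zone here, which is an assumption):
#   from datetime import datetime
#   start = datetime(2016, 12, 21, 11, 15).timestamp()
#   end = datetime(2016, 12, 21, 12, 14).timestamp()
#   filter_pcap_by_time("capture21dez2016.pcap",
#                       "11h15_12h14_semAtaques.pcap", start, end)
```

Reading the 16-byte record header first is what lets the loop jump exactly one packet at a time, something fixed 4096-byte chunks cannot do. If the capture is not sorted by time, replace the early break with a plain skip of that packet.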

  • One of the problems I'm facing in this project is precisely memory usage. When I ran the script, RAM consumption went above 95% (the notebook has 8 GB). I will test this here. Thank you very much.
