How to filter PCAP file with Python?

Question

Asked 8 years, 9 months ago

Viewed 322 times

The PCAP file was generated on 12/21/2016 and is 5 GB, so it is impractical to open it with Wireshark (the graphical user interface).

I installed tshark on Ubuntu and, after reading the manual, came up with the following filter:

 tshark -r capture21dez2016.pcap -Y '((frame.time >= "2016-12-21 11:15:00") && (frame.time <= "2016-12-21 12:14:00.000000000"))'  -w 11h15_12h14_semAtaques.pcap

And it worked. How can I use the above filter in the Python code below?

from scapy.all import *
import dpkt

f = open("capture21dez2016.pcap")
pcap = dpkt.pcap.Reader(f)
f.close()

1 answer

by ederwander • **6,431** points · Answer 1 · 2017-03-05T14:35:42+00:00

There is no magic: tshark is smart and reads the file in pieces, using a pointer to keep track of its position.

tshark is written in C and certainly loops faster than Python, but the key point is that it allocates buffers in memory, reads the file piece by piece, and separates out the data within the range of interest.

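Be careful not to load the whole capture into memory. As far as I know, pcap = dpkt.pcap.Reader(f) only parses the file header and yields packets as you iterate, but collecting them all at once (for example with scapy's rdpcap, or by turning the reader into a list) would allocate roughly 5 GB of data :-(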

The smart way to do it is to advance the read pointer through the file, so that each read picks up from the position where the previous one stopped.

In Python you can do it like this:

# Binary mode: a PCAP file is not text
with open("capture21dez2016.pcap", "rb") as f:

    pcap = f.read(4096)
    while pcap:

        # process each chunk here

        pcap = f.read(4096)

Look at the line pcap = f.read(4096): we read the file in pieces, 4096 bytes at a time. f.read() keeps a file pointer that records where the last read ended, so each call continues from that position. You can choose how many bytes to read at a time; I used 4096 as an example. You can build on the code in this reply to find your range of interest. To make the comparison easier, convert your start and end date and time into timestamps. And remember: once you have passed the end of the desired range, you can exit the loop and skip the rest of the file :-)
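
To make the idea concrete, here is a minimal sketch that uses only the standard library. It rests on several assumptions that are mine, not from the question: the file is in the classic pcap format with microsecond timestamps (not pcapng, not the nanosecond variant), packets are stored in chronological order (otherwise the early exit would drop packets), and the times are interpreted as UTC. Instead of fixed 4096-byte chunks it reads the 24-byte file header and then one 16-byte record header at a time, because the record header says how long the next packet is. The names to_epoch and filter_pcap are illustrative helpers, not part of dpkt or scapy.

```python
import os
import struct
from datetime import datetime, timezone

PCAP_GLOBAL_HDR_LEN = 24  # classic pcap file header
PCAP_RECORD_HDR_LEN = 16  # per packet: ts_sec, ts_usec, incl_len, orig_len


def to_epoch(text, tz=timezone.utc):
    """Convert 'YYYY-MM-DD HH:MM:SS' to Unix seconds (UTC is an assumption)."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=tz).timestamp()


def filter_pcap(src, dst, start, end):
    """Copy packets with start <= timestamp <= end from src to dst.

    src and dst are binary file objects; returns the number of packets kept.
    """
    global_hdr = src.read(PCAP_GLOBAL_HDR_LEN)
    if len(global_hdr) < PCAP_GLOBAL_HDR_LEN:
        raise ValueError("truncated pcap file header")

    # The magic number tells us the byte order used by the capturing machine.
    magic = global_hdr[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")

    dst.write(global_hdr)
    kept = 0
    while True:
        rec_hdr = src.read(PCAP_RECORD_HDR_LEN)
        if len(rec_hdr) < PCAP_RECORD_HDR_LEN:
            break  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec_hdr)
        ts = ts_sec + ts_usec / 1e6
        if ts > end:
            break  # past the range: stop early (assumes chronological order)
        if ts < start:
            src.seek(incl_len, os.SEEK_CUR)  # skip the packet without reading it
            continue
        dst.write(rec_hdr)
        dst.write(src.read(incl_len))
        kept += 1
    return kept


# Usage sketch (file names taken from the question):
# with open("capture21dez2016.pcap", "rb") as src, \
#         open("11h15_12h14_semAtaques.pcap", "wb") as dst:
#     filter_pcap(src, dst,
#                 to_epoch("2016-12-21 11:15:00"),
#                 to_epoch("2016-12-21 12:14:00"))
```

Only one packet is held in memory at a time, so the 5 GB size of the capture does not matter. If your capture times are in local time rather than UTC, pass a different tz to to_epoch.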