Calculating the Shannon entropy of network traffic (saved in a CAP file) using Python

I have a dump file (CAP format) of a network traffic capture made with tcpdump on Ubuntu. Up to a certain point in time, the traffic is free of attacks. Then a series of TCP SYN flooding attacks begins. My goal is to calculate the entropy of each traffic period (with and without attacks) and compare them.

Does anyone know of a Python library that calculates the Shannon entropy of network traffic?

I found the following code. What do you think?

import numpy as np
import collections

sample_ips = [
    "131.084.001.031",
    "131.084.001.031",
    "131.284.001.031",
    "131.284.001.031",
    "131.284.001.000",
]

# Count how many times each IP occurs
C = collections.Counter(sample_ips)
counts = np.array(list(C.values()), dtype=float)
# Normalize the counts into probabilities that sum to 1
prob = counts / counts.sum()
# Shannon entropy in bits: H = -sum(p * log2(p))
shannon_entropy = (-prob * np.log2(prob)).sum()
print(shannon_entropy)

Imagine these were the only IPs in the traffic collected during a certain time interval.
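For comparison, here is a minimal sketch of the same calculation using only the standard library (math and collections.Counter), wrapped in a reusable function. The name shannon_entropy and the normalized option are my own additions, not from the code above. Dividing by log2 of the number of distinct values scales the result to [0, 1], which may make windows with different numbers of distinct IPs easier to compare.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(items: Iterable[Hashable], normalized: bool = False) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `items`.

    Hypothetical helper. With normalized=True the result is divided by
    log2(number of distinct values), so it lies in [0, 1]:
    0 = a single value only, 1 = all values equally frequent.
    """
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no data in this window
    # -p*log2(p) == p*log2(1/p); every count is >= 1, so log2(0) never occurs
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    if normalized:
        k = len(counts)
        return h / math.log2(k) if k > 1 else 0.0
    return h


# Only the strings are compared, so the (invalid) octet 284 does not matter here
sample_ips = [
    "131.084.001.031",
    "131.084.001.031",
    "131.284.001.031",
    "131.284.001.031",
    "131.284.001.000",
]
print(round(shannon_entropy(sample_ips), 4))  # 1.5219
print(round(shannon_entropy(sample_ips, normalized=True), 4))
```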

I would take several captures on different days to see how the entropy behaves, obtaining several different entropy values. What would be the best way to plot a graph in Python to check how the entropy behaves?

  • Why don’t you iterate over that file (since you already have it) with Python and process what you need for the calculation from there?

  • @Miguel, my problem is that I have no idea how to implement the Shannon entropy calculation. Is there something ready-made in Python?

  • See this: http://pythonfiddle.com/shannon-entropy-calculation/
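Regarding iterating over the capture file itself: tcpdump -w writes the classic libpcap format, which is simple enough to read with the standard struct module. The sketch below is an illustration under several assumptions, not a full parser: a classic pcap file (not pcapng), Ethernet or Linux "cooked" link type, and untagged IPv4 packets. read_pcap_ipv4 and PacketInfo are hypothetical names. Libraries such as Scapy (rdpcap) or dpkt can also read the file if you prefer not to parse it by hand.

```python
import socket
import struct
from typing import BinaryIO, Iterator, NamedTuple, Optional


class PacketInfo(NamedTuple):
    ts: float                  # capture timestamp (seconds since epoch)
    src: str                   # IPv4 source address
    dst: str                   # IPv4 destination address
    tcp_flags: Optional[int]   # TCP flags byte, or None if not TCP


def read_pcap_ipv4(f: BinaryIO) -> Iterator[PacketInfo]:
    """Yield IPv4 packets from a classic libpcap file (hypothetical sketch)."""
    header = f.read(24)  # global header
    if len(header) < 24:
        return
    magic_le = struct.unpack("<I", header[:4])[0]
    if magic_le in (0xA1B2C3D4, 0xA1B23C4D):
        endian = "<"
    elif magic_le in (0xD4C3B2A1, 0x4D3CB2A1):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    magic = struct.unpack(endian + "I", header[:4])[0]
    ts_div = 1e9 if magic == 0xA1B23C4D else 1e6  # nanosecond vs microsecond pcap
    linktype = struct.unpack(endian + "I", header[20:24])[0]
    if linktype == 1:        # Ethernet: 14-byte header, EtherType at offset 12
        l2_len, type_off = 14, 12
    elif linktype == 113:    # Linux "cooked" capture (tcpdump -i any): 16 bytes
        l2_len, type_off = 16, 14
    else:
        raise ValueError(f"unsupported link type {linktype}")

    while True:
        rec = f.read(16)  # per-packet record header
        if len(rec) < 16:
            break
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
        data = f.read(incl_len)
        if len(data) < l2_len + 20:
            continue  # truncated, or too short to contain an IPv4 header
        if data[type_off:type_off + 2] != b"\x08\x00":
            continue  # not IPv4 (ARP, IPv6, VLAN-tagged, ...)
        ip = data[l2_len:]
        ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
        flags = None
        if ip[9] == 6 and len(ip) >= ihl + 14:  # protocol 6 = TCP
            flags = ip[ihl + 13]                # TCP flags byte
        yield PacketInfo(
            ts=ts_sec + ts_frac / ts_div,
            src=socket.inet_ntoa(ip[12:16]),
            dst=socket.inet_ntoa(ip[16:20]),
            tcp_flags=flags,
        )
```

Each PacketInfo can then feed the entropy calculation. For SYN flooding, one option is to open the capture with open(path, "rb"), keep only packets where tcp_flags is not None and tcp_flags & 0x12 == 0x02 (SYN set, ACK clear), and compute the entropy of their src (or dst) addresses per time window.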

1 answer


I don’t know of any library for what you need. I use entropy calculations for audio, to help determine how different (random, disorganized) an audio frame is in the spectrum. What you want to do makes sense: depending on the entropy value returned, you can decide whether an attack exists. The more organized (the less random) the tcpdump traffic is, the greater the chance that an attack has occurred. The code shown seems correct according to the equation I use for entropy:

$$H = -\sum_{i=1}^{n} T_i \log_2 T_i$$

Where T_i are the data from your tcpdump capture. In your case you seem to be counting only the occurrences of IPs within a certain time interval. Before calculating the entropy you need to normalize the input data; again, this step looks fine, since your data are normalized in the line prob = counts/counts.sum()
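To make the normalization step explicit, here it is written out with the five sample IPs from the question. The symbols c_i (packet count of the i-th distinct IP) and n (number of distinct IPs) are introduced here for illustration. Dividing H by its maximum possible value, log2 n, gives a normalized entropy in [0, 1].

```latex
% c_i = number of packets from the i-th distinct IP, n = number of distinct IPs
T_i = \frac{c_i}{\sum_{j=1}^{n} c_j}, \qquad \sum_{i=1}^{n} T_i = 1
H = -\sum_{i=1}^{n} T_i \log_2 T_i, \qquad 0 \le H \le \log_2 n
H_{\mathrm{norm}} = \frac{H}{\log_2 n} \in [0, 1]
% Sample IPs from the question: c = (2, 2, 1), n = 3, so T = (0.4, 0.4, 0.2)
H = -(2 \cdot 0.4 \log_2 0.4 + 0.2 \log_2 0.2) \approx 1.522 \text{ bits}
H_{\mathrm{norm}} \approx 1.522 / 1.585 \approx 0.960
```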

For the plot, the most obvious way is to store each entropy value together with its collection day and then make a simple plot with matplotlib.pyplot, something like plot(dia,entropia) (day vs. entropy). From your observations you may be able to define a threshold and then automatically classify which days had an attack. Remember: the lower the entropy value, the greater the chance that an attack happened (for normalized entropy, values close to 0 mean little randomness and values close to 1 mean high randomness). Rather than analyzing each day, it may be more interesting to go further and do an hourly analysis :-)
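A minimal sketch of the per-interval idea, assuming you already have (timestamp, ip) pairs, for example source IPs extracted from the capture. entropy_series, flag_windows, and plot_series are hypothetical helpers; window=3600 gives the hourly analysis suggested above (300 would give 5-minute windows), and the threshold is something you would have to pick from your own attack-free observations. The below flag follows the reasoning above (less random traffic is suspicious); which direction applies depends on the feature you measure, since, for example, a flood with spoofed random source addresses can raise the source-IP entropy instead.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def normalized_entropy(values: Iterable[str]) -> float:
    """Shannon entropy divided by log2(#distinct values): 0 = one value, 1 = uniform."""
    counts = Counter(values)
    total = sum(counts.values())
    if len(counts) < 2:
        return 0.0
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    return h / math.log2(len(counts))


def entropy_series(events: Iterable[Tuple[float, str]],
                   window: float = 3600.0) -> List[Tuple[float, float]]:
    """Group (timestamp, ip) pairs into fixed windows; return (window_start, entropy)."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for ts, ip in events:
        buckets[int(ts // window)].append(ip)
    return [(k * window, normalized_entropy(buckets[k])) for k in sorted(buckets)]


def flag_windows(series: List[Tuple[float, float]], threshold: float,
                 below: bool = True) -> List[float]:
    """Window starts whose entropy is below (or above) a hypothetical threshold."""
    return [start for start, h in series
            if (h < threshold if below else h > threshold)]


def plot_series(series: List[Tuple[float, float]], threshold: float) -> None:
    """Entropy x time chart; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt
    xs = [start for start, _ in series]
    ys = [h for _, h in series]
    plt.plot(xs, ys, marker="o", label="normalized entropy")
    plt.axhline(threshold, color="red", linestyle="--", label="threshold")
    plt.xlabel("window start (s)")
    plt.ylabel("entropy")
    plt.legend()
    plt.show()
```

Keeping the plotting separate from the windowing means the entropy values can also be fed to an automatic classifier without drawing anything.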

  • So I could calculate the entropy of attack-free traffic every 5 minutes and make a chart, do the same for the attack traffic, and compare. What do you think? Thanks for the reply, @ederwander

  • Of course, that's up to you; you just need a window long enough to collect the input data. Depending on the traffic volume, 5 minutes is sufficient. After your observations, you could build a system that alerts the network admin in a timely manner, so they can take action against the attack...

  • So these would be charts of entropy vs. time, right?

  • Yes, charts of the entropy for each computed time interval...

  • Thank you so much again!

  • What kind of model or statistical test can I use to "guarantee" the validity of the experiments?
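The 5-minute-window alert discussed in these comments could be sketched as follows. It compares each observed window with a baseline taken from the attack-free part of the capture and flags windows whose entropy is more than k standard deviations from the baseline mean. This mean ± k·σ rule (k=3 by default) is my own assumption, a simple common heuristic rather than a formal statistical guarantee; windowed_entropy and alerts are hypothetical names, and the events must be sorted by timestamp.

```python
import math
import statistics
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def entropy_bits(values: Iterable[str]) -> float:
    """Shannon entropy in bits of the values seen in one window."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def windowed_entropy(events: Sequence[Tuple[float, str]],
                     window: float = 300.0) -> List[float]:
    """Entropy of each consecutive `window`-second slice, starting at the first event."""
    if not events:
        return []
    start = events[0][0]
    out: List[float] = []
    current: List[str] = []
    for ts, ip in events:
        # Close every finished window; empty windows yield 0.0 and will be
        # flagged too if the baseline entropy is clearly higher
        while ts >= start + window:
            out.append(entropy_bits(current))
            current, start = [], start + window
        current.append(ip)
    out.append(entropy_bits(current))
    return out


def alerts(baseline: List[float], observed: List[float], k: float = 3.0) -> List[int]:
    """Indices of observed windows outside mean +/- k*stdev of the attack-free baseline."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # needs at least two baseline windows
    return [i for i, h in enumerate(observed) if abs(h - mu) > k * sigma]
```

In practice the baseline list would come from windowed_entropy over the attack-free period and observed from the rest of the capture; each returned index corresponds to one 5-minute window that could trigger a notification.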
