Probabilistic Considerations on the Calculation of Shannon Entropy in Network Traffic
I have a dump file (.cap format) of a network traffic capture made with tcpdump on Debian. Up to a certain time the traffic is attack-free; then a series of TCP flooding attacks begins. My goal is to compute the entropy of each traffic interval (with and without attacks) and compare them.
I’m using the following Python code:
import collections

import numpy as np

# Sample source IP addresses (placeholder data)
sample_ips = [
    "131.084.001.031",
    "131.084.001.031",
    "131.284.001.031",
    "131.284.001.031",
    "131.284.001.000",
]

# Empirical probability of each distinct address
C = collections.Counter(sample_ips)
counts = np.array(list(C.values()), dtype=float)
prob = counts / counts.sum()

# Shannon entropy in bits
shannon_entropy = (-prob * np.log2(prob)).sum()
print(shannon_entropy)
When doing the calculation in this way, some doubts arise:
1. I am assuming a discrete probability distribution over an equiprobable sample space. Is this reasonable, and how do I justify it? I don’t know what the true distribution is...
2. How do I validate the experiment? I am considering a hypothesis test with the null hypothesis "The entropy value allows us to detect the attack." Is this coherent? What would be a good hypothesis test for this case (the sample space has size around 40)?
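Regarding point 1, a quick sanity check is to compare the empirical entropy against the maximum entropy log2(k) that k distinct addresses would reach if they were equiprobable; a value well below log2(k) signals a skewed distribution. A minimal sketch reusing the sample above (the `shannon_entropy` helper is illustrative, not part of any library):

```python
import collections
import math

def shannon_entropy(symbols):
    """Empirical Shannon entropy (in bits) of a list of symbols."""
    counts = collections.Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

sample_ips = [
    "131.084.001.031",
    "131.084.001.031",
    "131.284.001.031",
    "131.284.001.031",
    "131.284.001.000",
]

h = shannon_entropy(sample_ips)
h_max = math.log2(len(set(sample_ips)))  # equiprobable upper bound for 3 distinct IPs
print(f"H = {h:.4f} bits, max = {h_max:.4f} bits")
```

Note that the entropy in the snippet is empirical (frequency-based), so the equiprobable assumption is only the reference point, not what the code computes.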
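Regarding point 2, one option (not the only one) is to state the null hypothesis the conventional way round — "the entropy distribution is the same with and without the attack" — and apply a two-sample test to per-window entropies. With ~40 windows, a nonparametric permutation test on the difference of means avoids normality assumptions; `scipy.stats.mannwhitneyu` would be an off-the-shelf alternative. The sketch below uses made-up entropy values purely for illustration, and `perm_test_mean_diff` is a hypothetical helper:

```python
import random
import statistics

def perm_test_mean_diff(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in sample means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical per-window entropies (illustrative numbers, not real measurements):
h_normal = [3.1, 3.0, 3.2, 2.9, 3.1, 3.0]  # attack-free windows
h_attack = [1.2, 1.4, 1.1, 1.3, 1.5, 1.2]  # windows during the flood

p = perm_test_mean_diff(h_normal, h_attack)
print(f"permutation p-value: {p}")
```

A small p-value would then support rejecting "same distribution" in favour of "entropy differs during the attack", which is the detectable effect the question is after.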
When you say that the sample space has size around 40, do you mean that you have ~40 .CAP files, each containing an attack at some point?
– klaus