Probabilistic Considerations on the Calculation of Shannon Entropy in a Network Traffic

I have a dump file (CAP format) of network traffic captured with tcpdump on Debian. Up to a certain point in time, the traffic is attack-free. Then a series of TCP flooding attacks begins. My goal is to calculate the entropy of each period of the traffic (with and without attacks) and compare them.

I’m using the following Python code:

import numpy as np
import collections

sample_ips = [
    "131.084.001.031",
    "131.084.001.031",
    "131.284.001.031",
    "131.284.001.031",
    "131.284.001.000",
]

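# Count how many times each distinct IP appears in the sample
C = collections.Counter(sample_ips)
counts = np.array(list(C.values()), dtype=float)
# Empirical (relative-frequency) probability of each distinct IP
prob = counts / counts.sum()
# Shannon entropy in bits (base-2 logarithm)
shannon_entropy = (-prob * np.log2(prob)).sum()
print(shannon_entropy)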

When doing the calculation in this way, some doubts arise:

  1. I am considering a discrete probability distribution over an equiprobable sample space. Is this reasonable? How do I justify this? I don’t know what the actual distribution is...

  2. How do I validate the experiment? I am thinking of a hypothesis test with the following null hypothesis: "The value of entropy allows us to detect the attack." Is this coherent? What would be a good hypothesis test for this case (the sample space has size around 40)?

  • When you say that the sample space has size around 40, do you mean that you have ~40 .CAP files that contain an attack at some point?

1 answer

1) If you reach the same conclusion about the probability distributions in samples taken over different time intervals on different days, then the answer is yes.

2) An experiment must be planned before it is carried out, and then written down step by step, without skipping any stage, so that someone who doubts the data obtained can repeat it. Once it is written down, it should be run several times with different data, and every run should reach the same conclusion.
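To make the comparison across time intervals repeatable, the entropy calculation can be applied to fixed-length windows of the capture. The following is only a minimal sketch: the function names, the 10-second window, and the `(timestamp, source IP)` records are hypothetical, and it assumes the packets have already been extracted from the CAP file into such pairs.

```python
import collections
import math


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = collections.Counter(items)
    total = sum(counts.values())
    # sum of p * log2(1/p), written with total/c so every term is non-negative
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_per_window(records, window_seconds):
    """Group (timestamp, src_ip) records into fixed-length windows and
    return {window_index: entropy} for every non-empty window."""
    windows = collections.defaultdict(list)
    for timestamp, src_ip in records:
        windows[int(timestamp // window_seconds)].append(src_ip)
    return {w: shannon_entropy(ips) for w, ips in sorted(windows.items())}


# Hypothetical records: (seconds since capture start, source IP)
records = [
    (0.5, "10.0.0.1"), (1.2, "10.0.0.2"), (3.9, "10.0.0.3"), (4.1, "10.0.0.4"),
    (10.3, "10.0.0.9"), (11.0, "10.0.0.9"), (12.7, "10.0.0.9"), (13.5, "10.0.0.9"),
]
result = entropy_per_window(records, window_seconds=10)
print(result)
```

In this made-up data the first window has four distinct source addresses and the second has a single repeated one, so the second window's entropy is lower. Running the same procedure on captures from different days gives one list of entropy values per capture, which is what needs to be compared.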

Reading more about scientific methodology will help you.
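On the hypothesis test itself: the null hypothesis is conventionally the "no effect" statement, so here it would be "entropy does not differ between attack-free and attack traffic", and the data are then used to try to reject it. With a sample of about 40 values and an unknown distribution, a permutation test is one option that does not assume normality. The sketch below is an illustration only: the function name, the number of permutations, and the entropy values are hypothetical and are not measurements from the capture.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of means.
    Returns the observed difference and an approximate p-value."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # Small tolerance so rounding does not hide an equally extreme split
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical per-window entropies in bits (not real measurements)
normal_entropy = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8]
attack_entropy = [1.2, 0.9, 1.5, 1.1, 1.3, 1.0]
observed, p_value = permutation_test(normal_entropy, attack_entropy)
print(observed, p_value)
```

A small p-value would be evidence against "no difference". It would not by itself show that entropy is a reliable detector, which is why the test should be repeated on independent captures.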

  • "If you get identical probability distribution data for different time intervals on different days your answer is yes." I don’t understand this.

  • I’ll edit it; it came out wrong.
