How to calculate Shannon entropy based on HTTP header

Shannon’s entropy is given by the formula:

$$H(T) = -\sum_{i} p(T_i) \log_2 p(T_i)$$

where $T_i$ is the data extracted from my network dump (dump.pcap) and $p(T_i)$ is its relative frequency.

The end of an HTTP header on a normal connection is marked by \r\n\r\n:

[image: complete HTTP header]

Example of an incomplete HTTP header (could be a denial of service attack):

[image: incomplete HTTP header]

My goal is to calculate the entropy of the number of packets with \r\n\r\n and without \r\n\r\n in order to compare them.

I can read the PCAP file like this:

import pyshark

pkts = pyshark.FileCapture('dump.pcap')
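As a rough sketch of how each packet could then be checked for the terminator: the helpers below assume the TCP payload is available as a colon-separated hex string (as `packet.tcp.payload` usually is in pyshark), that the traffic is IPv4 on port 80, and that tshark is installed. The names `payload_bytes`, `has_header_end` and `classify_capture` are hypothetical, chosen only for this illustration.

```python
HEADER_END = b"\r\n\r\n"


def payload_bytes(hex_payload):
    """Convert a colon-separated hex string such as '47:45:54' into bytes."""
    return bytes.fromhex(hex_payload.replace(":", ""))


def has_header_end(hex_payload):
    """True if the payload contains the blank line that ends an HTTP header."""
    return HEADER_END in payload_bytes(hex_payload)


def classify_capture(path, display_filter="tcp.port == 80 and tcp.len > 0"):
    """Return (source IP, has_header_end) pairs, one per TCP segment with data.

    Assumption: packet.tcp.payload holds the payload as colon-separated hex.
    """
    import pyshark  # imported lazily so the helpers above work without it

    capture = pyshark.FileCapture(
        path, display_filter=display_filter, keep_packets=False
    )
    results = []
    try:
        for packet in capture:
            # Skip non-IPv4 packets and segments without a payload field.
            if not hasattr(packet, "ip") or not hasattr(packet.tcp, "payload"):
                continue
            results.append((packet.ip.src, has_header_end(packet.tcp.payload)))
    finally:
        capture.close()
    return results
```

Note that this checks one TCP segment at a time, so a header split across several segments would be counted as incomplete; treating it per connection would need the segments to be grouped first.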

I have already computed the entropy based on the IP addresses:

import numpy as np
import collections

sample_ips = [
    "131.084.001.031",
    "131.084.001.031",
    "131.284.001.031",
    "131.284.001.031",
    "131.284.001.000",
]

# Count how often each address occurs
C = collections.Counter(sample_ips)
counts = np.array(list(C.values()), dtype=float)
# Relative frequency of each address
prob = counts / counts.sum()
shannon_entropy = (-prob * np.log2(prob)).sum()
print(shannon_entropy)
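As a sanity check on the snippet above, worked by hand rather than taken from the capture: the five sample strings contain three distinct values with counts 2, 2 and 1, so the probabilities are 0.4, 0.4 and 0.2.

```latex
\begin{aligned}
H &= -\left(0.4\log_2 0.4 + 0.4\log_2 0.4 + 0.2\log_2 0.2\right) \\
  &\approx 0.5288 + 0.5288 + 0.4644 \\
  &\approx 1.5219 \text{ bits}
\end{aligned}
```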

Is it possible, and does it make sense, to calculate entropy based on the number of packets with \r\n\r\n and without \r\n\r\n? If so, any idea how to calculate it?

The network dump is here: https://ufile.io/y5c7k

Some lines from it:

[image: dump with the HTTP filter applied]

30  2017/246 11:20:00.304515    192.168.1.18    192.168.1.216   HTTP    339 GET / HTTP/1.1 


GET / HTTP/1.1
Host: 192.168.1.216
accept-language: en-US,en;q=0.5
accept-encoding: gzip, deflate
accept: */*
user-agent: Mozilla/5.0 (X11; Linux i686; rv:45.0) Gecko/20100101 Firefox/45.0
Connection: keep-alive
content-type: application/x-www-form-urlencoded; charset=UTF-8

1 answer

I don't know what the packet structure returned by pyshark looks like, but I imagine it carries two pieces of information: the IP address and the contents of the packet. Assuming you have both in a dict, you could do something like:

pkgs = [
    {
        'ip': '127.0.0.1',
        'content': 'Im a http header\r\n\r\n<html><body>',
    },
    {
        'ip': '127.0.0.1',
        'content': 'Im a not a http header',
    },
    {
        'ip': '127.0.0.2',
        'content': 'Im a http header\r\n\r\n<html><body>',
    },
    {
        'ip': '127.0.0.2',
        'content': 'Im a not a http header',
    },
    {
        'ip': '127.0.0.2',
        'content': 'Im a not a http header too',
    }
]

def is_http(content):
    return '\r\n\r\n' in content

classified_pkgs = [(p['ip'], is_http(p['content'])) for p in pkgs]
# [('127.0.0.1', True),
#  ('127.0.0.1', False),
#  ('127.0.0.2', True),
#  ('127.0.0.2', False),
#  ('127.0.0.2', False)]

Then simply calculate the probabilities as you calculated before:

import numpy as np
import collections

counter = collections.Counter(classified_pkgs)
counts  = np.array(list(counter.values()),dtype=float)

prob = counts/counts.sum()
shannon_entropy = (-prob * np.log2(prob)).sum()
print(shannon_entropy)
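
The counter above is keyed on (IP, flag) pairs, so it measures the entropy of both values together. If the aim is only to compare packets with and without \r\n\r\n, one possible variation is to take the entropy of the flag alone, or of the flag within each source IP. This is a minimal sketch using only the standard library; the `shannon_entropy` helper and the hard-coded `classified` list (a copy of `classified_pkgs`) are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(labels):
    """Shannon entropy in bits of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Same (ip, flag) pairs as classified_pkgs above.
classified = [
    ("127.0.0.1", True),
    ("127.0.0.1", False),
    ("127.0.0.2", True),
    ("127.0.0.2", False),
    ("127.0.0.2", False),
]

# Entropy of the with/without-terminator flag alone, ignoring the IP.
flag_entropy = shannon_entropy(flag for _, flag in classified)

# Entropy of the flag within each source IP.
flags_by_ip = defaultdict(list)
for ip, flag in classified:
    flags_by_ip[ip].append(flag)
per_ip_entropy = {ip: shannon_entropy(flags) for ip, flags in flags_by_ip.items()}

print(flag_entropy)
print(per_ip_entropy)
```

A flag entropy near 1 bit means both kinds of packet are about equally common, while a value near 0 means one kind dominates.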

  • I made the network dump file available here: https://ufile.io/y5c7k

  • I didn't understand how to get this information from the packets in the dump...

  • @Eds your link is a 53 MB file; paste a few lines of it into your question...

  • @Magichat: what do you want me to extract? I'll paste it into the question!

  • So paste an excerpt from your file...

  • @Magichat: edited. Is this good? Thanks! I used the Wireshark HTTP filter!

  • To be honest, it's not good: you posted an image. Paste a few lines of the file you are capturing the data from, as text...

  • @Magichat: Is it better now? I pasted a line from the image. Do you need more?

  • @Magichat: Could you help?

  • @Eds to be honest I didn't understand your question... https://chat.stackexchange.com/rooms/11910/pilooverflow Sometimes it is easier to understand each other in chat.

  • @Magichat: I don't know how to check for the string "\r\n\r\n" to build the two counters: com_string and sem_string

  • @Eds but isn't that exactly what this answer addresses?

  • @Magichat: I can read the PCAP, but I didn't understand how to check for the string "in practice"!

  • @Eds I believe you have to parse your pcap with a regex... I won't be able to help you right now because it is not that simple, but I think you can manage it if you search for regex.
