How to extract information from an HTTP header with Python?

Asked

Viewed 705 times

6

We know that in the HTTP protocol, the end of the header is indicated by "\r\n\r\n". Example:

HTTP normal

It may be that, for some reason, the customer does not send the "\r\n\r\n" to the server (could be an attack, for example):

Header Incompleto

Suppose I have a network traffic capture in PCAP format called dump.pcap. I can read it with the following Python code:

import pyshark

pkts = pyshark.FileCapture('dump.pcap')

I would like to read the dump.pcap file (above code) and after that count how many packages have the \r\n\r\n and how many packages do not have the \r\n\r\n Is it possible to accomplish this in Python? How could I do?

  • 2

    A request without the blank line is not a valid request, I believe. The server will keep the connection waiting for the rest of the headers or body of the message and end with timeout if it does not receive them. If even this data is captured in your file, you should search the first line of the message to recognize where another message starts.

  • @Anderson Carlos Woss: exactly! is a sloworis attack (denial of service). I did not understand very well how to do? would help?

  • I would try to assemble the sequence of TCP packages to get the HTTP requests complete. After that, evaluate whether you have the line breaks needed after the last HTTP Header field and the next HTTP request. It is easy to identify the beginning of requests, as they all start with an HTTP verb (GET, POST, HEAD, PATCH, ...), followed by a path (/xampp/, as in your example).

  • @Bruno Coimbra : Would you kindly show a Python code? I have no idea how to do.

  • 2

    You defined the problem well, but you didn’t show anything in terms of trying to solve, except load the file (not even read a package). My suggestion is that you continue reading the pyshark documentation and try to solve until you get to the problem that is contained in your question, and then put the code here.

1 answer

3

Good morning buddy,

Your problem briefly is that you want to count how many occurrences of this pattern you had in your messages, one way you do this check is by using REGEX.

Suppose your headers can be converted into a string and each string being placed as an item in a list, you have to go through the list looking for the occurrence of the line break. In the case with are 4 characters of new line( r n n) just check this occurrence.

import re

x = len(re.findall("[\s{4}]+", txt)) #qualquer ocorrencia dos 4 caracters
y = len(re.findall("[\s{4}]+", txt)) #ocorrencias apenas no final da string

The variable x will have the number of occurrences of the 4 characters followed at any position, and the variable y will count only at the end of the string.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.