How to sniff mirrored packets without establishing a TCP/IP connection with a Python socket?

My question is: can I sniff the mirrored packets as if I were accepting them on a Python socket?

I am receiving a file on Server A using get_file_by_socket.py:

import socket
import tqdm
import os
import hashlib
import time

SERVER_HOST = "192.168.1.1"
SERVER_PORT = 5201
counter = 1
BUFFER_SIZE = 4096
SEPARATOR = "<SEPARATOR>"
s = socket.socket()
s.bind((SERVER_HOST, SERVER_PORT))
s.listen(5)
print(f"[*] Listening as {SERVER_HOST}:{SERVER_PORT}")
client_socket, address = s.accept()
print("client_scoket = ",client_socket,address)
print(f"[+] {address} is connected")
received = client_socket.recv(BUFFER_SIZE).decode()
filename,filesize = received.split(SEPARATOR)
filename = os.path.basename(filename)
filesize = int(filesize)
file_hash = hashlib.md5()
progress = tqdm.tqdm(range(filesize), f"Receiving {filename}", unit="B",unit_scale=True, unit_divisor=1024)
with open(filename,"wb") as f:
    while True:
        bytes_read = client_socket.recv(BUFFER_SIZE)
        if not bytes_read:
            break
        f.write(bytes_read)
        file_hash.update(bytes_read)
        print(f"{counter}. Bytes_read={bytes_read}")
        #print(f"{counter}. ")
        counter = counter + 1
        time.sleep(0.001)
        progress.update(len(bytes_read))

client_socket.close()
s.close() 

I am sending the file from Host B using send_file_by_socket.py:

import socket
import tqdm
import os
import sys
SEPARATOR = "<SEPARATOR>"
BUFFER_SIZE = 4096
host = sys.argv[1]  #"192.168.1.1"
print("host=",host)
port = 5201
filename = sys.argv[2] #"twibot20.json" 
print("filename=",filename)
filesize = os.path.getsize(filename)
s = socket.socket()
#s.setsockopt(socket.SOL_SOCKET,25,'enp2s0')
print(f"[+] Connecting to {host}:{port}")
s.connect((host,port))
print("[+] Connected.")
s.send(f"{filename}{SEPARATOR}{filesize}".encode())
progress = tqdm.tqdm(range(filesize), f"Sending {filename}", unit="B", unit_scale = True, unit_divisor=1024)
with open(filename, "rb") as f:
    while True :
        bytes_read = f.read(BUFFER_SIZE)
        if not bytes_read:
            break
        s.sendall(bytes_read)
        progress.update(len(bytes_read))
s.close()

The sender sends the file and the server receives it successfully; the transfer rate is around 2.3 Gb/s. Now I mirror the packets while the transfer is happening and sniff them with sniff_mirrored_packets.py:

import sys
import hashlib

from scapy.all import IP, get_if_list, sniff

file_hash = hashlib.md5()   # running hash of the sniffed payload
counter = 1


def get_if():
    iface = None
    for i in get_if_list():
        if "enp1s0f1" in i:
            iface = i
            break
    if not iface:
        print("Cannot find enp1s0f1 interface")
        exit(1)
    return iface


def handle_pkt(pkt):
    global file_hash
    global counter
    try:
        setir = pkt[IP].load          # TCP payload carried by the packet
    except Exception:
        setir = b""
    if "<SEPARATOR>" in str(setir):   # skip the filename/filesize header
        setir = b""
    if setir != b"":
        file_hash.update(setir)
    print("{}. Hash = {} ".format(counter, file_hash.hexdigest()))
    #pkt.show2()
    sys.stdout.flush()
    counter = counter + 1


def main():
    iface = get_if()
    print("sniffing on %s" % iface)
    sys.stdout.flush()
    sniff(filter='tcp and port 5201', iface=iface,
          prn=lambda x: handle_pkt(x))


if __name__ == '__main__':
    main()

The problem is that the socket transfer rate is too high for the sniffer, which is why I added:

time.sleep(0.001)

to get_file_by_socket.py on the server side, since the sniffing speed on the mirror side is too slow. When I send a 3 MB file from Host B, I capture only around 200 out of 1000 packets on the mirror side with tshark. Only with time.sleep(0.001) on the server side do I receive all 1000 packets on the mirror side.

My questions are:

  1. How do I get the transferred data from the mirrored port without establishing a TCP/IP handshake with a Python socket? Can I receive the mirrored packets the same way as in get_file_by_socket.py while ignoring the TCP handshake that happens between Host B and Server A? (I ran get_file_by_socket.py-like code on the mirror side, but it got stuck in the handshake because the mirrored packets do not contain one.) The sniffing method I am using is too slow compared to the socket transfer rate.
  2. What other methods can be used to catch up with the socket transfer rate?

Hi @nagmat,

Please keep in mind that this is a P4-related forum, and we might not have the time or knowledge to attend to questions that are not specific to P4 or related topics (maybe SDN). I always try to answer if I think I can help, but consider that next time other people might not be able to, for the reason mentioned above :slight_smile:. Still, I think this question could also be related to telemetry (INT) to some extent, so let me try to answer, keeping in mind I might not have the best answers:

You could limit the mirrored packets to PSH or URG instead of SYN, ACK, FIN or RST.
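
For example, a BPF capture filter can restrict what the sniffer even sees to data-carrying segments. Below is a minimal sketch, assuming the enp1s0f1 interface and port 5201 from the question; the flag test is standard tcpdump/BPF syntax, and the same filter string can be handed to tcpdump directly (Scapy is used here only for brevity):

from scapy.all import sniff

# Only capture segments with PSH or URG set on port 5201, skipping the bare
# SYN/ACK/FIN/RST control packets. Interface name and port are assumptions
# taken from the question above.
bpf = "tcp port 5201 and tcp[tcpflags] & (tcp-push|tcp-urg) != 0"

sniff(filter=bpf, iface="enp1s0f1", prn=lambda p: p.summary(), store=False)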

If you mirror packets without changing the destination MAC, IP, checksums… then your TCP server will never be able to process them, because those packets are not meant for it (you can run Wireshark while using the method you mentioned to see which problem your server/PC is encountering). I also do not recommend changing those parameters. You should be able to filter and sniff only the packets you need; sniffing all packets is neither productive nor optimal in pretty much any case. If you sniff X packets/s at your maximum sniffing capacity (say 10 Gb/s), imagine then sniffing packets for a second host at the same time.

If you want to be fast, do not use Scapy. I would only use Scapy for single-packet debugging, rather than for processing many packets.

  • I cannot remember if the kernel discards packets that carry another host's MAC and IP address, but if you can see the packets in Wireshark… then I would recommend first capturing all traffic in a pcap file for later debugging. Try to filter by protocol (TCP), source IP and dst port for incoming packets. You can use something like sudo tcpdump tcp and src 1.2.3.4 and dst port 12345 -i eth0 -w capture.pcap. I have not tested it, but it should be something like that command. I think tcpdump enables promiscuous mode by default, so a command like the one I wrote might work.

  • Nowadays, I believe you should be able to sniff packets sent to other hosts and capture the traffic/payload of the data exchange, but I have never programmed a server to do so, and I am not sure whether you need any lower-level changes in the OS, since buffering, IP and TCP processing are part of the kernel. I do know, however, about promiscuous mode when sniffing packets over Wi-Fi, so you should have a similar option when programming raw sockets (see the sketch after this list). See the following link for more information: https://levelup.gitconnected.com/write-a-linux-packet-sniffer-from-scratch-with-raw-socket-and-bpf-c53734b51850.

  • Alternatively, you could encapsulate the relevant packets in UDP. Let's say… the packets that contain information relevant to you, I assume PSH or URG, and maybe not SYN, ACK, RST or FIN packets. You could try to set a lower MTU in the network so that TCP packets can be encapsulated into UDP packets, and (maybe) program a UDP server that can cope with the aforementioned data rate, or come close to the one you achieved in your first tests. Do not use Scapy to sniff packets unless you do it at a minimal rate or to visualize the contents of very few packets; consider that you are handing your application layer raw packets that are slowly disassembled and printed.
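
As a concrete illustration of the raw-socket idea in the second bullet: on Linux, an AF_PACKET socket receives frames even when they are addressed to another host, which is exactly the situation on a mirror port, and no TCP handshake is involved. This is a minimal sketch, assuming Linux, root privileges, untagged Ethernet, and the enp1s0f1 interface and port 5201 from the question; it only counts TCP payload bytes and does not reassemble the stream:

import socket
import struct

ETH_P_ALL = 0x0003  # receive every protocol

# AF_PACKET bypasses the kernel TCP/IP stack: no handshake is needed and
# frames destined to other hosts are still delivered to this socket.
s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(ETH_P_ALL))
s.bind(("enp1s0f1", 0))  # interface name is an assumption

total_payload = 0
while True:
    frame, _ = s.recvfrom(65535)
    # 14-byte Ethernet header, IPv4 ethertype 0x0800 (untagged frames only).
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":
        continue
    ihl = (frame[14] & 0x0F) * 4                  # IPv4 header length
    tcp_off = 14 + ihl
    if frame[14 + 9] != 6 or len(frame) < tcp_off + 20:
        continue                                  # not TCP, or truncated
    dst_port = struct.unpack("!H", frame[tcp_off + 2:tcp_off + 4])[0]
    if dst_port != 5201:                          # transfer port from the question
        continue
    data_off = ((frame[tcp_off + 12] >> 4) & 0x0F) * 4  # TCP header length
    total_payload += len(frame[tcp_off + data_off:])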

I hope I was not wrong in anything I wrote, but if anyone spots a better way to capture packets or a mistake on my part, please write an answer.

Cheers,


Hi @ederollora

Thanks for the response! In the beginning, I couldn't capture all the mirrored packets since some were dropped in the kernel. I could sniff them all only after increasing the capture buffer size with:

sudo tcpdump -s 65535 -v -i enp1s0f1 -nn -B 424096 -w dtn4.pcap

After capturing all the packets, I post-process them to build the final version of the file. I am having issues with this post-processing: I am trying to reassemble the final file from the TCP packets in the pcap, but something is off. Have you come across a way to reassemble TCP packets saved in a pcap file?

Kind regards,

Hi,

You are probably looking for something like tcpflow. I also found a few examples on the internet (linked below),

Cheers,

script: https://stackoverflow.com/a/38655207/2317111
tcpflow example: https://serverfault.com/a/1091470


Hi @ederollora

Thanks for sharing the links.
When I execute the command below to extract a 1 GB file,

tshark -nlr dtn4_stream.pcap -qz "follow,tcp,raw,0" | xxd -r -p > testout.txt 

It takes around 14 seconds, which is quite long for me. Are there any methods to decrease the post-processing time?
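
One thing that may be worth measuring is replacing xxd with a direct hex-to-bytes conversion in Python. This is an untested sketch, not a guaranteed speed-up: it assumes that each payload line of the follow,tcp,raw output is a plain hex string and that the header/footer lines tshark prints around them can simply be skipped.

import subprocess

# Assumptions: tshark is on PATH, the file names match the question, and every
# payload line of the "follow,tcp,raw" report is a bare hex string.
cmd = ["tshark", "-nlr", "dtn4_stream.pcap", "-qz", "follow,tcp,raw,0"]
with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc, \
        open("testout.txt", "wb") as out:
    for line in proc.stdout:
        try:
            out.write(bytes.fromhex(line.strip()))  # hex payload line
        except ValueError:
            pass                                    # header/footer line, skip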

Apart from that, I execute the command below, which takes less than 2 seconds and gives me the exact stream that needs to be processed:

tshark -r dtn4.pcap -Y "tcp.stream eq 0" -w dtn4_stream.pcap 

After that, I construct the new file from the newly formed stream using the code below:

import sys

from scapy.all import IP, TCP, PcapReader

filename = 'dtn4_stream.pcap'
cur_seq = 0
started = False
SEPARATOR = "<SEPARATOR>"


def handle_pkt(pkt):
    global f
    global cur_seq
    global started

    try:
        setir = pkt[TCP].load                  # TCP payload, if any
    except Exception:
        setir = b""
    if SEPARATOR in str(setir) and not started:
        # The first payload carries "<filename><SEPARATOR><filesize>".
        started = True
        out_name, filesize = setir.decode().split(SEPARATOR)
        f = open(out_name, 'wb')
        print("Filename = {} , filesize = {} ".format(out_name, filesize))
        setir = b""
    if started and SEPARATOR in str(setir):
        print("Finishing. ")
        f.write(setir[:-16])
        f.close()
        return
    seq_num = pkt[TCP].seq
    if setir != b"" and seq_num >= cur_seq:
        # Next expected sequence number: this seq plus the payload length
        # (IP total length minus 52 bytes of IP + TCP headers).
        cur_seq = seq_num + pkt[IP].len - 52
        f.write(setir)
    sys.stdout.flush()


print("Starting ")
for packet in PcapReader(filename):
    handle_pkt(packet)

But again, this process takes around 20 seconds. My question is: how can I optimize it down to around 6 seconds?
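
One direction that might be worth profiling (an untested sketch, not a drop-in replacement for the script above) is to iterate the pcap with dpkt, which usually parses packets much faster than Scapy, and to order segments by TCP sequence number before writing. Sequence-number wraparound, retransmission edge cases and the <SEPARATOR> header are not handled here; file names and the port are taken from this thread:

import dpkt

# Assumptions: dtn4_stream.pcap is classic pcap (recent dpkt versions also
# ship dpkt.pcapng.Reader), it holds a single stream towards port 5201,
# and the whole payload fits in memory.
segments = {}
with open("dtn4_stream.pcap", "rb") as pc:
    for _, buf in dpkt.pcap.Reader(pc):
        eth = dpkt.ethernet.Ethernet(buf)
        if not isinstance(eth.data, dpkt.ip.IP):
            continue
        tcp = eth.data.data
        if not isinstance(tcp, dpkt.tcp.TCP) or tcp.dport != 5201 or not tcp.data:
            continue
        segments.setdefault(tcp.seq, tcp.data)  # keep first copy per seq number

with open("reassembled.bin", "wb") as out:
    for seq in sorted(segments):
        out.write(segments[seq])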

Kind regards,
Nagmat

Hi,

This is definitely not a process I am totally familiar with, so I cannot help you as much as you probably need. I would try to identify where the problem is, and maybe then we can help you. Please try to respond to these questions:

  • Which is the exact part of the code that is taking most of the time?
  • Can you actually use another library to iterate over a pcap in a faster way?
  • Is the pcap file too big or too many packets?
  • Can you filter the pcap before iterating over it?
  • Are there other popular methods to build the packet payload from a pcap that other people use? (I would start in stackoverflow)
  • Are there other languages that show a better performance?

*Please consider that this is far, far away from P4, SDN or any similar topic, and that this question might make more sense in another public forum. I still try to help, but try not to be too optimistic :slight_smile:

Cheers,

What is it that you're trying to do? To get the sequence numbers of TCP stream 0, it'd be tshark -r dtn4.pcap -Y "tcp.stream eq 0" -Tfields -e tcp.seq.
Use grep, awk and tshark -G fields | grep -E "tcp\." to find the fields you want.


Thanks for quick response,

I will try to provide information regarding your questions:

  1. Which is the exact part of the code that is taking most of the time?
    In the shell command tshark -nlr dtn4.pcap -qz "follow,tcp,raw,0" | xxd -r -p > testout.txt, the "xxd -r -p" part takes most of the time (around 11 seconds).

  2. Can you actually use another library to iterate over a pcap in a faster way?
    That is exactly what I am looking for.

  3. Is the pcap file too big or too many packets?
    For a 1 GB file there are about 90k packets in it.

  4. Can you filter the pcap before iterating over it?
    I extract the exact stream from the sniffed pcap file before starting to iterate:
    tshark -r dtn4.pcap -Y "tcp.stream eq 0" -w dtn4_stream.pcap

Kind regards,

I think I understand your issue.

To be honest, I believe the problem is that a 1 GB file might simply be too big for your performance expectations.

  • Are all the 90K packets from the same TCP flow? Do the packets of TCP stream 0 use the same 4-tuple (same source/destination IPs and source/destination TCP ports)? I guess not, otherwise the exchange of information would be too long for the fast processing time you want to achieve. The more packets you process, the worse xxd performs, not because there are too many packets but because you are probably extracting too many payload bytes and piping them (this is the issue) to xxd. Consider that you are still processing all the information at the application layer (it is not the kernel dealing with it), so everything takes longer. xxd is probably choking on that much information to push out to a file.

  • Out of curiosity, how big is the testout.txt file?

Think about sending an image file via TCP. If you can properly filter flows, I bet it would take no time to actually parse the packets.

Without having a proper understanding of the commands you use, try the following test if not all 90K packets are relevant to your use case and TCP stream 0:

  1. Open the pcap with Wireshark and remove all but exactly the packets of the stream you are interested in.
  2. Now process only the packets in that file and measure the performance.

After that:

  1. Now take the same original 90K packet pcap and double the packets.
  2. Test it again and check whether the performance degrades proportionally (double the packets, maybe double the xxd processing time). I guess that the more information you extract from a pcap and pipe to xxd, the worse performance you will get.

Considering the information you provided, I am pretty sure you will achieve better results with smaller pcap files to process. I could be wrong, so it would be good if you reported your experience. You might be able to programmatically chunk the pcaps so that xxd performs faster; if there is too much information to redirect to a file, it will always take longer.
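
As a rough illustration of the chunking idea, the sketch below splits a capture into fixed-size pieces with Scapy (slow but simple; editcap -c from the Wireshark suite does the same job natively and much faster). File names and the chunk size are assumptions:

from scapy.utils import PcapReader, PcapWriter

# Split dtn4_stream.pcap into ~10k-packet chunks so each piece can be
# followed/extracted separately (sketch only).
CHUNK = 10_000
count, part = 0, 0
writer = PcapWriter("chunk_0.pcap")
for pkt in PcapReader("dtn4_stream.pcap"):
    writer.write(pkt)
    count += 1
    if count % CHUNK == 0:
        writer.close()
        part += 1
        writer = PcapWriter("chunk_%d.pcap" % part)
writer.close()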

Cheers,
