Topics

[RFC/PATCH 2/2] tools: Add sofdsnoop to spy on fds passed through socket

Jiri Olsa
 

The sofdsnoop traces FDs passed through unix sockets.

# ./sofdsnoop
ACTION TID COMM SOCKET FD NAME
SEND 2576 Web Content 24:socket:[39763] 51 /dev/shm/org.mozilla.ipc.2576.23874
RECV 2576 Web Content 49:socket:[809997] 51
SEND 2576 Web Content 24:socket:[39763] 58 N/A
RECV 2464 Gecko_IOThread 75:socket:[39753] 55

Every file descriptor that is passed via unix sockets os displayed
on separate line together with process info (TID/COMM columns),
ACTION details (SEND/RECV), file descriptor number (FD) and its
translation to file if available (NAME).

examples:
./sofdsnoop # trace file descriptors passes
./sofdsnoop -T # include timestamps
./sofdsnoop -p 181 # only trace PID 181
./sofdsnoop -t 123 # only trace TID 123
./sofdsnoop -d 10 # trace for 10 seconds only
./sofdsnoop -n main # only print process names containing "main"
---
README.md | 1 +
man/man8/spfdsnoop.8 | 85 ++++++++
snapcraft/snapcraft.yaml | 3 +
tests/python/test_tools_smoke.py | 4 +
tools/sofdsnoop.py | 355 +++++++++++++++++++++++++++++++
tools/sofdsnoop_example.txt | 69 ++++++
6 files changed, 517 insertions(+)
create mode 100644 man/man8/spfdsnoop.8
create mode 100755 tools/sofdsnoop.py
create mode 100644 tools/sofdsnoop_example.txt

diff --git a/README.md b/README.md
index 92b90e5d7c4c..54733ebb58d1 100644
--- a/README.md
+++ b/README.md
@@ -133,6 +133,7 @@ pair of .c and .py files, and some are directories of files.
- tools/[runqlen](tools/runqlen.py): Run queue length as a histogram. [Examples](tools/runqlen_example.txt).
- tools/[runqslower](tools/runqslower.py): Trace long process scheduling delays. [Examples](tools/runqslower_example.txt).
- tools/[shmsnoop](tools/shmsnoop.py): Trace System V shared memory syscalls. [Examples](tools/shmsnoop_example.txt).
+- tools/[sofdsnoop](tools/sofdsnoop.py): Trace FDs passed through unix sockets. [Examples](tools/sofdsnoop_example.txt).
- tools/[slabratetop](tools/slabratetop.py): Kernel SLAB/SLUB memory cache allocation rate top. [Examples](tools/slabratetop_example.txt).
- tools/[softirqs](tools/softirqs.py): Measure soft IRQ (soft interrupt) event time. [Examples](tools/softirqs_example.txt).
- tools/[solisten](tools/solisten.py): Trace TCP socket listen. [Examples](tools/solisten_example.txt).
diff --git a/man/man8/spfdsnoop.8 b/man/man8/spfdsnoop.8
new file mode 100644
index 000000000000..ffad57c591c1
--- /dev/null
+++ b/man/man8/spfdsnoop.8
@@ -0,0 +1,85 @@
+.TH sofdsnoop 8 "2018-11-08" "USER COMMANDS"
+.SH NAME
+sofdsnoop \- Trace FDs passed through unix sockets. Uses Linux eBPF/bcc.
+.SH SYNOPSIS
+.B sofdsnoop [-h] [-T] [-p PID] [-t TID] [-n NAME] [-d DURATION]
+.SH DESCRIPTION
+sofdsnoop traces FDs passed through unix sockets
+
+Every file descriptor that is passed via unix sockets os displayed
+on separate line together with process info (TID/COMM columns),
+ACTION details (SEND/RECV), file descriptor number (FD) and its
+translation to file if available (NAME).
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bcc.
+.SH OPTIONS
+.TP
+\-h
+Print usage message.
+.TP
+\-T
+Include a timestamp column.
+.TP
+\-p PID
+Trace this process ID only (filtered in-kernel).
+.TP
+\-t TID
+Trace this thread ID only (filtered in-kernel).
+.TP
+\-d DURATION
+Total duration of trace in seconds.
+.TP
+\-n NAME
+Only print command lines matching this command name (regex)
+.SH EXAMPLES
+.TP
+Trace all sockets:
+#
+.B sofdsnoop
+.TP
+Trace all sockets, and include timestamps:
+#
+.B sofdsnoop \-T
+.TP
+Only trace sockets where the process contains "server":
+#
+.B sofdsnoop \-n server
+.SH FIELDS
+.TP
+TIME(s)
+Time of SEDN/RECV actions, in seconds.
+.TP
+ACTION
+Operation on the fd SEND/RECV.
+.TP
+TID
+Process TID
+.TP
+COMM
+Parent process/command name.
+.TP
+SOCKET
+The socket carrier.
+.TP
+FD
+file descriptor number
+.TP
+NAME
+file name for SEND lines
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _examples.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Jiri Olsa
+.SH SEE ALSO
+opensnoop(1)
diff --git a/snapcraft/snapcraft.yaml b/snapcraft/snapcraft.yaml
index fcff9624d72e..e408686f2553 100644
--- a/snapcraft/snapcraft.yaml
+++ b/snapcraft/snapcraft.yaml
@@ -192,6 +192,9 @@ apps:
shmsnoop:
command: wrapper shmsnoop
aliases: [shmsnoop]
+ sofdsnoop:
+ command: wrapper sofdsnoop
+ aliases: [sofdsnoop]
phpcalls:
command: wrapper phpcalls
aliases: [phpcalls]
diff --git a/tests/python/test_tools_smoke.py b/tests/python/test_tools_smoke.py
index 3aad12eaa3af..f6fca6376740 100755
--- a/tests/python/test_tools_smoke.py
+++ b/tests/python/test_tools_smoke.py
@@ -266,6 +266,10 @@ class SmokeTests(TestCase):
def test_shmsnoop(self):
self.run_with_int("shmsnoop.py")

+ @skipUnless(kernel_version_ge(4,8), "requires kernel >= 4.8")
+ def test_sofdsnoop(self):
+ self.run_with_int("sofdsnoop.py")
+
def test_slabratetop(self):
self.run_with_duration("slabratetop.py 1 1")

diff --git a/tools/sofdsnoop.py b/tools/sofdsnoop.py
new file mode 100755
index 000000000000..ae5ab9f59f94
--- /dev/null
+++ b/tools/sofdsnoop.py
@@ -0,0 +1,355 @@
+#!/usr/bin/python
+# @lint-avoid-python-3-compatibility-imports
+#
+# sofdsnoop traces file descriptors passed via socket
+# For Linux, uses BCC, eBPF. Embedded C.
+#
+# USAGE: sofdsnoop
+#
+# Copyright (c) 2018 Jiri Olsa.
+# Licensed under the Apache License, Version 2.0 (the "License")
+#
+# 30-Jul-2018 Jiri Olsa Created this.
+
+from __future__ import print_function
+from bcc import ArgString, BPF
+import os
+import argparse
+import ctypes as ct
+from datetime import datetime, timedelta
+
+# arguments
+examples = """examples:
+ ./sofdsnoop # trace file descriptors passes
+ ./sofdsnoop -T # include timestamps
+ ./sofdsnoop -p 181 # only trace PID 181
+ ./sofdsnoop -t 123 # only trace TID 123
+ ./sofdsnoop -d 10 # trace for 10 seconds only
+ ./sofdsnoop -n main # only print process names containing "main"
+
+"""
+parser = argparse.ArgumentParser(
+ description="Trace file descriptors passed via socket",
+ formatter_class=argparse.RawDescriptionHelpFormatter,
+ epilog=examples)
+parser.add_argument("-T", "--timestamp", action="store_true",
+ help="include timestamp on output")
+parser.add_argument("-p", "--pid",
+ help="trace this PID only")
+parser.add_argument("-t", "--tid",
+ help="trace this TID only")
+parser.add_argument("-n", "--name",
+ type=ArgString,
+ help="only print process names containing this name")
+parser.add_argument("-d", "--duration",
+ help="total duration of trace in seconds")
+args = parser.parse_args()
+debug = 0
+
+ACTION_SEND=0
+ACTION_RECV=1
+MAX_FD=10
+
+if args.duration:
+ args.duration = timedelta(seconds=int(args.duration))
+
+# define BPF program
+bpf_text = """
+#include <uapi/linux/ptrace.h>
+#include <uapi/linux/limits.h>
+#include <linux/sched.h>
+#include <linux/socket.h>
+#include <net/sock.h>
+
+#define MAX_FD 10
+#define ACTION_SEND 0
+#define ACTION_RECV 1
+
+struct val_t {
+ u64 id;
+ u64 ts;
+ int action;
+ int sock_fd;
+ int fd_cnt;
+ int fd[MAX_FD];
+ char comm[TASK_COMM_LEN];
+};
+
+BPF_HASH(detach_ptr, u64, struct cmsghdr *);
+BPF_HASH(sock_fd, u64, int);
+BPF_PERF_OUTPUT(events);
+
+static void set_fd(int fd)
+{
+ u64 id = bpf_get_current_pid_tgid();
+
+ sock_fd.update(&id, &fd);
+}
+
+static int get_fd(void)
+{
+ u64 id = bpf_get_current_pid_tgid();
+ int *fd;
+
+ fd = sock_fd.lookup(&id);
+ return fd ? *fd : -1;
+}
+
+static void put_fd(void)
+{
+ u64 id = bpf_get_current_pid_tgid();
+
+ sock_fd.delete(&id);
+}
+
+static int sent_1(struct pt_regs *ctx, struct val_t *val, int num, void *data)
+{
+ val->fd_cnt = min(num, MAX_FD);
+
+ if (bpf_probe_read(&val->fd[0], MAX_FD * sizeof(int), data))
+ return -1;
+
+ events.perf_submit(ctx, val, sizeof(*val));
+ return 0;
+}
+
+#define SEND_1 \
+ if (sent_1(ctx, &val, num, (void *) data)) \
+ return 0; \
+ \
+ num -= MAX_FD; \
+ if (num < 0) \
+ return 0; \
+ \
+ data += MAX_FD;
+
+#define SEND_2 SEND_1 SEND_1
+#define SEND_4 SEND_2 SEND_2
+#define SEND_8 SEND_4 SEND_4
+#define SEND_260 SEND_8 SEND_8 SEND_8 SEND_2
+
+static int send(struct pt_regs *ctx, struct cmsghdr *cmsg, int action)
+{
+ struct val_t val = { 0 };
+ int *data, num, fd;
+ u64 tsp = bpf_ktime_get_ns();
+
+ data = (void *) ((char *) cmsg + sizeof(struct cmsghdr));
+ num = (cmsg->cmsg_len - sizeof(struct cmsghdr)) / sizeof(int);
+
+ val.id = bpf_get_current_pid_tgid();
+ val.action = action;
+ val.sock_fd = get_fd();
+ val.ts = tsp / 1000;
+
+ if (bpf_get_current_comm(&val.comm, sizeof(val.comm)) != 0)
+ return 0;
+
+ SEND_260
+ return 0;
+}
+
+static bool allow_pid(u64 id)
+{
+ u32 pid = id >> 32; // PID is higher part
+ u32 tid = id; // Cast and get the lower part
+
+ FILTER
+
+ return 1;
+}
+
+int trace_scm_send_entry(struct pt_regs *ctx, struct socket *sock, struct msghdr *hdr)
+{
+ struct cmsghdr *cmsg = NULL;
+
+ if (!allow_pid(bpf_get_current_pid_tgid()))
+ return 0;
+
+ if (hdr->msg_controllen >= sizeof(struct cmsghdr))
+ cmsg = hdr->msg_control;
+
+ if (!cmsg || (cmsg->cmsg_type != SCM_RIGHTS))
+ return 0;
+
+ return send(ctx, cmsg, ACTION_SEND);
+};
+
+int trace_scm_detach_fds_entry(struct pt_regs *ctx, struct msghdr *hdr)
+{
+ struct cmsghdr *cmsg = NULL;
+ u64 id = bpf_get_current_pid_tgid();
+
+ if (!allow_pid(id))
+ return 0;
+
+ if (hdr->msg_controllen >= sizeof(struct cmsghdr))
+ cmsg = hdr->msg_control;
+
+ if (!cmsg)
+ return 0;
+
+ detach_ptr.update(&id, &cmsg);
+ return 0;
+};
+
+int trace_scm_detach_fds_return(struct pt_regs *ctx)
+{
+ struct cmsghdr **cmsgp;
+ u64 id = bpf_get_current_pid_tgid();
+
+ if (!allow_pid(id))
+ return 0;
+
+ cmsgp = detach_ptr.lookup(&id);
+
+ if (!cmsgp)
+ return 0;
+
+ return send(ctx, *cmsgp, ACTION_RECV);
+}
+
+int trace_sendmsg_entry(struct pt_regs *ctx, struct pt_regs *_p)
+{
+ struct pt_regs p;
+ int fd;
+
+ if (!allow_pid(bpf_get_current_pid_tgid()))
+ return 0;
+
+ if (bpf_probe_read(&p, sizeof(p), (void *) _p))
+ return 0;
+
+ fd = (int) PT_REGS_PARM1(&p);
+
+ set_fd(fd);
+ return 0;
+}
+
+int trace_sendmsg_return(struct pt_regs *ctx)
+{
+ if (!allow_pid(bpf_get_current_pid_tgid()))
+ return 0;
+
+ put_fd();
+ return 0;
+}
+
+int trace_recvmsg_entry(struct pt_regs *ctx, struct pt_regs *_p)
+{
+ struct pt_regs p;
+ int fd;
+
+ if (!allow_pid(bpf_get_current_pid_tgid()))
+ return 0;
+
+ if (bpf_probe_read(&p, sizeof(p), (void *) _p))
+ return 0;
+
+ fd = (int) PT_REGS_PARM1(&p);
+
+ set_fd(fd);
+ return 0;
+}
+
+int trace_recvmsg_return(struct pt_regs *ctx)
+{
+ if (!allow_pid(bpf_get_current_pid_tgid()))
+ return 0;
+
+ put_fd();
+ return 0;
+}
+
+"""
+
+if args.tid: # TID trumps PID
+ bpf_text = bpf_text.replace('FILTER',
+ 'if (tid != %s) { return 0; }' % args.tid)
+elif args.pid:
+ bpf_text = bpf_text.replace('FILTER',
+ 'if (pid != %s) { return 0; }' % args.pid)
+else:
+ bpf_text = bpf_text.replace('FILTER', '')
+
+# initialize BPF
+b = BPF(text=bpf_text)
+
+syscall_fnname = b.get_syscall_fnname("sendmsg")
+if BPF.ksymname(syscall_fnname) != -1:
+ b.attach_kprobe(event=syscall_fnname, fn_name="trace_sendmsg_entry")
+ b.attach_kretprobe(event=syscall_fnname, fn_name="trace_sendmsg_return")
+
+syscall_fnname = b.get_syscall_fnname("recvmsg")
+if BPF.ksymname(syscall_fnname) != -1:
+ b.attach_kprobe(event=syscall_fnname, fn_name="trace_recvmsg_entry")
+ b.attach_kretprobe(event=syscall_fnname, fn_name="trace_recvmsg_return")
+
+b.attach_kprobe(event="__scm_send", fn_name="trace_scm_send_entry")
+b.attach_kprobe(event="scm_detach_fds", fn_name="trace_scm_detach_fds_entry")
+b.attach_kretprobe(event="scm_detach_fds", fn_name="trace_scm_detach_fds_return")
+
+TASK_COMM_LEN = 16 # linux/sched.h
+
+initial_ts = 0
+
+class Data(ct.Structure):
+ _fields_ = [
+ ("id", ct.c_ulonglong),
+ ("ts", ct.c_ulonglong),
+ ("action", ct.c_int),
+ ("sock_fd", ct.c_int),
+ ("fd_cnt", ct.c_int),
+ ("fd", ct.c_int * MAX_FD),
+ ("comm", ct.c_char * TASK_COMM_LEN),
+ ]
+
+# header
+if args.timestamp:
+ print("%-14s" % ("TIME(s)"), end="")
+print("%-6s %-6s %-16s %-25s %-5s %s" %
+ ("ACTION", "TID", "COMM", "SOCKET", "FD", "NAME"))
+
+def get_file(pid, fd):
+ proc = "/proc/%d/fd/%d" % (pid, fd)
+ try:
+ return os.readlink(proc)
+ except OSError as err:
+ return "N/A"
+
+# process event
+def print_event(cpu, data, size):
+ event = ct.cast(data, ct.POINTER(Data)).contents
+ tid = event.id & 0xffffffff;
+
+ cnt = min(MAX_FD, event.fd_cnt);
+
+ if args.name and bytes(args.name) not in event.comm:
+ return
+
+ for i in range(0, cnt):
+ global initial_ts
+
+ if not initial_ts:
+ initial_ts = event.ts
+
+ if args.timestamp:
+ delta = event.ts - initial_ts
+ print("%-14.9f" % (float(delta) / 1000000), end="")
+
+ print("%-6s %-6d %-16s " %
+ ("SEND" if event.action == ACTION_SEND else "RECV",
+ tid, event.comm.decode()), end = '')
+
+ sock = "%d:%s" % (event.sock_fd, get_file(tid, event.sock_fd))
+ print("%-25s " % sock, end = '')
+
+ fd = event.fd[i]
+ fd_file = get_file(tid, fd) if event.action == ACTION_SEND else ""
+ print("%-5d %s" % (fd, fd_file))
+
+# loop with callback to print_event
+b["events"].open_perf_buffer(print_event, page_cnt=64)
+start_time = datetime.now()
+while not args.duration or datetime.now() - start_time < args.duration:
+ b.perf_buffer_poll(timeout=1000)
diff --git a/tools/sofdsnoop_example.txt b/tools/sofdsnoop_example.txt
new file mode 100644
index 000000000000..740a26fdf4df
--- /dev/null
+++ b/tools/sofdsnoop_example.txt
@@ -0,0 +1,69 @@
+Demonstrations of sofdsnoop, the Linux eBPF/bcc version.
+
+sofdsnoop traces FDs passed through unix sockets
+
+# ./sofdsnoop.py
+ACTION TID COMM SOCKET FD NAME
+SEND 2576 Web Content 24:socket:[39763] 51 /dev/shm/org.mozilla.ipc.2576.23874
+RECV 2576 Web Content 49:socket:[809997] 51
+SEND 2576 Web Content 24:socket:[39763] 58 N/A
+RECV 2464 Gecko_IOThread 75:socket:[39753] 55
+
+Every file descriptor that is passed via unix sockets os displayed
+on separate line together with process info (TID/COMM columns),
+ACTION details (SEND/RECV), file descriptor number (FD) and its
+translation to file if available (NAME).
+
+The file descriptor (fd) value is bound to a process. The SEND
+lines display the fd value within the sending process. The RECV
+lines display the fd value of the sending process. That's why
+there's translation to name only on SEND lines, where we are
+able to find it in task proc records.
+
+This works by tracing sendmsg/recvmsg system calls to provide
+the socket fds, and scm_send_entry/scm_detach_fds to provide
+the file descriptor details.
+
+A -T option can be used to include a timestamp column,
+and a -n option to match on a command name. Regular
+expressions are allowed. For example, matching commands
+containing "server" with timestamps:
+
+# ./sofdsnoop.py -T -n Web
+TIME(s) ACTION TID COMM SOCKET FD NAME
+0.000000000 SEND 2576 Web Content 24:socket:[39763] 51 /dev/shm/org.mozilla.ipc.2576.25404 (deleted)
+0.000413000 RECV 2576 Web Content 49:/dev/shm/org.mozilla.ipc.2576.25404 (deleted) 51
+0.000558000 SEND 2576 Web Content 24:socket:[39763] 58 N/A
+0.000952000 SEND 2576 Web Content 24:socket:[39763] 58 socket:[817962]
+
+
+A -p option can be used to trace only selected process:
+
+# ./sofdsnoop.py -p 2576 -T
+TIME(s) ACTION TID COMM SOCKET FD NAME
+0.000000000 SEND 2576 Web Content 24:socket:[39763] 51 N/A
+0.000138000 RECV 2576 Web Content 49:N/A 5
+0.000191000 SEND 2576 Web Content 24:socket:[39763] 58 N/A
+0.000424000 RECV 2576 Web Content 51:/dev/shm/org.mozilla.ipc.2576.25319 (deleted) 49
+
+USAGE message:
+usage: sofdsnoop.py [-h] [-T] [-p PID] [-t TID] [-n NAME] [-d DURATION]
+
+Trace file descriptors passed via socket
+
+optional arguments:
+ -h, --help show this help message and exit
+ -T, --timestamp include timestamp on output
+ -p PID, --pid PID trace this PID only
+ -t TID, --tid TID trace this TID only
+ -n NAME, --name NAME only print process names containing this name
+ -d DURATION, --duration DURATION
+ total duration of trace in seconds
+
+examples:
+ ./sofdsnoop # trace file descriptors passes
+ ./sofdsnoop -T # include timestamps
+ ./sofdsnoop -p 181 # only trace PID 181
+ ./sofdsnoop -t 123 # only trace TID 123
+ ./sofdsnoop -d 10 # trace for 10 seconds only
+ ./sofdsnoop -n main # only print process names containing "main"
--
2.17.2