29.11.2018, 07:07, "Y Song" <ys114321@...>:
On Mon, Nov 26, 2018 at 4:29 AM Aleksei Zakharov <zakharov.a.g@...> wrote:
Hi, everyone!
I've written a small script using bcc python lib for qemu block io latency tracing.
I attach two uprobes to block_acct_start and block_acct_done.
On some servers i can see unexpectedly high latencies:
~# ./qemuioslower 40000
Tracing for qemu I/O... Ctrl-C to end
TIME LATENCY SIZE(KB) IO TYPE
0.0 18446744073709551 8192 write
9.7294584 18446744073709551 0 flush
24.815983 18446744073709551 4096 write
25.422378 18446744073709551 0 flush
For me it looks like bpf_ktime_get_ns in stop() function returned time that is less, than bpf_ktime_get_ns() in start(), but why does this happen?
May someone help me with understanding this behavior?
Maybe you want to check with bcc issue
https://github.com/iovisor/bcc/issues/728.
Basically there is a kernel bug related to bpf_ktime_get_ns() and it
is fixed in 4.9. Not sure about your kernel version.
Thanks for your answer!
I've tried to reproduce this with perf example from issue:
perf record -F 9 --cpu 0 --clockid CLOCK_MONOTONIC -- sleep 1
But it looks normal.
Also, i'm using 4.15 kernel, where this bug should be fixed already.
I tried to run different tools from bcc-tools, and i didn't find any strange output.
Is there any special case (non-buggy behavior) where such latencies could be shown?
We use servers with 2-4 numa nodes, might this be the cause?
Also, I tried to trace latency between scsi_req_enqueue and scsi_req_complete,
and i didn't see such high latencies in this case.
BPF code:
#include <uapi/linux/ptrace.h>
BPF_HASH(req, u64);
enum BlockAcctType {
BLOCK_ACCT_READ,
BLOCK_ACCT_WRITE,
BLOCK_ACCT_FLUSH,
BLOCK_MAX_IOTYPE,
};
struct data_t {
u64 ts;
u64 lat;
u64 bytes;
enum BlockAcctType type;
};
typedef struct BlockAcctCookie {
int64_t bytes;
int64_t start_time_ns;
enum BlockAcctType type;
} BlockAcctCookie;
BPF_PERF_OUTPUT(events);
void start(struct pt_regs *ctx) {
u32 pid = bpf_get_current_pid_tgid();
if (FILTER_PID)
return;
u64 key = 0;
bpf_probe_read(&key, sizeof(key), (void *)PT_REGS_PARM1(ctx));
u64 ts = bpf_ktime_get_ns();
req.update(&key, &ts);
}
void stop(struct pt_regs *ctx,void *first, struct BlockAcctCookie *cookie) {
struct data_t data = {};
u64 key = 0;
bpf_probe_read(&key, sizeof(key), (void *)PT_REGS_PARM1(ctx));
data.ts = bpf_ktime_get_ns();
data.type = cookie->type;
data.bytes = cookie->bytes;
u64 *s_time = req.lookup(&key);
if (s_time != 0) {
data.lat = (data.ts - *s_time);
data.lat /= 1000;
if (data.lat > MIN_US) {
events.perf_submit(ctx, &data, sizeof(data));
}
req.delete(&key);
}
}
Kernel: Linux hostname 4.15.0-36-generic #39~16.04.1-Ubuntu SMP Tue Sep 25 08:59:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
bcc packages versions:
bcc-tools 0.7.0-1
libbcc 0.7.0-1
libcc1-0:amd64 5.4.0-6ubuntu1~16.04.10
python-bcc 0.7.0-1
--
Regards,
Aleksei Zakharov
--
Regards,
Aleksei Zakharov