Unexpectedly high latencies are showed by self-written script (bpf_ktime_get_ns issue?)


Aleksei Zakharov
 

Hi, everyone!
I've written a small script using bcc python lib for qemu block io latency tracing.
I attach two uprobes to block_acct_start and block_acct_done.

On some servers i can see unexpectedly high latencies:
~# ./qemuioslower 40000
Tracing for qemu I/O... Ctrl-C to end
TIME LATENCY SIZE(KB) IO TYPE
0.0 18446744073709551 8192 write
9.7294584 18446744073709551 0 flush
24.815983 18446744073709551 4096 write
25.422378 18446744073709551 0 flush

For me it looks like bpf_ktime_get_ns in stop() function returned time that is less, than bpf_ktime_get_ns() in start(), but why does this happen?
May someone help me with understanding this behavior?

BPF code:
#include <uapi/linux/ptrace.h>
BPF_HASH(req, u64);
enum BlockAcctType {
BLOCK_ACCT_READ,
BLOCK_ACCT_WRITE,
BLOCK_ACCT_FLUSH,
BLOCK_MAX_IOTYPE,
};
struct data_t {
u64 ts;
u64 lat;
u64 bytes;
enum BlockAcctType type;
};
typedef struct BlockAcctCookie {
int64_t bytes;
int64_t start_time_ns;
enum BlockAcctType type;
} BlockAcctCookie;
BPF_PERF_OUTPUT(events);

void start(struct pt_regs *ctx) {
u32 pid = bpf_get_current_pid_tgid();
if (FILTER_PID)
return;
u64 key = 0;
bpf_probe_read(&key, sizeof(key), (void *)PT_REGS_PARM1(ctx));
u64 ts = bpf_ktime_get_ns();
req.update(&key, &ts);
}

void stop(struct pt_regs *ctx,void *first, struct BlockAcctCookie *cookie) {
struct data_t data = {};
u64 key = 0;
bpf_probe_read(&key, sizeof(key), (void *)PT_REGS_PARM1(ctx));
data.ts = bpf_ktime_get_ns();
data.type = cookie->type;
data.bytes = cookie->bytes;
u64 *s_time = req.lookup(&key);
if (s_time != 0) {
data.lat = (data.ts - *s_time);
data.lat /= 1000;
if (data.lat > MIN_US) {
events.perf_submit(ctx, &data, sizeof(data));
}
req.delete(&key);
}
}

Kernel: Linux hostname 4.15.0-36-generic #39~16.04.1-Ubuntu SMP Tue Sep 25 08:59:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
bcc packages versions:
bcc-tools 0.7.0-1
libbcc 0.7.0-1
libcc1-0:amd64 5.4.0-6ubuntu1~16.04.10
python-bcc 0.7.0-1


-- 
Regards,
Aleksei Zakharov


Yonghong Song
 

On Mon, Nov 26, 2018 at 4:29 AM Aleksei Zakharov <zakharov.a.g@...> wrote:

Hi, everyone!
I've written a small script using bcc python lib for qemu block io latency tracing.
I attach two uprobes to block_acct_start and block_acct_done.

On some servers i can see unexpectedly high latencies:
~# ./qemuioslower 40000
Tracing for qemu I/O... Ctrl-C to end
TIME LATENCY SIZE(KB) IO TYPE
0.0 18446744073709551 8192 write
9.7294584 18446744073709551 0 flush
24.815983 18446744073709551 4096 write
25.422378 18446744073709551 0 flush

For me it looks like bpf_ktime_get_ns in stop() function returned time that is less, than bpf_ktime_get_ns() in start(), but why does this happen?
May someone help me with understanding this behavior?
Maybe you want to check with bcc issue
https://github.com/iovisor/bcc/issues/728.
Basically there is a kernel bug related to bpf_ktime_get_ns() and it
is fixed in 4.9. Not sure about your kernel version.


BPF code:
#include <uapi/linux/ptrace.h>
BPF_HASH(req, u64);
enum BlockAcctType {
BLOCK_ACCT_READ,
BLOCK_ACCT_WRITE,
BLOCK_ACCT_FLUSH,
BLOCK_MAX_IOTYPE,
};
struct data_t {
u64 ts;
u64 lat;
u64 bytes;
enum BlockAcctType type;
};
typedef struct BlockAcctCookie {
int64_t bytes;
int64_t start_time_ns;
enum BlockAcctType type;
} BlockAcctCookie;
BPF_PERF_OUTPUT(events);

void start(struct pt_regs *ctx) {
u32 pid = bpf_get_current_pid_tgid();
if (FILTER_PID)
return;
u64 key = 0;
bpf_probe_read(&key, sizeof(key), (void *)PT_REGS_PARM1(ctx));
u64 ts = bpf_ktime_get_ns();
req.update(&key, &ts);
}

void stop(struct pt_regs *ctx,void *first, struct BlockAcctCookie *cookie) {
struct data_t data = {};
u64 key = 0;
bpf_probe_read(&key, sizeof(key), (void *)PT_REGS_PARM1(ctx));
data.ts = bpf_ktime_get_ns();
data.type = cookie->type;
data.bytes = cookie->bytes;
u64 *s_time = req.lookup(&key);
if (s_time != 0) {
data.lat = (data.ts - *s_time);
data.lat /= 1000;
if (data.lat > MIN_US) {
events.perf_submit(ctx, &data, sizeof(data));
}
req.delete(&key);
}
}

Kernel: Linux hostname 4.15.0-36-generic #39~16.04.1-Ubuntu SMP Tue Sep 25 08:59:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
bcc packages versions:
bcc-tools 0.7.0-1
libbcc 0.7.0-1
libcc1-0:amd64 5.4.0-6ubuntu1~16.04.10
python-bcc 0.7.0-1


--
Regards,
Aleksei Zakharov




Aleksei Zakharov
 

29.11.2018, 07:07, "Y Song" <ys114321@...>:
On Mon, Nov 26, 2018 at 4:29 AM Aleksei Zakharov <zakharov.a.g@...> wrote:
 Hi, everyone!
 I've written a small script using bcc python lib for qemu block io latency tracing.
 I attach two uprobes to block_acct_start and block_acct_done.

 On some servers i can see unexpectedly high latencies:
 ~# ./qemuioslower 40000
 Tracing for qemu I/O... Ctrl-C to end
 TIME LATENCY SIZE(KB) IO TYPE
 0.0 18446744073709551 8192 write
 9.7294584 18446744073709551 0 flush
 24.815983 18446744073709551 4096 write
 25.422378 18446744073709551 0 flush

 For me it looks like bpf_ktime_get_ns in stop() function returned time that is less, than bpf_ktime_get_ns() in start(), but why does this happen?
 May someone help me with understanding this behavior?
Maybe you want to check with bcc issue
https://github.com/iovisor/bcc/issues/728.
Basically there is a kernel bug related to bpf_ktime_get_ns() and it
is fixed in 4.9. Not sure about your kernel version.
Thanks for your answer!
I've tried to reproduce this with perf example from issue:
perf record -F 9 --cpu 0 --clockid CLOCK_MONOTONIC -- sleep 1
But it looks normal.
Also, i'm using 4.15 kernel, where this bug should be fixed already.
I tried to run different tools from bcc-tools, and i didn't find any strange output.

Is there any special case (non-buggy behavior) where such latencies could be shown?
We use servers with 2-4 numa nodes, might this be the cause?

Also, I tried to trace latency between scsi_req_enqueue and scsi_req_complete,
and i didn't see such high latencies in this case.


 BPF code:
 #include <uapi/linux/ptrace.h>
 BPF_HASH(req, u64);
 enum BlockAcctType {
     BLOCK_ACCT_READ,
     BLOCK_ACCT_WRITE,
     BLOCK_ACCT_FLUSH,
     BLOCK_MAX_IOTYPE,
 };
 struct data_t {
     u64 ts;
     u64 lat;
     u64 bytes;
     enum BlockAcctType type;
 };
 typedef struct BlockAcctCookie {
     int64_t bytes;
     int64_t start_time_ns;
     enum BlockAcctType type;
 } BlockAcctCookie;
 BPF_PERF_OUTPUT(events);

 void start(struct pt_regs *ctx) {
     u32 pid = bpf_get_current_pid_tgid();
     if (FILTER_PID)
         return;
     u64 key = 0;
     bpf_probe_read(&key, sizeof(key), (void *)PT_REGS_PARM1(ctx));
     u64 ts = bpf_ktime_get_ns();
     req.update(&key, &ts);
 }

 void stop(struct pt_regs *ctx,void *first, struct BlockAcctCookie *cookie) {
     struct data_t data = {};
     u64 key = 0;
     bpf_probe_read(&key, sizeof(key), (void *)PT_REGS_PARM1(ctx));
     data.ts = bpf_ktime_get_ns();
     data.type = cookie->type;
     data.bytes = cookie->bytes;
     u64 *s_time = req.lookup(&key);
     if (s_time != 0) {
         data.lat = (data.ts - *s_time);
         data.lat /= 1000;
         if (data.lat > MIN_US) {
                 events.perf_submit(ctx, &data, sizeof(data));
         }
         req.delete(&key);
     }
 }

 Kernel: Linux hostname 4.15.0-36-generic #39~16.04.1-Ubuntu SMP Tue Sep 25 08:59:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
 bcc packages versions:
 bcc-tools 0.7.0-1
 libbcc 0.7.0-1
 libcc1-0:amd64 5.4.0-6ubuntu1~16.04.10
 python-bcc 0.7.0-1

 --
 Regards,
 Aleksei Zakharov

 
-- 
Regards,
Aleksei Zakharov


Yonghong Song
 

On Mon, Dec 3, 2018 at 3:51 AM Aleksei Zakharov <zakharov.a.g@...> wrote:



29.11.2018, 07:07, "Y Song" <ys114321@...>:
On Mon, Nov 26, 2018 at 4:29 AM Aleksei Zakharov <zakharov.a.g@...> wrote:
Hi, everyone!
I've written a small script using bcc python lib for qemu block io latency tracing.
I attach two uprobes to block_acct_start and block_acct_done.

On some servers i can see unexpectedly high latencies:
~# ./qemuioslower 40000
Tracing for qemu I/O... Ctrl-C to end
TIME LATENCY SIZE(KB) IO TYPE
0.0 18446744073709551 8192 write
9.7294584 18446744073709551 0 flush
24.815983 18446744073709551 4096 write
25.422378 18446744073709551 0 flush

For me it looks like bpf_ktime_get_ns in stop() function returned time that is less, than bpf_ktime_get_ns() in start(), but why does this happen?
May someone help me with understanding this behavior?
Maybe you want to check with bcc issue
https://github.com/iovisor/bcc/issues/728.
Basically there is a kernel bug related to bpf_ktime_get_ns() and it
is fixed in 4.9. Not sure about your kernel version.
Thanks for your answer!
I've tried to reproduce this with perf example from issue:
perf record -F 9 --cpu 0 --clockid CLOCK_MONOTONIC -- sleep 1
But it looks normal.
Also, i'm using 4.15 kernel, where this bug should be fixed already.
I tried to run different tools from bcc-tools, and i didn't find any strange output.

Is there any special case (non-buggy behavior) where such latencies could be shown?
We use servers with 2-4 numa nodes, might this be the cause?
If start() and stop() execute on two different cpus, their raw time
may be backward
since two different cpus mostly like have slightly different realtime
clock numbers.
Could you this be an issue for you? This all depends on your kernel timer setup.


Also, I tried to trace latency between scsi_req_enqueue and scsi_req_complete,
and i didn't see such high latencies in this case.


BPF code:
#include <uapi/linux/ptrace.h>
BPF_HASH(req, u64);
enum BlockAcctType {
BLOCK_ACCT_READ,
BLOCK_ACCT_WRITE,
BLOCK_ACCT_FLUSH,
BLOCK_MAX_IOTYPE,
};
struct data_t {
u64 ts;
u64 lat;
u64 bytes;
enum BlockAcctType type;
};
typedef struct BlockAcctCookie {
int64_t bytes;
int64_t start_time_ns;
enum BlockAcctType type;
} BlockAcctCookie;
BPF_PERF_OUTPUT(events);

void start(struct pt_regs *ctx) {
u32 pid = bpf_get_current_pid_tgid();
if (FILTER_PID)
return;
u64 key = 0;
bpf_probe_read(&key, sizeof(key), (void *)PT_REGS_PARM1(ctx));
u64 ts = bpf_ktime_get_ns();
req.update(&key, &ts);
}

void stop(struct pt_regs *ctx,void *first, struct BlockAcctCookie *cookie) {
struct data_t data = {};
u64 key = 0;
bpf_probe_read(&key, sizeof(key), (void *)PT_REGS_PARM1(ctx));
data.ts = bpf_ktime_get_ns();
data.type = cookie->type;
data.bytes = cookie->bytes;
u64 *s_time = req.lookup(&key);
if (s_time != 0) {
data.lat = (data.ts - *s_time);
data.lat /= 1000;
if (data.lat > MIN_US) {
events.perf_submit(ctx, &data, sizeof(data));
}
req.delete(&key);
}
}

Kernel: Linux hostname 4.15.0-36-generic #39~16.04.1-Ubuntu SMP Tue Sep 25 08:59:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
bcc packages versions:
bcc-tools 0.7.0-1
libbcc 0.7.0-1
libcc1-0:amd64 5.4.0-6ubuntu1~16.04.10
python-bcc 0.7.0-1

--
Regards,
Aleksei Zakharov

--
Regards,
Aleksei Zakharov