File descriptor leakage in BCC upon multiple load_func and unload_func calls


Mauricio Vasquez
 

Dear All,

One of our applications is giving errors because there are many file descriptors open. After some debugging we found that bcc appears to be leaking file descriptors.

Please considerer the following example where a very simple ebpf program is compiled and then it is loaded and unloaded some times:

$ cat <<EOF >test_fd_leakage.cpp

#include <bcc/BPF.h>
#include <iostream>

const std::string BPF_PROGRAM0 = R"(
//int handle_rx(struct xdp_md *ctx) {
//  return XDP_PASS;
//}
int handle_rx(struct __sk_buff *skb) {
  return 0;
}
)";

int main(void) {
  ebpf::StatusTuple res(0);
  ebpf::BPF bpf;
  int fd;

  std::cout << "PID=" << getpid() << std::endl;

  res = bpf.init(BPF_PROGRAM0);
  if (res.code() != 0) {
    std::cout << "failed to init XDP program: {0}" << res.msg() << std::endl;
    return -1;
  }

  for (int i = 0; i < 10; i++) {
    //res = bpf.load_func("handle_rx", BPF_PROG_TYPE_XDP, fd);
    res = bpf.load_func("handle_rx", BPF_PROG_TYPE_SCHED_CLS, fd);
    if (res.code() != 0) {
      std::cout << "failed to load XDP program: {0}" << res.msg() << std::endl;
      return -1;
    }

    res = bpf.unload_func("handle_rx");
    if (res.code() != 0) {
      std::cout << "failed to unload XDP program: {0}" << res.msg() << std::endl;
      return -1;
    }
  }

  std::cout << "about to close " << std::endl;
  getchar();

  return 0;
}

EOF

$ g++ -o test_fd_leakage test_fd_leakage.cpp --std=c++11 -lbcc

$ sudo ./test_fd_leakage


Before closing the program:

$ sudo ls -la /proc/$PID/fd
total 0
dr-x------ 2 root root  0 Mar 13 23:23 .
dr-xr-xr-x 9 root root  0 Mar 13 23:23 ..
lrwx------ 1 root root 64 Mar 13 23:23 0 -> /dev/pts/7
lrwx------ 1 root root 64 Mar 13 23:23 1 -> /dev/pts/7
lrwx------ 1 root root 64 Mar 13 23:23 10 -> socket:[1728287]
lrwx------ 1 root root 64 Mar 13 23:23 11 -> socket:[1728288]
lrwx------ 1 root root 64 Mar 13 23:23 12 -> socket:[1728289]
lrwx------ 1 root root 64 Mar 13 23:23 13 -> socket:[1728290]
lrwx------ 1 root root 64 Mar 13 23:23 14 -> socket:[1728291]
lrwx------ 1 root root 64 Mar 13 23:23 15 -> socket:[1728292]
lrwx------ 1 root root 64 Mar 13 23:23 16 -> socket:[1728293]
lrwx------ 1 root root 64 Mar 13 23:23 17 -> socket:[1728294]
lrwx------ 1 root root 64 Mar 13 23:23 18 -> socket:[1728295]
lrwx------ 1 root root 64 Mar 13 23:23 19 -> socket:[1728296]
lrwx------ 1 root root 64 Mar 13 23:23 2 -> /dev/pts/7
lrwx------ 1 root root 64 Mar 13 23:23 20 -> socket:[1728297]
lrwx------ 1 root root 64 Mar 13 23:23 21 -> socket:[1728298]
lrwx------ 1 root root 64 Mar 13 23:23 22 -> socket:[1728299]
lrwx------ 1 root root 64 Mar 13 23:23 23 -> socket:[1728300]
lrwx------ 1 root root 64 Mar 13 23:23 4 -> socket:[1728279]
lrwx------ 1 root root 64 Mar 13 23:23 5 -> socket:[1728280]
lrwx------ 1 root root 64 Mar 13 23:23 6 -> socket:[1728283]
lrwx------ 1 root root 64 Mar 13 23:23 7 -> socket:[1728284]
lrwx------ 1 root root 64 Mar 13 23:23 8 -> socket:[1728285]
lrwx------ 1 root root 64 Mar 13 23:23 9 -> socket:[1728286]

The number of "socket:[xxxxxxx]" file descriptors is twice the number of iterations of the for loop in the test program.

I am using kernel 4.16rc4 and bcc 7cfb73b.

Does somebody have an idea of what is causing this?


Thanks in advance,

Mauricio


Teng Qin
 

Confirmed that I could repro.

I kprobed into __alloc_fd, it seems the extra socket FDs are allocated
in annotate_prog_tag:

1209030 1209030 test_leak __alloc_fd 6
kretprobe_trampoline+0x0 [kernel]
SyS_socket+0x6f [kernel]
do_syscall_64+0x53 [kernel]
return_from_SYSCALL_64+0x0 [kernel]
__GI___socket+0x7 [libc-2.17.so]
ebpf::BPFModule::annotate_prog_tag(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&, int, bpf_insn*,
int)+0x2a [test_leak]
[unknown]

1209030 1209030 test_leak __alloc_fd 7
kretprobe_trampoline+0x0 [kernel]
SYSC_accept4+0xca [kernel]
sys_accept+0x10 [kernel]
do_syscall_64+0x53 [kernel]
return_from_SYSCALL_64+0x0 [kernel]
__accept_nocancel+0x7 [libpthread-2.17.so]
ebpf::BPFModule::annotate_prog_tag(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&, int, bpf_insn*,
int)+0x2a [test_leak]
[unknown]

If no one has any idea directly off their mind, I will try debug this tomorrow:)

On Tue, Mar 13, 2018 at 9:38 PM, Mauricio Vasquez via iovisor-dev
<iovisor-dev@...> wrote:
Dear All,

One of our applications is giving errors because there are many file
descriptors open. After some debugging we found that bcc appears to be
leaking file descriptors.

Please considerer the following example where a very simple ebpf program is
compiled and then it is loaded and unloaded some times:

$ cat <<EOF >test_fd_leakage.cpp

#include <bcc/BPF.h>
#include <iostream>

const std::string BPF_PROGRAM0 = R"(
//int handle_rx(struct xdp_md *ctx) {
// return XDP_PASS;
//}
int handle_rx(struct __sk_buff *skb) {
return 0;
}
)";

int main(void) {
ebpf::StatusTuple res(0);
ebpf::BPF bpf;
int fd;

std::cout << "PID=" << getpid() << std::endl;

res = bpf.init(BPF_PROGRAM0);
if (res.code() != 0) {
std::cout << "failed to init XDP program: {0}" << res.msg() <<
std::endl;
return -1;
}

for (int i = 0; i < 10; i++) {
//res = bpf.load_func("handle_rx", BPF_PROG_TYPE_XDP, fd);
res = bpf.load_func("handle_rx", BPF_PROG_TYPE_SCHED_CLS, fd);
if (res.code() != 0) {
std::cout << "failed to load XDP program: {0}" << res.msg() <<
std::endl;
return -1;
}

res = bpf.unload_func("handle_rx");
if (res.code() != 0) {
std::cout << "failed to unload XDP program: {0}" << res.msg() <<
std::endl;
return -1;
}
}

std::cout << "about to close " << std::endl;
getchar();

return 0;
}

EOF

$ g++ -o test_fd_leakage test_fd_leakage.cpp --std=c++11 -lbcc

$ sudo ./test_fd_leakage


Before closing the program:

$ sudo ls -la /proc/$PID/fd
total 0
dr-x------ 2 root root 0 Mar 13 23:23 .
dr-xr-xr-x 9 root root 0 Mar 13 23:23 ..
lrwx------ 1 root root 64 Mar 13 23:23 0 -> /dev/pts/7
lrwx------ 1 root root 64 Mar 13 23:23 1 -> /dev/pts/7
lrwx------ 1 root root 64 Mar 13 23:23 10 -> socket:[1728287]
lrwx------ 1 root root 64 Mar 13 23:23 11 -> socket:[1728288]
lrwx------ 1 root root 64 Mar 13 23:23 12 -> socket:[1728289]
lrwx------ 1 root root 64 Mar 13 23:23 13 -> socket:[1728290]
lrwx------ 1 root root 64 Mar 13 23:23 14 -> socket:[1728291]
lrwx------ 1 root root 64 Mar 13 23:23 15 -> socket:[1728292]
lrwx------ 1 root root 64 Mar 13 23:23 16 -> socket:[1728293]
lrwx------ 1 root root 64 Mar 13 23:23 17 -> socket:[1728294]
lrwx------ 1 root root 64 Mar 13 23:23 18 -> socket:[1728295]
lrwx------ 1 root root 64 Mar 13 23:23 19 -> socket:[1728296]
lrwx------ 1 root root 64 Mar 13 23:23 2 -> /dev/pts/7
lrwx------ 1 root root 64 Mar 13 23:23 20 -> socket:[1728297]
lrwx------ 1 root root 64 Mar 13 23:23 21 -> socket:[1728298]
lrwx------ 1 root root 64 Mar 13 23:23 22 -> socket:[1728299]
lrwx------ 1 root root 64 Mar 13 23:23 23 -> socket:[1728300]
lrwx------ 1 root root 64 Mar 13 23:23 4 -> socket:[1728279]
lrwx------ 1 root root 64 Mar 13 23:23 5 -> socket:[1728280]
lrwx------ 1 root root 64 Mar 13 23:23 6 -> socket:[1728283]
lrwx------ 1 root root 64 Mar 13 23:23 7 -> socket:[1728284]
lrwx------ 1 root root 64 Mar 13 23:23 8 -> socket:[1728285]
lrwx------ 1 root root 64 Mar 13 23:23 9 -> socket:[1728286]

The number of "socket:[xxxxxxx]" file descriptors is twice the number of
iterations of the for loop in the test program.

I am using kernel 4.16rc4 and bcc 7cfb73b.

Does somebody have an idea of what is causing this?


Thanks in advance,

Mauricio

_______________________________________________
iovisor-dev mailing list
iovisor-dev@...
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Nirmoy Das
 

Looks like fd leakage is happening in bpf_prog_compute_tag

please try with

diff --git a/src/cc/libbpf.c b/src/cc/libbpf.c
index d23d2bc..7b695af 100644
--- a/src/cc/libbpf.c
+++ b/src/cc/libbpf.c
@@ -408,6 +408,8 @@ int bpf_prog_compute_tag(const struct bpf_insn
*insns, int prog_len,
return -1;
}
*ptag = __builtin_bswap64(u.tag);
+ close(shafd);
+ close(shafd2);
return 0;
}



On 03/14/2018 05:38 AM, Mauricio Vasquez via iovisor-dev wrote:
Dear All,

One of our applications is giving errors because there are many file
descriptors open. After some debugging we found that bcc appears to be
leaking file descriptors.

Please considerer the following example where a very simple ebpf program
is compiled and then it is loaded and unloaded some times:

$ cat <<EOF >test_fd_leakage.cpp

#include <bcc/BPF.h>
#include <iostream>

const std::string BPF_PROGRAM0 = R"(
//int handle_rx(struct xdp_md *ctx) {
//  return XDP_PASS;
//}
int handle_rx(struct __sk_buff *skb) {
  return 0;
}
)";

int main(void) {
  ebpf::StatusTuple res(0);
  ebpf::BPF bpf;
  int fd;

  std::cout << "PID=" << getpid() << std::endl;

  res = bpf.init(BPF_PROGRAM0);
  if (res.code() != 0) {
    std::cout << "failed to init XDP program: {0}" << res.msg() <<
std::endl;
    return -1;
  }

  for (int i = 0; i < 10; i++) {
    //res = bpf.load_func("handle_rx", BPF_PROG_TYPE_XDP, fd);
    res = bpf.load_func("handle_rx", BPF_PROG_TYPE_SCHED_CLS, fd);
    if (res.code() != 0) {
      std::cout << "failed to load XDP program: {0}" << res.msg() <<
std::endl;
      return -1;
    }

    res = bpf.unload_func("handle_rx");
    if (res.code() != 0) {
      std::cout << "failed to unload XDP program: {0}" << res.msg() <<
std::endl;
      return -1;
    }
  }

  std::cout << "about to close " << std::endl;
  getchar();

  return 0;
}

EOF

$ g++ -o test_fd_leakage test_fd_leakage.cpp --std=c++11 -lbcc

$ sudo ./test_fd_leakage


Before closing the program:

$ sudo ls -la /proc/$PID/fd
total 0
dr-x------ 2 root root  0 Mar 13 23:23 .
dr-xr-xr-x 9 root root  0 Mar 13 23:23 ..
lrwx------ 1 root root 64 Mar 13 23:23 0 -> /dev/pts/7
lrwx------ 1 root root 64 Mar 13 23:23 1 -> /dev/pts/7
lrwx------ 1 root root 64 Mar 13 23:23 10 -> socket:[1728287]
lrwx------ 1 root root 64 Mar 13 23:23 11 -> socket:[1728288]
lrwx------ 1 root root 64 Mar 13 23:23 12 -> socket:[1728289]
lrwx------ 1 root root 64 Mar 13 23:23 13 -> socket:[1728290]
lrwx------ 1 root root 64 Mar 13 23:23 14 -> socket:[1728291]
lrwx------ 1 root root 64 Mar 13 23:23 15 -> socket:[1728292]
lrwx------ 1 root root 64 Mar 13 23:23 16 -> socket:[1728293]
lrwx------ 1 root root 64 Mar 13 23:23 17 -> socket:[1728294]
lrwx------ 1 root root 64 Mar 13 23:23 18 -> socket:[1728295]
lrwx------ 1 root root 64 Mar 13 23:23 19 -> socket:[1728296]
lrwx------ 1 root root 64 Mar 13 23:23 2 -> /dev/pts/7
lrwx------ 1 root root 64 Mar 13 23:23 20 -> socket:[1728297]
lrwx------ 1 root root 64 Mar 13 23:23 21 -> socket:[1728298]
lrwx------ 1 root root 64 Mar 13 23:23 22 -> socket:[1728299]
lrwx------ 1 root root 64 Mar 13 23:23 23 -> socket:[1728300]
lrwx------ 1 root root 64 Mar 13 23:23 4 -> socket:[1728279]
lrwx------ 1 root root 64 Mar 13 23:23 5 -> socket:[1728280]
lrwx------ 1 root root 64 Mar 13 23:23 6 -> socket:[1728283]
lrwx------ 1 root root 64 Mar 13 23:23 7 -> socket:[1728284]
lrwx------ 1 root root 64 Mar 13 23:23 8 -> socket:[1728285]
lrwx------ 1 root root 64 Mar 13 23:23 9 -> socket:[1728286]

The number of "socket:[xxxxxxx]" file descriptors is twice the number of
iterations of the for loop in the test program.

I am using kernel 4.16rc4 and bcc 7cfb73b.

Does somebody have an idea of what is causing this?


Thanks in advance,

Mauricio

_______________________________________________
iovisor-dev mailing list
iovisor-dev@...
https://lists.iovisor.org/mailman/listinfo/iovisor-dev
Regards,
Nirmoy
--
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB
21284 (AG Nürnberg) Maxfeldstr. 5
D-90409 Nürnberg / Phone: +49-911-740 18-4


Mauricio Vasquez
 

This solves the issue we have in our application.

Thanks to both of you!

Mauricio.

On 03/14/2018 04:19 AM, Nirmoy Das via iovisor-dev wrote:
Looks like fd leakage is happening in bpf_prog_compute_tag

please try with

diff --git a/src/cc/libbpf.c b/src/cc/libbpf.c
index d23d2bc..7b695af 100644
--- a/src/cc/libbpf.c
+++ b/src/cc/libbpf.c
@@ -408,6 +408,8 @@ int bpf_prog_compute_tag(const struct bpf_insn
*insns, int prog_len,
return -1;
}
*ptag = __builtin_bswap64(u.tag);
+ close(shafd);
+ close(shafd2);
return 0;
}



On 03/14/2018 05:38 AM, Mauricio Vasquez via iovisor-dev wrote:
Dear All,

One of our applications is giving errors because there are many file
descriptors open. After some debugging we found that bcc appears to be
leaking file descriptors.

Please considerer the following example where a very simple ebpf program
is compiled and then it is loaded and unloaded some times:

$ cat <<EOF >test_fd_leakage.cpp

#include <bcc/BPF.h>
#include <iostream>

const std::string BPF_PROGRAM0 = R"(
//int handle_rx(struct xdp_md *ctx) {
//  return XDP_PASS;
//}
int handle_rx(struct __sk_buff *skb) {
  return 0;
}
)";

int main(void) {
  ebpf::StatusTuple res(0);
  ebpf::BPF bpf;
  int fd;

  std::cout << "PID=" << getpid() << std::endl;

  res = bpf.init(BPF_PROGRAM0);
  if (res.code() != 0) {
    std::cout << "failed to init XDP program: {0}" << res.msg() <<
std::endl;
    return -1;
  }

  for (int i = 0; i < 10; i++) {
    //res = bpf.load_func("handle_rx", BPF_PROG_TYPE_XDP, fd);
    res = bpf.load_func("handle_rx", BPF_PROG_TYPE_SCHED_CLS, fd);
    if (res.code() != 0) {
      std::cout << "failed to load XDP program: {0}" << res.msg() <<
std::endl;
      return -1;
    }

    res = bpf.unload_func("handle_rx");
    if (res.code() != 0) {
      std::cout << "failed to unload XDP program: {0}" << res.msg() <<
std::endl;
      return -1;
    }
  }

  std::cout << "about to close " << std::endl;
  getchar();

  return 0;
}

EOF

$ g++ -o test_fd_leakage test_fd_leakage.cpp --std=c++11 -lbcc

$ sudo ./test_fd_leakage


Before closing the program:

$ sudo ls -la /proc/$PID/fd
total 0
dr-x------ 2 root root  0 Mar 13 23:23 .
dr-xr-xr-x 9 root root  0 Mar 13 23:23 ..
lrwx------ 1 root root 64 Mar 13 23:23 0 -> /dev/pts/7
lrwx------ 1 root root 64 Mar 13 23:23 1 -> /dev/pts/7
lrwx------ 1 root root 64 Mar 13 23:23 10 -> socket:[1728287]
lrwx------ 1 root root 64 Mar 13 23:23 11 -> socket:[1728288]
lrwx------ 1 root root 64 Mar 13 23:23 12 -> socket:[1728289]
lrwx------ 1 root root 64 Mar 13 23:23 13 -> socket:[1728290]
lrwx------ 1 root root 64 Mar 13 23:23 14 -> socket:[1728291]
lrwx------ 1 root root 64 Mar 13 23:23 15 -> socket:[1728292]
lrwx------ 1 root root 64 Mar 13 23:23 16 -> socket:[1728293]
lrwx------ 1 root root 64 Mar 13 23:23 17 -> socket:[1728294]
lrwx------ 1 root root 64 Mar 13 23:23 18 -> socket:[1728295]
lrwx------ 1 root root 64 Mar 13 23:23 19 -> socket:[1728296]
lrwx------ 1 root root 64 Mar 13 23:23 2 -> /dev/pts/7
lrwx------ 1 root root 64 Mar 13 23:23 20 -> socket:[1728297]
lrwx------ 1 root root 64 Mar 13 23:23 21 -> socket:[1728298]
lrwx------ 1 root root 64 Mar 13 23:23 22 -> socket:[1728299]
lrwx------ 1 root root 64 Mar 13 23:23 23 -> socket:[1728300]
lrwx------ 1 root root 64 Mar 13 23:23 4 -> socket:[1728279]
lrwx------ 1 root root 64 Mar 13 23:23 5 -> socket:[1728280]
lrwx------ 1 root root 64 Mar 13 23:23 6 -> socket:[1728283]
lrwx------ 1 root root 64 Mar 13 23:23 7 -> socket:[1728284]
lrwx------ 1 root root 64 Mar 13 23:23 8 -> socket:[1728285]
lrwx------ 1 root root 64 Mar 13 23:23 9 -> socket:[1728286]

The number of "socket:[xxxxxxx]" file descriptors is twice the number of
iterations of the for loop in the test program.

I am using kernel 4.16rc4 and bcc 7cfb73b.

Does somebody have an idea of what is causing this?


Thanks in advance,

Mauricio

_______________________________________________
iovisor-dev mailing list
iovisor-dev@...
https://lists.iovisor.org/mailman/listinfo/iovisor-dev
Regards,
Nirmoy


Nirmoy Das
 

Raised a PR https://github.com/iovisor/bcc/pull/1629 for that!

On 03/14/2018 01:45 PM, Mauricio Vasquez wrote:
This solves the issue we have in our application.

Thanks to both of you!

Mauricio.


Regards
Nirmoy
--
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB
21284 (AG Nürnberg) Maxfeldstr. 5
D-90409 Nürnberg / Phone: +49-911-740 18-4