DevilKing's blog



Sockmap - TCP Future

Original article

Proper TCP socket splicing reduces the load on userspace processes and enables more efficient data forwarding.

An L7 proxy depends heavily on the underlying Linux system calls to move data, and that dependence brings several problems:

  1. Syscall cost: making multiple syscalls for every forwarded packet is costly.
  2. Wakeup latency: the user-space process must be woken up often to forward the data. Depending on the scheduler, this may result in poor tail latency.
  3. Copying cost: copying data from kernel to userspace and then immediately back to the kernel is not free and adds up to a measurable cost.
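To make these three costs concrete, here is a minimal sketch of the naive forwarding loop a user-space proxy runs. It assumes blocking file descriptors, and `forward_copy` is a hypothetical name, not something from the original article. Every chunk costs one `read()` plus at least one `write()`, the payload crosses the user/kernel boundary twice, and the process must be woken up whenever new data arrives on `from`.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Naive user-space forwarding (hypothetical helper; blocking fds assumed).
 * Per chunk: one read() plus at least one write() syscall, and the bytes
 * are copied kernel -> user buffer -> kernel.
 * Returns the number of bytes forwarded until EOF, or -1 on error.
 */
long long forward_copy(int from, int to)
{
    char buf[16384];                               /* user-space bounce buffer */
    long long total = 0;

    for (;;) {
        ssize_t n = read(from, buf, sizeof buf);   /* copy: kernel -> user */
        if (n == 0)
            return total;                          /* EOF: peer closed */
        if (n < 0) {
            if (errno == EINTR)
                continue;                          /* interrupted: retry */
            return -1;
        }
        for (ssize_t off = 0; off < n;) {          /* handle short writes */
            ssize_t w = write(to, buf + off, (size_t)(n - off)); /* copy: user -> kernel */
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

With two connected TCP sockets, a proxy would run `forward_copy(client_fd, upstream_fd)` for one direction and the mirror call for the other, typically in separate threads or behind an event loop.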

Linux has an amazing splice(2) syscall. It can tell the kernel to move data between a TCP buffer on a socket and a buffer on a pipe. The data remains in the buffers, on the kernel side. This solves the problem of needlessly having to copy the data between userspace and kernel-space.
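A minimal sketch of the same loop built on splice(2), assuming Linux with glibc (hence `_GNU_SOURCE`) and blocking descriptors; `forward_splice` is a hypothetical name. Because one side of every splice() call must be a pipe, the data travels from `from` into an intermediate pipe and from the pipe to `to`, without ever landing in a user-space buffer. Note that it still issues two syscalls per chunk and the process still has to wake up, so only the copying cost goes away.

```c
#define _GNU_SOURCE              /* splice() and SPLICE_F_* are Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Forward everything readable from `from` to `to` through a pipe
 * (hypothetical helper). The payload stays in kernel buffers:
 * from -> pipe -> to. Returns total bytes forwarded, or -1 on error.
 */
long long forward_splice(int from, int to)
{
    int p[2];
    long long total = 0;

    if (pipe(p) < 0)
        return -1;

    for (;;) {
        /* Move up to 64 KiB from the source into the pipe. */
        ssize_t n = splice(from, NULL, p[1], NULL, 65536, SPLICE_F_MOVE);
        if (n == 0)
            break;                                 /* EOF on `from` */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        /* Drain exactly what was pushed into the pipe. */
        while (n > 0) {
            ssize_t w = splice(p[0], NULL, to, NULL, (size_t)n, SPLICE_F_MOVE);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                total = -1;
                goto out;
            }
            n -= w;
            total += w;
        }
    }
out:
    close(p[0]);
    close(p[1]);
    return total;
}
```

A proxy would call `forward_splice(client_fd, upstream_fd)` once per direction. The 64 KiB chunk size is an arbitrary choice that matches the default Linux pipe capacity, so a single splice() into the empty pipe cannot move more than that anyway.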

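In recent years the Linux kernel introduced an eBPF virtual machine. With it, user-space programs can run specialized, non-Turing-complete bytecode in the kernel context. (About the eBPF part: it is an in-kernel virtual machine.) SOCKMAP builds on this: it is an eBPF map that holds sockets, and two eBPF programs attached to it, a stream parser and a stream verdict program, decide where data arriving on those sockets is redirected.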

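```c
#include <bpf/bpf.h>

/* A SOCKMAP with two slots: key = slot index, value = socket descriptor */
int sock_map = bpf_create_map(BPF_MAP_TYPE_SOCKMAP, sizeof(int), sizeof(int), 2, 0);

/* The parser finds message boundaries; the verdict decides where each message goes */
int prog_parser = bpf_load_program(BPF_PROG_TYPE_SK_SKB, ...);
int prog_verdict = bpf_load_program(BPF_PROG_TYPE_SK_SKB, ...);

/* Attach both programs to the map (last argument: attach flags) */
bpf_prog_attach(prog_parser, sock_map, BPF_SK_SKB_STREAM_PARSER, 0);
bpf_prog_attach(prog_verdict, sock_map, BPF_SK_SKB_STREAM_VERDICT, 0);
```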
```c
/* Put the socket `sd` into slot 0 of the SOCKMAP */
int idx = 0;
int val = sd;
bpf_map_update_elem(sock_map, &idx, &val, BPF_ANY);
```

This technology has multiple benefits. First, the data is never copied to userspace. Second, we never need to wake up the userspace program. All the action is done in the kernel.

We need one more piece of code to block the userspace program until the socket is closed.

```c
#define _GNU_SOURCE /* POLLRDHUP is Linux-specific */
#include <poll.h>

/* Wait for the socket to close. Let SOCKMAP do the magic. */
struct pollfd fds[1] = {
    {.fd = sd, .events = POLLRDHUP},
};
poll(fds, 1, -1);
```

It’s the first technology on Linux that truly allows the user-space process to offload TCP splicing to the kernel.