Aya: your tRusty eBPF companion

July 22, 2022
Author:

Aya is a library that makes it possible to write eBPF programs fully in Rust and is focused on providing an experience that is as friendly as possible for developers. In this post we are going to explain what eBPF is, why Aya started, and what’s unique about it.

What is eBPF?

eBPF (extended Berkeley Packet Filter) is a technology that makes it possible to run sandboxed programs inside a virtual machine with its own minimal set of instructions.

It originated in the Linux kernel, where the eBPF VM (which is part of the kernel) runs eBPF programs when a specific event happens in the system. New Linux kernel versions keep adding more events, and each type of event has its own kind of eBPF program. Currently, the best-known program types in Linux are:

  • Kprobes (kernel probes), fentry - can be attached to any kernel function during runtime.
  • Tracepoints - hooks placed at fixed points in the Linux kernel code. They are more stable than Kprobes, whose target functions can change more often between kernel versions.
  • TC classifier - can be attached to the egress and ingress qdisc (“queuing discipline” in Linux networking) of a network interface to inspect packets and perform operations on them, like accepting, dropping, redirecting, or sending them to the queue again.
  • XDP - similar to TC classifier, but attaches to the NIC driver and receives raw packets before they go through any layers of kernel networking stack. The limitation is that it can receive only ingress packets.
  • LSM - stands for Linux Security Modules. These programs decide whether a particular security-related action is allowed to happen.

eBPF projects usually are built from two parts:

  • eBPF program itself, running in the kernel and reacting to events.
  • User space program, which loads eBPF programs into the kernel and manages their lifetime.

There are ways to share data between eBPF programs and user space:

  • Maps - data structures used by eBPF programs and, depending on the type, also by the user space. With standard map types like HashMap, both eBPF and user space can read and write to them.
  • Perf buffers (PerfEventArray) and ring buffers - buffers through which an eBPF program can push events (in the form of custom structures) to user space. This is a way to notify the user space program immediately.

Although eBPF started in Linux, there is now also an implementation for Windows. And eBPF is not even limited to operating system kernels. There are several user space implementations of eBPF, such as rbpf - a user space VM used in production by projects like Solana.

What is Aya and how did it start?

Today, eBPF programs are usually written either in C or eBPF assembly. But in 2021, the Rust compiler got support for compiling eBPF programs. Since then, Rust Nightly can compile a limited subset of Rust into an eBPF object file.

If you are interested in the implementation details, we recommend checking out this blog post by Alessandro Decina, who is the author of the pull request.

Aya is the first library that supports writing the whole eBPF project (both the user space and kernel space parts) in Rust, without any dependency on libbpf or clang. In most environments, Rust Nightly is the only dependency needed for building. Some environments where rustc doesn’t expose its internal LLVM.so library (e.g. aarch64) require installing a shared LLVM library. But there is no need for libbpf, clang, or bcc!

As mentioned before, the main focus of Aya is developer experience - making it as easy as possible to write eBPF programs. Now we are going to look in detail at how Aya achieves that.

More (type) safety

Although the eBPF verifier ensures memory safety, using Rust over C is still beneficial in terms of type safety. Both Rust and macros inside Aya are strict in terms of what types are used in which context.

Let’s look at this example in C.

Headers:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

The program:

SEC("xdp")
int incorrect_xdp(struct __sk_buff *skb) {
    return XDP_PASS;
}

It will compile without any problems:

$ clang -O2 -emit-llvm -c incorrect_xdp.c -o - | llc -march=bpf -filetype=obj -o bpf.o
$

... despite the fact that the function signature of that program is incorrect. struct __sk_buff *skb is an argument provided to TC classifier programs, not XDP, which has an argument of type struct xdp_md *ctx. Clang is not catching that mistake during compilation.

Let’s try to make a similar mistake with Rust:

#[xdp(name = "incorrect_xdp")]
pub fn incorrect_xdp(ctx: SkBuffContext) -> u32 {
    xdp_action::XDP_PASS
}

$ cargo xtask build-ebpf
[...]
Compiling incorrect-xdp-ebpf v0.1.0 (/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf)
Running `rustc --crate-name incorrect_xdp --edition=2021 src/main.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C opt-level=3 -C panic=abort -C lto -C codegen-units=1 -C metadata=c92607119e7c631d -C extra-filename=-c92607119e7c631d --out-dir /home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/bpfel-unknown-none/debug/deps --target bpfel-unknown-none -L dependency=/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/bpfel-unknown-none/debug/deps -L dependency=/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/debug/deps --extern aya_bpf=/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/bpfel-unknown-none/debug/deps/libaya_bpf-85e7be8a52b56ed9.rlib --extern aya_log_ebpf=/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/bpfel-unknown-none/debug/deps/libaya_log_ebpf-1b46466744bed2bc.rlib --extern 'noprelude:compiler_builtins=/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/bpfel-unknown-none/debug/deps/libcompiler_builtins-bb297dda66d0a4e2.rlib' --extern 'noprelude:core=/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/bpfel-unknown-none/debug/deps/libcore-65086a797df2a9a7.rlib' --extern incorrect_xdp_common=/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/bpfel-unknown-none/debug/deps/libincorrect_xdp_common-114ad60c902270da.rlib -Z unstable-options`
error[E0308]: mismatched types
--> src/main.rs:7:1
|
7 | #[xdp(name = "incorrect_xdp")]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `SkBuffContext`, found struct `XdpContext`
8 | pub fn incorrect_xdp(ctx: SkBuffContext) -> u32 {
[...]

The Rust compiler was able to detect the mismatch between SkBuffContext (context of TC classifier program) and XdpContext (context of XDP program, which we should use when using the xdp macro).

Error handling

The usual way of handling errors in C is to return an integer indicating success or failure and to check that integer at the call site. Since the return value is an error code, the actual result of successful work is usually stored through a pointer passed as an argument. A basic example (which gets triggered when a new process is cloned and saves the PID in a HashMap) looks like this:

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, pid_t);
    __type(value, u32);
} pids SEC(".maps");

SEC("fentry/kernel_clone")
int BPF_PROG(kernel_clone, struct kernel_clone_args *args)
{
    /* Get the pid */
    pid_t pid = bpf_get_current_pid_tgid() >> 32;
    /* Save the pid in map */
    u32 val = 0;
    int err = bpf_map_update_elem(&pids, &pid, &val, 0);
    if (err < 0)
        return err;
    return 0;
}
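The `>> 32` in the C program relies on how bpf_get_current_pid_tgid() packs two identifiers into one 64-bit value: the upper 32 bits hold the thread group ID (what user space calls the PID) and the lower 32 bits hold the kernel thread ID. A minimal sketch of that unpacking in plain Rust, outside of any eBPF program (the function name `split_pid_tgid` and the sample value are ours, for illustration only):

```rust
/// Splits a value shaped like the return value of `bpf_get_current_pid_tgid()`.
/// Returns `(tgid, tid)`: the user-space PID and the kernel thread ID.
fn split_pid_tgid(pid_tgid: u64) -> (u32, u32) {
    let tgid = (pid_tgid >> 32) as u32; // upper 32 bits
    let tid = pid_tgid as u32; // truncating cast keeps the lower 32 bits
    (tgid, tid)
}

fn main() {
    // Made-up sample: process 1234, thread 1240.
    let raw: u64 = (1234u64 << 32) | 1240;
    let (tgid, tid) = split_pid_tgid(raw);
    println!("tgid={} tid={}", tgid, tid);
}
```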

Aya lets you use the Result enum and handle errors as is done in most Rust projects. The only trick is to create two or more functions: one with a C-style signature that returns only an integer (the actual eBPF program), and others that return Result (called by the first function). Example:

#[map(name = "pids")]
static mut PIDS: HashMap<u32, u32> = HashMap::<u32, u32>::with_max_entries(1024, 0);

#[fentry(name = "kernel_clone")]
pub fn kernel_clone(ctx: FEntryContext) -> u32 {
    match try_kernel_clone(ctx) {
        Ok(ret) => ret,
        Err(_) => 1,
    }
}

fn try_kernel_clone(ctx: FEntryContext) -> Result<u32, c_long> {
    // Get the pid
    let pid = ctx.pid();
    // Save the pid in map.
    unsafe { PIDS.insert(&pid, &0, 0)? };
    Ok(0)
}

The difference becomes significant when the eBPF code becomes larger and there are multiple errors to handle.
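To illustrate how this scales, here is a small sketch in plain Rust (no Aya involved) with two fallible steps chained with the `?` operator and a single wrapper that turns the Result into an integer. Everything in it is a stand-in we made up for illustration: a std HashMap replaces the eBPF map, `ErrCode` replaces c_long, and the error values are arbitrary negative codes.

```rust
use std::collections::HashMap;

/// Stand-in for the `c_long` error codes returned by eBPF helpers.
type ErrCode = i64;

/// Tiny capacity so that the "map is full" error path is easy to hit.
const MAX_ENTRIES: usize = 2;

/// Hypothetical fallible step: extract the PID and reject PID 0.
fn extract_pid(pid_tgid: u64) -> Result<u32, ErrCode> {
    match (pid_tgid >> 32) as u32 {
        0 => Err(-22), // illustrative error code
        pid => Ok(pid),
    }
}

/// Hypothetical fallible step: insert into a bounded map.
fn store_pid(pids: &mut HashMap<u32, u32>, pid: u32) -> Result<(), ErrCode> {
    if pids.len() >= MAX_ENTRIES && !pids.contains_key(&pid) {
        return Err(-7); // illustrative error code: the map is full
    }
    pids.insert(pid, 0);
    Ok(())
}

/// All the logic lives here; every `?` is an early return on error.
fn try_on_clone(pids: &mut HashMap<u32, u32>, pid_tgid: u64) -> Result<u32, ErrCode> {
    let pid = extract_pid(pid_tgid)?;
    store_pid(pids, pid)?;
    Ok(0)
}

/// The "program entry point": the only place where errors become integers.
fn on_clone(pids: &mut HashMap<u32, u32>, pid_tgid: u64) -> u32 {
    match try_on_clone(pids, pid_tgid) {
        Ok(ret) => ret,
        Err(_) => 1,
    }
}

fn main() {
    let mut pids = HashMap::new();
    for raw in [5u64 << 32, 0, 6u64 << 32, 7u64 << 32] {
        println!("pid {} -> {}", raw >> 32, on_clone(&mut pids, raw));
    }
}
```

Adding a third or fourth fallible step only adds one line with a `?` to try_on_clone, while the C version would need another `if (err < 0) return err;` block each time.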

The Rust toolchain is all you need

That’s right: to start playing with Aya and eBPF, you usually need to install only the Rust toolchain (nightly) and a few crates. Detailed instructions are here. Cargo is enough to build the whole project and the produced binary will load the eBPF program into the kernel. Clang, bpftool, or iproute2 are not needed.

With Aya and Rust, you can use libraries in your eBPF program as long as they support no_std usage. More details about no_std are here. It’s also possible to release eBPF code as crates.

An example of a crate that is very often used in Aya-based eBPF programs is memoffset, which obtains offsets of struct members. We are going to see it in code examples later.

aya-log

aya-log is a library that lets you easily log from your eBPF programs to the user space program. There is no need to use bpf_printk(), bpftool prog tracelog, or the centralized kernel trace buffer. aya-log sends the logs through a PerfEventArray to the user space part of the project. That is something eBPF developers often implement from scratch, but there is no need to do so with Aya!

Logging in Aya is as simple as:

#[fentry(name = "kernel_clone")]
pub fn kernel_clone(ctx: FEntryContext) -> u32 {
    let pid = ctx.pid();
    info!(&ctx, "new process: pid: {}", pid);
    0
}

And then it’s visible in the user space process:

aya-template

cargo-generate is a tool that helps with creating new Rust projects by using a git repository with a template. You can use it to create a new eBPF project based on Aya, using our aya-template repository.

Starting a new project is as simple as:

cargo install cargo-generate
cargo generate https://github.com/aya-rs/aya-template

Then cargo-generate asks questions about the project, which mostly depend on the chosen type of eBPF program.

cargo-generate with aya

The resulting project layout contains the firewall user space crate, the firewall-common crate for shared code, the firewall-ebpf crate with the eBPF code, and xtask for build commands:

Sharing the common types and logic between user space and eBPF

Many eBPF projects keep the eBPF (kernel space) part in C, but the user space part in other languages like Go or Rust. In such cases, when some structures are used in both parts of the project, bindings to C structure definitions have to be generated.

In projects based entirely on Aya and Rust, it’s a common practice to keep common types in a crate called [project-name]-common. That crate is created by default when creating a new project using aya-template. That crate can contain, for example, struct definitions used in maps.
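As a rough sketch of why a shared definition helps, the example below defines one `#[repr(C)]` struct and uses it on both ends of a byte buffer: one function exposes it as raw bytes (roughly what happens when an eBPF program emits an event) and the other reads it back (what user space does with the received bytes). The type `ConnEvent`, its fields, and both helper functions are hypothetical and not part of Aya or aya-template.

```rust
use std::mem::size_of;

/// Hypothetical event type that would live in the `[project-name]-common` crate.
/// `repr(C)` fixes the field order, and 4 + 2 + 2 bytes leaves no padding.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ConnEvent {
    pub saddr: u32,
    pub port: u16,
    pub action: u16,
}

/// Producer side: view the struct as raw bytes.
fn to_bytes(ev: &ConnEvent) -> Vec<u8> {
    let ptr = ev as *const ConnEvent as *const u8;
    // SAFETY: `ConnEvent` is `repr(C)`, `Copy`, and has no padding bytes.
    unsafe { std::slice::from_raw_parts(ptr, size_of::<ConnEvent>()) }.to_vec()
}

/// Consumer side: rebuild the struct from a received buffer.
fn from_bytes(buf: &[u8]) -> Option<ConnEvent> {
    if buf.len() < size_of::<ConnEvent>() {
        return None;
    }
    // SAFETY: the length was checked, every bit pattern is a valid `ConnEvent`,
    // and `read_unaligned` does not require the buffer to be aligned.
    Some(unsafe { (buf.as_ptr() as *const ConnEvent).read_unaligned() })
}

fn main() {
    let ev = ConnEvent { saddr: 0xC000_0201, port: 443, action: 1 };
    let bytes = to_bytes(&ev);
    println!("{} bytes on the wire", bytes.len());
    println!("decoded: {:?}", from_bytes(&bytes));
}
```

Because both sides compile the same definition, there is no separate binding generation step that could drift out of sync with the eBPF code.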

Async support

The user space part of an Aya-based project can be asynchronous; both Tokio and async-std are supported. Aya can load eBPF programs and perform operations on eBPF maps and perf buffers in an asynchronous context.

How Deepfence leverages Aya

Packet inspection overview

We are analyzing network traffic on virtual machines, Docker, Kubernetes, and public cloud environments.

On each node, that analysis is done in two places for different purposes:

  • Inline in the eBPF program, with the TC classifier context. On this level, we:
      • Perform a network layer inspection (L3) to check the source and destination addresses.
      • Perform a transport layer inspection (L4) to check the local and remote port.
      • Based on that information, we apply network policies. If the given address (and port) are in our blocklist, we drop the packet. We also have allowlist logic when there is a wildcard blocklist policy.
      • Perform basic application level (L7) inspection:
          • HTTP - some HTTP headers might contain information about the client that was masked by load balancers, which we then also use for enforcing network policies and dropping the packet.
  • In user space, after retrieving a packet from eBPF (via PerfEventArray) for further analysis:
      • We match all the packets against our sets of security rules, which are compatible with Suricata and ModSecurity.
      • When a packet matches any rule, we raise an alert.
      • Each rule has an alert severity: critical, high, medium, or low. Our thresholds for each severity are configurable.
      • Once a threshold is reached, we automatically create a new network policy, which blocks the particular traffic inline, in eBPF.
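The blocklist and allowlist decision described in the list can be sketched in a few lines of plain Rust. This is our own simplified model and not the actual Deepfence implementation: the `Policy` type, the `block_all` flag standing for a wildcard blocklist policy, and the way the two lists are combined are all assumptions made for illustration.

```rust
use std::collections::HashSet;
use std::net::Ipv4Addr;

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Pass,
    Block,
}

/// An address plus an optional port; `None` means "any port".
type Rule = (Ipv4Addr, Option<u16>);

struct Policy {
    blocklist: HashSet<Rule>,
    allowlist: HashSet<Rule>,
    /// Assumed meaning of a wildcard blocklist policy: block everything
    /// that is not explicitly allowed.
    block_all: bool,
}

/// True if the address matches either with this exact port or with any port.
fn matches(rules: &HashSet<Rule>, addr: Ipv4Addr, port: u16) -> bool {
    rules.contains(&(addr, Some(port))) || rules.contains(&(addr, None))
}

impl Policy {
    fn verdict(&self, addr: Ipv4Addr, port: u16) -> Verdict {
        if self.block_all {
            // Wildcard block: only allowlisted traffic gets through.
            if matches(&self.allowlist, addr, port) {
                Verdict::Pass
            } else {
                Verdict::Block
            }
        } else if matches(&self.blocklist, addr, port) {
            Verdict::Block
        } else {
            Verdict::Pass
        }
    }
}

fn main() {
    let bad = Ipv4Addr::new(203, 0, 113, 7);
    let policy = Policy {
        blocklist: HashSet::from([(bad, Some(443))]),
        allowlist: HashSet::new(),
        block_all: false,
    };
    println!("{:?}", policy.verdict(bad, 443));
    println!("{:?}", policy.verdict(bad, 80));
}
```

In a real eBPF program the sets would be eBPF maps and the verdict would be a TC action, but the shape of the decision stays the same.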

Example of TC classifier eBPF program

At the beginning, we mentioned the TC classifier type of eBPF program. All incoming traffic comes to TC and is then redirected to a bound socket where the data can be consumed in user space. It's the same logic for outgoing traffic but in reverse: the data goes via the socket API and then goes through TC. This means that by attaching to TC, you can intercept the kernel socket buffer (sk_buff, for those who have ever ventured into the kernel code) and analyze all of it. On top of accessing the content, you can also make decisions such as dropping the packet or letting it through.

Here is an example of an eBPF program applying a simple ingress network policy:

const ETH_HDR_LEN: usize = mem::size_of::<ethhdr>();

const ETH_P_IP: u16 = 0x0800;
const IPPROTO_TCP: u8 = 6;
const IPPROTO_UDP: u8 = 17;

#[map]
static mut BLOCKLIST_V4_INGRESS: HashMap<u32, u8> = HashMap::with_max_entries(1024, 0);

#[classifier(name = "tc_cls_ingress")]
pub fn tc_cls_ingress(ctx: SkBuffContext) -> i32 {
    match try_tc_cls_ingress(ctx) {
        Ok(_) => TC_ACT_PIPE,
        Err(_) => TC_ACT_SHOT,
    }
}

fn try_tc_cls_ingress(ctx: SkBuffContext) -> Result<(), i64> {
    // Only inspect IPv4 packets carrying TCP or UDP; let everything else through.
    let eth_proto = u16::from_be(ctx.load(offset_of!(ethhdr, h_proto))?);
    let ip_proto = ctx.load::<u8>(ETH_HDR_LEN + offset_of!(iphdr, protocol))?;
    if !(eth_proto == ETH_P_IP && (ip_proto == IPPROTO_TCP || ip_proto == IPPROTO_UDP)) {
        return Ok(());
    }

    let saddr = u32::from_be(ctx.load(ETH_HDR_LEN + offset_of!(iphdr, saddr))?);

    // Drop the packet if its source address is in the blocklist.
    if unsafe { BLOCKLIST_V4_INGRESS.get(&saddr) }.is_some() {
        error!(&ctx, "blocked packet");
        return Err(-1);
    }

    info!(&ctx, "accepted packet");
    Ok(())
}

Rust will compile this code into an eBPF ELF object file (tc-filter in our example) that the Linux eBPF VM will be able to execute.
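To make the offset arithmetic in that program more tangible, here is a user space sketch that reads the same three fields from a raw Ethernet frame held in a byte slice. It does not use Aya; the `load` helper only imitates what ctx.load does, and the numeric offsets are the values we would expect offset_of! to produce for the standard ethhdr and iphdr layouts.

```rust
use std::net::Ipv4Addr;

const ETH_HDR_LEN: usize = 14; // destination MAC (6) + source MAC (6) + EtherType (2)
const ETH_P_IP: u16 = 0x0800;
const H_PROTO_OFFSET: usize = 12; // EtherType field inside the Ethernet header
const IP_PROTOCOL_OFFSET: usize = 9; // protocol field inside the IPv4 header
const IP_SADDR_OFFSET: usize = 12; // source address inside the IPv4 header

/// Bounds-checked read of `N` bytes at `offset`, loosely imitating `ctx.load`.
fn load<const N: usize>(frame: &[u8], offset: usize) -> Option<[u8; N]> {
    frame.get(offset..offset + N)?.try_into().ok()
}

/// Returns the IP protocol number and the source address of an IPv4 frame.
fn parse(frame: &[u8]) -> Option<(u8, Ipv4Addr)> {
    // Multi-byte fields are big endian on the wire.
    let eth_proto = u16::from_be_bytes(load(frame, H_PROTO_OFFSET)?);
    if eth_proto != ETH_P_IP {
        return None;
    }
    let [ip_proto] = load::<1>(frame, ETH_HDR_LEN + IP_PROTOCOL_OFFSET)?;
    let saddr = Ipv4Addr::from(load::<4>(frame, ETH_HDR_LEN + IP_SADDR_OFFSET)?);
    Some((ip_proto, saddr))
}

fn main() {
    // Hand-built frame: Ethernet header followed by a minimal IPv4 header.
    let mut frame = vec![0u8; 34];
    frame[12..14].copy_from_slice(&ETH_P_IP.to_be_bytes());
    frame[14 + 9] = 6; // TCP
    frame[26..30].copy_from_slice(&[192, 0, 2, 1]);
    println!("{:?}", parse(&frame));
}
```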

eBPF loading and attaching in user space:

let mut bpf = Bpf::load(include_bytes_aligned!(
    ".../target/bpfel-unknown-none/release/tc-filter"
))?;
let prog: &mut SchedClassifier = bpf.program_mut("tc_cls_ingress").unwrap().try_into()?;
prog.load()?;
let _ = tc::qdisc_add_clsact("eth0");
prog.attach("eth0", TcAttachType::Ingress)?;

The code above loads the eBPF binary, loads the specific program, and adds a clsact qdisc to the interface. qdisc is short for “queuing discipline”. A clsact qdisc has to exist on the interface before a classifier can be attached to it, and it allows multiple eBPF programs to be attached to the same interface. Finally, we attach the eBPF classifier tc_cls_ingress to TC on ingress for the interface eth0, so any incoming packet reaching TC will trigger the tc_cls_ingress function. The same can be done on egress.

Now that we have seen how eBPF programs can be built and triggered, let’s go deeper and see what they can do with the socket buffer. A socket buffer fully encapsulates one TCP or one UDP packet. This means that if you want to reconstruct HTTP messages, you will need to collect TCP packets and put them back in order.

Sending data to user space with PerfEventArray

There are multiple ways to communicate between eBPF programs and user space programs. To quickly transmit data from eBPF to user space, PerfEventArray is the most efficient option available in Aya today. Fortunately, Aya also brings nice utilities around it.

PerfEventArray was initially designed just to report metrics about traffic performance - hence the name - but in reality, you can use those arrays to pass any data you want. Aya makes it dead easy, as the same data type can be used for the PerfEventArray and your user space program.

In the future we want to support ring buffers in Aya, which bring better performance and are supported by newer kernels. That work is in progress.

Let’s see how to transmit socket buffers to user space.

First, we define a custom data type:

#[derive(Copy, Clone, Debug, Hash, Eq, PartialEq)]
#[repr(C)]
pub struct OobBuffer {
    pub direction: TrafficDirection,
    pub size: usize,
}

PerfEventArray in eBPF programs:

#[map(name = "OOB_DATA")]
static mut OOB_DATA: PerfEventArray<OobBuffer> = PerfEventArray::new(0);

#[classifier(name = "my_ingress_endpoint")]
fn tc_cls_ingress(skb: SkBuffContext) -> i32 {
    unsafe {
        OOB_DATA.output(
            &skb,
            &OobBuffer {
                direction: TrafficDirection::Ingress,
                size: skb.len() as usize,
            },
            // Passing the packet length here appends the socket buffer to the event.
            skb.len(),
        );
    }

    TC_ACT_PIPE
}

PerfEventArray in user space program:

let mut oob_events: AsyncPerfEventArray<_> =
    bpf.map_mut("OOB_DATA").unwrap().try_into().unwrap();

for cpu_id in online_cpus()? {
    let mut oob_cpu_buf = oob_events.open(cpu_id, Some(256))?;
    spawn(&format!("cpu_{}_perf_read", cpu_id), async move {
        loop {
            let mut bufs = (0..sizes.buf_count)
                .map(|_| BytesMut::with_capacity(128 * 4096))
                .collect::<Vec<_>>();
            let events = oob_cpu_buf.read_events(&mut bufs).await.unwrap();
            // Play with the received events in bufs
        }
    });
}

The PerfEventArray is a ring buffer and is bound by the map name OOB_DATA across eBPF programs and user space. The faster you retrieve events from the buffer, the fewer you are going to miss. Here, we open the PerfEventArray and spawn Tokio tasks, one per CPU, because there is one perf buffer allocated per CPU. Then we start reading from it asynchronously. When an event is sent from eBPF, the user space task is woken up and starts reading the event. Note that our PerfEventArray data is composed of the custom type followed by the appended socket buffer. So to retrieve the underlying socket buffer, we can simply skip past the custom type and access the remaining bytes.
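That last step, skipping past the custom type, could look roughly like the sketch below. It works on a plain byte slice rather than on Aya's buffers, and the header struct is a simplified stand-in for OobBuffer with two u32 fields (our assumption, chosen so that the layout has no padding and is easy to build by hand). The helper name `split_sample` is also ours.

```rust
use std::mem::size_of;

/// Simplified stand-in for `OobBuffer`: fixed-width fields, no padding.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OobHeader {
    pub direction: u32,
    pub size: u32,
}

/// Splits one received sample into the header and the packet bytes that follow.
fn split_sample(sample: &[u8]) -> Option<(OobHeader, &[u8])> {
    let hdr_len = size_of::<OobHeader>();
    if sample.len() < hdr_len {
        return None;
    }
    let (hdr_bytes, rest) = sample.split_at(hdr_len);
    // SAFETY: `hdr_bytes` is exactly `size_of::<OobHeader>()` bytes long, every
    // bit pattern is valid for two `u32`s, and `read_unaligned` needs no alignment.
    let hdr = unsafe { (hdr_bytes.as_ptr() as *const OobHeader).read_unaligned() };
    // The sample may carry trailing padding or be truncated, so trust `size`
    // only up to the number of bytes actually delivered.
    let pkt_len = (hdr.size as usize).min(rest.len());
    Some((hdr, &rest[..pkt_len]))
}

fn main() {
    // Hand-built sample: header (direction = 1, size = 3), 3 packet bytes, padding.
    let mut sample = Vec::new();
    sample.extend_from_slice(&1u32.to_ne_bytes());
    sample.extend_from_slice(&3u32.to_ne_bytes());
    sample.extend_from_slice(&[0xde, 0xad, 0xbe, 0, 0, 0, 0, 0]);

    if let Some((hdr, packet)) = split_sample(&sample) {
        println!("{:?} -> {} packet bytes", hdr, packet.len());
    }
}
```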

Here we are: we have a way to attach an eBPF program to TC and to retrieve socket buffers in user space. The fun work can start!

Processing packets in user space

The way Deepfence does deep packet inspection is by reordering the TCP segments and reconstructing HTTP messages from the socket buffer data gathered by eBPF. Once the HTTP payload is reconstructed, we apply rule matching to detect whether something malicious is present. If so, we generate an alert and notify customers. We handle the other application layer (L7) protocols the same way.
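The reordering step can be illustrated with a deliberately small sketch: segments are parked in an ordered map keyed by their TCP sequence number and released to the output stream as soon as the next expected byte is available. This is not the Deepfence implementation; the `Reassembler` type is ours, and it ignores overlapping segments, connection tracking, and memory limits, which a real reassembler has to handle.

```rust
use std::collections::BTreeMap;

/// Minimal in-order reassembly of one direction of one TCP connection.
struct Reassembler {
    /// Sequence number of the next byte we expect.
    next_seq: u32,
    /// Out-of-order segments waiting for the gap before them to be filled.
    pending: BTreeMap<u32, Vec<u8>>,
    /// Bytes delivered so far, in order.
    stream: Vec<u8>,
}

impl Reassembler {
    fn new(initial_seq: u32) -> Self {
        Reassembler { next_seq: initial_seq, pending: BTreeMap::new(), stream: Vec::new() }
    }

    fn push(&mut self, seq: u32, payload: &[u8]) {
        // Ignore empty segments and data that starts before the next expected
        // byte (a retransmission). The wrapping subtraction keeps the
        // comparison correct when sequence numbers wrap around.
        if payload.is_empty() || (seq.wrapping_sub(self.next_seq) as i32) < 0 {
            return;
        }
        self.pending.entry(seq).or_insert_with(|| payload.to_vec());

        // Release every segment that is now contiguous with the stream.
        while let Some(data) = self.pending.remove(&self.next_seq) {
            self.next_seq = self.next_seq.wrapping_add(data.len() as u32);
            self.stream.extend_from_slice(&data);
        }
    }
}

fn main() {
    let mut r = Reassembler::new(1000);
    r.push(1004, b"/ HT"); // arrives early, gets parked
    r.push(1000, b"GET "); // fills the gap, both segments are released
    r.push(1008, b"TP/1.1");
    println!("{}", String::from_utf8_lossy(&r.stream));
}
```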

A rule defines how to detect specific malicious content in HTTP payloads. It also includes meta information on its purpose (the reason for its creation, or what it detects). It usually relies on substring search (finding a needle in a haystack), but also on regular expression matching. Matching happens on different parts of the HTTP message: it can be the headers, the port, or even the HTML body. Needless to say, such operations are CPU intensive. Here is a rule example:

alert http $HOME_NET any -> any any (msg:"ET POLICY Outgoing Basic Auth Base64 HTTP Password detected unencrypted";
flow:established,to_server; threshold: type both, count 1, seconds 300, track by_src;
http.header; content:"Authorization|3a 20|Basic"; nocase; content:!"YW5vbnltb3VzOg==";
within:32; content:!"Proxy-Authorization|3a 20|Basic"; nocase;
content:!"KG51bGwpOihudWxsKQ==";
reference:url,doc.emergingthreats.net/bin/view/Main/2006380;
classtype:policy-violation; sid:2006380; rev:15;
metadata:created_at 2010_07_30, former_category POLICY, updated_at 2022_06_14;)

The rule above comes from the Emerging Threats rule set. It can be roughly translated into: any HTTP payload containing the text Authorization: Basic, followed by anything but YW5vbnltb3VzOg==, should trigger an alert saying that an unencrypted password was found.
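To give a feeling for what matching such a rule involves, here is a hand-written sketch of its core conditions in plain Rust. It is not how Suricata or Deepfence evaluate rules, and it covers only a subset of the rule: the case-insensitive content match and the negated match within the following 32 bytes. The Proxy-Authorization exclusion, the second negated content, flow tracking, and thresholds are left out, and the function names are ours.

```rust
/// Case-insensitive substring search, returning the position of the first match.
fn find_nocase(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() || haystack.len() < needle.len() {
        return None;
    }
    haystack
        .windows(needle.len())
        .position(|window| window.eq_ignore_ascii_case(needle))
}

/// Case-sensitive substring check.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|window| window == needle)
}

/// Returns true when the headers should raise the "Basic Auth" alert.
fn basic_auth_alert(headers: &[u8]) -> bool {
    // `|3a 20|` in the rule is the hex notation for ": ".
    const AUTH: &[u8] = b"Authorization: Basic";
    // Base64 of "anonymous:", which the rule does not consider a real password.
    const ANONYMOUS: &[u8] = b"YW5vbnltb3VzOg==";

    let pos = match find_nocase(headers, AUTH) {
        Some(pos) => pos,
        None => return false,
    };
    // `within:32` is approximated by looking at the 32 bytes after the match.
    let after = &headers[pos + AUTH.len()..];
    let window = &after[..after.len().min(32)];
    !contains(window, ANONYMOUS)
}

fn main() {
    let with_password = b"Host: example.com\r\nAuthorization: Basic dXNlcjpwYXNz\r\n";
    let anonymous = b"Host: example.com\r\nauthorization: basic YW5vbnltb3VzOg==\r\n";
    println!("with password: {}", basic_auth_alert(with_password));
    println!("anonymous:     {}", basic_auth_alert(anonymous));
}
```

Even this reduced version scans the header bytes more than once, which hints at why matching a large rule set against live traffic is CPU intensive.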

Finding the right set of rules is challenging, and applying them at the right time is crucial. At Deepfence, we aggregate rules from different sources and in different formats, such as the Emerging Threats rules and the ModSecurity Core Rule Set. Users can also provide their own rules. We apply them to the live traffic captured by the eBPF program to achieve real-time alerting.

Watching processes and containers

Apart from network tracing, Deepfence also focuses on monitoring processes and container workloads. That can be achieved with a tracepoint eBPF program triggered by new processes in the system.

// Per-CPU scratch space for the filename. `FilenameBuf` is not shown in this post;
// it is assumed to be a `#[repr(C)]` struct wrapping a byte array field named `buf`.
#[map]
pub static mut FILENAME_BUF: PerCpuArray<FilenameBuf> = PerCpuArray::with_max_entries(1, 0);
#[map]
pub static mut RUNC_EVENTS: PerfEventArray<RuncEvent> = PerfEventArray::new(0);

#[tracepoint(name = "runc_tracepoint")]
pub fn runc_tracepoint(ctx: TracePointContext) -> i64 {
    match try_runc_tracepoint(ctx) {
        Ok(ret) => ret,
        Err(ret) => ret,
    }
}

fn try_runc_tracepoint(ctx: TracePointContext) -> Result<i64, i64> {
    // To check offset values:
    // sudo cat /sys/kernel/debug/tracing/events/sched/sched_process_exec/format
    const FILENAME_POINTER_OFFSET: usize = 8;
    let buf = unsafe {
        let ptr = FILENAME_BUF.get_ptr_mut(0).ok_or(0i64)?;
        &mut *ptr
    };
    // Copy the NUL-terminated filename of the executed binary into the scratch buffer.
    let filename = unsafe {
        let len = bpf_probe_read_kernel_str(
            (ctx.as_ptr() as *const u8).add(ctx.read_at::<u16>(FILENAME_POINTER_OFFSET)? as usize),
            &mut buf.buf,
        )?;
        core::str::from_utf8_unchecked(&buf.buf[..len])
    };
    if filename.ends_with("runc\0") {
        let pid = bpf_get_current_pid_tgid() as u32;
        let event = RuncEvent { pid };
        unsafe { RUNC_EVENTS.output(&ctx, &event, 0) };
    }
    Ok(0)
}

This program basically:

  • Gets triggered by sched_process_exec tracepoint - when new processes are spawned by executing a binary.
  • Checks if the filename ends with runc.
  • If yes, outputs an event to the user space via a PerfEventArray.

Of course with such simple filtering, if someone calls some binary foobar-runc (and it has nothing to do with the real runc), we have a problem. But let's deal with that in the user space.

Here is the definition of RuncEvent. It just contains a PID:

#[derive(Debug, Clone)]
#[repr(C)]
pub struct RuncEvent {
    pub pid: u32,
}

Then we can consume the event in the user space:

let mut runc_events: AsyncPerfEventArray<_> =
    bpf.map_mut("RUNC_EVENTS").unwrap().try_into().unwrap();

for cpu_id in online_cpus()? {
    let mut runc_buf = runc_events.open(cpu_id, Some(256))?;
    spawn(&format!("cpu_{}_runc_perf_read", cpu_id), async move {
        loop {
            let mut bufs = (0..sizes.buf_count)
                .map(|_| BytesMut::with_capacity(128 * 4096))
                .collect::<Vec<_>>();
            let events = runc_buf.read_events(&mut bufs).await.unwrap();
            // Only the first `events.read` buffers contain new events.
            for i in 0..events.read {
                let buf = &bufs[i];
                let ptr = buf.as_ptr() as *const RuncEvent;
                let event = unsafe { ptr.read_unaligned() };
                handle_runc_event(event.pid).unwrap();
            }
        }
    });
}

It’s better to define our logic in a separate function, since the loop above is already complex enough. We can try to look for the actual process:

fn handle_runc_event(pid: u32) -> Result<(), anyhow::Error> {
    let p = Process::new(pid as i32)?;
    // do something with `p`, parse its cmdline
    // and check if it's actually runc
    Ok(())
}
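The "check if it's actually runc" part could be as simple as the sketch below. It assumes the raw contents of /proc/<pid>/cmdline are already available as bytes (arguments separated by NUL bytes) and only compares the file name of the first argument, so a binary named foobar-runc is rejected. The function name `is_runc` is ours, and a production check would likely look at more than the name.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Returns true if the first argument of a `/proc/<pid>/cmdline` buffer
/// is a path whose file name is exactly `runc`.
fn is_runc(cmdline: &[u8]) -> bool {
    // Arguments are separated by NUL bytes; the first one is argv[0].
    let argv0 = cmdline.split(|&b| b == 0).next().unwrap_or(&[]);
    match std::str::from_utf8(argv0) {
        Ok(path) => Path::new(path).file_name() == Some(OsStr::new("runc")),
        Err(_) => false,
    }
}

fn main() {
    // Made-up sample command lines.
    let real = b"/usr/bin/runc\0--root\0/run/containerd\0";
    let fake = b"/usr/local/bin/foobar-runc\0init\0";
    println!("real: {}", is_runc(real));
    println!("fake: {}", is_runc(fake));
}
```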

That way of monitoring runc processes is agnostic to container engines (Docker, podman) or orchestration systems (Kubernetes and different CRI implementations), so it’s universal and fast. And based on container creation events, we are able to start parsing container configuration.

We use that solution for monitoring and scanning of container filesystems whenever a new container is created.

Conclusion

In this post we introduced you to eBPF and Aya, and how Deepfence leverages those technologies to reliably detect real customer security issues.

If you have questions or want to hear more, we encourage you to read our Deep Packet Inspection documentation and join the Deepfence Slack workspace!

If you want to learn about Aya, check out the Aya book.

Aya has a very active community on Discord, where conversations happen pretty much every day. We invite you to join and feel free to ask any questions related to Aya and eBPF!

Finally, if you’re interested in hacking on eBPF, Rust, and Kubernetes, reach out – careers(at)deepfence(dot)io