Aya is a library that makes it possible to write eBPF programs fully in Rust and is focused on providing an experience that is as friendly as possible for developers. In this post we are going to explain what eBPF is, why Aya started, and what’s unique about it.
eBPF (extended Berkeley Packet Filter) is a technology that makes it possible to run sandboxed programs inside a virtual machine with its own minimal set of instructions.
It originated in the Linux kernel, where the eBPF VM (which is part of the kernel) triggers eBPF programs when a specific event happens in the system. New Linux kernel versions keep adding more events. Each type of event has its own kind of eBPF program. Currently in Linux, the best-known program types are:
eBPF projects are usually built from two parts:
There are ways to share data between eBPF programs and user space:
Although eBPF started in Linux, nowadays there is also an implementation for Windows. And eBPF is not even limited to operating system kernels. There are several user space implementations of eBPF, such as rbpf, a user space VM used in production by projects like Solana.
Today, eBPF programs are usually written either in C or eBPF assembly. But in 2021, the Rust compiler got support for compiling eBPF programs. Since then, Rust Nightly can compile a limited subset of Rust into an eBPF object file.
If you are interested in reading about the implementation details, we recommend checking out this blog post by Alessandro Decina, who is the author of the pull request.
Aya is the first library that supports writing the whole eBPF project (both the user space and kernel space parts) in Rust, without any dependency on libbpf or clang. In most environments, Rust Nightly is the only dependency needed for building. Some environments where rustc doesn’t expose its internal LLVM.so library (e.g. aarch64) require installing a shared LLVM library. But there is no need for libbpf, clang, or bcc!
As mentioned before, the main focus of Aya is developer experience: making it as easy as possible to write eBPF programs. Now we are going to go into detail about how Aya achieves that.
Although the eBPF verifier ensures memory safety, using Rust over C is still beneficial in terms of type safety. Both Rust and Aya's macros are strict about which types can be used in which context.
Let’s look at this example in C.
Headers:
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
The program:
SEC("xdp")
int incorrect_xdp(struct __sk_buff *skb) {
    return XDP_PASS;
}
It will compile without any problems:
$ clang -O2 -emit-llvm -c incorrect_xdp.c -o - | llc -march=bpf -filetype=obj -o bpf.o
$
... despite the fact that the function signature of that program is incorrect. struct __sk_buff *skb is the argument provided to TC classifier programs, not to XDP programs, which take an argument of type struct xdp_md *ctx. Clang does not catch that mistake during compilation.
Let’s try to make a similar mistake with Rust:
#[xdp(name = "incorrect_xdp")]
pub fn incorrect_xdp(ctx: SkBuffContext) -> u32 {
    xdp_action::XDP_PASS
}
$ cargo xtask build-ebpf
[...]
Compiling incorrect-xdp-ebpf v0.1.0 (/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf)
Running `rustc --crate-name incorrect_xdp --edition=2021 src/main.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C opt-level=3 -C panic=abort -C lto -C codegen-units=1 -C metadata=c92607119e7c631d -C extra-filename=-c92607119e7c631d --out-dir /home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/bpfel-unknown-none/debug/deps --target bpfel-unknown-none -L dependency=/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/bpfel-unknown-none/debug/deps -L dependency=/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/debug/deps --extern aya_bpf=/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/bpfel-unknown-none/debug/deps/libaya_bpf-85e7be8a52b56ed9.rlib --extern aya_log_ebpf=/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/bpfel-unknown-none/debug/deps/libaya_log_ebpf-1b46466744bed2bc.rlib --extern 'noprelude:compiler_builtins=/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/bpfel-unknown-none/debug/deps/libcompiler_builtins-bb297dda66d0a4e2.rlib' --extern 'noprelude:core=/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/bpfel-unknown-none/debug/deps/libcore-65086a797df2a9a7.rlib' --extern incorrect_xdp_common=/home/vadorovsky/repos/aya-examples/incorrect-xdp/incorrect-xdp-ebpf/../target/bpfel-unknown-none/debug/deps/libincorrect_xdp_common-114ad60c902270da.rlib -Z unstable-options`
error[E0308]: mismatched types
--> src/main.rs:7:1
|
7 | #[xdp(name = "incorrect_xdp")]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `SkBuffContext`, found struct `XdpContext`
8 | pub fn incorrect_xdp(ctx: SkBuffContext) -> u32 {
[...]
The Rust compiler was able to detect the mismatch between SkBuffContext (context of TC classifier program) and XdpContext (context of XDP program, which we should use when using the xdp macro).
The usual way of handling errors in C is to return an integer indicating success or failure from a function and to check that integer at the call site. Since the return value is an error code, the actual result of successful work is usually stored through a pointer provided as an argument. To keep it very simple, a basic example (which gets triggered when a new process is cloned and saves the PID in a hash map) looks like:
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, pid_t);
    __type(value, u32);
} pids SEC(".maps");
SEC("fentry/kernel_clone")
int BPF_PROG(kernel_clone, struct kernel_clone_args *args)
{
    /* Get the pid */
    pid_t pid = bpf_get_current_pid_tgid() >> 32;
    /* Save the pid in map */
    u32 val = 0;
    int err = bpf_map_update_elem(&pids, &pid, &val, 0);
    if (err < 0)
        return err;
    return 0;
}
Aya lets you use the Result enum and handle errors as is done in most Rust projects. The only trick is to create two or more functions: one that has a C function signature and returns only an integer (the actual eBPF program), and others that return Result (used by the first function). Example:
#[map(name = "pids")]
static mut PIDS: HashMap<u32, u32> = HashMap::<u32, u32>::with_max_entries(1024, 0);
#[fentry(name = "kernel_clone")]
pub fn kernel_clone(ctx: FEntryContext) -> u32 {
    match unsafe { try_kernel_clone(ctx) } {
        Ok(ret) => ret,
        Err(_) => 1,
    }
}
fn try_kernel_clone(ctx: FEntryContext) -> Result<u32, c_long> {
    // Get the pid
    let pid = ctx.pid();
    // Save the pid in map.
    unsafe { PIDS.insert(&pid, &0, 0)? };
    Ok(0)
}
The difference becomes significant when the eBPF code becomes larger and there are multiple errors to handle.
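To illustrate how the pattern scales, here is a minimal sketch in plain Rust, without any Aya types. The functions `read_pid`, `check_pid`, and `record`, as well as the error codes and limits, are hypothetical stand-ins for fallible helper calls. Each step returns a Result, the `?` operator propagates the first error, and a single entry point converts the outcome into an integer.

```rust
/// Hypothetical fallible step: extract the process ID from the upper
/// 32 bits of a combined value (mirrors the `>> 32` in the C example).
fn read_pid(raw: u64) -> Result<u32, i64> {
    let pid = (raw >> 32) as u32;
    if pid == 0 { Err(-1) } else { Ok(pid) }
}

/// Hypothetical fallible step: reject PIDs above an arbitrary bound.
fn check_pid(pid: u32) -> Result<u32, i64> {
    if pid > 1_000_000 { Err(-2) } else { Ok(pid) }
}

/// Hypothetical fallible step: store the PID, failing when the store is full.
fn record(pids: &mut Vec<u32>, pid: u32) -> Result<(), i64> {
    if pids.len() >= 1024 {
        return Err(-3);
    }
    pids.push(pid);
    Ok(())
}

/// All the logic lives here; `?` returns early on the first error.
fn try_handle(pids: &mut Vec<u32>, raw: u64) -> Result<u32, i64> {
    let pid = check_pid(read_pid(raw)?)?;
    record(pids, pid)?;
    Ok(0)
}

/// Entry point with a C-like signature: only an integer comes out.
pub fn handle(pids: &mut Vec<u32>, raw: u64) -> u32 {
    match try_handle(pids, raw) {
        Ok(ret) => ret,
        Err(_) => 1,
    }
}
```

Adding a fourth or fifth fallible step costs one more line ending in `?`, whereas the C version needs another `if (err < 0) return err;` block each time.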
That’s right, to start playing with Aya and eBPF, you usually need to install only the Rust toolchain (nightly) and a few crates. Detailed instructions are here. Cargo is enough to build the whole project, and the produced binary will load the eBPF program into the kernel. Clang, bpftool, or iproute2 are not needed.
With Aya and Rust, you can use libraries in your eBPF program as long as they support no_std usage. More details about no_std are here. It’s also possible to release eBPF code as crates.
An example of a crate that is very often used in Aya-based eBPF programs is memoffset, which obtains offsets of struct members. We are going to see it in code examples later.
Aya-log is a library that lets you easily log from your eBPF programs to the user space program. There is no need to use bpf_printk(), bpftool prog tracelog, and the centralized kernel trace buffer. Aya-log sends the logs through a PerfEventArray to the user space part of the project. eBPF developers often implement that mechanism from scratch, but there is no need to do so with Aya!
Logging in Aya is as simple as:
#[fentry(name = "kernel_clone")]
pub fn kernel_clone(ctx: FEntryContext) -> u32 {
    let pid = ctx.pid();
    info!(&ctx, "new process: pid: {}", pid);
    0
}
And then it’s visible in the user space process:
cargo-generate is a tool that helps with creating new Rust projects by using a git repository with a template. You can use it to create a new eBPF project based on Aya, using our aya-template repository.
Starting a new project is as simple as:
cargo install cargo-generate
cargo generate https://github.com/aya-rs/aya-template
Then cargo-generate asks questions about the project, which mostly depend on the chosen type of eBPF program.
And the project layout with firewall user space crate, firewall-common crate for shared code, firewall-ebpf with eBPF code, and xtask for build commands:
Many eBPF projects keep the eBPF (kernel space) part in C, but the user space part in other languages like Go or Rust. In such cases, when some structures are used in both parts of the project, bindings to C structure definitions have to be generated.
In projects based entirely on Aya and Rust, it’s a common practice to keep common types in a crate called [project-name]-common. That crate is created by default when creating a new project using aya-template. That crate can contain, for example, struct definitions used in maps.
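As a minimal sketch of what such a shared type might look like, consider the hypothetical `ConnEvent` struct below (the name and fields are illustrative, not taken from any Deepfence or Aya crate). `#[repr(C)]` fixes the field order and padding, so the eBPF side and the user space side agree on the byte layout. The sketch uses only `core`, since the common crate is normally built as no_std.

```rust
use core::mem::size_of;

/// Hypothetical event type that would live in the `-common` crate and be
/// used by both the eBPF program and the user space program.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ConnEvent {
    pub saddr: u32,  // offset 0
    pub pid: u32,    // offset 4
    pub port: u16,   // offset 8
    pub blocked: u8, // offset 10, followed by 1 byte of padding
}

/// Size of the struct as both sides see it: 11 bytes of fields, rounded
/// up to the 4-byte alignment of the largest field.
pub const CONN_EVENT_SIZE: usize = size_of::<ConnEvent>();
```

Because both crates import the same definition, there is no binding generation step and no risk of the two sides drifting apart.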
The user space part of projects based on Aya can be asynchronous; both Tokio and async-std are supported. Aya can load eBPF programs and perform operations on eBPF maps and perf buffers in an asynchronous context.
We are analyzing network traffic on virtual machines, Docker, Kubernetes, and public cloud environments.
On each node, that analysis is done in two places for different purposes:
At the beginning, we mentioned the TC classifier type of eBPF program. All incoming traffic comes to TC and is then redirected to a bound socket where the data can be consumed in user space. It's the same logic for outgoing traffic but in reverse: the data goes via the socket API and then through TC. This means that by attaching to TC, you can intercept the kernel socket buffer (sk_buff for those who have ever ventured into the kernel code) and analyze all of it. On top of accessing the content, you can also make decisions such as dropping the packet or letting it through.
Here is an example of an eBPF program applying a simple ingress network policy:
const ETH_HDR_LEN: usize = mem::size_of::<ethhdr>();
const ETH_P_IP: u16 = 0x0800;
const IPPROTO_TCP: u8 = 6;
const IPPROTO_UDP: u8 = 17;
#[map]
static mut BLOCKLIST_V4_INGRESS: HashMap<u32, u8> = HashMap::with_max_entries(1024, 0);
#[classifier(name = "tc_cls_ingress")]
pub fn tc_cls_ingress(ctx: SkBuffContext) -> i32 {
    // Let the packet through on success, drop it on error
    match try_tc_cls_ingress(ctx) {
        Ok(_) => TC_ACT_PIPE,
        Err(_) => TC_ACT_SHOT,
    }
}
fn try_tc_cls_ingress(ctx: SkBuffContext) -> Result<(), i64> {
    let eth_proto = u16::from_be(ctx.load(offset_of!(ethhdr, h_proto))?);
    let ip_proto = ctx.load::<u8>(ETH_HDR_LEN + offset_of!(iphdr, protocol))?;
    // Only IPv4 TCP and UDP traffic is subject to the policy
    if !(eth_proto == ETH_P_IP && (ip_proto == IPPROTO_TCP || ip_proto == IPPROTO_UDP)) {
        return Ok(());
    }
    let saddr = u32::from_be(ctx.load(ETH_HDR_LEN + offset_of!(iphdr, saddr))?);
    if unsafe { BLOCKLIST_V4_INGRESS.get(&saddr) }.is_some() {
        error!(&ctx, "blocked packet");
        return Err(-1);
    }
    info!(&ctx, "accepted packet");
    Ok(())
}
Rust will compile this code into an eBPF ELF object file (tc-filter in our example) that the Linux eBPF VM will be able to execute.
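The offset arithmetic in the classifier can be tried outside the kernel. The following is a minimal user space sketch that applies the same offsets to a raw byte slice. It assumes an untagged Ethernet frame carrying IPv4, and the function name `tcp_udp_source` is hypothetical.

```rust
use std::convert::TryInto;

const ETH_HDR_LEN: usize = 14;      // dst MAC (6) + src MAC (6) + EtherType (2)
const ETH_PROTO_OFFSET: usize = 12; // EtherType within the Ethernet header
const IP_PROTO_OFFSET: usize = 9;   // `protocol` within the IPv4 header
const IP_SADDR_OFFSET: usize = 12;  // `saddr` within the IPv4 header
const ETH_P_IP: u16 = 0x0800;
const IPPROTO_TCP: u8 = 6;
const IPPROTO_UDP: u8 = 17;

/// Returns the IPv4 source address (in host byte order) if the frame is
/// IPv4 carrying TCP or UDP, and `None` for anything else or for frames
/// that are too short.
pub fn tcp_udp_source(frame: &[u8]) -> Option<u32> {
    let proto_bytes: [u8; 2] = frame
        .get(ETH_PROTO_OFFSET..ETH_PROTO_OFFSET + 2)?
        .try_into()
        .ok()?;
    if u16::from_be_bytes(proto_bytes) != ETH_P_IP {
        return None;
    }
    let ip_proto = *frame.get(ETH_HDR_LEN + IP_PROTO_OFFSET)?;
    if ip_proto != IPPROTO_TCP && ip_proto != IPPROTO_UDP {
        return None;
    }
    let start = ETH_HDR_LEN + IP_SADDR_OFFSET;
    let saddr_bytes: [u8; 4] = frame.get(start..start + 4)?.try_into().ok()?;
    Some(u32::from_be_bytes(saddr_bytes))
}
```

The value returned here is the same kind of key the classifier looks up in BLOCKLIST_V4_INGRESS, so a helper like this can be handy for unit-testing blocklist entries against captured frames.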
eBPF loading and attaching in user space:
let mut bpf = Bpf::load(include_bytes_aligned!(
    ".../target/bpfel-unknown-none/release/tc-filter"
))?;
let prog: &mut SchedClassifier = bpf.program_mut("tc_cls_ingress").unwrap().try_into()?;
prog.load()?;
let _ = tc::qdisc_add_clsact("eth0");
prog.attach("eth0", TcAttachType::Ingress)?;
The code above loads the eBPF binary, loads the specific program, and adds a new qdisc to TC. qdisc is short for “queueing discipline”, and one is required before a classifier can be attached. It allows multiple eBPF programs to be attached to the same interface. Finally, we attach the eBPF classifier tc_cls_ingress to TC on ingress for the interface eth0, so any incoming packet reaching TC triggers the tc_cls_ingress function. The same can be done on egress.
Now that we have seen how eBPF programs can be built and triggered, let’s go deeper and see what they can do with the socket buffer. A socket buffer fully encapsulates one TCP or one UDP packet. This means that if you want to reconstruct HTTP messages, you will need to stack TCP packets and reorder them properly.
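The reordering step can be sketched as follows. This is a deliberately simplified illustration, not the Deepfence implementation: the `Reassembler` type is hypothetical, it handles one direction of one TCP stream, and it assumes segments never overlap (retransmissions with different boundaries are not handled).

```rust
use std::collections::HashMap;

/// Minimal reassembler for one direction of a single TCP stream.
pub struct Reassembler {
    /// Sequence number of the next byte we expect.
    next_seq: u32,
    /// Segments that arrived early, keyed by their sequence number.
    pending: HashMap<u32, Vec<u8>>,
    /// Bytes delivered in order so far.
    stream: Vec<u8>,
}

impl Reassembler {
    /// `initial_seq` is the sequence number of the first payload byte.
    pub fn new(initial_seq: u32) -> Self {
        Reassembler {
            next_seq: initial_seq,
            pending: HashMap::new(),
            stream: Vec::new(),
        }
    }

    /// Accepts one segment, in any order, and appends every segment that
    /// has become contiguous to the in-order stream.
    pub fn push(&mut self, seq: u32, payload: &[u8]) {
        if payload.is_empty() {
            return;
        }
        self.pending.insert(seq, payload.to_vec());
        while let Some(data) = self.pending.remove(&self.next_seq) {
            // Sequence numbers wrap around at 2^32.
            self.next_seq = self.next_seq.wrapping_add(data.len() as u32);
            self.stream.extend_from_slice(&data);
        }
    }

    /// The bytes reassembled so far, in stream order.
    pub fn stream(&self) -> &[u8] {
        &self.stream
    }
}
```

Once the in-order stream is available, an HTTP parser can be run over it incrementally. A production version would also need to expire stale segments and track both directions of each connection.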
There are multiple ways to communicate back and forth between eBPF programs and user space programs. To quickly transmit data from eBPF to user space, PerfEventArray is one of the most efficient ways to do so. Fortunately, Aya also brings nice utilities around it.
Coming back to PerfEventArray, it was initially designed just to report metrics about traffic performance (hence the name), but in reality, you can use those arrays to pass any data you want. Aya makes it dead easy, as the same data type can be used for the PerfEventArray and your user space program.
In the future, we want to support ring buffers in Aya, which bring better performance and are supported by newer kernels. That work is in progress.
Let’s see how to transmit socket buffers to user space.
First, we define a custom data type:
#[derive(Copy, Clone, Debug, Hash, Eq, PartialEq)]
#[repr(C)]
pub struct OobBuffer {
    pub direction: TrafficDirection,
    pub size: usize,
}
PerfEventArray in eBPF programs:
#[map(name = "OOB_DATA")]
static mut OOB_DATA: PerfEventArray<OobBuffer> = PerfEventArray::new(0);
#[classifier(name = "my_ingress_endpoint")]
fn tc_cls_ingress(skb: SkBuffContext) -> i32 {
    unsafe {
        OOB_DATA.output(
            // Pass the context by reference, it is still used below
            &skb,
            &OobBuffer {
                direction: TrafficDirection::Ingress,
                size: skb.len() as usize,
            },
            // Number of packet bytes to append after the custom type
            skb.len(),
        );
    }
    TC_ACT_PIPE
}
PerfEventArray in user space program:
let mut oob_events: AsyncPerfEventArray<_> =
    bpf.map_mut("OOB_DATA").unwrap().try_into().unwrap();
for cpu_id in online_cpus()? {
    // One perf buffer per CPU
    let mut oob_cpu_buf = oob_events.open(cpu_id, Some(256))?;
    spawn(&format!("cpu_{}_perf_read", cpu_id), async move {
        loop {
            let mut bufs = (0..sizes.buf_count)
                .map(|_| BytesMut::with_capacity(128 * 4096))
                .collect::<Vec<_>>();
            let events = oob_cpu_buf.read_events(&mut bufs).await.unwrap();
            // Play with the received events in bufs
        }
    });
}
The PerfEventArray is a ring buffer, bound to the map name OOB_DATA in both the eBPF program and user space. The faster you retrieve events from the buffer, the fewer you are going to miss. Here, we open the PerfEventArray and spawn one Tokio task per CPU, because a separate perf buffer is allocated for each CPU. Then we start reading from it asynchronously. When an event is sent from eBPF, the user space task is woken up and starts reading the event. Note that our PerfEventArray data is composed of the custom type followed by the appended socket buffer. So to retrieve the underlying socket buffer, we can simply skip past the custom type and access the remaining bytes.
Here we are: we have a way to attach an eBPF program to TC and retrieve socket buffers in user space. The fun part can start!
The way Deepfence does deep packet inspection is by reordering the TCP frames and reconstructing HTTP messages from the socket buffer data gathered by eBPF. Once the HTTP payload is reconstructed, we apply rule matching to detect whether something malicious is present. If so, we generate an alert and notify customers. We handle the other application layer (L7) protocols the same way.
A rule defines how to detect specific malicious content in HTTP payloads. It also includes meta information on its purpose (the reason for its creation, or what it detects). It usually relies on substring search (finding a needle in a haystack), but also on regular expression matching. Matching happens on different parts of the HTTP message: headers, port, or even the HTML body. Needless to say, such operations are CPU intensive. Here is a rule example:
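Splitting an event into the custom type and the packet bytes can be sketched like this. `OobHeader` is a hypothetical, simplified stand-in for OobBuffer (two plain `u32` fields instead of an enum and a `usize`), and `split_event` is an illustrative helper, not an Aya API.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical, simplified stand-in for the custom type at the start of
/// every event. Plain integers keep every bit pattern valid.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OobHeader {
    pub direction: u32,
    pub size: u32,
}

/// Splits one event buffer into the header and the packet bytes that
/// follow it. Returns `None` if the buffer is too short for either part.
pub fn split_event(buf: &[u8]) -> Option<(OobHeader, &[u8])> {
    let hdr_len = size_of::<OobHeader>();
    if buf.len() < hdr_len {
        return None;
    }
    // SAFETY: the buffer holds at least `hdr_len` bytes, `OobHeader` only
    // contains integers, and `read_unaligned` does not require alignment.
    let header = unsafe { ptr::read_unaligned(buf.as_ptr() as *const OobHeader) };
    // Trim to the recorded size so trailing padding is not treated as data.
    let payload = buf[hdr_len..].get(..header.size as usize)?;
    Some((header, payload))
}
```

Using the recorded size rather than "everything after the header" is a design choice of this sketch: it guards against any trailing bytes in the buffer being mistaken for packet data.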
alert http $HOME_NET any -> any any (msg:"ET POLICY Outgoing Basic Auth Base64 HTTP Password detected unencrypted";
flow:established,to_server; threshold: type both, count 1, seconds 300, track by_src;
http.header; content:"Authorization|3a 20|Basic"; nocase; content:!"YW5vbnltb3VzOg==";
within:32; content:!"Proxy-Authorization|3a 20|Basic"; nocase;
content:!"KG51bGwpOihudWxsKQ==";
reference:url,doc.emergingthreats.net/bin/view/Main/2006380;
classtype:policy-violation; sid:2006380; rev:15;
metadata:created_at 2010_07_30, former_category POLICY, updated_at 2022_06_14;)
The rule above is an Emerging Threats rule. It can be roughly translated into: any HTTP payload containing the Authorization text and anything but YW5vbnltb3VzOg== next to it should trigger an alert saying that unencrypted passwords were found.
Finding the right set of rules is challenging, and applying them at the right time is crucial. At Deepfence, we aggregate rules from different sources and in different formats, such as Emerging Threats rules and the ModSecurity Core Rule Set. Users can also provide their own rules. We apply them to the live traffic captured by the eBPF program to achieve real-time alerting.
Deepfence, apart from network tracing, also focuses on monitoring processes and container workloads. That can be achieved by using a tracepoint eBPF program triggered by new processes in the system.
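To make the content checks of that rule concrete, here is a rough sketch of them in plain Rust. It is a simplification under stated assumptions: it only covers the `content` keywords (not `flow`, `threshold`, or the other options), it does not claim to reproduce the exact semantics of a real rule engine, and the function names are hypothetical. In the rule syntax, `|3a 20|` is the hex encoding of a colon followed by a space.

```rust
/// Case-insensitive (ASCII) substring search. Returns the index just past
/// the first match.
fn find_nocase(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() || haystack.len() < needle.len() {
        return None;
    }
    haystack
        .windows(needle.len())
        .position(|w| w.eq_ignore_ascii_case(needle))
        .map(|start| start + needle.len())
}

/// Case-sensitive substring search (`needle` must not be empty).
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    haystack.windows(needle.len()).any(|w| w == needle)
}

/// Returns true when the HTTP headers should raise the alert.
pub fn basic_auth_alert(headers: &[u8]) -> bool {
    // content:"Authorization|3a 20|Basic"; nocase;
    let end = match find_nocase(headers, b"Authorization: Basic") {
        Some(end) => end,
        None => return false,
    };
    // content:!"YW5vbnltb3VzOg=="; within:32;
    let window = &headers[end..headers.len().min(end + 32)];
    // The remaining negated contents suppress the alert when present.
    !contains(window, b"YW5vbnltb3VzOg==")
        && find_nocase(headers, b"Proxy-Authorization: Basic").is_none()
        && !contains(headers, b"KG51bGwpOihudWxsKQ==")
}
```

Even this small rule needs several passes over the header bytes, which gives a sense of why matching thousands of rules against live traffic is CPU intensive.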
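// Per-CPU scratch buffer for the filename. `FilenameBuf` is not shown here;
// it is assumed to be a #[repr(C)] struct wrapping a byte array named `buf`.
#[map]
pub static mut FILENAME_BUF: PerCpuArray<FilenameBuf> = PerCpuArray::with_max_entries(1, 0);
#[map]
pub static mut RUNC_EVENTS: PerfEventArray<RuncEvent> = PerfEventArray::new(0);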
#[tracepoint(name = "runc_tracepoint")]
pub fn runc_tracepoint(ctx: TracePointContext) -> i64 {
    match try_runc_tracepoint(ctx) {
        Ok(ret) => ret,
        Err(ret) => ret,
    }
}
fn try_runc_tracepoint(ctx: TracePointContext) -> Result<i64, i64> {
    // To check offset values:
    // sudo cat /sys/kernel/debug/tracing/events/sched/sched_process_exec/format
    const FILENAME_POINTER_OFFSET: usize = 8;
    // Borrow the per-CPU scratch buffer
    let buf = unsafe {
        let ptr = FILENAME_BUF.get_ptr_mut(0).ok_or(0i64)?;
        &mut *ptr
    };
    // Copy the NUL-terminated filename of the executed binary into it
    let filename = unsafe {
        let len = bpf_probe_read_kernel_str(
            (ctx.as_ptr() as *const u8).add(ctx.read_at::<u16>(FILENAME_POINTER_OFFSET)? as usize),
            &mut buf.buf,
        )?;
        core::str::from_utf8_unchecked(&buf.buf[..len])
    };
    if filename.ends_with("runc\0") {
        let pid = bpf_get_current_pid_tgid() as u32;
        let event = RuncEvent { pid };
        unsafe { RUNC_EVENTS.output(&ctx, &event, 0) };
    }
    Ok(0)
}
This program basically:
Of course, with such simple filtering, if someone names some binary foobar-runc (and it has nothing to do with the real runc), we have a problem. But let's deal with that in user space.
Here is the definition of RuncEvent; it just contains a PID:
#[derive(Debug, Clone)]
#[repr(C)]
pub struct RuncEvent {
    pub pid: u32,
}
Then we can consume the event in user space:
let mut runc_events: AsyncPerfEventArray<_> =
    bpf.map_mut("RUNC_EVENTS").unwrap().try_into().unwrap();
for cpu_id in online_cpus()? {
    let mut runc_buf = runc_events.open(cpu_id, Some(256))?;
    spawn(&format!("cpu_{}_runc_perf_read", cpu_id), async move {
        loop {
            let mut bufs = (0..sizes.buf_count)
                .map(|_| BytesMut::with_capacity(128 * 4096))
                .collect::<Vec<_>>();
            let events = runc_buf.read_events(&mut bufs).await.unwrap();
            // Only the first `events.read` buffers contain data
            for i in 0..events.read {
                let buf = &bufs[i];
                let ptr = buf.as_ptr() as *const RuncEvent;
                let event = unsafe { ptr.read_unaligned() };
                handle_runc_event(event.pid).unwrap();
            }
        }
    });
}
It’s better to define our logic in some other function, since the loop above is already complex enough. We can try to look for the actual process:
fn handle_runc_event(pid: u32) -> Result<(), anyhow::Error> {
    let p = Process::new(pid as i32)?;
    // do something with `p`, parse its cmdline
    // and check if it's actually runc
    Ok(())
}
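The "check if it's actually runc" step could look like the sketch below. It assumes the command line of the process has already been obtained as a list of arguments, and the helper name `is_runc` is hypothetical. Comparing the final path component exactly, instead of using a suffix match, is what rules out names like foobar-runc.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Returns true if the first argument of the command line refers to a
/// binary whose file name is exactly `runc`.
pub fn is_runc(cmdline: &[String]) -> bool {
    cmdline
        .first()
        .and_then(|argv0| Path::new(argv0).file_name())
        .map_or(false, |name| name == OsStr::new("runc"))
}
```

This only inspects the name; a stricter check might also compare the resolved executable path against the locations where runc is expected to be installed.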
That way of monitoring runc processes is agnostic to container engines (Docker, podman) or orchestration systems (Kubernetes and different CRI implementations), so it’s universal and fast. And based on container creation events, we are able to start parsing container configuration.
We use that solution for monitoring and scanning of container filesystems whenever a new container is created.
In this post we introduced you to eBPF and Aya, and how Deepfence leverages those technologies to reliably detect real customer security issues.
If you have questions or want to hear more, we encourage you to read our Deep Packet Inspection documentation and join the Deepfence Slack workspace!
If you want to learn about Aya, check out the Aya book.
Aya has a very active community on Discord, where conversations happen pretty much every day. We invite you to join and feel free to ask any questions related to Aya and eBPF!
Finally, if you’re interested in hacking on eBPF, Rust, and Kubernetes, reach out – careers(at)deepfence(dot)io