~bpf.sol | 0x00: Intro
WTF is SBFv2 and how Solana runs arbitrary code on-chain
The Solana VM enables safe, deterministic, and preemptive execution of arbitrary C and Rust code. Yet, public awareness of the program runtime's inner workings is notoriously spotty.
The bpf.wtf team has been working hard to shed some light on the historical blind spots: What do programs look like at the instruction level? How does the JIT compiler generate x86_64 code?
Basic knowledge of protocol design and binary reverse engineering may be helpful, but a CS background is not required. Let's dive in.
BPF (Berkeley Packet Filter) was originally used to filter and capture network packets, hence the name.
"Classic BPF", which is a decade older than me, started as a minimalistic instruction set. It allowed attaching arbitrary short programs to a network interface, which were then invoked for every packet.
It was born out of a need for reconfigurable packet processing that meets the performance demands of low-level networking. Logic had to execute up to millions of times per second.
BPF programs execute in the kernel, avoiding costly crossings of the kernel/userspace boundary. Notably, execution is sandboxed, so arbitrary bytecode can run safely.
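To make "a short program invoked for each packet" concrete, here is a minimal sketch of a classic-BPF-style filter interpreter. It models only a tiny, hypothetical subset (`Insn`, `LdAbsH`, `JeqK`, `Ret`, and `run_filter` are names made up for illustration): an accumulator, a big-endian half-word load from the packet, a conditional forward jump, and a return value that says how many bytes to keep (0 means drop). The out-of-bounds check shows where sandboxing happens.

```rust
/// A tiny, illustrative subset of classic BPF (hypothetical names).
/// Real cBPF also has an index register X, scratch memory, ALU ops, etc.
#[derive(Clone, Copy)]
enum Insn {
    /// A = big-endian u16 loaded from pkt[k..k + 2]
    LdAbsH(usize),
    /// Compare A with k, then skip `jt` (equal) or `jf` (not equal) instructions.
    JeqK { k: u32, jt: usize, jf: usize },
    /// Stop and return k: number of bytes to keep (0 = drop the packet).
    Ret(u32),
}

/// Run the filter once for one packet.
fn run_filter(prog: &[Insn], pkt: &[u8]) -> u32 {
    let mut a: u32 = 0; // accumulator register
    let mut pc: usize = 0;
    while let Some(&insn) = prog.get(pc) {
        pc += 1;
        match insn {
            Insn::LdAbsH(k) => {
                // Sandboxing: an out-of-bounds packet load ends the filter with "drop".
                match k.checked_add(2).and_then(|end| pkt.get(k..end)) {
                    Some(b) => a = u16::from_be_bytes([b[0], b[1]]) as u32,
                    None => return 0,
                }
            }
            // Skip counts are unsigned, so control flow can only move forward.
            Insn::JeqK { k, jt, jf } => {
                pc = pc.saturating_add(if a == k { jt } else { jf });
            }
            Insn::Ret(k) => return k,
        }
    }
    0 // ran off the end of the program: drop
}
```

For example, the program `[LdAbsH(12), JeqK { k: 0x0800, jt: 0, jf: 1 }, Ret(262144), Ret(0)]` keeps IPv4 frames (EtherType 0x0800 at byte offset 12 of an Ethernet header) and drops everything else, similar in shape to the filter `tcpdump -d ip` prints.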
During the following 29 years, BPF developed into eBPF, a new 64-bit general-purpose instruction set. Along came a new JIT compiler that turns programs into native machine code for common CPU architectures, including x86_64, armv8, POWER, and RISC-V.
Meanwhile, the LLVM compiler project added a BPF backend, allowing Clang (for C) and rustc (for Rust) to compile code to BPF.
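Whatever the source language, the compiler's output is a stream of fixed-size eBPF instructions: one opcode byte, a byte holding two 4-bit register numbers, a signed 16-bit offset, and a signed 32-bit immediate. Below is a minimal decoder sketch, assuming little-endian encoding (as on x86_64). `Insn`, `decode`, and `decode_program` are hypothetical names, and the one double-width instruction (`lddw`, which occupies two 8-byte slots) is not treated specially.

```rust
/// One decoded eBPF instruction slot (field names are our own).
#[derive(Debug, PartialEq)]
struct Insn {
    opcode: u8, // instruction class plus operation details
    dst: u8,    // destination register (4 bits)
    src: u8,    // source register (4 bits)
    off: i16,   // signed offset, e.g. for jumps and memory accesses
    imm: i32,   // signed immediate value
}

/// Decode one 8-byte slot, assuming little-endian encoding.
fn decode(b: &[u8; 8]) -> Insn {
    Insn {
        opcode: b[0],
        dst: b[1] & 0x0f, // low nibble
        src: b[1] >> 4,   // high nibble
        off: i16::from_le_bytes([b[2], b[3]]),
        imm: i32::from_le_bytes([b[4], b[5], b[6], b[7]]),
    }
}

/// Split raw bytecode into 8-byte slots; trailing partial bytes are ignored.
fn decode_program(bytes: &[u8]) -> Vec<Insn> {
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut slot = [0u8; 8];
            slot.copy_from_slice(chunk);
            decode(&slot)
        })
        .collect()
}
```

For example, `mov64 r0, 1` followed by `exit` encodes as `b7 00 00 00 01 00 00 00 95 00 00 00 00 00 00 00`, which decodes to opcode 0xb7 with `imm = 1`, then opcode 0x95.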
Today, eBPF is as widespread as ever.
- It defends network infrastructure at terabit scale via XDP and socket filters. You can even offload XDP programs to commercially available "SmartNIC" hardware!
- Within the Cilium project, it powers routing and load balancing for clusters of thousands of servers.
- It serves to express flexible system security policies via Seccomp filters.
Still, most eBPF implementations had a severe limitation by design: the bytecode verifier.
BPF on Solana
So how does this all relate to the Solana project?
If you're new: the Solana project can be described as a publicly accessible database of programs, replicated across thousands of nodes. Access is permissionless: anyone can deploy arbitrary programs ("smart contracts"), and anyone can invoke them. As of now, it processes ~10,000 program invocations per second from people and bots all over the world.
And you guessed it: the programs use a variant of eBPF! More precisely, they are SBF bytecode, where SBF probably stands for Solana Bytecode Format.
TL;DR: Trading compile-time checks for runtime checks.
The near removal of the bytecode verifier stands out the most, and it makes the SBF VM Turing-complete. Moving safety checks to the runtime allows arbitrary (runtime-checked) memory accesses, indirect jumps, loops, and other interesting behavior. Other changes include more syscalls/precompiles and the removal of unused features such as the input/output opcodes (some x86 déjà vu).
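Without a verifier proving memory safety ahead of time, every load and store has to be checked while the program runs. Here is a minimal sketch of such a runtime check, assuming VM memory is split into regions mapped at virtual addresses. `Region`, `Memory`, `AccessError`, and the example addresses are hypothetical and not rbpf's actual API or memory layout. A bad access turns into an error that aborts execution instead of touching host memory.

```rust
/// A contiguous piece of VM memory mapped at a virtual address (hypothetical).
struct Region {
    vm_addr: u64,
    data: Vec<u8>,
    writable: bool,
}

#[derive(Debug, PartialEq)]
enum AccessError {
    OutOfBounds { vm_addr: u64, len: u64 },
    ReadOnly { vm_addr: u64 },
}

struct Memory {
    regions: Vec<Region>,
}

impl Memory {
    /// Find the region fully containing [vm_addr, vm_addr + len) and return
    /// (region index, offset into that region's host buffer).
    fn locate(&self, vm_addr: u64, len: u64) -> Result<(usize, usize), AccessError> {
        let end = vm_addr
            .checked_add(len)
            .ok_or(AccessError::OutOfBounds { vm_addr, len })?;
        for (i, r) in self.regions.iter().enumerate() {
            let r_end = r.vm_addr + r.data.len() as u64;
            if vm_addr >= r.vm_addr && end <= r_end {
                return Ok((i, (vm_addr - r.vm_addr) as usize));
            }
        }
        Err(AccessError::OutOfBounds { vm_addr, len })
    }

    /// Checked 8-byte little-endian load.
    fn load_u64(&self, vm_addr: u64) -> Result<u64, AccessError> {
        let (i, off) = self.locate(vm_addr, 8)?;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&self.regions[i].data[off..off + 8]);
        Ok(u64::from_le_bytes(buf))
    }

    /// Checked 8-byte little-endian store; read-only regions reject writes.
    fn store_u64(&mut self, vm_addr: u64, value: u64) -> Result<(), AccessError> {
        let (i, off) = self.locate(vm_addr, 8)?;
        let r = &mut self.regions[i];
        if !r.writable {
            return Err(AccessError::ReadOnly { vm_addr });
        }
        r.data[off..off + 8].copy_from_slice(&value.to_le_bytes());
        Ok(())
    }
}
```

For example, with a read-only region at `0x1_0000_0000` and a writable 4 KiB region at `0x2_0000_0000`, storing to `0x2_0000_0008` succeeds, storing to the read-only region fails with `ReadOnly`, and loading 8 bytes from `0x2_0000_0ffc` fails with `OutOfBounds` because the access straddles the region's end.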
Rejecting most backward jumps (direct or indirect) is an infamous limitation of the Linux eBPF verifier. By restricting a program's control-flow graph to a DAG (directed acyclic graph), it sidesteps the halting problem: it can prove that every possible execution path of an eBPF program is finite. Obviously, people still need loops, so Clang unrolls loops when targeting eBPF.
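The forward-only rule can be sketched as a single pass over the bytecode. This is a deliberately simplified model, assuming a hypothetical `Insn` type whose jump targets follow the eBPF convention `pc + 1 + off`. The real Linux verifier does far more (for example, tracking register types and value ranges), and newer kernels accept some bounded loops.

```rust
/// Simplified instruction: only what the check below needs (hypothetical).
#[derive(Clone, Copy)]
enum Insn {
    Other,
    /// Jump relative to the next instruction: target = pc + 1 + off.
    Jump { off: i16 },
    Exit,
}

#[derive(Debug, PartialEq)]
enum VerifyError {
    BackwardJump { pc: usize },
    JumpOutOfBounds { pc: usize },
    NoExit,
}

/// Accept only programs whose jumps all point strictly forward. That keeps
/// the control-flow graph acyclic, so every path runs at most prog.len() steps.
fn verify_forward_only(prog: &[Insn]) -> Result<(), VerifyError> {
    for (pc, insn) in prog.iter().enumerate() {
        if let Insn::Jump { off } = *insn {
            if off < 0 {
                // off == -1 jumps to itself; anything lower jumps further back.
                return Err(VerifyError::BackwardJump { pc });
            }
            let target = pc + 1 + off as usize;
            if target >= prog.len() {
                return Err(VerifyError::JumpOutOfBounds { pc });
            }
        }
    }
    // The last instruction must end execution so no path falls off the end.
    match prog.last() {
        Some(Insn::Exit) => Ok(()),
        _ => Err(VerifyError::NoExit),
    }
}
```

A loop such as `[Other, Jump { off: -2 }, Exit]` is rejected at `pc = 1`, while an unrolled, straight-line version of the same logic passes.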
But that's just one way to avoid the halting problem. Solana opted for the pragmatic approach of simply killing any execution that takes too long (a.k.a. preemptive execution). The compute unit meter counts the instructions and syscalls executed. If the count exceeds the limit, the transaction reverts, the sender still pays a tiny fine (the transaction fee), and the chain moves on.
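Here is a minimal sketch of preemptive metering, assuming a hypothetical toy instruction set and a flat cost of one compute unit per instruction. Real costs differ; syscalls, for instance, are metered separately. Unlike the Linux verifier approach, loops are allowed here: the meter, not a static proof, guarantees that execution ends.

```rust
/// A toy register machine (hypothetical); backward jumps, i.e. loops, are allowed.
#[derive(Clone, Copy)]
enum Insn {
    /// r[dst] += imm (wrapping)
    AddImm { dst: usize, imm: i64 },
    /// if r[reg] != 0 { pc = pc + 1 + off }
    JumpIfNonZero { reg: usize, off: i64 },
    /// Stop and return r[0].
    Exit,
}

#[derive(Debug, PartialEq)]
enum ExecError {
    ComputeBudgetExceeded,
    BadRegister,
    PcOutOfBounds,
}

/// Execute `prog` with a compute budget, charging 1 unit per instruction.
/// Returns (r0, units consumed), or an error that aborts the whole run.
fn run_metered(prog: &[Insn], budget: u64) -> Result<(i64, u64), ExecError> {
    let mut r = [0i64; 4];
    let mut pc: usize = 0;
    let mut remaining = budget;
    loop {
        // Preemption: charge before executing; an empty meter stops the program.
        if remaining == 0 {
            return Err(ExecError::ComputeBudgetExceeded);
        }
        remaining -= 1;
        match *prog.get(pc).ok_or(ExecError::PcOutOfBounds)? {
            Insn::AddImm { dst, imm } => {
                let reg = r.get_mut(dst).ok_or(ExecError::BadRegister)?;
                *reg = reg.wrapping_add(imm);
                pc += 1;
            }
            Insn::JumpIfNonZero { reg, off } => {
                let value = *r.get(reg).ok_or(ExecError::BadRegister)?;
                if value != 0 {
                    let target = (pc as i64 + 1)
                        .checked_add(off)
                        .filter(|t| *t >= 0)
                        .ok_or(ExecError::PcOutOfBounds)?;
                    pc = target as usize;
                } else {
                    pc += 1;
                }
            }
            Insn::Exit => return Ok((r[0], budget - remaining)),
        }
    }
}
```

A countdown loop that runs three iterations consumes 11 units and succeeds with a budget of 11 or more, while a loop that never exits always ends in `ComputeBudgetExceeded`. In a real chain, the caller would discard the program's state changes on any such error, which corresponds to the transaction reverting.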
Last but not least, the VM has been rewritten in Rust to be embedded into the Solana validator process.
We hope this first look at Solana on-chain bytecode was informative. Now you can join Discord flamewars about whether SBF is "just eBPF". In the next post, we'll take a look at the security constraints of the VM at large.
I can't count, but here's the next post.
There are far too many contributors to BPF itself to name them all, but I'd like to highlight the authors of the BPF runtime implementation that powers Solana today.
- Quentin Monnet, a PhD computer scientist and networking researcher. He is the original author of rbpf.
- Alessandro Decina, who pioneered Rust support for BPF.
- Lichtso, Jack May, Dmitri Makarov, et al., engineers at Solana Labs.
Ping us if someone is missing from this list!