Need Moar Glass
The age of the ELF and DWARF is upon us.
The predominant ideology around public-permissionless blockchains is absolute code transparency, going as far as to question the right of closed-source projects to exist.
Even though data on platforms like Solana is inherently accessible to everyone, there's the harsh reality of (accidental) code obfuscation: the disassembly of a Solana program release build looks like something out of an average movie hacking scene.
And for a while, this has been the extent of the insights gained by looking at Solana programs. Now add 9 digits worth of assets under management to the mix. You start to face a problem: How can we ensure that programs consisting of tens of thousands of BPF instructions behave as we expect them to?
What we can see today
So, over the past few months, buildoors have pushed substantial upgrades to low-level tooling for Solana. Here are some of the tools that are available today to untangle on-chain programs.
Ghidra is an open-source framework for reverse engineering developed by the NSA's Research Directorate (yeah, that NSA). Originally used for malware analysis, its wide range of supported architectures makes it ubiquitous wherever source code is missing.
A notable gap is support for BPF, the instruction set used by the Solana VM. SolDragon by Neodyme is starting to fix that. While not ready yet (as of 2022-04), it brings us closer to a decompiler and binary differ.
Observability extends to the human element too.
You'd think that DeFi firms would treat whitehats like their kneecaps depended on them (they kinda do). To the horror of the researchers at Neodyme, disclosing vulnerabilities to a program deployer can be quite the challenge.
And so, the solana-security-txt standard was born. It ships a Rust macro to embed standardized contact info into Solana programs. The result is a dead-simple process to tell web3 devs what's wrong, analogous to the securitytxt.org standard for websites.
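To make the idea concrete, here is a rough sketch of how a researcher could pull such contact info back out of a program binary. It assumes the embedded block is a begin marker, a run of NUL-terminated key/value strings, and an end marker; the marker strings and the function names below are our assumptions for illustration, so check the solana-security-txt docs before relying on them.

```rust
use std::collections::BTreeMap;

// Assumed marker strings delimiting the embedded block.
const BEGIN: &[u8] = b"=======BEGIN SECURITY.TXT V1=======\0";
const END: &[u8] = b"=======END SECURITY.TXT V1=======\0";

/// Returns the index of the first occurrence of `needle` in `haystack`.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Extracts key/value pairs from a program binary (hypothetical helper).
/// Returns `None` if no complete block is found.
fn parse_security_txt(binary: &[u8]) -> Option<BTreeMap<String, String>> {
    let start = find(binary, BEGIN)? + BEGIN.len();
    let len = find(&binary[start..], END)?;
    let body = &binary[start..start + len];

    // The body looks like `key\0value\0key\0value\0`.
    let mut parts: Vec<&[u8]> = body.split(|b| *b == 0).collect();
    // Splitting leaves one empty piece after the final NUL; drop it.
    if parts.last().map_or(false, |p| p.is_empty()) {
        parts.pop();
    }

    let mut fields = BTreeMap::new();
    for pair in parts.chunks(2) {
        // A dangling key without a value is ignored.
        if let [key, value] = pair {
            fields.insert(
                String::from_utf8_lossy(key).into_owned(),
                String::from_utf8_lossy(value).into_owned(),
            );
        }
    }
    Some(fields)
}
```

Feeding it the raw bytes of a deployed program would then give you a map with entries such as `name` and `contacts`, if the deployer embedded them.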
Open-source smart contracts are not quite enough. We need to verify that the provided source code translates exactly into the bytecode deployed on-chain.
The Anchor framework recently gained Verifiable Builds, which catch even the tiniest bit flip.
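The core of such a check is simple enough to sketch. The snippet below is not Anchor's implementation; it is a minimal illustration that compares a locally built binary against the bytes fetched from chain, assuming that any extra space in the on-chain account is zero padding. Real tooling would typically also build inside a pinned environment and compare hashes. Both function names are made up for this example.

```rust
/// Strips trailing zero bytes. Assumption: on-chain program accounts may be
/// larger than the binary, with the remainder filled with zeros.
fn trim_trailing_zeros(bytes: &[u8]) -> &[u8] {
    let end = bytes.iter().rposition(|b| *b != 0).map_or(0, |i| i + 1);
    &bytes[..end]
}

/// Returns the offset of the first differing byte, or `None` if the local
/// build and the on-chain bytes match after padding is ignored.
fn first_mismatch(local: &[u8], onchain: &[u8]) -> Option<usize> {
    let (a, b) = (trim_trailing_zeros(local), trim_trailing_zeros(onchain));
    a.iter()
        .zip(b)
        .position(|(x, y)| x != y)
        // No differing byte in the common prefix: lengths decide the result.
        .or_else(|| {
            if a.len() != b.len() {
                Some(a.len().min(b.len()))
            } else {
                None
            }
        })
}
```

Returning the offset rather than a plain boolean is a deliberate choice: when a build does not reproduce, the first differing byte is the natural place to start digging.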
Is this enough?
The aforementioned tools give us better visibility into Solana byte code and strong assurance that said bytes are derived from specific source code.
Yet, we need more glass!
Let's start at the byte code this time; consider this fictional scenario: You got some alpha about a vulnerability at the 28306th instruction of some contract, and go disassemble.
You look bewildered at this mix of bytes. You grep the source code for 0xf0f0f0f0f0f0f0f and nothing comes up. What's going on here? Why would a smart contract use such a weird integer?
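For context, this is roughly what going from an instruction number to a constant involves. The sketch assumes the classic eBPF encoding (8-byte slots; opcode, two 4-bit registers, a 16-bit offset, a 32-bit immediate, all little-endian) and that instruction numbers count slots from the start of the text section. The `lddw` instruction (opcode 0x18) occupies two slots and splits its 64-bit constant across both immediates. The type and function names are ours.

```rust
/// One decoded 8-byte instruction slot (classic eBPF layout, assumed here).
#[derive(Debug, PartialEq)]
struct Insn {
    opcode: u8,
    dst: u8,
    src: u8,
    off: i16,
    imm: i32,
}

const INSN_SIZE: usize = 8;
const LDDW: u8 = 0x18; // load 64-bit immediate, spans two slots

/// Decodes the slot at instruction `index` within the raw text section.
fn decode(text: &[u8], index: usize) -> Option<Insn> {
    let start = index.checked_mul(INSN_SIZE)?;
    let raw = text.get(start..start.checked_add(INSN_SIZE)?)?;
    Some(Insn {
        opcode: raw[0],
        dst: raw[1] & 0x0f, // low nibble
        src: raw[1] >> 4,   // high nibble
        off: i16::from_le_bytes([raw[2], raw[3]]),
        imm: i32::from_le_bytes([raw[4], raw[5], raw[6], raw[7]]),
    })
}

/// Reassembles the 64-bit constant of an `lddw` at `index`:
/// low 32 bits from this slot, high 32 bits from the next one.
fn lddw_constant(text: &[u8], index: usize) -> Option<u64> {
    let lo = decode(text, index)?;
    if lo.opcode != LDDW {
        return None;
    }
    let hi = decode(text, index.checked_add(1)?)?;
    Some((lo.imm as u32 as u64) | ((hi.imm as u32 as u64) << 32))
}
```

This gets you the number and the register it lands in, and nothing more. Which Rust function asked for that number is exactly the information that is missing.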
It slowly dawns on you that the state of low-level Solana tooling is still all-or-nothing.
The above was a dramatic re-enactment of when I realized that we don't actually have a way to correlate BPF sub-routines to Rust functions.
Unfortunately, we're still ignoring the intermediate build steps and valuable DWARF debug symbols that would have mapped byte code to line numbers.
We also can't interactively debug BPF code yet.
Shining light on Solana ELFs using DWARFs
The bpf.wtf team formed after we noticed we were hacking on the same ideas. As a loose group of devs, we'll ship various non-profit open-source contributions and public goods for Solana program devs & security researchers.
The following is a non-exhaustive list of the things we'll be working on to help support the security research ecosystem.
We're starting with bpf.sol, a series of writeups on the internals of the Solana program runtime. We'll try to release a post every two weeks, each documenting a part of the virtual machine as we descend the stack.
DWARF is the industry standard for debug info in ELF executables. It enriches a binary with various info that gets lost when compiling to machine code, such as symbol names, data type info, and mappings to source code (line numbers).
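The line-number part is the easiest to picture. Once a DWARF line table has been decoded, it is conceptually a sorted list of rows, and finding the source line for a program counter is a binary search for the last row at or below that address. The sketch below skips the actual DWARF decoding (the real format is a compact state-machine program, not a flat table); `LineRow` and `lookup` are hypothetical names.

```rust
/// One row of an already-decoded line table (simplified, hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct LineRow {
    address: u64,
    file: &'static str,
    line: u32,
}

/// Finds the row covering `pc`. `rows` must be sorted by ascending address.
fn lookup(rows: &[LineRow], pc: u64) -> Option<&LineRow> {
    // Number of rows whose address is <= pc; the last of those covers pc.
    let count = rows.partition_point(|row| row.address <= pc);
    count.checked_sub(1).map(|i| &rows[i])
}
```

With a table like this, "instruction at byte offset 0x120" turns into "lib.rs, line 11" instead of a wall of hex.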
So to kick off, we've fixed Solana's LLVM fork to re-enable DWARF support for the bpfel+solana target. The ability to create debug symbols for Solana C or Rust on-chain programs is a first major upgrade in visibility.
The VM maintainers at Solana Labs have helped us get our first contribution merged. Expect full debug info support in the next release of
Even with DWARF support, debug info is stripped from release builds by default because of binary bloat. In fact, some Rust programs produce larger release+debuginfo builds than just
GDB and LLDB support
Next up, @wj has been working on a proof-of-concept connecting the rbpf virtual machine to a debugger via the GDB remote serial protocol.
Devs and hackers will be able to introspect every aspect of program execution (registers, stack, memory, read-only data) with per-instruction granularity.
We expect to polish and ship this feature in Q2/2022.
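If you haven't met the GDB remote serial protocol before: it is a simple text protocol in which each packet is framed as `$<payload>#<checksum>`, where the checksum is the sum of the payload bytes modulo 256, written as two hex digits. The sketch below shows only that framing. It ignores escaping, acknowledgements and run-length encoding, and it is not taken from our proof-of-concept; the function names are ours.

```rust
/// Sum of all payload bytes modulo 256.
fn checksum(payload: &[u8]) -> u8 {
    payload.iter().fold(0u8, |acc, b| acc.wrapping_add(*b))
}

/// Wraps a payload into a `$payload#xx` packet.
/// Assumes the payload contains no characters that need escaping.
fn frame(payload: &str) -> String {
    format!("${}#{:02x}", payload, checksum(payload.as_bytes()))
}

/// Validates a packet and returns its payload, or `None` if the framing
/// or the checksum is wrong.
fn unframe(packet: &str) -> Option<&str> {
    let rest = packet.strip_prefix('$')?;
    let (payload, sum) = rest.rsplit_once('#')?;
    if sum.len() != 2 || !sum.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let expected = u8::from_str_radix(sum, 16).ok()?;
    (checksum(payload.as_bytes()) == expected).then(|| payload)
}
```

For example, `frame("g")` (the "read registers" request) produces `$g#67`, and a stub answering `OK` sends `$OK#9a`.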
This integration involves work on two fronts.
- First, the "backend", i.e. the target being debugged, needs to be modified to accept debugger commands: setting breakpoints, interrupting execution, etc. User Sladuca made a lot of progress last year, though development appears to have stopped: https://github.com/solana-labs/solana/issues/14756
- The frontends (GDB and LLDB) have to be taught the machine architecture and ABI details like stack frame layouts and calling convention.
The bpf.wtf project is continuing on both fronts, mainly focusing on LLDB.
Visual debugger frontend
Let's get with the times – the GDB command line is nice, but off-putting to noobies (like me) due to its learning curve. One of our major milestones is to integrate a GUI-based debugger frontend with the Solana VM.
One such option is CodeLLDB, a Visual Studio Code plugin.
Solana VM on WebAssembly
It's well known that Solana validators are beasts – infra people run them with 512GB of memory and powerful server CPUs. Still, the Sealevel smart contract runtime takes less than a millisecond of CPU time to actually execute code. The vast majority of validator resources are spent on moving accounts from/to memory. In theory, any isolated on-chain program can easily run on a Raspberry Pi.
With a bit of effort, we were able to wrap the Sealevel runtime in an (unreleased) portable C library creatively named
Client-side execution of contracts further enables ways to simulate transactions that are impossible with the RPC API. This includes features necessary for interactive debugging (single-stepping, breakpoints, machine introspection).
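To illustrate what single-stepping and breakpoints need from a VM, here is a deliberately tiny toy interpreter. It is not rbpf and does not execute BPF; the three-instruction `Op` set and all names are invented for this example. The point is only the control flow: the host owns the loop, so it can stop after any instruction and inspect state.

```rust
use std::collections::HashSet;

/// Toy instruction set for illustration only (NOT BPF).
#[derive(Debug, Clone, Copy)]
enum Op {
    LoadImm(usize, u64), // regs[dst] = value
    Add(usize, usize),   // regs[dst] += regs[src]
    Exit,
}

#[derive(Debug, PartialEq)]
enum Stop {
    Stepped,
    Breakpoint(usize),
    Exited,
}

struct Vm {
    pc: usize,
    regs: [u64; 4], // register indices in `Op` must be below 4
    program: Vec<Op>,
    breakpoints: HashSet<usize>,
}

impl Vm {
    fn new(program: Vec<Op>) -> Self {
        Vm { pc: 0, regs: [0; 4], program, breakpoints: HashSet::new() }
    }

    /// Executes exactly one instruction.
    fn step(&mut self) -> Stop {
        match self.program.get(self.pc).copied() {
            Some(Op::LoadImm(dst, value)) => self.regs[dst] = value,
            Some(Op::Add(dst, src)) => {
                self.regs[dst] = self.regs[dst].wrapping_add(self.regs[src])
            }
            // Running off the end of the program is treated like an exit.
            Some(Op::Exit) | None => return Stop::Exited,
        }
        self.pc += 1;
        Stop::Stepped
    }

    /// Runs until the next breakpoint or until the program exits.
    /// Steps once before checking, so resuming from a breakpoint progresses.
    fn resume(&mut self) -> Stop {
        loop {
            if self.step() == Stop::Exited {
                return Stop::Exited;
            }
            if self.breakpoints.contains(&self.pc) {
                return Stop::Breakpoint(self.pc);
            }
        }
    }
}
```

None of this is possible through an RPC simulation call, which only hands back the final result and logs.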
The team managed to create a build of the rbpf virtual machine targeting wasm32-unknown-unknown (wasm-pack) through a bit of refactoring. Once done, a Wasm build with a JS wrapper will be released as an NPM package.
Interactive Solana Explorer Debugger
To recap, we're working on …
- Debug info (DWARF)
- Debugging of historical executions
- A visual debugging frontend
- Contract execution outside of blockchain nodes
As you may have noticed, these milestones set us up to achieve our final goal:
>simulating/debugging Solana programs directly in the public Solana explorer.
As an unpaid/not-for-profit team of nerds, we're not sure if we'll ever reach the endgame. If it works out, you'll see us at Breakpoint. :)
Anyways, thanks for checking out our work! We hope you're as hyped as we are.
Follow us on Twitter for the latest developments and occasional shitposts. If you want to help out, please DM us.