~bpf.sol | 0x03: SBF Instruction Set
All Solana Bytecode Format (SBF) opcodes
Sources
The family of eBPF ISAs lacks the kind of specification document you may know from other architectures. Instead, each flavor of eBPF is defined by its VM implementation. As such, Solana's SBF is currently defined by the solana_rbpf Rust crate.
All info below was compiled from these sources:
- solana_rbpf::ebpf, which defines SBF opcodes.
  https://github.com/solana-labs/rbpf/blob/main/src/ebpf.rs
- solana_rbpf::disassembler, which implements rbpf-style disassembly.
  https://github.com/solana-labs/rbpf/blob/main/src/disassembler.rs
- The BPF target of Solana's LLVM fork, which defines the register model, bytecode encoding, and pseudocode assembly form.
  https://github.com/solana-labs/llvm-project/tree/solana-rustc/13.0-2021-09-30/llvm/lib/Target/BPF
- The Capstone BPF backend, which includes an eBPF disassembler.
  https://github.com/capstone-engine/capstone/tree/next/arch/BPF
Note: This source is not authoritative. It may be outdated or conflict with actual on-chain behavior. If you find any such bugs, please consider sending corrections. And if this document helps you build software, please attribute this source – thanks!
Assembler
Several flavors of SBF disassembly exist. Unfortunately, they are not mutually compatible.
- rbpf-style: prefix notation with C-style indirect addressing.
  Seen in rbpf-cli and tools based on the Rust implementation of Solana.
- Capstone-style: similar to rbpf-style with different ALU mnemonics.
  Seen in binary analysis tools like radare2.
- LLVM-style: infix notation resembling C pseudocode.
  Seen with LLVM tools used in the build process.
Opcode Tables
Legacy Load/Store class
The BPF_LD opcode family consists of special-purpose opcodes inherited from Linux eBPF, where they were used to access network packet input data.
Opcode | rbpf | Capstone | LLVM | Notes |
---|---|---|---|---|
0x18 | lddw | lddw | r1 = 0x42 ll | Two insns long |
0x30 | ldabsb | | r1 = *(u8 *)skb[3] | Deprecated |
0x28 | ldabsh | | r1 = *(u16 *)skb[3] | Deprecated |
0x20 | ldabsw | | r1 = *(u32 *)skb[3] | Deprecated |
0x38 | ldabsdw | | r1 = *(u64 *)skb[3] | Deprecated |
0x50 | ldindb | | r1 = *(u8 *)skb[r0] | Deprecated |
0x48 | ldindh | | r1 = *(u16 *)skb[r0] | Deprecated |
0x40 | ldindw | | r1 = *(u32 *)skb[r0] | Deprecated |
0x58 | ldinddw | | r1 = *(u64 *)skb[r0] | Deprecated |
ldabs* formerly accessed instruction input data starting at 0x4_0000_0000, and ldind* was the indirect version thereof. These instructions are now disabled and have been replaced by the new load/store class.
Only lddw remains enabled on mainnet today. It moves a 64-bit immediate into a GPR, which makes it more of a 64-bit variant of mov32 than an actual load instruction. (Note that mov64 with an immediate is a synonym for mov32; the name is misleading.)
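To illustrate the "two insns long" note for lddw, here is a minimal Python sketch that reassembles the 64-bit immediate from two consecutive instruction slots. The slot layout (8 bytes: opcode, register byte, 16-bit offset, 32-bit immediate, all little-endian) and the placement of the upper 32 bits in the second slot's immediate follow the usual eBPF encoding and are assumptions here, not something stated above. The function names are made up for this example.

```python
import struct

LDDW = 0x18  # opcode from the table above


def decode_slot(raw: bytes):
    """Split one 8-byte slot into (opcode, dst, src, offset, imm).

    Assumed layout: u8 opcode, u8 registers (low nibble = dst,
    high nibble = src), i16 offset, i32 immediate, little-endian.
    """
    opcode, regs, offset, imm = struct.unpack("<BBhi", raw)
    return opcode, regs & 0x0F, regs >> 4, offset, imm


def decode_lddw(code: bytes, pc: int = 0):
    """Return (dst register, 64-bit immediate) for the lddw at slot `pc`."""
    first = decode_slot(code[pc * 8 : pc * 8 + 8])
    second = decode_slot(code[pc * 8 + 8 : pc * 8 + 16])
    if first[0] != LDDW:
        raise ValueError("not an lddw instruction")
    low = first[4] & 0xFFFF_FFFF    # lower half from the first slot
    high = second[4] & 0xFFFF_FFFF  # upper half from the second slot
    return first[1], (high << 32) | low


# lddw r1, 0x1122334455667788 encoded as two slots
code = bytes.fromhex("1801000088776655" "0000000044332211")
dst, value = decode_lddw(code)
print(f"lddw r{dst}, {value:#x}")
```

A disassembler built this way has to advance the program counter by two slots after an lddw, since the second slot is not an instruction of its own.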
Load/Store class
The BPF_LDX and BPF_STX classes provide common memory operations.
Memory is referenced using a base address from a GPR and an immediate offset.
Three operation groups are available:
- ldx*: Load value from memory into register (8-bit, 16-bit, 32-bit, 64-bit)
- st*: Store immediate value into memory (8-bit, 16-bit, 32-bit, 32-bit zero extended to 64-bit)
- stx*: Store value from register into memory (8-bit, 16-bit, 32-bit, 64-bit)
Opcode | rbpf | Capstone | LLVM | Notes |
---|---|---|---|---|
0x71 | ldxb | ldxb | r1 = *(u8 *)(r2 + 42) | |
0x69 | ldxh | ldxh | r1 = *(u16 *)(r2 + 42) | |
0x61 | ldxw | ldxw | r1 = *(u32 *)(r2 + 42) | |
0x79 | ldxdw | ldxdw | r1 = *(u64 *)(r2 + 42) | |
0x72 | stb | stb | *(u8 *)(r2 + 42) = 69 | |
0x6a | sth | sth | *(u16 *)(r2 + 42) = 69 | |
0x62 | stw | stw | *(u32 *)(r2 + 42) = 69 | |
0x7a | stdw | stdw | *(u64 *)(r2 + 42) = 69 | Immediate is 32-bit |
0x73 | stxb | stxb | *(u8 *)(r2 + 42) = r1 | |
0x6b | stxh | stxh | *(u16 *)(r2 + 42) = r1 | |
0x63 | stxw | stxw | *(u32 *)(r2 + 42) = r1 | |
0x7b | stxdw | stxdw | *(u64 *)(r2 + 42) = r1 |
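The opcodes in this table are not arbitrary: each one can be split into a class, a size, and an addressing mode. The sketch below rebuilds the rbpf mnemonic from those bit fields. The field masks and constants are taken from the conventional eBPF opcode layout and are an assumption of this example; they do reproduce every row of the table above.

```python
# Instruction class: lowest three bits of the opcode.
CLASSES = {0x01: "ldx", 0x02: "st", 0x03: "stx"}
# Access size: bits 3 and 4.
SIZES = {0x00: "w", 0x08: "h", 0x10: "b", 0x18: "dw"}
# Addressing mode: top three bits; 0x60 means "register base + offset".
MODE_MEM = 0x60


def load_store_mnemonic(opcode: int) -> str:
    """Derive the rbpf-style mnemonic of a load/store opcode."""
    cls = opcode & 0x07
    size = opcode & 0x18
    mode = opcode & 0xE0
    if cls not in CLASSES or mode != MODE_MEM:
        raise ValueError(f"{opcode:#04x} is not a load/store opcode")
    return CLASSES[cls] + SIZES[size]


for opcode in (0x71, 0x69, 0x61, 0x79, 0x72, 0x7A, 0x6B, 0x7B):
    print(f"{opcode:#04x} -> {load_store_mnemonic(opcode)}")
```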
64-bit ALU class
The 64-bit ALU instructions operate on general-purpose registers. The stack pointer (r10) can only be used as a source operand.
Each operation has two forms:
- rD ← OP(rD, imm)
- rD ← OP(rD, rS)
Opcode | rbpf | Capstone | LLVM | Notes |
---|---|---|---|---|
0x07 | add64 | add64 | r1 += 0x42 | |
0x0f | add64 | add64 | r1 += r2 | |
0x17 | sub64 | sub64 | r1 -= 0x42 | |
0x1f | sub64 | sub64 | r1 -= r2 | |
0x27 | mul64 | mul64 | r1 *= 0x42 | |
0x2f | mul64 | mul64 | r1 *= r2 | |
0x37 | div64 | div64 | r1 /= 0x42 | |
0x3f | div64 | div64 | r1 /= r2 | |
0x47 | or64 | or64 | r1 |= 0x42 | |
0x4f | or64 | or64 | r1 |= r2 | |
0x57 | and64 | and64 | r1 &= 0x42 | |
0x5f | and64 | and64 | r1 &= r2 | |
0x67 | lsh64 | lsh64 | r1 <<= 0x42 | |
0x6f | lsh64 | lsh64 | r1 <<= r2 | |
0x77 | rsh64 | rsh64 | r1 >>= 0x42 | |
0x7f | rsh64 | rsh64 | r1 >>= r2 | |
0x87 | neg64 | neg64 | r1 = -r1 | |
0x97 | mod64 | mod64 | r1 %= 0x42 | |
0x9f | mod64 | mod64 | r1 %= r2 | |
0xa7 | xor64 | xor64 | r1 ^= 0x42 | |
0xaf | xor64 | xor64 | r1 ^= r2 | |
0xb7 | mov64 | mov64 | r1 = 0x42 | same as mov32 |
0xbf | mov64 | mov64 | r1 = r2 | |
0xc7 | arsh64 | arsh64 | r1 s>>= 0x42 | |
0xcf | arsh64 | arsh64 | r1 s>>= r2 | |
0xe7 | sdiv64 | | | |
0xef | sdiv64 | | | |
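The two forms of each operation differ only in one opcode bit. As a rough sketch, the following Python function splits a 64-bit ALU opcode into its operation and operand kind. The masks (0x07 for the class, 0x08 for the source bit, 0xf0 for the operation) are assumed from the standard eBPF layout; decode_alu64 is a name invented for this example.

```python
ALU64_CLASS = 0x07
SOURCE_REG = 0x08  # set: second operand is a register; clear: immediate

OPERATIONS = {
    0x00: "add", 0x10: "sub", 0x20: "mul", 0x30: "div",
    0x40: "or", 0x50: "and", 0x60: "lsh", 0x70: "rsh",
    0x80: "neg", 0x90: "mod", 0xA0: "xor", 0xB0: "mov",
    0xC0: "arsh", 0xE0: "sdiv",
}


def decode_alu64(opcode: int):
    """Return (mnemonic, operand kind) for a 64-bit ALU opcode."""
    if opcode & 0x07 != ALU64_CLASS:
        raise ValueError(f"{opcode:#04x} is not in the 64-bit ALU class")
    operation = opcode & 0xF0
    if operation not in OPERATIONS:
        raise ValueError(f"{opcode:#04x} has no known ALU operation")
    operand = "reg" if opcode & SOURCE_REG else "imm"
    return OPERATIONS[operation] + "64", operand


print(decode_alu64(0x07))  # add64 with an immediate
print(decode_alu64(0x0F))  # add64 with a register
```

Note that neg64 takes a single operand, so the operand kind reported for 0x87 carries no meaning.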
32-bit ALU class
The 32-bit ALU instructions mostly follow their 64-bit counterparts. They operate on the lower word of the input registers. The upper half of destination registers gets implicitly zeroed.
Opcode | rbpf | Capstone | LLVM |
---|---|---|---|
0x04 | add32 | add | w1 += 0x42 |
0x0c | add32 | add | w1 += w2 |
0x14 | sub32 | sub | w1 -= 0x42 |
0x1c | sub32 | sub | w1 -= w2 |
0x24 | mul32 | mul | w1 *= 0x42 |
0x2c | mul32 | mul | w1 *= w2 |
0x34 | div32 | div | w1 /= 0x42 |
0x3c | div32 | div | w1 /= w2 |
0x44 | or32 | or | w1 |= 0x42 |
0x4c | or32 | or | w1 |= w2 |
0x54 | and32 | and | w1 &= 0x42 |
0x5c | and32 | and | w1 &= w2 |
0x64 | lsh32 | lsh | w1 <<= 0x42 |
0x6c | lsh32 | lsh | w1 <<= w2 |
0x74 | rsh32 | rsh | w1 >>= 0x42 |
0x7c | rsh32 | rsh | w1 >>= w2 |
0x84 | neg32 | neg | w1 = -w1 |
0x94 | mod32 | mod | w1 %= 0x42 |
0x9c | mod32 | mod | w1 %= w2 |
0xa4 | xor32 | xor | w1 ^= 0x42 |
0xac | xor32 | xor | w1 ^= w2 |
0xb4 | mov32 | mov | w1 = 0x42 |
0xbc | mov32 | mov | w1 = w2 |
0xc4 | arsh32 | arsh | w1 s>>= 0x42 |
0xcc | arsh32 | arsh | w1 s>>= w2 |
0xe4 | sdiv32 | | |
0xec | sdiv32 | | |
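The difference between the two ALU classes is easiest to see on a register whose lower word is about to overflow. This is a small model of add64 and add32 that follows the description above (32-bit operations use the lower word and zero the upper half of the destination). It models registers as plain Python integers and is not taken from the VM source.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def add64(dst: int, src: int) -> int:
    """64-bit add with wrap-around."""
    return (dst + src) & MASK64


def add32(dst: int, src: int) -> int:
    """32-bit add: only the lower words take part, upper half ends up zero."""
    return ((dst & MASK32) + (src & MASK32)) & MASK32


r1 = 0xDEAD_BEEF_FFFF_FFFF
print(hex(add64(r1, 1)))  # carry propagates into the upper word
print(hex(add32(r1, 1)))  # lower word wraps, upper word is cleared
```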
Endian ALU extension
Opcode | rbpf | Capstone | LLVM | Notes |
---|---|---|---|---|
0xd4 | le{n} | le{n} | r1 = le{n} r1 | Basically a mask |
0xdc | be{n} | be{n} | r1 = be{n} r1 | Swaps endianness |
The LE/BE instructions operate in 16-bit, 32-bit, or 64-bit mode, indicated by the value in the immediate field.
For example, 0xdc with destination register 1 and immediate 32 is be32 r1.
They were used for portable endianness conversions. Since Solana is always little-endian, only the BE instruction swaps bytes.
- le16 rD is equivalent to rD &= 0xFFFF.
- le32 rD is equivalent to rD &= 0xFFFF_FFFF.
- le64 rD is a nop.
- be16 rD swaps the lower 2 bytes and zeroes the upper 6.
- be32 rD reverses the order of the lower 4 bytes and zeroes the upper 4.
- be64 rD reverses the order of all 8 bytes.
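The six cases listed above can be modeled in a few lines of Python. This is only a sketch of the described behavior on a little-endian machine; the function names le and be are chosen for this example, and the register is modeled as an unsigned 64-bit integer.

```python
def le(bits: int, value: int) -> int:
    """le{bits}: keep the lower `bits` bits, zero the rest."""
    if bits not in (16, 32, 64):
        raise ValueError("immediate must be 16, 32, or 64")
    return value & ((1 << bits) - 1)


def be(bits: int, value: int) -> int:
    """be{bits}: byte-swap the lower `bits` bits, zero the rest."""
    low = le(bits, value)
    # Serialize little-endian, then reinterpret the bytes as big-endian.
    return int.from_bytes(low.to_bytes(bits // 8, "little"), "big")


r1 = 0x1122_3344_5566_7788
for bits in (16, 32, 64):
    print(f"le{bits}: {le(bits, r1):#018x}  be{bits}: {be(bits, r1):#018x}")
```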
Jump class
The jump instructions combine comparisons and conditional jumps. Compared to x86 or PowerPC, this simplifies the ISA by removing the condition register and using fewer opcodes.
The conditional jump instructions (all except ja) compare a register either against another register or an immediate value.
Opcode | rbpf | Capstone | LLVM |
---|---|---|---|
0x05 | ja | jmp | goto +12 |
0x15 | jeq | jeq | if r0 == 42 goto +12 |
0x1d | jeq | jeq | if r0 == r1 goto +12 |
0x25 | jgt | jgt | if r0 > 42 goto +12 |
0x2d | jgt | jgt | if r0 > r1 goto +12 |
0x35 | jge | jge | if r0 >= 42 goto +12 |
0x3d | jge | jge | if r0 >= r1 goto +12 |
0x45 | jset | jset | if r0 & 42 != 0 goto +12 |
0x4d | jset | jset | if r0 & r1 != 0 goto +12 |
0x55 | jne | jne | if r0 != 42 goto +12 |
0x5d | jne | jne | if r0 != r1 goto +12 |
0x65 | jsgt | jsgt | if r0 s> 42 goto +12 |
0x6d | jsgt | jsgt | if r0 s> r1 goto +12 |
0x75 | jsge | jsge | if r0 s>= 42 goto +12 |
0x7d | jsge | jsge | if r0 s>= r1 goto +12 |
0xa5 | jlt | jlt | if r0 < 42 goto +12 |
0xad | jlt | jlt | if r0 < r1 goto +12 |
0xb5 | jle | jle | if r0 <= 42 goto +12 |
0xbd | jle | jle | if r0 <= r1 goto +12 |
0xc5 | jslt | jslt | if r0 s< 42 goto +12 |
0xcd | jslt | jslt | if r0 s< r1 goto +12 |
0xd5 | jsle | jsle | if r0 s<= 42 goto +12 |
0xdd | jsle | jsle | if r0 s<= r1 goto +12 |
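The s-prefixed mnemonics differ from their plain counterparts only in how the two 64-bit operands are interpreted. The following sketch evaluates each condition on values modeled as unsigned 64-bit integers; the helper names are hypothetical, and the sketch leaves out how the jump target itself is computed.

```python
def to_signed(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's complement."""
    return value - (1 << 64) if value & (1 << 63) else value


CONDITIONS = {
    "jeq": lambda a, b: a == b,
    "jne": lambda a, b: a != b,
    "jgt": lambda a, b: a > b,
    "jge": lambda a, b: a >= b,
    "jlt": lambda a, b: a < b,
    "jle": lambda a, b: a <= b,
    "jset": lambda a, b: (a & b) != 0,
    "jsgt": lambda a, b: to_signed(a) > to_signed(b),
    "jsge": lambda a, b: to_signed(a) >= to_signed(b),
    "jslt": lambda a, b: to_signed(a) < to_signed(b),
    "jsle": lambda a, b: to_signed(a) <= to_signed(b),
}


def jump_taken(mnemonic: str, dst: int, operand: int) -> bool:
    """Return True if the conditional jump would be taken."""
    return CONDITIONS[mnemonic](dst, operand)


all_ones = 0xFFFF_FFFF_FFFF_FFFF  # largest unsigned value, or -1 signed
print(jump_taken("jgt", all_ones, 1))   # unsigned comparison
print(jump_taken("jsgt", all_ones, 1))  # signed comparison
```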
Call class
The call-related opcodes push/pop the call frame stack and stack pointer.
The call frame stack is a protected data structure that can only be accessed by the call class (comparable to x86 shadow stacks).
Opcode | rbpf | Capstone | LLVM | Notes |
---|---|---|---|---|
0x85 | call | call | call 0x1234 | |
0x8d | callx | callx | callx r3 | Not part of kernel eBPF; register index in imm field |
0x95 | exit | exit | exit |
- call saves a call frame and enters a syscall or jumps to a target indicated by the immediate.
- callx saves a call frame and jumps to the absolute address in the given register. The register index is stored in the immediate field of the instruction.
- exit restores a call frame and jumps to its return address.
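As a toy model of the push/pop behavior described above, the class below keeps a private stack of return addresses. It is deliberately simplified: a real call frame presumably holds more than a return address, and both the depth limit of 64 and the rule that exit with no saved frame ends the program are assumptions of this sketch, not facts taken from the text above.

```python
class CallFrames:
    """Toy call frame stack that only tracks return addresses."""

    def __init__(self, max_depth: int = 64):  # limit is a placeholder value
        self._frames = []
        self._max_depth = max_depth

    def call(self, return_pc: int, target_pc: int) -> int:
        """Save a frame and return the program counter to continue at."""
        if len(self._frames) >= self._max_depth:
            raise RuntimeError("call depth exceeded")
        self._frames.append(return_pc)
        return target_pc

    def exit(self):
        """Restore a frame; None means there is nothing left to return to."""
        if not self._frames:
            return None
        return self._frames.pop()


frames = CallFrames()
pc = frames.call(return_pc=11, target_pc=100)  # call at slot 10
pc = frames.call(return_pc=106, target_pc=200)  # nested call at slot 105
first_return = frames.exit()
second_return = frames.exit()
print(first_return, second_return, frames.exit())
```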
Resolving call targets
The 32-bit immediate field of the call opcode contains the hash of the target symbol name or syscall name. The VM resolves a call target hash using two immutable lookup maps for syscalls and jump targets, which are constructed on program load. Syscalls take precedence over jump targets. The VM aborts when a hash cannot be resolved.
The hash algorithm is Murmur3 in 32-bit mode on the UTF-8 encoding of the symbol name.
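The resolution procedure can be sketched as follows. The Murmur3 function is the standard 32-bit x86 variant; the seed of 0 is an assumption, since the text above does not name one. The two dictionaries stand in for the VM's lookup maps, and my_helper with its target of 42 is a made-up entry for illustration.

```python
MASK32 = 0xFFFF_FFFF


def rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit variant."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    body_len = len(data) & ~3
    for i in range(0, body_len, 4):
        k = int.from_bytes(data[i : i + 4], "little")
        k = rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    tail = data[body_len:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    # Finalization: mix in the length, then avalanche.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def hash_symbol(name: str) -> int:
    return murmur3_32(name.encode("utf-8"))


def resolve_call(imm: int, syscalls: dict, functions: dict):
    """Look up a call immediate; syscalls take precedence."""
    if imm in syscalls:
        return "syscall", syscalls[imm]
    if imm in functions:
        return "function", functions[imm]
    raise LookupError(f"unresolved call target {imm:#010x}")


syscalls = {hash_symbol("sol_log_"): "sol_log_"}
functions = {hash_symbol("my_helper"): 42}  # hypothetical symbol and target
print(resolve_call(hash_symbol("sol_log_"), syscalls, functions))
print(resolve_call(hash_symbol("my_helper"), syscalls, functions))
```

Because both maps are keyed by the same 32-bit hash, a function whose name collides with a syscall hash would be shadowed by the syscall in this model.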
See next post for the syscalls available in Sealevel.

