~bpf.sol | 0x03: SBF Instruction Set

All Solana Bytecode Format (SBF) opcodes

Sources

The eBPF family of ISAs lacks the kind of formal specification document you may know from other architectures. Instead, each flavor of eBPF is defined by its VM implementation. As such, Solana's SBF is currently defined by the solana_rbpf Rust crate.

All info below was compiled together from these sources:

Note: This source is not authoritative. It may be outdated or conflict with actual on-chain behavior. If you find any such bugs, please consider sending corrections. And if this document helps you build software, please attribute this source – thanks!

Assembler

Several flavors of SBF disassembly exist. Unfortunately, they are not mutually compatible.

  • rbpf-style: prefix notation with C-style indirect addressing.
    Seen in rbpf-cli and tools based on the Rust implementation of Solana.
  • Capstone-style: similar to rbpf-style with different ALU mnemonics.
    Seen in binary analysis tools like radare2.
  • LLVM-style: infix notation resembling C pseudocode.
    Seen with LLVM tools used in the build process.

Opcode Tables

Legacy Load/Store class

The BPF_LD opcode family consists of special-purpose opcodes inherited from Linux eBPF, where they were used to access network packet input data.

Opcode  rbpf     Capstone  LLVM                  Notes
0x18    lddw     lddw      r1 = 0x42 ll          Two insns long
0x30    ldabsb             r1 = *(u8 *)skb[3]    Deprecated
0x28    ldabsh             r1 = *(u16 *)skb[3]   Deprecated
0x20    ldabsw             r1 = *(u32 *)skb[3]   Deprecated
0x38    ldabsdw            r1 = *(u64 *)skb[3]   Deprecated
0x50    ldindb             r1 = *(u8 *)skb[r0]   Deprecated
0x48    ldindh             r1 = *(u16 *)skb[r0]  Deprecated
0x40    ldindw             r1 = *(u32 *)skb[r0]  Deprecated
0x58    ldinddw            r1 = *(u64 *)skb[r0]  Deprecated

The ldabs* instructions used to access instruction input data starting at 0x4_0000_0000, and ldind* was the indirect variant. These instructions are now disabled and have been replaced by the new load/store class.

Only lddw remains enabled on mainnet today. It moves a 64-bit immediate into a GPR and occupies two instruction slots. This makes it more of a 64-bit variant of mov32 than an actual load instruction. (Note that mov64 with an immediate is a synonym for mov32; the name is misleading.)
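
To make the two-slot layout concrete, here is a minimal Python sketch of decoding lddw. It assumes the usual eBPF instruction layout (8-byte little-endian slots holding an opcode byte, a register byte with the destination in the low nibble, a 16-bit offset, and a 32-bit immediate) and that the second slot carries the upper 32 bits of the value in its immediate field. The helper names are made up for illustration and are not part of solana_rbpf.

```python
import struct

MASK32 = 0xFFFF_FFFF
LDDW = 0x18


def decode_slot(slot: bytes):
    """Split one 8-byte instruction slot into (opcode, dst, src, off, imm)."""
    opcode, regs, off, imm = struct.unpack("<BBhi", slot)
    return opcode, regs & 0x0F, regs >> 4, off, imm


def decode_lddw(code: bytes):
    """Return (dst_register, 64-bit immediate) for an lddw at the start of code."""
    opcode, dst, _src, _off, lo = decode_slot(code[0:8])
    if opcode != LDDW:
        raise ValueError("not an lddw instruction")
    # The second slot only contributes its immediate field (assumed layout).
    _, _, _, _, hi = decode_slot(code[8:16])
    return dst, ((hi & MASK32) << 32) | (lo & MASK32)


# Hand-assembled example: lddw r1, 0x12345678deadbeef
raw = bytes.fromhex("18 01 00 00 ef be ad de" "00 00 00 00 78 56 34 12")
dst, value = decode_lddw(raw)
print(f"lddw r{dst}, {value:#x}")
```

A disassembler built along these lines has to advance by two slots after lddw, otherwise it would misread the second slot as a separate instruction.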

Load/Store class

The BPF_LDX and BPF_STX classes provide common memory operations.
Memory is referenced using a base address from a GPR and an immediate offset.

Three operation groups are available:

  • ldx*: Load value from memory into register (8-bit, 16-bit, 32-bit, 64-bit)
  • st*: Store immediate value into memory (8-bit, 16-bit, 32-bit, 32-bit zero extended to 64-bit)
  • stx*: Store value from register into memory (8-bit, 16-bit, 32-bit, 64-bit)

Opcode  rbpf   Capstone  LLVM                    Notes
0x71    ldxb   ldxb      r1 = *(u8 *)(r2 + 42)
0x69    ldxh   ldxh      r1 = *(u16 *)(r2 + 42)
0x61    ldxw   ldxw      r1 = *(u32 *)(r2 + 42)
0x79    ldxdw  ldxdw     r1 = *(u64 *)(r2 + 42)
0x72    stb    stb       *(u8 *)(r2 + 42) = 69
0x6a    sth    sth       *(u16 *)(r2 + 42) = 69
0x62    stw    stw       *(u32 *)(r2 + 42) = 69
0x7a    stdw   stdw      *(u64 *)(r2 + 42) = 69  Immediate is 32-bit
0x73    stxb   stxb      *(u8 *)(r2 + 42) = r1
0x6b    stxh   stxh      *(u16 *)(r2 + 42) = r1
0x63    stxw   stxw      *(u32 *)(r2 + 42) = r1
0x7b    stxdw  stxdw     *(u64 *)(r2 + 42) = r1
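
The opcode values in this table are not arbitrary. They follow the bit layout inherited from Linux eBPF: the low three bits select the instruction class, bits 3 and 4 the access size, and the top three bits the addressing mode. The following Python sketch splits a load/store opcode into those fields. The function and table names are hypothetical, and only the memory addressing mode (0x60) is handled.

```python
# Field values as used by the Linux eBPF encoding.
MEM_CLASSES = {0x01: "ldx", 0x02: "st", 0x03: "stx"}
SIZES = {0x00: ("w", 4), 0x08: ("h", 2), 0x10: ("b", 1), 0x18: ("dw", 8)}
MODE_MEM = 0x60


def split_mem_opcode(opcode: int):
    """Return (mnemonic, access width in bytes) for a load/store opcode."""
    insn_class = opcode & 0x07
    size = opcode & 0x18
    mode = opcode & 0xE0
    if insn_class not in MEM_CLASSES or mode != MODE_MEM:
        raise ValueError(f"{opcode:#04x} is not a load/store opcode")
    suffix, width = SIZES[size]
    return MEM_CLASSES[insn_class] + suffix, width


for op in (0x71, 0x6B, 0x7A):
    print(f"{op:#04x}", split_mem_opcode(op))
```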

64-bit ALU class

The 64-bit ALU instructions operate on general-purpose registers. The stack pointer (r10) can only be used as a source operand.

Each operation has two forms:

  • rD ← OP(rD, imm)
  • rD ← OP(rD, rS)

Opcode  rbpf    Capstone  LLVM         Notes
0x07    add64   add64     r1 += 0x42
0x0f    add64   add64     r1 += r2
0x17    sub64   sub64     r1 -= 0x42
0x1f    sub64   sub64     r1 -= r2
0x27    mul64   mul64     r1 *= 0x42
0x2f    mul64   mul64     r1 *= r2
0x37    div64   div64     r1 /= 0x42
0x3f    div64   div64     r1 /= r2
0x47    or64    or64      r1 |= 0x42
0x4f    or64    or64      r1 |= r2
0x57    and64   and64     r1 &= 0x42
0x5f    and64   and64     r1 &= r2
0x67    lsh64   lsh64     r1 <<= 0x42
0x6f    lsh64   lsh64     r1 <<= r2
0x77    rsh64   rsh64     r1 >>= 0x42
0x7f    rsh64   rsh64     r1 >>= r2
0x87    neg64   neg64     r1 = -r1
0x97    mod64   mod64     r1 %= 0x42
0x9f    mod64   mod64     r1 %= r2
0xa7    xor64   xor64     r1 ^= 0x42
0xaf    xor64   xor64     r1 ^= r2
0xb7    mov64   mov64     r1 = 0x42    same as mov32
0xbf    mov64   mov64     r1 = r2
0xc7    arsh64  arsh64
0xcf    arsh64  arsh64
0xe7    sdiv64
0xef    sdiv64
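
The two forms of each operation differ by a single bit in the opcode. As a rough sketch, assuming the standard eBPF layout (low three bits for the class, bit 3 for the operand source, high nibble for the operation), an ALU opcode can be taken apart as shown below. The names are hypothetical, and the endian extension (0xd0) is deliberately left out because it does not follow the imm/reg pattern.

```python
ALU_CLASSES = {0x04: "32", 0x07: "64"}
ALU_OPS = {
    0x00: "add", 0x10: "sub", 0x20: "mul", 0x30: "div",
    0x40: "or", 0x50: "and", 0x60: "lsh", 0x70: "rsh",
    0x80: "neg", 0x90: "mod", 0xA0: "xor", 0xB0: "mov",
    0xC0: "arsh", 0xE0: "sdiv",
}
SOURCE_REG_BIT = 0x08  # set: second operand is a register; clear: immediate


def split_alu_opcode(opcode: int):
    """Return (mnemonic, operand source) for an ALU opcode."""
    width = ALU_CLASSES.get(opcode & 0x07)
    name = ALU_OPS.get(opcode & 0xF0)
    if width is None or name is None:
        raise ValueError(f"{opcode:#04x} is not a plain ALU opcode")
    source = "reg" if opcode & SOURCE_REG_BIT else "imm"
    return name + width, source


for op in (0x07, 0x0F, 0xB7, 0xE4):
    print(f"{op:#04x}", split_alu_opcode(op))
```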

32-bit ALU class

The 32-bit ALU instructions mostly follow their 64-bit counterparts. They operate on the lower 32 bits of the input registers. The upper 32 bits of the destination register are implicitly zeroed.

Opcode  rbpf    Capstone  LLVM
0x04    add32   add       w1 += 0x42
0x0c    add32   add       w1 += w2
0x14    sub32   sub       w1 -= 0x42
0x1c    sub32   sub       w1 -= w2
0x24    mul32   mul       w1 *= 0x42
0x2c    mul32   mul       w1 *= w2
0x34    div32   div       w1 /= 0x42
0x3c    div32   div       w1 /= w2
0x44    or32    or        w1 |= 0x42
0x4c    or32    or        w1 |= w2
0x54    and32   and       w1 &= 0x42
0x5c    and32   and       w1 &= w2
0x64    lsh32   lsh       w1 <<= 0x42
0x6c    lsh32   lsh       w1 <<= w2
0x74    rsh32   rsh       w1 >>= 0x42
0x7c    rsh32   rsh       w1 >>= w2
0x84    neg32   neg       w1 = -w1
0x94    mod32   mod       w1 %= 0x42
0x9c    mod32   mod       w1 %= w2
0xa4    xor32   xor       w1 ^= 0x42
0xac    xor32   xor       w1 ^= w2
0xb4    mov32   mov       w1 = 0x42
0xbc    mov32   mov       w1 = w2
0xc4    arsh32  arsh      w1 s>>= 0x42
0xcc    arsh32  arsh      w1 s>>= w2
0xe4    sdiv32
0xec    sdiv32
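
The difference between the two ALU widths is easiest to see on an addition that overflows. The following Python sketch models registers as unsigned 64-bit integers and follows the description above: the 32-bit form reads only the lower halves of its inputs and leaves the upper half of the result at zero. It is an illustration of the stated semantics, not code taken from the VM.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def add64(dst: int, src: int) -> int:
    """64-bit add with wraparound."""
    return (dst + src) & MASK64


def add32(dst: int, src: int) -> int:
    """32-bit add: lower halves in, upper half of the result zeroed."""
    return ((dst & MASK32) + (src & MASK32)) & MASK32


r1 = 0xAAAA_BBBB_FFFF_FFFF
print(f"add64: {add64(r1, 1):#018x}")  # carry propagates into the upper half
print(f"add32: {add32(r1, 1):#018x}")  # upper half is discarded
```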

Endian ALU extension

Opcode  rbpf   Capstone  LLVM           Notes
0xd4    le{n}  le{n}     r1 = le{n} r1  Basically a mask
0xdc    be{n}  be{n}     r1 = be{n} r1  Swaps endianness

The LE/BE instructions operate in 16-bit, 32-bit, or 64-bit mode, indicated by the values in the immediate field.
For example, 0xdc with destination register 1 and immediate 32 is be32 r1.
They were used for portable endianness conversions. Since Solana is always little-endian, only the BE instruction swaps bytes.

  • le16 rD is equivalent to rD &= 0xFFFF.
  • le32 rD is equivalent to rD &= 0xFFFF_FFFF.
  • le64 rD is a nop.
  • be16 rD swaps the lower 2 bytes and zeroes the upper 6.
  • be32 rD reverses the order of the lower 4 bytes and zeroes the upper 4.
  • be64 rD reverses the order of all 8 bytes.
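
The six cases above collapse into two small functions. This is a minimal Python sketch of the behavior as listed, with registers modeled as unsigned 64-bit integers; the function names simply mirror the mnemonics.

```python
def le(value: int, bits: int) -> int:
    """le{n}: keep the lower n bits, zero the rest (a no-op for n = 64)."""
    return value & ((1 << bits) - 1)


def be(value: int, bits: int) -> int:
    """be{n}: byte-swap the lower n bits, zero the rest."""
    nbytes = bits // 8
    low = le(value, bits)
    return int.from_bytes(low.to_bytes(nbytes, "little"), "big")


r1 = 0x1122_3344_5566_7788
for n in (16, 32, 64):
    print(f"le{n}: {le(r1, n):#x}   be{n}: {be(r1, n):#x}")
```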

Jump class

The jump instructions combine comparisons and conditional jumps. Compared to x86 or PowerPC, this simplifies the ISA by removing the condition register and using fewer opcodes.

The conditional jump instructions (all except ja) compare a register either against another register or an immediate value.

Opcode  rbpf  Capstone  LLVM
0x05    ja    jmp       goto +12
0x15    jeq   jeq       if r0 == 42 goto +12
0x1d    jeq   jeq       if r0 == r1 goto +12
0x25    jgt   jgt       if r0 > 42 goto +12
0x2d    jgt   jgt       if r0 > r1 goto +12
0x35    jge   jge       if r0 >= 42 goto +12
0x3d    jge   jge       if r0 >= r1 goto +12
0x45    jset  jset      if r0 & 42 != 0 goto +12
0x4d    jset  jset      if r0 & r1 != 0 goto +12
0x55    jne   jne       if r0 != 42 goto +12
0x5d    jne   jne       if r0 != r1 goto +12
0x65    jsgt  jsgt      if r0 s> 42 goto +12
0x6d    jsgt  jsgt      if r0 s> r1 goto +12
0x75    jsge  jsge      if r0 s>= 42 goto +12
0x7d    jsge  jsge      if r0 s>= r1 goto +12
0xa5    jlt   jlt       if r0 < 42 goto +12
0xad    jlt   jlt       if r0 < r1 goto +12
0xb5    jle   jle       if r0 <= 42 goto +12
0xbd    jle   jle       if r0 <= r1 goto +12
0xc5    jslt  jslt      if r0 s< 42 goto +12
0xcd    jslt  jslt      if r0 s< r1 goto +12
0xd5    jsle  jsle      if r0 s<= 42 goto +12
0xdd    jsle  jsle      if r0 s<= r1 goto +12
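
The s-prefixed mnemonics compare the same 64 bits as their unsigned counterparts but interpret them as two's complement. The Python sketch below shows the difference, and also how a relative offset such as +12 turns into a target, assuming (as in Linux eBPF) that the offset counts instruction slots from the instruction after the jump. It further assumes that immediates are sign-extended to 64 bits before comparison. Helper names are illustrative only.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def as_u64(value: int) -> int:
    """Two's complement view of a (possibly negative) operand as unsigned 64-bit."""
    return value & MASK64


def as_i64(value: int) -> int:
    """Signed interpretation of a 64-bit register value."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def jgt(dst: int, operand: int) -> bool:
    return as_u64(dst) > as_u64(operand)


def jsgt(dst: int, operand: int) -> bool:
    return as_i64(dst) > as_i64(operand)


def jump_target(pc: int, off: int) -> int:
    """Slot index reached when a jump at slot pc with offset off is taken."""
    return pc + 1 + off


r0 = 0xFFFF_FFFF_FFFF_FFFF  # -1 when read as signed
print(jgt(r0, 42), jsgt(r0, 42))
print(jump_target(10, 12))
```

With r0 holding all ones, the unsigned comparison against 42 succeeds while the signed one does not, which is why picking the wrong mnemonic tends to show up as a bounds-check bug.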

Call class

The call-related opcodes push/pop the call frame stack and stack pointer.
The call frame stack is a protected data structure that can only be accessed by the call class (comparable to x86 shadow stacks).

Opcode  rbpf   Capstone  LLVM         Notes
0x85    call   call      call 0x1234
0x8d    callx  callx     callx r3     Not part of kernel eBPF; register idx in imm field
0x95    exit   exit      exit

  • call saves a call frame and enters a syscall or jumps to a target indicated by the immediate.
  • callx saves a call frame and jumps to the absolute address in the given register. The register index is stored in the immediate field of the instruction.
  • exit restores a call frame and jumps to its return address.
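
As a toy model of the push/pop behavior described above, the call frame stack can be pictured as a list that program code cannot address directly. The Python sketch below records only a return address and a saved stack pointer per frame. The class name, the depth limit of 64, and the convention that exit on an empty stack ends the program are assumptions made for illustration; the real VM may track more state per frame.

```python
class CallFrameStack:
    """Toy shadow stack: only call and exit can touch the frames."""

    def __init__(self, max_depth: int = 64):  # depth limit is an assumption
        self.max_depth = max_depth
        self._frames = []

    def call(self, return_pc: int, stack_pointer: int) -> None:
        """Save a frame before control transfers to the callee."""
        if len(self._frames) >= self.max_depth:
            raise RuntimeError("call depth exceeded")
        self._frames.append((return_pc, stack_pointer))

    def exit(self):
        """Restore the most recent frame, or return None if none is left."""
        if not self._frames:
            return None  # assumed: the program terminates here
        return self._frames.pop()


frames = CallFrameStack()
frames.call(return_pc=11, stack_pointer=0x2_0000_1000)
frames.call(return_pc=51, stack_pointer=0x2_0000_2000)
print(frames.exit())  # innermost frame comes back first
print(frames.exit())
print(frames.exit())  # nothing left to restore
```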

Resolving call targets

The 32-bit immediate field of the call opcode contains the hash of the target symbol name or syscall name. The VM resolves a call target hash using two immutable lookup maps, one for syscalls and one for jump targets, which are constructed on program load. Syscalls take precedence over jump targets. The VM aborts when a hash cannot be resolved.

The hash algorithm is Murmur3 in 32-bit mode on the UTF-8 encoding of the symbol name.
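
The resolution scheme can be sketched in a few lines of Python. The hash function below is a plain implementation of 32-bit Murmur3; the seed of 0 is an assumption, as are the map layout and the resolve_call helper, which only mirror the lookup order described above. The function slot number in the example is invented.

```python
MASK32 = 0xFFFF_FFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 (x86 variant) over raw bytes."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    body_len = len(data) & ~3
    for i in range(0, body_len, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    tail = data[body_len:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    # Finalization mix
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def resolve_call(imm: int, syscalls: dict, functions: dict):
    """Look up a call immediate; syscalls win over jump targets."""
    key = imm & MASK32
    if key in syscalls:
        return "syscall", syscalls[key]
    if key in functions:
        return "function", functions[key]
    raise LookupError(f"unresolved call target {key:#010x}")


syscalls = {murmur3_32(b"sol_log_"): "sol_log_"}
functions = {murmur3_32(b"entrypoint"): 0}  # slot 0 is a made-up example
print(resolve_call(murmur3_32("sol_log_".encode("utf-8")), syscalls, functions))
print(resolve_call(murmur3_32(b"entrypoint"), syscalls, functions))
```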

See the next post for the syscalls available in Sealevel.


