The same program, every language
The same program, on the left in formal and on the right in each language: switch between Hello World! and the fannkuch-redux benchmark with the program tabs, and between languages with the row above. The point is underneath the source. Counting the actual machine instructions and bytes, formal compiles to far less than the others, because it ships no runtime, no libc, and no garbage collector beneath it; the versions opposite carry a runtime, and that runtime still ships. fannkuch goes one further: its result is proven correct at compile time (the verifier checks the max flip count and checksum against the known answer), so the program need only compute and exit.
formal
print("Hello World!\n")
exit(0)
formal new hello_world && cd hello_world
cargo run
grep -cE '^ [a-z]' build/main.s
46 instructions
1,680
bytes
Rust
fn main() {
println!("Hello World!");
}
cargo new hw && cd hw
echo 'fn main() { println!("Hello World!"); }' > src/main.rs
CARGO_PROFILE_RELEASE_OPT_LEVEL=z CARGO_PROFILE_RELEASE_LTO=true CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1 CARGO_PROFILE_RELEASE_PANIC=abort CARGO_PROFILE_RELEASE_STRIP=true RUSTFLAGS='-Zlocation-detail=none -Zfmt-debug=none -Zunstable-options -Cpanic=immediate-abort' cargo +nightly build --release --target x86_64-unknown-linux-musl -Zbuild-std=std,panic_abort -Zbuild-std-features=
objdump -d target/x86_64-unknown-linux-musl/release/hw | grep -cE '^[[:space:]]+[0-9a-f]+:'
4,562 instructions
26,720 bytes
Why
- Proves your assertions, doesn't trust you. A fail marker is a theorem: the compiler enumerates every hart interleaving and rejects the program unless the marker is unreachable on all of them. Provably-correct racy code, which Rust's borrow checker forbids and SPARK's tasking model excludes, verifies here.
- Infallible type inference. Leave a type as _ and the compiler searches type and locality assignments, accepting one iff it makes the whole program verify. If any typing works, it is found: inference and verification are the same search, so there is no "couldn't infer, add an annotation" failure mode. (Scalar types are searched; list/union types are infinite, so they are written explicitly.)
- The memory layout is an output, not an input. No .data/.bss in the source: the verifier discovers each variable's type and locality and emits the sections for you.
- Every memory access must be provably safe: inside its variable's inferred type, or inside a raw region with the right permissions: one the target system describes (e.g. MMIO, like the UART below) or one the program declares with section. An allocator declares each allocation as it hands it out; an access nothing describes makes the program invalid.
- It verifies the machine code, not your source: there is no compiler-trust gap between what was proven and what runs.
- The proof shrinks the binary. Dead-data elimination: bytes the proof shows no runtime path can touch are deleted from the output, with the program's address arithmetic rewritten in lockstep, so information consumed only at compile time simply doesn't exist at runtime.
Supports RISC-V ✓ · Doesn't support aarch64 ? · Won't support x86-64 ✗
End to end: source to booted RISC-V
This is the project's real end-to-end test (tests/uart_hello/). The source races two harts on an inferred global, has hart 0 check the message's runtime type descriptor (the program inspects its own inferred types), then prints over the UART. Note what the source does not contain: an entry point (execution starts at the first line, like Python), a data section, or most of the types.
You write input.hl
# Do racy arithmetic with an undefined variable.
# Use global locality so the load/stores are racy.
value: global _
t0 = &value
# Set to 0
t1 = 0
t0[0:4] = t1
# Do non-atomic addition
t1 = t0[0:4]
t1 = t1 + 1
t0[0:4] = t1
# The proof obligation: value < 4 on every interleaving (`require` lowers to
# a branch over `#!`, the `fail` marker).
t1 = t0[0:4]
t2 = 4
require t1 < t2
# Use hart 0 to output a message; every other hart skips straight to the
# `wfi` at the end.
t0 = csr(mhartid)
if t0 == 0:
welcome: _ [u8]*13
# H e l l o W o r l d ! 0
# Type exploration doesn't explore list types since they are infinite, so
# to define a list a user must define it manually.
# In this case we leave the locality as unspecified which will default to `thread`.
# Declare string
# Get address of type structure
t0 = type(welcome)
# Check variable is list
t2 = 8 # Load list type number
t1 = t0[0:8] # Load type type number
require t1 == t2
# Check list length
t0 = t0 + 16 # Increment address to point at length
t1 = t0[0:8] # Load length
t2 = 13 # Load desired length
require t1 == t2
# Check all values in list are u8
t0 = t0 - 8 # Decrement address to point at list address
t0 = t0[0:8] # Load list address
t5 = 0 # Use t5 as the counter
while t5 != t2:
t3 = t0[0:8] # Load type of list item
t4 = 0 # Load byte type number
require t3 == t4
t0 = t0 + 25 # Increment list item address (8+8+8+1)
t5 = t5 + 1 # Increment the count
# Set string
t0 = &welcome
t1 = 72 # H
t0[0:1] = t1
t1 = 101 # e
t0[1:2] = t1
t1 = 108 # l
t0[2:3] = t1
t1 = 108 # l
t0[3:4] = t1
t1 = 111 # o
t0[4:5] = t1
t1 = 32 # space
t0[5:6] = t1
t1 = 87 # W
t0[6:7] = t1
t1 = 111 # o
t0[7:8] = t1
t1 = 114 # r
t0[8:9] = t1
t1 = 108 # l
t0[9:10] = t1
t1 = 100 # d
t0[10:11] = t1
t1 = 33 # !
t0[11:12] = t1
t1 = 0 # NUL terminator
t0[12:13] = t1
# Output message
a0 = &welcome # Load message address
t1 = 0x10000000 # Load character device address
t2 = a0[0:1] # Load the first character
while t2 != 0:
t1[0:1] = t2 # Store character into character device
a0 = a0 + 1 # Add 1 message address
t2 = a0[0:1] # Load next character
asm:
wfi
unreachable
It proves, optimizes & emits
.global _start
_start:
#$ value global _
la t0, value
li t1, 0
sw t1, 0(t0)
lw t1, 0(t0)
addi t1, t1, 1
sw t1, 0(t0)
lw t1, 0(t0)
li t2, 4
blt t1, t2, _l0
_l0:
csrr t0, mhartid
bnez t0, _l1
#$ welcome _ [u8 u8 u8 u8 u8 u8 u8 u8 u8 u8 u8 u8 u8]
la t0, __welcome_type # #& t0, welcome
li t2, 8
ld t1, 0(t0)
beq t1, t2, _l2
_l2:
addi t0, t0, 16
ld t1, 0(t0)
li t2, 13
beq t1, t2, _l3
_l3:
addi t0, t0, -8
ld t0, 0(t0)
li t5, 0
_l4:
beq t5, t2, _l5
ld t3, 0(t0)
li t4, 0
beq t3, t4, _l6
_l6:
addi t0, t0, 8
addi t5, t5, 1
j _l4
_l5:
la t0, welcome
li t1, 72
sb t1, 0(t0)
li t1, 101
sb t1, 1(t0)
li t1, 108
sb t1, 2(t0)
li t1, 108
sb t1, 3(t0)
li t1, 111
sb t1, 4(t0)
li t1, 32
sb t1, 5(t0)
li t1, 87
sb t1, 6(t0)
li t1, 111
sb t1, 7(t0)
li t1, 114
sb t1, 8(t0)
li t1, 108
sb t1, 9(t0)
li t1, 100
sb t1, 10(t0)
li t1, 33
sb t1, 11(t0)
li t1, 0
sb t1, 12(t0)
la a0, welcome
li t1, 0x10000000
lb t2, 0(a0)
_l7:
beqz t2, _l8
sb t2, 0(t1)
addi a0, a0, 1
lb t2, 0(a0)
j _l7
_l8:
_l1:
wfi
j __halt # unreachable (program end)
__halt:
wfi
j __halt
.section .data
__welcome_type:
.dword 8 # List
.dword __welcome_subtypes # subtypes
.dword 13 # length
__welcome_subtypes:
.dword 0
.dword 0
.dword 0
.dword 0
.dword 0
.dword 0
.dword 0
.dword 0
.dword 0
.dword 0
.dword 0
.dword 0
.dword 0
.section .bss
.balign 8
value:
.zero 4
.balign 8
welcome:
.zero 13
translate (one statement, one instruction) → parse → verify (2,026,160 machine states across 1- and 2-hart systems) → infer value: u32, welcome: thread [u8 × 13] → remove dead code & never-taken branches → compact the layout → emit
The emitted program assembles with the RISC-V GNU toolchain, boots under qemu-system-riscv64 -machine virt, and prints Hello World!. Two things to notice in the output:
- The descriptor records shrank. In memory the verifier models each type record as 25 bytes ([u64 type, u64 subtypes-ptr, u64 length, u8 locality]), and the source walks them with t0 = t0 + 25. The program only ever reads each record's type number, so the emitted __welcome_subtypes is 13 .dwords (104 bytes, not 325), and the stride was rewritten to addi t0, t0, 8 to match. The locality byte, read only by the verifier, exists nowhere in the binary.
- The race was proven, not forbidden. Both harts run the non-atomic increment of value; the proof covers every interleaving and shows the value < 4 assertion holds in all of them.
$ cargo nt uart_hello
PASS [25.001s] formal::uart_hello uart_hello
The language
Two layers, kept deliberately close. You write a Python-like surface language in which every simple statement lowers to exactly one RISC-V instruction or directive, and if/while/require each lower to a small fixed pattern of branches plus generated labels (no register allocation, no hidden control flow, the C-like cost model where the price of a line is visible at the call site), with inline assembly always one asm: block away:
value: global _ -> #$ value global _ define (locality/type; _ = infer)
welcome: _ [u8]*13 -> #$ welcome _ [u8 ... u8] list types: [u8, u8] or [u8]*n
t0 = &value -> la t0, value address of a variable
t0 = type(welcome) -> #& t0, welcome address of the runtime type descriptor
t0 = csr(mhartid) -> csrr t0, mhartid which hart am I?
t1 = 0x10000000 -> li t1, 0x10000000 load immediate
t1 = t1 + 1 -> addi t1, t1, 1 register +/- immediate
t0[0:4] = t1 -> sw t1, 0(t0) store a byte slice (1 = sb, 4 = sw)
t1 = t0[8:16] -> ld t1, 8(t0) load a byte slice (1 = lb, 4 = lw, 8 = ld)
section 0x100 0x200 rw -> #@ 0x100 0x200 rw declare an accessible region
fail -> #! must be proven unreachable
unreachable -> #? end of execution
asm: -> (each indented line inline assembly
wfi emitted verbatim)
require t1 < t2 # fail unless t1 < t2 (one-line `if not cond: fail`)
if t0 == 0: # run the block when the condition holds
t1 = t0[0:8]
while t5 != t2: # top-tested loop
t5 = t5 + 1
Underneath sits the verification target: RISC-V assembly plus five directives; every directive starts with #, the RISC-V comment character, so any formal program is also a plain assembly file:
| Directive | Surface form | Meaning |
|---|---|---|
| #! | fail | An assertion the compiler must prove unreachable on every interleaving. |
| #? | unreachable | End of execution: the hart that reaches this point halts. |
| #$ x global u32 | x: global u32 | Declare a variable's locality and/or type; _ means "infer it for me". |
| #& t0, x | t0 = type(x) | Load the address of a variable's runtime type descriptor. Programs can inspect their own inferred types. |
| #@ 100 200 rw | section 100 200 rw | Declare a memory region the program may access (bounds may be registers: an allocator declares each allocation as it makes it). |
How it compares
The long version, with the hard cases argued honestly, is in comparison.md. The short version:
- vs Zig's comptime: concrete execution is the special case of symbolic execution where every value is exactly known. comptime evaluates the known at compile time; formal evaluates the unknown: ranges, all interleavings, all type assignments.
- vs Rust's borrow checker: the borrow checker is a fast, conservative approximation of the race-freedom question; formal answers the question itself, exactly. And you can still opt into borrow-checker economics by building a verified Rc whose invariant is a fail the proof discharges.
- vs Lean: Lean proves anything, manually, about a model; formal proves a fixed property class, automatically, about the actual machine code.
- vs Ada/SPARK (the closest peer): SPARK is deductive, modular, contract-driven and industrial; formal is operational, whole-program, contract-free and exhaustive: it verifies the racy code SPARK's model rejects, at exponential cost.
That cost is explicit: O(n · hr · 2b · 8v) (instructions × racy-interleavings × indeterminate branches × unspecified variables). The design lever is reducing the exponents (keep shared mutation rare, annotate what you know); planned compiler modes trade them away explicitly.
Where this is going
A bare-metal OS running DOOM. A bare-metal Tor node. A bare-metal web server. A bare-metal hypervisor under a serverless platform. All proven, all the way down: because the verifier works on the machine code, there is nothing underneath it to trust. Nearer term: growing the Python-like surface (it doesn't have a name yet) while keeping its near-one-to-one translation, so the cost model never goes opaque.