A verifying compiler for bare-metal RISC-V: every hart interleaving, every admissible type, proven at compile time. You write Python-like statements that map nearly one-to-one onto RISC-V instructions, with the memory layout left implicit; the compiler accepts the program only if it can prove, by symbolically executing the machine code itself, that no assertion fails and no memory access is unsafe on any hardware-thread interleaving, for any input, under the types it infers for you. The proof's by-products then build the binary.

🚧 Experimental: anything and everything might not work and much isn't yet implemented 🚧

The same program, every language

The same program, on the left in formal and on the right in each language: switch between Hello World! and the fannkuch-redux benchmark with the program tabs, and between languages with the row above. The point is underneath the source. Counting the actual machine instructions and bytes, formal compiles to far less than the others, because it ships no runtime, no libc, and no garbage collector beneath it; the versions opposite carry a runtime, and that runtime still ships. fannkuch goes one further: its result is proven correct at compile time (the verifier checks the max flip count and checksum against the known answer), so the program need only compute and exit.

formal
print("Hello World!\n")
exit(0)
formal new hello_world && cd hello_world
cargo run
grep -cE '^    [a-z]' build/main.s

46 instructions
1,680 bytes

Rust
fn main() {
    println!("Hello World!");
}
cargo new hw && cd hw
echo 'fn main() { println!("Hello World!"); }' > src/main.rs
CARGO_PROFILE_RELEASE_OPT_LEVEL=z CARGO_PROFILE_RELEASE_LTO=true CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1 CARGO_PROFILE_RELEASE_PANIC=abort CARGO_PROFILE_RELEASE_STRIP=true RUSTFLAGS='-Zlocation-detail=none -Zfmt-debug=none -Zunstable-options -Cpanic=immediate-abort' cargo +nightly build --release --target x86_64-unknown-linux-musl -Zbuild-std=std,panic_abort -Zbuild-std-features=
objdump -d target/x86_64-unknown-linux-musl/release/hw | grep -cE '^[[:space:]]+[0-9a-f]+:'

4,562 instructions
26,720 bytes

Why

Supports RISC-V ✓  Â·  Doesn't support aarch64 ?  Â·  Won't support x86-64 ✗

End to end: source to booted RISC-V

This is the project's real end-to-end test (tests/uart_hello/). The source races two harts on an inferred global, has hart 0 check the message's runtime type descriptor (the program inspects its own inferred types), then prints over the UART. Note what the source does not contain: an entry point (execution starts at the first line, like Python), a data section, or most of the types.

You write input.hl
# Do racy arithmetic with an undefined variable.
# Use global locality so the load/stores are racy.
value: global _
t0 = &value

# Set to 0
t1 = 0
t0[0:4] = t1

# Do non-atomic addition
t1 = t0[0:4]
t1 = t1 + 1
t0[0:4] = t1

# The proof obligation: value < 4 on every interleaving (`require` lowers to
# a branch over `#!`, the `fail` marker).
t1 = t0[0:4]
t2 = 4
require t1 < t2

# Use hart 0 to output a message; every other hart skips straight to the
# `wfi` at the end.
t0 = csr(mhartid)
if t0 == 0:
    welcome: _ [u8]*13
    # H  e  l  l  o     W  o  r  l  d  !  0
    # Type exploration doesn't explore list types since they are infinite, so
    # to define a list a user must define it manually.
    # In this case we leave the locality as unspecified which will default to `thread`.

    # Declare string
    # Get address of type structure
    t0 = type(welcome)

    # Check variable is list
    t2 = 8  # Load list type number
    t1 = t0[0:8]  # Load type type number
    require t1 == t2

    # Check list length
    t0 = t0 + 16  # Increment address to point at length
    t1 = t0[0:8]  # Load length
    t2 = 13  # Load desired length
    require t1 == t2

    # Check all values in list are u8
    t0 = t0 - 8  # Decrement address to point at list address
    t0 = t0[0:8]  # Load list address

    t5 = 0  # Use t5 as the counter
    while t5 != t2:
        t3 = t0[0:8]  # Load type of list item
        t4 = 0  # Load byte type number
        require t3 == t4
        t0 = t0 + 25  # Increment list item address (8+8+8+1)
        t5 = t5 + 1  # Increment the count

    # Set string
    t0 = &welcome
    t1 = 72  # H
    t0[0:1] = t1
    t1 = 101  # e
    t0[1:2] = t1
    t1 = 108  # l
    t0[2:3] = t1
    t1 = 108  # l
    t0[3:4] = t1
    t1 = 111  # o
    t0[4:5] = t1
    t1 = 32  # space
    t0[5:6] = t1
    t1 = 87  # W
    t0[6:7] = t1
    t1 = 111  # o
    t0[7:8] = t1
    t1 = 114  # r
    t0[8:9] = t1
    t1 = 108  # l
    t0[9:10] = t1
    t1 = 100  # d
    t0[10:11] = t1
    t1 = 33  # !
    t0[11:12] = t1
    t1 = 0  # NUL terminator
    t0[12:13] = t1

    # Output message
    a0 = &welcome  # Load message address
    t1 = 0x10000000  # Load character device address
    t2 = a0[0:1]  # Load the first character
    while t2 != 0:
        t1[0:1] = t2  # Store character into character device
        a0 = a0 + 1  # Add 1 message address
        t2 = a0[0:1]  # Load next character

asm:
    wfi
unreachable
It proves, optimizes & emits
.global _start
_start:
    #$ value global _
    la t0, value
    li t1, 0
    sw t1, 0(t0)
    lw t1, 0(t0)
    addi t1, t1, 1
    sw t1, 0(t0)
    lw t1, 0(t0)
    li t2, 4
    blt t1, t2, _l0
_l0:
    csrr t0, mhartid
    bnez t0, _l1
    #$ welcome _ [u8 u8 u8 u8 u8 u8 u8 u8 u8 u8 u8 u8 u8]
    la t0, __welcome_type  # #& t0, welcome
    li t2, 8
    ld t1, 0(t0)
    beq t1, t2, _l2
_l2:
    addi t0, t0, 16
    ld t1, 0(t0)
    li t2, 13
    beq t1, t2, _l3
_l3:
    addi t0, t0, -8
    ld t0, 0(t0)
    li t5, 0
_l4:
    beq t5, t2, _l5
    ld t3, 0(t0)
    li t4, 0
    beq t3, t4, _l6
_l6:
    addi t0, t0, 8
    addi t5, t5, 1
    j _l4
_l5:
    la t0, welcome
    li t1, 72
    sb t1, 0(t0)
    li t1, 101
    sb t1, 1(t0)
    li t1, 108
    sb t1, 2(t0)
    li t1, 108
    sb t1, 3(t0)
    li t1, 111
    sb t1, 4(t0)
    li t1, 32
    sb t1, 5(t0)
    li t1, 87
    sb t1, 6(t0)
    li t1, 111
    sb t1, 7(t0)
    li t1, 114
    sb t1, 8(t0)
    li t1, 108
    sb t1, 9(t0)
    li t1, 100
    sb t1, 10(t0)
    li t1, 33
    sb t1, 11(t0)
    li t1, 0
    sb t1, 12(t0)
    la a0, welcome
    li t1, 0x10000000
    lb t2, 0(a0)
_l7:
    beqz t2, _l8
    sb t2, 0(t1)
    addi a0, a0, 1
    lb t2, 0(a0)
    j _l7
_l8:
_l1:
    wfi
    j __halt  # unreachable (program end)
__halt:
    wfi
    j __halt

.section .data
__welcome_type:
    .dword 8                # List
    .dword __welcome_subtypes      # subtypes
    .dword 13                # length
__welcome_subtypes:
    .dword 0
    .dword 0
    .dword 0
    .dword 0
    .dword 0
    .dword 0
    .dword 0
    .dword 0
    .dword 0
    .dword 0
    .dword 0
    .dword 0
    .dword 0

.section .bss
    .balign 8
value:
    .zero 4
    .balign 8
welcome:
    .zero 13

translate (one statement, one instruction) → parse → verify (2,026,160 machine states across 1- and 2-hart systems) → infer value: u32, welcome: thread [u8 × 13] → remove dead code & never-taken branches → compact the layout → emit

The emitted program assembles with the RISC-V GNU toolchain, boots under qemu-system-riscv64 -machine virt, and prints Hello World!. Two things to notice in the output:

$ cargo nt uart_hello
PASS [25.001s] formal::uart_hello uart_hello

The language

Two layers, kept deliberately close. You write a Python-like surface language in which every simple statement lowers to exactly one RISC-V instruction or directive, and if/while/require each lower to a small fixed pattern of branches plus generated labels (no register allocation, no hidden control flow, the C-like cost model where the price of a line is visible at the call site), with inline assembly always one asm: block away:

value: global _              ->  #$ value global _        define (locality/type; _ = infer)
welcome: _ [u8]*13           ->  #$ welcome _ [u8 ... u8]  list types: [u8, u8] or [u8]*n
t0 = &value                  ->  la t0, value              address of a variable
t0 = type(welcome)           ->  #& t0, welcome            address of the runtime type descriptor
t0 = csr(mhartid)            ->  csrr t0, mhartid          which hart am I?
t1 = 0x10000000              ->  li t1, 0x10000000         load immediate
t1 = t1 + 1                  ->  addi t1, t1, 1            register +/- immediate
t0[0:4] = t1                 ->  sw t1, 0(t0)              store a byte slice (1 = sb, 4 = sw)
t1 = t0[8:16]                ->  ld t1, 8(t0)              load a byte slice (1 = lb, 4 = lw, 8 = ld)
section 0x100 0x200 rw       ->  #@ 0x100 0x200 rw         declare an accessible region
fail                         ->  #!                        must be proven unreachable
unreachable                  ->  #?                        end of execution
asm:                         ->  (each indented line       inline assembly
    wfi                          emitted verbatim)

require t1 < t2              # fail unless t1 < t2  (one-line `if not cond: fail`)
if t0 == 0:                 # run the block when the condition holds
    t1 = t0[0:8]
while t5 != t2:             # top-tested loop
    t5 = t5 + 1

Underneath sits the verification target: RISC-V assembly plus five directives; every directive starts with #, the RISC-V comment character, so any formal program is also a plain assembly file:

Directive Surface form Meaning
#! fail An assertion the compiler must prove unreachable on every interleaving.
#? unreachable End of execution: the hart that reaches this point halts.
#$ x global u32 x: global u32 Declare a variable's locality and/or type; _ means "infer it for me".
#& t0, x t0 = type(x) Load the address of a variable's runtime type descriptor. Programs can inspect their own inferred types.
#@ 100 200 rw section 100 200 rw Declare a memory region the program may access (bounds may be registers: an allocator declares each allocation as it makes it).

How it compares

The long version, with the hard cases argued honestly, is in comparison.md. The short version:

That cost is explicit: O(n · hr · 2b · 8v) (instructions × racy-interleavings × indeterminate branches × unspecified variables). The design lever is reducing the exponents (keep shared mutation rare, annotate what you know); planned compiler modes trade them away explicitly.

Where this is going

A bare-metal OS running DOOM. A bare-metal Tor node. A bare-metal web server. A bare-metal hypervisor under a serverless platform. All proven, all the way down: because the verifier works on the machine code, there is nothing underneath it to trust. Nearer term: growing the Python-like surface (it doesn't have a name yet) while keeping its near-one-to-one translation, so the cost model never goes opaque.