Research Topic: Verified Code Generation



Lars Hupel ORCID

CakeML is a functional programming language with a proven-correct compiler and runtime system. This entry contains an unofficial version of the CakeML semantics that has been exported from the Lem specifications to Isabelle. Additionally, there are some hand-written theory files that adapt the exported code to Isabelle and port proofs from the HOL4 formalization, e.g. termination and equivalence proofs.

A Verified Compiler from Isabelle/HOL to CakeML

Lars Hupel ORCID Tobias Nipkow ORCID
European Symposium on Programming (ESOP, Open Access), 2018
The final publication is available at Springer Link.

Many theorem provers can generate functional programs from definitions or proofs. However, this code generation needs to be trusted. Except for the HOL4 system, which has a proof producing code generator for a subset of ML. We go one step further and provide a verified compiler from Isabelle/HOL to CakeML. More precisely we combine a simple proof producing translation of recursion equations in Isabelle/HOL into a deeply embedded term language with a fully verified compilation chain to the target language CakeML.

Dictionary Construction

Lars Hupel ORCID
Archive of Formal Proofs, 2017

Isabelle's code generator natively supports type classes. For targets that do not have language support for classes and instances, it performs the well-known dictionary translation, as described by Haftmann and Nipkow. This translation happens outside the logic, i.e., there is no guarantee that it is correct, besides the pen-and-paper proof. This work implements a certified dictionary translation that produces new class-free constants and derives equality theorems.

Constructor Functions

Lars Hupel ORCID
Archive of Formal Proofs, 2017

Isabelle's code generator performs various adaptations for target languages. Among others, constructor applications have to be fully saturated. That means that for constructor calls occuring as arguments to higher-order functions, synthetic lambdas have to be inserted. This entry provides tooling to avoid this construction altogether by introducing constructor functions.

Lazifying case constants

Lars Hupel ORCID
Archive of Formal Proofs, 2017

Isabelle's code generator performs various adaptations for target languages. Among others, case statements are printed as match expressions. Internally, this is a sophisticated procedure, because in HOL, case statements are represented as nested calls to the case combinators as generated by the datatype package. Furthermore, the procedure relies on laziness of match expressions in the target language, i.e., that branches guarded by patterns that fail to match are not evaluated. Similarly, if-then-else is printed to the corresponding construct in the target language. This entry provides tooling to replace these special cases in the code generator by ignoring these target language features, instead printing case expressions and if-then-else as functions.


Development of a verified code generator from Isabelle/HOL to CakeML, a verified subset of Standard ML. The vision of this project is to provide an alternative (or extension) to the current code generation facility that reduces the trusted code base.


What is the trusted code base of the pipeline?
We rely on faithful export from Lem to Isabelle, an unverified printer of CakeML AST to CakeML source text, and the kernel of Isabelle. There is ongoing work to use OpenTheory to get a more faithful representation of the CakeML formalization in Isabelle.
Does the pipeline target existing CakeML library constructs, like the built-in lists?
No, it does not, and it is not intended at the moment.
Does the pipeline target existing CakeML native constructs, like machine integers?
No, it does not, but it is intended for the future, and actively being worked on.
Does that mean that arithmetic on natural numbers in Isabelle is mapped to unary in CakeML?
That is accurate: nats are treated as a proper datatype with Zero and Suc constructors.
How are (8-bit) characters treated?
The most naive possible translation, 256-way enumerations, would lead to very large code sizes. Instead, we translate characters to bytes, i.e. 8-tuples of bools. As of Isabelle2018, this will become the default of code generation in Isabelle, including other (unverified) targets.
Is the generated code guaranteed to terminate?
The current Isabelle code generator only guarantees partial correctness (which has been proved on paper). While our work is designed for total correctness (all functions that are exported are internally proved to be terminating), we have only proved partial correctness so far. The CakeML compiler team has proved total correctness.
Why isn’t there a full compilation pipeline in the AFP?
We’re working on it. Some parts of it are already available as stand-alone entries, and we want to make sure it’s published in modular parts. In the meantime, use the packaged version that accompanies the main paper (see below).

Supplementary Material