A Veried Compiler from Isabelle/HOL to
CakeML
Lars Hupel and Tobias Nipkow
Technische Universität München
lars.hupel@tum.de, nipkow@in.tum.de
Abstract. Many theorem provers can generate functional programs from
denitions or proofs. However, this code generation needs to be trusted.
Except for the HOL4 system, which has a proof producing code generator
for a subset of ML. We go one step further and provide a veried com-
piler from Isabelle/HOL to CakeML. More precisely we combine a simple
proof producing translation of recursion equations in Isabelle/HOL into
a deeply embedded term language with a fully veried compilation chain
to the target language CakeML.
Keywords: Isabelle, CakeML, compiler, higher-order term rewriting
1 Introduction
Many theorem provers have the ability to generate executable code in some (typ-
ically functional) programming language from denitions, lemmas and proofs
(e.g. [6,8,9,12,15,26,36]). This makes code generation part of the trusted kernel
of the system. Myreen and Owens [29] closed this gap for the HOL4 system: they
have implemented a tool that translates from HOL4 into CakeML, a subset of
SML, and proves a theorem stating that a result produced by the CakeML code
is correct w.r.t. the HOL functions. They also have a veried implementation of
CakeML [23, 39]. We go one step further and provide a once-and-for-all veried
compiler from (deeply embedded) function denitions in Isabelle/HOL [31, 32]
into CakeML proving partial correctness of the generated CakeML code w.r.t.
the original functions. This is like the step from dynamic to static type checking.
It also means that preconditions on the input to the compiler are explicitly given
in the correctness theorem rather than implicitly by a failing translation. To the
best of our knowledge this is the rst veried (as opposed to certifying) compiler
from function denitions in a logic into a programming language.
Our compiler is composed of multiple phases and in principle applicable to
other languages than Isabelle/HOL or even HOL:
We erase types right away. Hence the type system of the source language is
irrelevant.
We merely assume that the source language has a semantics based on equa-
tional logic.
2
The compiler operates in three stages:
1. The preprocessing phase eliminates features that are not supported by our
compiler. Most importantly, dictionary construction eliminates occurrences
of type classes in HOL terms. It introduces dictionary datatypes and new
constants and proves the equivalence of old and new constants (§7).
2. The deep embedding lifts HOL terms into terms of type term, a HOL model
of HOL terms. For each constant c (of arbitrary type) it denes a constant
c
of type
term
and proves a theorem that expresses equivalence
3).
3. There are multiple compiler phases that eliminate certain constructs from
the term type, until we arrive at the CakeML expression type. Most phases
target a dierent intermediate term type (§5).
The rst two stages are preprocessing, are implemented in ML and produce
certicate theorems. Only these stages are specic to Isabelle. The third (and
main) stage is implemented completely in the logic HOL, without recourse to
ML. Its correctness is veried once and for all. All Isabelle denitions and proofs
can be found in the supplementary material.
2 Related work
There is existing work in the Coq [2, 14] and HOL [29] communities for proof
producing or veried extraction of functions dened in the logic. Anand et al. [2]
present work in progress on a veried compiler from Gallina (Coq’s specication
language) via untyped intermediate languages to CompCert C light. They plan
to connect their extraction routine to the CompCert compiler [25].
Translation of type classes into dictionaries is an important feature of Haskell
compilers. In the setting of Isabelle/HOL, this has been described by Wenzel [43]
and Krauss et al. [22]. Haftmann and Nipkow [16] use this construction to compile
HOL denitions into target languages that do not support type classes, e.g.
Standard ML and OCaml. In this work, we provide a certifying translation that
eliminates type classes inside the logic.
Compilation of pattern matching is well understood in literature [3, 35, 37].
In this work, we contribute a transformation of sets of equations with pattern
matching on the left-hand side into a single equation with nested pattern match-
ing on the right-hand side. This is implemented and veried inside Isabelle.
Besides CakeML, there are many projects for veried compilers for functional
programming languages of various degrees of sophistication and realism (e.g.
[4, 11, 13]). Particularly modular is the work by Neis et al. [30] on a veried
compiler for an ML-like imperative source language. The main distinguishing
feature of our work is that we start from a set of higher-order recursion equations
with pattern matching on the left-hand side rather than a lambda calculus with
pattern matching on the right-hand side. On the other hand we stand on the
shoulders of CakeML which allows us to bypass all complications of machine
code generation. Note that much of our compiler is not specic to CakeML and
that it would be possible to retarget it to, for example, Pilsner abstract syntax
with moderate eort.
3
datatype term =
Const string |
Free string |
Abs term |
Bound nat |
App term term
(a) Abstract syntax of
de Bruijn terms
Step
(lhs, rhs) R match lhs t = Some σ
R t subst σ rhs
Beta
closed t
R (Λt) $ t
t[t
]
Fun
R t t
R t $ u t
$ u
Arg
R u u
R t $ u t $ u
(b) Small-step semantics
Fig. 1: Basic syntax and semantics of the term type
3 Deep embedding
Starting with a HOL denition, we derive a new, reied denition in a deeply
embedded term language depicted in Figure 1a. This term language corresponds
closely to the term datatype of Isabelle’s implementation (using de Bruijn indices
[10]), but without types and schematic variables.
To establish a formal connection between the original and the reied deni-
tions, we use a logical relation, a concept that is well-understood in literature [19]
and can be nicely implemented in Isabelle using type classes. Note that the use
of type classes here is restricted to correctness proofs; it is not required for
the execution of the compiler itself. That way, there is no contradiction to the
elimination of type classes occurring in a previous stage.
Notation We abbreviate App t u to t$u and Abs t to Λ t. Other term types intro-
duced later in this paper use the same conventions. We reserve λ for abstractions
in HOL itself. Typing judgments are written with a double colon: t :: τ .
Embedding operation Embedding is implemented in ML. We denote this oper-
ation using angle brackets: t, where t is an arbitrary HOL expression and the
result t is a HOL value of type term. It is a purely syntactic transformation,
without preliminary evaluation or reduction, and it discards type information.
The following examples illustrate this operation and typographical conventions
concerning variables and constants:
x = Free "x" f = Const "f" λx. f x = Λ (f $ Bound 0)
Small-step semantics Figure 1b species the small-step semantics for term. It is
reminiscent of higher-order term rewriting, and modelled closely after equality in
HOL. The basic idea is that if the proposition t = u can be proved equationally
in HOL (without symmetry), then R t
u holds (where R :: (term ×
term) set). We call R the rule set. It is the result of translating a set of dening
equations lhs = rhs into pairs (lhs , rhs) R.
4
Rule Step performs a rewrite step by picking a rewrite rule from R and
rewriting the term at the root. For that purpose, match and subst are (mostly)
standard rst-order matching and substitution (see §4 for details).
Rule Beta performs β-reduction. Type term represents bound variables by
de Bruijn indices. The notation t[t
] represents the substitution of the outermost
bound variable in t with t
.
Our semantics does not constitute a fully-general higher-order term rewrit-
ing system, because we do not allow substitution under binders. For de Bruijn
terms, this would pose no problem, but as soon as we introduce named bound
variables, substitution under binders requires dealing with capture. To avoid this
altogether, all our semantics expect terms that are substituted into abstractions
to be closed. However, this does not mean that we restrict ourselves to any
particular evaluation order. Both call-by-value and call-by-name can be used
in the small-step semantics. But later on, the target semantics will only use
call-by-value.
Embedding relation
We denote the concept that an embedded term
t
corresponds
to a HOL term a of type τ w.r.t. rule set R with the syntax R t a. If we
want to be explicit about the type, we index the relation:
τ
.
For ground types, this can be dened easily. For example, the following two
rules dene
nat
:
R 0
nat
0
R t
nat
n
R Suc t
nat
Suc n
Denitions of for arbitrary datatypes without nested recursion can be de-
rived mechanically in the same fashion as for nat, where they constitute one-to-
one relations. Note that for ground types, ignores R. The reason why is
parametrized on R will become clear in a moment.
For function types, we follow Myreen and Owen’s approach [29]. The state-
ment R t f can be interpreted as t $ a can be rewritten to f a for
all a. Because this might involve applying a function denition from R, the
relation must be indexed by the rule set. As a notational convenience, we dene
another relation R t x to mean that there is a t
such that R t
t
and
R t
x. Using this notation, we formally dene for functions as follows:
R t f (x t
x
. R t
x
x R t $ t
x
f x)
Example As a running example, we will use the map function on lists:
map f [] = []
map f (x # xs) = f x # map f xs
The result of embedding this function is a set of rules map
:
map’ =
{(Const ”List.list.map” $ Free ”f” $ (Const ”List.list.Cons” $ Free ”x21” $ Free ”x22”),
5
Const ”List.list.Cons” $ (Free ”f” $ Free ”x21”) $ . . .),
(Const ”List.list.map” $ Free ”f” $ Const ”List.list.Nil”,
Const ”List.list.Nil”)}
together with the theorem map
Const "List.list.map" map, which is
proven by simple induction over map. Constant names like "List.list.map"
come from the fully-qualied internal names in HOL.
The induction principle for the proof arises from the use of the fun command
that is used to dene recursive functions in HOL [21]. But the user is also allowed
to specify custom equations for functions, in which case we will use heuristics to
generate and prove the appropriate induction theorem. For simplicity, we will
use the term (dening) equation uniformly to refer to any set of equations, ei-
ther default ones or ones specied by the user. Embedding partially-specied
functions in particular, proving the certicate theorem about them is cur-
rently not supported. In the future, we plan to leverage the domain predicate as
produced by fun to generate conditional theorems.
4 Terms, matching and substitution
The compiler transforms the initial term type (Figure 1a) through various inter-
mediate stages. This section gives an overview and introduces necessary termi-
nology.
Preliminaries The function arrow in HOL is . The cons operator on lists is
the inx #.
Throughout the paper, the concept of mappings is pervasive: We use the
type notation α β to denote a function α β option. In certain contexts,
a mapping may also be called an environment. We write mapping literals using
brackets: [a x, b y, . . .]. If it is clear from the context that σ is dened on
a, we often treat the lookup σ a as returning an x :: β.
The functions dom :: (α β) α set and range :: (α β) β set return
the domain and range of a mapping, respectively.
Dropping entries from a mapping is denoted by σ k, where σ is a mapping
and k is either a single key or a set of keys. We use σ
σ to denote that σ
is
a sub-mapping of σ, that is, dom σ
dom σ and a dom σ
. σ
a = σ a.
Merging two mappings σ and ρ is denoted with σ ++ ρ. It constructs a new
mapping with the union domain of σ and ρ. Entries from ρ override entries from
σ. That is, ρ σ ++ ρ holds, but not necessarily σ σ ++ ρ.
All mappings and sets are assumed to be nite. In the formalization, this is
enforced by using subtypes of and set. Note that one cannot dene datatypes
by recursion through sets for cardinality reasons. However, for nite sets, it
is possible. This is required to construct the various term types. We leverage
facilities of Blanchette et al.’s datatype command to dene these subtypes [7].
Standard functions All type constructors that we use (, set, list, option, ...) sup-
port the standard operations map and rel. For lists, map is the regular covariant
6
map. For mappings, the function has the type (β γ) (α β) (α γ).
It leaves the domain unchanged, but applies a function to the range of the map-
ping.
Function rel
τ
lifts a binary predicate P :: α α bool to the type con-
structor τ. We call this lifted relation the relator for a particular type.
For datatypes, its denition is structural, for example:
rel
list
P [] []
rel
list
P xs ys P x y
rel
list
P (x # xs) (y # ys)
For sets and mappings, the denition is a little bit more subtle.
Denition 1 (Set relator). For each element a A, there must be a corre-
sponding element b B such that P a b, and vice versa. Formally:
rel
set
P A B (x A. y B. P x y) (y B. x A. P x y)
Denition 2 (Mapping relator). For each a, m a and n a must be related
according to rel
option
P . Formally:
rel
mapping
P m n (a. rel
option
P (m a) (n a))
Term types There are four distinct term types: term, nterm, pterm, and sterm. All
of them support the notions of free variables, matching and substitution. Free
variables are always a nite set of strings. Matching a term against a pattern
yields an optional mapping of type string α from free variable names to terms.
Note that the type of patterns is itself term instead of a dedicated pattern
type. The reason is that we have to subject patterns to a linearity constraint
anyway and may use this constraint to carve out the relevant subset of terms:
Denition 3. A term is linear if there is at most one occurrence of any variable,
it contains no abstractions, and in an application f $ x, f must not be a free
variable. The HOL predicate is called linear :: term b ool.
Because of the similarity of operations across the term types, they are all in-
stances of the term type class. Note that in Isabelle, classes and types live in
dierent namespaces. The term type and the term type class are separate enti-
ties.
Denition 4. A term type τ supports the operations match :: term τ
(string τ ), subst :: (string τ ) τ τ and frees :: τ string set. We also
dene the following derived functions:
matchs matches a list of patterns and terms sequentially, producing a single
mapping
closed t is an abbreviation for frees t =
closed σ is an overloading of closed, denoting that all values in a mapping
are closed
7
Additionally, some (obvious) axioms have to be satised. We do not strive to
fully specify an abstract term algebra. Instead, the axioms are chosen according
to the needs of this formalization.
A notable deviation from matching as discussed in term rewriting literature
is that the result of matching is only well-dened if the pattern is linear.
Denition 5. An equation is a pair of a pattern (left-hand side) and a term
(right-hand side). The pattern is of the form f $p
1
$. . .$p
n
, where f is a constant
(i.e. of the form Const name). We refer to both f or name interchangeably as
the function symbol of the equation.
Following term rewriting terminology, we sometimes refer to an equation as rule.
4.1 De Bruijn terms (term )
The denition of term is almost an exact copy of Isabelle’s internal term type,
with the notable omissions of type information and schematic variables (Fig-
ure 1a). The implementation of β-reduction is straightforward via index shifting
of bound variables.
4.2 Named bound variables (nterm)
datatype nterm = Nconst string | Nvar string | Nabs string nterm | Napp nterm nterm
The nterm type is similar to term , but removes the distinction between bound
and free variables. Instead, there are only named variables. As mentioned in the
previous section, we forbid substitution of terms that are not closed in order
to avoid capture. This is also reected in the syntactic side conditions of the
correctness proofs (§5.1).
4.3 Explicit pattern matching (pterm)
datatype pterm =
Pconst string | Pvar string | Pabs ((term × pterm) set) | Papp pterm pterm
Functions in HOL are usually dened using implicit pattern matching, that is,
the terms p
i
occurring on the left-hand side f p
1
. . . p
n
of an equation must
be constructor patterns. This is also common among functional programming
languages like Haskell or OCaml. CakeML only supports explicit pattern match-
ing using case expressions. A function denition consisting of multiple dening
equations must hence be translated to the form f = λx. case x of . . .. The
elimination proceeds by iteratively removing the last parameter in the block of
equations until none are left.
In our formalization, we opted to combine the notion of abstraction and case
expression, yielding case abstractions, represented as the Pabs constructor. This
is similar to the fn construct in Standard ML, which denotes an anonymous
8
function that immediately matches on its argument [27]. The same construct
also exists in Haskell with the LambdaCase language extension. We chose this
representation mainly for two reasons: First, it allows for a simpler language
grammar because there is only one (shared) constructor for abstraction and case
expression. Second, the elimination procedure outlined above does not have to
introduce fresh names in the process. Later, when translating to CakeML syntax,
fresh names are introduced and proved correct in a separate step.
The set of pairs of pattern and right-hand side inside a case abstraction is
referred to as clauses. As a short-hand notation, we use Λ{p
1
t
1
, p
2
t
2
, . . .}.
4.4 Sequential clauses (sterm)
datatype sterm =
Sconst string | Svar string | Sabs ((term × sterm) list) | Sapp sterm sterm
In the term rewriting fragment of HOL, the order of rules is not signicant. If a
rule matches, it can be applied, regardless when it was dened or proven. This
is reected by the use of sets in the rule and term types. For CakeML, the rules
need to be applied in a deterministic order, i.e. sequentially. The sterm type only
diers from pterm by using list instead of set. Hence, case abstractions use list
brackets: Λ[p
1
t
1
, p
2
t
2
, . . .].
4.5 Irreducible terms (value)
CakeML distinguishes between expressions and values. Whereas expressions may
contain free variables or β-redexes, values are closed and fully evaluated. Both
have a notion of abstraction, but values dier from expressions in that they
contain an environment binding free variables.
Consider the expression (λx.λy.x) (λz .z), which is rewritten (by β-reduction)
to λy.λz.z. Note how the bound variable x disappears, since it is replaced. This
is contrary to how programming languages are usually implemented: evaluation
does not happen by substituting the argument term t for the bound variable
x, but by recording the binding x 7→ t in an environment [23]. A pair of an
abstraction and an environment is usually called a closure [24, 40].
In CakeML, this means that evaluation of the above expression results in the
closure
(λy.x, ["x" 7→ (λz.z, [])])
Note the nested structure of the closure, whose environment itself contains a
closure.
To reect this in our formalization, we introduce a type value of values (ex-
planation inline):
datatype value =
(∗ constructor value: a data constructor applied to multiple values ∗)
Vconstr string (value list) |
(∗ closure: clauses combined with an environment mapping variables to values ∗)
9
Vabs ((term × sterm) list) (string value) |
(∗ recursive closures: a group of mutually recursive function bodies with an environment ∗)
Vrecabs (string ((term × sterm) list)) string (string value)
The above example evaluates to the closure:
Vabs
[
y x
] [
"x" 7→ Vabs [z z] []
]
The third case for recursive closures only becomes relevant when we conate
variables and constants. As long as the rule set rs is kept separate, recursive calls
are straightforward: the appropriate denition for the constant can be looked up
there. CakeML knows no such distinction between constants and variables, hence
everything has to reside in a single environment σ.
Consider this example of odd and even:
odd 0 = False even 0 = True
odd (Suc n) = even n even (Suc n) = odd n
When evaluating the term odd k, the denitions of even and odd themselves must
be available in the environment captured in the denition of odd. However, it
would be cumbersome in HOL to construct such a Vabs that refers to itself.
Instead, we capture the expressions used to dene o dd and even in a recursive
closure. Other encodings might be possible, but since we are targeting CakeML,
we are opting to model it in a similar way as its authors do.
For the above example, this would result in the following global environ-
ment:
["odd" 7→ Vrecabs css "odd" [], "even" 7→ Vrecabs css "even" []]
where css = ["odd" 7→ [0 False , Suc n even n],
"even" 7→ [0 True , Suc n odd n]]
Note that in the rst line, the right-hand sides are values, but in css, they
are expressions. The additional string argument of Vrecabs denotes the selected
function. When evaluating an application of a recursive closure to an argument
(β-reduction), the semantics adds all constituent functions of the closure to the
environment used for recursive evaluation.
5 Intermediate semantics and compiler phases
In this section, we will discuss the progression from de Bruijn based term lan-
guage with its small-step semantics given in Figure 1a to the nal CakeML
semantics. The compiler starts out with terms of type term and applies multiple
phases to eliminate features that are not present in the CakeML source language.
Types term, nterm and pterm each have a small-step semantics only. Type sterm
has a small-step and several intermediate big-step semantics that bridge the gap
to CakeML. An overview of the intermediate semantics and compiler phases is
depicted in Figure 2. The left-hand column gives an overview of the dierent
phases. The right-hand column gives the types of the rule set and the semantics
for each phase; you may want to skip it upon rst reading.
10
de Bruijn
terms
Named bound
variables
Explicit
pattern
matching
Sequential
clauses
Evaluation
semantics
§5.2
§5.3
§5.4
§5.6
constructors :: string set (shared by all phases)
R :: (term × term) set, t, t
:: term
R t t
(Figure 1b)
R :: (term × nterm) set, t, t
:: nterm
R t t
(Figure 3)
R :: (string × pterm) set, t, t
:: pterm
R t t
(Figure 4)
rs :: (string × sterm) list, t, t
:: sterm
rs t t
(Figure 5)
rs :: (string × sterm) list, σ :: string sterm
t, u :: sterm
rs, σ t u (Figure 6)
§5.5
rs :: (string × value) list, σ :: string value
t :: sterm, u :: value
rs, σ t u (Figure 7)
σ :: string value
t :: sterm, u :: value
σ t u (Figure 8)
§5.7
Phase/Renement
Types & Semantics
Theorem 1
see §5.3
see §5.4
Theorem 2
Theorem 1
Theorem 4
compiler phase; semantics renement
semantics belonging to the phase; semantics relation
Fig. 2: Intermediate semantics and compiler phases
5.1 Side conditions
All of the following semantics require some side conditions on the rule set. These
conditions are purely syntactic. As an example we list the conditions for the
correctness of the rst compiler phase:
Patterns must be linear, and constructors in patterns must be fully applied.
Denitions must have at least one parameter on the left-hand side (§5.6).
The right-hand side of an equation refers only to free variables occurring in
patterns on the left-hand side and contain no dangling de Bruijn indices.
11
Step
(lhs, rhs) R match lhs t = Some σ
R t subst σ rhs
Beta
closed t
R (Λx. t) $ t
subst
[
x 7→ t
]
t
Fig. 3: Small-step semantics for
nterm
with named bound variables
There are no two dening equations lhs = rhs
1
and lhs = rhs
2
such that
rhs
1
̸= rhs
2
.
For each pair of equations that dene the same constant, their arity must
be equal and their patterns must be compatible (§5.3).
There is at least one equation.
Variable names occurring in patterns must not overlap with constant names
(§5.7).
Any occurring constants must either be dened by an equation or be a
constructor.
The conditions for the subsequent phases are suciently similar that we do not
list them again.
In the formalization, we use named contexts to x the rules and assump-
tions on them (locales in Isabelle terminology). Each phase has its own locale,
together with a proof that after compilation, the preconditions of the next phase
are satised. Correctness proofs assume the above conditions on R and similar
conditions on the term that is reduced. For brevity, this is usually omitted in
our presentation.
5.2 Naming bound variables: From term to nterm
Isabelle uses de Bruijn indices in the term language for the following two rea-
sons: For substitution, there is no need to rename bound variables. Additionally,
α-equivalent terms are equal. In implementations of programming languages,
these advantages are not required: Typically, substitutions do not happen inside
abstractions, and there is no notion of equality of functions. Therefore CakeML
uses named variables and in this compilation step, we get rid of de Bruijn indices.
The “named” semantics is based on the nterm type. The rules that are
changed from the original semantics (Figure 1b) are given in Figure 3 (Fun and
Arg remain unchanged). Notably, β-reduction reuses the substitution function.
For the correctness proof, we need to establish a correspondence between
terms and nterms. Translation from nterm to term is trivial: Replace bound
variables by the number of abstractions between occurrence and where they
were bound in, and keep free variables as they are. This function is called
nterm
_
to
_
term
.
The other direction is not unique and requires introduction of fresh names
for bound variables. In our formalization, we have chosen to use a monad to
produce these names. This function is called term_to_nterm. We can also prove
12
Beta
(pat , rhs) C match pat t = Some σ closed t
R (Λ C) $ t subst σ rhs
Step’
(name, rhs) R
R Pconst name rhs
Fig. 4: Small-step semantics for pterm with pattern matching
the obvious property nterm_to_term (term_to_nterm t) = t, where t is a term
without dangling de Bruijn indices.
Generation of fresh names in general can be thought of as picking a string
that is not an element of a (nite) set of already existing names. For Isabelle,
the Nominal framework [41,42] provides support for reasoning over fresh names,
but unfortunately, its denitions are not executable.
Instead, we chose to model generation of fresh names as a monad α fresh
with the following primitive operations in addition to the monad operations:
run :: α fresh string set α
fresh_name :: string fresh
In our implementation, we have chosen to represent α fresh as roughly isomorphic
to the state monad.
Compilation of a rule set proceeds by translation of the right-hand side of all
rules:
compile R = {(p, term_to_nterm t) | (p, t) R}
The left-hand side is left unchanged for two reasons: function match expects an
argument of type term (see §4), and patterns do not contain abstractions or
bound variables.
Theorem 1 (Correctness of compilation). Assuming a step can be taken
with the compiled rule set, it can be reproduced with the original rule set.
compile R t u closed t
R nterm_to_term t nterm_to_term u
We prove this by induction over the semantics (Figure 3).
5.3 Explicit pattern matching: From nterm to pterm
Usually, functions in HOL are dened using implicit pattern matching, that is,
the left-hand side of an equation is of the form f p
1
. . . p
n
. For any given
function f, there may be multiple such equations. In this compilation step, we
transform sets of equations for f dened using implicit pattern matching into a
single equation for f of the form f = Λ C, where C is a set of clauses.
The strategy we employ currently requires successive elimination of a single
parameter from right to left, in a similar fashion as Slind’s pattern matching
13
compiler [37, §3.3.1]. Recall our running example (map). It has arity 2. We omit
the brackets ⟨⟩ for brevity. First, the list parameter gets eliminated:
map f = λ [] []
| x # xs f x # map f xs
Finally, the function parameter gets eliminated:
map = λ f
(
λ [] []
| x # xs f x # map f xs
)
This has now arity 0 and is dened by a twice-nested abstraction.
Semantics The target semantics is given in Figure 4 (the Fun and Arg rules
from previous semantics remain unchanged). We start out with a rule set R that
allows only implicit pattern matching. After elimination, only explicit pattern
matching remains. The modied Step rule merely replaces a constant by its
denition, without taking arguments into account.
Restrictions For the transformation to work, we need a strong assumption
about the structure of the patterns p
i
to avoid the following situation:
map f [] = []
map g (x # xs) = g x # map g xs
Through elimination, this would turn into:
map = λ f
(
λ [] []
)
| g
(
λ x # xs f x # map f xs
)
Even though the original equations were non-overlapping, we suddenly obtained
an abstraction with two overlapping patterns. Slind observed a similar problem
[37, §3.3.2] in his algorithm. Therefore, he only permits uniform equations, as
dened by Wadler [35, §5.5]. Here, we can give a formal characterization of our
requirements as a computable function on pairs of patterns:
fun pat_compat :: term term bool where
pat_compat (t
1
$ t
2
) (u
1
$ u
2
) pat_compat t
1
u
1
(t
1
= u
1
pat_compat t
2
u
2
)
pat_compat t u (overlapping t u t = u)
This compatibility constraint ensures that any two overlapping patterns (of the
same column) p
i,k
and p
j,k
are equal and are thus appropriately grouped together
in the elimination procedure. We require all dening equations of a constant to be
mutually compatible. Equations violating this constraint will be agged during
embedding (§3), whereas the pattern elimination algorithm always succeeds.
While this rules out some theoretically possible pattern combinations (e.g. the
diagonal function [35, §5.5]), in practice, we have not found this to be a problem:
14
Step
(name, rhs) R
R Sconst name rhs
Beta
first_match cs t = Some (σ, rhs) closed t
R (Λ cs) $ t subst σ rhs
Fig. 5: Small-step semantics for
sterm
All of the function denitions we have tried (§8) satised pattern compatibility
(after automatic renaming of pattern variables). As a last resort, the user can
manually instantiate function equations. Although this will always lead to a
pattern compatible denition, it is not done automatically, due to the potential
blow-up.
Discussion Because this compilation phase is both non-trivial and has some
minor restrictions on the set of function denitions that can be processed, we
may provide an alternative implementation in the future. Instead of eliminat-
ing patterns from right to left, patterns may be grouped in tuples. The above
example would be translated into:
map = λ (f, []) []
| (f, x # xs) f x # map f xs
We would then leave the compilation of patterns for the CakeML compiler, which
has no pattern compatibility restriction.
The obvious disadvantage however is that this would require the knowledge
of a tuple type in the term language which is otherwise unaware of concrete
datatypes.
5.4 Sequentialization: From pterm to sterm
The semantics of pterm and sterm dier only in rule Step and Beta. Figure 5
shows the modied rules, where instead of any matching clause the rst match-
ing clause in a case abstraction is picked. For the correctness proof, the order
of clauses does not matter: we only need to prove that a step taken in the se-
quential semantics can be reproduced in the unordered semantics. As long as
no rules are dropped, this is trivially true. For that reason, the compiler orders
the clauses lexicographically. At the same time the rules are also converted from
type (string × pterm) set to (string × sterm) list. Below, rs will always denote a
list of the latter type.
5.5 Big-step semantics for sterm
This big-step semantics for sterm is not a compiler phase but moves towards
the desired evaluation semantics. In this rst step, we reuse the sterm type for
15
Const
(name, rhs) rs
rs, σ Sconst name rhs
Var
σ name = Some v
rs, σ Svar name v
Abs
rs, σ Λ cs Λ [(pat , subst (σ frees pat) t | (pat , t) cs]
Comb
rs, σ t Λ cs
rs, σ u u
first_match cs u
= Some (σ
, rhs) rs, σ ++ σ
rhs v
rs, σ t $ u v
Constr
name constructors rs, σ t
1
u
1
· · · rs, σ t
n
u
n
rs, σ Sconst name $ t
1
$ . . . $ t
n
Sconst name $ u
1
$ . . . $ u
n
Fig. 6: Big-step semantics for sterm
evaluation results, instead of evaluating to the separate type value. This allows
us to ignore environment capture in closures for now.
All previous relations were parametrized by a rule set. Now the big-step
predicate is of the form rs, σ t t
where σ :: string sterm is a variable
environment.
This semantics also introduces the distinction between constructors and de-
ned constants. If C is a constructor, the term C t
1
. . . t
n
is evaluated to
C t
1
. . . t
n
where the t
i
are the results of evaluating the t
i
.
The full set of rules is shown in Figure 6. They deserve a short explanation:
Const Constants are retrieved from the rule set rs.
Var Variables are retrieved from the environment σ.
Abs In order to achieve the intended invariant, abstractions are evaluated to
their fully substituted form.
Comb Function application t$u rst requires evaluation of t into an abstraction
Λ cs and evaluation of u into an arbitrary term u
. Afterwards, we look for a
clause matching u
in cs, which produces a local variable environment σ
, pos-
sibly overwriting existing variables in σ. Finally, we evaluate the right-hand
side of the clause with the combined global and local variable environment.
Constr For a constructor application C t
1
. . ., evaluate all t
i
. The set con-
structors is an implicit parameter of the semantics.
Lemma 1 (Closedness invariant). If σ contains only closed terms, frees t
dom σ and rs, σ t t
, then t
is closed.
Correctness of the big-step w.r.t. the small-step semantics is proved easily by
induction on the former:
Lemma 2. For any closed environment σ satisfying frees t dom σ,
rs, σ t u rs subst σ t
u
By setting σ = [], we obtain:
Theorem 2 (Correctness). rs, [] t u closed t rs t
u
16
Const
(name, rhs) rs
rs, σ Sconst name rhs
Var
σ name = Some v
rs, σ Svar name v
Abs
rs, σ Λ cs Vabs cs σ
Comb
rs, σ t Vabs cs σ
rs, σ u v first_match cs v = Some (σ
′′
, rhs) rs, σ
++ σ
′′
rhs v
rs, σ t $ u v
RecComb
rs, σ t Vrecabs css name σ
css name = Some cs rs, σ u v
first_match cs v = Some (σ
′′
, rhs) rs, σ
++ σ
′′
rhs v
rs, σ t $ u v
Constr
name constructors rs, σ t
1
v
1
· · · rs, σ t
n
v
n
rs, σ Sconst name $ t
1
$ . . . $ t
n
Vconstr name [v
1
, . . . , v
n
]
Fig. 7: Evaluation semantics from sterm to value
5.6 Evaluation semantics: Rening sterm to value
At this point, we introduce the concept of values into the semantics, while still
keeping the rule set (for constants) and the environment (for variables) separate.
The evaluation rules are specied in Figure 7 and represent a departure from the
original rewriting semantics: a term does not evaluate to another term but to an
object of a dierent type, a value. We still use as notation, because big-step
and evaluation semantics can be disambiguated by their types.
The evaluation model itself is fairly straightforward. As explained in §4.5,
abstraction terms are evaluated to closures capturing the current variable envi-
ronment. Note that at this point, recursive closures are not treated dierently
from non-recursive closures. In a later stage, when rs and σ are merged, this
distinction becomes relevant.
We will now explain each rule that has changed from the previous semantics:
Abs Abstraction terms are evaluated to a closure capturing the current envi-
ronment.
Comb As before, in an application t $ u, t must evaluate to a closure Vabs cs σ
.
The evaluation result of u is then matched against the clauses cs, producing
an environment σ
′′
. The right-hand side of the clause is then evaluated using
σ
++ σ
′′
; the original environment σ is eectively discarded.
RecComb Similar as above. Finding the matching clause is a two-step process:
First, the appropriate clause list is selected by name of the currently active
function. Then, matching is performed.
Constr As before, for an n-ary application C t
1
. . ., where C is a data con-
structor, we evaluate all t
i
. The result is a Vconstr value.
Conversion between sterm and value To establish a correspondence between
evaluating a term to an sterm and to a value, we apply the same trick as in §5.2.
17
Instead of specifying a complicated relation, we translate value back to sterm:
simply apply the substitutions in the captured environments to the clauses.
The translation rules for Vabs and Vrecabs are kept similar to the Abs rule
from the big-step semantics (Figure 6). Roughly speaking, the big-step semantics
always keeps terms fully substituted, whereas the evaluation semantics defers
substitution.
Similarly to §5.2, we can also dene a function sterm_to_value :: sterm
value and prove that one function is the inverse of the other.
Matching The value type, instead of using binary function application as all
other term types, uses n-ary constructor application. This introduces a concep-
tual mismatch between (binary) patterns and values. To make the proofs easier,
we introduce an intermediate type of n-ary patterns. This intermediate type can
be optimized away by fusion.
Correctness The correctness proof requires a number of interesting lemmas.
Lemma 3 (Substitution before evaluation).
Assuming that a term
t
can
be evaluated to a value u given a closed environment σ, it can be evaluated to
the same value after substitution with a sub-environment σ
. Formally: rs, σ
t u σ
σ rs, σ subst σ
t u
This justies the “pre-substitution” exhibited by the Abs rule in the big-step
semantics in contrast to the environment-capturing Abs rule in the evaluation
semantics.
Theorem 3 (Correctness). Let σ be a closed environment and t a term which
only contains free variables in dom σ. Then, an evaluation to a value rs, σ t v
can be reproduced in the big-step semantics as rs
, map value_to_sterm σ t
value_to_sterm v, where rs
= [(name, value_to_sterm rhs) | (name, rhs) rs].
Instantiating the correctness theorem The correctness theorem states that,
for any given evaluation of a term t with a given environment rs, σ containing
values, we can reproduce that evaluation in the big-step semantics using a derived
list of rules rs
and an environment σ
containing sterms that are generated by
the value_to_sterm function. But recall the diagram in Figure 2. In our scenario,
we start with a given rule set of sterms (that has been compiled from a rule set
of terms). Hence, the correctness theorem only deals with the opposite direction.
It remains to construct a suitable rs such that applying value_to_sterm to
it yields the given sterm rule set. We can exploit the side condition (§5.1) that
all bindings dene functions, not constants:
Denition 6 (Global clause set). The mapping global_css :: string ((term×
sterm) list) is obtained by stripping the Sabs constructors from all denitions and
converting the resulting list to a mapping.
18
For each denition with name f we dene a corresponding term v
f
= Vrecabs
global_css f []. In other words, each function is now represented by a recur-
sive closure bundling all functions. Applying value_to_sterm to v
f
returns the
original denition of f . Let rs denote the original sterm rule set and rs
v
the
environment mapping all f’s to the v
f
’s.
The variable environments σ and σ
can safely be set to the empty mapping,
because top-level terms are evaluated without any free variable bindings.
Corollary 1 (Correctness). rs
v
, [] t v rs, [] t value_to_sterm v
Note that this step was not part of the compiler (although rs
v
is computable)
but it is a renement of the semantics to support a more modular correctness
proof.
Example Recall the odd and even example from §4.5. After compilation to sterm,
the rule set looks like this:
rs = {("odd", Sabs [0 False , Suc n even n]),
("even", Sabs [0 True , Suc n odd n])}
This can be easily transformed into the following global clause set:
global_css = ["odd" 7→ [0 False , Suc n even n],
"even" 7→ [0 True , Suc n odd n]]
Finally, rs
v
is computed by creating a recursive closure for each function:
rs
v
= ["odd" 7→ Vrecabs global_css "odd" [],
"even" 7→ Vrecabs global_css "even" []]
5.7 Evaluation with recursive closures
CakeML distinguishes between non-recursive and recursive closures [29]. This
distinction is also present in the value type. In this step, we will conate vari-
ables with constants which necessitates a special treatment of recursive closures.
Therefore we introduce a new predicate σ t v in Figure 8 (in contrast to the
previous rs, σ t v). We examine the rules one by one:
Const/Var Constant denition and variable values are both retrieved from
the same environment σ. We have opted to keep the distinction between
constants and variables in the sterm type to avoid the introduction of another
term type.
Abs Identical to the previous evaluation semantics. Note that evaluation never
creates recursive closures at run-time (only at compile-time, see §5.6). Anony-
mous functions, e.g. in the term map (λx. x), are evaluated to non-recursive
closures.
19
Const
name / constructors σ name = Some v
σ Sconst name v
Var
σ name = Some v
σ Svar name v
Abs
σ Λ cs Vabs cs σ
Comb
σ t Vabs cs σ
σ u v first_match cs v = Some (σ
′′
, rhs) σ
++ σ
′′
rhs v
σ t $ u v
RecComb
σ t Vrecabs css name σ
css name = Some cs σ u v first_match cs v = Some (σ
′′
, rhs)
σ
++ mk_rec_env css σ
++ σ
′′
rhs v
σ t $ u v
Constr
name constructors σ t
1
v
1
· · · σ t
n
v
n
σ Sconst name $ t
1
$ . . . $ t
n
Vconstr name [v
1
, . . . , v
n
]
Fig. 8: ML-style evaluation semantics
Comb Identical to the previous evaluation semantics.
RecComb Almost identical to the evaluation semantics. Additionally, for each
function (name, cs) css, a new recursive closure Vrecabs css name σ
is
created and inserted into the environment. This ensures that after the rst
call to a recursive function, the function itself is present in the environment to
be called recursively, without having to introduce coinductive environments.
Constr Identical to the evaluation semantics.
Conating constants and variables By merging the rule set rs with the
variable environment σ, it becomes necessary to discuss possible clashes. Previ-
ously, the syntactic distinction between Svar and Sconst meant that x and x
are not ambiguous: all semantics up to the evaluation semantics clearly specify
where to look for the substitute. This is not the case in functional languages
where functions and variables are not distinguished syntactically.
Instead, we rely on the fact that the initial rule set only denes constants. All
variables are introduced by matching before β-reduction (that is, in the Comb
and RecComb rules). The Abs rule does not change the environment. Hence
it suces to assume that variables in patterns must not overlap with constant
names (see §5.1).
Correspondence relation Both constant denitions and values of variables
are recorded in a single environment σ. This also applies to the environment
contained in a closure. The correspondence relation thus needs to take a dierent
sets of bindings in closures into account.
Hence, we dene a relation
v
that is implicitly parametrized on the rule
set rs and compares environments. We call it right-conating, because in a cor-
20
respondence v
v
u, any bound environment in u is thought to contain both
variables and constants, whereas in v, any bound environment contains only
variables.
Denition 7 (Right-conating correspondence). We dene
v
coinduc-
tively as follows:
v
1
v
u
1
· · · v
n
v
u
n
Vconstr name [v
1
, . . . , v
n
]
v
Vconstr name [u
1
, . . . , u
n
]
x frees cs. σ
1
x
v
σ
2
x x consts cs. rs x
v
σ
2
x
Vabs cs σ
1
v
Vabs cs σ
2
cs range css. x frees cs. σ
1
x
v
σ
2
x
cs
range
css
.
x
consts
cs
. σ
1
x
v
(
σ
2
++
mk
_
rec
_
env
css
σ
2
)
x
Vrecabs css name σ
1
v
Vrecabs css name σ
2
Consequently,
v
is not reexive.
Correctness The correctness lemma is straightforward to state:
Theorem 4 (Correctness). Let σ be an environment, t be a closed term and
v a value such that σ t v. If for all constants x occurring in t, rs x
v
σ x
holds, then there is an u such that rs, [] t u and u
v
v.
As usual, the rather technical proof proceeds via induction over the semantics
(Figure 8). It is important to note that the global clause set construction (§5.6)
satises the preconditions of this theorem:
Lemma 4. If name is the name of a constant in rs, then
Vrecabs global_css name []
v
Vrecabs global_css name []
Because
v
is dened coinductively, the proof of this precondition proceeds by
coinduction.
5.8 CakeML
CakeML is a veried implementation of a subset of Standard ML [23, 39]. It
comprises a parser, type checker, formal semantics and backend for machine
code. The semantics has been formalized in Lem [28], which allows export to
Isabelle theories.
Our compiler targets CakeML’s abstract syntax tree. However, we do not
make use of certain CakeML features; notably mutable cells, modules, and lit-
erals. We have derived a smaller, executable version of the original CakeML
semantics, called CupCakeML, together with an equivalence proof. The cor-
rectness proof of the last compiler phase establishes a correspondence between
CupCakeML and the nal semantics of our compiler pipeline.
21
For the correctness proof of the CakeML compiler, its authors have extracted
the Lem specication into HOL4 theories [1]. In our work, we directly target
CakeML abstract syntax trees (thereby bypassing the parser) and use its big-
step semantics, which we have extracted into Isabelle.
1
Conversion from sterm to exp After the series of translations described in the
earlier sections, our terms are syntactically close to CakeML’s terms (Cake.exp).
The only remaining dierences are outlined below:
CakeML does not combine abstraction and pattern matching. For that rea-
son, we have to translate Λ [p
1
t
1
, . . .] into Λx. case x of p
1
t
1
| . . .,
where x is a fresh variable name. We reuse the fresh monad to obtain a bound
variable name. Note that it is not necessary to thread through already cre-
ated variable names, only existing names. The reason is simple: a generated
variable is bound and then immediately used in the body. Shadowing it
somewhere in the body is not problematic.
CakeML has two distinct syntactic categories for identiers (that can rep-
resent variables or functions) and data constructors. Our term types how-
ever have two distinct syntactic categories for constants (that can represent
functions or data constructors) and variables. The necessary prerequisites
to deal with this are already present in the ML-style evaluation semantics
(§5.7) which conates constants and variables, but has a dedicated Constr
rule for data constructors.
Types During embedding (§3), all type information is erased. Yet, CakeML per-
forms some limited form of type checking at run-time: constructing and matching
data must always be fully applied. That is, data constructors must always occur
with all arguments supplied on right-hand and left-hand sides.
Fully applied constructors in terms can be easily guaranteed by simple pre-
processing. For patterns however, this must be ensured throughout the com-
pilation pipeline; it is (like other syntactic constraints) another side condition
imposed on the rule set (§5.1).
The shape of datatypes and constructors is managed in CakeML’s environ-
ment. This particular piece of information is allowed to vary in closures, since
ML supports local type denitions. Tracking this would greatly complicate our
proofs. Hence, we x a global set of constructors and enforce that all values use
exactly that one.
Correspondence relation We dene two dierent correspondence relations:
One for values and one for expressions.
Denition 8 (Expression correspondence).
Var
rel_e (Svar n) (Cake.Var n)
Const
n / constructors
rel_e (Sconst n) (Cake.Var n)
1
based on a repository snapshot from March 27, 2017 (0c48672)
22
Constr
n constructors rel_e t
1
u
1
· · ·
rel_e (Sconst name $ t
1
$ . . . $ t
n
) (Cake.Con (Some (Cake.Short name) [u
1
, . . . , u
n
]))
App
rel_e t
1
u
1
rel_e t
2
u
2
rel_e t
1
$ t
2
Cake.App Cake.Opapp [u
1
, u
2
]
Fun
n / ids (Λ [p
1
t
1
, . . .]) constructors
q
1
= mk_ml_pat p
1
rel_e t
1
u
1
· · ·
rel_e (Λ [p
1
t
1
, . . .]) (Cake.Fun n (Cake.Mat (Cake.Var n)) [q
1
u
1
, . . .])
Mat
rel_e t u q
1
= mk_ml_pat p
1
rel_e t
1
u
1
· · ·
rel_e (Λ [p
1
t
1
, . . .] $ t) (Cake.Mat u [q
1
u
1
, . . .])
We will explain each of the rules briey here.
Var Variables are directly related by identical name.
Const As described earlier, constructors are treated specially in CakeML. In
order to not confuse functions or variables with data constructors themselves,
we require that the constant name is not a constructor.
Constr Constructors are directly related by identical name, and recursively
related arguments.
App CakeML does not just support general function application but also unary
and binary operators. In fact, function application is the binary operator
Opapp. We never generate other operators. Hence the correspondence is re-
stricted to Opapp.
Fun/Mat Observe the symmetry between these two cases: In our term lan-
guage, matching and abstraction are combined, which is not the case in
CakeML. This means we relate a case abstraction to a CakeML function
containing a match, and a case abstraction applied to a value to just a
CakeML match.
There is no separate relation for patterns, because their translation is simple.
The value correspondence (rel_v) is structurally simpler. In the case of con-
structor values (Vconstr and Cake.Conv), arguments are compared recursively.
Closures and recursive closures are compared extensionally, i.e. only bindings
that occur in the body are checked recursively for correspondence.
Correctness We use the same trick as in §5.6 to obtain a suitable environment
for CakeML evaluation based on the rule set rs.
Theorem 5 (Correctness). If the compiled expression sterm_to_cake t ter-
minates with a value u in the CakeML semantics, there is a value v such that
rel_v v u and rs t v.
23
6 Composition
The complete compiler pipeline consists of multiple phases. Correctness is justi-
ed for each phase between intermediate semantics and correspondence relations,
most of which are rather technical. Whereas the compiler may be complex and
impenetrable, the trustworthiness of the constructions hinges on the obviousness
of those correspondence relations.
Fortunately, under the assumption that terms to be evaluated and the re-
sulting values do not contain abstractions or closures, respectively all of the
correspondence relations collapse to simple structural equality: two terms are
related if and only if one can be converted to the other by consistent renaming
of term constructors.
The actual compiler can be characterized with two functions. Firstly, the
translation of term to Cake.exp is a simple composition of each term translation
function:
denition term_to_cake :: term Cake.exp where
term_to_cake = sterm_to_cake pterm_to_sterm nterm_to_pterm term_to_nterm
Secondly, the function that translates function denitions by composing the
phases as outlined in Figure 2, including iterated application of pattern elimi-
nation:
denition compile :: (term × term) fset Cake.dec where
compile = Cake.Dletrec compile_srules_to_cake compile_prules_to_srules
compile_irules_to_srules
compile_irules_iter
compile_crules_to_irules
consts_of compile_rules_to_nrules
Each function compile_* corresponds to one compiler phase; the remaining func-
tions are trivial. This produces a CakeML top-level declaration. We prove that
evaluating this declaration in the top-level semantics (evaluate_prog) results in
an environment cake_sem_env. But cake_sem_env can also be computed via
another instance of the global clause set trick (§5.6).
Equipped with these functions, we can state the nal correctness theorem:
theorem compiled_correct:
(∗ If CakeML evaluation of a term succeeds ... ∗)
assumes evaluate False cake_sem_env s (term_to_cake t) (s’, Rval ml_v)
(∗ ... producing a constructor term without closures ... ∗)
assumes cake_abstraction_free ml_v
(∗ ... and some syntactic properties of the involved terms hold ... ∗)
assumes closed t and ¬ shadows_consts (heads rs constructors) t and
welldened (heads rs constructors) t and wellformed t
(∗ ... then this evaluation can be reproduced in the termrewriting semantics ∗)
shows rs t
cake_to_term ml_v
This theorem directly relates the evaluation of a term t in the full CakeML
(including mutability and exceptions) to the evaluation in the initial higher-order
term rewriting semantics. The evaluation of t happens using the environment
produced from the initial rule set. Hence, the theorem can be interpreted as the
correctness of the pseudo-ML expression let rec rs in t.
24
class add =
xes plus :: ’a ’a ’a
denition f :: (’a::add) ’a where
f x = plus x x
(a) Source program
datatype ’a dict_add = Dict_add (’a ’a ’a)
fun cert_add :: (’a::add) dict_add bool where
cert_add (Dict_add pls) = (pls = plus)
fun f’ :: ’a dict_add ’a ’a where
f’ (Dict_add pls) x = pls x x
lemma f’_eq: cert_add dict f’ dict = f
<proof>
(b) Result of translation
Fig. 9: Dictionary construction in Isabelle
Observe that in the assumption, the conversion goes from our terms to
CakeML expressions, whereas in the conclusion, the conversion goes the opposite
direction.
7 Dictionary construction
Isabelle’s type system supports type classes (or simply classes) [17, 43] whereas
CakeML does not. In order to not complicate the correctness proofs, type classes
are not supported by our embedded term language either. Instead, we eliminate
classes and instances by a dictionary construction [18] before embedding into the
term language. Haftmann and Nipkow give a pen-and-paper correctness proof of
this construction [16, §4.1]. We augmented the dictionary construction with the
generation of a certicate theorem that shows the equivalence of the two versions
of a function, with type classes and with dictionaries. This section briey explains
our dictionary construction.
Figure 9 shows a simple example of a dictionary construction. Type vari-
ables may carry class constraints (e.g. α :: add). The basic idea is that classes
become dictionaries containing the functions of that class; class instances become
dictionary denitions. Dictionaries are realized as datatypes. Class constraints
become additional dictionary parameters for that class. In the example, class add
becomes dict_add; function f is translated into f
which takes an additional pa-
rameter of type dict_add. In reality our tool does not produce the Isabelle source
code shown in Figure 9b but performs the constructions internally. The correct-
ness lemma f
_eq is proved automatically. Its precondition expresses that the
dictionary must contain exactly the function(s) of class add. For any monomor-
phic instance, the precondition can be proved outright based on the certicate
theorems proved for each class instance as explained next.
Not shown in the example is the translation of class instances. The basic
form of a class instance in Isabelle is τ :: (c
1
, . . . , c
n
) c where τ is an n-ary type
constructor. It corresponds to Haskell’s (c
1
α
1
, . . . , c
n
α
n
) c (τ α
1
. . . α
n
) and
25
is translated into a function inst_c_τ :: α
1
dict_c
1
· · · α
n
dict_c
n
(α
1
, . . . , α
n
) τ dict_c and the following certicate theorem is proved:
cert_c
1
dict
1
· · · cert_c
n
dict
n
cert_c (inst_c_τ dict
1
. . . dict
n
)
For a more detailed explanation of how the dictionary construction works, we
refer to the corresponding entry in the Archive of Formal Proofs [20].
8 Evaluation
We have tried out our compiler on examples from existing Isabelle formalizations.
This includes an implementation of Human encoding, lists and sorting, string
functions [38], and various data structures from Okasaki’s book [33], including
binary search trees, pairing heaps, and leftist heaps. These denitions can be
processed with slight modications: functions need to be totalized (see the end
of §3). However, parts of the tactics required for deep embedding proofs (§3) are
too slow on some functions and hence still need to be optimized.
9 Conclusion
For this paper we have concentrated on the compiler from Isabelle/HOL to
CakeML abstract syntax trees. Partial correctness is proved w.r.t. the big-step
semantics of CakeML. In the next step we will link our work with the compiler
from CakeML to machine code. Tan et al. [39, §10] prove a correctness theorem
that relates their semantics with the execution of the compiled machine code.
In that paper, they use a newer iteration of the CakeML semantics (functional
big-step [34]) than we do here. Both semantics are still present in the CakeML
source repository, together with an equivalence proof.
Evaluation of our compiled programs is already possible via Isabelle’s pred-
icate compiler [5], which allows us to turn CakeML’s big-step semantics into
an executable function. We have used this execution mechanism to establish for
sample programs that they terminate successfully. We also plan to prove that
our compiled programs terminate, i.e. total correctness.
The total size of this formalization, excluding theories extracted from Lem,
is currently approximately 20000 lines of proof text (90 %) and ML code (10 %).
The ML code itself produces relatively simple theorems, which means that there
are less opportunities for it to go wrong. This constitutes an improvement over
certifying approaches that prove complicated properties in ML.
References
1. The HOL System Description (2014), https://hol-theorem-prover.org/
2. Anand, A., Appel, A.W., Morrisett, G., Paraskevopoulou, Z., Pollack, R., Bélanger,
O.S., Sozeau, M., Weaver, M.: CertiCoq: A veried compiler for Coq. In: CoqPL’17:
The Third International Workshop on Coq for Programming Languages (2017)
26
3. Augustsson, L.: Compiling pattern matching. In: Functional Programming Lan-
guages and Computer Architecture. pp. 368–381. Springer Berlin Heidelberg (1985)
4. Benton, N., Hur, C.: Biorthogonality, step-indexing and compiler correctness. In:
Hutton, G., Tolmach, A.P. (eds.) Proceeding of the 14th ACM SIGPLAN inter-
national conference on Functional programming, ICFP 2009. pp. 97–108. ACM
(2009), http://doi.acm.org/10.1145/1596550.1596567
5. Berghofer, S., Bulwahn, L., Haftmann, F.: Turning inductive into equational spec-
ications. In: LNCS. pp. 131–146. Springer (2009)
6. Berghofer, S., Nipkow, T.: Executing higher order logic. In: Callaghan, P., Luo, Z.,
McKinna, J., Pollack, R. (eds.) Types for Proofs and Programs (TYPES 2000).
vol. 2277, pp. 24–40 (2002)
7. Blanchette, J.C., Hölzl, J., Lochbihler, A., Panny, L., Popescu, A., Traytel, D.:
Truly modular (co)datatypes for Isabelle/HOL. In: Interactive Theorem Proving.
pp. 93–110. Springer International Publishing (2014)
8. Boespug, M., Dénès, M., Grégoire, B.: Full reduction at full throttle. In: Jouan-
naud, J., Shao, Z. (eds.) Certied Programs and Proofs. Lecture Notes in Computer
Science, vol. 7086, pp. 362–377. Springer (2011)
9. Boyer, R.S., Moore, J.S.: Single-threaded objects in ACL2. In: Krishnamurthi, S.,
Ramakrishnan, C.R. (eds.) Practical Aspects of Declarative Languages (PADL
2002). LNCS, vol. 2257, pp. 9–27. Springer (2002)
10. de Bruijn, N.G.: Lambda calculus notation with nameless dummies, a tool for
automatic formula manipulation, with application to the church-rosser theorem.
Indagationes Mathematicae (Proceedings) 75(5), 381 392 (1972)
11. Chlipala, A.: A veried compiler for an impure functional language. In:
Hermenegildo, M.V., Palsberg, J. (eds.) POPL 2010. pp. 93–106. ACM (2010)
12. Crow, J., Owre, S., Rushby, J., Shankar, N., Stringer-Calvert, D.: Evaluating, test-
ing, and animating PVS specications. Tech. rep., Computer Science Laboratory,
SRI International, Menlo Park, CA (Mar 2001)
13. Flatau, A.D.: A Veried Implementation of an Applicative Language with Dynamic
Storage Allocation. Ph.D. thesis, University of Texas at Austin (1992)
14. Forster, Y., Kunze, F.: Veried extraction from coq to a lambda-calculus. In: The
8th Coq Workshop (2016)
15. Greve, D.A., Kaufmann, M., Manolios, P., Moore, J.S., Ray, S., Ruiz-Reina, J.,
Sumners, R., Vroon, D., Wilding, M.: Ecient execution in an automated reasoning
environment. J. Funct. Program. 18(1), 15–46 (2008)
16. Haftmann, F., Nipkow, T.: Code generation via higher-order rewrite systems. In:
Blume, M., Kobayashi, N., Vidal, G. (eds.) Functional and Logic Programming:
10th International Symposium, FLOPS 2010, Sendai, Japan, April 19-21, 2010.
Proceedings. pp. 103–117. Springer, Berlin, Heidelberg (2010)
17. Haftmann, F., Wenzel, M.: Constructive type classes in Isabelle. In: Altenkirch,
T., McBride, C. (eds.) Types for Proofs and Programs: International Workshop,
TYPES 2006, Nottingham, UK, April 18-21, 2006, Revised Selected Papers. pp.
160–174. Springer, Berlin, Heidelberg (2007)
18. Hall, C.V., Hammond, K., Jones, S.L.P., Wadler, P.L.: Type classes in haskell.
ACM Transactions on Programming Languages and Systems 18(2), 109–138 (mar
1996)
19. Hermida, C., Reddy, U.S., Robinson, E.P.: Logical relations and parametricity a
reynolds programme for category theory and programming languages. Electronic
Notes in Theoretical Computer Science 303, 149 180 (2014)
20. Hupel, L.: Dictionary construction. Archive of Formal Proofs (May 2017), http:
//isa-afp.org/entries/Dict_Construction.html, Formal proof development
27
21. Krauss, A.: Partial and nested recursive function denitions in higher-order logic.
Journal of Automated Reasoning 44(4), 303–336 (2010)
22. Krauss, A., Schropp, A.: A mechanized translation from higher-order logic to set
theory. In: Interactive Theorem Proving. pp. 323–338. Springer (2010)
23. Kumar, R., Myreen, M.O., Norrish, M., Owens, S.: CakeML: A veried implemen-
tation of ML. In: POPL. pp. 179–191. POPL ’14, ACM (2014)
24. Landin, P.J.: The mechanical evaluation of expressions. The Computer Journal
6(4), 308–320 (jan 1964)
25. Leroy, X.: Formal verication of a realistic compiler. Commun. ACM 52(7), 107–
115 (Jul 2009), http://doi.acm.org/10.1145/1538788.1538814
26. Letouzey, P.: A new extraction for coq. In: Geuvers, H., Wiedijk, F. (eds.) Types
for Proofs and Programs. LNCS, vol. 2646, pp. 200–219. Springer (2003)
27. Milner, R., Tofte, M., Harper, R., MacQueen, D.: The denition of Standard ML
(revised). MIT Press (1997)
28. Mulligan, D.P., Owens, S., Gray, K.E., Ridge, T., Sewell, P.: Lem: Reusable engi-
neering of real-world semantics. In: ICFP. pp. 175–188. ICFP ’14, ACM (2014)
29. Myreen, M.O., Owens, S.: Proof-producing translation of higher-order logic into
pure and stateful ML. JFP 24(2-3), 284–315 (005 2014)
30. Neis, G., Hur, C.K., Kaiser, J.O., McLaughlin, C., Dreyer, D., Vafeiadis, V.: Pil-
sner: a compositionally veried compiler for a higher-order imperative language.
pp. 166–178. ICFP 2015, ACM, New York, NY, USA (2015)
31. Nipkow, T., Klein, G.: Concrete Semantics. Springer (2014)
32. Nipkow, T., Paulson, L., Wenzel, M.: Isabelle/HOL A Proof Assistant for
Higher-Order Logic, LNCS, vol. 2283 (2002), 218 pp.
33. Okasaki, C.: Purely functional data structures. Cambridge University Press (1999)
34. Owens, S., Myreen, M.O., Kumar, R., Tan, Y.K.: Functional Big-Step Semantics,
pp. 589–615. Springer Berlin Heidelberg, Berlin, Heidelberg (2016)
35. Peyton Jones, S.L.: The Implementation of Functional Programming Languages.
Prentice-Hall, Inc., Upper Saddle River, NJ, USA (1987)
36. Shankar, N.: Static analysis for safe destructive updates in a functional language.
In: Pettorossi, A. (ed.) Logic Based Program Synthesis and Transformation (LOP-
STR 2001). LNCS, vol. 2372, pp. 1–24. Springer (2001)
37. Slind, K.: Reasoning about Terminating Functional Programs. Ph.D. thesis, Tech-
nische Universität München (1999)
38. Sternagel, C., Thiemann, R.: Haskell’s show class in isabelle/hol. Archive of For-
mal Proofs (Jul 2014), http://isa-afp.org/entries/Show.html, Formal proof
development
39. Tan, Y.K., Myreen, M.O., Kumar, R., Fox, A., Owens, S., Norrish, M.: A new
veried compiler backend for CakeML. In: Proceedings of the 21st ACM SIGPLAN
International Conference on Functional Programming - ICFP 2016. Association for
Computing Machinery (ACM) (2016)
40. Turner, D.A.: Some history of functional programming languages. In: Lecture Notes
in Computer Science, pp. 1–20. Springer Berlin Heidelberg (2013)
41. Urban, C.: Nominal techniques in Isabelle/HOL. Journal of Automated Reasoning
40(4), 327–356 (2008), http://dx.doi.org/10.1007/s10817-008-9097-2
42. Urban, C., Berghofer, S., Kaliszyk, C.: Nominal 2. Archive of Formal Proofs (Feb
2013), http://isa-afp.org/entries/Nominal2.shtml, Formal proof development
43. Wenzel, M.: Type classes and overloading in higher-order logic. In: Gunter, E.L.,
Felty, A. (eds.) TPHOLs ’97. pp. 307–322. Springer, Berlin, Heidelberg (1997)