A Verified Compiler from Isabelle/HOL to CakeML

Lars Hupel and Tobias Nipkow

Technische Universität München, Munich, Germany

lars.hupel@tum.de, nipkow@in.tum.de

Abstract. Many theorem provers can generate functional programs from definitions or proofs. However, this code generation needs to be trusted. The HOL4 system is an exception: it has a proof-producing code generator for a subset of ML. We go one step further and provide a verified compiler from Isabelle/HOL to CakeML. More precisely, we combine a simple proof-producing translation of recursion equations in Isabelle/HOL into a deeply embedded term language with a fully verified compilation chain to the target language CakeML.

Keywords: Isabelle · CakeML · Compiler · Higher-order term rewriting

1 Introduction

Many theorem provers have the ability to generate executable code in some (typically functional) programming language from definitions, lemmas and proofs (e.g. [6,8,9,12,16,27,37]). This makes code generation part of the trusted kernel of the system. Myreen and Owens [30] closed this gap for the HOL4 system: they have implemented a tool that translates from HOL4 into CakeML, a subset of SML, and proves a theorem stating that a result produced by the CakeML code is correct w.r.t. the HOL functions. They also have a verified implementation of CakeML [24,40]. We go one step further and provide a once-and-for-all verified compiler from (deeply embedded) function definitions in Isabelle/HOL [32,33] into CakeML, proving partial correctness of the generated CakeML code w.r.t. the original functions. This is like the step from dynamic to static type checking. It also means that preconditions on the input to the compiler are explicitly given in the correctness theorem rather than implicitly by a failing translation. To the best of our knowledge, this is the first verified (as opposed to certifying) compiler from function definitions in a logic into a programming language.

Our compiler is composed of multiple phases and is in principle applicable to languages other than Isabelle/HOL or even HOL:

© The Author(s) 2018

A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 999–1026, 2018.

https://doi.org/10.1007/978-3-319-89884-1_35


– We erase types right away. Hence the type system of the source language is irrelevant.
– We merely assume that the source language has a semantics based on equational logic.

The compiler operates in three stages:

1. The preprocessing phase eliminates features that are not supported by our compiler. Most importantly, dictionary construction eliminates occurrences of type classes in HOL terms. It introduces dictionary datatypes and new constants and proves the equivalence of old and new constants (Sect. 7).
2. The deep embedding lifts HOL terms into terms of type term, a HOL model of HOL terms. For each constant c (of arbitrary type) it defines a constant c′ of type term and proves a theorem that expresses equivalence (Sect. 3).
3. There are multiple compiler phases that eliminate certain constructs from the term type, until we arrive at the CakeML expression type. Most phases target a different intermediate term type (Sect. 5).

The first two stages are preprocessing; they are implemented in ML and produce certificate theorems. Only these stages are specific to Isabelle. The third (and main) stage is implemented completely in the logic HOL, without recourse to ML. Its correctness is verified once and for all.¹

2 Related Work

There is existing work in the Coq [2,15] and HOL [30] communities on proof-producing or verified extraction of functions defined in the logic. Anand et al. [2] present work in progress on a verified compiler from Gallina (Coq's specification language) via untyped intermediate languages to CompCert C light. They plan to connect their extraction routine to the CompCert compiler [26].

Translation of type classes into dictionaries is an important feature of Haskell compilers. In the setting of Isabelle/HOL, this has been described by Wenzel [44] and Krauss et al. [23]. Haftmann and Nipkow [17] use this construction to compile HOL definitions into target languages that do not support type classes, e.g. Standard ML and OCaml. In this work, we provide a certifying translation that eliminates type classes inside the logic.

Compilation of pattern matching is well understood in the literature [3,36,38]. In this work, we contribute a transformation of sets of equations with pattern matching on the left-hand side into a single equation with nested pattern matching on the right-hand side. This is implemented and verified inside Isabelle.

Besides CakeML, there are many projects for verified compilers for functional programming languages of various degrees of sophistication and realism (e.g. [4,11,14]). Particularly modular is the work by Neis et al. [31] on a verified compiler for an ML-like imperative source language. The main distinguishing feature of our work is that we start from a set of higher-order recursion equations with pattern matching on the left-hand side rather than a lambda calculus with pattern matching on the right-hand side. On the other hand, we stand on the shoulders of CakeML, which allows us to bypass all complications of machine code generation. Note that much of our compiler is not specific to CakeML and that it would be possible to retarget it to, for example, Pilsner abstract syntax with moderate effort.

¹ All Isabelle definitions and proofs can be found on the paper website: https://lars.hupel.info/research/codegen/, or archived as https://doi.org/10.5281/zenodo.1167616.

Finally, Fallenstein and Kumar [13] have presented a model of HOL inside HOL using large cardinals, including a reflection proof principle.

3 Deep Embedding

Starting with a HOL definition, we derive a new, reified definition in a deeply embedded term language depicted in Fig. 1a. This term language corresponds closely to the term datatype of Isabelle's implementation (using de Bruijn indices [10]), but without types and schematic variables.

To establish a formal connection between the original and the reified definitions, we use a logical relation, a concept that is well understood in the literature [20] and can be nicely implemented in Isabelle using type classes. Note that the use of type classes here is restricted to correctness proofs; it is not required for the execution of the compiler itself. Hence there is no contradiction with the elimination of type classes in the preprocessing stage.

Notation. We abbreviate App t u to t $ u and Abs t to Λt. Other term types introduced later in this paper use the same conventions. We reserve λ for abstractions in HOL itself. Typing judgments are written with a double colon: t :: τ.

Embedding Operation. Embedding is implemented in ML. We denote this operation using angle brackets: ⟨t⟩, where t is an arbitrary HOL expression and the result ⟨t⟩ is a HOL value of type term. It is a purely syntactic transformation, without preliminary evaluation or reduction, and it discards type information. The following examples illustrate this operation and typographical conventions concerning variables and constants:

⟨x⟩ = Free "x"    ⟨f⟩ = Const "f"    ⟨λx. f x⟩ = Λ (Const "f" $ Bound 0)

Small-Step Semantics. Figure 1b specifies the small-step semantics for term. It is reminiscent of higher-order term rewriting, and modelled closely after equality in HOL. The basic idea is that if the proposition t = u can be proved equationally in HOL (without symmetry), then R ⊢ ⟨t⟩ −→* ⟨u⟩ holds (where R :: (term × term) set). We call R the rule set. It is the result of translating a set of defining equations lhs = rhs into pairs (⟨lhs⟩, ⟨rhs⟩) ∈ R.


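datatype term =
  Const string |
  Free string |
  Abs term |
  Bound nat |
  App term term

(a) Abstract syntax of de Bruijn terms

$$\frac{(lhs, rhs) \in R \qquad \mathit{match}\ lhs\ t = \mathit{Some}\ \sigma}{R \vdash t \longrightarrow \mathit{subst}\ \sigma\ rhs}\ \textsc{Step}
\qquad
\frac{\mathit{closed}\ t'}{R \vdash (\Lambda t) \mathbin{\$} t' \longrightarrow t[t']}\ \textsc{Beta}$$

$$\frac{R \vdash t \longrightarrow t'}{R \vdash t \mathbin{\$} u \longrightarrow t' \mathbin{\$} u}\ \textsc{Fun}
\qquad
\frac{R \vdash u \longrightarrow u'}{R \vdash t \mathbin{\$} u \longrightarrow t \mathbin{\$} u'}\ \textsc{Arg}$$

(b) Small-step semantics

Fig. 1. Basic syntax and semantics of the term type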

Rule Step performs a rewrite step by picking a rewrite rule from R and rewriting the term at the root. For that purpose, match and subst are (mostly) standard first-order matching and substitution (see Sect. 4 for details).

Rule Beta performs β-reduction. Type term represents bound variables by de Bruijn indices. The notation t[t′] represents the substitution of the outermost bound variable in t with t′.
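The following Python sketch is our own illustration, not the Isabelle formalization. It mirrors the term datatype of Fig. 1a and implements t[t′] for rule Beta. The function names `instantiate` and `beta` are hypothetical; as in the text, we assume the argument t′ is closed, and in particular has no dangling de Bruijn indices, so only indices inside the body need adjusting.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Free:
    name: str


@dataclass(frozen=True)
class Abs:
    body: "Term"


@dataclass(frozen=True)
class Bound:
    index: int


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


Term = Union[Const, Free, Abs, Bound, App]


def instantiate(t: Term, s: Term, depth: int = 0) -> Term:
    """Compute t[s]: replace the variable bound by the removed binder
    (de Bruijn index == depth) with s. Assumes s has no dangling indices,
    so s itself never needs shifting."""
    if isinstance(t, Bound):
        if t.index == depth:
            return s
        if t.index > depth:
            # refers to a binder further out, which is now one level closer
            return Bound(t.index - 1)
        return t
    if isinstance(t, Abs):
        return Abs(instantiate(t.body, s, depth + 1))
    if isinstance(t, App):
        return App(instantiate(t.fun, s, depth), instantiate(t.arg, s, depth))
    return t  # Const and Free contain no bound variables


def beta(t: Term) -> Term:
    """Rule Beta at the root: (Λ body) $ arg  -->  body[arg]."""
    if not (isinstance(t, App) and isinstance(t.fun, Abs)):
        raise ValueError("not a beta-redex")
    return instantiate(t.fun.body, t.arg)


# (λx. λy. x) $ c  -->  λy. c
print(beta(App(Abs(Abs(Bound(1))), Const("c"))))  # Abs(body=Const(name='c'))
```

Because the argument is closed, no index shifting of the argument is needed; this is one practical benefit of the closedness side condition in rule Beta.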

Our semantics does not constitute a fully general higher-order term rewriting system, because we do not allow substitution under binders. For de Bruijn terms, this would pose no problem, but as soon as we introduce named bound variables, substitution under binders requires dealing with capture. To avoid this altogether, all our semantics expect terms that are substituted into abstractions to be closed. However, this does not mean that we restrict ourselves to any particular evaluation order. Both call-by-value and call-by-name can be used in the small-step semantics. But later on, the target semantics will only use call-by-value.

Embedding Relation. We denote the concept that an embedded term t corresponds to a HOL term a of type τ w.r.t. rule set R with the syntax R ⊢ t ≈ a. If we want to be explicit about the type, we index the relation: ≈_τ.

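For ground types, this can be defined easily. For example, the following two rules define ≈_nat:

$$\frac{}{R \vdash \langle 0 \rangle \approx_{\mathit{nat}} 0}
\qquad
\frac{R \vdash t \approx_{\mathit{nat}} n}{R \vdash \langle \mathit{Suc} \rangle \mathbin{\$} t \approx_{\mathit{nat}} \mathit{Suc}\ n}$$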

Definitions of ≈ for arbitrary datatypes without nested recursion can be derived mechanically in the same fashion as for nat, where they constitute one-to-one relations. Note that for ground types, ≈ ignores R. The reason why ≈ is parametrized on R will become clear in a moment.

For function types, we follow Myreen and Owens's approach [30]. The statement R ⊢ t ≈ f can be interpreted as "t $ ⟨a⟩ can be rewritten to ⟨f a⟩ for all a". Because this might involve applying a function definition from R, the ≈ relation must be indexed by the rule set. As a notational convenience, we define another relation R ⊢ t ↓ x to mean that there is a t′ such that R ⊢ t −→* t′ and R ⊢ t′ ≈ x. Using this notation, we formally define ≈ for functions as follows:

$$R \vdash t \approx f \;\longleftrightarrow\; \bigl(\forall u\, x.\ R \vdash u \downarrow x \longrightarrow R \vdash t \mathbin{\$} u \downarrow f\,x\bigr)$$
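Since ≈ on functions quantifies over all arguments, running code can only test it, not decide it. The Python sketch below is our own illustration: it implements rules Step, Fun and Arg of Fig. 1b for a first-order fragment of term (there is no Λ, so Beta is omitted), the relation ≈_nat, and R ⊢ t ↓ x by normalizing t with one fixed strategy, taking the normal form as the witness t′. The rule set for `double` and the constant names "0", "Suc" and "double" are assumptions made for the example, not names produced by the Isabelle tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Free:
    name: str


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def match(pat, t):
    """First-order matching of a linear pattern; returns a dict or None."""
    if isinstance(pat, Free):
        return {pat.name: t}
    if isinstance(pat, Const):
        return {} if pat == t else None
    if isinstance(pat, App) and isinstance(t, App):
        s1, s2 = match(pat.fun, t.fun), match(pat.arg, t.arg)
        return None if s1 is None or s2 is None else {**s1, **s2}
    return None


def subst(sigma, t):
    if isinstance(t, Free):
        return sigma.get(t.name, t)
    if isinstance(t, App):
        return App(subst(sigma, t.fun), subst(sigma, t.arg))
    return t


def step(rules, t):
    """One step: rule Step at the root if possible, otherwise Fun, then Arg."""
    for lhs, rhs in rules:
        sigma = match(lhs, t)
        if sigma is not None:
            return subst(sigma, rhs)
    if isinstance(t, App):
        u = step(rules, t.fun)
        if u is not None:
            return App(u, t.arg)
        u = step(rules, t.arg)
        if u is not None:
            return App(t.fun, u)
    return None


def normalize(rules, t, fuel=10_000):
    for _ in range(fuel):
        u = step(rules, t)
        if u is None:
            return t
        t = u
    raise RuntimeError("no normal form within the step budget")


def embed_nat(n):
    """<n> = Suc $ (... $ (Suc $ 0))"""
    t = Const("0")
    for _ in range(n):
        t = App(Const("Suc"), t)
    return t


def rel_nat(t, n):
    """t ≈_nat n; the rule set plays no role for ground types."""
    if n == 0:
        return t == Const("0")
    return isinstance(t, App) and t.fun == Const("Suc") and rel_nat(t.arg, n - 1)


def rel_fun_upto(rules, t, f, bound):
    """Finite test of R ⊢ t ≈ f for f :: nat ⇒ nat, using u = <n> for n < bound
    and the normal form of t $ u as the witness t' required by ↓."""
    return all(rel_nat(normalize(rules, App(t, embed_nat(n))), f(n)) for n in range(bound))


DOUBLE, ZERO, SUC = Const("double"), Const("0"), Const("Suc")
R = [  # hypothetical: double 0 = 0,  double (Suc n) = Suc (Suc (double n))
    (App(DOUBLE, ZERO), ZERO),
    (App(DOUBLE, App(SUC, Free("n"))), App(SUC, App(SUC, App(DOUBLE, Free("n"))))),
]
print(rel_fun_upto(R, DOUBLE, lambda n: 2 * n, 6))  # True
```

Passing on finitely many inputs is, of course, no proof; for embedded definitions, the universally quantified statement is what the certificate theorem establishes.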

Example. As a running example, we will use the map function on lists:

map f [] = []
map f (x # xs) = f x # map f xs

The result of embedding this function is a set of rules map′:

map′ =
  {(Const "List.list.map" $ Free "f" $ (Const "List.list.Cons" $ Free "x21" $ Free "x22"),
    Const "List.list.Cons" $ (Free "f" $ Free "x21") $ ...),
   (Const "List.list.map" $ Free "f" $ Const "List.list.Nil",
    Const "List.list.Nil")}

together with the theorem map′ ⊢ Const "List.list.map" ↓ map, which is proven by simple induction over map. Constant names like "List.list.map" come from the fully-qualified internal names in HOL.

The induction principle for the proof arises from the use of the fun command that is used to define recursive functions in HOL [22]. But the user is also allowed to specify custom equations for functions, in which case we will use heuristics to generate and prove the appropriate induction theorem. For simplicity, we will use the term (defining) equation uniformly to refer to any set of equations, either default ones or ones specified by the user. Embedding partially-specified functions – in particular, proving the certificate theorem about them – is currently not supported. In the future, we plan to leverage the domain predicate as produced by fun to generate conditional theorems.

4 Terms, Matching and Substitution

The compiler transforms the initial term type (Fig. 1a) through various intermediate stages. This section gives an overview and introduces necessary terminology.

Preliminaries. The function arrow in HOL is ⇒. The cons operator on lists is the infix #.

Throughout the paper, the concept of mappings is pervasive: We use the type notation α ⇀ β to denote a function α ⇒ β option. In certain contexts, a mapping may also be called an environment. We write mapping literals using brackets: [a ↦ x, b ↦ y, ...]. If it is clear from the context that σ is defined on a, we often treat the lookup σ a as returning an x :: β.

The functions dom :: (α ⇀ β) ⇒ α set and range :: (α ⇀ β) ⇒ β set return the domain and range of a mapping, respectively.


Dropping entries from a mapping is denoted by σ − k, where σ is a mapping and k is either a single key or a set of keys. We use σ′ ⊆ σ to denote that σ′ is a sub-mapping of σ, that is, dom σ′ ⊆ dom σ and ∀a ∈ dom σ′. σ′ a = σ a.

Merging two mappings σ and ρ is denoted with σ ++ ρ. It constructs a new mapping with the union domain of σ and ρ. Entries from ρ override entries from σ. That is, ρ ⊆ σ ++ ρ holds, but not necessarily σ ⊆ σ ++ ρ.
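As an informal aid, these mapping operations can be mimicked with Python dictionaries, where a missing key plays the role of None. This sketch is ours; the names `drop`, `is_submap` and `merge` are hypothetical stand-ins for σ − k, σ′ ⊆ σ and σ ++ ρ.

```python
def drop(sigma, k):
    """sigma − k, where k is a single key or a set of keys."""
    keys = k if isinstance(k, (set, frozenset)) else {k}
    return {a: v for a, v in sigma.items() if a not in keys}


def is_submap(sub, sigma):
    """sub ⊆ sigma: dom sub ⊆ dom sigma, and both agree on dom sub."""
    return all(a in sigma and sigma[a] == v for a, v in sub.items())


def merge(sigma, rho):
    """sigma ++ rho: union of the domains; entries from rho override sigma."""
    return {**sigma, **rho}


sigma = {"a": 1, "b": 2}
rho = {"b": 3}
print(merge(sigma, rho))                    # {'a': 1, 'b': 3}
print(is_submap(rho, merge(sigma, rho)))    # True
print(is_submap(sigma, merge(sigma, rho)))  # False: sigma's "b" was overridden
```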

All mappings and sets are assumed to be finite. In the formalization, this is enforced by using subtypes of ⇀ and set. Note that one cannot define datatypes by recursion through sets for cardinality reasons. However, for finite sets, it is possible. This is required to construct the various term types. We leverage facilities of Blanchette et al.'s datatype command to define these subtypes [7].

Standard Functions. All type constructors that we use (⇀, set, list, option, ...) support the standard operations map and rel. For lists, map is the regular covariant map. For mappings, the function has the type (β ⇒ γ) ⇒ (α ⇀ β) ⇒ (α ⇀ γ). It leaves the domain unchanged, but applies a function to the range of the mapping.

Function rel_τ lifts a binary predicate P :: α ⇒ α ⇒ bool to the type constructor τ. We call this lifted relation the relator for a particular type. For datatypes, its definition is structural, for example:

$$\frac{}{\mathit{rel}_{\mathit{list}}\ P\ []\ []}
\qquad
\frac{\mathit{rel}_{\mathit{list}}\ P\ xs\ ys \qquad P\ x\ y}{\mathit{rel}_{\mathit{list}}\ P\ (x \mathbin{\#} xs)\ (y \mathbin{\#} ys)}$$

For sets and mappings, the definition is a little bit more subtle.

Definition 1 (Set relator). For each element a ∈ A, there must be a corresponding element b ∈ B such that P a b, and vice versa. Formally:

$$\mathit{rel}_{\mathit{set}}\ P\ A\ B \;\longleftrightarrow\; (\forall x \in A.\ \exists y \in B.\ P\ x\ y) \wedge (\forall y \in B.\ \exists x \in A.\ P\ x\ y)$$

Definition 2 (Mapping relator). For each a, m a and n a must be related according to rel_option P. Formally:

$$\mathit{rel}_{\mathit{mapping}}\ P\ m\ n \;\longleftrightarrow\; (\forall a.\ \mathit{rel}_{\mathit{option}}\ P\ (m\ a)\ (n\ a))$$
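A small Python sketch of these relators, again our own illustration: Python's None stands in for HOL's None in `rel_option`, dictionaries stand in for mappings, and the example predicate `same_parity` is arbitrary.

```python
def rel_option(P, a, b):
    """Both absent, or both present and related by P (None models HOL's None)."""
    if a is None or b is None:
        return a is None and b is None
    return P(a, b)


def rel_list(P, xs, ys):
    return len(xs) == len(ys) and all(P(x, y) for x, y in zip(xs, ys))


def rel_set(P, A, B):
    """Definition 1: every x in A has a partner in B, and vice versa."""
    return (all(any(P(x, y) for y in B) for x in A)
            and all(any(P(x, y) for x in A) for y in B))


def rel_mapping(P, m, n):
    """Definition 2; keys outside both domains relate trivially (None, None)."""
    return all(rel_option(P, m.get(a), n.get(a)) for a in set(m) | set(n))


def same_parity(x, y):
    return x % 2 == y % 2


print(rel_set(same_parity, {1, 2}, {3, 4}))  # True
print(rel_set(same_parity, {1}, {3, 4}))     # False: 4 has no partner in {1}
```

Checking only the union of the two domains suffices for `rel_mapping`, because rel_option P None None holds for every other key.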

Term Types. There are four distinct term types: term, nterm, pterm, and sterm. All of them support the notions of free variables, matching and substitution. Free variables are always a finite set of strings. Matching a term against a pattern yields an optional mapping of type string ⇀ α from free variable names to terms.

Note that the type of patterns is itself term instead of a dedicated pattern type. The reason is that we have to subject patterns to a linearity constraint anyway and may use this constraint to carve out the relevant subset of terms:

Definition 3. A term is linear if there is at most one occurrence of any variable, it contains no abstractions, and in an application f $ x, f must not be a free variable. The HOL predicate is called linear :: term ⇒ bool.


Because of the similarity of operations across the term types, they are all instances of the term type class. Note that in Isabelle, classes and types live in different namespaces. The term type and the term type class are separate entities.

Definition 4. A term type τ supports the operations match :: term ⇒ τ ⇒ (string ⇀ τ) option, subst :: (string ⇀ τ) ⇒ τ ⇒ τ and frees :: τ ⇒ string set. We also define the following derived functions:

– matchs matches a list of patterns and terms sequentially, producing a single mapping
– closed t is an abbreviation for frees t = ∅
– closed σ is an overloading of closed, denoting that all values in a mapping are closed

Additionally, some (obvious) axioms have to be satisfied. We do not strive to fully specify an abstract term algebra. Instead, the axioms are chosen according to the needs of this formalization.

A notable deviation from matching as discussed in the term rewriting literature is that the result of matching is only well-defined if the pattern is linear.
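The following Python sketch (ours; all names are hypothetical, and the formalization itself lives in Isabelle) implements Definition 3 and first-order matching on a term-like datatype. It also shows why the result of matching is only meaningful for linear patterns: matching the non-linear pattern Pair $ x $ x still returns a mapping, but one binding is silently lost. `matchs` assumes that the patterns of one left-hand side share no variables, which linearity of the whole left-hand side guarantees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Free:
    name: str


@dataclass(frozen=True)
class Abs:
    body: object


@dataclass(frozen=True)
class Bound:
    index: int


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def linear(t, seen=None):
    """Definition 3: no repeated variable, no abstraction, no free variable
    in function position. Bound is rejected as well (our assumption)."""
    seen = set() if seen is None else seen
    if isinstance(t, Const):
        return True
    if isinstance(t, Free):
        if t.name in seen:
            return False
        seen.add(t.name)
        return True
    if isinstance(t, App):
        return (not isinstance(t.fun, Free)
                and linear(t.fun, seen) and linear(t.arg, seen))
    return False  # Abs or Bound


def match(pat, t):
    if isinstance(pat, Free):
        return {pat.name: t}
    if isinstance(pat, Const):
        return {} if pat == t else None
    if isinstance(pat, App) and isinstance(t, App):
        s1, s2 = match(pat.fun, t.fun), match(pat.arg, t.arg)
        return None if s1 is None or s2 is None else {**s1, **s2}
    return None


def matchs(pats, ts):
    """Match patterns against terms pairwise, combining into one mapping."""
    if len(pats) != len(ts):
        return None
    sigma = {}
    for p, t in zip(pats, ts):
        s = match(p, t)
        if s is None:
            return None
        sigma.update(s)
    return sigma


pair = Const("Pair")
nonlinear = App(App(pair, Free("x")), Free("x"))
print(linear(nonlinear))  # False
# Matching still "succeeds", but the result silently drops one binding:
print(match(nonlinear, App(App(pair, Const("a")), Const("b"))))  # {'x': Const(name='b')}
```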

Definition 5. An equation is a pair of a pattern (left-hand side) and a term (right-hand side). The pattern is of the form f $ p₁ $ ... $ pₙ, where f is a constant (i.e. of the form Const name). We refer to either f or name as the function symbol of the equation.

Following term rewriting terminology, we sometimes refer to an equation as a rule.

4.1 De Bruijn terms (term)

The definition of term is almost an exact copy of Isabelle's internal term type, with the notable omissions of type information and schematic variables (Fig. 1a). The implementation of β-reduction is straightforward via index shifting of bound variables.

4.2 Named Bound Variables (nterm)

datatype nterm = Nconst string | Nvar string | Nabs string nterm | Napp nterm nterm

The nterm type is similar to term, but removes the distinction between bound and free variables. Instead, there are only named variables. As mentioned in the previous section, we forbid substitution of terms that are not closed in order to avoid capture. This is also reflected in the syntactic side conditions of the correctness proofs (Sect. 5.1).


4.3 Explicit Pattern Matching (pterm)

datatype pterm =

Pconst string | Pvar string | Pabs ((term × pterm) set) | Papp pterm pterm

Functions in HOL are usually defined using implicit pattern matching, that is, the terms pᵢ occurring on the left-hand side f p₁ ... pₙ of an equation must be constructor patterns. This is also common among functional programming languages like Haskell or OCaml. CakeML only supports explicit pattern matching using case expressions. A function definition consisting of multiple defining equations must hence be translated to the form f = λx. case x of .... The elimination proceeds by iteratively removing the last parameter in the block of equations until none are left.

In our formalization, we opted to combine the notion of abstraction and case

expression, yielding case abstractions, represented as the Pabs constructor. This

is similar to the fn construct in Standard ML, which denotes an anonymous

function that immediately matches on its argument [28]. The same construct

also exists in Haskell with the LambdaCase language extension. We chose this

representation mainly for two reasons: First, it allows for a simpler language

grammar because there is only one (shared) constructor for abstraction and case

expression. Second, the elimination procedure outlined above does not have to

introduce fresh names in the process. Later, when translating to CakeML syntax,

fresh names are introduced and proved correct in a separate step.

The set of pairs of pattern and right-hand side inside a case abstraction is referred to as clauses. As a short-hand notation, we use Λ{p₁ ⇒ t₁, p₂ ⇒ t₂, ...}.

4.4 Sequential Clauses (sterm)

datatype sterm =

Sconst string | Svar string | Sabs ((term × sterm) list) | Sapp sterm sterm

In the term rewriting fragment of HOL, the order of rules is not significant. If a rule matches, it can be applied, regardless of when it was defined or proven. This is reflected by the use of sets in the rule and term types. For CakeML, the rules need to be applied in a deterministic order, i.e. sequentially. The sterm type only differs from pterm by using list instead of set. Hence, case abstractions use list brackets: Λ[p₁ ⇒ t₁, p₂ ⇒ t₂, ...].

4.5 Irreducible Terms (value)

CakeML distinguishes between expressions and values. Whereas expressions may contain free variables or β-redexes, values are closed and fully evaluated. Both have a notion of abstraction, but values differ from expressions in that they contain an environment binding free variables.

Consider the expression (λx.λy.x)(λz.z), which is rewritten (by β-reduction)

to λy.λz.z. Note how the bound variable x disappears, since it is replaced. This


is contrary to how programming languages are usually implemented: evaluation does not happen by substituting the argument term t for the bound variable x, but by recording the binding x ↦ t in an environment [24]. A pair of an abstraction and an environment is usually called a closure [25,41].

In CakeML, this means that evaluation of the above expression results in the closure

(λy.x, ["x" ↦ (λz.z, [])])

Note the nested structure of the closure, whose environment itself contains a

closure.

To reflect this in our formalization, we introduce a type value of values (explanation inline):

datatype value =
  (∗ constructor value: a data constructor applied to multiple values ∗)
  Vconstr string (value list) |
  (∗ closure: clauses combined with an environment mapping variables to values ∗)
  Vabs ((term × sterm) list) (string ⇀ value) |
  (∗ recursive closures: a group of mutually recursive function bodies with an environment ∗)
  Vrecabs (string ⇀ ((term × sterm) list)) string (string ⇀ value)

The above example evaluates to the closure:

Vabs [y ⇒ x] ["x" ↦ Vabs [z ⇒ z] []]

The third case for recursive closures only becomes relevant when we conflate variables and constants. As long as the rule set rs is kept separate, recursive calls are straightforward: the appropriate definition for the constant can be looked up there. CakeML knows no such distinction between constants and variables, hence everything has to reside in a single environment σ.

Consider this example of odd and even:

odd 0 = False          even 0 = True
odd (Suc n) = even n   even (Suc n) = odd n

When evaluating the term odd k, the definitions of even and odd themselves must be available in the environment captured in the definition of odd. However, it would be cumbersome in HOL to construct such a Vabs that refers to itself. Instead, we capture the expressions used to define odd and even in a recursive closure. Other encodings might be possible, but since we are targeting CakeML, we are opting to model it in a similar way as its authors do.

For the above example, this would result in the following global environment:

["odd" ↦ Vrecabs css "odd" [], "even" ↦ Vrecabs css "even" []]

where css = ["odd" ↦ [0 ⇒ False, Suc n ⇒ even n],
             "even" ↦ [0 ⇒ True, Suc n ⇒ odd n]]


Note that in the first line, the right-hand sides are values, but in css, they are expressions. The additional string argument of Vrecabs denotes the selected function. When evaluating an application of a recursive closure to an argument (β-reduction), the semantics adds all constituent functions of the closure to the environment used for recursive evaluation.
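To make closures and recursive closures concrete, here is a toy big-step evaluator in Python. It is our own sketch in the spirit of this section, not the semantics of Sect. 5: Sconst/Svar/Sabs/Sapp mirror sterm, Vconstr/Vabs/Vrecabs mirror value, and environments are dictionaries. Assumptions: patterns reuse the sterm constructors (in the paper they have type term), the set of constructor names is hypothetical, and constants and variables are looked up in a single environment, as in CakeML. It reproduces the closure for (λx.λy.x)(λz.z) and evaluates odd through a recursive closure.

```python
from dataclasses import dataclass


# sterm (patterns reuse Sconst/Svar/Sapp; in the paper they have type term)
@dataclass(frozen=True)
class Sconst:
    name: str

@dataclass(frozen=True)
class Svar:
    name: str

@dataclass(frozen=True)
class Sabs:
    clauses: tuple  # ((pattern, body), ...), tried in order

@dataclass(frozen=True)
class Sapp:
    fun: object
    arg: object

@dataclass
class Vconstr:
    name: str
    args: tuple

@dataclass
class Vabs:
    clauses: tuple
    env: dict

@dataclass
class Vrecabs:
    css: dict  # function name -> clauses
    name: str  # the selected function
    env: dict

CONSTRUCTORS = {"0", "Suc", "True", "False"}

def match(pat, v):
    """Match a constructor pattern against a value; return bindings or None."""
    if isinstance(pat, Svar):
        return {pat.name: v}
    args = []
    while isinstance(pat, Sapp):  # flatten c $ p1 $ ... $ pn
        args.insert(0, pat.arg)
        pat = pat.fun
    if not (isinstance(v, Vconstr) and pat == Sconst(v.name)
            and len(args) == len(v.args)):
        return None
    binds = {}
    for p, w in zip(args, v.args):
        s = match(p, w)
        if s is None:
            return None
        binds.update(s)
    return binds

def apply_value(f, v):
    if isinstance(f, Vconstr):  # constructors take their arguments one by one
        return Vconstr(f.name, f.args + (v,))
    if isinstance(f, Vabs):
        clauses, env = f.clauses, f.env
    else:  # Vrecabs: make every function of the group available again
        clauses = f.css[f.name]
        env = {**f.env, **{g: Vrecabs(f.css, g, f.env) for g in f.css}}
    for pat, rhs in clauses:  # the first matching clause wins
        binds = match(pat, v)
        if binds is not None:
            return evaluate({**env, **binds}, rhs)
    raise ValueError("no clause matches")

def evaluate(env, t):
    if isinstance(t, Sconst) and t.name in CONSTRUCTORS:
        return Vconstr(t.name, ())
    if isinstance(t, (Sconst, Svar)):  # one environment for constants and variables
        return env[t.name]
    if isinstance(t, Sabs):
        return Vabs(t.clauses, env)
    return apply_value(evaluate(env, t.fun), evaluate(env, t.arg))

# (λx. λy. x) (λz. z) evaluates to Vabs [y ⇒ x] ["x" ↦ Vabs [z ⇒ z] []]
x, y, z, n = Svar("x"), Svar("y"), Svar("z"), Svar("n")
k = Sabs(((x, Sabs(((y, x),))),))
ident = Sabs(((z, z),))
closure = evaluate({}, Sapp(k, ident))

def nat(i):
    t = Sconst("0")
    for _ in range(i):
        t = Sapp(Sconst("Suc"), t)
    return t

suc_n = Sapp(Sconst("Suc"), n)
css = {"odd": ((Sconst("0"), Sconst("False")), (suc_n, Sapp(Svar("even"), n))),
       "even": ((Sconst("0"), Sconst("True")), (suc_n, Sapp(Svar("odd"), n)))}
global_env = {g: Vrecabs(css, g, {}) for g in css}
print(evaluate(global_env, Sapp(Svar("odd"), nat(3))))  # Vconstr(name='True', args=())
```

The recursive closure never stores a reference to itself; instead, each application rebuilds the entries for all functions of the group from css, which is what avoids constructing a self-referential Vabs.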

5 Intermediate Semantics and Compiler Phases

In this section, we will discuss the progression from de Bruijn based term lan-

guage with its small-step semantics given in Fig. 1a to the ﬁnal CakeML seman-

tics. The compiler starts out with terms of type term and applies multiple

phases to eliminate features that are not present in the CakeML source language.

[Diagram. Compiler phases: de Bruijn terms → named bound variables (§5.2) → explicit pattern matching (§5.3) → sequential clauses (§5.4) → evaluation semantics. Types and semantics of each phase; constructors :: string set is shared by all phases:]

– de Bruijn terms: R :: (term × term) set; t, t′ :: term; R ⊢ t −→ t′ (Fig. 1b)
– Named bound variables: R :: (term × nterm) set; t, t′ :: nterm; R ⊢ t −→ t′ (Fig. 3)
– Explicit pattern matching: R :: (string × pterm) set; t, t′ :: pterm; R ⊢ t −→ t′ (Fig. 4)
– Sequential clauses: rs :: (string × sterm) list; t, t′ :: sterm; rs ⊢ t −→ t′ (Fig. 5)
– Evaluation semantics: rs :: (string × sterm) list, σ :: string ⇀ sterm; t, u :: sterm; rs, σ ⊢ t ↓ u (Fig. 6), followed by rs :: (string × value) list, σ :: string ⇀ value; t :: sterm, u :: value; rs, σ ⊢ t ↓ u (Fig. 7), and finally σ :: string ⇀ value; t :: sterm, u :: value; σ ⊢ t ↓ u (Fig. 8)

Fig. 2. Intermediate semantics and compiler phases


Types term, nterm and pterm each have a small-step semantics only. Type sterm has a small-step and several intermediate big-step semantics that bridge the gap to CakeML. An overview of the intermediate semantics and compiler phases is depicted in Fig. 2. The left-hand column gives an overview of the different phases. The right-hand column gives the types of the rule set and the semantics for each phase; you may want to skip it upon first reading.

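$$\frac{(lhs, rhs) \in R \qquad \mathit{match}\ lhs\ t = \mathit{Some}\ \sigma}{R \vdash t \longrightarrow \mathit{subst}\ \sigma\ rhs}\ \textsc{Step}
\qquad
\frac{\mathit{closed}\ t'}{R \vdash (\Lambda x.\ t) \mathbin{\$} t' \longrightarrow \mathit{subst}\ [x \mapsto t']\ t}\ \textsc{Beta}$$

Fig. 3. Small-step semantics for nterm with named bound variables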

5.1 Side Conditions

All of the following semantics require some side conditions on the rule set. These conditions are purely syntactic. As an example we list the conditions for the correctness of the first compiler phase:

– Patterns must be linear, and constructors in patterns must be fully applied.
– Definitions must have at least one parameter on the left-hand side (Sect. 5.6).
– The right-hand side of an equation refers only to free variables occurring in patterns on the left-hand side and contains no dangling de Bruijn indices.
– There are no two defining equations lhs = rhs₁ and lhs = rhs₂ such that rhs₁ ≠ rhs₂.
– For each pair of equations that define the same constant, their arity must be equal and their patterns must be compatible (Sect. 5.3).
– There is at least one equation.
– Variable names occurring in patterns must not overlap with constant names (Sect. 5.7).
– Any occurring constants must either be defined by an equation or be a constructor.

The conditions for the subsequent phases are sufficiently similar that we do not list them again.

In the formalization, we use named contexts to fix the rules and assumptions on them (locales in Isabelle terminology). Each phase has its own locale, together with a proof that after compilation, the preconditions of the next phase are satisfied. Correctness proofs assume the above conditions on R and similar conditions on the term that is reduced. For brevity, this is usually omitted in our presentation.


5.2 Naming Bound Variables: From term to nterm

Isabelle uses de Bruijn indices in the term language for the following two reasons: For substitution, there is no need to rename bound variables. Additionally, α-equivalent terms are equal. In implementations of programming languages, these advantages are not required: Typically, substitutions do not happen inside abstractions, and there is no notion of equality of functions. Therefore CakeML uses named variables and in this compilation step, we get rid of de Bruijn indices.

The “named” semantics is based on the nterm type. The rules that are

changed from the original semantics (Fig. 1b) are given in Fig. 3 (Fun and Arg

remain unchanged). Notably, β-reduction reuses the substitution function.

For the correctness proof, we need to establish a correspondence between terms and nterms. Translation from nterm to term is trivial: Replace each bound variable by the number of abstractions between its occurrence and the abstraction that binds it, and keep free variables as they are. This function is called nterm_to_term.

The other direction is not unique and requires introduction of fresh names for bound variables. In our formalization, we have chosen to use a monad to produce these names. This function is called term_to_nterm. We can also prove the obvious property nterm_to_term (term_to_nterm t) = t, where t is a term without dangling de Bruijn indices.

Generation of fresh names in general can be thought of as picking a string that is not an element of a (finite) set of already existing names. For Isabelle, the Nominal framework [42,43] provides support for reasoning over fresh names, but unfortunately, its definitions are not executable.

Instead, we chose to model generation of fresh names as a monad α fresh with the following primitive operations in addition to the monad operations:

run :: α fresh ⇒ string set ⇒ α
fresh_name :: string fresh

In our implementation, we have chosen to represent α fresh as roughly isomorphic to the state monad.
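One way to realize α fresh as a state monad is sketched below in Python: a monadic value is a function from the set of names already in use to a pair of a result and an updated set. This is our illustration, not the Isabelle code: the naming scheme x0, x1, ... is hypothetical, and term_to_nterm and nterm_to_term follow the description in the text. Seeding run with the free variables of t is what keeps a fresh binder name from capturing a free variable.

```python
from dataclasses import dataclass


# term (de Bruijn indices) and nterm (named variables)
@dataclass(frozen=True)
class Const:
    name: str
@dataclass(frozen=True)
class Free:
    name: str
@dataclass(frozen=True)
class Bound:
    index: int
@dataclass(frozen=True)
class Abs:
    body: object
@dataclass(frozen=True)
class App:
    fun: object
    arg: object
@dataclass(frozen=True)
class Nconst:
    name: str
@dataclass(frozen=True)
class Nvar:
    name: str
@dataclass(frozen=True)
class Nabs:
    var: str
    body: object
@dataclass(frozen=True)
class Napp:
    fun: object
    arg: object


# α fresh as a state monad: a function from the names in use to (result, names)
def unit(x):
    return lambda used: (x, used)


def bind(m, k):
    def bound_computation(used):
        x, used = m(used)
        return k(x)(used)
    return bound_computation


def fresh_name(used):
    """A monadic value: pick the first unused name of the form x0, x1, ..."""
    i = 0
    while f"x{i}" in used:
        i += 1
    return f"x{i}", used | {f"x{i}"}


def run(m, names):
    return m(frozenset(names))[0]


def term_to_nterm(t, bound=()):
    """bound[i] is the name chosen for de Bruijn index i."""
    if isinstance(t, Const):
        return unit(Nconst(t.name))
    if isinstance(t, Free):
        return unit(Nvar(t.name))
    if isinstance(t, Bound):
        return unit(Nvar(bound[t.index]))
    if isinstance(t, App):
        return bind(term_to_nterm(t.fun, bound), lambda f:
                    bind(term_to_nterm(t.arg, bound), lambda a: unit(Napp(f, a))))
    # Abs: invent a fresh name for the binder
    return bind(fresh_name, lambda x:
                bind(term_to_nterm(t.body, (x,) + bound), lambda b: unit(Nabs(x, b))))


def nterm_to_term(t, bound=()):
    if isinstance(t, Nconst):
        return Const(t.name)
    if isinstance(t, Nvar):
        # tuple.index finds the innermost binder of that name first
        return Bound(bound.index(t.name)) if t.name in bound else Free(t.name)
    if isinstance(t, Nabs):
        return Abs(nterm_to_term(t.body, (t.var,) + bound))
    return App(nterm_to_term(t.fun, bound), nterm_to_term(t.arg, bound))


def frees(t):
    if isinstance(t, Free):
        return {t.name}
    if isinstance(t, Abs):
        return frees(t.body)
    if isinstance(t, App):
        return frees(t.fun) | frees(t.arg)
    return set()


t = Abs(Abs(App(App(Free("x0"), Bound(1)), Bound(0))))
named = run(term_to_nterm(t), frees(t))  # binders become x1 and x2: x0 is taken
assert nterm_to_term(named) == t
```

Running the same translation with an empty name set instead would name the outer binder x0 and thereby capture the free variable x0, so the round trip fails.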

Compilation of a rule set proceeds by translation of the right-hand side of all rules:

compile R = {(p, term_to_nterm t) | (p, t) ∈ R}

The left-hand side is left unchanged for two reasons: function match expects an argument of type term (see Sect. 4), and patterns do not contain abstractions or bound variables.

Theorem 1 (Correctness of compilation). Assuming a step can be taken with the compiled rule set, it can be reproduced with the original rule set.

$$\frac{\mathit{compile}\ R \vdash t \longrightarrow u \qquad \mathit{closed}\ t}{R \vdash \mathit{nterm\_to\_term}\ t \longrightarrow \mathit{nterm\_to\_term}\ u}$$

We prove this by induction over the semantics (Fig. 3).


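$$\frac{(pat, rhs) \in C \qquad \mathit{match}\ pat\ t = \mathit{Some}\ \sigma \qquad \mathit{closed}\ t}{R \vdash (\Lambda\, C) \mathbin{\$} t \longrightarrow \mathit{subst}\ \sigma\ rhs}\ \textsc{Beta}
\qquad
\frac{(name, rhs) \in R}{R \vdash \mathit{Pconst}\ name \longrightarrow rhs}\ \textsc{Step}'$$

Fig. 4. Small-step semantics for pterm with pattern matching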

5.3 Explicit Pattern Matching: From nterm to pterm

Usually, functions in HOL are defined using implicit pattern matching, that is, the left-hand side of an equation is of the form f p₁ ... pₙ, where the pᵢ are patterns over datatype constructors. For any given function f, there may be multiple such equations. In this compilation step, we transform sets of equations for f defined using implicit pattern matching into a single equation for f of the form f = Λ C, where C is a set of clauses.

The strategy we employ currently requires successive elimination of a single parameter from right to left, in a similar fashion as Slind's pattern matching compiler [38, Sect. 3.3.1]. Recall our running example (map). It has arity 2. We omit the brackets for brevity. First, the list parameter gets eliminated:

map f = λ [] ⇒ []
        | x # xs ⇒ f x # map f xs

Finally, the function parameter gets eliminated:

map = λ f ⇒
        λ [] ⇒ []
        | x # xs ⇒ f x # map f xs

This now has arity 0 and is defined by a twice-nested abstraction.
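The right-to-left elimination can be sketched in a few lines of Python. This is a simplified illustration, not the verified algorithm: patterns and right-hand sides are plain strings, a case abstraction is the tuple ("Λ", clauses) with its clauses kept in a tuple rather than a set, and equations are grouped by syntactic equality of their remaining patterns, which is only adequate under the compatibility restriction discussed next. The names `eliminate_last` and `eliminate` are hypothetical.

```python
def eliminate_last(eqs):
    """One elimination step on [(patterns, rhs)], reducing the arity by one.
    Equations whose first n-1 patterns are syntactically equal are grouped,
    and their last patterns become the clauses of one case abstraction."""
    groups = {}  # dicts keep insertion order, so clause order is reproducible
    for pats, rhs in eqs:
        groups.setdefault(pats[:-1], []).append((pats[-1], rhs))
    return [(prefix, ("Λ", tuple(clauses))) for prefix, clauses in groups.items()]


def eliminate(eqs):
    """Repeat until arity 0; assumes all equations have the same arity."""
    while eqs and eqs[0][0]:
        eqs = eliminate_last(eqs)
    return eqs


map_eqs = [(("f", "[]"), "[]"),
           (("f", "x # xs"), "f x # map f xs")]
print(eliminate(map_eqs))
```

Applied to the variant of map with f and g in the Restrictions paragraph, the same procedure yields an outer abstraction with the two overlapping clauses f and g.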

Semantics. The target semantics is given in Fig. 4 (the Fun and Arg rules from previous semantics remain unchanged). We start out with a rule set R that allows only implicit pattern matching. After elimination, only explicit pattern matching remains. The modified Step rule merely replaces a constant by its definition, without taking arguments into account.

Restrictions. For the transformation to work, we need a strong assumption about the structure of the patterns pᵢ to avoid the following situation:

map f [] = []
map g (x # xs) = g x # map g xs

Through elimination, this would turn into:

map = λ f ⇒ λ [] ⇒ []
      | g ⇒ λ x # xs ⇒ g x # map g xs


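$$\frac{(name, rhs) \in R}{R \vdash \mathit{Sconst}\ name \longrightarrow rhs}\ \textsc{Step}
\qquad
\frac{\mathit{first\_match}\ cs\ t = \mathit{Some}\ (\sigma, rhs) \qquad \mathit{closed}\ t}{R \vdash (\Lambda\, cs) \mathbin{\$} t \longrightarrow \mathit{subst}\ \sigma\ rhs}\ \textsc{Beta}$$

Fig. 5. Small-step semantics for sterm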

Even though the original equations were non-overlapping, we suddenly obtained an abstraction with two overlapping patterns. Slind observed a similar problem [38, Sect. 3.3.2] in his algorithm. Therefore, he only permits uniform equations, as defined by Wadler [36, Sect. 5.5]. Here, we can give a formal characterization of our requirements as a computable function on pairs of patterns:

fun pat_compat :: term ⇒ term ⇒ bool where
  pat_compat (t₁ $ t₂) (u₁ $ u₂) ↔ pat_compat t₁ u₁ ∧ (t₁ = u₁ → pat_compat t₂ u₂)
  pat_compat t u ↔ (overlapping t u → t = u)

This compatibility constraint ensures that any two overlapping patterns (of the same column) $p_{i,k}$ and $p_{j,k}$ are equal and are thus appropriately grouped together in the elimination procedure. We require all defining equations of a constant to be mutually compatible. Equations violating this constraint will be flagged during embedding (Sect. 3), whereas the pattern elimination algorithm always succeeds.
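The definition of pat_compat transcribes almost literally into Python. The sketch below assumes a minimal encoding of patterns (Const, Var, App) and a structural definition of overlapping for linear constructor patterns, which the text does not spell out; pattern variables are compared by name, without the automatic renaming mentioned below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def overlapping(p, q):
    """Assumed definition: some term matches both linear patterns p and q."""
    if isinstance(p, Var) or isinstance(q, Var):
        return True
    if isinstance(p, Const) and isinstance(q, Const):
        return p.name == q.name
    if isinstance(p, App) and isinstance(q, App):
        return overlapping(p.fun, q.fun) and overlapping(p.arg, q.arg)
    return False


def pat_compat(t, u):
    # First equation: both patterns are applications.
    if isinstance(t, App) and isinstance(u, App):
        # t1 = u1 → pat_compat t2 u2, written as (t1 != u1) or ...
        return pat_compat(t.fun, u.fun) and (t.fun != u.fun or pat_compat(t.arg, u.arg))
    # Second equation: overlapping patterns must be syntactically equal.
    return not overlapping(t, u) or t == u


# Left-hand sides of the two map equations, uniform and non-uniform variants.
map_c, nil = Const("map"), Const("Nil")
cons_x_xs = App(App(Const("Cons"), Var("x")), Var("xs"))
uniform = (App(App(map_c, Var("f")), nil), App(App(map_c, Var("f")), cons_x_xs))
non_uniform = (App(App(map_c, Var("f")), nil), App(App(map_c, Var("g")), cons_x_xs))
```

With these definitions, the uniform pair is compatible, whereas the pair using f and g is rejected because the overlapping variable patterns f and g differ.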

While this rules out some theoretically possible pattern combinations (e.g. the diagonal function [36, Sect. 5.5]), in practice we have not found this to be a problem: all of the function definitions we have tried (Sect. 8) satisfied pattern compatibility (after automatic renaming of pattern variables). As a last resort, the user can manually instantiate function equations. Although this always leads to a pattern-compatible definition, it is not done automatically due to the potential blow-up.

Discussion. Because this compilation phase is non-trivial and imposes some minor restrictions on the set of function definitions that can be processed, we may provide an alternative implementation in the future. Instead of eliminating patterns from right to left, patterns may be grouped in tuples. The above example would be translated into:

map = λ (f, []) ⇒ []
    | (f, x # xs) ⇒ f x # map f xs

We would then leave the compilation of patterns to the CakeML compiler, which has no pattern compatibility restriction.

The obvious disadvantage, however, is that this would require knowledge of a tuple type in the term language, which is otherwise unaware of concrete datatypes.

5.4 Sequentialization: From pterm to sterm

The semantics of pterm and sterm differ only in the rules Step and Beta. Figure 5 shows the modified rules. Instead of any matching clause, the first matching clause in a case abstraction is picked.


For the correctness proof, the order of clauses does not matter: we only need

to prove that a step taken in the sequential semantics can be reproduced in the

unordered semantics. As long as no rules are dropped, this is trivially true. For

that reason, the compiler orders the clauses lexicographically. At the same time

the rules are also converted from type (string × pterm) set to (string × sterm) list.

Below, rs will always denote a list of the latter type.

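$$\frac{(\mathit{name}, \mathit{rhs}) \in \mathit{rs}}{\mathit{rs}, \sigma \vdash \mathsf{Sconst}\ \mathit{name} \downarrow \mathit{rhs}}\ \textsc{Const}
\qquad
\frac{\sigma\ \mathit{name} = \mathsf{Some}\ v}{\mathit{rs}, \sigma \vdash \mathsf{Svar}\ \mathit{name} \downarrow v}\ \textsc{Var}$$

$$\textsc{Abs}\quad \mathit{rs}, \sigma \vdash \Lambda\ \mathit{cs} \downarrow \Lambda\ [(\mathit{pat}, \mathit{subst}\ (\sigma - \mathit{frees}\ \mathit{pat})\ t) \mid (\mathit{pat}, t) \leftarrow \mathit{cs}]$$

$$\frac{\mathit{rs}, \sigma \vdash t \downarrow \Lambda\ \mathit{cs} \qquad \mathit{rs}, \sigma \vdash u \downarrow u' \qquad \mathit{first\_match}\ \mathit{cs}\ u' = \mathsf{Some}\ (\sigma', \mathit{rhs}) \qquad \mathit{rs}, \sigma \mathbin{+\!\!+} \sigma' \vdash \mathit{rhs} \downarrow v}{\mathit{rs}, \sigma \vdash t\ \$\ u \downarrow v}\ \textsc{Comb}$$

$$\frac{\mathit{name} \in \mathit{constructors} \qquad \mathit{rs}, \sigma \vdash t_1 \downarrow u_1 \quad \cdots \quad \mathit{rs}, \sigma \vdash t_n \downarrow u_n}{\mathit{rs}, \sigma \vdash \mathsf{Sconst}\ \mathit{name}\ \$\ t_1\ \$ \ldots \$\ t_n \downarrow \mathsf{Sconst}\ \mathit{name}\ \$\ u_1\ \$ \ldots \$\ u_n}\ \textsc{Constr}$$

Fig. 6. Big-step semantics for sterm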

5.5 Big-Step Semantics for sterm

This big-step semantics for sterm is not a compiler phase but moves towards

the desired evaluation semantics. In this ﬁrst step, we reuse the sterm type for

evaluation results, instead of evaluating to the separate type value. This allows

us to ignore environment capture in closures for now.

All previous ⟶ relations were parametrized by a rule set. Now the big-step predicate is of the form $\mathit{rs}, \sigma \vdash t \downarrow t'$ where $\sigma :: \mathit{string} \rightharpoonup \mathit{sterm}$ is a variable environment.

This semantics also introduces the distinction between constructors and defined constants. If C is a constructor, the term $C\ t_1 \ldots t_n$ is evaluated to $C\ t'_1 \ldots t'_n$ where the $t'_i$ are the results of evaluating the $t_i$.

The full set of rules is shown in Fig. 6. They deserve a short explanation:

Const. Constants are retrieved from the rule set rs.

Var. Variables are retrieved from the environment σ.

Abs. In order to maintain the closedness invariant (Lemma 1 below), abstractions are evaluated to their fully substituted form.

Comb. Function application t $ u first requires evaluation of t into an abstraction Λ cs and evaluation of u into an arbitrary term $u'$. Afterwards, we look for a clause matching $u'$ in cs, which produces a local variable environment $\sigma'$, possibly overwriting existing variables in σ. Finally, we evaluate the right-hand side of the clause with the combined global and local variable environment.

Constr. For a constructor application $C\ t_1 \ldots$, evaluate all $t_i$. The set constructors is an implicit parameter of the semantics.
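The rules of Fig. 6 can be prototyped as a small interpreter. The Python sketch below is illustrative only and rests on assumptions of our own: sterm is encoded as tagged tuples, patterns are matched structurally, and a missing rule shows up as a Python exception. Note how the Abs case returns a fully substituted abstraction in which pattern-bound variables are left untouched.

```python
# Hypothetical tuple encoding of sterm:
#   ("const", n) | ("var", n) | ("app", t, u) | ("abs", [(pattern, rhs), ...])

def app(f, *args):
    """Build f $ a1 $ ... $ an."""
    for a in args:
        f = ("app", f, a)
    return f


def frees(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return frees(t[1]) | frees(t[2])
    if t[0] == "abs":
        return set().union(*(frees(rhs) - frees(p) for p, rhs in t[1]))
    return set()


def subst(env, t):
    if t[0] == "var":
        return env.get(t[1], t)
    if t[0] == "app":
        return ("app", subst(env, t[1]), subst(env, t[2]))
    if t[0] == "abs":
        # subst (σ − frees pat) rhs: pattern variables shadow the environment.
        return ("abs", [(p, subst({k: v for k, v in env.items() if k not in frees(p)}, rhs))
                        for p, rhs in t[1]])
    return t


def match(p, t):
    if p[0] == "var":
        return {p[1]: t}
    if p[0] == "const":
        return {} if t == p else None
    if t[0] != "app":
        return None
    left, right = match(p[1], t[1]), match(p[2], t[2])
    return None if left is None or right is None else {**left, **right}


def first_match(clauses, t):
    for p, rhs in clauses:
        env = match(p, t)
        if env is not None:
            return env, rhs
    return None


def spine(t):
    """Split t1 $ ... $ tn into its head and the list of arguments."""
    args = []
    while t[0] == "app":
        args.insert(0, t[2])
        t = t[1]
    return t, args


def big_step(rs, constructors, env, t):
    head, args = spine(t)
    if head[0] == "const" and head[1] in constructors:      # Constr
        return app(head, *(big_step(rs, constructors, env, a) for a in args))
    if t[0] == "const":                                      # Const
        return rs[t[1]]
    if t[0] == "var":                                        # Var
        return env[t[1]]
    if t[0] == "abs":                                        # Abs
        return subst(env, t)
    fun = big_step(rs, constructors, env, t[1])              # Comb
    arg = big_step(rs, constructors, env, t[2])
    found = first_match(fun[1], arg)
    if found is None:
        raise ValueError("no clause matches: no rule applies")
    local, rhs = found
    return big_step(rs, constructors, {**env, **local}, rhs)


cons = {"Zero", "Suc", "True", "False", "Pair"}
n = ("var", "n")
rs = {
    "even": ("abs", [(("const", "Zero"), ("const", "True")),
                     (app(("const", "Suc"), n), app(("const", "odd"), n))]),
    "odd": ("abs", [(("const", "Zero"), ("const", "False")),
                    (app(("const", "Suc"), n), app(("const", "even"), n))]),
}
two = app(("const", "Suc"), app(("const", "Suc"), ("const", "Zero")))
```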


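$$\frac{(\mathit{name}, \mathit{rhs}) \in \mathit{rs}}{\mathit{rs}, \sigma \vdash \mathsf{Sconst}\ \mathit{name} \downarrow \mathit{rhs}}\ \textsc{Const}
\qquad
\frac{\sigma\ \mathit{name} = \mathsf{Some}\ v}{\mathit{rs}, \sigma \vdash \mathsf{Svar}\ \mathit{name} \downarrow v}\ \textsc{Var}
\qquad
\textsc{Abs}\quad \mathit{rs}, \sigma \vdash \Lambda\ \mathit{cs} \downarrow \mathsf{Vabs}\ \mathit{cs}\ \sigma$$

$$\frac{\mathit{rs}, \sigma \vdash t \downarrow \mathsf{Vabs}\ \mathit{cs}\ \sigma' \qquad \mathit{rs}, \sigma \vdash u \downarrow v \qquad \mathit{first\_match}\ \mathit{cs}\ v = \mathsf{Some}\ (\sigma'', \mathit{rhs}) \qquad \mathit{rs}, \sigma' \mathbin{+\!\!+} \sigma'' \vdash \mathit{rhs} \downarrow v'}{\mathit{rs}, \sigma \vdash t\ \$\ u \downarrow v'}\ \textsc{Comb}$$

$$\frac{\mathit{rs}, \sigma \vdash t \downarrow \mathsf{Vrecabs}\ \mathit{css}\ \mathit{name}\ \sigma' \qquad \mathit{css}\ \mathit{name} = \mathsf{Some}\ \mathit{cs} \qquad \mathit{rs}, \sigma \vdash u \downarrow v \qquad \mathit{first\_match}\ \mathit{cs}\ v = \mathsf{Some}\ (\sigma'', \mathit{rhs}) \qquad \mathit{rs}, \sigma' \mathbin{+\!\!+} \sigma'' \vdash \mathit{rhs} \downarrow v'}{\mathit{rs}, \sigma \vdash t\ \$\ u \downarrow v'}\ \textsc{RecComb}$$

$$\frac{\mathit{name} \in \mathit{constructors} \qquad \mathit{rs}, \sigma \vdash t_1 \downarrow v_1 \quad \cdots \quad \mathit{rs}, \sigma \vdash t_n \downarrow v_n}{\mathit{rs}, \sigma \vdash \mathsf{Sconst}\ \mathit{name}\ \$\ t_1\ \$ \ldots \$\ t_n \downarrow \mathsf{Vconstr}\ \mathit{name}\ [v_1, \ldots, v_n]}\ \textsc{Constr}$$

Fig. 7. Evaluation semantics from sterm to value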

Lemma 1 (Closedness invariant). If σ contains only closed terms, frees t ⊆ dom σ and $\mathit{rs}, \sigma \vdash t \downarrow t'$, then $t'$ is closed.

Correctness of the big-step w.r.t. the small-step semantics is proved easily by

induction on the former:

Lemma 2. For any closed environment σ satisfying frees t ⊆ dom σ,

$$\mathit{rs}, \sigma \vdash t \downarrow u \longrightarrow \mathit{rs} \vdash \mathit{subst}\ \sigma\ t \longrightarrow^{*} u$$

By setting σ = [], we obtain:

Theorem 2 (Correctness). $\mathit{rs}, [] \vdash t \downarrow u \land \mathit{closed}\ t \longrightarrow \mathit{rs} \vdash t \longrightarrow^{*} u$

5.6 Evaluation Semantics: Reﬁning sterm to value

At this point, we introduce the concept of values into the semantics, while still

keeping the rule set (for constants) and the environment (for variables) separate.

The evaluation rules are speciﬁed in Fig. 7 and represent a departure from the

original rewriting semantics: a term does not evaluate to another term but to an

object of a diﬀerent type, a value. We still use ↓ as notation, because big-step

and evaluation semantics can be disambiguated by their types.

The evaluation model itself is fairly straightforward. As explained in Sect. 4.5,

abstraction terms are evaluated to closures capturing the current variable envi-

ronment. Note that at this point, recursive closures are not treated diﬀerently

from non-recursive closures. In a later stage, when rs and σ are merged, this

distinction becomes relevant.


We will now explain each rule that has changed from the previous semantics:

Abs. Abstraction terms are evaluated to a closure capturing the current

environment.

Comb. As before, in an application t $ u, t must evaluate to a closure $\mathsf{Vabs}\ \mathit{cs}\ \sigma'$. The evaluation result of u is then matched against the clauses cs, producing an environment $\sigma''$. The right-hand side of the clause is then evaluated using $\sigma' \mathbin{+\!\!+} \sigma''$; the original environment σ is effectively discarded.

RecComb. Similar to Comb. Finding the matching clause is a two-step process: first, the appropriate clause list is selected by the name of the currently active function. Then, matching is performed.

Constr. As before, for an n-ary application $C\ t_1 \ldots$, where C is a data constructor, we evaluate all $t_i$. The result is a Vconstr value.

Conversion Between sterm and value. To establish a correspondence between

evaluating a term to an sterm and to a value, we apply the same trick as in

Sect. 5.2. Instead of specifying a complicated relation, we translate value back

to sterm: simply apply the substitutions in the captured environments to the

clauses.

The translation rules for Vabs and Vrecabs are kept similar to the Abs rule

from the big-step semantics (Fig. 6). Roughly speaking, the big-step semantics

always keeps terms fully substituted, whereas the evaluation semantics defers

substitution.

Similarly to Sect. 5.2, we can also define a function sterm_to_value :: sterm ⇒ value and prove that one function is the inverse of the other.

Matching. The value type, instead of using binary function application as all

other term types, uses n-ary constructor application. This introduces a concep-

tual mismatch between (binary) patterns and values. To make the proofs easier,

we introduce an intermediate type of n-ary patterns. This intermediate type can

be optimized away by fusion.
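The intermediate n-ary patterns can be pictured as the result of flattening an application spine. In the following hedged Python sketch, the encodings of binary patterns, n-ary patterns and Vconstr values are hypothetical; to_nary turns C $ p1 $ ... $ pn into a single n-ary node, which can then be matched argument-wise against a Vconstr value.

```python
# Hypothetical encodings:
#   binary patterns  ("const", c) | ("var", x) | ("app", p, q)
#   n-ary patterns   ("V", x) | ("C", c, [p1, ..., pn])
#   values           ("Vconstr", c, [v1, ..., vn])

def to_nary(p):
    """Flatten a binary application spine C $ p1 $ ... $ pn into an n-ary pattern."""
    if p[0] == "var":
        return ("V", p[1])
    args = []
    while p[0] == "app":
        args.insert(0, to_nary(p[2]))
        p = p[1]
    if p[0] != "const":
        raise ValueError("pattern head must be a constructor")
    return ("C", p[1], args)


def match_nary(p, v):
    """Match an n-ary pattern against a Vconstr value; return bindings or None."""
    if p[0] == "V":
        return {p[1]: v}
    _, name, args = p
    if v[0] != "Vconstr" or v[1] != name or len(v[2]) != len(args):
        return None
    env = {}
    for sub_p, sub_v in zip(args, v[2]):
        sub = match_nary(sub_p, sub_v)
        if sub is None:
            return None
        env.update(sub)
    return env


cons_pat = ("app", ("app", ("const", "Cons"), ("var", "x")), ("var", "xs"))
nil_v = ("Vconstr", "Nil", [])
one_v = ("Vconstr", "Cons", [("Vconstr", "Zero", []), nil_v])
```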

Correctness. The correctness proof requires a number of interesting lemmas.

Lemma 3 (Substitution before evaluation). Assuming that a term t can be evaluated to a value u given a closed environment σ, it can be evaluated to the same value after substitution with a sub-environment $\sigma'$. Formally: $\mathit{rs}, \sigma \vdash t \downarrow u \land \sigma' \subseteq \sigma \longrightarrow \mathit{rs}, \sigma \vdash \mathit{subst}\ \sigma'\ t \downarrow u$

This justiﬁes the “pre-substitution” exhibited by the Abs rule in the big-step

semantics in contrast to the environment-capturing Abs rule in the evaluation

semantics.

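Theorem 3 (Correctness). Let σ be a closed environment and t a term which only contains free variables in dom σ. Then, an evaluation to a value $\mathit{rs}, \sigma \vdash t \downarrow v$ can be reproduced in the big-step semantics as $\mathit{rs}', \mathit{map}\ \mathit{value\_to\_sterm}\ \sigma \vdash t \downarrow \mathit{value\_to\_sterm}\ v$, where $\mathit{rs}' = [(\mathit{name}, \mathit{value\_to\_sterm}\ \mathit{rhs}) \mid (\mathit{name}, \mathit{rhs}) \leftarrow \mathit{rs}]$.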


Instantiating the Correctness Theorem. The correctness theorem states that, for any given evaluation of a term t with a given environment rs, σ containing values, we can reproduce that evaluation in the big-step semantics using a derived list of rules $\mathit{rs}'$ and an environment $\sigma'$ containing sterms that are generated by the value_to_sterm function. But recall the diagram in Fig. 2. In our scenario, we start with a given rule set of sterms (that has been compiled from a rule set of terms). Hence, the correctness theorem only deals with the opposite direction.

It remains to construct a suitable rs such that applying value_to_sterm to it yields the given sterm rule set. We can exploit the side condition (Sect. 5.1) that all bindings define functions, not constants:

Definition 6 (Global clause set). The mapping $\mathit{global\_css} :: \mathit{string} \rightharpoonup ((\mathit{term} \times \mathit{sterm})\ \mathit{list})$ is obtained by stripping the Sabs constructors from all definitions and converting the resulting list to a mapping.

For each definition with name f we define a corresponding value $v_f = \mathsf{Vrecabs}\ \mathit{global\_css}\ f\ []$. In other words, each function is now represented by a recursive closure bundling all functions. Applying value_to_sterm to $v_f$ returns the original definition of f. Let rs denote the original sterm rule set and $\mathit{rs}_v$ the environment mapping all f's to the $v_f$'s.

The variable environments σ and $\sigma'$ can safely be set to the empty mapping, because top-level terms are evaluated without any free variable bindings.

Corollary 1 (Correctness). $\mathit{rs}_v, [] \vdash t \downarrow v \longrightarrow \mathit{rs}, [] \vdash t \downarrow \mathit{value\_to\_sterm}\ v$

Note that this step is not part of the compiler (although $\mathit{rs}_v$ is computable); rather, it is a refinement of the semantics that supports a more modular correctness proof.

Example. Recall the odd and even example from Sect. 4.5. After compilation to

sterm, the rule set looks like this:

rs = {("odd", Sabs [0 ⇒ False, Suc n ⇒ even n]),
      ("even", Sabs [0 ⇒ True, Suc n ⇒ odd n])}

This can be easily transformed into the following global clause set:

global_css = ["odd" ↦ [0 ⇒ False, Suc n ⇒ even n],
              "even" ↦ [0 ⇒ True, Suc n ⇒ odd n]]

Finally, $\mathit{rs}_v$ is computed by creating a recursive closure for each function:

rs_v = ["odd" ↦ Vrecabs global_css "odd" [],
        "even" ↦ Vrecabs global_css "even" []]
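For the odd/even example, the construction of global_css and rs_v and their use in the evaluation semantics of Fig. 7 can be sketched in Python. The tuple encodings and the helper names global_clause_set, recursive_closures and evaluate are hypothetical. Constants are looked up in rs_v while the environment only holds pattern-bound variables, so recursive calls already work at this stage without any special treatment.

```python
# Hypothetical encodings: sterm as tagged tuples; values are
#   ("Vconstr", name, [v1, ..., vn]) | ("Vabs", clauses, env) | ("Vrecabs", css, name, env)

def global_clause_set(rules):
    """Strip the outer abstraction of every definition (cf. Definition 6)."""
    css = {}
    for name, rhs in rules.items():
        assert rhs[0] == "abs", "side condition: every definition is a function"
        css[name] = rhs[1]
    return css


def recursive_closures(css):
    """rs_v: map every f to Vrecabs global_css f []."""
    return {name: ("Vrecabs", css, name, {}) for name in css}


def spine(t):
    args = []
    while t[0] == "app":
        args.insert(0, t[2])
        t = t[1]
    return t, args


def match(p, v):
    """Match a binary pattern against an n-ary constructor value."""
    if p[0] == "var":
        return {p[1]: v}
    head, args = spine(p)
    if v[0] != "Vconstr" or v[1] != head[1] or len(v[2]) != len(args):
        return None
    env = {}
    for q, w in zip(args, v[2]):
        sub = match(q, w)
        if sub is None:
            return None
        env.update(sub)
    return env


def first_match(clauses, v):
    for p, rhs in clauses:
        env = match(p, v)
        if env is not None:
            return env, rhs
    return None


def evaluate(rs, constructors, env, t):
    """Evaluation semantics of Fig. 7: constants from rs, variables from env."""
    head, args = spine(t)
    if head[0] == "const" and head[1] in constructors:            # Constr
        return ("Vconstr", head[1], [evaluate(rs, constructors, env, a) for a in args])
    if t[0] == "const":                                           # Const
        return rs[t[1]]
    if t[0] == "var":                                             # Var
        return env[t[1]]
    if t[0] == "abs":                                             # Abs
        return ("Vabs", t[1], env)
    fun = evaluate(rs, constructors, env, t[1])
    arg = evaluate(rs, constructors, env, t[2])
    if fun[0] == "Vabs":                                          # Comb
        clauses, captured = fun[1], fun[2]
    else:                                                         # RecComb
        _, css, name, captured = fun
        clauses = css[name]
    # Unpacking fails if no clause matches, i.e. if no rule applies.
    local, rhs = first_match(clauses, arg)
    return evaluate(rs, constructors, {**captured, **local}, rhs)


S, Z, n = ("const", "Suc"), ("const", "Zero"), ("var", "n")
rs = {"odd": ("abs", [(Z, ("const", "False")), (("app", S, n), ("app", ("const", "even"), n))]),
      "even": ("abs", [(Z, ("const", "True")), (("app", S, n), ("app", ("const", "odd"), n))])}
css = global_clause_set(rs)
rs_v = recursive_closures(css)
cons = {"Zero", "Suc", "True", "False"}
```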


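$$\frac{\mathit{name} \notin \mathit{constructors} \qquad \sigma\ \mathit{name} = \mathsf{Some}\ v}{\sigma \vdash \mathsf{Sconst}\ \mathit{name} \downarrow v}\ \textsc{Const}
\qquad
\frac{\sigma\ \mathit{name} = \mathsf{Some}\ v}{\sigma \vdash \mathsf{Svar}\ \mathit{name} \downarrow v}\ \textsc{Var}
\qquad
\textsc{Abs}\quad \sigma \vdash \Lambda\ \mathit{cs} \downarrow \mathsf{Vabs}\ \mathit{cs}\ \sigma$$

$$\frac{\sigma \vdash t \downarrow \mathsf{Vabs}\ \mathit{cs}\ \sigma' \qquad \sigma \vdash u \downarrow v \qquad \mathit{first\_match}\ \mathit{cs}\ v = \mathsf{Some}\ (\sigma'', \mathit{rhs}) \qquad \sigma' \mathbin{+\!\!+} \sigma'' \vdash \mathit{rhs} \downarrow v'}{\sigma \vdash t\ \$\ u \downarrow v'}\ \textsc{Comb}$$

$$\frac{\sigma \vdash t \downarrow \mathsf{Vrecabs}\ \mathit{css}\ \mathit{name}\ \sigma' \qquad \mathit{css}\ \mathit{name} = \mathsf{Some}\ \mathit{cs} \qquad \sigma \vdash u \downarrow v \qquad \mathit{first\_match}\ \mathit{cs}\ v = \mathsf{Some}\ (\sigma'', \mathit{rhs}) \qquad \sigma' \mathbin{+\!\!+} \mathit{mk\_rec\_env}\ \mathit{css}\ \sigma' \mathbin{+\!\!+} \sigma'' \vdash \mathit{rhs} \downarrow v'}{\sigma \vdash t\ \$\ u \downarrow v'}\ \textsc{RecComb}$$

$$\frac{\mathit{name} \in \mathit{constructors} \qquad \sigma \vdash t_1 \downarrow v_1 \quad \cdots \quad \sigma \vdash t_n \downarrow v_n}{\sigma \vdash \mathsf{Sconst}\ \mathit{name}\ \$\ t_1\ \$ \ldots \$\ t_n \downarrow \mathsf{Vconstr}\ \mathit{name}\ [v_1, \ldots, v_n]}\ \textsc{Constr}$$

Fig. 8. ML-style evaluation semantics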

5.7 Evaluation with Recursive Closures

CakeML distinguishes between non-recursive and recursive closures [30]. This distinction is also present in the value type. In this step, we will conflate variables with constants, which necessitates a special treatment of recursive closures. Therefore, we introduce a new predicate $\sigma \vdash t \downarrow v$ in Fig. 8 (in contrast to the previous $\mathit{rs}, \sigma \vdash t \downarrow v$). We examine the rules one by one:

Const/Var. Constant definitions and variable values are both retrieved from the same environment σ. We have opted to keep the distinction between constants and variables in the sterm type to avoid the introduction of another term type.

Abs. Identical to the previous evaluation semantics. Note that evaluation never

creates recursive closures at run-time (only at compile-time, see Sect. 5.6).

Anonymous functions, e.g. in the term map (λx. x), are evaluated to non-

recursive closures.

Comb. Identical to the previous evaluation semantics.

RecComb. Almost identical to the evaluation semantics. Additionally, for each function (name, cs) ∈ css, a new recursive closure $\mathsf{Vrecabs}\ \mathit{css}\ \mathit{name}\ \sigma'$ is created and inserted into the environment. This ensures that after the first call to a recursive function, the function itself is present in the environment to be called recursively, without having to introduce coinductive environments.

Constr. Identical to the evaluation semantics.
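The effect of mk_rec_env can be observed in a Python sketch of Fig. 8, where constants and variables share one environment. The tuple encodings and helper names are hypothetical. The top-level environment below deliberately contains only odd; even becomes available because the RecComb case inserts a recursive closure for every function of the clause set.

```python
# Hypothetical encodings: sterm as tagged tuples; values are
#   ("Vconstr", name, [v1, ..., vn]) | ("Vabs", clauses, env) | ("Vrecabs", css, name, env)

def spine(t):
    args = []
    while t[0] == "app":
        args.insert(0, t[2])
        t = t[1]
    return t, args


def match(p, v):
    if p[0] == "var":
        return {p[1]: v}
    head, args = spine(p)
    if v[0] != "Vconstr" or v[1] != head[1] or len(v[2]) != len(args):
        return None
    env = {}
    for q, w in zip(args, v[2]):
        sub = match(q, w)
        if sub is None:
            return None
        env.update(sub)
    return env


def first_match(clauses, v):
    for p, rhs in clauses:
        env = match(p, v)
        if env is not None:
            return env, rhs
    return None


def mk_rec_env(css, captured):
    """One recursive closure per function in css, all sharing the captured env."""
    return {name: ("Vrecabs", css, name, captured) for name in css}


def evaluate_ml(constructors, env, t):
    head, args = spine(t)
    if head[0] == "const" and head[1] in constructors:            # Constr
        return ("Vconstr", head[1], [evaluate_ml(constructors, env, a) for a in args])
    if t[0] in ("const", "var"):                                  # Const / Var
        return env[t[1]]
    if t[0] == "abs":                                             # Abs
        return ("Vabs", t[1], env)
    fun = evaluate_ml(constructors, env, t[1])
    arg = evaluate_ml(constructors, env, t[2])
    if fun[0] == "Vabs":                                          # Comb
        clauses, captured, rec_env = fun[1], fun[2], {}
    else:                                                         # RecComb
        _, css, name, captured = fun
        clauses, rec_env = css[name], mk_rec_env(css, captured)
    local, rhs = first_match(clauses, arg)
    # σ' ++ mk_rec_env css σ' ++ σ'' (later entries override earlier ones)
    return evaluate_ml(constructors, {**captured, **rec_env, **local}, rhs)


S, Z, n = ("const", "Suc"), ("const", "Zero"), ("var", "n")
css = {"odd": [(Z, ("const", "False")), (("app", S, n), ("app", ("const", "even"), n))],
       "even": [(Z, ("const", "True")), (("app", S, n), ("app", ("const", "odd"), n))]}
cons = {"Zero", "Suc", "True", "False"}
top = {"odd": ("Vrecabs", css, "odd", {})}   # "even" is deliberately absent
three = ("app", S, ("app", S, ("app", S, Z)))
```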

Conﬂating Constants and Variables. By merging the rule set rs with the

variable environment σ, it becomes necessary to discuss possible clashes. Previously, the syntactic distinction between Svar and Sconst meant that a variable x and a constant x are not ambiguous: all semantics up to the evaluation semantics clearly specify


where to look for the substitute. This is not the case in functional languages

where functions and variables are not distinguished syntactically.

Instead, we rely on the fact that the initial rule set only deﬁnes constants. All

variables are introduced by matching before β-reduction (that is, in the Comb

and RecComb rules). The Abs rule does not change the environment. Hence

it suﬃces to assume that variables in patterns must not overlap with constant

names (see Sect. 5.1).

Correspondence Relation. Both constant deﬁnitions and values of variables

are recorded in a single environment σ. This also applies to the environment

contained in a closure. The correspondence relation thus needs to take different sets of bindings in closures into account.

Hence, we define a relation $\approx_v$ that is implicitly parametrized on the rule set rs and compares environments. We call it right-conflating, because in a correspondence $v \approx_v u$, any bound environment in u is thought to contain both variables and constants, whereas in v, any bound environment contains only variables.

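Definition 7 (Right-conflating correspondence). We define $\approx_v$ coinductively as follows:

$$\frac{v_1 \approx_v u_1 \quad \cdots \quad v_n \approx_v u_n}{\mathsf{Vconstr}\ \mathit{name}\ [v_1, \ldots, v_n] \approx_v \mathsf{Vconstr}\ \mathit{name}\ [u_1, \ldots, u_n]}$$

$$\frac{\forall x \in \mathit{frees}\ \mathit{cs}.\ \sigma_1\ x \approx_v \sigma_2\ x \qquad \forall x \in \mathit{consts}\ \mathit{cs}.\ \mathit{rs}\ x \approx_v \sigma_2\ x}{\mathsf{Vabs}\ \mathit{cs}\ \sigma_1 \approx_v \mathsf{Vabs}\ \mathit{cs}\ \sigma_2}$$

$$\frac{\forall \mathit{cs} \in \mathit{range}\ \mathit{css}.\ \forall x \in \mathit{frees}\ \mathit{cs}.\ \sigma_1\ x \approx_v \sigma_2\ x \qquad \forall \mathit{cs} \in \mathit{range}\ \mathit{css}.\ \forall x \in \mathit{consts}\ \mathit{cs}.\ \sigma_1\ x \approx_v (\sigma_2 \mathbin{+\!\!+} \mathit{mk\_rec\_env}\ \mathit{css}\ \sigma_2)\ x}{\mathsf{Vrecabs}\ \mathit{css}\ \mathit{name}\ \sigma_1 \approx_v \mathsf{Vrecabs}\ \mathit{css}\ \mathit{name}\ \sigma_2}$$

Consequently, $\approx_v$ is not reflexive.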

Correctness. The correctness lemma is straightforward to state:

Theorem 4 (Correctness). Let σ be an environment, t be a closed term and v a value such that $\sigma \vdash t \downarrow v$. If for all constants x occurring in t, $\mathit{rs}\ x \approx_v \sigma\ x$ holds, then there is a u such that $\mathit{rs}, [] \vdash t \downarrow u$ and $u \approx_v v$.

As usual, the rather technical proof proceeds via induction over the semantics

(Fig. 8). It is important to note that the global clause set construction (Sect. 5.6)

satisﬁes the preconditions of this theorem:

Lemma 4. If name is the name of a constant in rs, then

$$\mathsf{Vrecabs}\ \mathit{global\_css}\ \mathit{name}\ [] \approx_v \mathsf{Vrecabs}\ \mathit{global\_css}\ \mathit{name}\ []$$

Because $\approx_v$ is defined coinductively, the proof of this precondition proceeds by coinduction.


5.8 CakeML

CakeML is a veriﬁed implementation of a subset of Standard ML [24,40]. It

comprises a parser, type checker, formal semantics and backend for machine

code. The semantics has been formalized in Lem [29], which allows export to

Isabelle theories.

Our compiler targets CakeML's abstract syntax tree. However, we do not make use of certain CakeML features, notably mutable cells, modules, and literals. We have derived a smaller, executable version of the original CakeML semantics, called CupCakeML, together with an equivalence proof. The correctness proof of the last compiler phase establishes a correspondence between CupCakeML and the final semantics of our compiler pipeline.

For the correctness proof of the CakeML compiler, its authors have extracted

the Lem speciﬁcation into HOL4 theories [1]. In our work, we directly target

CakeML abstract syntax trees (thereby bypassing the parser) and use its big-step semantics, which we have extracted into Isabelle.²

Conversion from sterm to exp. After the series of translations described in the

earlier sections, our terms are syntactically close to CakeML’s terms (Cake.exp).

The only remaining diﬀerences are outlined below:

– CakeML does not combine abstraction and pattern matching. For that reason, we have to translate $\Lambda\ [p_1 \Rightarrow t_1, \ldots]$ into $\lambda x.\ \mathsf{case}\ x\ \mathsf{of}\ p_1 \Rightarrow t_1 \mid \ldots$, where x is a fresh variable name. We reuse the fresh monad to obtain a bound variable name. Note that it is not necessary to thread through already created variable names, only existing names. The reason is simple: a generated variable is bound and then immediately used in the body. Shadowing it somewhere in the body is not problematic.

– CakeML has two distinct syntactic categories for identiﬁers (that can repre-

sent variables or functions) and data constructors. Our term types however

have two distinct syntactic categories for constants (that can represent func-

tions or data constructors) and variables. The necessary prerequisites to deal

with this are already present in the ML-style evaluation semantics (Sect. 5.7)

which conﬂates constants and variables, but has a dedicated Constr rule for

data constructors.
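The translation into CakeML syntax can be illustrated by a Python sketch that follows the rules of the expression correspondence (Definition 8 below: Var, Const, Constr, App, Fun and Mat). The output encoding and the naming scheme x0, x1, ... for fresh variables are hypothetical, and patterns are passed through unchanged. As described above, the fresh name only has to avoid the identifiers of the abstraction at hand and the constructors.

```python
import itertools

# Hypothetical encodings: sterm as tagged tuples; CakeML-like output
#   ("Var", x) | ("Con", c, [e1, ...]) | ("App", "Opapp", [e1, e2])
#   | ("Fun", x, e) | ("Mat", e, [(pattern, e1), ...])

def ids(t):
    """All identifiers (constants and variables) occurring in an sterm."""
    if t[0] in ("const", "var"):
        return {t[1]}
    if t[0] == "app":
        return ids(t[1]) | ids(t[2])
    return set().union(*(ids(p) | ids(rhs) for p, rhs in t[1]))


def fresh(avoid):
    """First name x0, x1, ... not in avoid (hypothetical naming scheme)."""
    return next(f"x{i}" for i in itertools.count() if f"x{i}" not in avoid)


def spine(t):
    args = []
    while t[0] == "app":
        args.insert(0, t[2])
        t = t[1]
    return t, args


def to_cake(t, constructors):
    head, args = spine(t)
    if head[0] == "const" and head[1] in constructors:      # Constr
        return ("Con", head[1], [to_cake(a, constructors) for a in args])
    if t[0] in ("var", "const"):                            # Var / Const
        return ("Var", t[1])
    if t[0] == "abs":                                       # Fun: λx. case x of ...
        x = fresh(ids(t) | constructors)
        return ("Fun", x, ("Mat", ("Var", x), clauses_to_cake(t[1], constructors)))
    if t[1][0] == "abs":                                    # Mat: (Λ cs) $ u
        return ("Mat", to_cake(t[2], constructors), clauses_to_cake(t[1][1], constructors))
    return ("App", "Opapp", [to_cake(t[1], constructors), to_cake(t[2], constructors)])


def clauses_to_cake(clauses, constructors):
    # Patterns are kept as they are; mk_ml_pat is not modelled here.
    return [(p, to_cake(rhs, constructors)) for p, rhs in clauses]


bools = {"True", "False"}
T, F = ("const", "True"), ("const", "False")
neg = ("abs", [(T, F), (F, T)])
```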

Types. During embedding (Sect. 3), all type information is erased. Yet, CakeML performs a limited form of type checking at run-time: data constructors must always be fully applied, both when constructing and when matching data. That is, constructors must always occur with all arguments supplied, on right-hand sides as well as on left-hand sides. Fully applied constructors in terms can be easily guaranteed by simple preprocessing. For patterns, however, this must be ensured throughout the compilation pipeline; it is (like other syntactic constraints) another side condition imposed on the rule set (Sect. 5.1).

² Based on a repository snapshot from March 27, 2017 (0c48672).


The shape of datatypes and constructors is managed in CakeML’s environ-

ment. This particular piece of information is allowed to vary in closures, since

ML supports local type deﬁnitions. Tracking this would greatly complicate our

proofs. Hence, we ﬁx a global set of constructors and enforce that all values use

exactly that one.
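With a fixed global set of constructors, the side condition on fully applied constructors becomes a simple syntactic check. The following Python sketch assumes a hypothetical table arities that maps each constructor to its number of arguments, together with a tagged-tuple encoding of sterm; it checks right-hand sides and patterns alike.

```python
# Hypothetical encoding: ("const", n) | ("var", n) | ("app", t, u)
# | ("abs", [(pattern, rhs), ...]); arities maps constructor names to arities.

def spine(t):
    args = []
    while t[0] == "app":
        args.insert(0, t[2])
        t = t[1]
    return t, args


def fully_applied(t, arities):
    """True iff every constructor occurrence in t (patterns included) has exactly its arity."""
    head, args = spine(t)
    if head[0] == "const" and head[1] in arities and len(args) != arities[head[1]]:
        return False
    if head[0] == "abs":
        for pattern, rhs in head[1]:
            if not (fully_applied(pattern, arities) and fully_applied(rhs, arities)):
                return False
    return all(fully_applied(a, arities) for a in args)


arities = {"Nil": 0, "Cons": 2}
full = ("app", ("app", ("const", "Cons"), ("var", "x")), ("const", "Nil"))
partial = ("app", ("const", "Cons"), ("var", "x"))
```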

Correspondence Relation. We define two different correspondence relations: one for values and one for expressions.

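Definition 8 (Expression correspondence)

$$\textsc{Var}\quad \mathit{rel\_e}\ (\mathsf{Svar}\ n)\ (\mathsf{Cake.Var}\ n)
\qquad
\frac{n \notin \mathit{constructors}}{\mathit{rel\_e}\ (\mathsf{Sconst}\ n)\ (\mathsf{Cake.Var}\ n)}\ \textsc{Const}$$

$$\frac{\mathit{name} \in \mathit{constructors} \qquad \mathit{rel\_e}\ t_1\ u_1 \quad \cdots \quad \mathit{rel\_e}\ t_n\ u_n}{\mathit{rel\_e}\ (\mathsf{Sconst}\ \mathit{name}\ \$\ t_1\ \$ \ldots \$\ t_n)\ (\mathsf{Cake.Con}\ (\mathsf{Some}\ (\mathsf{Cake.Short}\ \mathit{name}))\ [u_1, \ldots, u_n])}\ \textsc{Constr}$$

$$\frac{\mathit{rel\_e}\ t_1\ u_1 \qquad \mathit{rel\_e}\ t_2\ u_2}{\mathit{rel\_e}\ (t_1\ \$\ t_2)\ (\mathsf{Cake.App}\ \mathsf{Cake.Opapp}\ [u_1, u_2])}\ \textsc{App}$$

$$\frac{n \notin \mathit{ids}\ (\Lambda\ [p_1 \Rightarrow t_1, \ldots]) \cup \mathit{constructors} \qquad q_1 = \mathit{mk\_ml\_pat}\ p_1 \qquad \mathit{rel\_e}\ t_1\ u_1 \quad \cdots}{\mathit{rel\_e}\ (\Lambda\ [p_1 \Rightarrow t_1, \ldots])\ (\mathsf{Cake.Fun}\ n\ (\mathsf{Cake.Mat}\ (\mathsf{Cake.Var}\ n)\ [q_1 \Rightarrow u_1, \ldots]))}\ \textsc{Fun}$$

$$\frac{\mathit{rel\_e}\ t\ u \qquad q_1 = \mathit{mk\_ml\_pat}\ p_1 \qquad \mathit{rel\_e}\ t_1\ u_1 \quad \cdots}{\mathit{rel\_e}\ (\Lambda\ [p_1 \Rightarrow t_1, \ldots]\ \$\ t)\ (\mathsf{Cake.Mat}\ u\ [q_1 \Rightarrow u_1, \ldots])}\ \textsc{Mat}$$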

We will explain each of the rules brieﬂy here.

Var. Variables are directly related by identical name.

Const. As described earlier, constructors are treated specially in CakeML. In

order to not confuse functions or variables with data constructors themselves,

we require that the constant name is not a constructor.

Constr. Constructors are directly related by identical name, and recursively

related arguments.

App. CakeML does not just support general function application but also unary

and binary operators. In fact, function application is the binary operator

Opapp. We never generate other operators. Hence the correspondence is

restricted to Opapp.

Fun/Mat. Observe the symmetry between these two cases: in our term language, matching and abstraction are combined, which is not the case in CakeML. This means we relate a case abstraction to a CakeML function containing a match, and a case abstraction applied to a value to just a CakeML match.

There is no separate relation for patterns, because their translation is simple.

The value correspondence (rel_v) is structurally simpler. In the case of constructor values (Vconstr and Cake.Conv), arguments are compared recursively. Closures and recursive closures are compared extensionally, i.e. only bindings that occur in the body are checked recursively for correspondence.


Correctness. We use the same trick as in Sect. 5.6 to obtain a suitable environment for CakeML evaluation based on the rule set rs.

Theorem 5 (Correctness). If the compiled expression sterm_to_cake t terminates with a value u in the CakeML semantics, there is a value v such that rel_v v u and $\mathit{rs} \vdash t \downarrow v$.

6 Composition

The complete compiler pipeline consists of multiple phases. Correctness is justified for each phase in terms of intermediate semantics and correspondence relations, most of which are rather technical. Whereas the compiler may be complex and impenetrable, the trustworthiness of the constructions hinges on the obviousness of those correspondence relations.

Fortunately, under the assumption that the terms to be evaluated and the resulting values do not contain abstractions (or closures, respectively), all of the correspondence relations collapse to simple structural equality: two terms are related if and only if one can be converted to the other by consistent renaming of term constructors.

The actual compiler can be characterized with two functions. Firstly, the

translation of term to Cake.exp is a simple composition of each term translation

function:

definition term_to_cake :: term ⇒ Cake.exp where
  term_to_cake = sterm_to_cake ∘ pterm_to_sterm ∘ nterm_to_pterm ∘ term_to_nterm

Secondly, the function that translates function definitions composes the phases as outlined in Fig. 2, including iterated application of pattern elimination:

definition compile :: (term × term) fset ⇒ Cake.dec where
  compile = Cake.Dletrec ∘ compile_srules_to_cake ∘ compile_prules_to_srules ∘
    compile_irules_to_srules ∘ compile_irules_iter ∘ compile_crules_to_irules ∘
    consts_of ∘ compile_rules_to_nrules

Each function compile_* corresponds to one compiler phase; the remaining functions are trivial. This produces a CakeML top-level declaration. We prove that evaluating this declaration in the top-level semantics (evaluate_prog) results in an environment cake_sem_env. But cake_sem_env can also be computed via another instance of the global clause set trick (Sect. 5.6).
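The order of the phases in term_to_cake follows the usual convention for ∘: the rightmost function is applied first. The following Python sketch only illustrates this plumbing; the four phases are hypothetical placeholders that wrap their input in a label instead of translating anything.

```python
from functools import reduce


def compose(*phases):
    """compose(f, g, h) behaves like Isabelle's f ∘ g ∘ h: h is applied first."""
    return reduce(lambda f, g: lambda x: f(g(x)), phases)


def phase(label):
    # Placeholder (hypothetical): a real phase would translate the term; here it
    # only wraps its input so that the order of application becomes visible.
    return lambda t: (label, t)


term_to_nterm = phase("nterm")
nterm_to_pterm = phase("pterm")
pterm_to_sterm = phase("sterm")
sterm_to_cake = phase("cake")

term_to_cake = compose(sterm_to_cake, pterm_to_sterm, nterm_to_pterm, term_to_nterm)
```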

Equipped with these functions, we can state the ﬁnal correctness theorem:

theorem compiled_correct:
  (∗ If CakeML evaluation of a term succeeds ... ∗)
  assumes evaluate False cake_sem_env s (term_to_cake t) (s', Rval ml_v)
  (∗ ... producing a constructor term without closures ... ∗)
  assumes cake_abstraction_free ml_v
  (∗ ... and some syntactic properties of the involved terms hold ... ∗)
  assumes closed t and ¬ shadows_consts (heads rs ∪ constructors) t and
    welldefined (heads rs ∪ constructors) t and wellformed t
  (∗ ... then this evaluation can be reproduced in the term-rewriting semantics ∗)
  shows rs ⊢ t ⟶* cake_to_term ml_v


class add =
  fixes plus :: 'a ⇒ 'a ⇒ 'a

definition f :: ('a::add) ⇒ 'a where
  f x = plus x x

(a) Source program

datatype 'a dict_add = Dict_add ('a ⇒ 'a ⇒ 'a)

fun cert_add :: ('a::add) dict_add ⇒ bool where
  cert_add (Dict_add pls) = (pls = plus)

fun f' :: 'a dict_add ⇒ 'a ⇒ 'a where
  f' (Dict_add pls) x = pls x x

lemma f'_eq: cert_add dict → f' dict = f
  <proof>

(b) Result of translation

Fig. 9. Dictionary construction in Isabelle

This theorem directly relates the evaluation of a term t in full CakeML (including mutability and exceptions) to the evaluation in the initial higher-order term rewriting semantics. The evaluation of t happens using the environment produced from the initial rule set. Hence, the theorem can be interpreted as the correctness of the pseudo-ML expression let rec rs in t.

Observe that in the assumption, the conversion goes from our terms to

CakeML expressions, whereas in the conclusion, the conversion goes the opposite

direction.

7 Dictionary Construction

Isabelle's type system supports type classes (or simply classes) [18,44], whereas CakeML does not. In order not to complicate the correctness proofs, type classes are not supported by our embedded term language either. Instead, we eliminate classes and instances by a dictionary construction [19] before embedding into the term language. Haftmann and Nipkow give a pen-and-paper correctness proof of this construction [17, Sect. 4.1]. We augmented the dictionary construction with the generation of a certificate theorem that shows the equivalence of the two versions of a function, with type classes and with dictionaries. This section briefly explains our dictionary construction.

Figure 9 shows a simple example of a dictionary construction. Type variables may carry class constraints (e.g. α :: add). The basic idea is that classes become dictionaries containing the functions of that class; class instances become dictionary definitions. Dictionaries are realized as datatypes. Class constraints become additional dictionary parameters for that class. In the example, class add becomes dict_add; function f is translated into f' which takes an additional parameter of type dict_add. In reality our tool does not produce the Isabelle source code shown in Fig. 9b but performs the constructions internally. The correctness lemma f'_eq is proved automatically. Its precondition expresses that the dictionary must contain exactly the function(s) of class add. For any monomorphic instance, the precondition can be proved outright based on the certificate theorems proved for each class instance as explained next.
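Dictionary passing itself is not specific to Isabelle. The Python sketch below mirrors Fig. 9b under our own assumptions: DictAdd plays the role of dict_add, f_prime that of f', and the two dictionaries are hypothetical instances. The certificate cert_add, which states that the dictionary contains exactly plus, is a logical property and is not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")


@dataclass(frozen=True)
class DictAdd(Generic[A]):
    """Dictionary for the class add: one field per class operation."""
    plus: Callable[[A, A], A]


def f_prime(d: DictAdd[A], x: A) -> A:
    """Translation of f x = plus x x: the class constraint became a parameter."""
    return d.plus(x, x)


# Two hypothetical instances, i.e. dictionary definitions.
add_int = DictAdd(plus=lambda a, b: a + b)
add_max = DictAdd(plus=max)
```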


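Not shown in the example is the translation of class instances. The basic form of a class instance in Isabelle is $\tau :: (c_1, \ldots, c_n)\ c$ where τ is an n-ary type constructor. It corresponds to Haskell's $(c_1\ \alpha_1, \ldots, c_n\ \alpha_n) \Rightarrow c\ (\tau\ \alpha_1 \ldots \alpha_n)$ and is translated into a function $\mathit{inst\_c\_\tau} :: \alpha_1\ \mathit{dict\_c_1} \Rightarrow \cdots \Rightarrow \alpha_n\ \mathit{dict\_c_n} \Rightarrow (\alpha_1, \ldots, \alpha_n)\ \tau\ \mathit{dict\_c}$ and the following certificate theorem is proved:

$$\mathit{cert\_c_1}\ \mathit{dict}_1 \longrightarrow \cdots \longrightarrow \mathit{cert\_c_n}\ \mathit{dict}_n \longrightarrow \mathit{cert\_c}\ (\mathit{inst\_c\_\tau}\ \mathit{dict}_1 \ldots \mathit{dict}_n)$$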

For a more detailed explanation of how the dictionary construction works, we

refer to the corresponding entry in the Archive of Formal Proofs [21].
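An instance for an n-ary type constructor thus becomes a function from dictionaries to a dictionary. As a hedged Python illustration with n = 1, assume a hypothetical instance list :: (add) add that adds lists element-wise; inst_add_list builds the list dictionary from the element dictionary, and instances compose for nested types.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

A = TypeVar("A")


@dataclass(frozen=True)
class DictAdd(Generic[A]):
    plus: Callable[[A, A], A]


def inst_add_list(d: DictAdd[A]) -> DictAdd[List[A]]:
    """Hypothetical instance list :: (add) add: element-wise addition.

    The instance becomes a function from the element dictionary to the
    dictionary for the type constructor applied to the element type.
    """
    return DictAdd(plus=lambda xs, ys: [d.plus(x, y) for x, y in zip(xs, ys)])


def f_prime(d: DictAdd[A], x: A) -> A:
    return d.plus(x, x)


add_int = DictAdd(plus=lambda a, b: a + b)
```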

8 Evaluation

We have tried out our compiler on examples from existing Isabelle formalizations.

This includes an implementation of Huﬀman encoding, lists and sorting, string

functions [39], and various data structures from Okasaki’s book [34], including

binary search trees, pairing heaps, and leftist heaps. These deﬁnitions can be

processed with slight modiﬁcations: functions need to be totalized (see the end

of Sect. 3). However, parts of the tactics required for deep embedding proofs

(Sect. 3) are too slow on some functions and hence still need to be optimized.

9 Conclusion

For this paper we have concentrated on the compiler from Isabelle/HOL to CakeML abstract syntax trees. Partial correctness is proved w.r.t. the big-step semantics of CakeML. In the next step we will link our work with the compiler from CakeML to machine code. Tan et al. [40, Sect. 10] prove a correctness theorem that relates their semantics with the execution of the compiled machine code. In that paper, they use a newer iteration of the CakeML semantics (functional big-step [35]) than we do here. Both semantics are still present in the CakeML source repository, together with an equivalence proof. Another important step consists of targeting CakeML's native types, e.g. integer numbers and characters.

Evaluation of our compiled programs is already possible via Isabelle's predicate compiler [5], which allows us to turn CakeML's big-step semantics into an executable function. We have used this execution mechanism to check that sample programs terminate successfully. We also plan to prove that our compiled programs terminate, i.e. total correctness.

The total size of this formalization, excluding theories extracted from Lem,

is currently approximately 20000 lines of proof text (90 %) and ML code (10 %).

The ML code itself produces relatively simple theorems, which means that there are fewer opportunities for it to go wrong. This constitutes an improvement over certifying approaches that prove complicated properties in ML.


References

1. The HOL System Description (2014). https://hol-theorem-prover.org/

2. Anand, A., Appel, A.W., Morrisett, G., Paraskevopoulou, Z., Pollack, R., Bélanger, O.S., Sozeau, M., Weaver, M.: CertiCoq: a verified compiler for Coq. In: CoqPL 2017: Third International Workshop on Coq for Programming Languages (2017)

3. Augustsson, L.: Compiling pattern matching. In: Jouannaud, J.P. (ed.) Functional Programming Languages and Computer Architecture, pp. 368–381. Springer, Heidelberg (1985)

4. Benton, N., Hur, C.: Biorthogonality, step-indexing and compiler correctness. In:

Hutton, G., Tolmach, A.P. (eds.) ICFP 2009, pp. 97–108. ACM (2009)

5. Berghofer, S., Bulwahn, L., Haftmann, F.: Turning inductive into equational specifications. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs 2009. LNCS, vol. 5674, pp. 131–146. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03359-9_11

6. Berghofer, S., Nipkow, T.: Executing higher order logic. In: Callaghan, P., Luo, Z., McKinna, J., Pollack, R. (eds.) TYPES 2000. LNCS, vol. 2277, pp. 24–40. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45842-5_2

7. Blanchette, J.C., Hölzl, J., Lochbihler, A., Panny, L., Popescu, A., Traytel, D.: Truly modular (co)datatypes for Isabelle/HOL. In: Klein, G., Gamboa, R. (eds.) ITP 2014. LNCS, vol. 8558, pp. 93–110. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08970-6_7

8. Boespflug, M., Dénès, M., Grégoire, B.: Full reduction at full throttle. In: Jouannaud, J.-P., Shao, Z. (eds.) CPP 2011. LNCS, vol. 7086, pp. 362–377. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25379-9_26

9. Boyer, R.S., Strother Moore, J.: Single-threaded objects in ACL2. In: Krishnamurthi, S., Ramakrishnan, C.R. (eds.) PADL 2002. LNCS, vol. 2257, pp. 9–27. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45587-6_3

10. de Bruijn, N.G.: Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation, with application to the Church-Rosser theorem. Indag. Math. (Proceedings) 75(5), 381–392 (1972)

11. Chlipala, A.: A veriﬁed compiler for an impure functional language. In:

Hermenegildo, M.V., Palsberg, J. (eds.) POPL 2010, pp. 93–106. ACM (2010)

12. Crow, J., Owre, S., Rushby, J., Shankar, N., Stringer-Calvert, D.: Evaluating, testing, and animating PVS specifications. Technical report, Computer Science Laboratory, SRI International, Menlo Park, CA, March 2001

13. Fallenstein, B., Kumar, R.: Proof-producing reflection for HOL. In: Urban, C., Zhang, X. (eds.) ITP 2015. LNCS, vol. 9236, pp. 170–186. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22102-1_11

14. Flatau, A.D.: A verified implementation of an applicative language with dynamic storage allocation. Ph.D. thesis, University of Texas at Austin (1992)

15. Forster, Y., Kunze, F.: Verified extraction from Coq to a lambda-calculus. In: The 8th Coq Workshop (2016)

16. Greve, D.A., Kaufmann, M., Manolios, P., Moore, J.S., Ray, S., Ruiz-Reina, J., Sumners, R., Vroon, D., Wilding, M.: Efficient execution in an automated reasoning environment. J. Funct. Program. 18(1), 15–46 (2008)

17. Haftmann, F., Nipkow, T.: Code generation via higher-order rewrite systems. In: Blume, M., Kobayashi, N., Vidal, G. (eds.) FLOPS 2010. LNCS, vol. 6009, pp. 103–117. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12251-4_9

18. Haftmann, F., Wenzel, M.: Constructive type classes in Isabelle. In: Altenkirch, T., McBride, C. (eds.) TYPES 2006. LNCS, vol. 4502, pp. 160–174. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74464-1_11

19. Hall, C.V., Hammond, K., Jones, S.L.P., Wadler, P.L.: Type classes in Haskell. ACM Trans. Program. Lang. Syst. 18(2), 109–138 (1996)

20. Hermida, C., Reddy, U.S., Robinson, E.P.: Logical relations and parametricity - a Reynolds programme for category theory and programming languages. Electron. Notes Theoret. Comput. Sci. 303, 149–180 (2014)

21. Hupel, L.: Dictionary construction. Archive of Formal Proofs, May 2017. http://isa-afp.org/entries/Dict_Construction.html, Formal proof development

22. Krauss, A.: Partial and nested recursive function definitions in higher-order logic. J. Autom. Reason. 44(4), 303–336 (2010)

23. Krauss, A., Schropp, A.: A mechanized translation from higher-order logic to set theory. In: Kaufmann, M., Paulson, L.C. (eds.) ITP 2010. LNCS, vol. 6172, pp. 323–338. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14052-5_23

24. Kumar, R., Myreen, M.O., Norrish, M., Owens, S.: CakeML: a verified implementation of ML. In: POPL 2014, pp. 179–191. ACM (2014)

25. Landin, P.J.: The mechanical evaluation of expressions. Comput. J. 6(4), 308–320 (1964)

26. Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–115 (2009). http://doi.acm.org/10.1145/1538788.1538814

27. Letouzey, P.: A new extraction for Coq. In: Geuvers, H., Wiedijk, F. (eds.) TYPES 2002. LNCS, vol. 2646, pp. 200–219. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-39185-1_12

28. Milner, R., Tofte, M., Harper, R., MacQueen, D.: The Definition of Standard ML (Revised). MIT Press, Cambridge (1997)

29. Mulligan, D.P., Owens, S., Gray, K.E., Ridge, T., Sewell, P.: Lem: reusable engineering of real-world semantics. In: ICFP 2014, pp. 175–188. ACM (2014)

30. Myreen, M.O., Owens, S.: Proof-producing translation of higher-order logic into pure and stateful ML. JFP 24(2–3), 284–315 (2014)

31. Neis, G., Hur, C.K., Kaiser, J.O., McLaughlin, C., Dreyer, D., Vafeiadis, V.: Pilsner: a compositionally verified compiler for a higher-order imperative language. In: ICFP 2015, pp. 166–178. ACM, New York (2015)

32. Nipkow, T., Klein, G.: Concrete Semantics. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10542-0

33. Nipkow, T., Wenzel, M., Paulson, L.C. (eds.): Isabelle/HOL—A Proof Assistant for Higher-Order Logic. LNCS, vol. 2283. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45949-9. 218p.

34. Okasaki, C.: Purely Functional Data Structures. Cambridge University Press, Cambridge (1999)

35. Owens, S., Myreen, M.O., Kumar, R., Tan, Y.K.: Functional big-step semantics. In: Thiemann, P. (ed.) ESOP 2016. LNCS, vol. 9632, pp. 589–615. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49498-1_23

36. Peyton Jones, S.L.: The Implementation of Functional Programming Languages. Prentice-Hall Inc., Upper Saddle River (1987)

37. Shankar, N.: Static analysis for safe destructive updates in a functional language. In: Pettorossi, A. (ed.) LOPSTR 2001. LNCS, vol. 2372, pp. 1–24. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45607-4_1

38. Slind, K.: Reasoning about terminating functional programs. Ph.D. thesis, Technische Universität München (1999)

39. Sternagel, C., Thiemann, R.: Haskell’s show class in Isabelle/HOL. Archive of Formal Proofs, July 2014. http://isa-afp.org/entries/Show.html, Formal proof development

40. Tan, Y.K., Myreen, M.O., Kumar, R., Fox, A., Owens, S., Norrish, M.: A new verified compiler backend for CakeML. In: Proceedings of 21st ACM SIGPLAN International Conference on Functional Programming - ICFP 2016. Association for Computing Machinery (ACM) (2016)

41. Turner, D.A.: Some history of functional programming languages. In: Loidl, H.-W., Peña, R. (eds.) TFP 2012. LNCS, vol. 7829, pp. 1–20. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40447-4_1

42. Urban, C.: Nominal techniques in Isabelle/HOL. J. Autom. Reason. 40(4), 327–356 (2008). https://doi.org/10.1007/s10817-008-9097-2

43. Urban, C., Berghofer, S., Kaliszyk, C.: Nominal 2. Archive of Formal Proofs, February 2013. Formal proof development: http://isa-afp.org/entries/Nominal2.shtml

44. Wenzel, M.: Type classes and overloading in higher-order logic. In: Gunter, E.L., Felty, A. (eds.) TPHOLs 1997. LNCS, vol. 1275, pp. 307–322. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0028402

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.