Experience Report: The Next 1100 Haskell Programmers

Jasmin Christian Blanchette Lars Hupel Tobias Nipkow Lars Noschinski Dmitriy Traytel

Fakultät für Informatik, Technische Universität München, Germany

Abstract

We report on our experience teaching a Haskell-based functional

programming course to over 1100 students for two winter terms.

The syllabus was organized around selected material from various

sources. Throughout the terms, we emphasized correctness through

QuickCheck tests and proofs by induction. The submission archi-

tecture was coupled with automatic testing, giving students the pos-

sibility to correct mistakes before the deadline. To motivate the stu-

dents, we complemented the weekly assignments with an informal

competition and gave away trophies in an award ceremony.

Categories and Subject Descriptors D.1.1 [Programming Tech-

niques]: Applicative (Functional) Programming; D.3.2 [Program-

ming Languages]: Language Classiﬁcations—Applicative (func-

tional) languages; K.3.2 [Computers and Education]: Computer

and Information Science Education—Computer science education

General Terms Algorithms, Languages, Reliability

Keywords Haskell, functional programming, induction, testing,

monads, education, competition, QuickCheck, SmallCheck

1. Introduction

This paper reports on a mandatory Haskell-based functional pro-

gramming course at the Technische Universit

¨

at M

¨

unchen. In the

ﬁrst iteration (winter semester of 2012–2013), there were 619 stu-

dents enrolled. In the following winter semester (2013–2014), there

were 553 students enrolled. The course ran for 15 weeks with

one 90-minute lecture and one 90-minute tutorial each week. The

weekly homework was graded, but the ﬁnal grade was primarily

determined by the examination. To make the homework more at-

tractive, we coupled it with an informal programming competition.

The departmental course description does not prescribe a spe-

ciﬁc functional language but focuses on functional programming

in general. In the previous two years, the course had been based

on Standard ML. We have a strong ML background ourselves but

chose Haskell because of its simple syntax, large user community,

real-world appeal, variety of textbooks, and availability of Quick-

Check [3]. The one feature we could well have done without is lazy

evaluation; in fact, we wondered whether it would get in the way.

The course was mandatory for computer science (Informatik)

and information systems (Wirtschaftsinformatik) students. All had

learned Java in their ﬁrst semester. The computer science students

had also taken courses on algorithms and data structures, discrete


mathematics, and linear algebra. The information systems students

had only had a basic calculus course and were taking discrete

mathematics in parallel.

The dramatis personae in addition to the students were lecturer Tobias Nipkow, who designed the course, produced the slides (www21.in.tum.de/teaching/info2/WS1314/slides.pdf), and gave the lectures; Masters of TAs Lars Noschinski and Lars Hupel, who directed a dozen teaching assistants (TAs) and took care of the overall organization; furthermore the (Co)Masters of Competition Jasmin Blanchette (MC) and Dmitriy Traytel (CoMC), who selected competition problems and ranked the solutions.

2. Syllabus

The second iteration covered the following topics in order. (The

ﬁrst iteration was similar, with a few exceptions discussed below.)

Each topic was the subject of one 90-minute lecture unless other-

wise speciﬁed.

1. Introduction to functional programming [0.5 lecture]

2. Basic Haskell: Bool, QuickCheck, Integer and Int, guarded

equations, recursion on numbers, Char, String, tuples

3. Lists: list comprehension, polymorphism, a glimpse of the Pre-

lude, basic type classes (Num, Eq, Ord), pattern matching, re-

cursion on lists (including accumulating parameters and non-

primitive recursion); scoping rules by example [1.5 lectures]

4. Proof by structural induction on lists

5. Higher-order functions: map, filter, foldr, λ-abstractions,

extensionality, currying, more Prelude [2 lectures]

6. Type classes [0.5 lecture]

7. Algebraic datatypes: data by example, the general case, Boolean

formula case study, structural induction [1.5 lectures]

8. I/O, including ﬁles and web

9. Modules: module syntax, data abstraction, correctness proofs

10. Case study: Huffman coding

11. Lazy evaluation and inﬁnite lists

12. Complexity and optimization

13. Case study: parser combinators

Most topics were presented together with examples or smaller

case studies, of which we have only mentioned Boolean formulas.

Moreover, two topics kept on recurring: tests (using QuickCheck)

and proofs (by induction).

From day one, examples and case studies in class were accom-

panied by properties suitable for QuickCheck. Rather than concen-

trate all inductive proofs in the lecture about induction, we dis-

tributed them over the entire course and appealed to them whenever

it was appropriate. A typical example: In a case study, a function

is first defined via map myg . map myf and then optimized to map (myg . myf), justified by a proof of map (g . f) = map g . map f.

Much of the above material is uncontroversial and part of any

Haskell introduction, but some choices deserve some discussion.


Induction. Against our expectations, induction was well under-

stood, as the examination conﬁrmed (Section 5). What may have

helped is that we gave the students a rigid template for inductions.

We went as far as requiring them to prove equations l_1 = r_1 not by one long chain of equalities l_1 = ··· = l_m = r_n = ··· = r_1 but by two reductions l_1 = ··· = l_m and r_1 = ··· = r_n. This avoids the strange effect of having to shift to reverse gear halfway through the proof of l_1 = r_1. It must be stressed that we considered only structural

induction, that we generally did not expect the students to think

up auxiliary lemmas themselves, and that apart from extensionality

and induction all reasoning was purely equational.
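To illustrate the two-reduction format, here is how the map fusion property from Section 2 could be proved following the template (our reconstruction, not a verbatim course slide):

    Lemma: map (g . f) xs = (map g . map f) xs, by structural induction on xs.

    Base case xs = []:
      map (g . f) []      = []                            -- def. of map
      (map g . map f) []  = map g (map f [])              -- def. of (.)
                          = map g []                      -- def. of map
                          = []                            -- def. of map

    Step case xs = x : xs', with induction hypothesis
    map (g . f) xs' = (map g . map f) xs':
      map (g . f) (x : xs')     = (g . f) x : map (g . f) xs'    -- def. of map
                                = g (f x) : (map g . map f) xs'  -- def. of (.), IH
      (map g . map f) (x : xs') = map g (map f (x : xs'))        -- def. of (.)
                                = map g (f x : map f xs')        -- def. of map
                                = g (f x) : map g (map f xs')    -- def. of map
                                = g (f x) : (map g . map f) xs'  -- def. of (.)

In both cases the two sides reduce to the same term, so the chains never have to be reversed.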

In Haskell, there is the additional complication that proofs by

structural induction establish the property only for ﬁnite objects.

Some authors restrict the scope of their lemmas to ﬁnite lists of de-

ﬁned elements [15], while others prove reverse (reverse xs)

= xs without mentioning that it does not hold for partial or in-

ﬁnite lists [7]. Although some authors discuss ﬁnite partial ob-

jects and inﬁnite objects [6, 15], we avoided them in our course—

undeﬁnedness alone is a can of worms that we did not want to open.

Hence, we restricted ourselves to a total subset of Haskell in which

“fast and loose reasoning” [4] is sound.

Input/output and monads. In the ﬁrst iteration, I/O was covered

toward the end of the course because it is connected with the

advanced topic of monads. For much of the course, many students

may have had the impression that Haskell is only a gloriﬁed pocket

calculator. Therefore we moved I/O to an earlier point in the course.

We also dropped monads, since the majority had not grasped them.

This was not entirely satisfactory, because a minority of students

had been excited by monads and expressed their disappointment.

Abstraction functions. In the lecture on modules and data ab-

straction, we also showed how to prove correctness of data repre-

sentations (e.g., the representation of sets by lists). This requires

an abstraction function from the representation back to the abstract

type that must commute with all operations on the type. As the cor-

responding homework showed, we failed to convey this. In retro-

spect, it is outside the core functional programming syllabus, which

is why it is absent from all the textbooks. The topic still appeared

brieﬂy in the lecture in the second iteration, but without exercises.
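For the sets-as-lists example, the key idea can be sketched as follows (our own illustration, using Data.Set from the containers package as a stand-in for the abstract type; the lecture used its own formulation):

    import qualified Data.Set as Set

    -- Concrete representation: lists, possibly with duplicates and in any order.
    type SetRepr a = [a]

    emptyRepr :: SetRepr a
    emptyRepr = []

    insertRepr :: a -> SetRepr a -> SetRepr a
    insertRepr = (:)

    -- Abstraction function mapping the representation to the abstract type.
    abstr :: Ord a => SetRepr a -> Set.Set a
    abstr = Set.fromList

    -- Correctness amounts to showing that abstr commutes with every operation:
    --   abstr emptyRepr         == Set.empty
    --   abstr (insertRepr x xs) == Set.insert x (abstr xs)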

Laziness. Haskell’s lazy evaluation strategy and inﬁnite objects

played only a very minor role and were introduced only toward

the end. Initially, we were worried that laziness might confuse

students when they accidentally stumble across it before it has been

introduced, but this was not reported as a problem by any of the

TAs. However, we could not give the students a good operational

model of the language without laziness: All they knew initially was

that equations were applied in some unspeciﬁed order. Even after

we had explained laziness, it remained unclear to many students

how exactly to determine what needs to be evaluated.

Complexity and optimization. Complexity considerations are se-

riously complicated by laziness. We found that the book by Bird [1]

offered the best explanation. For time complexity, he notes that as-

suming eager evaluation is easier and still gives an upper bound.

Therefore, we simply replaced lazy by eager evaluation for this lec-

ture. The principles then applied to most programming languages,

and one can cover key optimizations such as tail recursion.

Parser combinators. In the ﬁrst iteration of the course, the lecture

on parser combinators had been given about two thirds into the

course, but many students had failed to grasp it. As a result we

moved it to the end of the course for the second iteration. This

means that it could not be covered by an exercise sheet and we

have no hard feedback on how well the parser combinators were

understood. It should be noted that this is the ﬁrst time during their

studies that the students are exposed to the technicalities of parsing.

Figure 1. Status page with exercise points (where blue bars denote

the student’s points and black markers denote the median of all

students)

Figure 2. Homework submissions relative to the number of enrolled students (in percent, per exercise sheet, for the 2012–2013 and 2013–2014 iterations)

3. Exercises

Each week we released an exercise sheet with group and homework

assignments. The main objective of the homework was to have the

students actually program in Haskell. The submission infrastruc-

ture periodically ran automatic tests, giving the students fast feed-

back and an opportunity to correct mistakes before the deadline.

3.1 Assignments

A typical assignment sheet contained between three and ﬁve group

exercises and about as many homework exercises. The group exer-

cises were solved in 90-minute tutorial groups. There were 25 or

26 such groups, each with up to 24 students. Each exercise fo-

cused on a speciﬁc concept from the week’s lecture. Many were

programming exercises, but some required the students to write

QuickCheck tests, carry out proofs, or infer an expression’s type.

The homework assignments, to be solved individually, covered

the same topics in more depth, sometimes in combination. They

were optional, but in the ﬁrst iteration of the course, the students

who collected at least 40% of the possible points were awarded a

bonus of 0.3 to the final grade, on a scale from 1.0 (≈ A+) to 5.0

(≈ F). In the end, 281 students claimed the bonus. Furthermore, the

(Co)MCs nominated one of the assignments to count as part of the

competition (Section 4).

Much to our regret, there were several unpleasant and time-consuming incidents of plagiarism (Section 3.4). Thus, we decided to drop the bonus system in the second iteration of the course.

As a countermeasure against the anticipated decrease in the num-

ber of homework submissions, we tried to motivate the students by

providing a graphical overview of their homework grades in com-

parison with the median value of all submissions on the web (Fig-

ure 1) and ensured quick grading of the homeworks by the TAs.

Still, the decrease was severe (Figure 2): In the ﬁrst iteration, 75%

of the enrolled students submitted the ﬁrst homework; the number

dropped and ﬁnally stayed below 40% after sheet 10. In the second

iteration, it started with 50% and stayed below 20% after sheet 8.

Most of the exercises were well understood by those who did

them, perhaps as they conformed closely to the lectures. A few

important exceptions are noted below.

A group problem consisted of registering the polymorphic func-

tion type a -> b as an instance of the Num type class, so that (f + g) x == f x + g x and similarly for the other operations. Many students

did not understand what their task was, or why one would register

functions as numbers; and even those who understood the question

had to realize that b must be an instance of Num and ﬁght the prob-

lem’s higher-order nature. We had more success two weeks later

when we redid the exercise for a Fraction datatype and gently

explained why it makes sense to view fractions as numbers.
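For reference, here is a minimal sketch of the intended instance (one possible solution; the exercise's official solution may have differed):

    instance Num b => Num (a -> b) where
      f + g         = \x -> f x + g x
      f - g         = \x -> f x - g x
      f * g         = \x -> f x * g x
      negate f      = negate . f
      abs f         = abs . f
      signum f      = signum . f
      fromInteger n = const (fromInteger n)

With this instance, (f + g) x indeed evaluates to f x + g x.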

Less surprisingly, many students had issues with λ-abstractions. They tended to use λs correctly with map and filter (although

many preferred list comprehensions when given the choice), but

other exercises revealed the limits of their understanding. One

exercise required implementing a function fixpoint eq f x that

repeatedly applies f to x until f^(n+1) x `eq` f^n x and then using this

function to solve concrete problems. Another exercise featured a

deterministic ﬁnite automaton represented as a tuple, where the δ

component is represented by a Haskell function.
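A minimal sketch of a possible fixpoint definition (not the official sample solution):

    fixpoint :: (a -> a -> Bool) -> (a -> a) -> a -> a
    fixpoint eq f x
      | f x `eq` x = x                   -- f^(n+1) x `eq` f^n x: stop
      | otherwise  = fixpoint eq f (f x)

For instance, fixpoint (==) (\n -> (n + 10 `div` n) `div` 2) 10 converges to 3, the integer square root of 10.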

One difﬁculty we continually faced when designing exercises

is that the Internet provides too many answers. This was an issue

especially in the first few weeks, when little syntax had been in-

troduced. We did our best to come up with fresh ideas and, failing

that, obfuscated some old ideas.

3.2 Submission and Testing Infrastructure

The university provides a central system for managing student sub-

missions, but we built our own infrastructure so that we could

couple it with automatic testing. Our submission system combines

standard Unix tools and custom scripts. The students were given

a secure shell (ssh) account on the submission server. They had

to upload their submissions following a simple naming convention.

The system generated test reports every 15 minutes using Quick-

Check. Many students appear to have improved their submissions

iteratively based on the system’s feedback. The ﬁnal reports were

made available to the TAs but had no direct effect on grading.

To increase the likelihood that the submissions compile with

the testing system, we provided a correctly named template ﬁle for

each assignment, including the necessary module declarations and

stub deﬁnitions f = undefined for the functions to implement.

Nonetheless, many students had problems with the naming scheme

(there are surprisingly many ways to spell “exercise”), causing their

submissions to be ignored. These problems went away after we

started providing a per-student graphical web page listing the status

of all their assignments and announced a stricter grading policy.
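A template might have looked roughly as follows (the module name and the function are hypothetical):

    module Exercise_3 where

    -- Exercise 3.1: replace 'undefined' with your implementation.
    myReverse :: [a] -> [a]
    myReverse = undefined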

A few exercises required writing QuickCheck properties for a

function described textually. These properties had to take the func-

tion under test as argument, so that we could check them against se-

cret reference implementations. Since higher-order arguments had

not yet been introduced, we disguised the argument type using a

type synonym and put the boilerplate in the template ﬁle.
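A sketch of the setup (the type synonym, function name, and property are illustrative, not taken from an actual exercise sheet):

    import Test.QuickCheck

    -- The synonym hides the higher-order nature of the argument.
    type SortFunction = [Int] -> [Int]

    -- Property written by the student, parameterized by the function under
    -- test, which we could then instantiate with a secret reference function.
    prop_sortedOutput :: SortFunction -> [Int] -> Bool
    prop_sortedOutput sortF xs = isSorted (sortF xs)
      where isSorted ys = and (zipWith (<=) ys (drop 1 ys))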

The test reports included the compilation status, the result of

each test, and enough information about the failed tests to identify

the errors. The tests themselves were not revealed, since they often

contained hints for a correct implementation. In cases where the

input of the test case did not coincide with the input of the tested

function, we had to explain this in the description or provide more

details using QuickCheck’s printTestCase function. Some care

was needed because the function under test can throw exceptions,

which are not caught by QuickCheck because of the lazy evalua-

tion of printTestCase’s argument. We used the Control.Spoon

package to suppress these exceptions.
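For instance, a comparison can be guarded roughly like this (a sketch; the helper name is ours):

    import Control.DeepSeq (NFData)
    import Control.Spoon (spoon)

    -- Force the student's result; exceptions and undefined become a test
    -- failure instead of crashing the test driver.
    safeEq :: (NFData a, Eq a) => a -> a -> Bool
    safeEq actual expected =
      case spoon actual of
        Nothing -> False
        Just v  -> v == expected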

To make the output more informative, we introduced an operator

==? that compares the expected and actual results and reports

mismatches using printTestCase.
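One way to define such an operator (a sketch; the actual definition may have differed):

    import Test.QuickCheck

    (==?) :: (Eq a, Show a) => a -> a -> Property
    actual ==? expected =
      printTestCase ("expected: " ++ show expected ++
                     "\nbut got:  " ++ show actual)
                    (actual == expected)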

We did not ﬁnd any fully satisfactory way to handle very slow

and nonterminating functions. QuickCheck’s within combinator

fails if a single test iteration takes too long, but these failures are

confusing for correct code. Instead, we limited the test process’s

runtime, potentially leaving students with a truncated report.

3.3 Test Design

As regular users of the Isabelle proof assistant [10], we had a lot of

experience with Isabelle’s version of QuickCheck [2]. The tool is

run automatically on each conjectured lemma as it is entered by the

user to exhibit ﬂaws, either in the lemma itself or in the underlying

speciﬁcation (generally a functional–logic program). Typically, the

lemmas arise naturally as part of the formalization effort and are

not designed to reveal bugs in the speciﬁcation.

We designed our Haskell tests to expose the most likely bugs

and capture the main properties of the function under test. We

usually also included a test against a reference implementation. We

soon found out that many bugs escaped the test suite because the

Haskell QuickCheck’s default setup is much less exhaustive than its

Isabelle namesake’s. For example, the Haskell random generator

tends to produce much larger integers than the Isabelle one; as a

result, random lists of integers rarely contain duplicates, which are

essential to test some classes of functions. Worse, for polymorphic

functions we did not realize immediately that type variables are

instantiated with the unit type () by default (a peculiar choice

to say the least). In contrast, Isabelle’s version of QuickCheck

supports random testing, exhaustive testing (cf. SmallCheck [12]),

and narrowing (cf. Lazy SmallCheck [12], Agsy [9]), the default

number of iterations is 250, and type variables are instantiated by

small types. The differences between the two QuickCheck versions

became painfully obvious with the competition exercises, as we

will see in Section 4.

Following these initial difﬁculties, the Masters of TAs were ap-

pointed Masters of Tests and put in charge of setting up the testing

framework properly. They immediately increased QuickCheck’s

number of iterations, decreased the maximum size parameter, and

regained control by deﬁning custom generators and instantiating

type variables with small types. They also started using Small-

Check to reliably catch bugs exposed by small counterexamples.
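Concretely, the adjustments were along the following lines (the values and the property are illustrative):

    import Test.QuickCheck

    -- More iterations and smaller test data than the defaults:
    testArgs :: Args
    testArgs = stdArgs { maxSuccess = 250, maxSize = 20 }

    -- Small integers, so that random lists actually contain duplicates:
    smallInt :: Gen Int
    smallInt = choose (-10, 10)

    prop_example :: Property
    prop_example = forAll (listOf smallInt) $ \xs ->
      reverse (reverse xs) == xs

    main :: IO ()
    main = quickCheckWith testArgs prop_example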

3.4 Plagiarism Detection

We considered it important to detect and deter plagiarism in the ﬁrst

year, both because individual bonuses should be earned individ-

ually and because learning functional programming requires doing

some programming on one’s own. Our policy was clear: Plagiarism

led to forfeiture of the bonus for all involved parties.

Figure 3. Plagiarism graph excerpt featuring the Pastebin clique (anonymized)

To identify plagiarists, we used Moss [13] extended with a custom shell script to visualize the results with Graphviz [5]. The resulting graph connects pairs of submissions with similar features, with thicker edges for stronger similarities. Figure 3 shows an anonymized excerpt of the output for week 3.

A noteworthy instance of unintended sharing is the complete

subgraph of thick edges in the middle of Figure 3. One of the

involved students had used Pastebin (http://pastebin.com/)

for his own purposes, without realizing that it would be indexed

by Google and picked up by other students.

Moss’s results are imprecise, with many false positives, so they

must be analyzed carefully. Functional programming often allows

short, canonical solutions. Unusual naming conventions, spacing,

or bugs are useful clues. One might have thought that the recent German plagiarism scandals, which eventually cost two federal ministers their doctoral titles and ministerial positions, would have cured the country for some time. Sadly, we had to disqualify 29 students.

4. Competition

Our main inspiration for the programming competition has been

CADE’s Automated Theorem Prover System Competition [14],

organized by Geoff Sutcliffe since 1996. We have been entering

Isabelle since 2009 and have noticed the competition’s impact on

the theorem proving community. We were also moved by our late

colleague Piotr Rudnicki’s arguments in favor of contests [11]:

I am dismayed by the watering down of the curriculum

at CS departments which does not push the students to

their intellectual limits. This wastes a lot of talented people

who, under these conditions, have no chance to discover

how talented and capable they really are. The programming

contests attract a substantial fraction of the most talented

students that we have; I enjoy working with them and they

seem to enjoy doing it too.

The Heavenly Father, with his unique sense of humor,

has distributed the mental talents in a rather unpredictable

way. It is our role to discover these talents and make them

shine. If we do not do it, then we—the educators—will end

up in Hell. And I would rather not get there just for this

one reason.

For our own contest, each week we selected one of the pro-

gramming assignments as a competition problem. We also ﬁxed a

criterion for ranking the correct entries. By enclosing their solu-

tions within special tags, students became competitors. Each week,

rank i ∈ {1, ..., 20} brought in 21 − i points. The five students accumulating the most points were promised “tasteful” trophies.

Once the entries had been tested and ranked, we published the names of the top 20 students on the competitions' web pages (www21.in.tum.de/teaching/info2/WS1213/wettbewerb.html and www21.in.tum.de/teaching/info2/WS1314/wettbewerb.html) and updated the cumulative top 20. To avoid legal issues regarding

privacy, we inserted a notice in the assignment sheets, making it

clear that the competition is an opt-in. The list of winners was

followed by a discussion of the most remarkable solutions, written

by the MC, the CoMC, or the Masters of TAs, in ironic self-

important third-person style.

An unexpected side effect of the competition is that it provided

a channel to introduce more advanced concepts, such as higher-

order functions, before they were seen in class. The criteria were

designed to raise the students’ awareness of engineering trade-offs,

including performance and scalability, even though these topics

were beyond the scope of the course.

As is to be expected, participation went down as the session

progressed. We tended to lose those students who were not in the

cumulative top 20, which is why we extended it to a top 30 in the second iteration. The optional exercises attracted only the

hard core. We have the testimony of a student who, after gather-

ing enough points to secure the grade bonus, skipped the manda-

tory exercises to focus on the competition. Thus, our experience

corroborates Rudnicki’s: Contests motivate talented students, who

otherwise might not get the stimuli they need to perform.

The (Co)MCs revealed the last competition week’s results in an

award ceremony during the last lecture, handing out the trophies

and presenting characteristic code snippets from each winner.

Because of the various ranking criteria, and also because stu-

dents knew that their solutions could turn up on the web page, the

competition triggered much more diversity than the usual assignments did.

Ranking the solutions was fascinating. Each week, we had plenty

of material for the web page. The page was read by many students,

including some who were not even taking the course. A summary

of selected competition problems and solutions follows, originating

from the ﬁrst iteration of the course unless indicated otherwise.


Week 1: Sum of Two Maxima’s Squares (434 Entrants)

Task: Write a function that adds the squares of the two largest of its

arguments x, y, z. Criterion: Token count (lower is better).

The task and criterion were modeled after a similar Scheme

exercise by Jacques Haguel. By having to keep the token count low,

students are encouraged to focus on the general case.

The winner’s solution had 13 tokens (excluding the left-hand

side and counting `max` as one):

max x y ^ 2 + min x y `max` z ^ 2

Here the concision was attained at the expense of simplicity, to the

point that we felt the need to verify the solution with Isabelle. Lists

appeared in several of the top 20 solutions:

sum $ tail $ (^2) `map` sort [x, y, z]

sum [x * x | x <- tail (sort [x, y, z])]

A few competitors exploited the explicit application operator $ to save on parentheses (f $ g x == f (g x)). Using only syntaxes

and functions seen in class, a 25-token solution was possible:

x * x + y * y + z * z - a * a

where a = min x (min y z)

The median solution had a 3-way case distinction. There were

plenty of 6-way distinctions, and one entry even featured a cor-

rect 10-way distinction using < and ==, complete with 64 needless

parentheses, totaling 248 tokens. This provided the ideal context

for the MC to quote Donald Knuth [8, p. 56] on the competition’s

web site: “The ability to handle lots of cases is Computer Science’s

strength and weakness. We are good at dealing with such complex-

ity, but we sometimes don’t try for unity when there is unity.”

To count tokens, we initially used a fast-and-frugal Perl script.

However, many students asked us to make the program available,

so we replaced it by a bullet-proof Haskell solution based on Niklas

Broberg’s lexical analyzer (Language.Haskell.Exts.Lexer).

Week 5: Quasi-subsequences (206 Entrants)

Task: Write a function that tests whether a list [x_1, ..., x_m] is a quasi-subsequence of a list ys, meaning that it is a (noncontiguous) subsequence of ys or that there exists an index k such that [x_1, ..., x_{k−1}, x_{k+1}, ..., x_m] is a subsequence of ys. Criterion: Speed.
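A naive reference implementation may help fix the semantics (far slower than the ranked entries):

    -- xs is a (noncontiguous) subsequence of ys
    subseq :: Eq a => [a] -> [a] -> Bool
    subseq []     _  = True
    subseq _      [] = False
    subseq (x:xs) (y:ys)
      | x == y    = subseq xs ys
      | otherwise = subseq (x:xs) ys

    quasiSubseq :: Eq a => [a] -> [a] -> Bool
    quasiSubseq xs ys =
      subseq xs ys || any (\k -> subseq (dropAt k xs) ys) [0 .. length xs - 1]
      where dropAt k zs = take k zs ++ drop (k + 1) zs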

Thomas Genet shared this example with us. The problem state-

ment mentioned that the MC’s solution took 0.4 s for quasiSubseq

[1 .. N] ([N] ++ [2 .. N − 1] ++ [1]) with N = 50 000.

To rank the solutions, we ran some additional QuickCheck tests,

ﬁltering out 43 incorrect solutions from the 143 that compiled and

passed all the ofﬁcial tests. Then we tried various examples, includ-

ing the one above with different Ns, and eliminated solutions that

reached the generous timeout. Some examples had a huge m and

a short ys. This produced 20 winners, whom we listed on the web

site. The algorithms varied greatly and were difﬁcult to understand.

One of the TAs, Manuel Eberl, contributed an automaton-based so-

lution. The MC’s solution had a dynamic programming ﬂavor.

The story does not end here. Having noticed more bugs while

ranking, we speculated that some of the top 20 entries might be in-

correct. Prompted to action by Meta-Master Nipkow, we rechecked

all 20 solutions using Isabelle’s implementation of QuickCheck and

found ﬂaws in 6 of them. We did not penalize their authors but took

a second look at Haskell’s testing capabilities (cf. Section 3.2).

Week 6: Email Address Anonymizer (163 Entrants)

Task: Write a function that replaces all email addresses in a text

by an anonymized version (e.g., p .q @f .c ). Criterion:

Closeness to the ofﬁcial deﬁnition of email address.

The task idea came from Koen Claessen. The statement sug-

gested a simple deﬁnition of email addresses, which is what most

students implemented, but pointed to RFCs 5321 and 5322 for a

more precise deﬁnition. Our goal was to see how students react to

unclear requirements. Of course, the RFCs partly contradicted each

other, and it was not clear whether the domains had to be validated

against the rules speciﬁed in earlier RFCs. It was also left to the

student’s imagination how to locate email addresses in the text.

This task was, by far, the most despised by the students. It was

also the most difficult to rank fairly, because some students invested many hours in it but failed a simple test we had designed to rank the

solutions. We revised the ranks upward to comfort the more vocal

participants, turning the top 20 into a top 45.

Weeks 8–9: Boolean Solver (Optional, 14 Entrants)

Task: Write a Boolean satisﬁability (SAT) solver. Criterion: Speed.

To avoid repeating the week 6 debacle, we suggested ﬁve opti-

mizations to a DPLL algorithm that would be evaluated in turn:

1. Eliminate pure positive variables.

2. Select short clauses before long ones.

3. Select frequent literals before infrequent ones.

4. Use a dedicated algorithm for Horn formulas.

5. Use a dedicated algorithm for 2CNF.

Obvious variants of these optimizations would be invoked to break

ties. The Meta-Master promised a Ph.D. degree for polynomial

solutions (to no avail).
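For reference, the unoptimized core of such a solver fits in a dozen lines (a sketch with hypothetical types; it implements none of the optimizations above and ignores the prescribed submission interface):

    -- Clausal representation: a literal is a nonzero integer, a clause a
    -- list of literals, a formula a list of clauses.
    type Literal = Int
    type Clause  = [Literal]
    type CNF     = [Clause]

    -- Plain DPLL: satisfiable when no clauses remain, unsatisfiable on an
    -- empty clause, otherwise branch on a literal (preferring unit clauses).
    dpll :: CNF -> Bool
    dpll []      = True
    dpll clauses
      | any null clauses = False
      | otherwise        = dpll (assign l clauses) ||
                           dpll (assign (negate l) clauses)
      where
        l = case [u | [u] <- clauses] of
              (u:_) -> u                    -- unit propagation
              []    -> head (head clauses)  -- decision

    -- Assigning a literal: drop satisfied clauses, remove the falsified literal.
    assign :: Literal -> CNF -> CNF
    assign l = map (filter (/= negate l)) . filter (l `notElem`)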

For the evaluation, we needed to devise problems that can be

solved fast if and only if the heuristic is implemented. Showing the

absence of an optimization turned out to be much more difﬁcult

than we had anticipated, because the various optimizations interact

in complicated ways, and the exact mixture varied from solution

to solution. To make matters worse, often the optimizations were

implemented in a naive way that slowed down the solver (e.g.,

reprocessing the entire problem each time a literal is selected to

detect whether an optimization has become applicable).

Unlike for previous weeks, this problem gave no homework

points. In exchange, it was worth double (40 points), and the stu-

dents had two weeks to complete it. Also, we did not provide any

QuickCheck tests, leaving it to the students to think up their own

(“just like in real life”). There were 14 submissions to rank, as well as two noncompetitive entries by TA Eberl and one by the (Co)MCs.

We found bugs in 8 of the submissions, but gave these incorrect

solutions some consolation points to help populate the week’s “top

20.” The two best solutions implemented optimizations 1 to 4 and

pure negative literal elimination, but neither 2CNF nor dual-Horn.

Week 13: Programmatic Art (Optional, 10 Entrants)

Task: Write a program that generates a pixel or vector graphic in

some standard format. Criteria: Aesthetics and technique.

Figure 4. The winning “Mondrian” entry

There were no constraints concerning the subject of the picture or its generation. While the imprecision had annoyed the students in the email anonymizer assignment, here it was perceived as a most welcome artistic freedom. The creations' visual nature was a perfect match for the award ceremony, where the weekly and final results were presented.

The students were asked to turn in both the program and a generated picture. The Meta-Master and the Masters of TAs rated the aesthetics on a scale from 1 to 10. The remaining 10 points were issued by the (Co)MCs for “technique,” mostly as a safeguard against cheaters.

Two students “misunderstood” the exercise: one handed in

a generated ASCII-art text ﬁle, another used Network.Curl.

Download to retrieve Vincent van Gogh’s “Starry Night” from

Google. The latter secured the top score for aesthetics but was pun-

ished with a 2 for technique. The winner had been visibly inspired

by Piet Mondrian’s famous “Compositions” (Figure 4). His ran-

domized solution could generate arbitrarily many fake Mondrians.

We ran this challenge again in the second iteration. Interestingly

enough, a fair share of competitors produced animations, whereas

in the ﬁrst iteration, only static images were submitted. Both times,

fractals (like the Mandelbrot set) were popular among the students.

Weeks 11–12 (2nd iter.): Othello (Optional, 18 Entrants)

Task: Write a player based on artiﬁcial intelligence for Othello.

Criterion: Ranking in a tournament between all contestants.

According to the students, this was one of the most time-

consuming tasks, or—from a different viewpoint—one of the most

enjoyable tasks to invest time in. The task was inspired by Tjark

Weber. The contestants were allowed to turn in up to three different

programs. Most provided only one implementation, but varied some parameters (e.g., different values of depth for the game tree

search). This was also the first and only exercise where students had to adhere to a prescribed module interface, for which we specified some abstract types. There were 36 working programs in total. We

ran all possible pairs of them with a timeout of 20 minutes per

player per game. The evaluation of all 1260 games in the tourna-

ment took a few days on a cluster of 8 dual-core workstations. We

were surprised to see that there was little to no correlation between

the total running time of each player and its ranking—there were

both extremely fast and good solutions (the third place obtained

90% of the points of the second place but took only 2% of the

time), as well as slow and bad solutions.

5. Examination

The ﬁnal grades were based on a two-hour examination. Students

were allowed to bring one hand-written “cheat sheet.” They needed

40% to pass. The results were then translated to a 1.0 to 5.0 scale.

The 0.3 homework bonus was awarded only to those who had

passed the examination. No bonus was applied in the second iter-

ation. The correlation between homework points and examination

points is shown in Figure 5.

The problems were similar to the group exercises but avoided

more advanced or mundane topics (e.g., modules and data abstrac-

tion). The examination was designed so that the best students could

ﬁnish in one hour. Perhaps because we had no previous experience

in teaching Haskell, the marking revealed many surprises. Our im-

pressions are summarized below for seven problems:

Figure 5. Correlation between points in the exercises and in the examination (second iteration); the plot shows exercise points against examination points, with a linear regression line and the examination passing score marked

1. Infer the types of given expressions. Many students who under-

stood types and type inference in practice had problems apply-

ing their knowledge in a more abstract context. They often for-

got to instantiate type variables or to remove the argument type

when applying a function. For example, filter not :: [Bool]

-> [Bool] was often typed as [a] -> [a] or even (a -> Bool)

-> [a] -> [a]. Tellingly, one of the best students lost 2.5 of 5

points here, while answering all the other questions correctly.

2. Implement the same simple function using recursion, using

a list comprehension, and using higher-order functions (e.g.,

map, filter). The deﬁnitions based on a list comprehension

were usually correct. The corresponding map–filter version

proved more challenging. The recursive deﬁnitions were usu-

ally correct but sometimes lacked cases.

3. Implement a function that lifts a variable renaming function to a

logical formula datatype. The datatype featured both a directly

recursive constructor (for logical negation) and recursive con-

structors through lists (for n-ary conjunctions and disjunctions).

The recursion through the list, using map (rename f), confused

many (although it had been covered in several exercises). Some

solutions managed to change the shape of the formula, rewrit-

ing n-ary expressions into nested binary expressions. The pat-

tern matching syntax was also not universally understood, and

the constructors were often missing in the right-hand sides.

4. Prove map f (concat xss) = concat (map (map f) xss). The proof by induction posed few problems. Presumably the

students had the induction template on their cheat sheet. Quite

a few followed the template too slavishly, claiming to be doing

an induction on xs instead of xss. Another common mistake was

to take xss = [[]] as the base case.

5. Choose two test functions from a given set that together consti-

tute a complete test suite for a given function operating on lists.

There were seven tests to choose from: tests for the [], [x], and

x : xs cases, a distributivity law, a length law, and two properties

about the result list’s content. Obvious solutions were [] with x

: xs or [x] with distributivity, but there were many other com-

binations, most of which we discovered while marking. For ex-

ample, the length law implies the [] case, and the [x] and x : xs

cases together imply the [] case. Of all people, we should have

been alert to the dangers of axiomatic speciﬁcations. It would

have been easy to use Isabelle to prove or refute each of the

21 possible answers before sending the examination to press.

6. Evaluate given expressions step by step. The order of evaluation

was not understood by all. We were downright shocked by some

of the answers provided for (\x -> (\y -> (1 + 2) + x)) 4 5.

We were not prepared to see monstrosities such as (\4 -> (\5

-> (1 + 2) + 4)) as the end result.

7. Write an I/O program that reads the user’s input line by line

and prints the total number of vowels seen so far. The monadic

solutions were surprisingly good, perhaps due to the students’

familiarity with imperative programming. The main difﬁculty

was to keep track of the cumulative vowel count. Many so-

lutions simply printed the count of each line instead. Another

common mistake was to use the monadic syntax <- instead of

let to bind nonmonadic values.
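A sketch of one acceptable solution to this last problem (not the official sample solution; end-of-input handling is left unspecified, as in the problem statement):

    main :: IO ()
    main = loop 0
      where
        countVowels = length . filter (`elem` "aeiouAEIOU")
        loop total = do
          line <- getLine                      -- read one line of input
          let total' = total + countVowels line
          print total'                         -- cumulative count so far
          loop total'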

Some statistics: 552 (493 in the second iteration) students regis-

tered for the exams. 432 (379) students took the examination. 334

(262) passed it. 39 (11) secured the top grade (1.0), with at least 47

out of 50 (38.5 out of 40) points. Five (one) had a perfect score. The

higher results in the ﬁrst iteration can be explained by the bonus,

which undoubtedly led to more motivation for doing homework and

better preparation for the examination. In total, this amounted to a

difference in passing rate of 8 percentage points.

6. Conclusion

Teaching functional programming using Haskell has been an enjoy-

able experience overall. As is usually the case for functional pro-

gramming, the feedback from students was mixed. If we have failed

to convince some of them of the value of functional programming,

we have also received many testimonies of students who have “seen

the light,” and some of the serious competitors told us the course

had been the most fun so far.

For future years, we plan to either leave out some of the more

advanced material or enhance its presentation. Type inference is

one topic we have downplayed so far; it should be possible to present it

more rigorously without needing inference trees. On the infrastruc-

ture side, we developed tool support for simple proofs by induction,

in the form of a lightweight proof assistant, and we will integrate it into

our homework submission system in the upcoming third iteration.

Acknowledgments

We thank all the people involved in giving the course, including the

TAs and our colleagues from the Chair for Logic and Veriﬁcation.

We also thank Koen Claessen and Philip Wadler for sharing their

slides with us. Finally, Mark Summerﬁeld and the anonymous

reviewers suggested many textual improvements to this paper.

References

[1] R. Bird. Introduction to Functional Programming using Haskell.

Prentice Hall, 1998. Second edition.

[2] L. Bulwahn. The new Quickcheck for Isabelle—Random, exhaustive

and symbolic testing under one roof. In C. Hawblitzel and D. Miller,

editors, CPP 2012, volume 7679 of LNCS, pages 92–108. Springer,

2012.

[3] K. Claessen and J. Hughes. QuickCheck: A lightweight tool for

random testing of Haskell programs. In ICFP ’00, pages 268–279.

ACM, 2000.

[4] N. A. Danielsson, J. Hughes, P. Jansson, and J. Gibbons. Fast and

loose reasoning is morally correct. In J. G. Morrisett and S. L. P.

Jones, editors, POPL 2006, pages 206–217. ACM, 2006.

[5] J. Ellson, E. R. Gansner, E. Koutsoﬁos, S. C. North, and

G. Woodhull. Graphviz—Open source graph drawing tools. In

Graph Drawing, pages 483–484, 2001.

[6] P. Hudak. The Haskell School of Expression. Cambridge University

Press, 2000.

[7] G. Hutton. Programming in Haskell. Cambridge University Press,

2007.

[8] D. E. Knuth, T. L. Larrabee, and P. M. Roberts. Mathematical

Writing. Mathematical Association of America, 1989.

[9] F. Lindblad. Property directed generation of ﬁrst-order test data. In

M. Morazán, editor, TFP 2007, pages 105–123. Intellect, 2008.

[10] T. Nipkow, L. C. Paulson, and M. Wenzel. Isabelle/HOL: A Proof

Assistant for Higher-Order Logic, volume 2283 of LNCS. Springer,

2002.

[11] P. Rudnicki. Why I do it? http://webdocs.cs.ualberta.ca/~piotr/ProgContest/why.txt. Accessed 25 Feb. 2013.

[12] C. Runciman, M. Naylor, and F. Lindblad. SmallCheck and Lazy

SmallCheck: Automatic exhaustive testing for small values. In

A. Gill, editor, Haskell 2008, pages 37–48. ACM, 2008.

[13] S. Schleimer, D. S. Wilkerson, and A. Aiken. Winnowing: Local

algorithms for document ﬁngerprinting. In A. Y. Halevy, Z. G. Ives,

and A. Doan, editors, SIGMOD Conference, pages 76–85. ACM,

2003.

[14] G. Sutcliffe and C. B. Suttner. The state of CASC. AI Commun., 19

(1):35–48, 2006.

[15] S. Thompson. Haskell: The Craft of Functional Programming. Addison-Wesley, 2011. Third edition.