falsify: Internal Shrinking Reimagined for Haskell

Edsko de Vries, Haskell Symposium 2023
Published on July 15, 2023

Abstract: In unit testing we apply the function under test to known inputs and check for known outputs. By contrast, in property based testing we state properties relating inputs and outputs, apply the function to random inputs, and verify that the property holds; if not, we found a bug. Randomly generated inputs tend to be large and should therefore be minimised. Traditionally this is done with an explicitly provided shrinker, but in this paper we propose a way to write generators that obsoletes the need to write a separate shrinker. Inspired by the Python library Hypothesis, the approach can work even across monadic bind. Compared to Hypothesis, our approach is more suitable to the Haskell setting: it depends on a minimal set of core principles, and handles generation and shrinking of infinite data structures, including functions.

Searching Entangled Program Spaces

James Koppel, Zheng Guo, Edsko de Vries, Armando Solar-Lezama, Nadia Polikarpova, 27th ACM SIGPLAN International Conference on Functional Programming (ICFP 2022)
Published on September 13, 2022

Abstract: Many problem domains, including program synthesis and rewrite-based optimization, require searching astronomically large spaces of programs. Existing approaches often rely on building specialized data structures – version-space algebras, finite tree automata, or e-graphs – to compactly represent these programs. To find a compact representation, existing data structures exploit independence of subterms; they blow up when the choices of subterms are entangled. We introduce equality-constrained tree automata (ECTAs), a generalization of the three aforementioned data structures that can efficiently represent large spaces of programs with entangled subterms. We present efficient algorithms for extracting programs from ECTAs, implemented in a performant Haskell library, ecta. Using ecta we construct Hectare, a type-driven program synthesizer for Haskell. Hectare significantly outperforms a state-of-the-art synthesizer Hoogle+ – providing an average speedup of 8x – despite its implementation being an order of magnitude smaller.

True Sums of Products

Edsko de Vries and Andres Löh, 10th ACM SIGPLAN Workshop on Generic Programming (WGP 2014)
Published on August 31, 2014

Abstract: We introduce the sum-of-products (SOP) view for datatype-generic programming (in Haskell). While many of the libraries that are commonly in use today represent datatypes as arbitrary combinations of binary sums and products, SOP reflects the structure of datatypes more faithfully: each datatype is a single n-ary sum, where each component of the sum is a single n-ary product. This representation turns out to be expressible accurately in GHC with today’s extensions. The resulting list-like structure of datatypes allows for the definition of powerful high-level traversal combinators, which in turn encourage the definition of generic functions in a compositional and concise style. A major plus of the SOP view is that it allows to separate function-specific metadata from the main structural representation and recombining this information later.

Compositional Reasoning for Explicit Resource Management in Channel-Based Concurrency

Adrian Francalanza, Edsko de Vries and Matthew Hennessy, Logic in Computer Science (LMCS) 10 (2:15) 2014
Published on June 24, 2014

Abstract: We define a pi-calculus variant with a costed semantics where channels are treated as resources that must explicitly be allocated before they are used and can be deallocated when no longer required. We use a substructural type system tracking permission transfer to construct coinductive proof techniques for comparing behaviour and resource usage efficiency of concurrent processes. We establish full abstraction results between our coinductive definitions and a contextual behavioural preorder describing a notion of process efficiency w.r.t. its management of resources. We also justify these definitions and respective proof techniques through numerous examples and a case study comparing two concurrent implementations of an extensible buffer.

Modelling Unique and Affine Typing using Polymorphism

Edsko de Vries, "The Beauty of Functional Code", festschrift for Rinus Plasmeijer on the occassion of his 61th birthday, presented at IFL 2013
Published on August 28, 2013

Abstract: Uniqueness typing and affine (or linear) typing are dual type systems. Uniqueness gives a guarantee that an term has not been shared, while affinity imposes a restriction that a term may not be shared. We show that we can unify both concepts through polymorphism.

Locally Nameless Permutation Types

Edsko de Vries and Vasileios Koutavas, rejected from ITP 2012
Published on August 12, 2012

Abstract: We define “Locally Nameless Permutation Types”, which fuse permutation types as used in Nominal Isabelle with the locally nameless representation. We show that this combination is particularly useful when formalizing programming languages where bound names may become free during execution (“extrusion”), common in process calculi. It inherits the generic definition of permutations and support, and associated lemmas, from the Nominal approach, and the ability to stay close to pencil-and-paper proofs from the locally nameless approach. We explain how to use cofinite quantification in this setting, show why reasoning about renaming is more important here than in languages without extrusion, and provide results about infinite support, necessary when reasoning about countable choice.

A Practical Solution for Achieving Language Compatibility in Scripting Language Compilers

Paul Biggar, Edsko de Vries, David Gregg, Science of Computer Programming Volume 77, Issue 9, 1 August 2012, Pages 971–989
Published on August 1, 2012

Abstract: Although scripting languages have become very popular, even mature scripting language implementations remain interpreted. Several compilers and reimplementations have been attempted, generally focusing on performance.

Based on our survey of these reimplementations, we determine that there are three important features of scripting languages that are difficult to compile or reimplement. Since scripting languages are defined primarily through the semantics of their original implementations, they often change semantics between releases. They provide C APIs, used both for foreign-function interfaces and to write third-party extensions. These APIs typically have tight integration with the original implementation, and are used to provide large standard libraries, which are difficult to re-use, and costly to reimplement. Finally, they support run-time code generation. These features make it difficult to design a fully compatible compiler.

We present a technique to support these features in an ahead-of-time compiler for PHP. Our technique uses the original PHP implementation through the provided C API, both in our compiler and in our generated code. We support all of these important scripting language features. Additionally, our approach allows us to automatically support limited future language changes. We present a discussion and performance evaluation of this technique.

Uniqueness typing for resource management in message-passing concurrency

Edsko de Vries, Adrian Francalanza and Matthew Hennessy, J Logic Computation (2014) 24 (3): 531-556.
Published on June 6, 2012

Abstract: We view channels as the main form of resources in a message-passing programming paradigm. These channels need to be carefully managed in settings where resources are scarce. To study this problem, we extend the pi-calculus with primitives for channel allocation and deallocation and allow channels to be reused to communicate values of different types. Inevitably, the added expressiveness increases the possibilities for runtime errors. We define a substructural type system, which combines uniqueness typing and affine typing to reject these ill-behaved programs.

Reverse Hoare Logic

Edsko de Vries and Vasileios Koutavas, G. Barthe, A. Pardo and G. Schneider (Ed.): SEFM 2011, LNCS 7041, pp. 155-171, 2011
Published on November 14, 2011

Abstract: We present a novel Hoare-style logic, called Reverse Hoare Logic, which can be used to reason about state reachability of imperative programs. This enables us to give natural specifications to randomized (deterministic or nondeterministic) algorithms. We give a proof system for the logic and use this to give simple formal proofs for a number of illustrative examples. We define a weakest postcondition calculus and use this to show that the proof system is sound and complete.

Reasoning about Explicit Resource Management (Abstract)

Edsko de Vries, Adrian Francalanza and Matthew Hennessy, PLACES 2011
Published on April 2, 2011

Abstract: We define a pi-calculus variant with a costed semantics where channels are treated as resources that must explicitly be allocated before they are used and can be deallocated when no longer required. We use a substructural type system tracking permission transfer to construct compositional proof techniques for comparing behaviour and resource usage efficiency of concurrent processes.

Liveness of Communicating Transactions (Extended Abstract)

Edsko de Vries, Vasileios Koutavas and Matthew Hennessy, K. Ueda (Ed.): APLAS 2010, LNCS 6461, pp. 392—407, 2010
Published on November 28, 2010

Abstract: We study liveness and safety in the context of CCS extended with communicating transactions, a construct we recently proposed to model automatic error recovery in distributed systems. We show that fair-testing and may-testing capture the right notions of liveness and safety in this setting, and argue that must-testing imposes too strong a requirement in the presence of transactions. We develop a sound and complete theory of fair-testing in terms of CCS-like tree failures and show that, compared to CCS, communicating transactions provide increased distinguishing power to the observer. We also show that weak bisimilarity is a sound, though incomplete, proof technique for both may- and fair-testing. To the best of our knowledge this is the first semantic treatment of liveness in the presence of transactions. We exhibit the usefulness of our theory by proving illuminating liveness laws and simple but non-trivial examples.

Formal Polytypic Programs and Proofs

Wendy Verbruggen, Edsko de Vries and Arthur Hughes, Journal of Functional Programming, Volume 20, Special Issue 3-4, pp 213-269
Published on September 23, 2010

Abstract: The aim of our work is to be able to do fully formal (machine verified) proofs over Generic Haskell-style polytypic programs. In order to achieve this goal, we embed polytypic programming in the proof assistant Coq and provide an infrastructure for polytypic proofs. Polytypic functions are reified within Coq as a datatype PolyFn and can be specialized by applying an ordinary (dependently typed) function specTerm. Polytypic functions are thus first class citizens and can be passed as arguments or returned as results. Likewise, we reify polytypic proofs as a datatype PolyProof, and provide a lemma specProof (analogous to specTerm) that a polytypic proof can be specialized to any datatype in the universe. The correspondence between polytypic functions and their polytypic proofs is very clear: programmers need to give proofs for, and only for, the same cases that they need to give instances for when they define the polytypic function itself. Finally, we discuss how we can write (co)recursive functions and do (co)recursive proofs in a similar way that recursion is handled in Generic Haskell.

Communicating Transactions (Extended Abstract)

Edsko de Vries, Vasileios Koutavas and Matthew Hennessy, Paul Gastin and François Laroussinie (Eds.): CONCUR 2010, LNCS 6269, pp. 569-583, 2010
Published on August 31, 2010

Abstract: We propose a novel language construct called communicating transactions, obtained by dropping the isolation requirement from classical transactions, which can be used to model automatic error recovery in distributed systems. We extend CCS with this construct and give a simple semantics for the extended calculus, called TransCCS. We develop a behavioural theory which is sound and complete with respect to the may-testing preorder, and use it to prove interesting laws and reason compositionally about example systems. Finally, we prove that communicating transactions do not increase the observational power of processes; thus CCS equivalences are preserved in the extended language.

Uniqueness Typing for Resource Management in Message-Passing Concurrency

Edsko de Vries, Adrian Francalanza and Matthew Hennessy, M. Florido and I. Mackie (Eds.): First International Workshop on Linearity (LINEARITY 2009). Electronic Proceedings in Theoretical Computer Science, 22, 2010, pp. 26-37
Published on September 12, 2009

Abstract: We view channels as the main form of resources in a message-passing programming paradigm. These channels need to be carefully managed in settings where resources are scarce. To study this problem, we extend the pi calculus with primitives for channel allocation and deallocation and allow channels to be reused to communicate values of different types. Inevitably, the added expressivity increases the possibilities for runtime errors. We define a substructural type system which combines uniqueness typing and affine typing to reject these ill-behaved programs.

Polytypic Properties and Proofs in Coq

Wendy Verbruggen, Edsko de Vries and Arthur Hughes, WGP '09: Proceedings of the 2009 ACM SIGPLAN workshop on Generic programming, Edinburgh, UK. August 2009
Published on August 30, 2009

Abstract: We formalize proofs over Generic Haskell-style polytypic programs in the proof assistant Coq. This makes it possible to do fully formal (machine verified) proofs over polytypic programs with little effort. Moreover, the formalization can be seen as a machine verified proof that polytypic proof specialization is correct with respect to polytypic property specialization.

A Practical Solution for Scripting Language Compilers

Paul Biggar, Edsko de Vries and David Gregg, SAC '09: Proceedings of the 2009 ACM Symposium on Applied Computing
Published on March 9, 2009

Abstract: Although scripting languages are becoming increasingly popular, even mature scripting language implementations remain interpreted. Several compilers and reimplementations have been attempted, generally focusing on performance.

Based on our survey of these reimplementations, we determine that there are three important features of scripting languages that are difficult to compile or reimplement. Since scripting languages are defined primarily through the semantics of their original implementations, they often change semantics between releases. They provide large standard libraries, which are difficult to re-use, and costly to reimplement. They provide C APIs, used both for foreignfunction- interfaces and to write third-party extensions. These APIs typically have tight integration with the original implementation. Finally, they support run-time code generation. These features make the important goal of correctness difficult to achieve.

We present a technique to support these features in an ahead-oftime compiler for PHP. Our technique uses the original PHP implementation through the provided C API, both in our compiler, and an our generated code. We support all of these important scripting language features, particularly focusing on the correctness of compiled programs. Additionally, our approach allows us to automatically support limited future language changes. We present a discussion and performance evaluation of this technique, which has not previously been published.

Making Uniqueness Typing Less Unique

Edsko de Vries, PhD thesis
Published on December 14, 2008

Abstract: The PhD thesis contains expanded versions of the three papers mentioned below (Uniqueness Typing Redefined, Equality Based Uniqueness Typing and Uniqueness Typing Simplified) and an in-depth discussion of avenues for future work, as well as a detailed introduction to uniqueness typing and substructural logics in general (including a comparison to Haskell's monadic approach) and a survey of related work in the area.

Polytypic Programming in Coq

Wendy Verbruggen, Edsko de Vries, and Arthur Hughes, Ralf Hinze. (Ed.): ACM SIGPLAN Workshop on Generic Programming 2008, pp. 49-60
Published on September 20, 2008

Abstract: The aim of our work is to provide an infrastructure for formal proofs over Generic Haskell-style polytypic programs. For this goal to succeed, we must have a definition of polytypic programming which is both fully formal and as close as possible to the definition in Generic Haskell. In this paper we show a formalization in the proof assistant Coq of type and term specialization. Our definition of term specialization can be interpreted as a formal proof that the result of term specialization has the type computed by type specialization.

Uniqueness Typing Simplified—Technical Appendix

Edsko de Vries, Technical report TCD-CS-2008-19
Published on August 13, 2008

Abstract: This technical report is an appendix to Uniqueness Typing Simplified, in which we show how uniqueness typing can be simplified by treating uniqueness attributes as types of a special kind, allowing arbitrary boolean expressions as attributes, and avoiding subtyping. In the paper, we define a small core uniqueness type system (a derivative of the simply typed lambda calculus) that incorporates these ideas. We also outline how soundness with respect to the call-by-need semantics can be proven, but we do not give any details. This report describes the entire proof, which is written using the proof assistant Coq.

Processing ASTs in C++: maketea

Edsko de Vries, John Gilbert and David Abrahamson, rejected from CC 2008
Published on March 29, 2008

Abstract: We present maketea, a tool which generates a C++ infrastructure for processing ASTs based on an object oriented context free grammar. The generated code includes a class hierarchy for storing ASTs with support for cloning, equality checking, pattern matching, a general visitor API and a transformation API. The tool is available under an open source license and can be downloaded from www.maketea.org.

Uniqueness Typing Simplified

Edsko de Vries, Rinus Plasmeijer, and David Abrahamson, Olaf Chitil, Zoltán Horváth and Viktória Zsók (Eds.): IFL 2007, LNCS 5083, pp. 201-218, 2008
Published on September 27, 2007

Abstract: We present a uniqueness type system that is simpler than both Clean's uniqueness system and the system we proposed previously. The new type system is straightforward to implement and add to existing compilers, and can easily be extended with advanced features such as higher rank types and impredicativity. We describe our implementation in Morrow, an experimental functional language with both these features. Finally, we prove soundness of the core type system with respect to the call-by-need lambda calculus.

Design and Implementation of a PHP Compiler Front-end

Edsko de Vries, John Gilbert, Technical Report TCD-CS-2007-47, Trinity College Dublin
Published on September 21, 2007

Abstract: This technical report describes the design and implementation of a front-end for phc, the open source PHP compiler. This front-end provides an excellent basis for developing tools which process PHP source code, and consists of a well-defined Abstract Syntax Tree (AST) specification for the PHP language, a lexical analyser and parser which construct the AST for a given script, and a visitor and transformation API for manipulating these ASTs.

Equality Based Uniqueness Typing

Edsko de Vries, Rinus Plasmeijer, and David Abrahamson, presented at TFP 2007
Published on April 2, 2007

Abstract: We define a uniqueness type system for the core lambda calculus which, unlike Clean's uniqueness system and the system we proposed in a previous paper, does not involve inequalities. We claim that this makes the type system sufficiently similar to the Hindley/Milner type system that standard type inference algorithms can be applied, and that it can easily be modified to incorporate modern extensions such as arbitrary rank types and generalized algebraic data types. We substantiate this claim by sketching out how such a system would be defined.

Uniqueness Typing Redefined

Edsko de Vries, Rinus Plasmeijer, and David Abrahamson, Zoltán Horváth, Viktória Zsók, and Andrew Butterfield (Eds.): IFL 2006, LNCS 4449, pp. 181-198, 2007
Published on September 4, 2006

Abstract: We modify Clean's uniqueness type system in two ways. First, where in Clean functions that are partially applied to a unique argument are necessarily unique (they cannot lose their uniqueness), we just require that they must be unique when applied. This ultimately makes subtyping redundant. Second, we extend the type system to allow for higher rank types. To be able to do this, we explicitly associate type constraints (attribute inequalities) with type schemes. Consequently, types in our system are much more precise about constraint propagation.