Lockstep-style testing with quickcheck-dynamic
Recently IOG and QuviQ released a new library for testing stateful systems, called quickcheck-dynamic. In this blog post we will take a look at this library and how it relates to quickcheck-state-machine. We will focus on the state machine testing aspect; quickcheck-dynamic also has support for dynamic logic, but we will not discuss that here.
Specifically, we will consider how we might do lockstep-style testing with quickcheck-dynamic. This is a particular approach to testing that we described in great detail in an earlier blog post, An in-depth look at quickcheck-state-machine. We will recap the general philosophy in this new blog post, but we will focus here on the hows, not necessarily the whys; it might be helpful to be familiar with the previous blog post to understand the larger context of what we’re trying to achieve.
We have developed a library called quickcheck-lockstep, which builds on top of quickcheck-dynamic to provide an abstraction called InLockstep with support for lockstep-style testing. In this blog post we will describe this library in two parts: in the first half we show a test author’s perspective of how to use the abstraction, and in the second half we show how we can implement the abstraction on top of quickcheck-dynamic.
Verifying initial conditions in Plutus
On a UTxO-style blockchain such as Cardano, transaction outputs are normally locked by the hash of a public key; in order to spend such an output, the spending transaction must be signed with the corresponding private key, as evidence that the party creating the transaction has the right to spend the output.
The extended UTxO model introduces a second kind of output: an output to a smart contract, a piece of code f. Along with the contract itself, the output also contains an additional argument d, known as the datum. When a transaction spends such an output, f(d) is executed; if f indicates that all is fine, the transaction is approved; otherwise, it is rejected.
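The mechanics can be sketched in a few lines of Haskell. The names here (ScriptOutput, canSpend) are illustrative inventions, not the actual Plutus API, and a real validator also receives a redeemer and the script context:

```haskell
-- A minimal model of EUTxO script outputs, with hypothetical names.
newtype Datum = Datum Integer

-- The contract f: given its stored datum d, approve or reject spending.
type Validator = Datum -> Bool

-- An output locked by a script carries both the code and the datum.
data ScriptOutput = ScriptOutput
  { validator :: Validator
  , datum     :: Datum
  }

-- Spending the output runs f(d); the transaction is approved iff the
-- validator returns True.
canSpend :: ScriptOutput -> Bool
canSpend (ScriptOutput f d) = f d

-- Example: an output that can only be spent when the datum is positive.
example :: ScriptOutput
example = ScriptOutput (\(Datum n) -> n > 0) (Datum 42)
```

Note that nothing runs when the output is *created*; the validator only fires on spending, which is exactly what makes initial conditions hard to verify.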
An immediate consequence of this model is that outputs are only verified when they are spent, not when they are created. This can lead to some tricky-to-verify initial conditions. We will explore this problem in this blog post, and suggest some ways in which we can solve it. Along the way, we will recap some often misunderstood Plutus concepts.
The Plutus Compilation Pipeline: Understanding Plutus Core(s)
Plutus is a strict, pure functional language, developed by IOHK for use on the Cardano blockchain. In this blog post, however, we will not be concerned with specific applications of the language, but will instead look at its compilation pipeline.
Technically speaking, Plutus is not one language, but three, and most people who write “Plutus” are not really writing Plutus at all, but Haskell. These Haskell programs are translated to the Plutus Intermediate Representation (PIR). After that, data types are replaced by their Scott encodings, and recursion is replaced by explicit fixpoints to get to typed Plutus Core (PLC). Finally, types are erased to get to the Untyped Plutus Core (UPLC).
In this blog post, we will explain this entire process, with a particular focus on
- The consequences of polymorphism in the typed language on the untyped language
- Scott encoding
- Type and term level fixpoints
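To give a flavour of the Scott encoding step, here is what it looks like for Maybe in Haskell itself. This is a sketch only; the names are ours, and PIR additionally has to introduce explicit fixpoints for recursive types:

```haskell
{-# LANGUAGE RankNTypes #-}

-- Scott encoding: a value of a data type is represented by its own
-- case-analysis function. For Maybe, a value is a function that takes
-- one continuation per constructor.
newtype MaybeS a = MaybeS (forall r. r -> (a -> r) -> r)

nothingS :: MaybeS a
nothingS = MaybeS (\n _j -> n)

justS :: a -> MaybeS a
justS x = MaybeS (\_n j -> j x)

-- "Pattern matching" is just function application.
fromMaybeS :: a -> MaybeS a -> a
fromMaybeS def (MaybeS m) = m def id
```

This is how typed Plutus Core can get away without built-in data types: constructors become lambdas, and case expressions become applications.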
(Blog posts/Videos) Avoiding quadratic core code size
I’ve thought, written and spoken quite a lot about this topic:
Blog post: Avoiding quadratic core code size with large records
A module that contains nothing but the definition of a single record with 100 fields and some type class instances, using ghc’s standard representation for records, along with the code generated by the RecordDotPreprocessor plugin and the Generic instance generated by generics-sop, results in a core representation of a whopping 450,000 terms/types/coercions, takes 3 seconds to compile, and requires 500M of RAM. With the large-records library, we get a module with essentially the same functionality, but with a core size of a mere 14,000 terms/types/coercions, which compiles within 1 second and requires roughly 100M of RAM. In this blog post we describe why this simple module generates so much code, and how the large-records library manages to reduce this by more than an order of magnitude.
Blog post: Induction without core-size blow-up (a.k.a. Large records: anonymous edition)
An important factor affecting compilation speed and memory requirements is the size of the core code generated by ghc from Haskell source modules. Thus, if compilation time is an issue, this is something we should be conscious of and optimize for. In part 1 of this blog post we took an in-depth look at why certain Haskell constructs lead to quadratic blow-up in the generated ghc core code, and how the large-records library avoids these problems. Indeed, the large-records library provides support for records, including support for generic programming, with a guarantee that the generated core is never more than O(n) in the number of record fields.
The approach described there, however, does not directly apply to the case of anonymous records. This is the topic we will tackle in this part 2. Unfortunately, it seems that for anonymous records the best we can hope for is O(n log n), and even to achieve that we need to explore some dark corners of ghc. We have not attempted to write a new anonymous records library, but instead consider the problems in isolation; on the other hand, the solutions we propose should be applicable in other contexts as well. Apart from the section “Putting it all together”, the rest of this blog post can be understood without having read part 1.
Blog post: Type-level sharing in Haskell, now
The lack of type-level sharing in Haskell is a long-standing problem. It means, for example, that this simple Haskell code
apBaseline :: Applicative f => (A -> B -> C -> r) -> f r
apBaseline f =
        pure f
    <*> pure A
    <*> pure B
    <*> pure C
results in core code that is quadratic in size, due to type arguments; in pseudo-Haskell, it will look something like
apBaseline :: Applicative f => (A -> B -> C -> r) -> f r
apBaseline f =
        pure f
    <*> @A @(B -> C -> r) (pure A)
    <*> @B @(     C -> r) (pure B)
    <*> @C @(          r) (pure C)
Each application of (<*>) records both the type of the argument that we are applying, as well as the types of all remaining arguments. Since the latter is linear in the number of arguments, and we have a linear number of applications of (<*>), the core size of this function becomes quadratic in the number of arguments.
We recently discovered a way to solve this problem in ghc as it is today (tested with 8.8, 8.10, 9.0 and 9.2). In this blog post we will describe the approach, as well as introduce a new typechecker plugin which makes the process more convenient.
Blog post: New large-records release: now with 100% fewer quotes
The large-records library provides support for large records in Haskell with much better compilation time performance than vanilla ghc does. Well-Typed and MonadFix are happy to announce a new release of this library, which avoids all Template Haskell or quasi-quote brackets. Example:
{-# ANN type User largeRecordLazy #-}
data User = MkUser {
      name   :: String
    , active :: Bool
    }
  deriving stock (Show, Eq)
instance ToJSON User where
  toJSON = gtoJSON
john :: User
john = MkUser { name = "john", active = True }
setInactive :: User -> User
setInactive u = u{active = False}
This makes for a nicer user experience and provides better integration with tooling (for example, better syntax highlighting, auto-formatting, and auto-completion). Importantly, avoiding Template Haskell also means we avoid the unnecessary recompilations that this incurs, a significant benefit for a library aimed at improving compilation time.
Blog post: large-anon: Practical scalable anonymous records for Haskell
The large-anon library provides support for anonymous records; that is, records that do not have to be declared up-front. For example, used as a plugin along with the record-dot-preprocessor plugin, it makes it possible to write code such as this:
magenta :: Record [ "red" := Double, "green" := Double, "blue" := Double ]
magenta = ANON { red = 1, green = 0, blue = 1 }
reduceRed :: RowHasField "red" r Double => Record r -> Record r
reduceRed c = c{red = c.red * 0.9}
The type signatures are not necessary; type inference works as expected for these records. If you prefer to use lenses, that is also possible:
reduceBlue :: RowHasField "blue" r Double => Record r -> Record r
reduceBlue = over #blue (* 0.9)
The library offers a small but very expressive API, and it scales to large records (with 100 fields and beyond), with excellent compilation time performance and good runtime performance. In this blog post we will first present the library from a user’s perspective, then give an overview of the internals with an aim to better understand the library’s runtime characteristics, and finally show some benchmarks. The library is available from Hackage and is currently compatible with ghc 8.8, 8.10 and 9.0 (extending this to 9.2 should not be too hard).
Presentations (videos)
Avoiding quadratic GHC core code size: Introducing the large-records library, Haskell Implementors’ Workshop 2021
Avoid quadratic blow-up during compilation, Haskell Exchange 2021
The fundamental problem solved by blockchain
There are many blog posts and descriptions of blockchain technology on the web, but I feel that very few of them get to the heart of the matter: what is the fundamental problem that blockchains solve? In this blog post I want to take a look at this question; this blog post is intended for a general audience and I will assume no technical knowledge. That does not mean however that we will need to dumb things down; we will see what the fundamental problem solved by blockchain is, and we will even see two alternative ways in which this problem is solved, known as proof-of-work (used by for example Bitcoin) and proof-of-stake (used by for example the Cardano blockchain).
Reconstructing the Ableton velocity compand curve in Max for Live
The Velocity effect in Ableton Live allows us to apply some transformations to the velocity of incoming MIDI notes.
In this brief blog post we will look at how we might recreate this velocity curve; specifically, the Comp part of the curve. The goal won’t necessarily be to recreate the curve exactly, but rather to have something that functions similarly.
Designing a custom Push2 instrument in Max for Live/JavaScript: Part 2
This is part 2 of the two-part series on designing custom Push instruments in Max for Live (using JavaScript). In Part 1 we explored the LiveAPI object, and used it to construct an abstraction that allows us to monitor when the track that our instrument lives on is selected or deselected. In this part 2 we will get to the fun stuff, and actually develop our Push instrument.
Designing a custom Push2 instrument in Max for Live/JavaScript: Part 1
As any Ableton Push user knows, the layout of the Push is different when you are using a standard instrument and when you are using a drum rack. In this two-part tutorial we are going to explore how we can design our own instruments that use a completely custom Push layout, but yet integrate well with the normal Ableton work flow.
I have a short video on YouTube demonstrating what the instrument does and how it is used.
Programming in Max (a Haskeller's perspective)
Introduction
The programming language Max has been around since the mid 80s, as long as Haskell has. As a programming language it is quite interesting. Where Haskell takes purity to its logical extreme, Max does the opposite and takes stateful programming to its logical extreme. Where Haskell is entirely pull based (laziness drives everything, nothing is evaluated unless and until the result demands it), Max is entirely push based: computation is exclusively driven by incoming data.
(Video) Being lazy without being bloated
This presentation was part of MuniHac 2020.
Laziness is one of Haskell’s most distinctive features. It is one of the two features of functional programming that “Why Functional Programming Matters” identifies as key to modularity, but it is also one of the most frequently cited features of Haskell that programmers would perhaps like to change. One reason for this ambivalence is that laziness can give rise to space leaks, which can sometimes be fiendishly difficult to debug. In this talk we will present a new library called nothunks which can be used to test for the absence of unexpected thunks in long-lived data; when an unexpected thunk is found, a “stack trace” is returned identifying precisely where the thunk is (“the second coordinate of a pair in a map in a list in type T”). In combination with QuickCheck, this can be used to test that an API does not create any thunks when it shouldn’t, and any thunks that are created can easily be identified and fixed. Whilst it doesn’t of course fix all space leaks, it can help avoid a significant proportion of space leaks due to excessive laziness.
Watch the presentation on YouTube
(Video) There is no fork: supporting an ever evolving blockchain
This presentation was part of the Cardano Virtual Summit 2020. It presents an overview of the consensus layer, emphasizes again its abstract nature (see also my previous blog post on that topic), and then focusses on the hard fork combinator: a way to combine different ledgers into a single blockchain without ever executing a traditional change-the-world hard fork.
Watch the presentation on YouTube
Watch the Q&A session on YouTube
The abstract nature of the Cardano consensus layer
The Cardano consensus layer has two important responsibilities:
It runs the blockchain consensus protocol. In the context of a blockchain, consensus—that is, ‘majority of opinion’—means everyone involved in running the blockchain agrees on what the one true chain is. This means that the consensus layer is in charge of adopting blocks, choosing between competing chains if there are any, and deciding when to produce blocks of its own. It is responsible for maintaining all the state required to make these decisions. To decide whether or not to adopt a block, the protocol needs to validate that block with respect to the state of the ledger. If it decides to switch to a different chain (a different tine of a fork in the chain), it must keep enough history to be able to reconstruct the ledger state on that chain. To be able to produce blocks, it must maintain a mempool of transactions to be inserted into those blocks.
The consensus layer mediates between the network layer below it, which deals with concerns such as communication protocols and peer selection, and the ledger layer above it, which specifies what the state of the ledger looks like and how it must be updated with each new block. The ledger layer is entirely stateless, and consists exclusively of pure functions. In turn, the consensus layer does not need to know the exact nature of the ledger state, or indeed the contents of the blocks (apart from some header fields required to run the consensus protocol).
Integrated versus Manual Shrinking
Property-based testing is an approach to software testing where instead of writing tests we generate tests, based on properties that the software should have. To make this work, we need to be able to generate test data and, when we find a counter example, we need to shrink that test data to attempt to construct a minimal test case.
In Haskell, QuickCheck has long been the library of choice for property based testing, but recently another library called Hedgehog has been gaining popularity. One of the key differences between these two libraries is that in QuickCheck one writes explicit generation and shrinking functions, whereas in Hedgehog shrinking is integrated in generation. In this blog post we will explain what that means by developing a mini-QuickCheck and mini-Hedgehog and comparing the two. We will consider some examples where integrated shrinking gives us benefits over manual shrinking, but we will also see that the belief that integrated shrinking basically means that we can forget about shrinking altogether is not justified. There is no such thing as a free shrinker.
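The core idea behind integrated shrinking can be reduced to a few lines: a generator produces not just a value but a whole tree of values, with the shrinks as subtrees. This illustrative sketch (no randomness, names our own) is not the mini-Hedgehog from the post itself:

```haskell
import Data.List (nub)

-- A "generated value" is a tree: the root is the value itself, and the
-- subtrees are its shrinks (each with their own shrinks, recursively).
data Tree a = Node { root :: a, shrinks :: [Tree a] }

-- In a real library, Functor/Applicative/Monad instances on this tree
-- are what make shrinking of composite values come for free; we show
-- only Functor here.
instance Functor Tree where
  fmap f (Node x xs) = Node (f x) (map (fmap f) xs)

-- Shrink candidates for a non-negative Int: zero, half, predecessor.
shrinkInt :: Int -> [Int]
shrinkInt n = nub [ n' | n' <- [0, n `div` 2, n - 1], n' >= 0, n' < n ]

intTree :: Int -> Tree Int
intTree n = Node n (map intTree (shrinkInt n))

-- Given a failing property, walk down the tree greedily to a local
-- minimum: keep moving to the first shrink that still fails.
minimize :: (a -> Bool) -> Tree a -> a
minimize failing (Node x xs) =
  case [ t | t <- xs, failing (root t) ] of
    []      -> x
    (t : _) -> minimize failing t
```

For example, starting from 100, a property failing for all values ≥ 3 minimizes to 3 without any separately written shrinker.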
An in-depth look at quickcheck-state-machine
Stateful APIs are everywhere: file systems, databases, widget libraries, the list goes on. Automated testing of such APIs requires generating sequences of API calls, and when we find a failing test, ideally shrinking such a sequence to a minimal test case. Neither the generation nor the shrinking of such sequences is trivial. After all, it is the very nature of stateful systems that later calls may depend on earlier calls: we can only add rows to a database table after we create it, we can only write to a file after we open it, etc. Such dependencies need to be tracked carefully. Moreover, in order to verify the responses we get back from the system, the test needs to maintain some kind of internal representation of what it thinks the internal state of the system is: when we read from a file, we need to know what was in the file in order to be able to verify if the response was correct or not.
In this blog post we will take an in-depth look at quickcheck-state-machine, a library for testing stateful code. Our running example will be the development of a simple mock file system that should behave identically to a real file system. Although simple, the example will be large enough to give us an opportunity to discuss how we can verify that our generator is producing all the test cases we need, and how we can inspect whether the shrinker is doing a good job; in both cases, test case labelling will turn out to be essential. Throughout we will also discuss design patterns for quickcheck-state-machine tests which improve separation of concerns and reduce duplication. It should probably be pointed out that this is an opinionated piece: there are other ways to set things up than the one we present here.
Compositional zooming for StateT and ReaderT using lens
Consider writing updates in a state monad where the state contains deeply nested structures. As our running example we will consider a state containing multiple “wallets”, where each wallet has multiple “accounts”, and each account has multiple “addresses”. Suppose we want to write an update that changes one of the fields in a particular address. If the address cannot be found, we want a precise error message that distinguishes between the address itself not being found, or one of its parents (the account, or the wallet) not being found. In this blog post we will show how we can develop some composable abstractions that will help with writing code like this in a concise manner.
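To sketch the basic idea (without the precise-error-reporting machinery that is the real point of the post), here is a hand-rolled van Laarhoven lens and a zoom for State. All names here are ours rather than the lens package’s API, although the Lens' type mirrors it:

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.Trans.State (State, runState, state)
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- A van Laarhoven lens, hand-rolled to avoid depending on lens.
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

view :: Lens' s a -> s -> a
view l = getConst . l Const

set :: Lens' s a -> a -> s -> s
set l a = runIdentity . l (Identity . const a)

-- Run a computation on a part of the state, selected by a lens.
-- Because lenses compose, so do the zooms.
zoomL :: Lens' s a -> State a r -> State s r
zoomL l m = state $ \s ->
  let (r, a') = runState m (view l s)
  in  (r, set l a' s)

-- Hypothetical nested state: a wallet containing one account balance.
data Account = Account { balance :: Int } deriving Show
data Wallet  = Wallet  { account :: Account } deriving Show

accountL :: Lens' Wallet Account
accountL f w = (\a -> w { account = a }) <$> f (account w)

balanceL :: Lens' Account Int
balanceL f a = (\b -> a { balance = b }) <$> f (balance a)

-- Update deep inside the state by composing zooms.
deposit :: Int -> State Wallet ()
deposit n = zoomL accountL . zoomL balanceL $ state (\b -> ((), b + n))
```

The post extends this pattern so that a failed zoom (a missing wallet, account, or address) produces a precise error rather than silently doing nothing.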
(Video) The Cardano Wallet
This is a recording of a presentation I gave for the Cardano foundation at the first Cardano meetup in the Netherlands. The presentation was aimed at a general audience, and assumed no prior knowledge. The abstract was:
A cryptocurrency wallet is the primary means by which end users from private individuals to large exchange nodes interact with a blockchain. Its primary function is to keep track of the user’s unspent outputs and enable them to submit new transactions into the network. As part of the redesign of the Cardano wallet we have written a detailed mathematical specification, which we use for development and testing. In addition, we have done extensive research into “coin selection”, the process of choosing unspent outputs from the user’s wallet to fulfill a payment request. In this presentation we will give a high-level description of both of these strands of work.
Watch the presentation on YouTube
Self Organisation in Coin Selection
The term “self organisation” refers to the emergence of complex behaviour (typically in biological systems) from simple rules and random fluctuations. In this blog post we will see how we can take advantage of self organisation to design a simple yet effective coin selection algorithm.
Coin selection is the process of selecting unspent outputs in a user’s wallet to satisfy a particular payment request (for a recap of UTxO style accounting, see section “Background: UTxO-style Accounting” of my previous blog post). As Jameson Lopp points out in his blog post The Challenges of Optimizing Unspent Output Selection, coin selection is a thorny problem. Moreover, there is a surprising lack of academic publications on the topic; indeed, the only scientific study of coin selection appears to be Mark Erhardt’s master’s thesis An Evaluation of Coin Selection Strategies.
Semi-Formal Development: The Cardano Wallet
As part of our consulting work for IOHK, Well-Typed have been working with IOHK on the design and implementation of the new version of the Cardano cryptocurrency wallet. As a crucial component of this process, we have written a semi-formal specification of the wallet: a mathematical model of the wallet along with invariants and lemmas about how it behaves.
We refer to this specification as “semi-formal” because while it states many of the wallet’s properties, and proves some of them, it by no means proves all of them. As we will see, however, we can use QuickCheck to test such properties, producing counter-examples where they fail to hold. Not only is this an invaluable tool during the development of the specification itself, it also gives us a very principled way of testing the real implementation, even if later we do prove all the remaining properties as well.
Object Oriented Programming in Haskell
In object oriented programming an object is a value with a well-defined interface. The internal state of the object is closed to the outside world (encapsulation), but the behaviour of an object can be modified by redefining one or more of the functions in the object’s interface, typically through subclasses (inheritance). The internal state of the object is open to, and can be extended by, such subclasses. This is known as the open/closed principle.
In this blog post we will explain what open recursion is, provide some runnable examples that you can evaluate one step at a time to develop a better intuition for how it works, and show how it can be used to solve the problem above. These results are not new, but we hope that this blog post can serve to make them a bit more accessible to a larger audience. We then discuss how we can extend this approach using lenses to allow “inheritance” to also extend the “state” of the object.
Writing a High-Assurance Blockchain Implementation
In his blog post Cryptocurrencies need a safeguard to prevent another DAO disaster, Duncan Coutts discussed the need for high assurance development of cryptocurrencies and their underlying blockchain protocols. He also sketched one way in which one might go about this, but did not give much detail. In this follow-up blog post I delve into computer science theory a bit more, and in particular will take a closer look at process algebras. I will have no choice but to omit a lot of detail still, but hopefully we can develop some intuition and provide a starting point for reading more.
(Video) Static Pointers, Closures and Polymorphism
A static pointer can be thought of as the address of a symbol in a program. Their primary use lies in the fact that they can always be serialized and hence sent across a network; thus, while we cannot send a Haskell function over a network, we /can/ send a static pointer to that function. This makes them very useful in distributed frameworks such as Cloud Haskell (a library for Erlang-like distributed programs) and Sparkle (Haskell bindings to the Apache Spark cluster computing framework). In this talk, you will learn how static pointers work, see what kind of infrastructure we can define on top to make working with static pointers more practical, and get an outline of some use cases.
Watch the presentation at skillsmatter.com
Visualizing lazy evaluation
Haskell and other call-by-need languages use lazy evaluation as their default evaluation strategy. For beginners and advanced programmers alike this can sometimes be confusing. We therefore developed a tool called visualize-cbn. It is a simple interpreter for a mini Haskell-like language which outputs the state of the program at every step in a human readable format. It can also generate an HTML/JavaScript version with “Previous” and “Next” buttons to step through a program. We released the tool as open source on GitHub and Hackage, in the hope that it will be useful to others. In this blog post we will illustrate how one might take advantage of this tool. We will revisit the infamous triple of functions foldr, foldl and foldl', and show how they behave. As a slightly more advanced example, we will also study the memory behaviour of mapM in the Maybe monad. Hopefully, this show-rather-than-tell blog post might help some people understand these functions better.
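The behaviour the tool visualizes for these three functions can be summarized in a small sketch (illustrative, not taken from the post):

```haskell
import Data.List (foldl')

-- foldl builds a chain of unevaluated thunks
--   ((0 + 1) + 2) + 3 + ...
-- which is only forced at the very end; on large inputs this chain can
-- exhaust memory. foldl' forces the accumulator at every step instead.
sumLazy, sumStrict :: [Int] -> Int
sumLazy   = foldl  (+) 0   -- thunk chain of length n
sumStrict = foldl' (+) 0   -- constant-space accumulator

-- foldr, by contrast, can stop early: the combining function decides
-- whether to force the rest of the fold at all.
anyEven :: [Int] -> Bool
anyEven = foldr (\x acc -> even x || acc) False
```

Because of the short-circuiting, `anyEven (1 : 2 : undefined)` returns True without ever touching the undefined tail, something neither foldl variant can do.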
Binary instances for GADTs (or: RTTI in Haskell)
In this blog post we consider the problem of defining Binary instances for GADTs such as
data Val :: * -> * where
  VI :: Int -> Val Int
  VD :: Double -> Val Double
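The essential difficulty is that when decoding we do not yet know the type index. One common shape of solution, sketched here with illustrative names and a String encoding standing in for binary’s actual put/get, is an existential wrapper plus a runtime tag:

```haskell
{-# LANGUAGE GADTs #-}
import Text.Read (readMaybe)

data Val a where
  VI :: Int    -> Val Int
  VD :: Double -> Val Double

-- When decoding we cannot know statically whether we will get a
-- 'Val Int' or a 'Val Double', so we hide the index existentially:
data Some f where
  Some :: f a -> Some f

-- A tag ("I" or "D") recorded alongside the payload lets the decoder
-- recover which constructor, and hence which index, it is looking at.
encodeVal :: Val a -> String
encodeVal (VI n) = "I " ++ show n
encodeVal (VD d) = "D " ++ show d

decodeVal :: String -> Maybe (Some Val)
decodeVal s = case words s of
  ["I", n] -> Some . VI <$> readMaybe n
  ["D", d] -> Some . VD <$> readMaybe d
  _        -> Nothing
```

Pattern matching on the result (`Some (VI n)`) then recovers the type information at use sites, which is the "runtime type information" flavour alluded to in the title.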
Linearity, Uniqueness, and Haskell
There is a group of people consisting of Tweag I/O, ghc headquarters and JP Bernardy working on a proposal for adding linear types to Haskell. In this blog post I will attempt to put this proposal in some context. I will explain what linearity is and how it relates to its close cousin uniqueness typing, and I will compare the proposal to some other ways of doing things. My hope is that this will make the work more accessible.
Sharing, Memory Leaks, and Conduit and friends
TL;DR: Sharing conduit values leads to memory leaks. Make sure to disable the full laziness optimization in the module with your top-level calls to runConduit or ($$) (skip to the end of the conclusion for some details on how to do this). Similar considerations apply to other streaming libraries and indeed any Haskell code that uses lazy data structures to drive computation.
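For reference, disabling full laziness for a single module is done with an OPTIONS_GHC pragma; this fragment is a minimal illustration of where the flag goes, not code from the post:

```haskell
-- Disable the full laziness optimization for this module only, so that
-- ghc does not float (conduit) values out to the top level, where they
-- would be shared, and hence retained, across runs.
{-# OPTIONS_GHC -fno-full-laziness #-}
module Main (main) where

main :: IO ()
main = putStrLn "run your conduit pipelines from this module"
```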
Efficient Amortised and Real-Time Queues in Haskell
A queue is a datastructure that provides efficient—O(1)—operations to remove an element from the front of the queue and to insert an element at the rear of the queue. In this blog post we will discuss how we can take advantage of laziness to implement such queues in Haskell, both with amortised and with worst-case O(1) bounds.
The results in this blog post are not new, and can be found in Chris Okasaki’s book “Purely Functional Data Structures”. However, the implementation and presentation here is different from Okasaki’s. In particular, the technique we use for real-time datastructures is more explicit and should scale to datastructures other than queues more easily than Okasaki’s.
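As a baseline, the classic two-list “batched” queue from Okasaki’s book, which achieves the amortised O(1) bounds, can be sketched as follows (our own rendering, not the post’s code):

```haskell
-- Batched queue: a front list, and a rear list kept in reverse order.
-- Amortised O(1): each element is reversed from rear to front once.
data Queue a = Queue [a] [a]

empty :: Queue a
empty = Queue [] []

-- Restore the invariant that the front is only empty if the queue is.
queue :: [a] -> [a] -> Queue a
queue []    rear = Queue (reverse rear) []
queue front rear = Queue front rear

push :: a -> Queue a -> Queue a
push x (Queue front rear) = queue front (x : rear)

pop :: Queue a -> Maybe (a, Queue a)
pop (Queue []       _)    = Nothing
pop (Queue (x : xs) rear) = Just (x, queue xs rear)
```

The occasional O(n) reverse is what the real-time variants in the post then spread out incrementally, using laziness, to get worst-case O(1).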
Dependently typed servers in Servant
Suppose we have a webserver that can perform different kinds of operations on different kinds of values. Perhaps it can reverse or capitalize strings, and increment or negate integers. Moreover, it can echo back any value. However, it does not make sense to try to reverse an integer or increment a string.
This is an example of a dependently typed server: the value that we are given as input determines the type of the rest of the server. If we get a string as input, we expect a string operation as the second argument and the response is also a string; if we get an integer as input, we expect an integer operation as the second argument and the response is also an integer.
Lightweight Checked Exceptions in Haskell
Consider this function from the http-conduit library:
-- | Download the specified URL (..)
--
-- This function will 'throwIO' an 'HttpException' for (..)
simpleHttp :: MonadIO m => String -> m ByteString
Notice that part of the semantics of this function—that it may throw an HttpException—is encoded in a comment, which the compiler cannot check. Wouldn’t it be much nicer if we could simply express in the type that simpleHttp may throw an HttpException? In this blog post I will propose a very lightweight scheme to do precisely that.
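To give the flavour of such a scheme: a Throws constraint can be attached to a function’s type, so the possible exception appears where the compiler can see it. This sketch is deliberately simplified, in particular it discharges the constraint with a global instance, whereas the post uses a more careful trick to scope the acknowledgement to a particular call:

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Exception (Exception, IOException, throwIO, try)

-- A marker class: a 'Throws e' constraint records in the type that a
-- computation may throw an exception of type 'e'.
class Throws e

-- Simplification: a global instance acknowledging IOException.
instance Throws IOException

throwChecked :: (Exception e, Throws e) => e -> IO a
throwChecked = throwIO

-- The exception now shows up in the type, not just in a comment.
readConfig :: Throws IOException => FilePath -> IO String
readConfig = readFile
```

A caller that has not (somehow) discharged the Throws constraint gets a type error, which is the "checked" part of checked exceptions.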
Parametricity Tutorial
A powerful feature of Haskell’s type system is that we can deduce properties of functions by looking only at their type. For example, a function of type
f :: ∀a. a -> a
can only be the identity function: since it must return something of type a, for any type a, the only thing it can do is return the argument of type a that it was given (or crash).
To reason about the properties of functions based only on their types we make use of the theory of parametricity, which tells how to derive a so-called “free theorem” from a type. This blog post is a tutorial on how to do this; it won’t explain why the theory works, only how it works. A Haskell practitioner’s guide to parametricity, if you will.
Part 1: Constant types, functions and polymorphism
Part 2: Type constructors and type classes
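As a taster, the free theorem for any function g :: [a] -> [a] says that g commutes with map: since g cannot inspect the elements, it can only rearrange, drop, or duplicate them. A quick check with an arbitrarily chosen g (this example is ours, not from the tutorial):

```haskell
-- Free theorem for g :: [a] -> [a]:
--   map h . g  ==  g . map h     for every h
-- Any polymorphic g will do; we pick one at random.
g :: [a] -> [a]
g = reverse . take 3

lhs, rhs :: [Int]
lhs = (map (* 2) . g) [1 .. 10]
rhs = (g . map (* 2)) [1 .. 10]
```

Both sides evaluate to [6, 4, 2], exactly as the theorem, derived from the type alone, predicts.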
Qualified Goals in the Cabal Solver
When you ask cabal-install to install one or more packages, it needs to solve a constraint satisfaction problem: select a version for each of the packages you want to install, plus all their dependencies, such that all version constraints are satisfied. Those version constraints come both from the user (“please install lens version 4”) and from the packages themselves (“this package relies on mtl between version 2.1 and 2.2”). Internally cabal-install uses a modular constraint solver, written by Andres Löh. It was first presented at the Haskell Implementors’ Workshop in 2011, and is based on the paper Modular lazy search for Constraint Satisfaction Problems.
For the Industrial Haskell Group, Well-Typed has recently extended the constraint solver to deal with qualified goals. In this blog post we will explain how this constraint solver works by means of a running example, and look at how the new qualified goals change things.
Comprehensive Haskell Sandboxes, Revisited
For the last few years I’ve been using a system of symlinks as Haskell sandboxes. This worked well, but has one important drawback: symlinks are basically user-global state, and hence it is not possible to have multiple sandboxes active at the same time. However, as Haskell tools are getting better and better we can actually simplify the entire setup dramatically. Here’s a summary of my new approach:
Switching from Vim to Atom
(A Haskeller's Perspective)
I have been a Vim user for many many years, but have long wanted to switch to an editor that was more easily hackable. I tried Emacs and Sublime, but both had various hurdles that were just too annoying for me and both times I switched back to vim very quickly. However, I started using Atom a while back, and I have to say, I liked it from the start. I very quickly felt almost as productive as I was in vim, and in some ways I already much prefer Atom. (And since it’s JavaScript, we can write plugins in Haskell!)
Below are some tips that helped me transition from Vim to Atom.
Writing Atom plugins in Haskell using ghcjs
Atom is an editor developed by the people behind GitHub. Written in JavaScript, it is expressly designed to be hackable, and hackable it is; I find writing Atom plugins a rather pleasant experience, even though this means writing CoffeeScript. Nonetheless, one of the things that initially attracted me to Atom is the existence of ghcjs, and hence the possibility of writing Atom plugins in Haskell. Why bother? Well, Haskell versus JavaScript, need I say more? Moreover, it means we can use almost any existing Haskell library in our Atom plugins; for instance, my Cabal extension calls into the Cabal library to parse .cabal files.
Monads: From Web 2.0 to Hardware Drivers
Monads are often considered to be a stumbling block for learning Haskell. Somehow they are thought of as scary and hard to understand. They are however nothing more than a design pattern, and a rather simple one at that. Monads give us a standardized way to glue a particular kind of computations together, passing the result of each computation to the next. As such they are useful well beyond functional programming.
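A minimal instance of this “standardized glue” is the Maybe monad, which chains computations that may fail, passing each result to the next and short-circuiting on the first failure (illustrative example, not from the post):

```haskell
-- A computation that may fail: division guards against zero.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Without the monad we would need explicit case analysis after every
-- step; with it, the plumbing is factored out into (>>=), which the
-- do-notation below uses under the hood.
calculation :: Int -> Maybe Int
calculation n = do
  a <- safeDiv 100 n
  b <- safeDiv a 2
  pure (b + 1)
```

Swapping in a different monad changes *how* the steps are glued (failure, state, IO, parsing, ...) while the code keeps the same shape, which is what makes the pattern so widely applicable.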
Core JavaScript Semantics
This post is a summary of the core JavaScript semantics, based on the papers The Essence of JavaScript and A Tested Semantics for Getters, Setters, and Eval in JavaScript from the semantics of JavaScript project at Brown University. Note that throughout this post “reference” has a very precise meaning as a pointer to a memory location; think ML-style references (or, if you must, C++-style references).
Read more
Simple SMT solver for use in an optimizing compiler
This is the second blog post in a series about engineering optimizing compilers; the previous one was Quasi-quoting DSLs for free. In this blog post we will see how we might write a very simple SMT solver that will allow us to write an optimization pass that can turn something like
if a == 0
  then if !(a == 0) && b == 1
    then write 1
    else write 2
  else write 3
into
if a == 0
  then write 2
  else write 3
without much effort at all.
Read more at well-typed.com
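The core idea can be sketched in a few lines. This is a hypothetical mini-AST, not the post's actual types, and it ignores the `b == 1` conjunct for simplicity: walking the program, we record which conditions are known true or false inside each branch, and use those facts to decide nested conditions.

```haskell
-- Hypothetical mini-AST for illustration only
data Expr = Eq String Int | Not Expr
  deriving (Eq, Show)

data Stmt = If Expr Stmt Stmt | Write Int
  deriving (Eq, Show)

-- Decide a condition from the facts established by enclosing Ifs
decide :: [(Expr, Bool)] -> Expr -> Maybe Bool
decide facts (Not c) = not <$> decide facts c
decide facts c       = lookup c facts

-- Simplify away any If whose condition is already known; otherwise
-- recurse, recording the condition as true/false in each branch.
simplify :: [(Expr, Bool)] -> Stmt -> Stmt
simplify _ (Write n) = Write n
simplify facts (If c t e) =
  case decide facts c of
    Just True  -> simplify facts t
    Just False -> simplify facts e
    Nothing    -> If c (simplify ((c, True)  : facts) t)
                       (simplify ((c, False) : facts) e)
```

On the nested program above (minus the `b == 1` conjunct), `simplify []` yields `If (Eq "a" 0) (Write 2) (Write 3)`, which is the simplified program. A real SMT solver generalizes `decide` to reason about conjunctions, arithmetic, and so on.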
Quasi-quoting DSLs for free
Suppose you are writing a compiler for some programming language or DSL. If you are doing source to source transformations in your compiler, perhaps as part of an optimization pass, you will need to construct and deconstruct bits of abstract syntax. It would be very convenient if we could write that abstract syntax using the syntax of your language. In this blog post we show how you can reuse your existing compiler infrastructure to make this possible by writing a quasi-quoter with support for metavariables. As we will see, a key insight is that we can reuse object variables as meta variables.
Haskell CPP macros
Listing these here mostly because whenever I need to use them I seem to be unable to find them and have to dig for them in previous code I wrote.
Checking for minimal package versions
This relies on macros generated by Cabal. Example:
#if MIN_VERSION_process(1,2,0)
-- something
#else
-- something else
#endif
Dealing with Asynchronous Exceptions during Resource Acquisition
Consider the following code: we open a socket, compute with it, and finally close the socket again. The computation happens inside an exception handler (try), so even when an exception happens we still close the socket:
example1 :: (Socket -> IO a) -> IO a
example1 compute = do -- WRONG
    s <- openSocket
    r <- try $ compute s
    closeSocket s
    case r of
      Left ex -> throwIO (ex :: SomeException)
      Right a -> return a
Although this code correctly deals with synchronous exceptions (exceptions that are the direct result of the execution of the program), it does not deal correctly with asynchronous exceptions (exceptions that are raised as the result of an external event, such as a signal from another thread).
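The standard fix is bracket, which acquires the resource with asynchronous exceptions masked and guarantees the release action runs whether the computation throws synchronously or asynchronously. A self-contained sketch, using hypothetical stand-ins for openSocket/closeSocket (in real code these would be network operations):

```haskell
import Control.Exception (bracket)
import Data.IORef

-- Hypothetical stand-ins so the sketch compiles on its own: a "socket"
-- is just a mutable flag recording whether it is still open.
newtype Socket = Socket (IORef Bool)

openSocket :: IO Socket
openSocket = Socket <$> newIORef True

closeSocket :: Socket -> IO ()
closeSocket (Socket open) = writeIORef open False

-- bracket masks asynchronous exceptions during acquisition and runs
-- the release action even if the computation is interrupted.
example2 :: (Socket -> IO a) -> IO a
example2 = bracket openSocket closeSocket
```

Unlike example1, there is no window between openSocket and the installation of the exception handler in which an asynchronous exception could leak the socket.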
Debugging Haskell at assembly level
by scripting lldb in Python
Haskell programmers tend to spend far less time with debuggers than programmers in other languages. Partly this is because for pure code debuggers are of limited value anyway, and Haskell’s type system is so strong that a lot of bugs are caught at compile time rather than at runtime. Moreover, Haskell is a managed language (like Java, say) and errors are turned into exceptions. Unlike in unmanaged languages such as C, “true” runtime errors such as segmentation faults almost never happen.
Understanding the RealWorld
In my previous blog post Understanding the Stack I mentioned that “RealWorld tokens disappear” in the generated code. This is not entirely accurate. Simon Marlow gives a very brief explanation of this on the old glasgow-haskell-users mailing list. In this blog post we will explore this in more depth and hopefully come to a better understanding of the RealWorld, though perhaps not of the real world.
Understanding the Stack
One of our clients sent us a bug report, which consisted of a 700 line Haskell program with a complaint that “it deadlocks”. After studying the code I concluded that, for the sake of the bug report, it could be summarized as
print a bunch of stuff, then crash with a stack overflow
So I wanted to replace the original code with code that did just that: print a bunch of stuff, then crash with a stack overflow. No big deal:
Haskell (including GTK) on Mavericks
This post is out of date. See Comprehensive Haskell Sandboxes, Revisited instead.
Installing GTK
Although the GTK libraries can be installed through fink, macports, or brew, I’ve never had much luck with any of them. I recently tried installing them through brew and although they appeared to install ok, as soon as I tried to run a Haskell GTK application it segfaulted with a null pointer exception somewhere in libcairo. I therefore prefer to install from source. This is a fairly painless process; some hints are in my Comprehensive Haskell Sandboxes post. You can follow those instructions or just follow the instructions on the GTK+ website. The only important thing is that you need to set LDFLAGS so that it can find -lpython:
export LDFLAGS=-L/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/config
Read more
Pointwise Lenses
Lenses are a current hot topic in the Haskell community, with a bunch of packages providing implementations (data-accessor, fclabels, lens, amongst others). Although we will recall definitions, this post is not meant as an introduction to lenses. If you have not worked with lenses before, the talk from Simon Peyton Jones or the blog post by Sebastiaan Visser about fclabels are good starting points.
In this blog post we will propose a generalization to the lens representation used in fclabels and in many other packages (with various minor variations); we will consider the relation to the representation used in lens in a separate section.
Performance profiling with ghc-events-analyze
ghc-events-analyze is a new simple Haskell profiling tool that uses GHC’s eventlog system. It helps with some profiling use cases that are not covered by the existing GHC profiling modes or tools. It has two major features:
- While ThreadScope shows CPU activity across all your cores, ghc-events-analyze shows CPU activity across all your Haskell threads.
- It lets you label periods of time during program execution (by instrumenting your code with special trace calls) and then lets you visualize those time periods or get statistics on them.
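Instrumenting code for the second feature is a one-liner per label. A minimal sketch: ghc-events-analyze looks for user eventlog messages of the form "START \<label\>" and "STOP \<label\>", and the program must be run with +RTS -l to produce the eventlog.

```haskell
import Debug.Trace (traceEventIO)

-- Wrap an action so the time it takes shows up under the given label
-- in ghc-events-analyze's report and visualization.
labelled :: String -> IO a -> IO a
labelled l act = do
  traceEventIO ("START " ++ l)
  r <- act
  traceEventIO ("STOP " ++ l)
  return r
```

In production code you would want to emit the STOP event in an exception handler (e.g. via bracket_) so the period is closed even when the action throws.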
sendmail (postfix) on OSX
Scenario: you want to be able to run sendmail from the command line and have it deliver mail to mail servers other than localhost. It’s not particularly difficult once you know what to do but it was a bit of a pain to figure out. So, mostly for my own benefit:
- Fix configuration and permissions
- Set up SASL authentication for your favourite SMTP server
- Tell postfix which users are local (none)
(Video) Lazy I/O and Alternatives in Haskell
Haskell is one of the few programming languages that use lazy evaluation: computations get performed only when their result is demanded. It is however not so straightforward to combine lazy evaluation with side effects, such as reading from a file.
Lazy I/O is the standard solution to this in the current base libraries, but has its problems. We explain how lazy I/O works, what the problems are, and explore some of the solutions that are being proposed at the moment: iteratees, pipes, stream I/O, etc.
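A standalone sketch of the classic pitfall: hGetContents returns the file's contents lazily, so if the handle is closed before the string is demanded, the remaining contents are silently fixed as empty rather than raising an error.

```haskell
import System.IO

-- Demonstrates the lazy I/O pitfall: the file is never actually read.
readTruncated :: FilePath -> IO String
readTruncated path = do
  h <- openFile path ReadMode
  s <- hGetContents h  -- no reading has happened yet (semi-closed handle)
  hClose h             -- closing fixes the unread contents as empty
  return s             -- forcing s now yields "", not the file's text
```

Iteratees, pipes, and the other alternatives discussed in the talk avoid this class of bug by making the consumption of the input explicit.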
Watch the presentation at skillsmatter.com
The darker corners of throwTo
We will assume
import System.IO.Unsafe (unsafePerformIO, unsafeInterleaveIO)
import System.IO.Error (catchIOError)
import Control.Concurrent (throwTo, myThreadId, threadDelay)
import Control.Concurrent.Async (async, cancel, wait)
import qualified Control.Exception as Ex
in the remainder of this post.
Whatever work the target thread was doing when the exception was raised is not lost: the computation is suspended until required by another thread.
We can observe this directly when using unsafePerformIO:
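A sketch of one way to observe it (timings are illustrative, and the putStrLn markers are only there so we can see what runs when): one thread starts evaluating a pure thunk, is cancelled mid-evaluation, and demanding the value later resumes the very same suspended computation rather than restarting it.

```haskell
import System.IO.Unsafe (unsafePerformIO)
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (async, cancel)
import qualified Control.Exception as Ex

demo :: IO Int
demo = do
  let x = unsafePerformIO $ do
            putStrLn "starting"
            threadDelay 200000   -- pretend this is expensive
            return (42 :: Int)
  a <- async (Ex.evaluate x)
  threadDelay 50000              -- let the worker print "starting" and block
  cancel a                       -- the half-evaluated thunk is suspended, not lost
  Ex.evaluate x                  -- resumes where it left off; "starting" is not repeated
```

Running `print =<< demo` eventually prints 42; the work done before the cancel is reused.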
Read more
Brief Intro to Quasi-Quotation (Show me the types already)
This post serves as a brief introduction to quasi-quotation in Haskell, because I couldn’t make much sense of the existing documentation, which gets bogged down in unnecessary details about writing parsers, the use of SYB, etc. Also, although quasi-quotation borrows syntax from Template Haskell, the TH expression [|e|] and the QQ [qq|e|] are false friends at best.
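Since the post promises the types: a QuasiQuoter, from Language.Haskell.TH.Quote, is nothing but a record of four parsers, one per syntactic context, each taking the raw string between the quasi-quote brackets to a Template Haskell AST. A trivial example quoter that quotes its contents as a string literal:

```haskell
import Language.Haskell.TH (litE, stringL)
import Language.Haskell.TH.Quote (QuasiQuoter (..))

-- The four fields of QuasiQuoter are:
--   quoteExp  :: String -> Q Exp
--   quotePat  :: String -> Q Pat
--   quoteType :: String -> Q Type
--   quoteDec  :: String -> Q [Dec]
str :: QuasiQuoter
str = QuasiQuoter
  { quoteExp  = litE . stringL
  , quotePat  = error "str: no pattern context"
  , quoteType = error "str: no type context"
  , quoteDec  = error "str: no declaration context"
  }
```

In another module (quoters cannot be used where they are defined), `[str|hello world|]` is then an expression of type String.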
Comprehensive Haskell Sandboxes
For an alternative approach, see Comprehensive Haskell Sandboxes, Revisited instead.
Serious Haskell developers will want to test their packages against different versions of ghc, different versions of libraries, etc. It is therefore useful to have multiple Haskell development environments, or sandboxes. A comprehensive sandbox contains:
- An installation of ghc
- A package database
- cabal configuration (for instance, so that we can pin packages)
Moreover, good sandboxes are
- isolated from each other: only the active sandbox can be modified
- relocatable: it should be possible to copy the sandbox ~/env/ghc762 to ~/env/client1 to use as the basis for a new sandbox, or to copy the entire set of sandboxes (~/env) from one machine to another (suitably similar) machine (this means that paths in config files or baked into executables should never mention the name of the sandbox)
- self-contained: there should be no need for a system wide install of ghc
- transparent: we should be able to use the standard tools (ghc, cabal) as they are
There are existing tools for Haskell sandboxes (such as cabal-dev or hsenv), but none satisfy all of the above requirements (to the best of my knowledge). However, it is easy enough to set up our own.
Read more
Performance problems with -threaded?
Consider the following two implementations of a very simple ping/pong server and client, one written in C and one written in Haskell:
Neither of them has been particularly optimized, that’s not the point of this post. The point of this post will become evident when we measure the latency for the following three programs:
- The C program compiled with -O2
- The Haskell program compiled with -O2
- The Haskell program compiled with -O2 and -threaded
ThreadScope 0.2.2 binary for Mac OS X 10.8.2
This is an application bundle which should hopefully Just Work without any additional libraries (in particular, you should not have to install any GTK libraries).
Communication Patterns in Cloud Haskell (Part 4)
K-Means
In Part 3 of this series we showed how to write a simple distributed implementation of Map-Reduce using Cloud Haskell. In this final part of the series we will explain the K-Means algorithm and show how it can be implemented in terms of Map-Reduce.
K-Means is an algorithm to partition a set of points into n clusters. The algorithm iterates the following two steps for a fixed number of times (or until convergence):
- Given a set of points and n cluster centres, associate each point with the cluster centre it is nearest to.
- Compute the centre of each new cluster.
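The two steps above can be sketched sequentially as follows (hypothetical helper names; the post itself implements the distributed Map-Reduce version):

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)
import qualified Data.Map.Strict as Map

type Point = (Double, Double)

-- Squared Euclidean distance; sufficient for finding the nearest centre
dist :: Point -> Point -> Double
dist (x1, y1) (x2, y2) = (x1 - x2) ^ 2 + (y1 - y2) ^ 2

-- Step 1: associate each point with the cluster centre it is nearest to
assign :: [Point] -> [Point] -> Map.Map Point [Point]
assign centres points =
    Map.fromListWith (++) [ (nearest p, [p]) | p <- points ]
  where
    nearest p = minimumBy (comparing (dist p)) centres

-- Step 2: compute the centre of each new cluster
recentre :: Map.Map Point [Point] -> [Point]
recentre = map centre . Map.elems
  where
    centre ps = let n = fromIntegral (length ps)
                in (sum (map fst ps) / n, sum (map snd ps) / n)

-- One K-Means iteration: assign, then recompute the centres
kmeansStep :: [Point] -> [Point] -> [Point]
kmeansStep centres = recentre . assign centres
```

Iterating kmeansStep a fixed number of times (or until the centres stop moving) gives the full algorithm; in the Map-Reduce formulation, step 1 becomes the map phase and step 2 the reduce phase.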
Communication Patterns in Cloud Haskell (Part 3)
Map-Reduce
In Part 1 and Part 2 of this series we described a number of ways in which we might compute the number of prime factors of a set of numbers in a distributed manner. Abstractly, these examples can be described as
a problem (computing the number of prime factors of 100 natural numbers) is split into subproblems (factorizing each number), those subproblems are solved by slave nodes, and the partial results are then combined (summing the number of factors)
Communication Patterns in Cloud Haskell (Part 2)
Performance
In Part 1 of this series we introduced a number of basic Cloud Haskell communication patterns. All these patterns will work fine for modest numbers of work items; in this blog post we’re going to look at the performance and memory use for much larger numbers of work items, so we can see how each pattern scales up. We will see that scalability improves as we move from master-slave to work-pushing to work-stealing, and that many further improvements can still be made. More importantly, we hope to help you to predict runtime behaviour from your code, so that you can write more scalable and high-performance code.
Communication Patterns in Cloud Haskell (Part 1)
Master-Slave, Work-Stealing and Work-Pushing
In this series (2, 3, 4) of blog posts we will describe a number of basic communication patterns in Cloud Haskell. We don’t assume much familiarity with Cloud Haskell, but it will probably be useful to be familiar with the basics; Towards Haskell in the Cloud (Epstein, Black and Peyton Jones, Haskell Symposium 2011) is a good starting point. We will start simple but we will finish with some more advanced techniques.
Memory Layout for Multiple and Virtual Inheritance
In this article we explain the object layout implemented by gcc for multiple and virtual inheritance. Although in an ideal world C++ programmers should not need to know these details of the compiler internals, unfortunately the way multiple (and especially virtual) inheritance is implemented has various non-obvious consequences for writing C++ code (in particular, for downcasting pointers, using pointers to pointers, and the invocation order of constructors for virtual bases). If you understand how multiple inheritance is implemented, you will be able to anticipate these consequences and deal with them in your code. Also, it is useful to understand the cost of using virtual inheritance if you care about efficiency. Finally, it is interesting :-)
DLX/MIPS tutorial
This is a tutorial about the MIPS microprocessor that I wrote for Prof Jeremy Jones at Trinity College, Dublin. The original tutorial can be found on his website, and if you are running Windows I recommend you go there instead, because there all the examples are interactive. Unfortunately this does not work on Linux or OSX, so I’ve mirrored the tutorial here and created YouTube videos for all the examples.
Read more