(Blog posts/Videos) Avoiding quadratic core code size

Posted on August 20, 2021 (last updated 2021-12-17)

I’ve thought, written and spoken quite a lot about this topic:

Blog post: Avoiding quadratic core code size with large records

A module that contains nothing but the definition of a single record with 100 fields and some type class instances, using ghc’s standard representation for records, along with the code generated by the RecordDotPreprocessor plugin and the Generic instance generated by generics-sop, results in a core representation of a whopping 450,000 terms/types/coercions, takes 3 seconds to compile, and requires 500M of RAM to compile. With the large-records library, we get a module with essentially the same functionality, but with a core size of a mere 14,000 terms/types/coercions, which compiles within 1 second and requires roughly 100M of RAM. In this blog post we describe why this simple module generates so much code, and how the large-records library manages to reduce this by more than an order of magnitude.

Read more at well-typed.com

Blog post: Induction without core-size blow-up (a.k.a. Large records: anonymous edition)

An important factor affecting compilation speed and memory requirements is the size of the core code generated by ghc from Haskell source modules. Thus, if compilation time is an issue, this is something we should be conscious of and optimize for. In [part 1][part1] of this blog post we took an in-depth look at why certain Haskell constructs lead to quadratic blow-up in the generated ghc core code, and how the [large-records][lr-hackage] library avoids these problems. Indeed, the large-records library provides support for records, including support for generic programming, with a guarantee that the generated core is never more than O(n) in the number of record fields.

The approach described there does however not directly apply to the case of anonymous records. This is the topic we will tackle in this part 2. Unfortunately, it seems that for anonymous records the best we can hope for is O(n log n), and even to achieve that we need to explore some dark corners of ghc. We have not attempted to write a new anynomous records library, but instead consider the problems in isolation; on the other hand, the solutions we propose should be applicable in other contexts as well. Apart from section “Putting it all together”, the rest of this blog post can be understood without having read part 1.

Read more at well-typed.com

Blog post: Type-level sharing in Haskell, now

The lack of type-level sharing in Haskell is a long-standing problem. It means, for example, that this simple Haskell code

apBaseline :: Applicative f => (A -> B -> C -> r) -> f r
apBaseline f =
        pure f
    <*> pure A
    <*> pure B
    <*> pure C

results in core code that is quadratic in size, due to type arguments; in pseudo-Haskell, it will look something like

apBaseline :: Applicative f => (A -> B -> C -> r) -> f r
apBaseline f =
                          (pure f)
    <*> @A @(B -> C -> r) (pure A)
    <*> @B @(     C -> r) (pure B)
    <*> @C @(          r) (pure C)

Each application of (<*>) records both the type of the argument that we are applying, as well as the types of all remaining arguments. Since the latter is linear in the number of arguments, and we have a linear number of applications of <*>, the core size of this function becomes quadratic in the number of arguments.

We recently discovered a way to solve this problem, in ghc as it is today (tested with 8.8, 8.10, 9.0 and 9.2). In this blog post we will describe the approach, as well as introduce a new typechecker plugin, which makes the process more convenient.

Read more at well-typed.com

Blog post: New large-records release: now with 100% fewer quote

The large-records library provides support for large records in Haskell with much better compilation time performance than vanilla ghc does. Well-Typed and MonadFix are happy to announce a new release of this library, which avoids all Template Haskell or quasi-quote brackets. Example:

{-# ANN type User largeRecordLazy #-}
data User = MkUser {
      name   :: String
    , active :: Bool
  deriving stock (Show, Eq)

instance ToJSON User where
  toJSON = gtoJSON

john :: User
john = MkUser { name = "john", active = True }

setInactive :: User -> User
setInactive u = u{active = False}

This makes for a nicer user experience and provides better integration with tooling (for example, better syntax highlighting, auto-formatting, and auto-completion). Importantly, avoiding Template Haskell also means we avoid the unnecessary recompilations that this incurs1, a significant benefit for a library aimed at improving compilation time.

Read more at well-typed.com

Blog post: large-anon: Practical scalable anonymous records for Haskell

The large-anon library provides support for anonymous records; that is, records that do not have to be declared up-front. For example, used as a plugin along with the record-dot-preprocessor plugin, it makes it possible to write code such as this:

magenta :: Record [ "red" := Double, "green" := Double, "blue" := Double ]
magenta = ANON { red = 1, green = 0, blue = 1 }

reduceRed :: RowHasField "red" r Double => Record r -> Record r
reduceRed c = c{red = c.red * 0.9}

The type signatures are not necessary; type inference works as aspected for these records. If you prefer to use lenses, that is also possible:

reduceBlue :: RowHasField "blue" r Double => Record r -> Record r
reduceBlue = over #blue (* 0.9)

The library offers a small but very expressive API, and it scales to large records (with 100 fields and beyond), with excellent compilation time performance and good runtime performance. In this blog post we will first present the library from a user’s perspective, then give an overview of the internals with an aim to better to understand the library’s runtime characteristics, and finally show some benchmarks. The library is available from Hackage and is currently compatible with ghc 8.8, 8.10 and 9.0 (extending this to 9.2 should not be too hard).

Read more at well-typed.com

Presentations (videos)