shoot-yourself-in-the-foot-with-NFData-Doc

How to shoot yourself in the foot with NFData Doc

TL;DR: Don’t normalize pretty’s Docs. This library relies on laziness in a foundational way.

If you want to know what happens when you DeepSeq.force a Doc, and why the world then collapses on you, read on.

An innocuous refactoring, so it seemed

I worked on the problem in Agda that warnings would not print in color on the terminal (as opposed to errors), even though they were produced as annotated Docs and there was some renderer that turned the annotations into ANSI codes when printing.

It turned out that this renderer was not used for warnings as they were communicated through a data structure with just String messages and thus got rendered too early, losing the annotations. My strategy to tackle this problem was to replace the String by Doc in the data structure and let GHC point me to the places I had to update. There was suprisingly little to do, and I was quite happy, having expected a large refactoring ordeal.

The world crashing on me

The bill came when I tried to run the testsuite. It always stalled after a short while, having to be killed, and I found in the Activity Monitor that each of the running agda processes took 5GB. With 8 processes spawned by the testsuite, my machine was soon swapping itself to a halt. Once I even saw agda eating 90GB before it crashed and took the Terminal App down with it.

Diagnosis

Crying for help in my pull request Amélia analysed the situation as the normal form of a Doc is horrible and should never be stored. Her analysis was backed by some colored heap profiles. Indeed I had turned a DeepSeq.force on a rendered Doc (so, a String) into one on the Doc itself. (The purpose of the normalization was to bring exceptions to the surface and handle them, in order to not make internal errors in (debug) printing crash the whole process—but that detail is not important.) Commonly space problems are caused by retaining thunks and can be fixed by normalization, but here I encountered the opposite problem, so it took some time to understand it.

Amélia was (of course) right and I implemented her suggested fix, to render annotated Docs into simple annotated trees (DocTree) so as to preserve the annotations but prevent the memory explosion of normalized Docs.

Diving into guts of `pretty`

What is going on? Why cannot we normalize Docs?

The problem is the Union constructor in Doc that holds two documents that render to the same text (with maybe different layout). Relying on laziness, usually the second alternative is only used when the first does not work out well. However, NFData’s rnf will normalize both alternatives. Also, the way that the pretty DSL is designed, common Doc constructors are implemented shallowly (“smart constructors”), so they distribute over Union. For instance, take beside which is ubiquitous because it is invoked whenever you concatenate documents horizontally (such as in <>, <+>, hcat, hsep…).

beside (p1 `Union` p2) g q = beside p1 g q `union_` beside p2 g q

You can see that the arguments g and q are duplicated when beside distributes itself over a Union. And unions are also ubiquitous as they are introduced by seps (excerpt):

 -- Specification: sep ps = oneLiner (hsep ps) `union` vcat ps
 ...
 sepNB g Empty k ys
   = oneLiner (nilBeside g (reduceDoc rest)) `mkUnion`
   nilAboveNest False k (reduceDoc (vcat ys))

So the normal forms of Doc are quadratic and laziness is essential to make the pretty DSL work. Having a normalizer like NFData is contrary to the whole design of this library.

Why is there a `NFData` instance for `Doc`?

The blameful NFData Doc instance exists since 2014, while the library is much older than that, going back to articles by John Hughes from 1992 and 1995. Originally, the library was internal to GHC, coded by Simon Peyton Jones in 1996. It was first publicly released to Hackage by Ross Paterson in 2007.

The genesis of the NFData Doc instance as recorded in issue #12 can be paraphrased as follows:

Contributor: “Could we have a NFData instance? Could be useful.”

Maintainer: “Yeah, why not?”

This is not the original wording, but summarizes well the (publicly available) communication on the topic.

Let’s be fair. I can see myself in both roles. I could easily have been the contributor seeing a gap in the API and wanting to fill it. I could easily have been the maintainer seeing no fault in the proposal and waving it through.

We are confronted by a fundamental problem in software engineering here (amplified in the resource-starved Haskell open-source community): The original authors of the software leave, taking their deep understanding and unspoken dos and don’ts with them, and second generation maintainers have to take over without being equipped with the expertise and insights of the original authors.

In the case of prettys Docs, filling the gap of a missing NFData instance must be considered a bad idea, in hindsight at least. But I guess the original author would immediately have reacted with concerns to the proposal.

What about the competition?

The main competitor on the pretty-printing market is likely prettyprinter, an heir of the Wadler/Leijen pretty printer. Neither of these equips their Doc type with a NFData instance (see prettyprinter’s Doc and wl-pprints Doc). In the former case, Doc embeds functions, so normalization wouldn’t get far anyway.

Conclusion

The NFData instance for prettys Doc should be deprecated, warning the user of the unexpected fatal behavior. I opened an issue suggesting this.

If one wanted a Doc type that supports normalization without blowing up, one would need a deeper embedding, making all combinators that now duplicate documents into proper constructors of Doc. This would however be a major rewrite of the library, and there might be unforseen obstacles on this path.

Acknowledgements

Thanks to Amélia for finding the problem and outlining a solution. Thanks to the developers of pretty the other pretty-printing libraries, part of the army of unpaid volunteers in the Haskell community.

References

Commenting

Write a comment by opening an issue in the issue tracker!