After fighting for a while to get my FontConfig (the standard library for looking up fonts on most open-source desktops) language bindings to work correctly, I decided to reimplement them. I wanted to find more reliable strategies that better split the C code from the Haskell wrapper, making it easier to test & easier to proofread!
The bulk of FontConfig’s API serves to implement collection types, so I needed to find a way to transfer those collections from Haskell to C & back. There are numerous ways to do this, but I wanted something rapid & reliable! For this I ultimately chose MessagePack, after considering other solutions which ended up not helping.
MessagePack is a compact binary encoding with a data model loosely resembling JSON’s. Each value in MessagePack is preceded by a byte encoding its type, and where possible the format packs the data itself (or at least its length) into that same byte. And being a binary format, it takes practically nothing to read numbers from it!
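To make the type-byte idea concrete, here is a minimal sketch of how unsigned integers are laid out, following the positive fixint, uint 8, uint 16 & uint 32 formats from the MessagePack specification. The names `encodeUInt` and `bigEndian` are made up for illustration; they are not taken from any MessagePack library or from these bindings.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word32)

-- | Illustrative only: encode an unsigned integer using the smallest
-- MessagePack integer format that can hold it.
encodeUInt :: Word32 -> [Word8]
encodeUInt n
  | n < 0x80    = [fromIntegral n]       -- positive fixint: the value is the type byte
  | n < 0x100   = [0xcc, fromIntegral n] -- uint 8: type byte + 1 data byte
  | n < 0x10000 = 0xcd : bigEndian 2 n   -- uint 16: type byte + 2 data bytes
  | otherwise   = 0xce : bigEndian 4 n   -- uint 32: type byte + 4 data bytes

-- | The lowest `count` bytes of a number, most significant byte first.
bigEndian :: Int -> Word32 -> [Word8]
bigEndian count n =
  [fromIntegral (n `shiftR` (8 * i)) | i <- [count - 1, count - 2 .. 0]]
```

With these definitions `encodeUInt 5` is the single byte `[5]`, while `encodeUInt 200` needs two bytes, `[204,200]` (that is, 0xcc followed by 0xc8). Small numbers are where the format is cheapest.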
I used an existing MessagePack package from Hackage to encode & decode the data on the Haskell side. I mostly leaned on preexisting conversions to the MessagePack model, though I did write some of my own, aiming to avoid extraneous bytes without going to too much trouble. I took special care to delta-compress CharSets in the hope that this allows MessagePack to encode each char in those sets in a single byte!
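One way delta-compression of a character set could look is sketched below. This assumes the set is represented as a list of code points and that the first delta is measured from zero; the real bindings may store things differently, and `deltaEncode` & `deltaDecode` are hypothetical names.

```haskell
import Data.List (sort)
import Data.Word (Word32)

-- | Replace each code point by its distance from the previous one
-- (the first is measured from 0). Sorting first keeps every delta
-- non-negative.
deltaEncode :: [Word32] -> [Word32]
deltaEncode codepoints = zipWith (-) sorted (0 : sorted)
  where
    sorted = sort codepoints

-- | Undo 'deltaEncode' with a running sum.
deltaDecode :: [Word32] -> [Word32]
deltaDecode = drop 1 . scanl (+) 0
```

For example `deltaEncode [0x41, 0x42, 0x43, 0x263A]` yields `[65,1,1,9719]`, and `deltaDecode` restores the original list. Within a dense run of characters the deltas stay below 128, which is the range MessagePack can store in a single byte.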
QuickCheck checks that I’m maintaining the invariant that if we encode data into MessagePack, parsing those bytes yields the same data. It randomly generates example data (of the desired Haskell type) to run our tests on. This proved invaluable for ensuring there are no major bugs, especially as I ran this data through the C code! It even caught ambiguities in the conversion!
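The shape of such a round-trip property is sketched below. To keep the example self-contained, `show` & `readMaybe` stand in for the MessagePack encoder & decoder, and the `[(String, Int)]` type is an arbitrary placeholder rather than one of the bindings’ real types.

```haskell
import Test.QuickCheck (quickCheck)
import Text.Read (readMaybe)

-- Stand-in codec: in the real bindings these would be the MessagePack
-- conversions, here they are just `show` & `readMaybe`.
encode :: [(String, Int)] -> String
encode = show

decode :: String -> Maybe [(String, Int)]
decode = readMaybe

-- | The invariant: decoding whatever we encoded yields the same value.
prop_roundTrip :: [(String, Int)] -> Bool
prop_roundTrip value = decode (encode value) == Just value

-- | Let QuickCheck generate the example values.
checkRoundTrip :: IO ()
checkRoundTrip = quickCheck prop_roundTrip
```

Running `checkRoundTrip` makes QuickCheck generate random values of the argument type (100 by default) and report the first one that breaks the property, if any.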
Most, but not all, FontConfig functions can be considered pure functions, at least once we’ve substituted out its collection types. However, passing those collections to C requires us to use pointers, and pointers require us to write impure functions.
So I wrote a couple of internal utilities which promise that the only impure code in a language binding is the code handling the transient pointers we pass to C.
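A utility in that spirit might look like the sketch below. It is an assumption about the general technique, not the bindings’ actual code: `withBytesPure` is a hypothetical name, and `sumBytesIO` is a Haskell stand-in for what would really be a `foreign import` of a C function.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr)
import System.IO.Unsafe (unsafePerformIO)

-- | Lend a temporary buffer to an action & present the result as pure.
-- This is only sound if the action does nothing observable besides
-- reading the buffer, and finishes reading before it returns.
withBytesPure :: [Word8] -> (Ptr Word8 -> Int -> IO a) -> a
withBytesPure bytes action =
  unsafePerformIO $ withArrayLen bytes $ \len ptr -> action ptr len

-- | Stand-in for a C function that reads `len` bytes from `ptr`.
sumBytesIO :: Ptr Word8 -> Int -> IO Int
sumBytesIO ptr len = sum . map fromIntegral <$> peekArray len ptr

-- | What callers see: an ordinary pure function.
sumBytes :: [Word8] -> Int
sumBytes bytes = withBytesPure bytes sumBytesIO
```

The buffer is released as soon as the action returns, so the pointer never escapes. That is what makes it reasonable to hide the `IO` here.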
Once we’ve passed data from Haskell to C, I still needed to convert it into FontConfig’s types. The C MessagePack library I used for this reads the data a byte at a time, parsing each value as the type I tell it to expect. I didn’t realize this at first, but when I don’t know which MessagePack type to expect, I need to parse the value into a dynamically-typed value first so I can branch on its type. Converting back didn’t have the same issues.
QuickCheck tests helped give me confidence that no data is lost when converting from Haskell types to FontConfig’s & back. That isn’t always the case: there’s certain data that FontConfig refuses to store, so I added validation functions to catch these failure cases. Where too much of the generated data was invalid, I had to constrain the data QuickCheck generates.
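Constraining a generator with a validation function can be sketched as follows. The rule in `isStorable` (no empty strings, no embedded NUL characters) is purely hypothetical & chosen for illustration; it is not a statement of what FontConfig actually rejects. `roundTrip` is likewise a stand-in for the real trip through C.

```haskell
import Test.QuickCheck (Gen, Property, arbitrary, forAll, listOf, suchThat)

-- | Hypothetical validation rule, for illustration only.
isStorable :: String -> Bool
isStorable s = not (null s) && '\0' `notElem` s

-- | Only generate strings that pass validation.
storableString :: Gen String
storableString = arbitrary `suchThat` isStorable

-- | Stand-in for converting to C & back: fails on invalid input.
roundTrip :: [String] -> Maybe [String]
roundTrip values
  | all isStorable values = Just values
  | otherwise             = Nothing

-- | With the constrained generator, every case exercises the success path.
prop_storableRoundTrips :: Property
prop_storableRoundTrips =
  forAll (listOf storableString) $ \values -> roundTrip values == Just values
```

Generating only valid data this way avoids spending most test cases on inputs that are rejected up front.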
The API is left mostly unchanged, so minimal code changes to callers should be necessary. I did realize that how I previously defined Patterns was a bit messy, so a Pattern is now a proper dictionary. And I chose a faster underlying data structure for CharSets, if a slightly less convenient one.
That holds even though I changed all the code behind those APIs for improved stability!
In addition to those QuickCheck tests ensuring no data gets unexpectedly lost when converting from Haskell to C & back, I transliterated several tests from FontConfig’s repository to ensure no bugs are introduced by my language bindings.
Several bugs were caught & fixed thanks to this automated testing, which took some time!