--- title: "Golfing Run Length Encoding in Haskell" description: "Haskell: step by step refactoring to concision" tags: Haskell, tutorial created: 26 Sep 2008 modified: 16 Jan 2013 status: finished confidence: highly likely importance: 2 cssExtension: drop-caps-yinit ... [Once I was playing with and working on a Haskell clone]{.smallcaps} of the old [_Gradius_](!Wikipedia "Gradius (series)") arcade games, _[Monadius](!Hackage)_. Most of my changes were not particularly interesting (cleaning up, Cabalizing, fixing warnings, switching all `Integer` to `Int` and so on), but in its [Demo.hs](http://code.haskell.org/monadius/Monadius/Demo.hs), I found an interesting solution to a problem, and it seems like a good small example of Haskell abstraction. # Examples One of the problems with language advocacy is that it's hard to have good examples, since someone who is writing an essay probably will only come up with trivial ones, and someone dealing with meaty problems isn't thinking about advocacy but how to solve their problem. I hope this example will be a little more substantial than the one-liners examples of [closures](!Wikipedia "Closure (computer science)") or [partial application](!Wikipedia). # Level data in _Monadius_ Suppose we have levels which are specified by a pair of numbers and then a long list of numbers, often very repetitious. Perhaps a particular level might be represented this way: ~~~{.Haskell} level1 :: ((Int, Int), [Int]) level1 = ((2,1),[0,0,0,0,0,0,0,0,0,0,0,0,8,8,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,73,73,73, 65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65, 69,69,65,17,17,17,17,17,17,17,17,17,17,17,17,17,25,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,1,1,1,1,1,1,1,1,1,1,1,1,1,65,65,65,65,65,65,1,1,1,1,1,1,1,1,33,33,1,1,1,1,33,33, 33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,49,49,49,49,33,33,33,1,1,1,1,1,1,1,9,9, 33,33,33,33,1,1,1,1,1,1,1,1,1,9,9,1,1,1,1,1,33,33,33,33,33,33,33,33,33,33,33,33,33, 33,33,33,33,1,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17, 17,17,17,81,81,81,81,81,81,81,81,81,81,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17, 17,17,17,17,17,17,17,17,17,17,17,17,17,1,1,1,1,1,1,1,17,17,17,17,17,17,17,17,17,17, 1,1,17,17,17,1,1,1,1,1,1,1,1,1,1,1,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65, 65,65,65,65,65,65,65,65,65,97,33,33,33,33,37,5,69,69,65,65,65,65,65,65,67,67,67,67, 75,75,75,75,75,75,75,75,75,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11, 11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11, 11,11,11,11,11,11,11,11,11,11,11,11,3,3,3,3,3,3,3,3,3,3,3,3,3,11,11,3,3,3,3,3,3,3,3, 3,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,3,3,3,3,3,3,3,3,67,67,67, 67,67,67,67,67,3,3,3,3,3,3,3,67,67,67,67,67,67,3,3,3,67,67,3,3,3,3,3,67,67,67,67,67, 3,3,67,67,67,67,67,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,11,11, ,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,35,35,35,35,35,3,3,3,35,35,35,3,3,3,3,3,3,3,3,3, 3,3,3,35,35,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,19,19,19,19,19,19,19,51,51,51,51, 51,51,51,51,51,51,51,51,51,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35, 35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35, 43,43,43,43,43,43,43,43,43,43,43,43,43,43,43,11,11,11,11,11,43,43,43,43,43,43,43,43, 3,3,3,35,35,35,35,35,35,35,35,35,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,67,67,67,67, 67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67, 67,67,67,67,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,35,35,35, 35,35,3,3,3,35,35,35,35,35,35,35,35,35,35,35,3,3,3,3,3,35,35,35,35,35,35,35,35,35,3, ,3,3,3,35,35,35,35,35,3,3,3,3,3,3,3,3,3,3,3,3,3,19,19,19,19,19,19,19,19,19,19,19,19, 3,3,3,3,3,3,19,19,19,19,3,3,3,3,3,3,3,19,19,19,19,19,3,3,3,3,3,3,3,3,19,19,19,19,19, 67,67,3,3,19,19,19,19,19,19,19,19,19,19,19,19,83,83,83,83,83,19,19,19,19,19,19,19,19, ,19,83,83,83,83,83,83,83,83,83,19,19,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,67,67, 11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11, ,11,43,35,35,35,35,3,3,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67, ,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67, 19,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,19,19,19,19,19,19,3,3,3,3,3,3,3,3,67,67,67,67, 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3, 3,3,3,3,19,19,19,19,19,19,19,19,19,51,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35, 3,3,3,3,3,3,3,3,3,35,35,35,35,35,35,35,35,35,35,35,35,35,35,3,3,3,3,3,3,3,35,35,43, 43,11,11,11,11,11,11,3,3,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67, 67,3,3,3,3,3,3,3,35,35,35,35,35,35,35,35,35,35,35,35,35,43,43,43,43,43,11,11,11,11, ,83,19,19,3,3,67,67,67,67,67,67,67,67,67,67,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]) ~~~ This is an ugly way to define a level. We could just scrap this representation as `Int` completely and perhaps define it using a hypothetical [enumerated type](!Wikipedia) as ~~~{.Haskell} level1 :: ((Geometry,Geometry),[Enemy]) level1 = ((Tall,Narrow), [Flyer,Flyer,Flyer,Flyer,Flyer,Flyer,Shooter,PowerUp,Boss..]) ~~~ But this representation, while certainly more understandable, is still very repetitious. It will take even more space to write down. We *could* auto-derive the [Read](!Hoogle) and [Show](!Hoogle) typeclasses for `Geometry` and `Enemy` datatypes, and then we could store `level1`, `level2` etc. as some sort of "levelinformation.dat" file, and read the level information in at runtime. But that makes installation and running more difficult^[Although Cabal makes this quite easy with its [Paths_\*](http://www.haskell.org/cabal/users-guide/#accessing-data-files-from-package-code) mechanism.] still quite possible, , and it doesn't address the core issue: the information is written in a way that is so ungainly that it's next to impossible to write or even *modify*! # The solution We need some way of expressing this more concisely, of, in short, *compressing* it. In this vein of thought, our first observation should be that we do not need to resort to fancy [compression libraries](http://hackage.haskell.org/packages/archive/pkg-list.html#cat:codec) or anything; there is an obvious way to compress it---the entire thing is a series of repeated numbers. We should be able to replace the lengthy enumeration of fragments like `[0,0,0,0,0,0,0,0,0,0,0,0]` with something simpler. For example, the number of repetitions, and what is to be repeated: `(12,0)`. This [run-length encoding](!Wikipedia) is shorter and easier to modify. It is possible that there may be a performance benefit here, as we've gotten rid of a large constant that would have to be defined in the program itself and instead replaced it with a shorter function which evaluates to the same thing. (If we're only on level 1, we don't need to carry around the expanded version of the other levels, and when we go to level 2, level 1 will be garbage-collected.) ## Writing it So, what is the type of our decompressing function? Well, we need to turn a `(Int,Int)` into a `[Int]`; even better, we want to turn a whole list of `(Int,Int)`s into a single list of `Int`s. Thus, our end goal is going to be a function of this type: ~~~{.Haskell} rleDecode :: [(Int,Int)] -> [Int] ~~~ Let us tackle the single tuple example first. The second entry defines what we need, and the first entry defines how many we need, and that's our type right there: ~~~{.Haskell} rleRecursiveDecode :: (Int, a) -> [a] ~~~ We could write a [primitive recursive](!Wikipedia) function that takes the parameter, decreases the counter by 1, and cons on a copy of the second entry. It could look like this: ~~~{.Haskell} rleRecursiveDecode (0,_) = [] rleRecursiveDecode (n,b) = b : (rleRecursiveDecode (n-1,b)) ~~~ But this really is not the best way to go. It is not necessarily easy to follow, and if there is one thing I have learned about Haskell programming, it is that the most obvious approach (in this case, primitive recursion) may not be the best way to go. This is code that could have as well been written in Scheme or something. It is complicated because we are trying to ensure that we generate an item for the list only at the exact moment we need to add it into the list; we are programming as if our language is strict, in other words. So, what is the lazy way of doing things? Why do we like laziness to begin with? One of the key rationale for [lazy evaluation](!Wikipedia) is that it promotes [separation of concerns](!Wikipedia)---one function doesn't need to know when or how much has been done by another function. In this case, what concerns ought to be separated? Well, our desired list has 2 properties: its length, and the single repeated item. Since we can create infinite lists, there is no need for the length to be set at the same time as the repeated-item is being generated. We can create an infinite list containing the item, and we demand as many as we need. Lazy evaluation means our infinite list doesn't blow up. Simple enough! The generation is too easy for words: ~~~{.Haskell} -- Define an infinite list of our item. duplicate :: a -> [a] duplicate a = a : duplicate a ~~~ But what's our demand function? A little thinking and we know that we have a list, a number, and we get back a list. [Int -> [a] -> [a]](!Hoogle) describes a few functions, the second of which turns out to be what we want: [take](!Hoogle). ~~~{.Haskell} rleLazyDecode :: (Int,Int) -> [Int] rleLazyDecode (n,b) = take n (duplicate b) ~~~ Now, `duplicate` is a simple enough function to define, but another type-search will show that [a -> [a]](!Hoogle) is already defined in the standard libraries---[repeat](!Hoogle). So we scrap `duplicate`: ~~~{.Haskell} rleLazyDecode (n,b) = take n (repeat b) ~~~ Might as well remove the parentheses: ~~~{.Haskell} rleLazyDecode (n,b) = take n $ repeat b ~~~ We can do better. What is the type of `\n b -> take n $ repeat b`? It is `Int -> a -> [a]`, and a [Hoogle type-search](!Hoogle "Int -> a -> [a]") turns up an exact match: [replicate](!Hoogle). So we can improve it further: ~~~{.Haskell} rleLazyDecode (n,b) = replicate n b ~~~ Are we quite done? Well, one could ask: `rleLazyDecade` is now basically nothing but an alias or new name for `replicate`, except for how it converts from a tuple `(n,b)` to normal uncurried arguments `n b`. Is there any function which will do this conversion for us? Well, we have a function `replicate :: a -> b -> c`, and data `(a,b)` passed into said function to produce a `c`, so we want a function with the type signature [(a -> b -> c) -> (a,b) -> c](!Hoogle), which turns out to be something called [uncurry](!Hoogle). Having reasoned our way this far, the implementation is much clearer than the type: ~~~{.Haskell} uncurry :: (a -> b -> c) -> (a, b) -> c uncurry f (a,b) = f a b ~~~ That's what we were just doing manually. So now the next version is clear: ~~~{.Haskell} rleLazyDecode = uncurry replicate ~~~ A satisfying, short, functional, and lazy one-liner. From here the definition of `rleDecode` is almost trivial: we extend it to a list of tuples by throwing in a [map](!Hoogle), and we turn the resulting list of lists into a single list by way of [concat](!Hoogle): ~~~{.Haskell} rleDecode ns = concat $ map rleLazyDecode ns ~~~ We can tweak this further, as `concat . map` is a common enough idiom that there is a shortcut: ~~~{.Haskell} rleDecode ns = concatMap rleLazyDecode ns ~~~ As before, we could have found `concatMap` by searching for the type signature of `concat . map`: [(a1 -> [a]) -> [a1] -> [a]](!Hoogle). Aw heck---let's make it [points-free](!Wikipedia "Tacit programming") like `rleLazyDecode`: ~~~{.Haskell} rleDecode = concatMap rleLazyDecode ~~~ And then we substitute in the `rleLazyDecode` definition: ~~~{.Haskell} rleDecode = concatMap (uncurry replicate) ~~~ We can actually go even further into the realms of incomprehensibility. It turns out that lists are a monad. This means we can use `bind` and all the rest of the operations defined by the [Monad](!Hoogle) typeclass to operate on lists and other things. So we can write this bizarre (but short and effective) version of `rleDecode`: ~~~{.Haskell} rleDecode = (uncurry replicate =<<) ~~~ And we are done! We can now represent the first level like this: ~~~{.Haskell} d1 = ((2,1),d) where d = rleDecode [(5, 3), (10, 67), (6, 3), (5, 67), (29, 3), (2, 11), (29, 3), (5, 35), (3, 3), (3, 35), (24, 3), (2, 35), (19, 3), (7, 19), (21, 51), (63, 35), (15, 43), (5, 11), (8, 43), (8, 35), (3, 3), (9, 35), (20, 3), (52, 67), (32, 3), (13, 35), (3, 3), (11, 35), (5, 3), (9, 35), (4, 3), (6, 35), (4, 3), (5, 35), (13, 3), (14, 19), (4, 3), (4, 19), (6, 3), (4, 19), (7, 3), (5, 19), (8, 3), (8, 19), (5, 83), (3, 67), (2, 3), (12, 19), (5, 83), (17, 19), (9, 83), (2, 19), (5, 3), (20, 67), (1, 75), (38, 11), (1, 43), (4, 35), (7, 3), (57, 67), (2, 3), (5, 19), (17, 3), (6, 19), (8, 3), (6, 67), (7, 83), (59, 3), (9, 19), (1, 51), (17, 35), (20, 3), (14, 35), (7, 3), (2, 35), (11, 43), (6, 11), (7, 3), (26, 67), (7, 3), (13, 35), (5, 43), (7, 11), (2, 3), (5, 67), (1, 83), (2, 19), (2, 3), (10, 67), (21, 3), (147, 0)] ~~~ Much nicer! And since our list-processing code is [polymorphic](!Wikipedia "Type polymorphism"), so even if we replace the arbitrary numbers by constructors, we change nothing. (Thanks to the denizens of #haskell for refactoring tips, and [Don Stewart's blog entry](https://donsbot.wordpress.com/2007/07/31/run-length-encoding-in-haskell/) on run-length encoding/decoding using [Arrows](!Hawiki "Arrow").)