Persist the List

A few years ago now, I developed a bit of a thing for functional programming. I explored a few different functional programming languages including Clojure, Haskell, Scala and F#. I’ve used Scala professionally and continue to do so to some extent in my current role. I also went full on into F# for 18 months and luckily found a job where I could use it professionally. These days I am back developing software mainly with JVM languages and technologies and mainly using Kotlin both professionally and for spare time coding fun.

I prefer now to mix OO and functional programming paradigms as I believe they complement each other well and Kotlin is a wonderful language for this. It has great tooling and great language features providing all the type safety I need along with concise ways to express common OO and functional programming patterns and data transformations.

When first coming to Kotlin, one feature I soon missed from languages like Scala, F# and Clojure is fully immutable data structures in the language’s standard library. Kotlin does have read-only collection interfaces and differentiates these from mutable interfaces. For example the following creates an implementation of the List interface in Kotlin. This interface doesn’t provide any methods that would allow a client of the interface to modify the underlying list.

The list type declaration (List<Int>) is optional as Kotlin can infer the type but I have it in the snippet to show the interface that is returned from the call to listOf.

A mutable list can be created in Kotlin as follows:

The interface is called MutableList and it provides methods to modify the underlying list implementation. When compiling Kotlin to target the JVM, the underlying list implementations that are created are standard java ArrayLists. The read-only List interface offers some protection against the underlying list implementation being mutated. However, because the underlying list implementation is still a Java mutable list, it is possible for it to be mutated from elsewhere in the code as the following snippet demonstrates:

This prints the following:

listNotToBeAmended before add: [4, 5, 6]
listNotToBeAmended after add to underlying list: [4, 5, 6, 3]

At first, this really made me miss fully immutable data structures from, say, Scala. However, good code encapsulation and defensive copies can prevent the types of errors in
the above code snippet from arising. I find Kotlin read-only interfaces are good enough most of the time and reap great rewards in terms of ease of interoperability with Java and Java libraries. So fully immutable data structures aren’t a deal breaker for me using and thoroughly enjoying using Kotlin. They are more a “would be really nice to have” feature for me now.

I first came to functional programming with fully immutable data structures when first using languages like Scala and Clojure about 4 years ago now. At the time I wondered how these data structures could be performant. Most of my development experience up until that time had been with OO, imperative languages like Java. I was imagining that in order to program with immutable data structures, one would have to be constantly making copies of an entire data structure that one wanted to “update”. I soon learned that this wasn’t necessarily the case because of what are called “persistent data structures”.

Persistent Data Structures

Persistent data structures allow multiple versions of a data structure to share parts of their underlying structure. So, for example, let’s say we have a list called myList1 that is fully immutable. Now let’s say we want to update this list by pre-pending one element to the head (first element) of the list. The list myList1 is fully immutable but that doesn’t mean that we can’t use its structure to create a new list with one extra element prepended to it called myList2. myList1 doesn’t know anything about the element that was prepended to it. Its head remains the same element that it always was. myList2 has a different head element but the rest of myList2 is exactly the same as myList1. So, in order to “update” myList1, there was no copying required. The diagram below depicts this:

One might say, “well that’s just a linked list and that’s what happens when you prepend something to a linked list”. However there is a difference. When programming with mutable, imperative data structures, the underlying data structure is encapsulated behind an ADT. ADT in this context stands for Abstract Data Type. Operations such as addFirst(myElement) are exposed as the List abstract data type’s interface. The underlying representation is hidden and, when prepending an element to the start of it, it is mutated with, in the case of a linked list implementation, the head private field being mutated to point to a node containing the new first element and that node pointing to the original first node. In the case of the underlying implementation being an array, the new first element needs to be inserted as the first element in the array or as the last filled element in the array depending on the choice of representation. There is no data sharing – instead there is only one version of the list that has become mutated.

This becomes more apparent if we consider a standard functional programming transformation such as:

The List in the above function is not a List from the Kotlin standard library. Instead it is a persistent list. This function uses the list that gets passed to it to construct a new version by essentially skipping all elements of the original that satisfy the predicate.Its head element is then the first element of the original that doesn’t satisfy the predicate and the rest of the elements in the original list are shared to make up the new list. Again, this can be shown better with a diagram:

Persistent list as an Algebraic Data Type

Earlier I described the List interface in Java and other imperative languages as an ADT meaning an Abstract Data Type. When considering how to go about implementing a persistent list in Kotlin, it helps to think about it as an ADT except that, in this context, ADT stands for Algebraic Data Type. In a previous blog post I gave a small intro to the concept of algebraic data types and showed how they can be used in Scala to prevent illegal states occurring in one’s code. When implementing a persistent list, instead of thinking of the List as an abstract data type with its representation encapsulated, it is easier to see it for what it is which is an algebraic data type. It has a type constructor meaning that one can construct different List types which hold different types of elements and it has only 2 data constructors. One data constructor creates an empty list – let’s just call that Empty. The other constructs a data value containing an element AND a list. This is often called Cons. One can think of the Cons constructor as a constructer which prepends an element to a list to create a new list. In this way the list algebraic data type is recursive. The diagram below depicts the constructing of 3 lists starting with an empty list and shows the recursive nature of the List algebraic data type.

Persisting the List in Kotlin

The code below shows one way to represent the List algebraic data type in Kotlin.

In the code above, the List algebraic data type is represented as a sealed class that can only ever have 2 implementations. Marking a class as sealed in Kotlin is a way of telling the Kotlin compiler that every subclass of the sealed class will be declared in the same file that the sealed class is declared in. In this way we can guarantee that a List can only ever be constructed to be an Empty list as declared by the object singleton subtype of List called Empty OR it can be constructed using the Cons data constructor with Cons being the other subtype of the List sealed class. The Cons data constructor creates a list by combining a head element AND an existing list. I have capitalised the words “AND” and “OR” in the previous two sentences to highlight that the List type is an algebraic data type – it is an OR type or “discriminated union” with its Cons data constructor being an “AND” type or “product type”.

In the main function in the above code we create three lists just like the ones represented in the previous diagram. Each list is represented by pre-pending (or “consing”) a head element to an existing list to create a new list. It can be seen that myList2 and myList3 share part of their structures with each other and with myList1. As myList1 is the singleton Empty list, its structure will be shared with any other brand new list that is ever created. The toString function is overridden for both the Empty implementation and the Cons implementation so that the list representation prints nicely in the console. Running the main function above produces the following output.

myList1: []
myList2: 1 :: []
myList3: 2 :: 1 :: []

Transformations

All the common functional transformations one would expect can be implemented on the persistent list represented in the above code. It’s usually best to implement what are called foldLeft and foldRight functions and then implement other transformations in terms of these. However to show these and all the common List transformations in here would result in quite a long blog post. To see these in action and to learn more about persistent data structures and their implementations in Scala and Kotlin, I highly recommend studying two fantastic books:
Functional Programming in Scala by Paul Chiusano and Rúnar Bjarnason
The Joy of Kotlin by Pierre-Yves Saumont

Both of these books are full of practical exercises for the reader to build persistent data structures for themselves. They are both well worth the time and effort needed to go through all of their exercises.

To finish off this post though, let’s look at an example of one transformation – an implementation of the dropWhile function that I talked about and depicted in an earlier diagram.

The output of the above is:

3 :: 4 :: []
1 :: 2 :: 3 :: 4 :: []
[]

The recursive nature of the List persistent data structure lends itself very well to a naturally recursive implementation of dropWhile. Often implementing operations over a persistent list recursively can lead to a stack overflow for large lists. Luckily Kotlin supports tail recursion. This is where the last expression to be evaluated in the a function is the recursive call to the function itself allowing the current stack frame to be re-used. The above solution is actually tail recursive because the else part of the if does not need to be evaluated if the predicate returns true. So whenever the recursive call is made, it is the last expression to be evaluated. We still need to tell Kotlin that we intend the function to be tail recursive by adding the tailrec keyword as below:

Unfortunately it’s not always as easy as simply adding the tailrec keyword. Often the implementation needs to change in order for a non tail recursive function to become tail recursive. This usually involves using an inner function or an auxiliary function to perform what can can be thought of as a loop only without any mutation. An example using this approach is shown below:

Conclusion

I hope this has served as an intro to persistent data structures and, in particular, the list persistent data structure. Whether they become part of the Kotlin standard library, time will tell but, either way, Kotlin is a beautiful and extremely practical and productive language that I love to use!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s