Algebraic data types (ADTs) sound pretty esoteric, but they are very practical and a key feature of typed functional programming languages like Scala. They allow us to model our domains in a type-safe way and to delegate some of the testing burden to the compiler.
Two of the essential words to understand in the context of ADTs are “and” and “or”. In fact, I believe that just knowing what these two words mean in the context of domain modelling can take you a long way in modelling complex domains. When the domain conforms strictly to “and” and “or”, we can make it impossible for illegal states to be created in our application (unless someone gets pretty hacky with reflection, which would be caught in a pull request as quickly as a decision to delete important tests). That being said, there are also types that have only one implementation type to choose from, or that are made up of only one thing.
So this “and” and “or” business. If we have a “thing”, we can use the words “AND” or “OR” to describe that thing. If we use the word “and”, we say “this thing is made up of this AND that AND that AND…” and so on. If we use the word “or”, we say “this thing is either this OR that OR…” and so on. In the case where something is made up of one implementation type, it is still similar to other “AND” types but it just happens to have one thing so the “AND” word isn’t needed. The same is true for “OR” types. There can be an “OR” type where there is only one implementation type to choose from.
So let's look at an example. We are going to model a domain of jazz bands. One common way for a jazz band to be constructed is by combining a piano player AND a bass player AND a drummer. This is commonly called a piano trio. Because we are combining musicians to form the piano trio, we say that the piano trio is an “AND” type or, to be more formal about it, a product type.
Here is what defining the piano trio looks like in Scala:
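A minimal sketch of such a definition. The musician types here are plain placeholder wrappers around a name (they are refined later in the post), and the field names are assumptions made for illustration:

```scala
// Placeholder musician types for this sketch only.
final case class PianoPlayer(name: String)
final case class BassPlayer(name: String)
final case class Drummer(name: String)

// A product type: a piano trio is a piano player AND a bass player AND a drummer.
final case class PianoTrio(
  pianoPlayer: PianoPlayer,
  bassPlayer: BassPlayer,
  drummer: Drummer
)
```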
Instances of this class are immutable and have to be made up of a piano player, a bass player and a drummer. I will get to the types PianoPlayer, BassPlayer and Drummer shortly. The case class is final so that it can’t be extended.
As much as I love listening to swingin’ piano trios, there are a lot more types of jazz bands out there. For our application, we only want to make it possible to create piano trios, quartets and quintets. These are defined with the following product types.
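As a rough sketch, the two larger bands might look like this. The line-ups are assumptions: the post mentions a TrumpetPlayer type, while SaxophonePlayer and all field names are hypothetical and only used to round out a quintet.

```scala
// Placeholder musician types for this sketch only.
final case class PianoPlayer(name: String)
final case class BassPlayer(name: String)
final case class Drummer(name: String)
final case class TrumpetPlayer(name: String)
final case class SaxophonePlayer(name: String) // hypothetical, not named in the post

// Assumed line-up: trumpet AND piano AND bass AND drums.
final case class Quartet(
  trumpetPlayer: TrumpetPlayer,
  pianoPlayer: PianoPlayer,
  bassPlayer: BassPlayer,
  drummer: Drummer
)

// Assumed line-up: saxophone AND trumpet AND piano AND bass AND drums.
final case class Quintet(
  saxophonePlayer: SaxophonePlayer,
  trumpetPlayer: TrumpetPlayer,
  pianoPlayer: PianoPlayer,
  bassPlayer: BassPlayer,
  drummer: Drummer
)
```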
So our application is only going to handle piano trios, quartets and quintets – but how do we enforce that? How can we make sure that whenever we pass a jazz band to a function, it can only be a piano trio OR a quartet OR a quintet? The answer is in the “OR” word, more commonly referred to as a sum type. This is where we specify that an instance (also called a data term value) is either this OR that OR that etc.
To define a sum type in Scala, we can use a sealed trait and then define implementations of that trait in the same Scala file. The sealed keyword ensures that no other implementations of the trait can be defined outside of that file, and it allows the compiler to check for exhaustive pattern matching, as I will show shortly. So, when pattern matching on implementations of the trait, the compiler will let us know if we have missed a possible type of jazz band (one or more implementations of the trait).
So we define our jazz band sum type in Scala as follows:
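A sketch of what that could look like, again with placeholder musician types and assumed line-ups. The describe function is hypothetical and is only there to illustrate exhaustive matching: if one of its cases were removed, the compiler would warn that the match may not be exhaustive.

```scala
// Placeholder musician types for this sketch only.
final case class PianoPlayer(name: String)
final case class BassPlayer(name: String)
final case class Drummer(name: String)
final case class TrumpetPlayer(name: String)
final case class SaxophonePlayer(name: String) // hypothetical

// The sum type: a JazzBand is a PianoTrio OR a Quartet OR a Quintet.
sealed trait JazzBand
final case class PianoTrio(pianoPlayer: PianoPlayer, bassPlayer: BassPlayer, drummer: Drummer)
    extends JazzBand
final case class Quartet(trumpetPlayer: TrumpetPlayer, pianoPlayer: PianoPlayer,
                         bassPlayer: BassPlayer, drummer: Drummer)
    extends JazzBand
final case class Quintet(saxophonePlayer: SaxophonePlayer, trumpetPlayer: TrumpetPlayer,
                         pianoPlayer: PianoPlayer, bassPlayer: BassPlayer, drummer: Drummer)
    extends JazzBand

// Hypothetical helper: every implementation of the sealed trait is handled.
def describe(band: JazzBand): String = band match {
  case _: PianoTrio => "piano trio"
  case _: Quartet   => "quartet"
  case _: Quintet   => "quintet"
}
```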
When creating these jazz bands, we want to capture the name of each musician in the band. However, if our case classes just took strings as parameters, we would lose a lot of type safety. If, for example, the drummer parameter was a string to represent a drummer’s name and not a Drummer type, we could easily just pass in the name of a bass player instead. A piano trio with two bass players instead of a bass player and a drummer would be pretty interesting but it wouldn’t be what our application expects – we don’t want this illegal state to be possible.
So the first step in preventing an illegal state like this is to create types for our musicians that each contain the musician’s name. Each one of these types is actually a product type but it only contains one thing – a string to represent the name. So there isn’t really an “AND” here but a type with one property is still called a product type. It is also possible to have a sum type where there isn’t an “OR” describing it. This is when there is only one choice. We could, for example, create a trait called PianoPlayerT and one case class implementing it – so this would be a single case sum type or single case union:
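A minimal sketch of that single case union, assuming the implementing case class is simply called PianoPlayer:

```scala
// A sum type with exactly one choice.
sealed trait PianoPlayerT
final case class PianoPlayer(name: String) extends PianoPlayerT
```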
However, to keep things a bit simpler, let's just define a separate product type as a case class for each musician as follows:
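Something along these lines, where each type wraps nothing more than a name. The hypothetical introduce function just illustrates that a parameter typed as Drummer no longer accepts a BassPlayer:

```scala
// One single-field product type per kind of musician.
final case class PianoPlayer(name: String)
final case class BassPlayer(name: String)
final case class Drummer(name: String)
final case class TrumpetPlayer(name: String)

// Hypothetical helper: passing a BassPlayer here would not compile.
def introduce(drummer: Drummer): String = s"On drums: ${drummer.name}"
```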
So this is great – instead of just passing strings around for the musician names, we have our nice types to pass around so we don’t mix up a bass player with a drummer... or do we really have this guarantee? Who is to stop someone simply creating a drummer by passing a string for the name of someone who isn’t a drummer?
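With a plain case class, nothing does. A small sketch of the problem:

```scala
final case class Drummer(name: String)

// Compiles happily, even though the name belongs to a bass player.
val notReallyADrummer = Drummer("Paul Chambers")
```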
Paul Chambers was an amazing bass player – a stalwart of hard-bop bands of the 50s and 60s – but now we’ve just made him a drummer!
In order to prevent this, we need to make it possible to create a Drummer only inside our tightly controlled Domain module. We do this by making the constructor private and exposing a “smart constructor”. This smart constructor returns an Option: the Drummer wrapped in the Some case if the name we pass in is that of a drummer, OR the None case if we pass in the name of someone who is not a drummer (notice the use of “OR” back there – Option is a sum type with two choices).
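A sketch of how this could look, assuming Scala 2.12.2 or later (where a companion may define its own apply with the same parameters as the generated one). The knownDrummers set and its contents are placeholders; the names follow the spelling used in the REPL output further down.

```scala
object Domain {

  // Private constructor: `new Drummer(...)` only compiles inside Drummer
  // and its companion object.
  final case class Drummer private (name: String) {
    // Supplying our own copy() keeps callers from using copy(name = ...)
    // to swap in an unvalidated name.
    def copy(): Drummer = new Drummer(name)
  }

  object Drummer {
    // Placeholder list of valid names (an assumption for this sketch).
    private val knownDrummers = Set("Max Roache", "Art Blakey")

    // Smart constructor: Some(drummer) for a known name, None otherwise.
    def apply(name: String): Option[Drummer] =
      if (knownDrummers.contains(name)) Some(new Drummer(name)) else None
  }
}
```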
The Drummer type is defined inside the Domain module and its constructor is private. So, from outside the Domain module, an instance of Drummer can’t be created by calling the constructor directly, as in new Drummer("Max Roache").
However, in Scala an apply method can be defined in the companion object of a type (an object/module with the same name as the type you defined). For a case class, Scala generates one under the hood, but you can provide your own by defining it in the companion object. The default implementation lets you create an instance without the new keyword, which for Drummer would mean simply calling Drummer("Max Roache").
However we have defined our own in the Drummer companion object and defined it as a “smart constructor” which will validate that the name you give it is actually the name of a drummer – in our case, we just check if the name is in a list. If it is, it will return the created Drummer wrapped in the Some Option case or if the name is not on the list, it will return the None Option case.
Scala case classes also have a copy method, though. So, in order to prevent someone simply copying a drummer instance and changing the name of the drummer, we have defined our own copy method on the Drummer case class, which simply returns a straight immutable copy of the Drummer.
From a different module, let's try to create two drummers – one valid and the other invalid.
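A sketch of that experiment. The Domain object is a trimmed stand-in for the real module, with a placeholder list of names, and Client is a hypothetical second module:

```scala
object Domain {
  final case class Drummer private (name: String)
  object Drummer {
    private val knownDrummers = Set("Max Roache") // placeholder list
    def apply(name: String): Option[Drummer] =
      if (knownDrummers.contains(name)) Some(new Drummer(name)) else None
  }
}

object Client {
  import Domain._

  // Goes through the smart constructor, so both results are Options.
  val drummer: Option[Drummer] = Drummer("Max Roache")
  val invalidDrummer: Option[Drummer] = Drummer("Paul Chambers")

  // `new Drummer("Paul Chambers")` would not compile here,
  // because the constructor is private.
}
```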
drummer evaluates to
scala> res0: Option[MISU.Domain.Drummer] = Some(Drummer(Max Roache))
and invalidDrummer evaluates to
scala> res1: Option[MISU.Domain.Drummer] = None
So, from outside the Domain module, we can’t create an invalid Drummer.
We can extend the Domain module to include our other musician types along with their smart constructors. The full domain with these and the JazzBand ADT is shown below:
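A sketch of what the extended module might contain. The name lists, the SaxophonePlayer type, the field names and the quartet and quintet line-ups are all assumptions made for illustration; only the overall shape follows the description above.

```scala
object Domain {
  // Musicians: private constructors plus smart constructors returning Option.
  final case class PianoPlayer private (name: String) {
    def copy(): PianoPlayer = new PianoPlayer(name)
  }
  object PianoPlayer {
    private val known = Set("Bill Evans", "Red Garland") // placeholder list
    def apply(name: String): Option[PianoPlayer] =
      if (known.contains(name)) Some(new PianoPlayer(name)) else None
  }

  final case class BassPlayer private (name: String) {
    def copy(): BassPlayer = new BassPlayer(name)
  }
  object BassPlayer {
    private val known = Set("Paul Chambers") // placeholder list
    def apply(name: String): Option[BassPlayer] =
      if (known.contains(name)) Some(new BassPlayer(name)) else None
  }

  final case class Drummer private (name: String) {
    def copy(): Drummer = new Drummer(name)
  }
  object Drummer {
    private val known = Set("Max Roache", "Philly Joe Jones") // placeholder list
    def apply(name: String): Option[Drummer] =
      if (known.contains(name)) Some(new Drummer(name)) else None
  }

  final case class TrumpetPlayer private (name: String) {
    def copy(): TrumpetPlayer = new TrumpetPlayer(name)
  }
  object TrumpetPlayer {
    private val known = Set("Miles Davis") // placeholder list
    def apply(name: String): Option[TrumpetPlayer] =
      if (known.contains(name)) Some(new TrumpetPlayer(name)) else None
  }

  // Hypothetical fifth musician type, used only to fill out the quintet.
  final case class SaxophonePlayer private (name: String) {
    def copy(): SaxophonePlayer = new SaxophonePlayer(name)
  }
  object SaxophonePlayer {
    private val known = Set("John Coltrane") // placeholder list
    def apply(name: String): Option[SaxophonePlayer] =
      if (known.contains(name)) Some(new SaxophonePlayer(name)) else None
  }

  // The JazzBand sum type and its three product-type cases.
  sealed trait JazzBand
  final case class PianoTrio(pianoPlayer: PianoPlayer, bassPlayer: BassPlayer, drummer: Drummer)
      extends JazzBand
  final case class Quartet(trumpetPlayer: TrumpetPlayer, pianoPlayer: PianoPlayer,
                           bassPlayer: BassPlayer, drummer: Drummer)
      extends JazzBand
  final case class Quintet(saxophonePlayer: SaxophonePlayer, trumpetPlayer: TrumpetPlayer,
                           pianoPlayer: PianoPlayer, bassPlayer: BassPlayer, drummer: Drummer)
      extends JazzBand
}
```

The band constructors stay public in this sketch: since each field can only be obtained through a smart constructor, a band built from them is valid by construction.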
Now if we want to combine name strings to form one of our jazz band types, we can chain together the creation of each musician using flatMap and map on the Option type. Calling flatMap on an Option allows you to pass in a function that itself returns an Option. If the original Option is the Some case, the function is evaluated on the value wrapped in the Some and returns another Option – so now we could potentially have a Some wrapping a Some, but flatMap flattens this down so that there is only one Some wrapper. If the original Option is None, the passed-in function is not evaluated and None is returned from flatMap. So, the first time we encounter a smart constructor for a musician that returns None, the chain of flatMaps is essentially short-circuited to return None.
This is shown in the Scala code below:
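A sketch of such a chain for the piano trio. The Domain object is a trimmed stand-in with placeholder name lists, the function name createPianoTrio is taken from the REPL session further down, and its parameter names are assumptions:

```scala
object Domain {
  final case class PianoPlayer private (name: String)
  object PianoPlayer {
    def apply(name: String): Option[PianoPlayer] =
      if (Set("Bill Evans").contains(name)) Some(new PianoPlayer(name)) else None
  }
  final case class BassPlayer private (name: String)
  object BassPlayer {
    def apply(name: String): Option[BassPlayer] =
      if (Set("Paul Chambers").contains(name)) Some(new BassPlayer(name)) else None
  }
  final case class Drummer private (name: String)
  object Drummer {
    def apply(name: String): Option[Drummer] =
      if (Set("Max Roache").contains(name)) Some(new Drummer(name)) else None
  }
  final case class PianoTrio(pianoPlayer: PianoPlayer, bassPlayer: BassPlayer, drummer: Drummer)
}

import Domain._

// Each flatMap only runs its function if the previous step produced a Some;
// the innermost step uses map because PianoTrio(...) is not itself an Option.
def createPianoTrio(pianoPlayerName: String,
                    bassPlayerName: String,
                    drummerName: String): Option[PianoTrio] =
  PianoPlayer(pianoPlayerName).flatMap { pianoPlayer =>
    BassPlayer(bassPlayerName).flatMap { bassPlayer =>
      Drummer(drummerName).map { drummer =>
        PianoTrio(pianoPlayer, bassPlayer, drummer)
      }
    }
  }
```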
This can be made much simpler using a for comprehension in Scala, which does the above for us under the hood but gives a much cleaner syntax:
Evaluating calls to this function below demonstrates both cases: Some containing the piano trio is returned when all names correspond to correct musicians, and None is returned if one of the musician names is not valid.
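A sketch of the for comprehension version, using the same trimmed stand-in Domain with placeholder name lists and assumed parameter names. The compiler rewrites the comprehension into nested flatMap calls ending in a map:

```scala
object Domain {
  final case class PianoPlayer private (name: String)
  object PianoPlayer {
    def apply(name: String): Option[PianoPlayer] =
      if (Set("Bill Evans").contains(name)) Some(new PianoPlayer(name)) else None
  }
  final case class BassPlayer private (name: String)
  object BassPlayer {
    def apply(name: String): Option[BassPlayer] =
      if (Set("Paul Chambers").contains(name)) Some(new BassPlayer(name)) else None
  }
  final case class Drummer private (name: String)
  object Drummer {
    def apply(name: String): Option[Drummer] =
      if (Set("Max Roache").contains(name)) Some(new Drummer(name)) else None
  }
  final case class PianoTrio(pianoPlayer: PianoPlayer, bassPlayer: BassPlayer, drummer: Drummer)
}

import Domain._

// Yields None as soon as any of the three generators produces None.
def createPianoTrio(pianoPlayerName: String,
                    bassPlayerName: String,
                    drummerName: String): Option[PianoTrio] =
  for {
    pianoPlayer <- PianoPlayer(pianoPlayerName)
    bassPlayer  <- BassPlayer(bassPlayerName)
    drummer     <- Drummer(drummerName)
  } yield PianoTrio(pianoPlayer, bassPlayer, drummer)
```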
scala> import MISU.Domain._
createPianoTrio("", "", "")
import MISU.Domain._
scala> res0: Option[MISU.Domain.PianoTrio] = None
scala> import MISU.Domain._
createPianoTrio("Bill Evans", "Paul Chambers", "Max Roache")
import MISU.Domain._
scala> res1: Option[MISU.Domain.PianoTrio] = Some(PianoTrio(PianoPlayer(Bill Evans),BassPlayer(Paul Chambers),Drummer(Max Roache)))
It's worth noting that, even though the constructors for the musicians (PianoPlayer, TrumpetPlayer etc.) are all private, it is still possible to pattern match on these constructors from outside the Domain module.
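A small sketch of this, with a trimmed stand-in Domain and a hypothetical Client module. The extractor that the compiler generates for a case class stays public even when the constructor is private, so the match below compiles outside Domain:

```scala
object Domain {
  final case class Drummer private (name: String)
  object Drummer {
    private val knownDrummers = Set("Max Roache") // placeholder list
    def apply(name: String): Option[Drummer] =
      if (knownDrummers.contains(name)) Some(new Drummer(name)) else None
  }
}

object Client {
  import Domain._

  // Hypothetical helper: takes a Drummer apart with a constructor pattern.
  def drummerName(drummer: Drummer): String = drummer match {
    case Drummer(name) => name
  }
}
```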
So now our application can pass around jazz bands safe in the knowledge that, when it encounters an instance of the JazzBand type, that instance has to be a PianoTrio OR a Quartet OR a Quintet, and it has to contain the types of musicians that the application expects for each of these bands.
This lets the compiler check things that we would otherwise have to write extra tests for and makes our code nice and robust – no chance of any strange piano trios with two bass players!