Saturday, April 30, 2011

Expressive Code and the Alternative Vote

One of the joys of writing code in Scala is how expressive it looks. Instead of dealing with the minutia of handling data, I can concentrate on what the code is actually doing. With experienced Scala programmers, that goes without saying. Newcomers have a harder time, because they are still a bit confused by grammar and vocabulary to pay proper attention to what is being said, so to speak.

The most striking example of that comes from the use of Scala's collection. There are many powerful collection libraries out there, but you are usually made very aware that you are handling a collection -- code to be hidden inside a class, to avoid contaminating business logic with it. Scala's collections, on the other hand, can easily fit in the highest abstraction levels of the code.

Let me take an example from the upcoming British referendum about the adoption (or not) of Alternative Vote. In this voting system, each voter ranks the candidates in order of preference. Depending on the specifics of its implementation, it may be possible or not to leave candidates unranked. The winner is the candidate that manages to get 50% of the votes, with the the candidate with the least votes being removed and his or her votes being reassigned according to the voter's preferences until some candidate gets elected.

So, let's consider how one could implement the algorithm that decides who the winner is. Let's say we have a set of candidates, identified by their names, and a list of votes. Each vote is a list of candidates ranked by preference. From that, we have to produce the name of the winner. In other words:

def elect(candidates: Set[String], votes: Seq[Seq[String]]): String = {

So, where do we begin? There are three essential tasks: we need to tally the votes for every candidate, we need to see if the candidate with most votes has at least 50% of all votes, and we need to discover who's the candidate the the least amount of votes.

Let's say we did these tasks, and now we know who the candidate with least and most votes are, and we have a tally of votes for all candidates. In this case, we can return the winner very easily, if there's one:

    if (votesByCandidate(mostVotesCandidate) >= votes.size / 2) mostVotesCandidate

Which still leave us the problem of how to handle the (expected) usual case of no candidate reaching 50% on first tally. There's an easy solution for that, though: just remove the candidate with least votes from the pool of candidates, and try to get a winner out of that. It's easy, because we already have a function that does that:

    else elect(candidates - leastVotesCandidate, votes)

Of the three tasks posed earlier, two are pretty simple as well: deciding who's the candidate with least and most votes. We could sort all candidates by vote and take the first and last candidate, but we don't actually need to sort anything: just knowing top and bottom is enough. We can do that like this:


    val mostVotesCandidate = candidates maxBy votesByCandidate
    val leastVotesCandidate = candidates minBy votesByCandidate

Now all that's left is finding how many votes each candidate has. We could pick the first preference in each vote and do a tally on that, but some candidates may have been removed from the pool. Instead let's say the valid candidates of a vote are the ranked list of candidates for a vote that are still in the running. We can compute that for a vote by doing:

    def validCandidates(vote: Seq[String]): Seq[String] = vote filter candidates

This doesn't read right, actually. A minor complain some people have about collection methods is that filter seems to do exactly the opposite of what's wanted: if you say filter X (those for which X is true), then it will keep X instead of discarding it. So, when we say "filter candidates", it will keep these candidates in the vote, and discard the rest.

The other non-obvious thing about this line is "candidates" itself. What does it mean to say "filter candidates"? Well, "filter" takes a function which, given a value of the collection, will return true or false depending on whether that value must be kept or discarded. That means "candidates" must be a function which, given the name of a candidate, returns true or false.

However, "candidates" is a set! We declared it so in the very first line of code presented, didn't we? Well, in Scala a set is also a function that tests whether a value is present in the set or not, returning true or false accordingly. In fact, sequences and maps are also functions in Scala, the former from indices to values, and the latter from keys to values.

Well, enough about that. We can now just take the first candidate in the list of valid candidates and tally the votes... as long as all candidates are ranked in each vote. If that is not the case, then a vote may well not contain any candidates still in the running, in which case the list of valid candidates will be empty.

Since this example comes from the British referendum, and the AV proposed in that referendum does not require votes to rank all candidates, we'll deal with that. Let's say the first valid candidate of a vote may be some candidate or no one. That is,

    def firstValidCandidate(vote: Seq[String]): Option[String] = validCandidates(vote) headOption

We can then use this to get a list of first choices for all votes with a valid first candidate:

    val firstChoices = votes flatMap firstValidCandidate

The votes for a candidate are the number of first choices for that candidate. We'll make a map out of it to avoid recounting that every time.


    def votesFor(candidate: String) = firstChoices count (candidate ==)
    val votesByCandidate = candidates map (candidate => candidate -> votesFor(candidate)) toMap;

Finally, we have to do something about the possibility of no candidate reaching 50%, which is possible in a system where not all candidates are ranked. I don't know how the proposed system will do in that case, but I'll just choose the most voted candidate if there aren't more than two.

With that fix in, this is what the whole code looks like:

def elect(candidates: Set[String], votes: Seq[Seq[String]]): String = {
    def validCandidates(vote: Seq[String]): Seq[String] = vote filter candidates
    def firstValidCandidate(vote: Seq[String]): Option[String] = validCandidates(vote) headOption
    val firstChoices = votes flatMap firstValidCandidate
    def votesFor(candidate: String) = firstChoices count (candidate ==)
    val votesByCandidate = candidates map (candidate => candidate -> votesFor(candidate)) toMap;

    val mostVotesCandidate = candidates maxBy votesByCandidate
    val leastVotesCandidate = candidates minBy votesByCandidate

    if (votesByCandidate(mostVotesCandidate) >= votes.size / 2 || candidates.size <= 2) mostVotesCandidate
    else elect(candidates - leastVotesCandidate, votes)
}

While there's a Scala oddity here and there, the code is pretty clear for all that it is doing.