One joy of code golf is when a complex algorithm reduces to a short “magic” formula. codeglf.com hosted a challenge to find the shortest Python program to calculate the length of the longest increasing subsequence (LIS). LIS is a classic algorithmic task: given an array of numbers, what is the longest subsequence (not necessarily contiguous) that is strictly increasing? For example, given the array [3, 2, 3, 1, 4], the LIS is 2, 3, 4 of length 3. (For this particular problem, we only need to report the length, not the subsequence itself.)
I found a surprisingly short solution to this problem with a magical-looking mathematical formula. This article will explain how it’s derived.
#Calculates the length of the LIS of a list of postive integers
g=lambda a,b=0:[b:=b-(~b&b-64**x)for x in a][-1]%63Patience sorting
Solving LIS in sub-quadratic time is an interesting algorithmic problem by itself. Although speed is not the priority in code golf, I started approaching this problem by reviewing standard algorithms for LIS. One such algorithm can be phrased in terms of a card game, nicknamed “patience sorting” in reference to games like solitaire.
Let’s pretend that our input array is a deck of cards, each labeled with the corresponding number from the array. We deal the cards one by one, placing them into piles. There is only one rule: place the card face up on the first pile that shows a number at least as big, otherwise create a new pile. When we’re done, the number of piles is the length of the LIS! (It’s a bit more work to actually recover the LIS from the piles, because you need to pick a particular card out from each pile. See the Wikipedia article for all the details.) The intuition why this works is that if you have an increasing subsequence, each element must be dealt to a different pile, because the piles contain non-increasing numbers. So, the number of piles must be at least the length of the LIS.
Using our earlier example of [3, 2, 3, 1, 4], the game proceeds as follows:
- Our first card is
3, and we have no piles. So, we put it in a new pile. - Our next card is
2. It’s smaller than the3showing on the first pile, so we place it on top. Now we have one pile with2showing on top. - Our next card is
3. It’s larger than the2on our pile, so we create a new pile for it. Now we have two piles showing2and3on top. - Next is
1. We place it on the first pile, so now our piles show1and3. - Finally we have
4. It’s too large to place on any piles, so we create a new pile for it. We have three piles, showing1, 3, 4.
At the end of the game, we have three piles, and so the LIS has length 3.
Coding it up
One of my earlier attempts looked like this:
#LIS by simulating patience sorting
g=lambda a,b=[]:a and g(a,[b.pop(0)for y in b*1if y<a[0]]+[a.pop(0)]+b[1:])or len(b)ais the input (the cards we need to deal) andbcontains the numbers showing on the piles we have. We will call the function recursively to deal the cards.a and ... or len(b)is the base case of the recursion: ifais non-empty, deal the next card; otherwise, return the number of piles we have.- Our “dealing” algorithm is
[b.pop(0)for y in b*1if y<a[0]]+[a.pop(0)]+b[1:]. There are three parts:[b.pop(0)for y in b*1if y<a[0]]contains the piles ofbthat are less than the card we are dealinga[0]. We usepopto permanently remove these piles fromb, and we useb*1to clonebto prevent thepopfrom interfering with the iteration.[a.pop(0)]is the pile where we deal the card. Again, we usepopto permanently remove the card froma.b[1:]are all the piles ofbthat are left.
This comes in at 84 bytes. We can remove the recursion to streamline it a bit, yielding 77 bytes:
#77 bytes
g=lambda a,b=[]:max(len(b:=[b.pop(0)for y in b*1if y<x]+[x]+b[1:])for x in a)Switch to sets
I was stuck at 77 for a while, but the leaderboard had people with 69, so I still had work to do. At some point, I realized that our piles always show distinct numbers (because when dealing, if a pile already shows the number, you deal to that pile). So, we could try using a set instead of list. This allows us to rephrase our dealing algorithm:
#81 bytes
g=lambda a,*b:max(len(b:={*b}-{min([999]+[y for y in b if y>=x])}|{x})for x in a)Now the dealing algorithm works as follows:
- For each card
xthat we deal, - We create a set of all the piles
{*b}(this could just beb, except in the first iteration,bis a tuple, so we have to cast it to a set by doing{*b}) -{min([999]+[y for y in b if y>=x])}: Remove the smallest (i.e., first) pile with a number at leastx, using 999 as a dummy value if there are no such piles.|{x}: Addxto the piles.
We’ve lost a few bytes—now we’re at 81—but it feels like the “smallest pile” logic has room for improvement. Indeed, taking advantage that the cards are always positive integers, we can cleverly use range to subtract off the piles less than x:
#74 bytes
g=lambda a,*b:max(len(b:={*b}-{min({*b,999}-{*range(x)})}|{x})for x in a)Now we’re at 74 bytes: an improvement, but still short of the 69 on the leaderboard.
Big integers, small code
In both real code and golf, it’s often useful to use the bits of an integer to represent membership of a set, typically called a “bitset”. That is, a set of nonnegative integers \(n_i\) can be represented as \(\sum_i 2^{n_i}\). With this representation, some set operations become very streamlined: union is just bitwise OR, intersection is bitwise AND, and testing for membership is S & 1<<x.
My “eureka” moment was when I realized the smallest-pile logic could be phrased in terms of bit-twiddling operations, and the whole solution becomes easy to express with bitsets. Let b be the integer holding the bitset of our piles. To find the smallest pile no less than x, there are two steps: (1) b>>x<<x removes all piles less than x (2) p&-p is a bit-twiddling trick that only keeps the least significant bit. Putting them together, p=b>>x<<x; p&-p calculates the least significant bit that is greater than or equal to x. Our solution becomes:
#71 bytes
g=lambda a,b=0:max((b:=b-((p:=b>>x<<x)&-p)|1<<x).bit_count()for x in a)This brings us to 71 bytes, and there is a lot of potential for mathematical simplifications. For starters, ((p:=b>>x<<x)&-p) can be simplified to b&-(b&-1<<x) resulting in
#65 bytes
g=lambda a,b=0:[b:=1<<x|b^b&-(b&-1<<x)for x in a][-1].bit_count()Now we’re down to 65 bytes and overtook first place on the leaderboard! But there’s more to do. We can eliminate bit_count with a math trick. The general idea is that for a number written in base B, the sum of its digits modulo \(B-1\) is the same as the number itself modulo \(B-1\):
\(\displaystyle \sum_i d_i B^i = \sum_i d_i \mod(B-1)\)
bit_count is the sum of the digits in base 2. But we can’t use base 2 because mod 1 reduces everything to 0. We need a bigger B so that the mod operation doesn’t change the sum. For this problem, setting \(B = 2^6 = 64\) was enough, and our solution becomes
#58 bytes
g=lambda a,b=0:[b:=64**x|b^b&-(b&-64**x)for x in a][-1]%63The formula can be simplified even more, but rather than tinker with it by hand, I used pysearch to systematically search for the smallest formula. Although pysearch is limited to int64, I worked around that by searching for the formula in base 4 (using 4**x) and using small x, with the hope that it would generalize to base 64. A few minutes of searching unveils this beautiful piece of code golf:
#51 bytes
g=lambda a,b=0:[b:=b-(~b&b-64**x)for x in a][-1]%63