Compressing JavaScript with JavaScript

(This is the fifth article in the series for Golf Horse: Wordle.)

eval is a nightmare for code maintenance, but a dream for code golfers! eval allows us to create and run JavaScript on-the-fly, creating opportunities to use metaprogramming to save space.

As I was nearing completion of my submission for the Golf Horse: Wordle challenge, I had about 620 bytes of JavaScript code and ~10KB of compressed data. It looked something like this (parameters have been zeroed out to protect my lead!):

At this point, I felt I had exhausted nearly all my ideas both for improving the compression algorithm as well as reducing code size. But there was one thread I hadn’t yet pulled on: would it be possible to compress the JavaScript code itself?

First, I tried compressing the source with a Huffman decoder, because several other Golf Horse entries have publicly shared their implementation. Even with a highly optimized decoder, though, the size of the decoder was more than the savings.

With Huffman appearing hopeless, I instead started looking into simple string substitutions. I had a lot of recurring substrings in my program, such as “[0]” and “[1]” for looking into the context data, and “.map(x=>” for array operations. The general idea is as follows:

Identify common substrings in my program.
Create a mapping between unused symbols and the common substrings. Note that we have lots of unused symbols available, such as most of Unicode 0-31.
In the source code, replace common substrings with their corresponding symbols.
At run-time, re-insert the common substrings.

The decompressor

Steps #3-4 were the easy part, although I did spend quite a bit of time micro-optimizing the code:

w="";
for(x of"Program\x00with\x01substitutions\x02")
  w+="substring1#substring2#substring3".split`#`[x.charCodeAt()]||x;
eval(w)

/*
w = Programsubstring1withsubstring2substitutionssubstring3
*/

This decompressor costs about 60 bytes, so we’ll need at least that much savings to justify it.

Let’s break down the expected savings from this approach. If we have a substring of length n, applying this compressor to that substring:

Costs us n+1 bytes to store in the substring lookup string (n for the string and 1 for the # separator)
Saves n-1 bytes at each location we do the substitution (the -1 is because the symbol we substitute is 1 byte)

Doing the math, we see we save space if n=2 and there are 4 substitutions, if n=3 and we make 3 substitutions, or n>3 and we make 2 substitutions.

Next, let’s look at how we can find these profitable substitutions.

Finding common substrings

The harder part is steps #1-2, where we find common substrings to replace. The reason this is unexpectedly hard is because common substrings often overlap, and you can get very different savings depending on which substrings you choose to replace. Consider, for example, this program:

x[i++]+x[j++]+x[i++]

One option is to replace 3 occurrences of “++]“:

x[i@+x[j@+x[i@

As discussed above, this will cost us 3+1 bytes and save us 3*(3-1), for a total of 2 bytes saved. There are no further profitable substitutions.

On the other hand, had we replaced the 2 occurrences of “x[i++]“,

@+x[j++]+@

This will cost us 6+1 bytes and save us 2*(6-1), for a total of 3 bytes saved. So, this is the better substitution to make.

It seems like a hard problem (NP-complete, maybe?) to find the optimal set of substitutions, so I coded up a simple greedy algorithm:

Find every common substring with length of 2 or more.
For each such substring, calculate the savings if we were to replace it.
Do the replacement on the substring with greatest savings.
Repeat #1-3 until no more profitable replacements are found.

This seemed to work well enough, better than anything I could do by hand.

Un-optimizing for gains

Unfortunately, after running this compressor, I still wasn’t saving any bytes. However, I could still try to modify my program to make it more compressible. Often this meant increasing the size of the uncompressed code!

As a representative example, suppose we want to minimize this JavaScript:

a=x[i]+1;
b=x[i]*x[i];

Any code golfer would immediately optimize this to:

y=x[i];
a=y+1;
b=y*y;

That saves 2 bytes. Indeed, I had made many similar micro-optimizations in my code. However, had we run the original code through our compressor, it would substitute for “x[i]“:

a=@+1;
b=@*@;

This saves 9 bytes of code and costs 4+1 bytes in the decompressor, for a total savings of 4 bytes. We’re better off without the JavaScript micro-optimizations!

I spent hours experimenting with undoing micro-optimizations, as well as adjusting code style to create more common substrings. In the end, I saved 11 bytes. Probably not worth the time, but it was fun!

Compressing JavaScript with JavaScript

2 thoughts on “Compressing JavaScript with JavaScript”

David Chambers says:

March 18, 2024 at 10:35 pm

I’ll right away grasp your rss as I can not find your email subscription hyperlink or e-newsletter service.
Do you’ve any? Please let me realize so that I could subscribe.
Thanks.

1. lukeg3 says:
  
  March 20, 2024 at 6:30 am
  
  Hi, the RSS feed is available at https://www.luke-g.com/feed/. I don’t have any other subscription options currently, but it’s something I’ll consider in the future!

Compressing JavaScript with JavaScript

The decompressor

Finding common substrings

Un-optimizing for gains

2 thoughts on “Compressing JavaScript with JavaScript”

Leave a Reply Cancel reply