The Computer Language Benchmarks Game is an amazing project, featuring some meticulously optimised programs.
However, it’s not always easy to interpret the results, to see why some programs or languages are faster than others. It’s also not clear how representative the test programs are of typical performance of that language.
Instead, I thought it would be interesting to measure the performance of a single aspect of programming languages, and explore why the implementation led to that performance.
The Benchmarks Game defines a pidigits benchmark which measures the performance of arbitrary size integers (‘BigInt’), and test programs are widely available.
However, many of the test programs are just using a foreign-function interface to call out to a highly-optimised C numerics library. Instead, I decided to constrain my test programs to only use BigInts provided by the language or its standard library. It also ensures our test programs are fairly readable.
Limiting our benchmark to languages with built-in BigInts significantly limits the number of languages we can test. I decided to include multiple implementations of the same language to see the variation within the same language.
We aren’t requiring all the test programs for the same language to be identical. I wasn’t able to run the same program on all Scheme implementations anyway.
By default, The Benchmarks Game shows the same test program for different implementations. It also allows users to submit programs that improve performance for one implementation at the expense of another, and we have used those programs here.
I ran these programs an AMD Phenom X4 965 with 4 GiB RAM. The kernel was Linux x86_64 3.16.4 and the exact code I used was in this git commit.
So, let’s look at the results!
Wow, we can see a huge range of performance here. Click and drag on the chart to zoom in. We’ll take a look at how the different languages have been implemented, starting with the slowest.
The slowest BigInt here is PHP. PHP implements its arbitrary size arithmetic in C. However, it doesn’t have a BigInt data type, so its arbitrary size numerical functions take and return strings! For example, check out the docs for addition. This adds huge overhead as the numbers get larger, because an increasing amount of time is spent serialising and deserialising strings representing numbers. If you watch the test program running in a shell, the linear slowdown is very clear.
The next slowest is JRuby. JRuby depends on the BigInteger.java implementation provided by the JDK. As a result, it’s unlikely to outperform the Java test program (which also users BigInteger.java). The performance difference between OpenJDK and JRuby is presumably due to the additional interpreter overhead in JRuby.
MRI has its
own BigInt implementation written in C,
which uses GMP (the GNU MP Bignum Library) when available.
ldd reports that
linked to libgmp on my system. In spite of this, performance is still
fairly slow. I presume the interpreter overhead is high in this benchmark.
Rubinius was the fastest Ruby implementation in this benchmark. The design of Rubinius is somewhat based on the design of Smalltalk-80 as described in the ‘blue book’. Much of the interpreter is written in Ruby, but the BigInt implementation is C++. Beating any GMP based implementation is very impressive.
We’re seeing a significant range of speeds for Scheme implementations. It’s interesting to note that Guile and Racket are JIT interpreters, and Gambit is interpreted in this benchmark (I couldn’t get the compiler to run on my machine).
Gambit is a self-hosting Scheme compiler (although it offers an interpreter too). Gambit implements numerics in Scheme. This is extremely impressive technically and makes this compiler more hackable.
SBCL is a very well-optimised Common Lisp compiler, with some rather elegant compiler implementation features. Its BigInt implementation is written in Common Lisp and runs very quickly. SBCL includes GMP bindings but as GMP’s memory representation of BigInt is different, it’s not a drop-in replacement (though it may be in the future).
Rust demonstrates superb performance for a young language, with BigInt being implemented in the host language. Rust is seeking to be standalone systems language, and BigInts will actually be an external library in a future release.
Python is often a slow language, but its performance here took me completely by surprise. CPython has a BigInt implementation written in C. CPython includes a number of interesting numerical optimisations, including reusing integer objects between -5 and 256. PyPy is a little slower here, as its JIT is unable to optimise this code.
Julia takes a ‘batteries included’ approach, especially to its numeric support, so it uses GMP through its FFI. Julia targets scientific computing and carefully monitors numeric performance, so the strong performance here is unsurprising.
Go is the most heavily hand-optimised language for this benchmark. Many of Go’s arithmetic functions have been written in hand-optimised assembly! If you’re not on an optimised architecture your code will use an equivalent Go implementation. The performance numbers in this benchmark are deservedly impressive.
Finally, GHC uses GMP, and its performance is nothing short of extraordinary. GHC is clearly tuned to call into GMP with extremely low overheads.
The GNU Multiple Precision Arithmetic Library is really, really fast on a wide range of hardware and over a wide numeric range. This is a result of many years of careful optimisation.
GMP isn’t perfect. It is licensed under the GNU Lesser General Public License (LGPL), which presents some problems with statically linked programs. When implementing a language you don’t want to constrain the licensing of programs compiled with your compiler.
It’s also important we don’t lose sight of hackability. A language implementation that has its own BigInt implementation is often easier to work on. In some cases, that may be a higher priority than raw performance. Any language worth its salt provides a FFI so users with major performance needs can always call GMP themselves anyway.
There’s also the interesting question of whether a language should provide fixed-size or arbitrary-sized integers by default. The Rust community has wrestled with this. Other languages, such as Julia, settle for providing 64-bit integers by default (on an x86-64 machine) in the hope that they’re ‘big enough’ for typical usage without requiring the memory allocation of arbitrary sized integers. Others have commented that hardware could support BigInts much more efficiently than it does today.
Finally, it’s very hard to do a fair test, and I strongly suspect my methodology isn’t ideal. I have collected these test programs from a variety of sources in order to have test programs that met my criteria. I’m sure some these programs have significant scope for optimisation still.
I hope you enjoyed this exploration of language implementations. Languages are not fast or slow; implementations are. All these implementations are amazing, production-ready feats of engineering, and I’ve barely scratched the surface.