Trifle: Lists and CONSequences

Lists are a spectacularly important data type. Even more so for lisp languages, as a lisp program is just a text file with some list literals.

In Trifle, we want our list structure to be easy to reason about, fast, and consistent. After exploring a range of possibilities, we settled on using dynamic arrays.

The major possibilities we considered were:

Sparse arrays. These are often harder to reason about. JavaScript is the most widely used language with this feature, yet many popular list libraries (such as underscore.js) do not support sparse arrays. Sparse arrays are also slower, leading to silly situations where user functions can be faster than interpreter built-ins.

Cons cells. This is the classic lisp datastructure. It’s easy as an experienced lisper to forget how awkward cons cells are.

Lists made out of cons cells have become similar to sparse arrays in that users make simplifying assumptions about their structure. For example, dash.el assumes that all lists are proper lists. Try to write a map function that accepts arbitrary cons cell structures and throws an error if the ‘list’ is improper or cyclic. It’s hard, and it’s inefficient.

Cons cells also introduce infix syntax (1 . nil) (bizarre in a lisp) and have strange naming conventions. A cons cell is really just a 2-tuple with the first element named ‘car’ and the second element ‘cdr’. This leads to ugly function names like caddr for getting the third element of a list.

To make matters worse, cons cells are as slow as linked lists but less versatile. Some lisp machines improve performance with cdr-coding but this just allows cons cell lists to approximate dynamic arrays. Whilst some list operations benefit from using linked lists (e.g. insert!), it’s not actually possible to implement push! or pop! as functions when using cons cells.

Linked lists. Linked lists are a more popular option and fix several of our objections to cons cells. Statically typed languages often default to linked lists (e.g. Haskell) and offer stronger guarantees about the well-formedness of the lists.

Proper linked lists do provide push! and pop!, and immutable singly-linked lists allow you to share their tails, reducing memory usage. However, the performance overhead of traversing the list with poor cache locality means linked list performance is frequently outperformed by dynamic arrays.

Dynamic arrays. Dynamic arrays are fast, since you’re accessing contiguous memory. Finding their length is free and they are always well-formed. Writing a map function is now trivial.

In Trifle, lists are mutable, zero-indexed, and accessing values out-of-bounds is an immediate error. Parsing s-expressions returns lists. We also benefit from using a datastructure that will be immediately familiar to users of many other languages.

This is a fundamental design choice that can only really be changed in the earliest stages of a language’s development. Trifle lists offer a nice abstraction in that we can expose the same API for lists, strings and bytestrings. Only time and many lines of code will reveal if our decisions were the best, but we are excited to see what form Trifle programs will take.

Recent Posts

These Weeks in Remacs

This Week in Remacs

Announcing Remacs: Porting Emacs to Rust

Introspecting Glue Code

Searching A Million Lines Of Lisp