The Definitive Guide To Syntax Highlighting
27 September 2014
What do you expect your editor to highlight? What are the different ways that we can highlight code without calling external tools? Whilst most editors have converged on a common set of base functionality, there’s still innovation occurring in this field.
The limitation of highlighting tools is that you can’t use all of them at the same time. We’ll explore what’s available to help you choose.
I’m taking these examples from Emacs, but many of these are available on other editors too. We’ll limit ourselves to programming language highlighting that the editor itself can do, ignoring lint tools and VCS integrations.
A programmer typically expects syntax highlighting to look like this. Different lexical categories – function names, keywords, comments and so on – are each shown in a different colour. Virtually all editors provide this, with one notable exception. In Emacs, this is largely done with the major mode’s syntax table, which identifies strings and comments, and font-lock-keywords is usually used too.
Simple lexical highlighting is already useful. Syntactic mistakes, such as typos in keywords or unclosed strings or comments, become obvious.
A note on screenshots: The above image is the default colour scheme in Emacs. In other images I’ve customised the styling to only show the highlighting that’s related to the feature mentioned. The code samples aren’t particularly idiomatic or elegant; I’ve simply chosen them to show off relevant parts of the syntax.
It’s interesting to see that the default Emacs colour scheme does not choose a washed-out grey for comments, preferring to make comments prominent.
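To make lexical highlighting concrete, here’s a toy highlighter sketched in Python. It is not how font-lock works internally: the keyword list, the category names and the `highlight` function are all made up for illustration, and the language is a tiny hypothetical one with `#` comments and double-quoted strings. Comments and strings are matched before words, which is also why an unclosed string is so visible: it swallows everything up to the end of the line.

```python
import re

# Hypothetical keyword list for a tiny toy language.
KEYWORDS = {"def", "return", "if", "else", "while"}

# Alternatives are tried in order, so a keyword inside a comment or string
# is never tagged as a keyword. An unclosed string runs to the end of the line.
TOKEN_RE = re.compile(
    r'(?P<comment>#.*)'
    r'|(?P<string>"(?:[^"\\\n]|\\.)*(?:"|$))'
    r'|(?P<word>[A-Za-z_]\w*)'
)


def highlight(line):
    """Return (category, text) pairs for each highlighted span in one line."""
    spans = []
    after_def = False
    for m in TOKEN_RE.finditer(line):
        kind, text = m.lastgroup, m.group()
        if kind == "word":
            if text in KEYWORDS:
                kind = "keyword"
            elif after_def:
                kind = "function-name"  # the name following `def`
            else:
                kind = None  # ordinary identifiers get no special face
            after_def = text == "def"
        else:
            after_def = False
        if kind is not None:
            spans.append((kind, text))
    return spans


print(highlight('def greet(name): return "hi"  # say hi'))
print(highlight('x = "oops # not a comment'))
# -> [('string', '"oops # not a comment')]
```

In the second call, the would-be comment is swallowed into the string span, which is exactly the kind of mistake that lexical highlighting makes obvious.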
Extended Lexical Highlighting
Depending on your taste for ‘angry fruit salad’ highlighting, you
might choose to distinguish more lexical classes. Emacs has
font-lock-maximum-decoration to adjust how many distinct things are
highlighted, but it’s rarely used in practice.
There are a variety of minor modes that offer additional highlighting of specific syntactic features. What’s great about these minor modes is that they compose nicely, allowing them to be reused for highlighting many different languages.
This is highlight-numbers. It’s a simple, non-intrusive extension to highlighting that makes sense in pretty much every language.
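Numeric literals are easy to pick out with a regular expression. The Python pattern below is my own approximation (hex integers, and decimals with an optional fraction and exponent), not the one highlight-numbers actually uses. The lookarounds stop it from highlighting the digit in an identifier like `x2`.

```python
import re

# Approximate numeric literal (an assumption, not highlight-numbers' regexp):
# hex, or decimal with optional fraction and exponent. The lookarounds
# reject digits embedded in identifiers such as "x2".
NUMBER_RE = re.compile(
    r'(?<![\w.])'
    r'(?:0[xX][0-9a-fA-F]+|\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)'
    r'(?![\w.])'
)


def number_spans(line):
    """Return (start, end) offsets of every numeric literal in the line."""
    return [m.span() for m in NUMBER_RE.finditer(line)]


print(number_spans("x2 = 0xFF + 3.5e-2 * 42"))  # -> [(5, 9), (12, 18), (21, 23)]
```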
Fun fact: Vim has a Common Lisp highlighting mode that highlights more syntax classes than Emacs does! Here’s a screenshot. One great feature of Vim’s mode is highlighting quoted values. This is available with highlight-quoted. As pictured above, it highlights quotes, backticks, and quoted symbols.
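Here’s a rough Python sketch of the idea behind highlight-quoted: find every quote and backquote character, plus the symbol directly following one. The symbol character class is an assumption that only approximates Lisp symbol syntax, and unlike a real mode this doesn’t skip strings or comments.

```python
import re

# A quote or backquote directly followed by a symbol, e.g. 'car or `foo.
# Assumption: this character class only approximates Lisp symbol syntax,
# and quotes inside strings or comments are not excluded.
QUOTED_SYMBOL_RE = re.compile(r"(['`])([A-Za-z0-9\-+*/<>=!?_:]+)")


def highlight_quoted(code):
    """Return (quote_positions, quoted_symbols) for a piece of Lisp code."""
    quote_positions = [m.start() for m in re.finditer(r"['`]", code)]
    quoted_symbols = [m.group(2) for m in QUOTED_SYMBOL_RE.finditer(code)]
    return quote_positions, quoted_symbols


print(highlight_quoted("(mapcar 'car `(,x 'y))"))
```

The backquote before `(,x 'y)` is reported as a quote position, but since it’s followed by a list rather than a symbol, only `car` and `y` come back as quoted symbols.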
All these minor modes are matters of preference. If a major mode developer likes this extended highlighting, they tend to include it in their major mode anyway. In the above example, the major mode highlights all infix operators (in addition to the standard Ruby highlighting).
Some modes include a full parser rather than just a lexer. This enables more sophisticated highlighting techniques.
js2-mode is the best example of this. js2-mode includes a full-blown recursive-descent ECMAScript parser, plus a number of common JS extensions. This enables js2-mode to distinguish more syntax types. For example, it can distinguish parameters and global variables (pictured above).
This is an amazing achievement, and it even allows the editor to do many checks that are traditionally done by lint tools. Highlighting globals is particularly useful because use of a global is not necessarily an error, but it’s useful information about the code. js2-mode can also be configured to highlight globals specific to your current project or JS platform.
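To show what a parser buys you over a lexer, here’s a sketch that uses Python’s own `ast` module as a stand-in for js2-mode’s ECMAScript parser (so this is Python code analysing Python source, not JavaScript). For each function it classifies names as parameters, locals or globals. The scoping rules are deliberately simplified, as noted in the docstring, and `classify_names` is a hypothetical helper name.

```python
import ast


def classify_names(source):
    """For each function, map every name it uses to 'parameter', 'local'
    or 'global', roughly what a parser-based highlighter can work out.
    Simplified: only ordinary positional parameters are considered, and
    `global`/`nonlocal` declarations and nested scopes are ignored."""
    result = {}
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        params = {arg.arg for arg in func.args.args}
        # Any name assigned to in the body is treated as a local variable.
        assigned = {
            node.id
            for node in ast.walk(func)
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
        }
        kinds = {}
        for node in ast.walk(func):
            if isinstance(node, ast.Name):
                if node.id in params:
                    kinds[node.id] = "parameter"
                elif node.id in assigned:
                    kinds[node.id] = "local"
                else:
                    kinds[node.id] = "global"
        result[func.name] = kinds
    return result


sample = (
    "def scale(values, factor):\n"
    "    total = 0\n"
    "    for v in values:\n"
    "        total += v * factor * RATIO\n"
    "    return print(total)\n"
)
print(classify_names(sample))
```

Here `RATIO` and `print` come out as globals, which is the information a parser-based mode can surface without being an error.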
Emacs also offers a number of specialist highlighting modes for s-expressions.
paren-face is a simple minor mode that assigns an additional face to brackets, enabling you to style brackets separately. It’s intended to fade out the brackets, so you can focus on the rest of your code.
rainbow-delimiters takes the opposite approach. Each level of brackets is assigned a unique face, enabling you to give each one a different colour. This works particularly well when using cond, as it’s easy to spot the different boolean conditions.
By default it allows nine levels of nesting before cycling colours (rainbow-delimiters-max-face-count), but you will have to choose a tradeoff between more levels and contrast between the colours of the different levels. I settled for six levels that are very distinct (the defaults are rather subtle).
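The cycling described above is just the nesting depth taken modulo the number of faces. Here’s a minimal Python sketch, assuming a hypothetical `max_face_count` parameter that plays the role of rainbow-delimiters-max-face-count; it isn’t the package’s actual implementation.

```python
OPENERS, CLOSERS = "([{", ")]}"


def rainbow_faces(code, max_face_count=9):
    """Map each bracket's index to a face number from 1 to max_face_count,
    cycling back to 1 when nesting goes deeper than max_face_count.
    Unmatched closing brackets get face 0. Simplified sketch: bracket types
    are not checked against each other, and strings/comments are not skipped."""
    faces = {}
    depth = 0
    for i, ch in enumerate(code):
        if ch in OPENERS:
            depth += 1
            faces[i] = (depth - 1) % max_face_count + 1
        elif ch in CLOSERS:
            if depth == 0:
                faces[i] = 0  # nothing left to close
                continue
            faces[i] = (depth - 1) % max_face_count + 1
            depth -= 1
    return faces


print(rainbow_faces("(a (b (c)))", max_face_count=2))
```

With `max_face_count=2`, the third level of nesting reuses face 1. That is the tradeoff in miniature: fewer faces means more distinct colours, but nesting levels start to share them sooner.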
If you like rainbow-delimiters, rainbow-blocks applies the same technique, but colours everything according to the nesting depth. It’s fantastic for seeing nesting, but it does limit how much else you can highlight.
highlight-stages specifically targets quoted and quasi-quoted s-expressions. It enables the reader to easily spot unquoted parts of a quasi-quote, and is particularly useful if you’re nesting quasi-quotes.
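The “stage” of each atom can be computed with a tiny s-expression reader: a backquote takes the following form one stage deeper, and a comma (or `,@`) brings it back one stage. This Python sketch is my own illustration of that rule, not highlight-stages’ code. It reads a single top-level form and ignores strings, comments and plain quotes, and `stage_levels` is a hypothetical name.

```python
def stage_levels(code):
    """Return (atom, level) pairs for one s-expression: level 0 is ordinary
    code, each enclosing backquote adds 1 and each comma subtracts 1.
    Sketch only: strings, comments and plain quotes are not handled."""
    result = []
    pos = 0

    def skip_space():
        nonlocal pos
        while pos < len(code) and code[pos].isspace():
            pos += 1

    def read(level):
        nonlocal pos
        skip_space()
        if pos >= len(code):
            raise ValueError("unexpected end of input")
        ch = code[pos]
        if ch == "`":            # quasi-quote: one stage deeper
            pos += 1
            read(level + 1)
        elif ch == ",":          # unquote (or ,@ splice): one stage shallower
            pos += 1
            if code.startswith("@", pos):
                pos += 1
            read(level - 1)
        elif ch == "(":
            pos += 1
            skip_space()
            while pos < len(code) and code[pos] != ")":
                read(level)
                skip_space()
            if pos >= len(code):
                raise ValueError("unclosed list")
            pos += 1             # consume ")"
        else:                    # an atom: read up to whitespace or a bracket
            start = pos
            while pos < len(code) and not code[pos].isspace() and code[pos] not in "()":
                pos += 1
            result.append((code[start:pos], level))

    read(0)
    return result


print(stage_levels("(list `(a ,b ,@c))"))
print(stage_levels("``(a ,,b)"))  # nested quasi-quotes
```

In the nested example, `b` is unquoted twice and ends up back at stage 0, which is exactly the case where this kind of highlighting pays off.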
Standard Library Highlighting
Another school of thought is that you should highlight all functions from the language standard or standard library. Xah Lee subscribes to this philosophy, and has released JS and elisp modes that provide this.
This is difficult to do in elisp as it’s a lisp-2, and this mode can confuse variable and function slots (so list is highlighted even when used as a variable).
The default Python mode in Emacs takes a similar approach, highlighting the 80 built-in functions and some methods on built-in types. This is hard to do in general, and it can incorrectly highlight similarly-named methods on other types, or methods whose names match built-in functions.
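A naive version of standard-library highlighting shows why these false positives creep in. This Python sketch highlights every name token that matches something in Python’s `builtins` module. It’s my own illustration rather than how the Emacs Python mode does it, and it deliberately has no idea how each name is being used.

```python
import builtins
import io
import tokenize

# Every public name in the builtins module (functions, types, exceptions...).
BUILTIN_NAMES = {name for name in dir(builtins) if not name.startswith("_")}


def builtin_spans(source):
    """Naively return (line, column, name) for every NAME token matching a
    built-in, regardless of whether it's a call, a variable or a method."""
    spans = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in BUILTIN_NAMES:
            spans.append((tok.start[0], tok.start[1], tok.string))
    return spans


sample = "list = [1]\nprint(len(list))\nrecord.open()\n"
for line, column, name in builtin_spans(sample):
    print(line, column, name)
```

Both `list` (used as a variable on line 1) and `open` (a method on some unrelated `record` object) are reported, which is exactly the kind of mistake described above.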
Docstrings are conceptually between strings and comments: they’re for the reader (like comments), but they’re available at runtime (like strings). Emacs exposes separate faces for comments, strings and docstrings.
Elisp docstrings may also contain additional syntax for cross-references. Emacs will highlight these differently too (though their primary purpose is creating links in help buffers).
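Telling a docstring apart from an ordinary string needs a little syntax, not just a lexer. Python’s rule (not elisp’s) is that a docstring is a string literal forming the first statement of a module, class or function, so a sketch using the standard `ast` module looks like this. The function name `string_kinds` is hypothetical.

```python
import ast


def string_kinds(source):
    """Return sorted (line, kind, text) tuples for each string literal, where
    kind is 'docstring' or 'string'. Uses Python's rule: a docstring is a
    string literal that is the first statement of a module, class or function."""
    tree = ast.parse(source)
    docstrings = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            first = node.body[0] if node.body else None
            if (isinstance(first, ast.Expr)
                    and isinstance(first.value, ast.Constant)
                    and isinstance(first.value.value, str)):
                docstrings.add(first.value)
    return sorted(
        (node.lineno, "docstring" if node in docstrings else "string", node.value)
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant) and isinstance(node.value, str)
    )


sample = 'def greet(name):\n    "Return a greeting for NAME."\n    return "Hello, " + name\n'
print(string_kinds(sample))
```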
Some languages support elaborate syntax in their comments, both to help the reader and to aid automatic documentation tools. In this example, js2-mode offers additional highlighting of JSDoc comments.
Another important area of highlighting is to highlight elements based on where the cursor (‘point’ in Emacs terminology) is currently located.
The most basic contextual highlighting is showing the bracket that matches the bracket currently at the cursor. This is part of Emacs, but off by default; show-paren-mode will switch it on.
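Finding the matching bracket is a depth-counting scan. Here’s a minimal Python sketch, assuming only round, square and curly brackets, and ignoring brackets inside strings and comments (Emacs relies on the syntax table to handle those). The function name is hypothetical.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSING = {close: open_ for open_, close in PAIRS.items()}


def matching_bracket(code, pos):
    """Return the index of the bracket matching code[pos], or None.
    Scans forwards from an opening bracket or backwards from a closing
    one, counting nesting depth of that bracket type only. Brackets inside
    strings and comments are not skipped in this sketch."""
    ch = code[pos]
    if ch in PAIRS:
        step, partner = 1, PAIRS[ch]
    elif ch in CLOSING:
        step, partner = -1, CLOSING[ch]
    else:
        return None
    depth = 0
    i = pos
    while 0 <= i < len(code):
        if code[i] == ch:
            depth += 1
        elif code[i] == partner:
            depth -= 1
            if depth == 0:
                return i
        i += step
    return None


code = "(defun f (x) (* x 2))"
print(matching_bracket(code, 0), matching_bracket(code, 11))  # -> 20 9
```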
Highlighting the current line is a very common feature of editor
highlighting, and Emacs provides
hl-line-mode for this. This works
well for line-oriented programming languages.
When dealing with s-expressions, you can take this a step further with hl-sexp. This shows the entire s-expression under point, avoiding confusion when editing deeply nested expressions.
highlight-parentheses takes a more subtle approach. It highlights the current bracket as ‘hot’, and highlights outer brackets in progressively ‘cooler’ colours.
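Both hl-sexp and highlight-parentheses start from the same information: the list of brackets enclosing point. Here’s a Python sketch, assuming round brackets only and ignoring strings and comments; `enclosing_brackets` is a hypothetical name.

```python
def enclosing_brackets(code, point):
    """Return (open, close) index pairs of every list enclosing point,
    innermost first. Like Emacs's point, `point` sits between characters:
    it's inside a list if open < point <= close. Only round brackets are
    considered, and strings/comments are not skipped."""
    stack, pairs = [], []
    for i, ch in enumerate(code):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            pairs.append((stack.pop(), i))
    enclosing = [(o, c) for o, c in pairs if o < point <= c]
    # Enclosing lists are nested, so a later opening bracket is further in.
    return sorted(enclosing, reverse=True)


code = "(a (b (c) d) e)"
print(enclosing_brackets(code, 7))  # point just before "c"
```

The first pair is the s-expression hl-sexp would highlight. Colouring the pairs from first to last, hottest to coolest, gives the highlight-parentheses effect.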
The last example in this section is the superb highlight-symbol, which is invaluable for showing you where else the current symbol is being used. highlight-symbol is conservative and only highlights when point isn’t moving, but you can set highlight-symbol-idle-delay to 0 to override this.
highlight-symbol-mode is particularly clever in that it’s able to inspect the current syntax table. This prevents it from being confused by strings like x-1, which is usually a single symbol in lisps, but equivalent to x - 1 in many other languages.
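Here’s how a syntax table changes what counts as the same symbol. In this Python sketch, a regex character class stands in for the syntax table’s symbol constituents. Both classes are my own approximations, not copied from any Emacs mode.

```python
import re

# Assumed "symbol constituent" classes standing in for syntax tables.
LISP_SYMBOL_CHARS = r"\w\-*+!?<>=/"
C_SYMBOL_CHARS = r"\w"


def symbol_occurrences(code, symbol, symbol_chars):
    """Return the start offsets of whole-symbol occurrences of `symbol`,
    where `symbol_chars` (a regex character-class body) says which
    characters can be part of a symbol."""
    constituent = f"[{symbol_chars}]"
    pattern = re.compile(
        f"(?<!{constituent}){re.escape(symbol)}(?!{constituent})"
    )
    return [m.start() for m in pattern.finditer(code)]


print(symbol_occurrences("(setq x-1 (+ x 1))", "x", LISP_SYMBOL_CHARS))  # -> [13]
print(symbol_occurrences("y = x-1 + x;", "x", C_SYMBOL_CHARS))           # -> [4, 10]
```

With the Lisp-like table, the `x` in `x-1` isn’t an occurrence of `x` at all; with the C-like table, it is.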
There comes a point where automatic highlighting isn’t sufficient, and
you want to explicitly highlight something. Emacs provides
hi-lock-mode for this, and supports a special comment syntax that
allows other readers to see the same highlighting.
It’s also possible to configure Emacs to change how it displays the text itself.
There are several modes in Emacs for substituting strings like <= with their mathematical counterparts. Emacs 24.4 will also include prettify-symbols-mode, which provides this.
This works very well when editing LaTeX documents, but can be tricky with code. In cases like lambda (shown as λ), you’re replacing text with a shorter string, which means you end up indenting differently depending on whether you have substitutions switched on.
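The indentation problem is easy to demonstrate. This Python sketch applies a small substitution table in the spirit of prettify-symbols-mode (the table and the function names are made up for illustration) and computes where a character ends up on screen.

```python
# Illustrative substitution table in the spirit of prettify-symbols-mode.
SUBSTITUTIONS = {"lambda": "λ", "<=": "≤", ">=": "≥"}


def prettify(text):
    """Return text as it would be displayed with every substitution applied.
    Naive: doesn't check symbol boundaries or skip strings and comments."""
    for source, symbol in SUBSTITUTIONS.items():
        text = text.replace(source, symbol)
    return text


def display_column(line, column):
    """Screen column at which the buffer character at `column` is drawn."""
    return len(prettify(line[:column]))


# In the buffer, `y` is indented to sit under `x`, the first argument to list.
first = "(lambda (x) (list x"
second = " " * 18 + "y))"
print(display_column(first, 18), display_column(second, 18))  # -> 13 18
```

In the buffer both `x` and `y` are in column 18, but on screen `x` moves to column 13 while `y` stays put, so code aligned with substitutions off looks misaligned with them on (and vice versa).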
glasses-mode is a fun minor mode for users who don’t like CamelCase. It displays camel case symbols with underscores, so FooBar is shown as Foo_Bar, without changing the underlying text.
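Because glasses-mode only changes the display, the transformation is a pure function from a symbol to its displayed text. Here’s a Python sketch, assuming an underscore inserted before each capital that follows a lowercase letter or digit; the real mode is configurable in ways this isn’t.

```python
import re


def glasses(symbol, separator="_"):
    """Return how a CamelCase symbol would be displayed: a separator is
    shown before each capital that follows a lowercase letter or digit.
    The underlying text is not modified."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", separator, symbol)


print(glasses("FooBar"))     # -> Foo_Bar
print(glasses("parseJson"))  # -> parse_Json
```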
One novel approach to highlighting code is to give each symbol a different colour. You simply hash each string and assign a colour accordingly. This means that variables with similar spellings get completely different colours.
This was popularised recently by an article by Evan Brooks and color-identifiers-mode was released as a result (pictured above). KDevelop has had this feature for some time, calling it ‘semantic highlighting’. IRC clients often use this technique for nickname highlighting.
Whilst powerful, it’s tricky to get right. Too few colours, and different symbols end up the same colour. Too many colours, and it’s hard to visually distinguish some pairs of colours. In the above screenshot, encodeURIComponent and another symbol end up with quite similar colours. This small code snippet does not really take full advantage of hash-based highlighting: it’s most effective when you have a larger piece of code with more distinct symbols.
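The core of hash-based highlighting fits in a few lines. This Python sketch hashes each identifier into one of `palette_size` evenly spaced hues, so `palette_size` is the too-few/too-many-colours knob described above. I haven’t checked how color-identifiers-mode actually chooses its colours, so treat this as a generic illustration of the technique. It uses `hashlib` rather than Python’s built-in `hash`, because the built-in string hash is randomised per process and colours would change between sessions.

```python
import colorsys
import hashlib


def identifier_colour(name, palette_size=8, saturation=0.6, lightness=0.5):
    """Return a hex colour for an identifier by hashing its name into one of
    palette_size evenly spaced hues. Equal names always get equal colours;
    names differing by one character usually land on unrelated hues."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % palette_size
    hue = bucket / palette_size  # evenly spaced round the colour wheel
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


for name in ["encodeURI", "encodeURIComponent", "decodeURIComponent"]:
    print(name, identifier_colour(name))
```

Raising `palette_size` reduces collisions between different symbols but makes neighbouring hues harder to tell apart. Lowering it does the opposite.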
Finally, self-hosting environments, such as Emacs or Smalltalk, can offer additional highlighting possibilities. highlight-defined enables you to highlight functions, variables or macros that are currently defined.
This works well for spotting typos in variable names, but it’s a little more sophisticated than that. In the above image, we can see that fibonacci has been evaluated, so the recursive calls are highlighted. We can even see whether we’ve forgotten to evaluate any definitions.
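In Python, the equivalent of asking the running Emacs “is this function defined?” is looking the name up in a live namespace. This sketch is my own analogue of highlight-defined, not its implementation. For each function called by name, it reports whether it’s currently defined in a namespace dict or as a built-in. The helper name and the deliberate `fibonaci` typo are illustrative.

```python
import ast
import builtins


def defined_status(source, namespace):
    """Map each function called by name in source to True if it is
    currently defined in `namespace` (or is a built-in), else False."""
    status = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            name = node.func.id
            status[name] = name in namespace or hasattr(builtins, name)
    return status


source = (
    "def fibonacci(n):\n"
    "    return n if n < 2 else fibonacci(n - 1) + fibonaci(n - 2)\n"
)
env = {}
print(defined_status(source, env))  # nothing has been evaluated yet
exec(source, env)                   # "evaluate" the definition
print(defined_status(source, env))  # fibonacci is now defined; the typo isn't
```

As with highlight-defined, the answer depends on the state of the running process rather than on static analysis: the same source gives different results before and after evaluation.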
Pharo (a Smalltalk implementation) is also able to do this. The methods of classes (called ‘selectors’) may be changed at any point, but the environment can introspect to see if the current selector is appropriate for the value it is being called on.
This is quite different from the traditional Java-style IDE integration, as it’s based on runtime information in the current process, instead of static analysis.
In practice, however, many of the benefits of introspective highlighting are provided by calling an external language-specific lint tool from the editor.
It’s really hard to compose syntax highlighting tools. Some of the examples here are very intrusive (particularly rainbow-blocks and color-identifiers-mode), preventing you from using them in addition to other tools. The contextual highlighting tools are the best in this regard.
There’s a lot of information that could be displayed by the editor, but relatively little can be shown at once. The primary options for highlighting are only text colour, background colour, weight, lines (underline, overline, strikethrough) and fringes (colours shown at the left edge of the editor window).
If you’re writing a highlighting tool in Emacs, try to define your own faces wherever possible. For example, highlight-stages doesn’t provide a face, so it can only highlight quasi-quotes by changing the background colour. If you’re already using the background colour to highlight something else, you cannot make highlight-stages use underlines instead. I had similar problems with modes that dynamically define faces, as you can’t customise them in the normal way.
When you release a highlighting tool, please include screenshots. It’s amazing how many of the tools I’ve listed have no screenshots on their GitHub pages.
Personally, I like angry fruit salad. Lots of contrasting colours for different lexical classes, plus tons of contextual highlighting, is the sweet spot for me. Experiment, and see what suits you.