Notation For Balanced Indentation

Notation

The Balanced Indentation system has to be adapted to each new programming language, so it is worth having a little notation for writing down the rules. A shared notation also lets you have a sensible conversation with other people about how the system works, and avoid the unstructured mess that seems to be so prevalent.

We start with the grammar of the programming language, which is usually written in BNF. We then annotate the productions with indentation markers. The markers we use are:

  • » and « - these are the hard indent and outdent markers.
  • > and < - these are the soft indent and outdent markers.
  • || and | - these are the hard and soft nodent markers; they start a new line at the same level.

Indent, outdent and nodent markers simply indicate where a line break occurs, together with a change of indentation level of +1, -1, or 0 tabs respectively. Indent and outdent markers are paired and may be nested. An indent/outdent pair is always applied as a unit; you can't apply an indent without applying the matching outdent. Nodent markers at the same level of nesting as an indent/outdent pair are grouped with that pair and applied along with it.
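The effect of the markers can be sketched in a few lines of Python. This is only an illustration, not part of the system itself: the names DELTA and render are hypothetical, the input is assumed to be already split into tokens, and every marker is treated as applied.

```python
# Hypothetical marker table: marker -> change in indentation level.
# Hard and soft markers share the same deltas; they differ only in
# when they are applied, which this sketch ignores.
DELTA = {"»": +1, "«": -1, "||": 0, ">": +1, "<": -1, "|": 0}

def render(tokens, tab="    "):
    """Lay out tokens, treating every marker as applied."""
    lines, current, level = [], [], 0
    for tok in tokens:
        if tok in DELTA:
            # A marker ends the current line, then adjusts the level.
            if current:
                lines.append(tab * level + " ".join(current))
                current = []
            level += DELTA[tok]
        else:
            current.append(tok)
    if current:
        lines.append(tab * level + " ".join(current))
    return "\n".join(lines)

example = ["while (x) {", "»", "a();", "||", "b();", "«", "}"]
print(render(example))
```

Here the indent marker puts a(); on a new line one level deeper, the nodent marker starts b(); at that same level, and the outdent marker returns the closing brace to the original level.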

When we use a production rule, applying its hard markers is mandatory. Soft markers are applied only if the line would otherwise be too long. To make life a little easier we may have several alternative versions of a production rule, typically called short and long. Short rules are applied preferentially and long rules are fallbacks. When we need a formal notation for this, we simply write the lower priority rules with the ::== symbol instead of ::=.
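As a rough sketch of the hard/soft distinction, the Python below always applies hard markers and applies soft markers only when some line would exceed a width limit. The names HARD, SOFT, render and layout are hypothetical, and as a simplifying assumption all soft markers are switched on together rather than one matched group at a time.

```python
HARD = {"»": +1, "«": -1, "||": 0}   # always applied
SOFT = {">": +1, "<": -1, "|": 0}    # applied only when needed

def render(tokens, apply_soft, tab="    "):
    lines, current, level = [], [], 0
    for tok in tokens:
        if tok in HARD or (apply_soft and tok in SOFT):
            # Applied marker: break the line and change the level.
            if current:
                lines.append(tab * level + " ".join(current))
                current = []
            level += HARD[tok] if tok in HARD else SOFT[tok]
        elif tok in SOFT:
            continue  # unapplied soft marker: no line break
        else:
            current.append(tok)
    if current:
        lines.append(tab * level + " ".join(current))
    return "\n".join(lines)

def layout(tokens, width):
    """Try hard markers only; fall back to soft markers if too wide."""
    first = render(tokens, apply_soft=False)
    if all(len(line) <= width for line in first.split("\n")):
        return first
    return render(tokens, apply_soft=True)

call = ["f(", ">", "alpha,", "|", "beta", "<", ")"]
print(layout(call, width=20))
print(layout(call, width=10))
```

With the wider limit the call stays on one line; with the narrower limit the soft indent, nodent and outdent are applied together, which keeps the pair balanced.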

Example

In C++ we allow a special short rule for guarded returns alongside a more general long rule. Either rule could be used to write a guarded return, but we pick the short form provided the entire statement fits on a single line.

IF_EXPR ::= if ( EXPR ) return;
IF_EXPR ::== if ( > EXPR < ) { » STMNTS « }
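To see how the two IF_EXPR rules interact, here is a small hand-expanded sketch in Python that produces C++ text. The function name format_guarded_return and the default width of 40 are assumptions made for illustration; the body of the long form is fixed to a single return; statement to keep the example short.

```python
def format_guarded_return(cond, width=40, tab="    "):
    # Short rule (::=): preferred when the whole statement fits on one line.
    short = f"if ( {cond} ) return;"
    if len(short) <= width:
        return short
    # Long rule (::==): the hard markers around the body always apply.
    head = f"if ( {cond} ) {{"
    if len(head) > width:
        # The soft markers around EXPR apply only if the head is too long.
        head = f"if (\n{tab}{cond}\n) {{"
    return f"{head}\n{tab}return;\n}}"

print(format_guarded_return("p == nullptr"))
print(format_guarded_return("p == nullptr", width=24))
print(format_guarded_return("p == nullptr", width=10))
```

The first call fits and uses the short rule. The second falls back to the long rule with only the hard markers applied. The third is too narrow even for the head of the statement, so the soft markers around the condition are applied as well.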