Exceptions, Notes Towards Transactional Programming

Let me begin with a very short apology for the unexpected length of this essay, which has ended up being written over three weeks. I started with the good intention of explaining some issues with handling exceptions in real programming languages, kicked off by a conversation with Luc Beaudoin. But I kept pulling in many tightly connected issues and the essay became an argument for transaction-based programming.

I have decided to frame this as a proposal for Ginger, as it is easier to see in a concrete syntax. Corrections and comments are all welcome!

[1] Introduction

One of the unresolved topics for Ginger is exception handling. Over a decade ago when we were mapping out Spice, Chris and I felt that exceptions in modern day languages were unsatisfactory. We believed there was a natural relationship with coroutines, so that a thrown exception was better understood as a simplification for an emergency switch to a supervisor thread/coroutine. But with so many other more popular topics to work on, exception handling got neglected.

I returned to the topic in April this year, prompted by trying to write Luc an essay on defensive programming (unsent). The focus of the essay was that defensive programming was (largely) a system for composing partial functions. As I wrote I realised I was explaining the topic much better than before and the essay quickly went off-track as I chased a definitive description of exceptions and exception handling. 

So rather than an essay on defensive programming, this email sets out an account of exceptions that I believe captures the notion that Chris and I were appealing to over a decade ago! It covers the motivation for throwing and catching exceptions, the implementation options of partial functions, a novel explanatory framework for exceptions and finally a proposal for how exceptions might work in Ginger.

If you are in a hurry, skip to section [5] and [6] to get a quick overview of where we're headed. The rest provides the background as to how we ended up with this viewpoint.

[2] Motivations for Throwing and Catching Exceptions

Exceptions have become a standard feature of modern programming languages. But unlike other features such as "if" or "class" they have a limited number of well-defined uses and programmers are typically strongly discouraged from using them in other ways. We can easily list these 'legitimate' uses of exception throws and catches. 

[2.1] Fast, Non-Silent Failure

A basic reason for throwing exceptions is to guarantee that the failure modes of a method are non-silent and immediate. The easiest way to appreciate this is to consider an operation such as int java.lang.Math.abs( int n ). This function is defined to return the absolute value of the integer n, which it does for all but one value, Integer.MIN_VALUE.

In the case of Integer.MIN_VALUE, the most negative integer that Java can represent, java.lang.Math.abs does not change the sign but leaves it as the most negative integer. This is most naturally regarded as a silent failure. Such a failure might permit computation to proceed for a very long time before it was obvious things had gone badly wrong. 
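The silent failure is easy to demonstrate. The following standalone check (not part of the Java library) shows Math.abs handing back a negative number:

```java
class AbsDemo {
    public static void main( String[] args ) {
        // Two's complement has no positive counterpart for Integer.MIN_VALUE,
        // so abs overflows and returns its argument unchanged.
        System.out.println( Math.abs( Integer.MIN_VALUE ) == Integer.MIN_VALUE ); // prints true
    }
}
```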

Programmers are therefore inclined to make their code fail-fast by detecting the problem conditions and throwing an exception. Here is how int java.lang.Math.abs might have been written for integers:

int abs( int n ) {
    if ( n >= 0 ) {
        return n;
    } else {
        n = 0 - n;
        if ( n < 0 ) throw new IllegalArgumentException( "abs not defined for java.lang.Integer.MIN_VALUE" );
        return n;
    }
}

[2.2] Simplify Method's Main Case

The typical reason for throwing an exception in a method is to simplify the most common usage. For example, opening a file for reading might either successfully return a file-descriptor or fail in a variety of ways e.g. non-existence, no permission, interrupted by a signal, symbolic link expansion failed, too many file descriptors open, etc. So the successful main path has a very simple interface but the path we are typically not interested in has a very complicated interface.

# With exceptions: return type is suitable only for main case.
openFile : String -> FileDescriptor

# Without exceptions: return type is general but messy.
datatype OpenFile = 
    Success of FileDescriptor | 
    DoesNotExist of { path : String } | 
    NoPermission of { cause : String, path : String } | etcetc ;
openPathName : String -> OpenFile

This complexity in the return type causes further complexity when it comes to invoking the method.

# With exceptions: main case is simple.
val fd = openFile( path );
... do main processing ...

# Without exceptions: main case is entangled with failure cases.
switch openPathName( path )
case Success( fd ): 
    ...do main processing...
default: 
    error( "Could not open file", "path-name" => path )
endswitch

This simplification works only when the main success case is the sole case that needs to be considered, and because deviation from it is immediately detected. Should we need to accommodate the error cases then the catch-code is precisely as cumbersome as the switch-code.

# With exceptions: the general case is no simpler.
try {
    val fd = openFile( path );
    ... do main processing ...
} catch ( e : IOException ) {
    ... do failure processing ...
}

It should be noted that there are halfway houses, such as returning two or more results, one result per case. In languages where second and further results can be silently ignored (in effect they are optional) this can be just as neat as using exceptions in the main case and still handle the general case.

# Without exceptions but multiple "optional" results.
openFile: String -> ( FileDescriptor, IOException of { cause : String, pathname : String } );

# Main case, ignoring the error value.
val fd = openFile( path );
... do main processing ...

# General case.
val ( fd, err ) = openFile( path );
if ( err.isntAbsent ) {
    error( "Could not open file", "path-name" => path )
}
... do main processing ...

The convenience of this approach depends on the concept of an optional return value, which is not a widespread or especially desirable language trait. It also has the subtle downside that ignoring the error value can lead to delayed error detection. Experienced programmers would be likely to defensively code up the general case, sacrificing the neatness benefit entirely. The convenience in writing more compact code would not balance the risk of silent errors.

[2.3] To Redisplay an Error Condition as a Human-Friendly Message

Inspecting a variety of programmers' code, the most common reason for catching an exception appears to be to provide a more human-friendly description of something that has gone wrong. This usually means reinterpreting a low-level complaint as a fault that the user can understand and act on.

A good example would be to report a failure to open a configuration file as "Could not open configuration file ${NAME}. Check file is present and that it is readable by user ${USER} in group ${GROUP}." 

Such an error message might also be displayed somewhere more natural for the application, such as a message dialog box or a log file/daemon. 
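A minimal sketch of this pattern in Java. ConfigLoader and its message wording are invented for illustration; only the shape of the catch matters:

```java
import java.io.FileReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Properties;

class ConfigLoader {
    static Properties load( String name, String user, String group ) {
        try {
            Properties p = new Properties();
            p.load( new FileReader( name ) );
            return p;
        } catch ( FileNotFoundException e ) {
            // Reinterpret the low-level complaint as an actionable message.
            throw new RuntimeException(
                "Could not open configuration file " + name +
                ". Check the file is present and that it is readable by user " +
                user + " in group " + group + ".", e );
        } catch ( IOException e ) {
            throw new RuntimeException( "Could not read configuration file " + name + ".", e );
        }
    }
}
```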

[2.4] To Ensure Continuation past an Error Condition

GUI applications need to intercept failures and return control to the user. Furthermore, when control is returned to the user, the application must be in a clean state and the underlying virtual machine (interpreter) must be free of corruption. 

In practice the difficult part is ensuring that the runtime system is in a good state and object invariants are respected. It is especially difficult because IDEs give programmers only limited assistance in :-

  • Identifying sources of potential corruption - An example of this is that Java offers checked exceptions that are utilised by Eclipse. This is very limited because Java also supports unchecked exceptions.
  • State isolation - try-catch blocks isolate a dynamic extent - namely the part of the application state that is captured by the call-stack. This includes the nesting of function calls, the values of local variables in the dynamic extent, and any further nesting of try-catch blocks. It excludes open resources, global variables, objects in the heap and the updateable captured variables of closures or coroutines. Some programming languages also provide a means of localising state changes to the dynamic extent (e.g. C# provides "using" and Pop11 provides "dlocal".)
  • State repair - The main repair mechanism is the deletion of the potentially corrupt dynamic extent by the cutting of the call-stack back to the try-catch block. In addition, most programming languages provide a means of intercepting a propagating exception (e.g. "finally" in Java) which may be used to manually fix up the state; they also provide object finalisation that gives the programmer the opportunity to release resources when the object is confirmed as unreachable by the garbage collector.
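More recent versions of Java supply the closest mainstream analogue to C#'s "using" in the form of try-with-resources, which localises a resource to a dynamic extent by running close() however the block exits. A small self-contained illustration:

```java
import java.io.Reader;
import java.io.StringReader;

class UsingDemo {
    public static void main( String[] args ) throws Exception {
        // close() is guaranteed to run whether the block completes
        // normally or unwinds with an exception.
        try ( Reader reader = new StringReader( "config" ) ) {
            System.out.println( (char) reader.read() ); // prints c
        }
    }
}
```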

In the special case of functional programming languages, this is a reasonable strategy. Functional languages do not permit direct access to operating system resources, updates to global variables, or mutation of heap objects and captured variables. Hence the deletion of the dynamic extent is a satisfactory form of error recovery, and it is straightforward to implement this accurately as the low-level mutations that do occur are highly controlled.

In imperative programming languages, this is a very unsatisfactory situation. In the event of an exception, it is typically an obligation on the programmer to reliably return all allocated resources that are no longer usable. In most cases, it is sufficient to ensure that the resource is localised to the dynamic extent of the exception. However, if an open resource needs to be passed out of the dynamic extent (e.g. as part of a filing-system backed object) then dynamic localisation is useless.

In Java, for example, the programmer may handle this by means of a catchall.

Reader reader = null;
try {
    reader = new FileReader( file );
    ...
} catch ( Throwable t ) { // not finally.
    if ( reader != null ) reader.close();
    throw t; // propagate once the resource is released.
}

This typically works. It depends on the object that holds the resource open, the java.io.Reader in the example, having a close method to deallocate the resource. It will go wrong when extending an interface (e.g. List) with a class that holds onto an open resource (e.g. List of lines from a file); the interface has no expectation that the LinesFromFileList needs special resource management. 

What is required to make this work properly is the ability to state that an instance has a teardown action associated with the failure of the current catch extent, where 'current' refers to the catch context in force at the time of instance creation. The closest today's programming languages get to this is the finalisation action of an object. However, object finalisation is not reliable in many programming languages because of defects in the runtime machine e.g. the JVM will halt because of a failure to allocate a new resource without first determining if there are unreachable resources that might be reclaimed.

A second problem for programmers in imperative programming languages is posed by updating global variables. Global variables are guaranteed to survive the deletion of the catch context, so it is likely to be important that their content is repaired. This worry has been ameliorated by widespread improvements in programming practice, so that it is rare to see an update of a global variable. When it is necessary to update a global variable, the programmer can make use of a try-catch block to repair the value.

MyClass old_value = MyApp.globalVariable;
try {
    MyApp.globalVariable = new_value;
    ....
} catch ( Throwable t ) {
    MyApp.globalVariable = old_value;
    throw t; // propagate once the value is repaired.
}

This 'solution' suffers from exactly the same issues as the previous open-resource challenge. In practice it is more tolerable since the use of global variables is increasingly unusual. However, to solve this design issue properly requires associating an UnDo action with the current catch context, where 'current' refers to the catch context in force at the time of the assignment. On success of the catch-context the UnDo actions can be discarded.
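A minimal sketch of such an UnDo mechanism follows. All the names are invented, and a real implementation would live inside the virtual machine rather than in user code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Assignments register compensating actions against the enclosing catch
// context. On failure the actions run in reverse order; on success they
// are simply discarded.
class CatchContext {
    private final Deque<Runnable> undoActions = new ArrayDeque<>();

    void onUndo( Runnable action ) {
        this.undoActions.push( action );
    }

    void fail() {
        // Deletion of the catch context: unwind the recorded updates.
        while ( !this.undoActions.isEmpty() ) {
            this.undoActions.pop().run();
        }
    }

    void succeed() {
        this.undoActions.clear();
    }
}
```

On this model the global-variable repair becomes a single onUndo registration at the point of assignment, rather than a try-catch wrapped around every update site.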

The third problem, which is especially troublesome for programmers, is the mutation of heap objects that are left in an inconsistent state. As an exception punches its way out through call-frames, it can interrupt methods that are in the middle of updating logically related fields of a heap object. The effect is to leave those objects in a corrupted state.

Although modern programming practice has rejected the widespread use of global variable updates, it embraces object mutation strongly. Popular standard libraries (e.g. C++, Java, C#) provide mutable collections and omit Lisp-like declarative lists. This presents the conscientious programmer with a dilemma: should they sprinkle try/catch blocks liberally throughout their code, micro-managing a huge percentage of method calls, or should they adopt a strategic approach of targeting those objects that might still be reachable after the deletion of the catch context?

The sheer clumsiness of the try-catch syntax typically tips the balance. To litter the code with try-catch blocks would render it close to unreadable and decrease its maintainability. That in turn would be the indirect cause of a new generation of programming mistakes. 

But there is nothing that identifies the fact that code might run in a catch context. So this means the conscientious programmer must take another decision. Should they write all their code under the shadow of doubt, that it might be run in a catch-context and need to repair? Or should they target only those regions that may be run in a catch context? Again, experience shows that programmers overwhelmingly avoid the liberal use of try-catch blocks.

The effect of these tactical decisions is to make code exceedingly fragile, meaning that attempts to modify the code subsequently are made riskier because of the possibility of introducing new catch contexts or failing to accommodate existing catch contexts.

In fact there is not a great deal of difference between object mutation and global variable update. Both suffer from the same issue as resource management - that state changes can be concealed and survive the deletion of the catch context. On the positive side, if mutation is performed on objects only referenced from the dynamic extent, which is typical for new objects, then no repair action is necessary. 

And the best fix would be for an UnDo action to be associated with the catch-context in force at the time of the update. This is essentially the same fix as for global variable update.

The fourth case is the problem of closure-captured variables (or equivalently, coroutines). If a closure updates a captured variable, this is effectively identical to the update of a field of a heap object. It is only considered as a distinct class here because (1) recognising it requires a slightly different viewpoint and (2) no repair action is possible. Closures are relatively opaque objects and their state is typically inaccessible.

One way around this is for programmers to design their mutable closures so that they are intact at the point of exit. But this is not an adequate solution: a closure may itself call another function, and an exception may interrupt its internal working, leaving it with completely corrupt state. As a result, closures that self-mutate must be written with an exacting eye for detail. It is easy to see that the productivity of programming with such closures is heavily compromised.
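The hazard can be seen even without first-class closures; any object that updates logically related state in two steps exhibits it. In this invented sketch, the invariant between the two fields is broken when an exception escapes mid-update:

```java
// Invariant: successes <= calls, and the two are updated together.
class Counter {
    private int calls = 0;
    private int successes = 0;

    void record( Runnable work ) {
        this.calls += 1;
        work.run();            // an exception here escapes...
        this.successes += 1;   // ...and this second update never runs.
    }

    int calls() { return this.calls; }
    int successes() { return this.successes; }
}
```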

An alternative approach is for the conscientious programmer to eschew the mutation of captured variables. Unfortunately none of the popular IDEs or programming languages offer help. As a consequence of this and many other practical issues, conscientious programmers tend to adopt defensive styles that can be easily policed and, if followed, will prevent or highlight the problem. For example, in Java the programmer may elect to use the "final" modifier on local variables wherever possible - and the absence of a "final" modifier is a powerful signal that a variable is stateful.

It is a matter for genuine public concern that writing non-stop applications is so challenging. Speaking personally, I have no doubt that it is impractical for software developers to meet this challenge at reasonable cost - or even to get close. Until this issue is properly solved, buggy applications, defective servers and slowly corrupting operating systems will remain an uncomfortable and sometimes frightening fact of life. In other sections, we discuss a potential solution - integrating transactional execution into virtual machines.

[2.5] Short-Circuit Non-local Return

One final use of exceptions is generally frowned on, as a means of short-circuiting the return from recursively nested function calls. However the technique persists for the simple reason that the alternatives are obscure, clumsy and/or grossly inefficient.

// Answer is assumed to be an unchecked exception class carrying the found Tree.
void findGoodTree() {
    if ( this.isGood() ) throw new Answer( this );
    this.leftTree().findGoodTree();
    this.rightTree().findGoodTree();
}

Tree findGoodTreeInBinaryTree() {
    try {
        this.findGoodTree();
    } catch ( Answer ans ) {
        return ans.getTree();
    }
    return null;
}

Programmers are encouraged to find a more elegant solution. In the above case this could be handled by representing a failure by null and explicitly testing for null.

Tree findGoodTreeInBinaryTree() {
    if ( this.isGood() ) {
        return this;
    } else {
        Tree lhs = this.leftTree().findGoodTreeInBinaryTree();
        if ( lhs != null ) {
            return lhs;
        } else {
            return this.rightTree().findGoodTreeInBinaryTree();
        }
    }
}

In a simple situation such as this, the rewrite does not seem particularly problematic. But as soon as the recursion becomes more complicated, or the body of the method involves loops and conditionals, the loss of the early exit can quickly make the code hard to maintain.

The root issue seems to be that the early-exit is short-circuiting the complete search. A more elegant but decidedly inefficient solution might look like this:

// We assume some imaginary subclasses that implement List. SingletonList builds an immutable
// List with only one item in it. Lists creates an immutable list from a collection of lists
// without expanding the sublists.
List<Tree> findAllGoodTreeInBinaryTree() {
    if ( this.isGood() ) {
        return new SingletonList<Tree>( this );
    } else {
        return(
            new Lists(
                this.leftTree().findAllGoodTreeInBinaryTree(),
                this.rightTree().findAllGoodTreeInBinaryTree()
            )
        );
    }
}

Tree findGoodTreeInBinaryTree() {
    List<Tree> trees = this.findAllGoodTreeInBinaryTree();
    return trees.isEmpty() ? null : trees.get( 0 );
}

There are two ways to transform this inefficient solution into an efficient one. The first is to introduce explicit computational deferral. In the following example the "wide" brackets [| … |] are used to explicitly mark computations as deferred. This language extension requires low-level support in the virtual machine but is amenable to simple but deep compile-time optimisations, making it unexpectedly efficient.

List<Tree> findAllGoodTreeInBinaryTree() {
    if ( this.isGood() ) {
        return new SingletonList<Tree>( this );
    } else {
        return(
            new Lists(
                this.leftTree().findAllGoodTreeInBinaryTree(),
                [| this.rightTree().findAllGoodTreeInBinaryTree() |]
            )
        );
    }
}

The second is to introduce coroutines into the language; this is the route taken by C# when it recently added iterators. Coroutines avoid the need for working with collections of solutions at the expense of programmer directed suspensions. 

In the following Java-like pseudo-code, we imagine a coroutine extension to Java in terms of a class Coroutine< InputChannelType, OutputChannelType > that implements Iterable< OutputChannelType >. This example is rather bulky but generalises especially nicely.

// Input channel not used in this example, set to (imaginary) Null type.
class AllGoodTreeInBinaryTree extends Coroutine< Null, Tree > {
    void scan( Tree t ) {
        if ( t.isGood() ) {
            // Yield is a built-in method that freezes the dynamic extent back to the
            // active call of the initialiser as applied to this. The argument to
            // yield is sent to the output channel.
            this.yield( t );
        }
        this.scan( t.leftTree() );
        this.scan( t.rightTree() );
    }
 
    // Initialiser is an entry point for the coroutine. When the initialiser yields the
    // constructor will return with the dynamic extent frozen in. Subsequent calls to
    // resume will cause the dynamic extent to be unpacked and restarted from where it
    // left off. Arguments to resume would be sent to the input channel.
    AllGoodTreeInBinaryTree( Tree t ) {
        this.scan( t );
    }
}

// This code takes advantage of the fact that Coroutine implements Iterable.
Tree findGoodTreeInBinaryTree() {
    for ( Tree t : new AllGoodTreeInBinaryTree( this ) ) {
        return t;
    }
    return null;
}

// This is an alternative version that shows how to drive the coroutine explicitly.

Tree findGoodTreeInBinaryTree() {
    AllGoodTreeInBinaryTree all = new AllGoodTreeInBinaryTree( this );
    OutputChannel< Tree > out = all.outputChannel();
    return out.isReady() ? out.get() : null;
}

[3] Partial Methods/Functions

The need for exceptions arises because methods cannot always complete successfully, either because of their actual parameters (e.g. divide by zero) or because of the runtime context (e.g. device not connected). In this section we focus on the four practical ways in which programmers handle such partial functions: undefined behaviour, totalising, exceptions and policies.

[3.1] Undefined Behaviour

We can simply refuse to fully define the action of a method beyond its intended input range or "source", following James Anderson's terminology. The benefit of this approach is that it leaves the programmer a great deal of freedom, which can be exploited for runtime performance or development productivity. The disadvantage is, of course, that it creates the potential for the program to enter a region of uncontrolled behaviour and perhaps cause significant damage.

Additionally, there are shades of "undefined behaviour". Perhaps it is only guaranteed that the function completes, but the results are unpredictable, undefined values. Perhaps the results are meaningful but underconstrained. A good example is provided by the "%" operator in the C language, which in older C standards did not specify the sign of the result when an operand is negative.

In general, leaving the consequences of a mistake wide-open is a tactic that is better suited for specification than implementation. The conscientious programmer would never permit their programs to enter a region of undefined behaviour, nor would they rely on implementation-defined behaviour except where that restriction was explicitly sanctioned by the lead architect and documented for all subsequent workers.

So how must the conscientious programmer work with functions that have unacceptable consequences for mistakes - or "knitting with barbed wire"? They must arrange for the method's pre-condition to be satisfied on entry. Logically speaking, associated with every partial method is a guard. The following pseudo-code assumes that "this.mymethod_vanguard( inputs… )" will fetch and run the guard for the call "this.mymethod( inputs… )".

Chaining multiple methods "foo", "bar", "gort" together looks something like this:

assert( this.foo_vanguard( arg1, arg2 ) );
Result arg3 = this.foo( arg1, arg2 );
assert( this.bar_vanguard( arg1, arg2, arg3 ) );
Result arg4 = this.bar( arg1, arg2, arg3 );
assert( this.gort_vanguard( arg1, arg2, arg3, arg4 ) );
Result arg5 = this.gort( arg1, arg2, arg3, arg4 );

It is typically the case that many of the asserts can be eliminated because the post-condition of the previous call sets up the pre-condition of the next call. Hence in practice this code might be streamlined and look more like this.

assert( this.foo_vanguard( arg1, arg2 ) );
Result arg3 = this.foo( arg1, arg2 );
Result arg4 = this.bar( arg1, arg2, arg3 );
Result arg5 = this.gort( arg1, arg2, arg3, arg4 );

However, experience has shown that eliminating the internal asserts is a mistake. Firstly, programmers have a very high probability of making mistakes when chaining partial functions. On the basis of code reviews that I have conducted over the years I estimate the error rate commonly ranges as high as 30% (of uses of partial functions) and never sinks below a base rate of 2%. The conscientious programmer would therefore retain all the internal asserts, which are optionally compiled, but make the initial check mandatory.

if ( ! this.foo_vanguard( arg1, arg2 ) ) throw new IllegalArgumentException();
Result arg3 = this.foo( arg1, arg2 );
assert( this.bar_vanguard( arg1, arg2, arg3 ) );
Result arg4 = this.bar( arg1, arg2, arg3 );
assert( this.gort_vanguard( arg1, arg2, arg3, arg4 ) );
Result arg5 = this.gort( arg1, arg2, arg3, arg4 );

This is not pleasing code to read and it comes as no surprise that modern programming languages and libraries have tended to eliminate the use of undefined behaviour.

[3.2] Totalisation

[3.2.1] Motivation

Another popular approach to working with partial functions is to force the function to be total by making the returned values some kind of union of 'success' and 'failure' values. A good example of this is java.lang.String.indexOf, which returns the index of a successful search and -1 when unsuccessful. Another, somewhat weaker, example is given by java.util.Map.remove, which returns the value removed if successful and otherwise returns null - a 'failure' value that overlaps with the 'success' values.
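The two behaviours of indexOf, side by side:

```java
class IndexOfDemo {
    public static void main( String[] args ) {
        // Success: a valid index. Failure: the reserved value -1.
        System.out.println( "banana".indexOf( 'n' ) );  // prints 2
        System.out.println( "banana".indexOf( 'z' ) );  // prints -1
    }
}
```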

The downside is that the return values have to be processed using a relatively clumsy conditional switch. Furthermore, what constitutes a failure value is not consistent between functions and so the fact that the code is dealing with error conditions may be obscure.  And often the code has no good continuation and simply throws an error - so all that has happened is that the obligation to deal with an error condition has been passed upwards with the inevitable code-duplication as a result.

Composing code of this kind is much the same as for undefined behaviour except that the test is a post-condition rearguard. 

Result arg3 = this.mymethod( arg1, arg2 );
assert( this.mymethod_rearguard( arg3 ) ); // throws exception unless satisfied.

And the conscientious programmer applies the same logic of human fallibility to their code and ends up with:

Result arg3 = this.foo( arg1, arg2 );
this.foo_rearguard( arg3 );
Result arg4 = this.bar( arg1, arg2, arg3 );
assert( this.bar_rearguard( arg4 ) );
Result arg5 = this.gort( arg1, arg2, arg3, arg4 );
assert( this.gort_rearguard( arg5 ) );

And just as for undefined behaviour, this code is so visually unpleasing that many programmers will seek to eliminate any intermediate rearguard checks that can be proved redundant. Again, this creates code that is prone to break when modified in team environments, which is the norm for today's commercial work. I call this weakness in the code "change-brittleness" or simply "brittleness". Programmers will recognise the surprisingly common scenario where code has become so brittle that the team is scared to attempt modifications.

There are four common and important tactics used for totalising returns: 

  • Returning a reserved value of the normal return type T that is otherwise unused.
  • Widening the return type to allow the inclusion of error values: Option< T >
  • Widening the return type to be a list: List< T >
  • Returning both the normal value and a status code.

Each of these is examined in turn.

[3.2.2] Repurposing Otherwise Unused Values

Repurposing an otherwise unused value is probably the most common solution. However, it has the disadvantage that it hides the error value in amongst normal returns and so the type checker cannot offer much support. In addition, there is an ad hoc element to this solution, meaning that programmers and reviewers have to figure out the right value for each function. A couple of widespread defaults ease the ad hoc element.

  • The reserved value of a reference type should be null
  • The reserved value of an integer type of any width (>1) should be -1

A more subtle problem with reserved values is that they make the code brittle. A good example here is java.util.Map.get( Object key ), which returns the value associated with a key in a map. If the key is not in the map, however, the value null is returned. Programmers reading the manual would have felt justified in interpreting a null return as a missing entry. But when map implementations permitting entries with null values became standard, this interpretation and all the code written depending on it was broken.
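The ambiguity is easy to reproduce with java.util.HashMap, which permits null values:

```java
import java.util.HashMap;
import java.util.Map;

class NullAmbiguity {
    public static void main( String[] args ) {
        Map<String, String> m = new HashMap<>();
        m.put( "key", null );
        // get() cannot distinguish "absent" from "present but null"...
        System.out.println( m.get( "key" ) == m.get( "missing" ) ); // prints true
        // ...so containsKey() is needed to disambiguate.
        System.out.println( m.containsKey( "key" ) );               // prints true
        System.out.println( m.containsKey( "missing" ) );           // prints false
    }
}
```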

[3.2.3] Widening the Return Type to a Disjoint Union of Success and Failure Types

A much more direct solution is to promote the return type into a union-type. And it is possible to capture this generically.

abstract class Option< T > {
    abstract boolean isOK();
    abstract T getSuccess();
}

class OK< T > extends Option< T > {
    T value;
    OK( T x ) { this.value = x; }
    T getSuccess() { return this.value; }
    boolean isOK() { return true; }
}

class Fail< T > extends Option< T > {
    String message;
    Fail( String s ) { this.message = s; }
    String getMessage() { return this.message; }
    T getSuccess() { throw new RuntimeException( this.message ); }
    boolean isOK() { return false; }
}

The composition of partial functions using this approach is fairly neat. This is the composition when we are ignoring the failure result.

Result arg3 = this.foo( arg1, arg2 ).getSuccess();
Result arg4 = this.bar( arg1, arg2, arg3 ).getSuccess();
Result arg5 = this.gort( arg1, arg2, arg3, arg4 ).getSuccess();

Unfortunately there is a serious disadvantage, which is that returning a successful result entails allocating an extra heap record. The cost of this in the majority of programming languages is also a factor that discourages programmers. Of course in some implementations the compiler is smart enough to avoid allocating the store; functional language implementations typically will avoid the overhead and so can the best Java compilers.

[3.2.4] Widening the Return Type to be a List of Success Types

Another widening that is occasionally very natural is to widen a return value R to List< R >. A candidate would be a method like java.lang.String.indexOf. A natural generalisation of this method would be one that returns all the possible indexes rather than just the first.

    List< Integer > String.findAllIndexesOf( char ch ) 

An immediate problem is that this might find many solutions, which is wasted work. To keep that under control we might pass in a limit for the number of solutions, which is simple but a little ad hoc. The alternative is to return a dynamic list, or use lazy evaluation, which eliminates the ad hoc quality at the expense of retaining a little more store.
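A minimal eager sketch of the widened method. The helper is hypothetical, written as a static method since String cannot be extended:

```java
import java.util.ArrayList;
import java.util.List;

class FindAll {
    // All indexes at which ch occurs in s; the failure case is simply
    // the empty list, with no reserved value needed.
    static List<Integer> findAllIndexesOf( String s, char ch ) {
        List<Integer> result = new ArrayList<>();
        for ( int i = 0; i < s.length(); i++ ) {
            if ( s.charAt( i ) == ch ) {
                result.add( i );
            }
        }
        return result;
    }
}
```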

This kind of widening is quite elegant in the sense that the success case represents the set/list of all solutions and the failure case is the empty set/list so that there's no sharp division between the two outcomes. Composing these functions depends a great deal on whether or not an error should be generated on failure or simply no action. The following version of the running example shows both.

Result arg3 = this.foo( arg1, arg2 ).get( 0 ); // Throw exception.
for ( Result arg4 : this.bar( arg1, arg2, arg3 ) ) { // Silent.
    for ( Result arg5 : this.gort( arg1, arg2, arg3, arg4 ) ) {}
}

Programmers are easily confused by the way in which these widenings interact. Here is a typical question from a programmer on Stack Overflow, for whom the uses of null and the empty list have become muddled.

is it better to return null or empty collection

that's kind off a general question (but I'm using C#), what's the best way (best practice), do you return null or empty collection for a method that has a collection as a return type ?

As phrased, the question has no absolute answer. It is necessary to clarify whether the empty/null result is supposed to represent a failure case and, if so, whether there is also a successful empty result that has to be represented.

This approach to totalisation is a very neat generalisation. Unfortunately it is only available in a few situations and has performance drawbacks. 

[3.2.5] Widening the Return to be a Pair of Success and Failure Types

The last common tactic for totalisation is to return both failure and success values. In languages that support optional multiple values this may be quite an attractive technique. However, it immediately raises the question of what failure value should be returned in the event of success and, vice versa, what success value should be returned in the event of failure. We can call these the redundant results.

It is usually straightforward to design a good solution for the redundant failure. A typical answer is to return a null value in the case of success. The other way round may be a good deal trickier.

There are two common approaches. The first is to leave the redundant success value undefined. This is an ugly prospect, as the conscientious programmer is obliged to nullify the variables after every call to prevent store leakage or corrupt destructors running. The second is to return a defined 'default' value that may or may not overlap with a valid return value. This tactic may be satisfactory if there is no overlap between redundant and non-redundant returns. If there is an overlap, the programmer will most likely have to code carefully around the cases.

The composition in this situation will typically look like this (pseudocode, in an imagined language with multiple return values).

Result arg3, Error err3 = this.foo( arg1, arg2 );
if ( err3 != null ) throw new RuntimeException( err3.getMessage() );
Result arg4, Error err4 = this.bar( arg1, arg2, arg3 );
assert( err4 == null );
Result arg5, Error err5 = this.gort( arg1, arg2, arg3, arg4 );
assert( err5 == null );

This is unpleasant so programmers will be under pressure to inline and simplify the intermediate code, relying on ad hoc dynamic type checks. If programmers go down this route the code will quickly become brittle.
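Java itself has no multiple return values, so this tactic usually surfaces as an explicit pair type. A minimal sketch (the name Outcome is invented for illustration):

```java
// A success/failure pair: on success the error slot is null (the
// redundant failure), on failure the result slot is null (the
// redundant success).
record Outcome<T>(T result, String error) {
    static <T> Outcome<T> ok(T result) { return new Outcome<>(result, null); }
    static <T> Outcome<T> fail(String error) { return new Outcome<>(null, error); }
    boolean isOK() { return error == null; }
}
```

The null-filled slots are exactly the redundant results discussed above.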

[3.2.6] Summary of Totalisation

Although there are a variety of techniques for making partial functions total, no technique can be uniformly applied without some unattractive drawback. Furthermore, the tendency to opportunistically re-purpose values to represent failures leads to ad hoc and unmemorable interfaces.

This sets the scene for handling partial functions with exceptions. This feature promises a uniform mechanism for errors, one in which ignoring errors is fail-fast and which has an efficient runtime implementation.

[3.3] Exceptions

Exceptions are now an established feature of modern programming languages, and Java is an excellent, feature-rich example. But despite considerable evolution, exceptions have become difficult to master, accumulating advanced features such as exception hierarchies, checked exceptions, stack-trace captures, finally-clauses, using-statements (in C#) and nested rethrown exceptions.

My own experience of reviewing code suggests that ordinary working developers are deeply confused about the proper use of exceptions and their misunderstandings translate into fragility and brittleness (input-sensitive and change-sensitive designs). Expert commentaries on the topic lead to indeterminate conclusions (e.g. http://tutorials.jenkov.com/java-exception-handling/checked-or-unchecked-exceptions.html), suggesting that this plethora of features misses the target in some hard-to-define way.

In this section, I intend to pinpoint the key weaknesses of the general exception handling mechanism, exposing it as an unfit mechanism for Object Oriented Programming. This paves the way for a new mechanism that we propose for Ginger. 

The basic mechanism is triggered by a "throw" or "raise" of an exception value, followed by an unwinding of the call-stack (or dynamic extent) back to a matching "catch" scope. If no match is found then the exception unwinds all the way to the start and the application exits.

Using exceptions has two powerful advantages. Firstly, composing functions while ignoring the failures is both neat and fail-fast. Secondly, it is a uniform mechanism that does not require a detailed understanding of ad hoc, re-purposed return values.

Here is the code using exceptions. It is identical to the code that would be used with total functions. 

Result arg3 = this.foo( arg1, arg2 );
Result arg4 = this.bar( arg1, arg2, arg3 );
Result arg5 = this.gort( arg1, arg2, arg3, arg4 );

If we need to intercept the failures and attempt reprocessing, the syntax quickly becomes clumsier. Each of the methods foo, bar and gort potentially has its own failure mode, represented by the exception types FooException, BarException and GortException. And because a code region might generate many types of exception, we have to pick these out precisely. This is exacerbated in Java by the fact that the try-catch construct is always a statement and never an expression.

// Running example with failures trapped - Java syntax.
Result arg3;
try {
    arg3 = this.foo( arg1, arg2 );
} catch ( FooException e3 ) {
    arg3 = e3.getResult();
}
Result arg4;
try {
    arg4 = this.bar( arg1, arg2, arg3 );
} catch ( BarException e4 ) {
    arg4 = e4.getResult();
}
Result arg5;
try {
    arg5 = this.gort( arg1, arg2, arg3, arg4 );
} catch ( GortException e5 ) {
    arg5 = e5.getResult();
}
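One way to tame this boilerplate in Java is a small generic helper that turns a try-catch back into an expression. This is a sketch; Recover and orOnFailure are invented names, not standard API:

```java
import java.util.function.Function;
import java.util.function.Supplier;

class Recover {
    // Run the call; if it throws the given exception type, substitute the
    // fallback value computed from the exception. Other exceptions propagate.
    static <T, E extends RuntimeException> T orOnFailure(
            Supplier<T> call, Class<E> exType, Function<E, T> fallback) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            if (exType.isInstance(e)) return fallback.apply(exType.cast(e));
            throw e;
        }
    }
}
```

With such a helper, each step of the running example collapses back to a single line, e.g. `Result arg3 = Recover.orOnFailure(() -> this.foo(arg1, arg2), FooException.class, FooException::getResult);`.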

In writing the above code there is a vital assumption, which is that at the point of the exception handling the runtime state is sound. This is a use of exceptions as an alternative return from the code. 

But what assurance do we have that the runtime state is correct? The unwinding of the call stack does guarantee the disposal of local variables that might be left in a confused state. However, it does not undo any persistent side-effects, such as updates to global variables or to reachable heap store, nor does it close any resources that become unreachable due to the unwinding. And there are more subtle effects, such as the heap allocation of "lexically captured" local variables, which can lead to persistent side effects.

The answer is that we rely very largely on the programmer's ability to repair the runtime state on a call-by-call basis. That is to say every single function call in an application must be considered as a source of runtime corruption. This is potentially an immense challenge in anything other than small examples. In reality, programmers will adopt defensive styles to protect themselves against this unreasonable load.

A good example of defensive practice is the adoption of immutable objects. Once constructed, immutable objects are incorruptible and so no further consideration needs to be applied. Another tactic available in C# is to tie all resources to "using" statements, which forces their recovery on failure. Less elegant is the use of finally clauses, which also support guaranteed recovery.
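Java's analogue of the C# using statement is try-with-resources, which guarantees that close() runs even when an exception propagates. A self-contained sketch with a toy resource (the class names are invented for illustration):

```java
// A toy resource that records whether it was closed.
class Tracked implements AutoCloseable {
    static boolean closed = false;
    @Override public void close() { closed = true; }
}

class Demo {
    static void run() {
        try (Tracked t = new Tracked()) {
            // Fail part-way through the resource's life-cycle.
            throw new IllegalStateException("mid-life-cycle failure");
        } catch (IllegalStateException e) {
            // By the time control arrives here, close() has already run.
        }
    }
}
```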

These tactics arise from the requirement that an object is properly finalised, by which we mean that it runs through a valid "life-cycle". The concept of life-cycle is central to object-oriented programs. Objects are created, initialised, configured, run through a sequence of changes and finally retired. Failure to adhere to a life-cycle puts the integrity of the objects at risk and further risks resource leaks. The propagation of exceptions is plainly in conflict with life-cycle completion.

How can the working programmer ensure that all "live" (or "reachable") objects are operating within their life-cycle restrictions? The answer for today's programmers is grim: either they simplify the life-cycle to a bare minimum or they assiduously manage the consequences of every possible propagating exception. In everyday commercial work, neither option is viable, and the outcome is that large applications are always, without exception, full of subtle defects.

[3.4] Dynamic Resolution: Policies

These options for handling exceptions are not mutually exclusive. As a result there can be pressure to produce many variants of a single method with different actions on failure. This has the potential to create a good deal of confusion. One way to tame this proliferation of variants is to parameterise the exceptional action with a "Policy".

A good example is provided by the java.util.Map.get( Object key ) method, which is defined to return null if the key is not in the map. This often leads to rather tedious in-line code.

int n = 0;
{
    Integer tmp = this.map.get( key );
    if ( tmp != null ) n = tmp;
}
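It is worth noting that Java 8 later addressed exactly this case with Map.getOrDefault, which amounts to a per-call default-value policy:

```java
import java.util.Map;

class GetOrDefaultDemo {
    static int lookup(Map<String, Integer> map, String key) {
        // One call replaces the temporary-variable dance above.
        return map.getOrDefault(key, 0);
    }
}
```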

It would be a good deal more elegant if the map could be assigned a Policy to be applied if the entry was not found.

class DefaultValuePolicy< Key, Value > extends Policy {
    Value def;
    DefaultValuePolicy( Value t ) { def = t; }
    
    Value onFailure( Key key ) { return this.def; }
}

this.map.setPolicy( new DefaultValuePolicy<>( 0 ) );
...
int n = this.map.get( key );

If it was the case that the key is required to be in the map then the policy could be set to raise an exception.

class ExceptionPolicy< Key, Value > extends Policy {
    Value onFailure( Key key ) {
        throw new RuntimeException( "No value for this key in map: " + key.toString() );
    }
}
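The Policy classes above are fragments; a self-contained Java sketch of the whole idea might look like this, where Policy and PolicyMap are invented names and the default policy raises an exception:

```java
import java.util.HashMap;
import java.util.Map;

// A map whose miss-behaviour is a pluggable Policy.
interface Policy<K, V> {
    V onFailure(K key);
}

class PolicyMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private Policy<K, V> policy = key -> {
        throw new RuntimeException("No value for this key in map: " + key);
    };

    void setPolicy(Policy<K, V> p) { this.policy = p; }
    void put(K key, V value) { this.map.put(key, value); }

    V get(K key) {
        V v = this.map.get(key);
        return v != null ? v : this.policy.onFailure(key);
    }
}
```

Installing `setPolicy(k -> 0)` recovers the default-value behaviour; leaving the policy alone gives the exception-raising behaviour.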

[4] Explanatory Framework for Exceptions

To motivate our proposal for Ginger, we put forward a model of programming with exceptions. Throughout the previous sections we have implicitly appealed to this model - now we put the different parts together.

The primary role of exceptions is to ensure that an application only executes while its runtime state is believed to be sound. Exceptions are either used to abort execution, to prevent execution when the runtime state might be corrupt, or handled in order to force repair and continuation from a known safe point.

The secondary role is to provide situational information about the cause of the exception, in order to assist operators in progressing beyond the problem.

A third and entirely distinct role of exceptions is to provide a "featherweight" idiom for returning the first value from a channel written to by a 1-shot process. It is our view that this is an inappropriate use of exceptions and does not need to be factored into the design of Ginger exceptions. It is, of course, an important programming idiom that needs to be supported adequately.

The primary role accounts for most of the features of exceptions. Firstly it accounts for why unhandled exceptions should bring the running process to an abrupt halt; the process is halted because the runtime state may be corrupt. Secondly, it explains why handled exceptions propagate outwards; it cuts away sections of the call-stack that may be corrupt and the programmer has no way of manipulating. Thirdly it explains why exceptions come in several types; programmers have to manually force the completion of object life-cycles and these may only be available in some instances.

It also accounts for the distinction between Checked and Unchecked exceptions. Checked exceptions are intended to be handled and hence the programmer is prompted in order to ensure the correct repair and forced continuation action.

The secondary role accounts for the message content of exceptions. These provide human-readable defaults. Furthermore, the segregation of exceptions into runtime types is also helpful for applications to perform message-enhancement, so that a low level message about (say) a failure to parse a file can be converted into a high level message about a malformed configuration file with "an error on or around line 150". 

Note that programmers often find themselves in a conflict between providing good error messages and efficiency. It is clearly wasteful to provide a detailed message and contextual information required for high quality reporting if the exception is going to be handled and the situation repaired. But there is no provision for programmers to anticipate the difference between handled and unhandled exceptions and the effect can be poor reporting.
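One common workaround is to defer the cost rather than anticipate handling: build the message lazily behind a Supplier, so that the expensive formatting is only paid for if the message is actually read. A sketch, where LazyMessageException is an invented name:

```java
import java.util.function.Supplier;

// An exception whose message is only built on demand.
class LazyMessageException extends RuntimeException {
    private final Supplier<String> messageSupplier;

    LazyMessageException(Supplier<String> messageSupplier) {
        this.messageSupplier = messageSupplier;
    }

    @Override
    public String getMessage() {
        // The expensive formatting happens here, not at throw time.
        return this.messageSupplier.get();
    }
}
```

This anticipates the lazily evaluated with_culprits form proposed below for Ginger.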

[5] Proposal for Ginger Exceptions

[5.1] Overview of the Proposal

In Ginger we recognise that exceptions are used to represent two situations:

  • An alternative, "non-standard" return from a partial function. In this case the runtime state is known to be good on exit from a method. There needs to be a way to accept non-standard results and continue computation.
  • An attempt to rollback the state within a protected context, which we will call a transaction. At the point of the rollback there is no safe continuation and the state changes made since the beginning of the transaction should be reversed.

In this proposal we aim to provide a complete solution to these two uses. The proposal has multiple parts: non-standard returns, exceptions, exception decoration and emergencies. Note that we do not address the need for early-exit from nested code, which we see as better treated as a compiler optimisation issue.

[5.2] Non-Standard Returns

The first part of our proposal is to use the return statement (say) as a means of non-standard return. These are intended to be comparably efficient to a normal return, and the programmer is obliged to ensure the state is sound. The return would be tagged, where tags are constants designed for fast matching. A normal exit would be treated as a return with a tag of OK. The syntax might look like this:

# The general tag set used for failures.
val FailureTags = newTagSet();
...
# Add some general purpose tags.
val IOFailure = FailureTags.newTag( 'IOFailure' );
val SecurityCheckFailure = FailureTags.newTag( 'SecurityCheckFailure' );
...
# Create a specific failure tag.
val NoFilePermission = FailureTags.newTag( 'NoFilePermission' );
# Assert that this new tag is an example of the more general tags. 
# You wouldn't have to do this unless you needed to assert classification
# relationships.
true -> NoFilePermission.isChild( IOFailure );
true -> NoFilePermission.isChild( SecurityCheckFailure );
...
# Use the newly created failure tag. 
return NoFilePermission( ... );
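The tag machinery could be modelled in Java along the following lines. Only the names in the Ginger fragment above come from the proposal; the implementation is my own sketch, where a tag matches itself or any transitive parent declared via isChild:

```java
import java.util.HashSet;
import java.util.Set;

// Tags with classification: matches() walks the declared parent links.
class Tag {
    final String name;
    private final Set<Tag> parents = new HashSet<>();

    Tag(String name) { this.name = name; }

    void isChild(Tag parent) { this.parents.add(parent); }

    boolean matches(Tag other) {
        if (this == other) return true;
        for (Tag p : this.parents) {
            if (p.matches(other)) return true;
        }
        return false;
    }
}
```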

Returns might be handled using a try form, which would be a modified switch. Try forms would always be wrapped around a single function call and there would be no propagation of the return value.

try openFile( x ) 
case OK( fd ) then
    ...
case NoFilePermission( args... ) then
    ...
elsecase args... then  
    ...
endtry

A failure to match a tag within a try-form would throw an exception, aborting the current transaction and initiating the rollback process described below. This would be implemented in terms of a low-level API; for example, it might include a special apply function, tryApply.

tryApply( args..., f ) returns results..., tag
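In Java, tryApply might be modelled as a call that packages its outcome with a tag instead of throwing. This is a sketch with invented names:

```java
import java.util.concurrent.Callable;

// The tagged outcome of a call. "OK" marks a normal return.
record Tagged<T>(String tag, T result) {
    static final String OK = "OK";
}

class TryApply {
    // Run the call: a normal return is tagged OK; a failure carries the
    // given failure tag and no result.
    static <T> Tagged<T> tryApply(Callable<T> call, String failureTag) {
        try {
            return new Tagged<>(Tagged.OK, call.call());
        } catch (Exception e) {
            return new Tagged<>(failureTag, null);
        }
    }
}
```

A try-form then reduces to a switch on the tag field.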

[5.3] Decorating a Non-Standard Return

Because of the possibility of a return being automatically converted into an exception, there has to be a mechanism for augmenting the exception that might be thrown. This is called exception decoration and the decoration syntax might look like this (ignoring issues of Internationalisation):

return NoFilePermission() 
with_culprits (
    "Filename" => fname,
    "File Ownership" => fname.owner,
    "File Group" => fname.group,
    "Actual Permission" => fname.permissions,
    "Effective user" => CurrentProcess.effectiveUser
);

Note that the contents of the with_culprits form are only evaluated if the return gets promoted into an exception. This solves the issue of computing costly error messages which might not actually be required.

Decorations are not permitted to be arbitrary values; they must be serialisable, and it is the serialised form that travels with the exception. They are intended to provide explanations of the breakdown.

[5.4] Exceptions, Transactions, Rollback and more Decorations

The second proposal is to handle exceptions within transaction-endtransaction blocks. These are not intended to be inexpensive but to provide much greater support to the programmer for implementing safe continuation. Their twin purposes are to provide a means of safely forcing continuation after a fault and good quality reporting of the cause of a fault.

The programmer can elect to generate a fault directly by throwing a set of tagged values. As before we can decorate the cause with culprits, although these are always evaluated (since an exception is guaranteed to be thrown).

throw NoInternetConnection() 
with_culprits ( 
    "Hostname" => hostname, 
    "IP Address" => ip_address, 
    "Network Connection Used" => network 
);

Before any exception is thrown, the runtime system will inspect whether or not any enclosing transaction is prepared to handle it. If no such handler exists then an emergency arises. Emergencies are discussed below but, in summary, they pause the currently executing virtual machine and pass control to the supervising virtual machine.

A transaction is introduced using the transaction-endtransaction form, whose syntax is modelled on the try-endtry form. In this imaginary example we introduce the idea of a decorating rethrow.

transaction
    ...STATEMENTS...
case NoInternetConnection() then
    MessageBox.show( currentCulprits() )
case SecurityViolation() then
    rethrow with_culprits( "You do not have permission to access the configuration file" )
endtransaction

The statements introduced by the transaction are executed in a controlled environment. If an exception is thrown during the execution of the protected statements a transaction rollback is executed. The purpose of the rollback is to bring the runtime state back to the starting condition. 

How far is it possible to go? In general it is not possible to reverse changes made to other processes or physical devices. However, it is easy to see that we can record all assignments to pre-existing objects and use those records to undo the modifications; objects allocated during the transaction simply become unreachable. Equally, we could fork the process, use the child to hold onto the starting condition, and allow execution to continue conditionally in the parent.
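The assignment-recording idea can be sketched in a few lines of Java. UndoLog is an invented name, and a real implementation would hook into the virtual machine's write barrier rather than require explicit calls:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Every write to a pre-existing cell is preceded by logging an undo
// action; rollback replays the log in reverse order of the writes.
class UndoLog {
    private final Deque<Runnable> undo = new ArrayDeque<>();

    void write(int[] cells, int index, int newValue) {
        final int old = cells[index];
        this.undo.push(() -> cells[index] = old);
        cells[index] = newValue;
    }

    void rollback() {
        while (!this.undo.isEmpty()) {
            this.undo.pop().run();
        }
    }
}
```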

This rollback strategy is a good default approach. But for external resources and processes that have been modified during the transaction we also need to undertake repairs. In order to do this we need to mark slots of an object as external, typically meaning that they track the state of an external resource. As the external resource's state is not reverted by the rollback, neither should an external slot be reverted.

Modifying an external slot of an object marks the object as needing repair. Repair actions are scheduled immediately after the rollback. The order of the repair actions is important as dependencies may arise. For example an object that records a trail of activity might wish to reset the trail to the last-known-good location before closing the file descriptor. Repairs therefore have two phases: an initial phase that allows objects to declare their order dependencies (by marking a map) and an action phase. Repairs are then carried out in a sorted order.
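The two-phase repair discipline amounts to a topological sort over declared orderings. A sketch in Java (all names invented), in which an unsatisfiable set of orderings is detected and reported:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class RepairScheduler {
    private final Map<String, Runnable> repairs = new LinkedHashMap<>();
    private final Map<String, Set<String>> runsBefore = new HashMap<>();

    void addRepair(String name, Runnable action) { this.repairs.put(name, action); }

    // Phase one: declare that `first` must run before `second`.
    void before(String first, String second) {
        this.runsBefore.computeIfAbsent(first, k -> new HashSet<>()).add(second);
    }

    // Phase two: run the repairs in a dependency-respecting order.
    void runAll() {
        Map<String, Integer> indegree = new HashMap<>();
        for (String n : this.repairs.keySet()) indegree.put(n, 0);
        for (Set<String> succs : this.runsBefore.values())
            for (String s : succs) indegree.merge(s, 1, Integer::sum);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : indegree.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());
        int done = 0;
        while (!ready.isEmpty()) {
            String n = ready.poll();
            this.repairs.get(n).run();
            done += 1;
            for (String s : this.runsBefore.getOrDefault(n, Set.of()))
                if (indegree.merge(s, -1, Integer::sum) == 0) ready.add(s);
        }
        if (done != this.repairs.size())
            throw new IllegalStateException("circular repair dependencies");
    }
}
```

The circularity check corresponds to the escalation to an emergency discussed below.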

[5.5] Emergencies

Ginger introduces the virtual machine as an API for multi-process programming. Each virtual machine is a completely isolated process and is typically mapped onto an operating system process. Because the virtual machines are isolated, the corruption of a virtual machine's runtime state is not contagious. 

Emergencies are caused by unhandled exceptions. The executing virtual machine is immediately halted and control is passed, along with the decorations of the unhandled exception, to a supervisory virtual machine. The supervisor may invoke a debugger, perform a last-ditch recovery, or simply halt. If it chooses to continue, it should close the halted virtual machine in order to return all its resources to the operating system.

[5.6] Nothing Can Possibly Go Wrong <Click> Go Wrong <Click> …

Several opportunities for mischief remain. The first is that repairs may themselves fail and throw exceptions. This is inevitably an unfortunate situation and might correspond to a sequence of failures of external resources. Some failed repairs, such as an error on closing an operating system resource, are tolerable, at least temporarily. Other failed repairs, such as a failed attempt to reconnect to a lost database connection, are intolerable. The rule is simple: repairs must be written with special care to trap all non-standard returns. Any exception raised during a repair will cause an emergency.

The second issue is that exceptions may attempt to smuggle corrupt objects out by embedding them in decorations. This is prevented by the requirement that decorations consist of serialisable objects. The objects are serialised at the point of execution immediately before the throw. The runtime state at that point is still known and hence it is possible to call the serialisation code safely. 
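The serialise-at-throw rule might be modelled in Java by flattening every culprit to a string when the exception object is built, so that no live reference escapes inside the exception. DecoratedException is an invented name, and String.valueOf stands in for real serialisation:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Culprits are flattened to strings at construction (i.e. at the throw
// point), so mutating the originals later cannot alter the record.
class DecoratedException extends RuntimeException {
    private final Map<String, String> culprits;

    DecoratedException(String tag, Map<String, Object> raw) {
        super(tag);
        Map<String, String> flat = new LinkedHashMap<>();
        raw.forEach((k, v) -> flat.put(k, String.valueOf(v)));
        this.culprits = Collections.unmodifiableMap(flat);
    }

    Map<String, String> culprits() { return this.culprits; }
}
```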

Exceptions thrown during the execution of a with_culprits expression will be trapped and substituted appropriately, so that it is clear that there was an additional error. 

The next issue is that repairs may declare a set of circular dependencies. Like any problem during repair, this is promptly escalated to an emergency.

Finally, a transaction may throw an exception while it is executing the statements of a "case"-clause, but there is nothing especially interesting about this. Such code runs after the repair actions have completed and is treated as code that is not protected by the original transaction, exactly as in most programming languages.

[6] Summary

We began with the following use-cases for exceptions: 

  1. Fast, Non-Silent Failure
  2. Simplify Method's Main Case
  3. To Redisplay an Error Condition as a Human-Friendly Message
  4. To Ensure Continuation past an Error Condition
  5. Short-Circuit Non-local Return

Fast, non-silent failures are implemented by returns which, if unhandled, automatically convert into throws. 

Simplification of the main path of a method is achieved through the use of returns to represent alternative paths. The alternative paths are not themselves error conditions. Only through their deliberate neglect is an exception raised.

Redisplay of error conditions as a human-friendly message is effected by means of the with_culprits form. Anywhere an exception might be generated, it is possible to add explanations at the appropriate level of description.

Continuation past an error condition is assured by means of the transaction rollback and repair mechanisms, which greatly assist the programmer in returning the runtime system to a "last known good" state.

Short-circuit, non-local return is not supported in any form by this proposal - and this is a deliberate decision. Unifying short-circuit returns with the error handling process is a design error.