Self-Homing on UNIX

What is Self-Homing?

Applications typically consist of one or more executables and a collection of resources in a folder; this is the resource folder. Self-homing is the dynamic determination of the resource folder by the application's executables. The advantages of dynamic discovery are:

  • It is possible to move an application from place to place without the need for re-building the application or access to any development scripts/makefiles etc.
  • It is straightforward to host multiple versions of an application without conflict.
  • The development system can be tested without being installed in a set location.
  • The build system can be much simpler, with no need to propagate paths down into various make scripts - which is a big help when writing multi-language systems.

Why is it a problem?

Dynamic self-homing has been frowned on as a technique for UNIX systems over the years. As far as I can tell, the main justification for this has been:

  • The UNIX file hierarchy is designed to support large networks and heterogeneous architectures. Part and parcel of this design is that executables are placed in their own folder paths, which can be localised to the relevant machines (for speed and architectural specialisation), and shared resources can be placed in shared folder paths (in order to save space).

This is seen as conflicting with self-homing, because self-homing typically depends on the executable's path, which is one of the few attributes of a running process that relates to the static location of the executable. As a result, self-homing inevitably takes away some freedom to move the executable binary and the resource folder independently.

Meta-data

A possible solution would be to utilise UNIX meta-data via the extended file attributes. The location of the resources folder could be added as user meta-data on the executable. Provided a process can reliably discover its own executable file, this would decouple the executable path from the resource folder. Of course this still leaves open the issue of reliably discovering the executable file name (see below).

However, extended file attributes are not yet standardised, and the interface differs across UNIXes.

  • OS X uses xattr, which seems to be a BSD built-in.
  • Linux uses getfattr and setfattr.
  • Python has an xattr package which works across Darwin & Linux. Note that os.listxattr and friends are still Linux only (as of Python 3.4).

I note that http://www.lesbonscomptes.com/pxattr/ is a portable command line program for working with extended file attributes.

Another issue with using meta-data to determine the resource home is that it is not necessarily dynamic. If the encoded pathname is absolute, then moving the application bundle breaks discovery until the attribute is updated. So for fully dynamic discovery one needs to use pathnames relative to the executable.
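As a rough illustration of the meta-data idea, here is a minimal Python sketch. It assumes Linux (where the standard library provides os.getxattr) and a made-up attribute name, user.resource_home; neither the name nor the helper functions come from any existing convention. A relative value is resolved against the executable's own folder, so the bundle stays movable.

```python
import os

ATTR_NAME = "user.resource_home"  # hypothetical attribute name, for illustration only


def resolve_home(exe_path, encoded):
    """Resolve an encoded home path against the folder holding the executable."""
    exe_dir = os.path.dirname(os.path.realpath(exe_path))
    # os.path.join discards exe_dir when `encoded` is absolute,
    # so an absolute value is returned as-is (and is not relocatable).
    return os.path.normpath(os.path.join(exe_dir, encoded))


def read_home_attribute(exe_path, attr=ATTR_NAME):
    """Return the attribute's value as text, or None if it cannot be read."""
    getxattr = getattr(os, "getxattr", None)  # Linux-only in the standard library
    if getxattr is None:
        return None
    try:
        return os.fsdecode(getxattr(exe_path, attr))
    except OSError:
        # Missing file, missing attribute, or a filesystem without xattr support.
        return None
```

On Linux the attribute could be set with something like setfattr -n user.resource_home -v ../share on the executable; the caller would then pass the value from read_home_attribute to resolve_home, falling back to some default when it is None.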

Techniques for Reliably Discovering Executable Paths

Python

os.path.realpath(__file__)
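
The point of realpath, rather than abspath, is that it follows symlinks, so a script launched through a link in some bin folder still finds the folder it really lives in. A small sketch of how this might be wrapped up; the function names and the "resources" folder name are my own assumptions:

```python
import os


def script_home(script_file):
    """Folder containing the real file behind `script_file`, following symlinks."""
    return os.path.dirname(os.path.realpath(script_file))


def resource_folder(script_file, name="resources"):
    """Resource folder assumed to sit next to the real script (hypothetical layout)."""
    return os.path.join(script_home(script_file), name)
```

A script would typically call resource_folder(__file__) once at start-up.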

Linux, any language

  • /proc/self/exe is a symlink to the executable
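
Reading the link is a one-liner in most languages. A minimal Python sketch follows; the function name is my own. Note that for an interpreted script the link names the interpreter binary, not the script, so this is mainly of interest for compiled executables.

```python
import os


def executable_path():
    """Return the running executable's path via /proc/self/exe, or None if unavailable."""
    try:
        return os.readlink("/proc/self/exe")
    except OSError:
        # No /proc filesystem (e.g. not Linux) or the link is not readable.
        return None
```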

OS X, C

http://astojanov.wordpress.com/2011/11/16/mac-os-x-resolve-absolute-path-using-process-pid/

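#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <libproc.h>

/* Print the absolute path of the executable behind a PID (OS X only).
   With no argument the process reports on itself, which is the
   self-homing case. */
int main (int argc, char* argv[])
{
    pid_t pid; int ret; int err;
    char pathbuf[PROC_PIDPATHINFO_MAXSIZE];

    /* Use the PID given on the command line, or our own by default. */
    pid = ( argc > 1 ) ? (pid_t) atoi(argv[1]) : getpid();

    /* A return value <= 0 signals failure, with the reason in errno. */
    ret = proc_pidpath (pid, pathbuf, sizeof(pathbuf));
    if ( ret <= 0 ) {
        err = errno;    /* save errno before fprintf can overwrite it */
        fprintf(stderr, "PID %d: proc_pidpath ();\n", pid);
        fprintf(stderr,    "    %s\n", strerror(err));
        return 1;
    }

    printf("proc %d: %s\n", pid, pathbuf);
    return 0;
}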

Java

import java.io.File;

class Home {

    // Location this class was loaded from: a jar file or a classes folder.
    File home() {
        try {
            return new File( this.getClass().getProtectionDomain().getCodeSource().getLocation().toURI() );
        } catch ( java.net.URISyntaxException e ) {
            throw new RuntimeException( e );
        }
    }

    // Root of the class loader's resource path.
    // Note: getResource( "" ) may return null, depending on how the code is packaged.
    File otherHome() {
        try {
            return new File( this.getClass().getClassLoader().getResource( "" ).toURI() );
        } catch ( java.net.URISyntaxException e ) {
            throw new RuntimeException( e );
        }
    }

    public static void main( String[] args ) {
        System.out.println( new Home().home() );
        System.out.println( new Home().otherHome() );
    }

}

Other Approaches

Of course, other operating systems have defined other ways for an application to dynamically discover its resources. These all depend on defining an application as a collection of resources that includes the executable and then passing the root folder into the executable as an argument.

  • On Windows, applications can find an *.exe.config file which can describe the location of resources.
  • On OS X, applications can find their bundle folder, which acts as a resource folder.
  • In classic Mac OS, files had both an executable part and a database-like resource fork.
  • In Java, applications can explicitly discover their resources in a platform independent way.

However, UNIX console executables do not have access to these. Environment variables are sometimes used instead, but the counter-argument is that they are vulnerable to substitution and hence less secure, whereas baked-in paths are impervious.
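
For completeness, the environment-variable approach might look something like the sketch below. The variable name APP_HOME and the function are hypothetical. The security concern is visible in the code: whoever controls the environment controls where resources are loaded from, so the override should probably be ignored for privileged programs.

```python
import os


def find_home(fallback, env=None, var="APP_HOME"):
    """Prefer an explicit override from the environment, else use `fallback`."""
    env = os.environ if env is None else env
    override = env.get(var)  # `var` is a hypothetical variable name
    if override:
        return os.path.abspath(override)
    return fallback
```

Here fallback would be whatever the self-homing logic (or a baked-in path) produces.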