Essential Developer Documentation

When you're writing software, it is possible to spend an enormous amount of time and energy writing documents intended for other developers. That's a lot of effort for a very small group of people (unless they are also your customers), and the documentation is bound to age very fast. Over the years I have experimented a lot and have settled on an approach I like, which emphasises lightweight documentation and pinpoints where heavier documentation is needed. For what it is worth, here's a summary of how I approach it.

Firstly, we try to avoid writing documentation at all. To that end we adopt Clean Code principles for naming variables, so that variable names explain their function in terms of the business domain. Well-chosen names often make the motivation behind the code so straightforward that no other documentation is needed. They also have the great advantage that automated Refactoring tools can guarantee that an improved name is reliably substituted everywhere it is used, eliminating one of the commonest sources of errors. If practised well, this becomes all the documentation a developer needs at the method level.
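To illustrate, here is a minimal Python sketch of the same calculation written twice: once with opaque names and once with names drawn from the business domain. The invoice domain and all names are hypothetical.

```python
# Opaque names: the reader has to reverse-engineer the intent.
def calc(l, r):
    t = 0.0
    for x in l:
        t += x[0] * x[1]
    return t * (1 + r)


# Domain names: the code explains itself, so no comment is needed.
def invoice_total_including_tax(line_items, tax_rate):
    subtotal = sum(unit_price * quantity for unit_price, quantity in line_items)
    return subtotal * (1 + tax_rate)


if __name__ == "__main__":
    items = [(2.50, 4), (10.00, 1)]  # (unit price, quantity) pairs
    print(invoice_total_including_tax(items, 0.2))
```

Both functions compute the same value; a rename refactoring is what turns the first into the second without risking a missed reference.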

Next, we use in-line documentation systems such as javadoc, doxygen, ndoc etc. I am not a huge fan of these systems but they certainly have their place. In the special case of library code they are very helpful indeed, because the end-users are developers and this is an effective way of writing the user docs. To make this approach work more generally, the documentation should be built and updated automatically by the build/CI system, and the usual zero-warnings policy should apply. I recommend using in-line docs at the class and package level but not at the method level, because method-by-method documentation is a great deal of work for marginal benefit. For example, a function called (say) reversedStringCopy isn't improved by an explanation that it returns a copy of a string with the characters in reverse order.
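The class-and-package-but-not-method policy can be enforced by CI alongside the zero-warnings rule. Below is a minimal Python sketch, using the standard `ast` module, of a hypothetical check that warns about modules and classes lacking a docstring while deliberately ignoring functions and methods. It is an illustration of the policy, not a feature of javadoc, doxygen or ndoc.

```python
import ast
import sys


def missing_docstrings(source, filename="<string>"):
    """Return 'file:line name' entries for a module or class without a docstring.

    Functions and methods are deliberately not checked: well-chosen names
    should make method-level documentation unnecessary.
    """
    tree = ast.parse(source, filename=filename)
    problems = []
    # Package-level docs live in the module docstring (e.g. of __init__.py).
    if ast.get_docstring(tree) is None:
        problems.append(f"{filename}:1 <module>")
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and ast.get_docstring(node) is None:
            problems.append(f"{filename}:{node.lineno} {node.name}")
    return problems


def main(paths):
    problems = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            problems.extend(missing_docstrings(f.read(), path))
    for problem in problems:
        print(f"warning: missing docstring: {problem}")
    # Any warning makes the exit status non-zero, so CI fails the build.
    return 1 if problems else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```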

The third form of in-place documentation I recommend is the Folder Summary (e.g. SUMMARY.md). The basic idea is that you provide a readable summary of the folder, typically in markdown/kramdown, YAML or plain text, describing the purpose of the folder and briefly summarising the files and folders found there. You don't create these dogmatically for every folder in your code-base, but taken together they should provide an overview of the entire structure of the system. Here's an example of a markdown version.

Summary: The source code for the project

Description


This folder contains both the Java source code and any static resources (images, text strings) that are intended to be packaged in the deployment jar file. Large static resources or resources that we might want to configure on a per-site basis should not go in here but in the ${PROJ_CODE_NAME}/resources folder.

Contents


  • Makefile - Provides a one-stop shop for project-related commands in this folder.
  • makefile.p - The Python3 script that does the heavy lifting for the build.
  • cpp - Folder containing the C++ implementation files.
  • hpp - Folder containing the C++ header files.
  • _build - Folder in which the build products are deposited.
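Because the summaries are plain files, they can be stitched together into the system-wide overview mentioned above. A minimal Python sketch, assuming every summary file is named SUMMARY.md and that its first line is the one-line summary, optionally prefixed with `Summary:` as in the example (both are conventions assumed here, not requirements):

```python
import sys
from pathlib import Path


def summary_line(summary_file):
    """Return the one-line summary from the first line of a SUMMARY.md."""
    with open(summary_file, encoding="utf-8") as f:
        first_line = f.readline().strip()
    prefix = "Summary:"
    if first_line.startswith(prefix):
        return first_line[len(prefix):].strip()
    return first_line


def system_overview(root):
    """List '<folder>: <summary>' for every folder below root with a SUMMARY.md."""
    root = Path(root)
    lines = []
    for summary_file in sorted(root.rglob("SUMMARY.md")):
        folder = summary_file.parent.relative_to(root).as_posix()
        lines.append(f"{folder}: {summary_line(summary_file)}")
    return lines


if __name__ == "__main__":
    print("\n".join(system_overview(sys.argv[1] if len(sys.argv) > 1 else ".")))
```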

[ASIDE: These days I am increasingly inclined to provide a Makefile in every folder that acts as a collection of folder-local commands, i.e. more a one-file collection of scripts than a build tool. Running 'make' lists all the targets (i.e. commands). So if I wanted to add the folder description in a more machine-processable format (e.g. JSON), I would arrange for "make summary" to show the summary in a readable format. In fact it might be a good idea to put the text in the Makefile itself.]
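One common way to get a bare `make` to list its targets is to annotate each target with a trailing `## description` comment and have the default target print them. The `##` convention is an assumption, not something make itself understands, and the Makefile below is hypothetical (recipes omitted). The parsing half could look like this minimal Python sketch:

```python
import re

# Matches target lines such as "summary: ## Show the folder summary".
TARGET_WITH_HELP = re.compile(r"^([A-Za-z0-9_.-]+):[^#]*##\s*(.+)$")

EXAMPLE_MAKEFILE = """\
help: ## List the commands available in this folder
summary: ## Show the folder summary in a readable format
build: ## Build the project into _build
"""


def documented_targets(makefile_text):
    """Return (target, description) pairs for targets with a '##' comment."""
    targets = []
    for line in makefile_text.splitlines():
        match = TARGET_WITH_HELP.match(line)
        if match:
            targets.append((match.group(1), match.group(2).strip()))
    return targets


def format_help(makefile_text):
    """Render the target list the way a default 'make' target might print it."""
    pairs = documented_targets(makefile_text)
    width = max((len(name) for name, _ in pairs), default=0)
    return "\n".join(f"{name.ljust(width)}  {text}" for name, text in pairs)


if __name__ == "__main__":
    print(format_help(EXAMPLE_MAKEFILE))
```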

Now I also recommend having a (developer) wiki for each project. Unfortunately, to the best of my knowledge there's no ideal wiki, and the situation is so bad that I am constantly on the verge of writing a 'better' one. But the advantage of instant, shared editing outweighs the disadvantages. An ideal wiki should have its back-end store exposed via git (or something similar) so that the files can be processed programmatically. Should the docs share a repository with the code or not? My own view is that they should be in a different repository, as the devs do not want to be repeatedly merging just because the docs changed. Keeping the documents in a separate orphan branch of the code repository (as GitHub Pages does with its gh-pages branch) actually works too, although I find it a bit clumsy.
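Exposing the wiki's store via git is what makes simple automated checks possible. This minimal Python sketch finds broken internal links. It assumes the wiki has been cloned into a folder with one markdown file per page (e.g. `Developer-Startup.md`) and that internal links use a plain `[[Page Name]]` syntax; both assumptions vary between wiki engines.

```python
import re
import sys
from pathlib import Path

# Assumed link syntax: [[Page Name]] (display-text variants are not handled).
WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")


def page_key(name):
    """Normalise a title so 'Developer Startup' matches the file Developer-Startup.md."""
    return name.strip().replace(" ", "-").lower()


def broken_links(pages):
    """Given {page name: text}, return sorted (page, target) pairs with a missing target."""
    existing = {page_key(name) for name in pages}
    broken = set()
    for name, text in pages.items():
        for target in WIKI_LINK.findall(text):
            if page_key(target) not in existing:
                broken.add((name, target.strip()))
    return sorted(broken)


def load_wiki(folder):
    """Read each markdown page of a cloned wiki into {page name: text}."""
    return {p.stem: p.read_text(encoding="utf-8") for p in Path(folder).glob("*.md")}


if __name__ == "__main__":
    for page, target in broken_links(load_wiki(sys.argv[1])):
        print(f"{page}: broken link to [[{target}]]")
```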

Whichever wiki you choose, it should contain the following three documents as a base.

  • <My Project> Home - a single page describing where all the project resources can be found - the version control system, the release archive, the project roadmap and so on. It should be headlined by the project vision, a short description of the purpose of the project.
  • Developer Startup for <My Project> - what a developer has to do to get a working development environment for the software. I would expect this to be a single wiki page.
  • Architecture of <My Project> - high-level organisational view, dynamic and static perspectives, and evolutionary plan. It may range from a single sentence to many pages.
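Since these three pages are the minimum, their presence is easy to check automatically. A minimal Python sketch, assuming page titles follow the naming pattern above with `<My Project>` replaced by the real project name (the project name "Widget" is hypothetical):

```python
BASE_PAGE_TEMPLATES = (
    "{project} Home",
    "Developer Startup for {project}",
    "Architecture of {project}",
)


def missing_base_pages(project, existing_titles):
    """Return the base wiki pages that do not yet exist for the project."""
    existing = {title.strip().lower() for title in existing_titles}
    required = [template.format(project=project) for template in BASE_PAGE_TEMPLATES]
    # Title comparison is case-insensitive, as many wikis treat titles that way.
    return [title for title in required if title.lower() not in existing]


if __name__ == "__main__":
    print(missing_base_pages("Widget", ["Widget Home", "Architecture of Widget"]))
```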

Of these three, only "Architecture" is likely to confuse; it is a much-abused word. As far as I am concerned, an "architecture" describes the project in terms of processes (or process-like concepts such as threads or coroutines), the communication channels between them, the data stores, and the encoding of data on the channels. In particular, it excludes the functions that the system carries out. Hence an architecture describes the structure for performing functions but not the functions themselves.
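Because this definition is purely structural, an architecture in this sense can even be written down as data. The following minimal Python sketch (all class, field and component names are hypothetical) records processes, data stores and channels with their data encodings, and checks that every channel connects declared endpoints. Note that it deliberately says nothing about what the system does.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Channel:
    source: str    # name of a process or data store
    target: str    # name of a process or data store
    encoding: str  # how data is encoded on the channel, e.g. "JSON over HTTP"


@dataclass
class Architecture:
    processes: set = field(default_factory=set)  # processes, threads, coroutines
    data_stores: set = field(default_factory=set)
    channels: list = field(default_factory=list)

    def dangling_channels(self):
        """Return channels whose endpoints are not declared processes or stores."""
        known = self.processes | self.data_stores
        return [c for c in self.channels
                if c.source not in known or c.target not in known]


if __name__ == "__main__":
    arch = Architecture(
        processes={"web-frontend", "order-service"},
        data_stores={"orders-db"},
        channels=[
            Channel("web-frontend", "order-service", "JSON over HTTP"),
            Channel("order-service", "orders-db", "SQL"),
            Channel("order-service", "email-gateway", "SMTP"),
        ],
    )
    print(arch.dangling_channels())
```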

From this point on, documents become increasingly optional and should only be instantiated if there is good reason. The first group is concerned with how the software interacts with other systems:

  • External Specifications for <My Project> (Folder) - a wiki page that aggregates the specifications that the software implements, along with annotations on compatibility.
  • APIs for <My Project> - Wiki pages describing each supported API and associated resources e.g. code examples.
  • Document Formats for <My Project> (Folder) - if the project uses any non-standard file formats, they must be documented: one page to aggregate them and further pages to describe each one.
  • The Database Schema(s) for <My Project> - database schemas should be self-documenting, with the schema details published as part of the build process.
  • Logging for <My Project> - The logging format and processes.
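One way to make schemas self-documenting is to generate the reference page from the live schema during the build. A minimal Python sketch using the standard `sqlite3` module; SQLite and the example table are assumptions, and other databases expose similar metadata (e.g. through `information_schema`):

```python
import sqlite3


def schema_markdown(conn):
    """Render every user table's columns as a markdown section."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        lines.append(f"## {table}")
        lines.append("")
        lines.append("| Column | Type | Not null | Primary key |")
        lines.append("|---|---|---|---|")
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        # Table names come from sqlite_master; names containing quotes are not handled.
        for _, name, col_type, notnull, _, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            lines.append(f"| {name} | {col_type} | {'yes' if notnull else 'no'} | "
                         f"{'yes' if pk else 'no'} |")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    print(schema_markdown(conn))
```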

The second group of documents relates to holistic aspects of the product. Each addresses a cross-cutting concern that needs attention across the entire system and is often vulnerable to a single weak link. Here's a list of some of the most common concerns.

  • Performance Plan - one badly performing component can ruin the whole product.
  • Security Plan - one insecure component makes the whole product insecure.
  • Robustness Plan - one defective component makes the whole product crash.
  • Dependencies Plan - these become 'system requirements' for the customer, which can be a showstopper.
  • Other cross-module concerns, such as caching, transactions, synchronisation and localisation, can each require a plan/design document.

Performance Plan (Speed, Space and maybe Cost): Describe the logical work-cycle of the product for its main tasks. Where appropriate, identify the main computations (complexity), data flows (network costs), data stores (how much) and the degree of concurrent usage, which leads into identifying the performance and storage hotspots.
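Much of this can start as back-of-the-envelope arithmetic. The minimal Python sketch below takes a per-request cost estimate for each component, scales it by the expected request rate, and ranks the components to expose likely hotspots. The component names and figures in the usage are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class ComponentCost:
    name: str
    cpu_ms_per_request: float      # computation
    network_kb_per_request: float  # data flow
    stored_kb_per_request: float   # growth of data stores


def daily_load(component, requests_per_second):
    """Scale per-request costs to one day at the given request rate."""
    requests_per_day = requests_per_second * 86_400
    return {
        "cpu_hours": component.cpu_ms_per_request * requests_per_day / 3_600_000,
        "network_gb": component.network_kb_per_request * requests_per_day / 1_000_000,
        "storage_gb": component.stored_kb_per_request * requests_per_day / 1_000_000,
    }


def hotspots(components, requests_per_second, metric):
    """Rank components by one daily metric, largest first."""
    return sorted(
        ((c.name, daily_load(c, requests_per_second)[metric]) for c in components),
        key=lambda pair: pair[1],
        reverse=True,
    )


if __name__ == "__main__":
    components = [
        ComponentCost("search", 40.0, 5.0, 0.0),
        ComponentCost("upload", 5.0, 200.0, 200.0),
    ]
    for metric in ("cpu_hours", "network_gb", "storage_gb"):
        print(metric, hotspots(components, 10, metric))
```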

Security Plan: Review the architecture from a security standpoint - the internal and external interfaces, what data is carried on each, and the robustness of all the components that could be compromised. This leads into listing the threats and countermeasures.
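Listing threats can start mechanically from the architecture review. A minimal Python sketch, assuming each interface is recorded with whether it is reachable from outside, whether it carries sensitive data and whether it is encrypted (all hypothetical fields and interface names). It produces a starting checklist, not a complete threat model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    name: str
    external: bool   # reachable from outside the trust boundary
    sensitive: bool  # carries credentials, personal or financial data
    encrypted: bool  # e.g. TLS on the channel


def threats(interfaces):
    """Return (interface, threat and counter) pairs to record in the plan."""
    found = []
    for i in interfaces:
        if i.external:
            found.append((i.name, "untrusted input: validate and authenticate"))
        if i.sensitive and not i.encrypted:
            found.append((i.name, "disclosure or tampering in transit: encrypt"))
    return found


if __name__ == "__main__":
    for name, threat in threats([
        Interface("public-api", external=True, sensitive=True, encrypted=True),
        Interface("db-link", external=False, sensitive=True, encrypted=False),
    ]):
        print(f"{name}: {threat}")
```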

Robustness Plan: This covers memory corruption, exceptions, data corruption, misconfiguration and any other problem that stops the software working. The plan identifies where these problems can come from (and what you do to minimise their consequences). If you are writing in C++, you know you have to manage memory correctly. If you're programming in C#, you know you have to handle exceptions properly because they are pervasive (how do files get written properly, and open resources closed properly?).
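On the question of how files get written properly, one common pattern, shown here in Python as an analogue of C#'s `using` and `try`/`finally`, is to write to a temporary file and rename it over the target only once the write has succeeded. An exception part-way through then never leaves a half-written file behind. A minimal sketch; `atomic_write_text` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write_text(path, text, encoding="utf-8"):
    """Write text to path so readers see either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:  # closed even on error
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic rename over the old file on POSIX
    except BaseException:
        # Any failure: remove the partial temporary file, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If writing fails, the original file is untouched and the temporary file is cleaned up, which is exactly the kind of behaviour a robustness plan should spell out for each resource.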

Dependencies Plan: the functions of the product that have to be supplied by external hardware or software, e.g. operating systems, browsers, plugins, network, software libraries/executables and so on. This generates a list of system requirements for the customer.
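Turning dependencies into a customer-facing requirements list is mostly bookkeeping. A minimal Python sketch, assuming each product feature declares its external dependencies with a minimum version written as a tuple (feature names, dependencies and versions in the usage are hypothetical). The strictest minimum wins, and each requirement records which features need it, which shows what would be lost by dropping it.

```python
def system_requirements(feature_dependencies):
    """Merge per-feature dependencies into one sorted requirements list.

    feature_dependencies maps a feature name to {dependency: minimum version},
    where versions are tuples such as (3, 10) so that they compare numerically.
    """
    merged = {}
    for feature, deps in feature_dependencies.items():
        for dependency, min_version in deps.items():
            version, features = merged.get(dependency, (min_version, []))
            merged[dependency] = (max(version, min_version), features + [feature])
    return [
        (dep, ".".join(str(part) for part in version), sorted(features))
        for dep, (version, features) in sorted(merged.items())
    ]


if __name__ == "__main__":
    for dep, version, features in system_requirements({
        "export": {"Python": (3, 8)},
        "reporting": {"Python": (3, 10), "PostgreSQL": (13,)},
    }):
        print(f"{dep} >= {version} (needed by: {', '.join(features)})")
```

Tuples are used rather than strings because "3.10" would sort before "3.8" as text.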

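System requirements determine the size of the market and the cost of ownership for customers. For example, the choice of how to deliver a rich internet application (Flash, Silverlight, JavaScript, Adobe Flex etc.) determines what the customer must have installed to run it.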