Open Letter On Nested Comments In Pop-11

Originally an open letter to Aaron Sloman in reply to his letter
with regard to a problem with nested-long-comments in Pop-11.

To: Aaron Sloman <ku.ca.mahb.sc|namolS.A#ku.ca.mahb.sc|namolS.A>
From: Stephen Leach <moc.dleifhctaw|evets#moc.dleifhctaw|evets>
Subject: Re: Should this mishap?
Cc: ku.ca.mahb.sc|ved-golpop#ku.ca.mahb.sc|ved-golpop
Bcc:
X-Attachments:

Hi Aaron,

I think this is a language design issue and probably better debated on
the 'dev' list rather than the 'forum' - so that's where I have decided
to reply. This is a fairly long justification; I think it merits the
analysis because it cuts across issues such as syntax colouring and
language-sensitive editing.

I think the most useful starting place for debates about comments is
to acknowledge that the problems arise because we are forcing ourselves
to write programs in plain text. If we were to grant ourselves the
freedom to write in styled text, presenting different styles via
mixed fonts with mixed styles (bold, italic, underline, colours etc)
we could easily solve this design problem.

If we were writing programs with styles, we would simply mark the
comment text as being in the "comment" style. The presentation of that
style would presumably be user-definable in our favourite editor. Such
a facility would additionally eliminate other design problems such as
the reserved word problem (i.e. how do you refer to a variable
whose name coincides with a reserved word such as "if" or "define")
by eliminating the need for reserved words.

Viewed from this position, both the end-of-line comment and the long
comment are crude devices, using explicit text markers to show the
start and end of a style region. Why crude? Because, as we all know,
it is
always possible for the commented text to coincidentally include
the text marker with unintended results. Such a coincidence is more
or less guaranteed if the commented text is itself program text.

Programming language designers have had to tackle this problem in
depth with literal strings - and the consensus solution is to
provide a general escape mechanism. (Again, note that the styled
text solution is simple, general and complete: define a "literal
string style".) The difficulty in applying this same solution to
comments is that our top-level design goal for comments is to be
able to insert arbitrary text in comments without detailed inspection
of the text.

The compromise reached in the case of the end-of-line comment is
exceptionally clever. The trick is to require that the commented
text always includes an end-of-line which marks the end of the
commented text! In other words, there is no need for an explicit
end-of-comment marker because the comment-text defines its own
extent. It is this property that makes nesting end-of-line
comments work so seamlessly.

Of course, the end-of-line is also exceptionally visible. Breaking
arbitrary text at end-of-line boundaries has already been done
for us by plain text editors. As a result of this cunning design
choice, the end-of-line comment has proven bullet-proof.
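
A minimal sketch in Python of why end-of-line comments nest so
seamlessly. The helper names (comment_out, uncomment) and the sample
Pop-11 line are illustrative assumptions, not part of Ved or Poplog.

```python
MARKER = ";;; "  # Pop-11 end-of-line comment marker plus one space

def comment_out(text):
    """Prefix every line with the marker; the text is never inspected."""
    return "\n".join(MARKER + line for line in text.split("\n"))

def uncomment(text):
    """Strip exactly one layer of markers from each line that has one."""
    return "\n".join(
        line[len(MARKER):] if line.startswith(MARKER) else line
        for line in text.split("\n")
    )

# Hypothetical program text that already contains an end-of-line comment.
code = "vars x = 1;   ;;; already has a comment\nx + 1 =>"
once = comment_out(code)    # commented out
twice = comment_out(once)   # commented out again: still just comments
```

Each layer can be peeled off independently, because the end of every
comment is fixed by the line break rather than by a marker inside the
text.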

But end-of-line comments are frequently felt to be insufficient.
Their start-comment marker (;;;) can be a nuisance when processing
the comment text. And a vertical strip of such markers can make
the commented text less readable.

"Long" comments are the alternative and language designers have
experimented with variants that do not nest (e.g. in C) and do
nest (e.g. in Haskell). Neither is remotely as successful as
the end-of-line comment.

Non-nesting long comments suffer from the ghastly flaw that any
program text that needs commenting requires detailed inspection
and then cumbersome quoting. To comment out '*/' (using non-nesting
long comment markers /* and */) requires crossword solving
skills:
/***//*/*/
This can turn the task of inserting program text into a comment
into a weird nightmare.
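
The puzzle can be checked with a small sketch in Python of a
non-nesting comment stripper. The function name strip_flat_comments is
an assumption for illustration, and the sketch ignores string literals,
which a real C lexer would have to respect.

```python
def strip_flat_comments(text, start="/*", end="*/"):
    """Remove non-nesting long comments and return whatever is left."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith(start, i):
            # The comment runs to the first end-marker, whatever lies between.
            close = text.find(end, i + len(start))
            if close < 0:
                raise ValueError("unterminated comment")
            i = close + len(end)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

# Naively wrapping the two-character text '*/' leaks a stray end-marker.
naive = strip_flat_comments("/* */ */")

# The crossword solution is really two comments back to back:
# one holding '*' and one holding '/'.
puzzle = strip_flat_comments("/***//*/*/")
```

Here naive keeps the trailing " */" as live program text, while puzzle
comes back empty.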

Nesting long comments are eminently more practical, only rarely
causing problems. But the usual designs suffer from an
awful defect. There are one-character texts that they cannot
comment out (i.e. '/' in Poplog). And that unfolds into the dire
consequence that commenting out text that has _unbalanced_
comment markers is impossible: this is the exact situation
you noted:

/*
'/foo/*.p'
*/

;;; MISHAP UNTERMINATED "/*" COMMENT
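
A rough sketch in Python of how a nesting scanner arrives at this
mishap. The function comment_depth is a hypothetical stand-in, not
Poplog's actual itemiser, and it assumes that markers are recognised
anywhere inside a comment, including inside quoted strings.

```python
def comment_depth(text, start="/*", end="*/"):
    """Return how many long comments are still open at the end of text."""
    depth = 0
    i = 0
    while i < len(text):
        if text.startswith(start, i):
            depth += 1              # every start-marker opens a new level
            i += len(start)
        elif depth > 0 and text.startswith(end, i):
            depth -= 1              # every end-marker closes one level
            i += len(end)
        else:
            i += 1
    return depth

# The file name contains '/*', which opens a second level that the
# single closing marker cannot balance.
example = "/*\n'/foo/*.p'\n*/"
still_open = comment_depth(example)
```

A non-zero result corresponds to the UNTERMINATED "/*" COMMENT mishap:
the final end-marker closes only the inner level opened by the file
name.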

With non-nested long comments this would not have caused problems,
of course. On the other hand, this example of a Perl regular
expression would cause problems:
/[a-z]*/
As noted above, there is a technically correct solution
/[a-z]**//*/
but it is so destructive to the readability of the text it is
worthless as a solution.

This inability of nested long comments to cope with all one-character
texts is not an intrinsic flaw. Assuming the use of 2-character
start & end brackets, this technical defect arises whenever
end(2) = start(1) and
end(1) = start(2)
In other words, when the markers are each other's reverse. (N.B. Nested
long comment markers in OpenSpice lack this property and hence do
not suffer this technical weakness.)
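
The weakness can be probed by brute force. The sketch below, in Python,
wraps each one-character text in a given pair of markers and checks
whether a nesting scanner balances it. The names balances and
uncommentable_chars are assumptions for illustration, and the marker
pair '[<' and '>]' is purely hypothetical (it is not OpenSpice's).

```python
def balances(text, start, end):
    """True if text is exactly one properly closed nested comment."""
    depth = 0
    i = 0
    while i < len(text):
        if text.startswith(start, i):
            depth += 1
            i += len(start)
        elif depth > 0 and text.startswith(end, i):
            depth -= 1
            i += len(end)
            if depth == 0:
                return i == len(text)   # must close at the very end
        else:
            i += 1
    return False                        # ran out of text while still open

def uncommentable_chars(start, end):
    """One-character texts that cannot be wrapped directly in the markers.

    Only characters that occur in the markers are tried; any other
    character cannot form part of a marker and so is harmless.
    """
    alphabet = sorted(set(start + end))
    return [c for c in alphabet if not balances(start + c + end, start, end)]

reversed_pair = uncommentable_chars("/*", "*/")   # markers mirror each other
other_pair = uncommentable_chars("[<", ">]")      # hypothetical markers
```

With the mirrored markers the list contains '/', since '/*/*/' reads
as two start-markers followed by a stray '/'. The hypothetical pair
leaves the list empty.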

Despite these ugly considerations, the usefulness of long comments is
enough to keep them popping up in modern language designs. As a
consequence, two important practical workarounds have appeared.

The first "sticking plaster" is syntax colouring. If Ved supported
syntax 'colouring' then you would have immediately spotted the cause of
the problem rather than spending an hour or so. Although syntax
'colouring' has a bad name, turning perfectly elegant programs into
garish graffiti, there is nothing intrinsically wrong with the idea.

[Aside: the proper phrase should be syntax styling, of course. That's
what the best modern editors support.]

The second "sticking plaster" is so-called language sensitive editing.
In other words, the programmer edits their source code with commands
that treat the text as a program, with context, and not as plain text.

Ved has a few language sensitive commands. But it specifically lacks
a "ved_long_comment" command. One would hope that anyone who
implemented such a command would ensure that the quoted text was a
valid text. Had you elected to use such a command, you would have
been protected.

[Aside: I hope in vain - I tried the simple regular expression example
in the truly awesome IntelliJ IDEA development environment. IDEA has
a shortcut Ctrl-? to "Comment with Block Comment" which is its phrase
for Java's long comment. To my utter disgust, IDEA blithely commented
out the expression and then marked it as being in error. Now that's
stupid.]

You ask:
Should it be changed?

For strong reasons of backward compatibility, it probably should be
left alone. The relative benefits of non-nested long comments over
their nested brethren are marginal - and the disadvantages are quite
marked. So even though one could utilize compile_mode flags to
do a smooth transition, I think it would be a waste of time.

However, I think there are ways forward. In practice, nested comments
are used in a handful of highly stylized ways. There's the rare inline
case, the common case where indents match, the occasional "fancy"
style, and a horrid style where they don't:

… /* …… one line …… */ …

/*
… some lines …….
*/

/* … a fancy style ……
\
*/

/*/
… another fancy style ……
/
**/

/* ……………..
…… horrid …….
………………… */

These are actually quite strong patterns, utilizing only a tiny
fraction of the available freedom. So we could add optional warnings
that alert programmers to deviations, e.g.

[0] Allow inline uses.

[1] Warn unless the start-marker is the _first_ thing on the line
and the end-marker is the _last_ thing on the line. (This rule
catches your example but misses the Perl regex.)

[2] Warn unless the markers are the only thing on the line. (This
rule catches both examples but is faked out by the fancy styles and
the horrid style.)

[3] Warn unless the start-marker and end-marker have matching
indent levels. (This rule catches both examples but is faked
out by the horrid style.)

This warning (or warnings) could be disabled by the use of a compile-mode
flag so that old code could be marked as OK but stylistically
out of date.
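
Rules [0] to [2] might be sketched as a line-based check, here in
Python. This is only an illustration under stated assumptions: the
function names are hypothetical, the check works on raw lines without
tracking whether a line is really inside a comment, and rule [3]
(matching indents) is left out.

```python
START, END = "/*", "*/"

def is_inline(text):
    """Rule [0]: a start-marker followed by an end-marker on one line."""
    s = text.find(START)
    return s >= 0 and text.find(END, s + len(START)) >= 0

def rule1_warnings(lines):
    """Rule [1]: start-marker first on its line, end-marker last on it."""
    warnings = []
    for n, line in enumerate(lines, 1):
        text = line.strip()
        if is_inline(text):
            continue
        if START in text and not text.startswith(START):
            warnings.append(n)
        elif END in text and not text.endswith(END):
            warnings.append(n)
    return warnings

def rule2_warnings(lines):
    """Rule [2]: a marker must be the only thing on its line."""
    warnings = []
    for n, line in enumerate(lines, 1):
        text = line.strip()
        if is_inline(text):
            continue
        if (START in text or END in text) and text not in (START, END):
            warnings.append(n)
    return warnings

filename_case = ["/*", "'/foo/*.p'", "*/"]   # the example from the mishap
regex_case = ["/*", "/[a-z]*/", "*/"]        # the Perl regular expression
fancy_case = ["/* a fancy style", "*/"]      # text after the start-marker
```

Both rules return line numbers rather than raising an error, in keeping
with the idea that these are optional warnings which a compile-mode
flag could switch off.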

Maybe the above error message could be changed to include a hint.

I think that's essentially what my proposal amounts to.