For Good Measure

Novel ideas for programming language design

by Colby Russell. 2017 February 16.

Short variable names prohibited by grammar

Naming things using a single letter is consistently identified as a bad practice, and is even acknowledged as such by those who admit to sometimes "slipping up" and doing it themselves. So why not solve this by eliminating single-letter names in the grammar altogether?

Many languages adopt a rule that says, roughly, "identifiers must start with a letter, which can be followed by zero or more letters and digits". (Some allow for special characters like _ and $, too.) Or, in EBNF:

ident = letter { letter | digit };

Initially, we might suggest changing the rule to "identifiers must start with a letter which must be followed by one or more letters, digits, or symbols", which means the minimum length for a valid identifier is 2. With a two-character minimum, though, single-letter programmers will likely end up throwing in another consonant or tacking on an underscore, thereby satisfying the language's rules but subverting their spirit. I think the tipping point is 3. With a minimum length of 3, the ridiculousness of trying to thwart the rules without actually increasing the readability of the code becomes apparent even to the stalwarts, which should result in few holdouts.
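As a rough sketch of what the minimum-length rule amounts to, the following Python snippet expresses it as a regular expression. It assumes ASCII letters and digits only (no _ or $), and the names IDENT and is_valid_ident are made up for illustration.

```python
import re

# Assumed rule: a letter followed by at least two more letters or digits,
# i.e. ident = letter (letter | digit) (letter | digit) { letter | digit }.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]{2,}")


def is_valid_ident(name: str) -> bool:
    """Return True if `name` satisfies the minimum-length-3 identifier rule."""
    return IDENT.fullmatch(name) is not None
```

Under this rule V and Vw are rejected and viewer is accepted, but so is a padded name like Vxx. The grammar cannot force a meaningful name; it can only make the padding look silly.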

Considerations

Type-named objects

Consider the following snippet:

PROCEDURE PassFocus* (V: Viewer);
  VAR M: ControlMessage;
BEGIN
  M.id := defocus;
  FocusViewer.handle(M);
  FocusViewer := V;
END PassFocus;

(This is Oberon. It has flaws—annoying ones. Oberon is not my favorite language. I'm comfortable presenting the examples here in Oberon, however, because this snippet should be more or less understandable even to those who've never seen its syntax, and if I'm going to present any example, I'm going to do it in a dead language that no one really uses, so as not to play favorites and put undue focus on the one chosen.)

Note the use of the single-letter identifier V in the parameter list and the local variable M. Our V can be easily changed to viewer, and that would probably be the prescription in most code reviews where the initial naming would be seen as a problem. However, we're now running afoul of an awful lot of repetition, which is a frequent criticism of many languages with static type systems. It's often pointed out with classic Java, for example, that almost any time you do something, you end up repeating yourself, sometimes up to three times. E.g.:

FrobbedFoo frobbedFoo = new FrobbedFoo(bar);

This is why C#'s var keyword is seen as an improvement, and JVM languages have by now adopted similar constructs.

It's also said that naming things is one of the hardest things in CS. The line above raises other questions, too. For our frobbedFoo should we perhaps be giving the local variable another name that describes it as something else? We're obviously dealing with a FrobbedFoo, and it is redundant to refer to it as such, so should we prefer to name it after its purpose in this context, i.e., what its role is in the procedure, rather than what kind of thing it is?

With type-named objects, we answer this hand-wringing by acknowledging that in many cases, the type alone is sufficient—not merely sufficient for the machine, but for the human reader, too. In languages with support for type-named objects, we therefore need not always give an object an explicit name. Instead we unambiguously refer to it in the local context using its type.

For example, one approach to designing a language with type-named objects would be to disambiguate with the keyword the. The example above becomes:

PROCEDURE PassFocus* (Viewer);
  VAR ControlMessage;
BEGIN
  (the ControlMessage).id := defocus;
  FocusViewer.handle(the ControlMessage);
  FocusViewer := the Viewer;
END PassFocus;

Compared to our single-letter identifiers in the preceding snippet, this results in more typing, but the programmer isn't pressed to stop and think of intermediate names to give to the two objects local to the procedure. This will allow for maintaining an uninterrupted train of thought, and despite the higher demand for "human IO", type-named objects should be more productive and viewed as a programmer convenience.
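One way to picture how a compiler might resolve the Viewer is as a lookup keyed by type within the local scope. The Python sketch below is only an illustration, and it rests on an assumption not stated above: that a second unnamed object of the same type in the same scope makes the reference ambiguous and is rejected. Scope, declare, and AmbiguousTypeName are hypothetical names.

```python
class AmbiguousTypeName(Exception):
    """Raised when `the T` could refer to more than one object in scope."""


class Scope:
    """A toy local scope in which unnamed objects are looked up by type name."""

    def __init__(self):
        # Maps a type name to the list of objects declared with that type.
        self._by_type = {}

    def declare(self, type_name, obj):
        """Record an unnamed object of the given type, as in `VAR ControlMessage;`."""
        self._by_type.setdefault(type_name, []).append(obj)

    def the(self, type_name):
        """Resolve `the T`; assumed to fail unless exactly one candidate exists."""
        candidates = self._by_type.get(type_name, [])
        if not candidates:
            raise KeyError(f"no object of type {type_name} in scope")
        if len(candidates) > 1:
            raise AmbiguousTypeName(type_name)
        return candidates[0]
```

In PassFocus each type occurs once, so both lookups succeed. A procedure taking two Viewer parameters would presumably have to fall back on explicit names for at least one of them.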

Considerations

Inverted selectors

Many languages have a receiver.member selector syntax, to select slot member of receiver. This is used both to access fields of records/structs/objects and to reference functions or other procedures—i.e., methods. Here we discuss an "inverted" selector syntax, so that the receiver.member above can become member @ receiver. This on its own is probably no significant benefit, but consider it in the context of a subroutine, paired with language support for type-named objects:

PROCEDURE PassFocus* (Viewer);
  VAR ControlMessage;
BEGIN
  id @ the ControlMessage := defocus;
  FocusViewer.handle(the ControlMessage);
  FocusViewer := the Viewer;
END PassFocus;

This @-notation is generalizable. I've wondered before why I don't see many (any?) languages offer a "passive" form to refer to members.

If the culture of the language under discussion is one that involves an overall pursuit to avoid magic symbols (e.g., Python, Ada, and Wirth languages like Pascal), then the keyword from might be used, viz.

id from the ControlMessage
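To make the equivalence concrete, here is a toy Python rewriter that turns the inverted forms back into conventional selectors. It is a text-level sketch, not a parser: it assumes the receiver is either a plain name or a the-expression, and desugar is a hypothetical helper.

```python
import re

# A member name, then "@" or the keyword "from", then either `the Type`
# or a plain receiver name.
INVERTED = re.compile(r"\b(\w+)\s*(?:@|\bfrom\b)\s*(the\s+\w+|\w+)")


def desugar(source: str) -> str:
    """Rewrite `member @ receiver` and `member from receiver` as `receiver.member`."""

    def rewrite(match):
        member = match.group(1)
        receiver = " ".join(match.group(2).split())  # normalize inner whitespace
        if receiver.startswith("the "):
            # Parenthesize `the Type` so the selector binds to the whole expression.
            receiver = f"({receiver})"
        return f"{receiver}.{member}"

    return INVERTED.sub(rewrite, source)
```

For instance, desugar applied to the assignment line from PassFocus yields the parenthesized form used in the earlier type-named version of the procedure.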

Considerations

The from keyword, if not already present in the language grammar (for use in some other context), may be problematic: it's hard to add keywords to a language, because doing so can make code that worked in version n-1 suddenly invalid (a reserved word used as an identifier). Contrast this with the earlier suggestion of the keyword the for discriminating type-named objects; I expect use of the word the as an identifier in the wild to be rare. So in the case of from, a semantically similar word like of might be used in its place. Failing that, for, although it reads slightly awkwardly, wouldn't be a completely inappropriate choice, and it's likely to already be a reserved word. The language designers just need to be comfortable allowing it to appear in two constructs, in each of which it has a completely different meaning.
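The compatibility risk can be estimated before committing to a keyword by scanning existing code for the candidate words. The Python sketch below does this naively: it assumes identifiers are ASCII letters and digits, it does not skip comments or string literals, and keyword_collisions is a hypothetical helper.

```python
import re

# Assumed identifier shape: a letter followed by letters or digits.
WORD = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def keyword_collisions(source, proposed):
    """Return, sorted, the proposed keywords that already occur as words in `source`."""
    return sorted(set(WORD.findall(source)) & set(proposed))
```

Any word that such a scan returns is one whose promotion to a keyword would break existing programs.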