Author: Colby Russell (https://colbyrussell.com)
Date: 2021 February 10
An assembler
============
Our aim is to describe an "assembler" for the RSC object file format used in
Project Oberon 2013 <:http://projectoberon.com>. The Oberon system itself
includes no assembler--Wirth's compiler is single-pass, and it writes object
files directly. However, the system does include ORTool, a utility for
inspecting the binary object files. Adapting its output format, while less
than ideal for authoring purposes, would give us something that fits our needs
well enough and spares us the burden of designing our own input language.
As a concrete example, we want to be able to ingest some input like this,
which has been generated from the binary of a simple "hello, world"-style
program module:
Our assembler should be able to take input in textual format like the sample
text shown in Figure 1 and generate a bit-for-bit identical copy of the
original binary for which the ORTool-like utility produced the example output.
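The "bit-for-bit identical" criterion can be checked mechanically. What follows is a minimal sketch, not part of the assembler itself; the helper name `firstMismatch` and the sample bytes are made up for illustration and do not come from a real RSC file.

```javascript
// Hypothetical helper (an assumption, not part of the original modules):
// reports the offset of the first difference between two byte sequences,
// or -1 when they match in both length and content.
function firstMismatch(expected, actual) {
  const shared = Math.min(expected.length, actual.length);
  for (let i = 0; i < shared; i++) {
    if (expected[i] !== actual[i]) return i;
  }
  // If one sequence is a strict prefix of the other, they differ at `shared`.
  return expected.length === actual.length ? -1 : shared;
}

// Illustrative bytes only; these are not taken from Hello.rsc.
const original = Uint8Array.from([0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x00]);
const reassembled = Uint8Array.from([0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x00]);
console.log(firstMismatch(original, reassembled) === -1);
```

Reporting an offset rather than a bare yes/no makes it easier to locate which part of the file failed to survive the round trip.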
We say that we rely on an ORTool-*like* utility, because while the ORTool
output is almost rich enough for this purpose, i.e., containing enough
semantic content to allow us to do the job, there are still some ambiguities
that would hamper our efforts. It is on this basis that the ORTool output
"language" has been adapted to incorporate the necessary changes to allow it
to be used as the input format for our assembly tool.
Specifically, to achieve this, we must take some liberties with the contents
of the strings section.  To emphasize: these changes are crucial for making
the round trip from binary to text and back.
(NB: Although it is not strictly a necessary change, the "assembly"
instructions in the preceding sample do not match the exact format that ORTool
uses.  The notation should nonetheless be reasonably straightforward to grasp
for anyone already familiar with the ORTool output; the change has been made
purely as a matter of taste and convenience.)
What does the raw content of an RSC binary even look like? Here's a hex dump
of Hello.rsc corresponding to the binary described in the preceding input
language sample:
Such a binary was obtained by compiling the following module source using the
Oberon system compiler:
Let's give a detailed breakdown of the format in a way that exhibits a degree
of rigor and completeness that is not possible by merely inspecting a single
binary.
Formal description
==================
A brief aside: to accommodate a broad audience, we should describe the
algorithms here in a neutral and accessible notation. So what notation is
that, exactly?  The classic answer to this problem would be "use ALGOL", but
if we chose that route for the purity of its thrust, we would fail in our
goal--ALGOL isn't used much today, even for academic purposes, and it is
likely to cause more friction than it's worth.
For that reason the example code will be presented in an ad hoc notation that
is expected to be comprehensible to any moderately experienced programmer--
although those most comfortable working with languages from the C syntactic
family tree will find bias in their favor, since we'll go with a curly brace
notation. Some will notice that our fictional language's syntax bears
resemblance to JS, and indeed widespread awareness and use of that
language--even if such use is unenthusiastic by uninterested parties under
duress--is a factor that has informed some of the choices made here.
This document attempts to eschew exotic syntactic constructs that may
prove overly difficult to follow, considering its purpose is clarity;
semantics for the parts of the text appearing in the code blocks should be
self-evident, including for example the "include" declarations and the implied
linking model, and it should be possible to infer modules' file names on
sight--we waste no energy or space labeling them. The consequences of this
omission are diminished by the fact that blocks of code, even adjacent ones,
are distinguishable from one another because we choose to present them
bracketed by the HTML-like `script` tags (already used for the preceding
Oberon snippet), denoting individual modules; it's expected that this
presentation is simple and straightforward enough to lend itself to immediate
understanding even for readers who've never encountered the convention before.
From this point onwards, this document contains minimal commentary about the
structure of the object file--allowing the pseudocode to speak for itself.
Consider the primary module `RSCListingReader` the heart of a hypothetical
assembler utility. Let's get right into it:
Presentation and compatibility
==============================
Suppose we copied the text of this document into a file, gave it a name with a
file extension typically associated with the HTML mimetype, and then opened
such a file in a web browser.  What would be the expected result?  Suppose
that the page also applied some CSS rules like the ones that follow.  To
readers with a good understanding of CSS and the HTML5 parsing algorithm, it
should be clear that the main text of this document would then be displayed in
the browser as plain text, in much the same way that it appears in a text
editor--with the content of the following code block itself being the lone
exception:
On the other hand, because of the particular way that the preceding block is
itself incorporated into this document, the process of copying the content
exactly as shown is sufficient to make sure those rules get applied. We can
do better than that, though. We can use the same trick to make sure that with
just a few more rules this section is displayed correctly in a web browser,
too. Viz:
What's more, if we had been conservative enough and careful in our choices
regarding the pseudocode notation in the modules presented here, then we could
achieve an effect much more useful than mere display equivalence. By taking
care to make sure the pseudocode can parse as syntactically valid when fed to
the browser's JS engine and that it's semantically equivalent when
evaluated--or at least close enough to "semantically equivalent"--then the
reader could simply outsource the task of creating an assembler. Rather than
a human using this document as a reference and implementing an assembler of
their own, this document itself can be used *directly* as the implementation.
In fact, we have been that careful, and that's been the plan all along.
Achieving the full effect only requires a few more pieces.
Implementing the system interface
---------------------------------
Using a bootstrap loader, such as the one that follows later, we can finesse
the preceding module definitions into our plan, by wiring them up to the
available browser APIs. Doing this is as easy as implementing the minimal
`system` interface referenced in the `RSCListingReader` code. Here's an
example implementation of such a system layer, which--you guessed it--is
itself correct and working code, and its inclusion here is also sufficient to
be able to actually use it. All that would be needed beyond this system layer
would be to wire up the pieces with the bootstrap loader, presented later.
Our system implementation satisfies the interface, because it implements both
`read` and `observe`, and it's free to implement them by whatever means are
available.
In an ideal implementation, for the `read` operation, we'd receive the path to
the file of interest and then read it. However, since we're expected to
execute in the browser instead of a traditional, operating system-managed
process execution context, there *are* no paths of files to be read. So we
fake it and support "reading" from exactly one path. Because the system layer
itself is responsible for how `RSCListingReader.convert` gets called, its
`read` method can be hardcoded, as shown, to expect any path of our choosing.
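The idea of "reading" from exactly one path can be sketched in a few lines. This is a hedged illustration and not the system layer used by this document: the names `makeSinglePathReader` and `getText`, the path "input.txt", and the choice to make `read` synchronous are all assumptions made for brevity.

```javascript
// Hypothetical sketch of a system layer that supports exactly one path.
// `expectedPath` is whatever path the layer itself chooses to pass along;
// `getText` is an assumed callback that produces the input text (in a
// browser it might pull the text out of the page, but that is not shown).
function makeSinglePathReader(expectedPath, getText) {
  return {
    // Returns the text for the one supported path and throws for any other.
    read(path) {
      if (path !== expectedPath) {
        throw new Error("unsupported path: " + path);
      }
      return getText();
    }
  };
}

// Usage with made-up values:
const demoSystem = makeSinglePathReader("input.txt", () => "sample listing");
console.log(demoSystem.read("input.txt"));
```

Because the same layer decides which path gets requested and which path gets honored, the hardcoded comparison never fails in practice; it mostly serves to make an accidental misuse loud.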
Note that this implementation of the system interface defers to an external
controller `AppStateController`, which has not yet been defined. We'll
need to do so in order to have a complete working demo.  Unfortunately, the
Web platform APIs do not make our task as trivial as we'd hope, given the
expectations we have for even our simple interaction design, but it's still
manageable.
The controller implementation needs to be able to activate the bound action--
in our case, the closure over `RSCListingReader.convert`--and to do so while
gracefully handling several cases:
- Ordinary vanilla activation intended to put the results on the screen using
Ctrl+Enter (i.e., the common case)
- Activation by editing the URL to insert the fragment, as on mobile (see
below)
- Navigating directly to the output view by URL, e.g. in a new tab
- Pressing reload on the output view
- Various other conditions involving moving backwards and forwards between the
browser history entries
Although it's verbose, the value of the controller implemented here is its
ability to handle these conditions.
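One way a controller might cover the URL-driven cases in that list (fragment edits, direct navigation, reload, and history traversal) is to derive the desired view from the current fragment alone. The sketch below assumes the `#?view=output` fragment form that this document uses for activation; the helper name `viewFromFragment` is made up, and the real `AppStateController` may work differently.

```javascript
// Hypothetical helper: extracts the requested view from a URL fragment of
// the form "#?view=output".  Returns null when the fragment does not
// request any view.
function viewFromFragment(hash) {
  if (!hash.startsWith("#?")) return null;
  // Treat everything after "#?" as an ordinary query string.
  return new URLSearchParams(hash.slice(2)).get("view");
}

console.log(viewFromFragment("#?view=output")); // output
console.log(viewFromFragment(""));              // null
console.log(viewFromFragment("#intro"));        // null
```

In a browser, such a helper could be called with `location.hash` once at load time and again whenever the fragment changes, so that every one of those cases goes through the same code path.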
Bootstrapping the system layer
------------------------------
And now, a bootstrap sequence:
That's it. Try it out by making sure this file is named something like
"oasm.txt.htm" (or anything ending with `.htm` or `.html`) and then pointing
your Web browser to it, e.g. by drag-and-drop or by double-clicking it in your
system's file manager.  Activate the demo by pressing Ctrl+Enter on your
keyboard, or (particularly for devices without keyboards) by appending the URL
fragment `#?view=output` to the location shown in your browser's address bar.
If you edit your copy of this document, you can replace the sample input with
a program of your own and "re-run" the assembler on the new input by opening
it in the browser and repeating the steps to activate the assembler. You
should find that you're presented with output corresponding to the modified
input, rather than the output of the original demo here.
Feedback
========
This has been an experiment in writing code in a way that approaches the
practice of literate programming, albeit using native features of the
WHATWG/W3C hypertext system rather than tooling specifically crafted for the
task, such as Knuth's CWEB.
If you'd like to provide feedback, you are encouraged to annotate this content
using Hypothesis:
<:https://via.hypothes.is/https://www.colbyrussell.com/LP/debut/plain.txt.htm>
At the time of publication, a blog post about the writing process of this
document is being drafted, although not yet published. The evolution of
thought behind this and similar works, environmental constraints, and
motivating factors will be detailed there. When published, it will be
announced at the URL:
<:https://www.colbyrussell.com/LP/debut/background-and-primer>