Author: Colby Russell (https://colbyrussell.com)
Date: 2021 February 10

An assembler
============

Our aim is to describe an "assembler" for the RSC object file format used in Project Oberon 2013 <:http://projectoberon.com>. The Oberon system itself includes no assembler--Wirth's compiler is single-pass, and it writes object files directly. However, the system does include ORTool, a utility for inspecting the binary object files. Adapting its output format, while less than ideal for authoring purposes, would give us something that fits our needs well enough and spares us the burden of designing our own input language.

As a concrete example, we want to be able to ingest some input like this, which has been generated from the binary of a simple "hello, world"-style program module:

Our assembler should be able to take input in textual format like the sample text shown in Figure 1 and generate a bit-for-bit identical copy of the original binary for which the ORTool-like utility produced the example output.

We say that we rely on an ORTool-*like* utility, because while the ORTool output is almost rich enough for this purpose, i.e., containing enough semantic content to allow us to do the job, there are still some ambiguities that would hamper our efforts. It is on this basis that the ORTool output "language" has been adapted to incorporate the necessary changes to allow it to be used as the input format for our assembly tool. Specifically, to achieve this, we must take some liberties with the contents of the strings section. To emphasize: these changes are crucial for the roundtrippability from binary to text and back.

(NB: Although not strictly a necessary change, the "assembly" instructions in the preceding sample also do not match the exact format that ORTool uses. The format should nonetheless be reasonably straightforward to grasp for anyone already familiar with the ORTool output; this change has been made purely as a matter of taste and convenience.)

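To see why a lossy rendering of a strings section would defeat a bit-for-bit round trip, consider a toy illustration. Nothing in it is the RSC format or ORTool's actual output: the byte values, the function names `renderLossy`, `renderHex` and `parseHex`, and the hex notation are all invented for the example.

```javascript
// Toy data (invented): two NUL-terminated strings, each followed by a NUL pad.
const original = [0x68, 0x69, 0x00, 0x00, 0x6f, 0x6b, 0x00, 0x00];

// Lossy rendering: show only the visible characters and drop every NUL.
// The positions of terminators and padding cannot be recovered from this.
function renderLossy(bytes) {
  return bytes
    .filter((b) => b !== 0)
    .map((b) => String.fromCharCode(b))
    .join("");
}

// Lossless rendering: every byte becomes two hex digits, so nothing is lost.
function renderHex(bytes) {
  return bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");
}

// Inverse of renderHex: turn the hex text back into the same byte values.
function parseHex(text) {
  return text.split(" ").map((h) => parseInt(h, 16));
}
```

Here `parseHex(renderHex(original))` reproduces the eight original bytes, whereas `renderLossy(original)` yields only four characters, from which the original layout cannot be reconstructed. Any textual form intended for round-tripping has to behave like the former.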
What does the raw content of an RSC binary even look like? Here's a hex dump of Hello.rsc corresponding to the binary described in the preceding input language sample:

Such a binary was obtained by compiling the following module source using the Oberon system compiler:

Let's give a detailed breakdown of the format in a way that exhibits a degree of rigor and completeness that is not possible by merely inspecting a single binary.

Formal description
==================

A brief aside: to accommodate a broad audience, we should describe the algorithms here in a neutral and accessible notation. So what notation is that, exactly? The classic answer to this problem would be "use ALGOL", but if we chose that route for the purity of its thrust, we would fail in our goal--ALGOL isn't used much today, even for academic purposes, and is likely to cause more friction than it's worth.

For that reason the example code will be presented in an ad hoc notation that is expected to be comprehensible to any moderately experienced programmer--although those most comfortable working with languages from the C syntactic family tree will find bias in their favor, since we'll go with a curly brace notation. Some will notice that our fictional language's syntax bears resemblance to JS, and indeed widespread awareness and use of that language--even if such use is unenthusiastic by disinterested parties under duress--is a factor that has informed some of the choices made here.

This document attempts to eschew exotic syntactic constructs that may prove overly difficult to follow, considering its purpose is clarity; semantics for the parts of the text appearing in the code blocks should be self-evident, including for example the "include" declarations and the implied linking model, and it should be possible to infer modules' file names on sight--we waste no energy or space labeling them.
The consequences of this omission are diminished by the fact that blocks of code, even adjacent ones, are distinguishable from one another because we choose to present them bracketed by the HTML-like `script` tags (already used for the preceding Oberon snippet), denoting individual modules; it's expected that this presentation is simple and straightforward enough to lend itself to immediate understanding even for readers who've never encountered the convention before.

From this point onwards, this document contains minimal commentary about the structure of the object file--allowing the pseudocode to speak for itself. Consider the primary module `RSCListingReader` the heart of a hypothetical assembler utility. Let's get right into it:

Presentation and compatibility
==============================

Suppose we copied the text of this document into a file, gave it a name with a file extension typically associated with the HTML MIME type, and then opened such a file in a web browser. What would be the expected result? If in that page we also applied some CSS rules like the ones that follow, then readers with a good understanding of CSS and the HTML5 parsing algorithm should be able to see, on examination of the text here, that the main text of this document would be displayed in the browser as plain text, in much the same way that it appears in a text editor--with the content of the following code block itself being the lone exception:

On the other hand, because of the particular way that the preceding block is itself incorporated into this document, the process of copying the content exactly as shown is sufficient to make sure those rules get applied.

We can do better than that, though. We can use the same trick to make sure that with just a few more rules this section is displayed correctly in a web browser, too.
Viz:

What's more, if we had been conservative enough and careful in our choices regarding the pseudocode notation in the modules presented here, then we could achieve an effect much more useful than mere display equivalence. If we take care to make sure the pseudocode parses as syntactically valid when fed to the browser's JS engine and that it's semantically equivalent when evaluated--or at least close enough to "semantically equivalent"--then the reader can simply outsource the task of creating an assembler. Rather than a human using this document as a reference and implementing an assembler of their own, this document itself can be used *directly* as the implementation.

In fact, we have been that careful, and that's been the plan all along. Achieving the full effect only requires a few more pieces.

Implementing the system interface
---------------------------------

Using a bootstrap loader, such as the one that follows later, we can finesse the preceding module definitions into our plan, by wiring them up to the available browser APIs. Doing this is as easy as implementing the minimal `system` interface referenced in the `RSCListingReader` code.

Here's an example implementation of such a system layer, which--you guessed it--is itself correct and working code, and its inclusion here is also sufficient to be able to actually use it. All that would be needed beyond this system layer would be to wire up the pieces with the bootstrap loader, presented later.

Our system implementation satisfies the interface, because it implements both `read` and `observe`, and it's free to implement them by whatever means possible. In an ideal implementation, for the `read` operation, we'd receive the path to the file of interest and then read it. However, since we're expected to execute in the browser instead of a traditional, operating system-managed process execution context, there *are* no file paths to be read. So we fake it and support "reading" from exactly one path.

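The single-path trick can be sketched in isolation. This is a minimal illustration under stated assumptions, not the system layer used by this document: the factory name `makeSinglePathSystem`, the synchronous return value, and the `getText` callback are hypothetical, and only the method name `read` comes from the text. The `observe` operation is left out because its behavior is not described here.

```javascript
// Hypothetical sketch: a "system" object that serves exactly one path.
// onlyPath: the single path this layer agrees to "read" (caller's choice).
// getText:  a callback supplying the text, e.g. taken from the page itself.
function makeSinglePathSystem(onlyPath, getText) {
  return {
    // Return the text for the one supported path; refuse anything else.
    read(path) {
      if (path !== onlyPath) {
        throw new Error("unsupported path: " + path);
      }
      return getText();
    }
  };
}
```

A caller that controls both sides can then pick any path it likes, for example `makeSinglePathSystem("Hello.lst", () => sampleText).read("Hello.lst")`, where `sampleText` stands in for wherever the input text actually lives.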
Because the system layer itself is responsible for how `RSCListingReader.convert` gets called, its `read` method can be hardcoded, as shown, to expect any path of our choosing.

Note that this implementation of the system interface defers to an external controller `AppStateController`, which has not yet been defined. We'll need to define it in order to have a complete working demo. Unfortunately, the Web platform APIs do not make our task as trivial as we'd hope, given the expectations we have for even our simple interaction design, but it's still manageable.

The controller implementation needs to be able to activate the bound action--in our case, the closure over `RSCListingReader.convert`--and to do so while gracefully handling several cases:

- Ordinary vanilla activation intended to put the results on the screen using Ctrl+Enter (i.e., the common case)
- Activation by editing the URL to insert the fragment, as on mobile (see below)
- Navigating directly to the output view by URL, e.g. in a new tab
- Pressing reload on the output view
- Various other conditions involving moving backwards and forwards between the browser history entries

Although it's verbose, the value of the controller implemented here is its ability to handle these conditions.

Bootstrapping the system layer
------------------------------

And now, a bootstrap sequence:

That's it. Try it out by making sure this file is named something like "oasm.txt.htm" (or anything ending with `.htm` or `.html`) and then pointing your Web browser to it, e.g. by drag-and-drop or by double-clicking it in your system's file manager. Activate the demo by pressing Ctrl+Enter on your keyboard, or (particularly for devices without keyboards) append the URL fragment `#?view=output` to the location shown in your browser's address bar.

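For readers curious how a fragment such as `#?view=output` might be interpreted, here is a minimal sketch. It is one possible approach and not necessarily how this document's own controller does it; the helper name `viewFromFragment` is hypothetical. It relies only on the standard `URLSearchParams` interface.

```javascript
// Hypothetical helper: extract the `view` parameter from a URL fragment
// shaped like "#?view=output". Returns null if the fragment does not start
// with "#?" or carries no `view` parameter.
function viewFromFragment(hash) {
  if (!hash.startsWith("#?")) {
    return null;
  }
  // Everything after "#?" is treated as an ordinary query string.
  return new URLSearchParams(hash.slice(2)).get("view");
}
```

In a browser, this would typically be called as `viewFromFragment(location.hash)`, both on page load and whenever the fragment changes, so that direct navigation, reloads, and history traversal all end up on the same code path.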
If you edit your copy of this document, you can replace the sample input with a program of your own and "re-run" the assembler on the new input by opening it in the browser and repeating the steps to activate the assembler. You should find that you're presented with output corresponding to the modified input, rather than the output of the original demo here.

Feedback
========

This has been an experiment in writing code in a way that approaches the practice of literate programming, albeit using native features of the WHATWG/W3C hypertext system rather than tooling specifically crafted for the task, such as Knuth's CWEB.

If you'd like to provide feedback, you are encouraged to annotate this content using Hypothesis:

<:https://via.hypothes.is/https://www.colbyrussell.com/LP/debut/plain.txt.htm>

At the time of publication, a blog post about the writing process of this document is being drafted, although not yet published. The evolution of thought behind this and similar works, environmental constraints, and motivating factors will be detailed there. When published, it will be announced at the URL:

<:https://www.colbyrussell.com/LP/debut/background-and-primer>