For Good Measure

May integration

by Colby Russell. 2019 May 15.

In February, I mentioned that I would be adopting a writing regimen where, each month, I publish material on every subject I've given thought to writing up, regardless of whether I've actually sat down and finished a "proper" writeup for it.

What's the point of something like this? It's like continuous integration for the stuff in your head. For example, the faster binaries nugget included in this brain dump is something I can trace back to a thought I sketched out on paper in April 2015, and even then I included a note to myself that I thought I'd already written it down somewhere else but hadn't been able to find it.

This is inspired in part by Nadia Eghbal's "Things that happened in $MONTH" newsletter, but the subject matter is more closely aligned with samsquire's One Hundred Ideas For Computing.

Having said all that, I didn't actually end up doing anything like this for March or April, due to a car accident. But here is this month's.


Sundry subdirectories

When working on software projects, I used to keep a p/ directory in the repo root and add it to .git/info/exclude. It's a great way to dump a lot of stuff specific to you and your machine into the project subtree without junking up the output of git status or risking accidentally committing something you didn't mean to. (You don't want to use .gitignore for this, because it's usually version-controlled.)

I had a realization a while back that I could rename these p/ directories to .../, and I've been working that way for several months now. I like this a lot better, both because the directory is hidden in directory listings (just like . and ..) and because the ellipsis's natural-language connotations associate it with "sundry" items. It feels right. And I have to admit, p/ was pretty arbitrary. I only picked it because it was unlikely to clash with any top-level directory in anything I clone, and because it's short. ("p" stood for "personal".)
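
For concreteness, the whole mechanism is one line in .git/info/exclude plus the directory itself. Here's a throwaway sketch of the setup; the helper function is just illustrative:

    from pathlib import Path

    def exclude_sundry(repo_root="."):
        """Add ".../" to .git/info/exclude so the sundry directory stays out
        of `git status` without touching the version-controlled .gitignore."""
        exclude = Path(repo_root, ".git", "info", "exclude")
        lines = exclude.read_text().splitlines() if exclude.exists() else []
        if ".../" not in lines:
            exclude.parent.mkdir(parents=True, exist_ok=True)
            exclude.write_text("\n".join(lines + [".../"]) + "\n")
        Path(repo_root, "...").mkdir(exist_ok=True)  # the sundry directory itself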


Aspirational CVs

Here's an idea for a trend:

Fictional CVs as a way to signal the kinds of things you'd like to work on.

Jane is churning out React/whatever frontend work for her employer or clients. What she'd like to be doing is something more fulfilling. She's interested in machine learning, and she's maybe even started on a side project that she spends some personal time on, but she's often tired or burnt out and doesn't get to work as much on it as she'd like.

So one day when Jane is frustrated with work and has her thoughts particularly deep in chasing some daytime fantasy about getting paid to do something closer to her heart's desire, she cranks out an aspirational CV. Mostly therapeutic, but partly in the hopes that it will somehow enable her to actually go work on something like she describes in the CV entry.

In her aspirational CV, Jane writes from a far-future perspective, where at some point in the not-too-distant future (relative to the present day), she ran into an opportunity to switch onto the career track that turned out to be the start of the happiest she's ever been in her professional life. The fictional CV entry briefly summarizes her role and her accomplishments on that fantasy team, along with a rough timeline for when she held the position.

Two weeks later, back in the real world, someone in Jane's company emails her to say they saw her aspirational CV that she posted on social media. They checked out her side project, too, and they want to know if she'd be available to chat about a transfer to work on a new project for a team that's getting put together.


Software finishing

I like it when fiction presents a plausible version of the world we live in, one that differs just slightly in some quaint or convenient way.

It's now widely recognized that global IT infrastructure often depends on software that is underfunded or even has no maintainer at all. Consider, though, the case of a software project whose development activity tapers off because it has reached a state of being "finished".

Maybe in an alternate universe there exists an organization—or some sort of loosely connected movement—that focuses on software comprehensibility as a gift to the world and to future generations. The idea is that once software approaches doneness, the org would pour effort into fastidiously eliminating hacks around the codebase in favor of rewrites that present the affected logic more clearly. This work would extend to getting changes upstream into compilers, allowing the group to judiciously cull constructs of dubious readability and replace them with clearer passages, ones that may previously have been less performant but that now work just as well as the sections being replaced, with no penalty at runtime.

For example, one of the cornerstones of the FSF/GNU philosophy is that it focuses on maximizing benefit to the user. What could be more beneficial to a user of free software than ensuring that its codebase is clean and comprehensible for study and modification?


Free software is not enough

In the spirit of Why Open Source Misses The Point Of Free Software, as well as a restatement of Adam Spitz's Open Source Is Not Enough—but this time without the problematic use of the phrase "open source" that might be a red herring and cause someone who's not paying attention to mistake it for trying to make the same point as Stallman in the former essay.

Free software is not even enough.

Consider the case of some bona fide spyware that ships on your machine, except it's licensed as GPLv3. It meets the FSF's criteria for the definition of free software, but is it? You wouldn't mistake this for being software that's especially concerned for the user.

Now consider the case of a widely used software project, released under a public domain-alike license, by a single maintainer who works on it unpaid as a labor of love, except its codebase is completely incomprehensible to anyone except the original maintainer. Or maybe no one can seem to get it to build, not for lack of trying but just due to sheer esotericism. It meets the definition of free software, but how useful is it to the user if it doesn't already do what they want it to, and they have no way to make it do so?

Related reading:


"Sourceware" revisited

The software development world needs a term for software with publicly disclosed source code.

We're at this weird point, 20 years out from when the term "open source" was minted, and there are people young and old who don't realize that it was made up for a specific purpose and that it has a specific meaning—people are extrapolating their own (sometimes incorrect) understanding just based on what the words "open" and "source" mean.

That's a shame, because it means that open source loses, and we lose "open source" (as a useful term of art).

The terms "source available" and "shared source" have been available (no pun intended), but don't see much use, even by those organizations using that model, which distressingly sometimes ends up being referred to as "open source" when it's not.

For lack of a better term, I'll point to the first candidate considered by the group who ultimately settled on "open source": "sourceware". That is, I'll use it here to refer to software that is distributed as source, regardless of what kind of license is attached to it and what rights it confers to recipients. If it's published in source code form, then it's sourceware.

The idea is to give us something like what we have in the Chomsky hierarchy for languages.

So we get a Venn diagram of the nested form: free software sits inside open source, which sits inside sourceware.

... additionally, continuing the thought from above (in "Free software is not enough"), we probably need to go one step deeper for something that also incorporates Balkan's thoughts on the human experience in ethical design.


Changeblog

Configuration as code can't capture everything. DNS hosts are notorious for this, for example: every service has its own bespoke control panel for managing records.

In engineering disciplines outside of Silicon Valley-dominated, GitHub cowboy-coder software development, checklists and paper trails play an important role.

So when you make a change, let's say to a piece of infrastructure, and that change can't be captured in source control, log a natural-language description of what was changed. Or, if that's too difficult, you could consider maintaining a microblog written from the first-person perspective of (say) the website that's undergoing the changes.

"Oh boy, I'm getting switched over to be hosted on Keybase instead of Neocities."


Orthogonally engineered REST APIs

Sometimes web hosts add a special REST API. You probably don't need to do this! If the sites you're hosting are non-dynamic, it would suffice to implement HTTP fully and consistently.

For an example (of a project that I like): Neocities uses special endpoints for its API. If I want to add a file, I POST a payload encoding the path and the file contents to the /api/upload endpoint.

But if my site is example.neocities.org and I want to upload a new file foo/bar.png, the first choice available to me should be an HTTP PUT to example.neocities.org/foo/bar.png. No site-specific API required. Similarly for HTTP DELETE. HTTP also has support for a "list" operation by way of WebDAV (an extension of HTTP), which Neocities already supports. I shouldn't need to mention here that having a WebDAV endpoint separate from the "main" public-facing webserver isn't necessary either. But I've seen a lot of places do that, too.
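
To make "HTTP is your API" concrete, here's a sketch of what the client side would look like under that model. None of this works against Neocities today; the PUT handling and the bearer-token auth are assumptions for the sake of illustration:

    import urllib.request

    def publish(site, path, data, token):
        """Upload a file by PUTting it at the same URL it will be served from."""
        req = urllib.request.Request(
            url=f"https://{site}/{path}",   # e.g. example.neocities.org/foo/bar.png
            data=data,
            method="PUT",
            headers={
                "Authorization": f"Bearer {token}",   # hypothetical auth scheme
                "Content-Type": "application/octet-stream",
            },
        )
        with urllib.request.urlopen(req) as resp:
            return resp.status  # e.g. 201 on create, 200/204 on replace

    # Deleting is the same story: Request(url, method="DELETE"). No bespoke endpoint needed.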

These changes all work for sites like Neocities, because there is a well-defined mapping between payloads and the site's namespace. It doesn't necessarily work as well for a site like Glitch, which allows arbitrary user code to register itself as handler code on the server (but if the user's Glitch project is a static site known not to have its own request-handling code in the form of a NodeJS script activated by package.json, then why not!).

This could also work for hashbase.io and keybase.pub, so long as they pass along enough metadata in the request headers to prove that the content was signed by the keyholder. In the case of Hashbase, it'd be something similar to, but not the same as, an Authorization: Bearer header, with the header value encoding some delta for the server to derive a new view of the dat's Merkle tree. In the case of keybase.pub, whatever kbfs passes along to the Keybase servers.

You don't need bespoke APIs. (Vanilla) HTTP is your API.


Lessons from hipsterdom applied to the world of computing

I'm half joking here. But only half.

escape.hatch — artisanal devops deployments

People, even developers, are hesitant to pay for software. People pay for services, though. Sometimes they won't be willing to pay even for a service: in instances where they see their payments as an investment, they get trepidatious about whether the business they're handing money to is actually going to be around next year. That is, if a service exists for $20 per year, it's not enough to satisfy someone by giving them 1 year of service in exchange for $20 in 2019. They want that plus some sort of feeling of security that if they take you up on what you're selling, you're going to be around long enough that they can give you money next year, too (and ideally the year after that, and so on).

People—especially the developer kinds of people—are especially wary of backend services that aren't open source. Often, they don't want to cut you out and run their own infrastructure, but they want the option of running their own infrastructure. They like the idea of being free to do so, even though they probably never will.

escape.hatch would specialize in artisanal devops deployments. Every month, you pay them, and in return they send you a sheet of handwritten notes. The notes contain a private link to a video (screencast) where they check out the latest version of the backend they specialize in, build it, spin up an instance on some commodity cloud compute service, and turn everything over to you. The cloud account is yours, you have its credentials, and it's your instance to use and abuse. The next month, escape.hatch's devops artisans do the same thing. The key here is the handwritten notes and the content of the video showing off your "handwoven" deployment—like a sort of certificate.

The artisans will be incentivized to make sure that builds/deployments are as painless and as easy as possible and also that their services are resource efficient, because they're only able to take home the difference between what you pay monthly and what the cloud provider's cut is for running your node. (Or maybe not, in the case of a plan whose monthly price scales with use.)

Consumers, on the other hand, will be more likely to purchase services because they reason to themselves after watching the screencasts every month, "I could do all that, if I really needed to." In reality, although the availability of this escape hatch makes them feel secure, they will almost definitely never bother with cancelling the service and taking on the burden of maintenance.

(NB: the .hatch TLD doesn't actually exist)

microblogcasts — small batch podcasts


Improvements to man

I want to be able to trivially read the man pages for a utility packaged in my system's package repositories, even when that utility is not installed on my system. I'm probably trying to figure out whether it's going to do what I need or not; I don't want to install it just to read it and find out that it doesn't. I don't want to search it out, either.

Additionally, the info/man holy war is stupid. Every piece of software should have an in-depth info-style guide and a man-like quick reference. It's annoying to look up the man pages for something only to find that it's got full chapters, just like it's annoying to find that no man page exists because the GNU folks "abhor" them.

But I don't want to use the info system to browse the full guide. (It's too unintuitive for my non-Emacs hands.) I want to read it in the thing I use to browse stuff. You know―my browser.

Also, Bash should stop squatting on the help keyword. Invoking help shouldn't be limited to telling me about the shell itself. That should be reserved for the system-wide help system. C'mon.


Dependency weaning

As a project matures, it should gradually replace microdependencies with vendored code tailored for callers' actual use cases.

Don't give up the benefits of code re-use for bootstrapping. Instead start out using dependencies as just that: a bootstrapping strategy.

But then gradually shed these dependencies on third-party code as the project's needs specialize—and you find that the architecture you thought you needed can maybe be replaced with 20 lines of code that do something much simpler. (Bonus: if you find that a module works orthogonally to the way you need to use it, just reach in and change it, rather than worrying about getting the changes upstream.)

This requires developers to be more willing to take something into their source tree and take responsibility for it. The current trends involve programmers abdicating this responsibility. (Which exists whether you ignore it or not; npm-style development doesn't eliminate responsibility, it just makes it easier to pretend it isn't there.)


Programs should get faster overnight

I mean this in a literal sense: programs should get faster overnight. If I'm working on a program in the afternoon, the compiler's job should be to build it as quickly as possible. That's it. When I go to sleep, I should be able to leave my machine on and it's then that a background service uses the idle CPU to optimize the binary to use fewer cycles. It would even be free to use otherwise prohibitively heavyweight strategies, like the approach taken by Stanford's STOKE. When I wake up in the morning, there is a very real possibility that I find a completely different (but functionally equivalent, and much faster) binary awaiting me.
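
The overnight half of this doesn't depend on anything exotic, even if the interesting optimizers (superoptimization, profile-guided rebuilds) do. A toy sketch, assuming some registry mapping installed binaries to the sources they were quick-built from; every name and path here is made up:

    import os
    import subprocess

    # Hypothetical registry: binary -> the sources it was quick-built from.
    REGISTRY = {
        os.path.expanduser("~/bin/tool"): [os.path.expanduser("~/src/tool.c")],
    }

    def reoptimize_overnight():
        """Rebuild registered binaries with heavyweight optimization, then swap them in."""
        for binary, sources in REGISTRY.items():
            candidate = binary + ".opt"
            subprocess.run(["cc", "-O3", "-flto", "-o", candidate, *sources], check=True)
            # A real service would verify functional equivalence before doing this.
            os.replace(candidate, binary)  # atomic swap-in of the faster build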

In fact, we should start from a state where the first "build" is entirely unnecessary. The initial executables can all be stored as source code which is in the first instance fully interpreted (or JITted). Over the lifetime of my system installation, these would be gradually converted into a more optimized form resembling the "binaries" we're familiar with today (albeit even faster). No waiting on compilers (unless you want to, to try moving the process along), and you can reach in and customize things for your own needs far more easily than what you have to do today to track down the right source code and try getting it to build.


Self-culling services

tracker-miner-fs is a process that I'll bet most people aren't interested in. On my machine, a bunch of background Evolution processes are in the same category. Usually when I do a new system install, I'll uninstall these sorts of things, unless there's any resistance at all (say, complaints from the system package manager about dependencies), in which case I immediately write it off as not worth the effort, decide I'll just deal with it, and move on.

Occasionally, though, when I've got the system monitor open because I'm being parsimonious about compute time or memory, I'll run into these services again, sitting there in the process list.

To reiterate: these are part of the default install because it's expected that they'll be useful to a wide audience, but a service capable of introspection would be able to realize that despite this optimistic outlook, I've never used or benefited from its services at all, and therefore there's no reason for it to continue trying to serve me.

So package maintainer guidelines should be amended to go further than simply dictating that services like these be trivially removable with no fuss. The guidelines should say that such background services are prohibited in the default install unless they're sufficiently self-reflective. The onus should be on them to detect when they're going silently unused and then disable or remove themselves.
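
The introspection itself could be dead simple. A sketch of what "sufficiently self-reflective" might amount to in practice; the stamp file, grace period, and unit name are all placeholders:

    import subprocess
    import time
    from pathlib import Path

    STAMP = Path("/var/lib/indexer/last-useful")  # hypothetical; touched whenever a real query is served
    GRACE_PERIOD = 90 * 24 * 3600                 # seconds of disuse before bowing out

    def maybe_self_cull(unit="indexer.service"):
        """Disable this service if nobody has actually used it in a long time."""
        if STAMP.exists() and time.time() - STAMP.stat().st_mtime < GRACE_PERIOD:
            return  # still earning its keep
        subprocess.run(["systemctl", "disable", "--now", unit], check=True)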


Code overlays

Sometimes I resort to printf debugging. Sometimes that's more involved than the colloquialism lets on—it may involve more than adding a single-line printf here and there; sometimes it requires inserting new control flow statements, allocating and initializing some storage space, etc. When going back to try and take them out, it's easy to miss one. It's also tedious to even have to try, rather than just wiping them all out. Source control is superficially the right tool here, but this is the sort of thing you're usually doing just prior to actually committing the thing you've been working on. Even with Git rebase, committing some checkpoint state feels a little heavyweight for this job.

I'd like some sort of "code overlay" mechanism comprising a standardized (vendor- and editor-neutral) format used as an alternative to going in and actually rewriting parts of the code. I.e., something that reinforces the ephemerality and feels more like Wite-Out, or Post-its hovering "on top" of your otherwise untainted code.

This sort of thing could also be made general enough that in-IDE breakpoints could be implemented in terms of a code overlay.

These would ideally be represented visually within the IDE, but there'd be a universally understood way to serialize them to text, in case you actually wanted to process them to be turned into a real patch. If represented conceptually, your editor should still give you the ability to toggle between the visual-conceptual form and the "raw" text form, if you want.

The best part is that there could be tight integration with editor's debugging facilities. So when entering debug mode, what it's really doing is applying these overlays to the underlying file, doing a new build with these in place, and then running it. If the build tool is completely integrated into the IDE, then the modified version of the source tree (i.e. with the overlays applied) would never even get written to disk.

In essence, these would be ephemeral micro patchsets managed by the editor itself and not the source control system.
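
To make the shape of the thing concrete, here's one possible representation of an overlay and of applying it at build time. The format is invented for illustration, not a proposal for the actual standard:

    from dataclasses import dataclass

    @dataclass
    class Overlay:
        """An ephemeral edit "hovering" over a file, owned by the editor, not by source control."""
        path: str        # file the overlay applies to
        after_line: int  # 1-based line the inserted text goes after (0 = top of file)
        text: str        # e.g. a printf, a breakpoint trap, extra control flow

    def apply_overlays(source, overlays):
        """Return the source text with overlays spliced in; the on-disk file is untouched."""
        lines = source.splitlines(keepends=True)
        for ov in sorted(overlays, key=lambda o: o.after_line, reverse=True):
            lines.insert(ov.after_line, ov.text if ov.text.endswith("\n") else ov.text + "\n")
        return "".join(lines)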


DSLs

Domain-specific languages (DSLs) are bad. They're the quintessential example of a "now you have two problems" sort of idea. And that's not even a metaphor; the popular quip is about the syntax of regular expressions, which is itself a direct application of the more general point I'm pushing here.

The sentiment I'm going for isn't too far off from Robert O'Callahan's thoughts about The Costs Of Programming Language Fragmentation, either.


Wikipedia Name System

Wikipedia Name System as a replacement for EV certificates