Roam and Athens: Boons and Pitfalls

In this post I will present my take on the recent trend in Zettelkasten note-taking applications, spearheaded by Roam Research. I will use the open-source alternative Athens Research (currently in beta) as a case study. This post will be easier to understand if you have some familiarity with these kinds of note-taking apps, though it shouldn’t be necessary.

I’ll start off with a story. Imagine — in the midst of an idle daydream — you’ve just been struck with an idea; a stunning analogy between programming and rock climbing. An idea so fresh that your mind is still formulating it into words, yet you are already certain of its quality. You load up your favorite note taking app, fingers wriggling in anticipation, and are greeted with…

Journal page in Athens

A journal. Trivial happenings related to this day. Moving to type, you suddenly flinch as the systemising module in your brain emits a pang of disgust; yes you happened to come up with this idea today, but is this journal page where the idea should live? Subtle annoyance morphs into Lovecraftian horror as you wonder what lucid revelations lie sandwiched between Wednesday morning’s panini and afternoon ciabatta.

If you were prepared you could have avoided this. You could have created some unwieldy page dedicated to loose ideas. If you’re a rebel, you could turn your conjecture into its own page. You could, through some manner of learned helplessness, squeeze it into a random related page (perhaps [[programming]] or [[rock climbing]]) before those precious moments of clarity slip away. With a mental shrug, you resign yourself to simply entering a new bullet into the journal page before you.

Code is like climbing, it’s hard to get a grip!

With the text itself seeming to return your blank stare, you meekly wonder if that was what you really meant to write…

Though the above scenario may seem trivial and asinine, it is but one symptom of a far more systemic issue: forcing the user to make a choice they don’t yet want to make. In this case: “Where should my note live?”

For a powerful counter-example, one need only take a cursory glance at Google’s much more basic note-taking application, Keep.

Landing screen for Google Keep

If I want to write something, anything, there’s a box waiting for it right there: “Take a note”. By typing in that box, I’m not making an implicit choice about what kind of note it is, or where it should live. If I want to categorize the note, I can do that before, during, or after taking it.

Labelling a note in Google Keep

Granted, Roam-likes have much more advanced organizational capabilities, but that by no means gives them a license to forsake the most basic of use cases.

To understand why Roam-likes are so fundamentally different, we can take a look at their data model.

The Roam-like data model (diagrams.net)

The Notebook in this context is a non-negotiable — it is the silo in which the note data exists. Athens does have pie-in-the-sky plans of a “Collective Memex” which breaks the bounds of these silos (and that is something to keep in mind), but for now each notebook is its own world.

A Page is exactly what it says on the tin. If a notebook is a folder, then a page is a document in that folder. Pages consist of nested bullet points (think a tree structure), and only really start to get interesting in their interactions with those bullet points, called…

Text Blocks. The real magic of Roam-likes. Text blocks have the ability to reference other entities: they can link to pages, and embed other text blocks. The bi-directionality of references means that not only can you follow them out, but you can also easily find which text blocks are referencing in.

Showcasing Athens’s referencing functionality with a page about “toast”. “marmalade” is an example of a page link, while “sweeter is always better” is an example of an embedded text block (part of another page). The “Linked References” section shows which external text blocks link to this page.
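To make the model above concrete, here is a minimal sketch of how I picture it in code. This is my own illustration, not how Roam or Athens actually store their data: the class names, field names and the `linked_references` helper are all assumptions made up for this post.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Block:
    """A text block: a bullet that can nest children and reference other entities."""
    uid: str
    text: str
    children: list[Block] = field(default_factory=list)
    page_links: list[str] = field(default_factory=list)    # titles of linked pages
    block_embeds: list[str] = field(default_factory=list)  # uids of embedded blocks


@dataclass
class Page:
    """A titled container of top-level blocks. Note that it holds no references itself."""
    title: str
    blocks: list[Block] = field(default_factory=list)


@dataclass
class Notebook:
    """The silo in which all pages (and therefore all blocks) live."""
    pages: dict[str, Page] = field(default_factory=dict)

    def walk(self):
        """Yield (page, block) for every block in the notebook, depth-first."""
        for page in self.pages.values():
            stack = list(reversed(page.blocks))
            while stack:
                block = stack.pop()
                yield page, block
                stack.extend(reversed(block.children))

    def linked_references(self, title: str) -> list[Block]:
        """Follow references backwards: every block that links to the given page."""
        return [block for _, block in self.walk() if title in block.page_links]


# Hypothetical usage, loosely mirroring the "toast" example
notebook = Notebook()
notebook.pages["marmalade"] = Page("marmalade")
notebook.pages["toast"] = Page(
    "toast",
    [Block("a1", "Best served with [[marmalade]]", page_links=["marmalade"])],
)
print([block.uid for block in notebook.linked_references("marmalade")])  # ['a1']
```

Notice that `Page` has nowhere to put a reference of its own. Only `Block` does.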

References to pages and to text blocks are presented to the user somewhat differently. Page links are contained in double square brackets and are editable as plain text (e.g. [[Athens]]), while text block embeds are contained in double parentheses and are editable as a hexadecimal ID (e.g. ((70bfb771f))). Despite this difference in the text presentation, the semantics are essentially identical; linking to a page displays its title and allows the user to navigate there, while embedding a text block displays its contents and allows the user to navigate there.
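As a rough illustration of the two syntaxes, the sketch below pulls both kinds of reference out of a line of text. The regular expressions and the `extract_references` function are my own assumptions for this post, not the parser that Roam or Athens uses. It treats block IDs as lowercase hexadecimal, as in the example above, and does not try to handle nested brackets.

```python
from __future__ import annotations

import re

# [[Page title]]: anything between double square brackets (no nesting handled)
PAGE_LINK = re.compile(r"\[\[([^\[\]]+)\]\]")
# ((block id)): assumed here to be lowercase hexadecimal
BLOCK_EMBED = re.compile(r"\(\(([0-9a-f]+)\)\)")


def extract_references(text: str) -> dict[str, list[str]]:
    """Return the page titles and block IDs referenced in a block's text."""
    return {
        "pages": PAGE_LINK.findall(text),
        "blocks": BLOCK_EMBED.findall(text),
    }


print(extract_references("Notes on [[Athens]], see ((70bfb771f))"))
# {'pages': ['Athens'], 'blocks': ['70bfb771f']}
```

Both patterns do the same job, which is rather the point: the two kinds of reference differ in spelling far more than in meaning.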

The reason that a user is presented with a ((hexadecimal id)) when editing, instead of the actual content of that text block, is difficult to pin down. My own hypothesis is that it is because the roots of these note taking apps lie in files and file systems. Roam-likes, with all their simplicity and elegance, are still essentially hotted-up plaintext files, and — try as they might — can’t seem to escape that heritage.

Edit mode of a block text reference

Whatever the reason, the current behaviour serves as a fairly rudimentary way of shielding the user from the ambiguity of whether they are modifying the parent or the referenced text block. A smarter interface would allow seamless and intuitive scoping between those contexts instead of exposing meaningless internal IDs to the user. An even smarter application might allow text-block “forking”, but that’s a concept for another post.

When looking at the difference between pages and text blocks in the data model diagram, you may have seen that while text blocks have the fancy referencing capabilities that Roam-likes are famous for, pages do not. This highlights a key insight:

Pages are like text blocks with fewer features.

That’s fine: if you always want the power of text blocks, just don’t use pages, right? Not quite. Text blocks must exist within a page, so like it or not, you are stuck with at least one. More importantly, the UI revolves around the page: there is a special list of all pages, automatically created journal entries are pages, searching for text displays the page name most prominently, the graph view only shows pages, and pages have the much more [[user friendly]] syntax for referencing.

So using pages is a necessary evil in Roam-likes, at least if we want to take advantage of many of the application’s features. This raises a question for the user each time they want to enter a note: should they use a page or a text block? They can find out by asking a series of questions:

  1. Do I want to be able to reference other ideas from inside this note, now or in the future?
  2. Do I want to use the ((text block embedding)) instead of [[page linking]] when referencing this note?
  3. Is there a page I want to put this note in?
  4. Do I want this text to be hidden from the notebook’s global page list and graph view?

Answer “Yes” to all of the above questions, and they should make a text block. Answer “No” to all of the above questions, and they should make a page. But answer “Yes” to some and “No” to others, and trouble begins.

I’d argue that the answer to (1) is never an exact “No”, and (2) is never an exact “Yes”, meaning that each time this decision is made, there is always an implicit tradeoff. In addition, (3) and (4) are often completely independent from the others. This predicament is made much worse by the fact that it’s not a trivial task to refactor between pages and text blocks once they have been fleshed out.

The de facto answer to these conundrums is that it’s always preferable to “let the user decide” how they want to structure their notes. Many see giving the user more choice as giving them more freedom, but this is not the full story. True freedom comes from the ability to choose which choices you make. Any set of options is incomplete if it doesn’t list “don’t care”.

How does one allow the user to not care about the difference between a page and a text block? One way is by instead offering a generalization of the two. Consider the following simplification of the original data model:

Original data model (diagrams.net)
Simplified data model (diagrams.net)

At first glance all that has happened is that Page has been removed between Notebook and Text Block, which seems to reduce rather than enhance functionality. But let’s take a look at the effects of this simplification:

  • Each note in the notebook is capable of referencing any other. No need to consider whether you want referencing when creating the note.
  • There is only one kind of referencing. Gone is the meaningless distinction between ((text block embedding)) and [[page linking]]. For this to work well, text block embedding would need to be more seamless to the user than it is currently, and feel more like page linking.
  • Text blocks do not need to live inside another note entity: they can float directly in the notebook itself, allowing them to be entered without first needing to give a thought to where they should go. It also means text blocks can be dynamically created from within other text blocks, just as you now can with pages in the original model.

That’s all well and good, but what about the functionality reserved for pages: the page list, graph view, search highlighting, etc.? It wouldn’t make sense for every text block to be shown in these contexts. Luckily, simple problems have simple solutions: just add a field on text blocks to indicate whether this text block should also “act” as a page. The implications are:

  • Text blocks can be easily refactored between acting like a page or not, as simply as toggling a “page” switch. The default for a child of another text block should be OFF, while the default for a child of the notebook might be ON (or based on user preference).
  • Pages now have the possibility of existing within other pages.
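For the programmers in the audience, the generalized model might look something like the sketch below. Everything here is hypothetical and exists only to illustrate the proposal: the `Note` and `Notebook` classes, the `is_page` field and the `top_level_is_page` preference are names I have invented, not anything taken from Athens or Roam.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare by identity; parent/child links form cycles
class Note:
    """The single note entity: a text block that may also act as a page."""
    uid: str
    text: str
    is_page: bool = False
    parent: Optional[Note] = field(default=None, repr=False)
    children: list[Note] = field(default_factory=list)
    references: list[str] = field(default_factory=list)  # uids of any other notes


class Notebook:
    def __init__(self, top_level_is_page: bool = True):
        # User preference: should notes created directly in the notebook act as pages?
        self.top_level_is_page = top_level_is_page
        self.notes: dict[str, Note] = {}

    def add(self, uid: str, text: str, parent: Optional[Note] = None) -> Note:
        """Create a note anywhere; no decision about its 'kind' is required."""
        # Children of another note default to OFF, top-level notes to the preference.
        is_page = self.top_level_is_page if parent is None else False
        note = Note(uid, text, is_page=is_page, parent=parent)
        if parent is not None:
            parent.children.append(note)
        self.notes[uid] = note
        return note

    def toggle_page(self, uid: str) -> None:
        """Refactoring between 'page' and 'plain block' is a single flag flip."""
        note = self.notes[uid]
        note.is_page = not note.is_page


# Hypothetical usage
notebook = Notebook()
idea = notebook.add("n1", "Code is like climbing")
detail = notebook.add("n2", "It's hard to get a grip", parent=idea)
notebook.toggle_page("n2")  # a page living inside another page
```

The user never has to answer “page or text block?” up front. They write the note, and flip the switch later if they ever care.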

By generalizing the data model, the arbitrary restrictions on the notebook structure have been relaxed, and the cognitive friction in entering notes has been diminished.

In closing, I would just like to clear up one misconception that a dedicated Roam-like user may have. Adopting the proposed model does NOT mean that any of their current workflows need to change.

In practice, I would fully expect that with the proposed model, the UI would change very little. The pages screen can still exist, populated by those text blocks marked as pages. There might be a new section for “unsorted notes”, those text blocks that are not marked as a page, and are not a child of any other text block. There would be a switch on every text block to toggle it as a page. The UI could even continue to display references differently depending on whether the target text block is marked as a [[page]] or ((not)).
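To show how little the UI would need to know, here is a small sketch of those screens as plain filters over a flat list of notes. The `NoteRow` record and the three function names are assumptions of mine for illustration only, and the data is made up.

```python
from typing import NamedTuple, Optional


class NoteRow(NamedTuple):
    uid: str
    text: str
    is_page: bool
    parent_uid: Optional[str]  # None means the note sits directly in the notebook


def pages_screen(notes):
    """The familiar list of pages: every note marked as a page."""
    return [n for n in notes if n.is_page]


def unsorted_notes(notes):
    """Notes that are neither marked as a page nor a child of another note."""
    return [n for n in notes if not n.is_page and n.parent_uid is None]


def render_reference(target):
    """Keep displaying references differently depending on the target's flag."""
    return f"[[{target.text}]]" if target.is_page else f"(({target.uid}))"


notes = [
    NoteRow("a1", "rock climbing", True, None),
    NoteRow("b2", "Code is like climbing", False, None),
    NoteRow("c3", "Grip strength matters", False, "a1"),
]
print([n.uid for n in pages_screen(notes)])    # ['a1']
print([n.uid for n in unsorted_notes(notes)])  # ['b2']
print(render_reference(notes[0]), render_reference(notes[1]))  # [[rock climbing]] ((b2))
```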

I hope that this write-up might influence note app users and developers to think a little further outside of the box when reasoning about something as abstract, important, and challenging as a space for thought. I look forward to any and all feedback, whether it be in agreement or disagreement.