I really only ever make something when I want something to exist that doesn’t already, or when I want something that does exist to more readily suit my (admittedly) idiosyncratic needs or thoughts about how it should exist. For better or worse, I have a lot of wants, and so I make a lot of things (e.g., Two Page Tuesday, or last night’s mostly-successful attempt at tapering a pair of pants I got at Global Thrift, or an early solve for the problem I’m solving here).

So: I wrote an asciidoc parser in Rust. I called it asciidocr because the Command-Line Rust book put an r after all the "clone a UNIX tool" projects, and I liked that convention.

Asciidoc is a lightweight markup language that is, in my opinion, the best one. Why it’s the best one is a separate issue entirely, but we can at least safely assume that it’s a good one, and the one that, for better or worse, I’ve been using to write nearly everything I’ve written for personal or professional use in the last five years or so. While it started as a Python project, it got new life (and a bunch of new features) when it was more or less taken over by the fine Asciidoctor folks, who wrote their converter in Ruby. It works very well, and does a lot of things. But.

It’s in Ruby, a language I have petty beef with and which, more importantly, is interpreted rather than compiled, which means that for every new machine I want to convert asciidoc files on, I need to install Ruby. And there are some other things too: the way that templates must be written for custom output(s), the fact that it’s frankly a little slow, and whatever else.

But mostly it was the "I don’t want to have to write Ruby to extend the thing" that got me thinking. I was dreaming about a text-based writing management tool (like a Scrivener but for folks who use vim), and having already written a tool to make generating PDFs from asciidoc easier, I knew that if I wanted to write this next app in anything but Ruby, I’d need to either (a) subprocess out to the Ruby; (b) rely on the old asciidoc.py project, with its limitations (and also therefore limiting myself to writing in Python, which, like Ruby, means that if I wanted to share my tool, the folks using it would need to be able to install Python); or (c) find or build a converter in a different language. So after getting part of the way through an (a) implementation in Python, I cut my losses and started looking more readily into option (c), for: I was learning Rust and Go(lang).

There is, in fact, a pretty good Go implementation of an asciidoc parser/converter. And there was a hot second when it looked like my company might transition to Go for some backend stuff, so I picked up Powerful Command-Line Applications in Go and got to work. Unfortunately I realized pretty quickly that I am allergic to the following, oft-repeated pattern in the language:

  if err != nil {
    return err
  }

And then it became clear that we weren’t going to be using Go at work, so I dropped it.

Rust, on the other hand: boy-howdy did I love (and still do) working in that. And sure, there wasn’t a very feature-complete asciidoc parser or converter yet, but I liked the language and figured I could learn something: so I asked for some mentorship (thanks big time to Kit Dallege for everything that follows) and got to work.

My background is, of course, very humanities-focused. I mean, sure, there was a math minor in there somewhere, but that was all in service of a brief glimmer of a future doing philosophy of math, so. I’ve written a lot of code, and have been writing some kind of code or other since I was a small kid (thank you, hackable Geocities sites), but I have no "computer science education." Learning how to write a parser seemed like a good way to go.

And instead of relying on a lexing package (e.g., something like pest), where you write a grammar and the thing does it for you, Kit recommended I do the whole thing by hand, since I’d learn more (and potentially it could be faster, or at least a smaller binary).

So that’s more or less what I did. It’s not perfect; it could, of course, be improved; there are some decisions I made early on that I would not make today, knowing what I know now; and I am very fucking proud of it. So we can dig in.

Pretending it’s a Compiler

Googling around got me to a few resources that seemed like they’d be relevant, specifically the Commonmark Spec section about parsing, but what really ended up sticking in my brain was a book called Crafting Interpreters, which I someday would love to go back and really read for its intended purpose. But since I was going to be doing more or less the first half (up to the point where you do something with the tree you’ve created by scanning and parsing the code), I figured this would be a good place to start, and it was! Very well-written, too. So much so that it made sense even though I haven’t pretended to know anything about reading Java in years.

What this meant, anyway, was that I had a clear path forward. Prior to asking for help, I’d written a half-of-a-half implementation that mixed up the lexing and the parsing and the output all together, but this was going to be better in terms of building it, in terms of architecture, and in terms of being able to do other things with the tree/graph once I had it. So what I would then do was:

  1. Scan the document into tokens

  2. Parse those tokens into a tree

  3. Take that tree and do something with it

Easy enough, right?
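To make that three-step shape concrete, here is a toy sketch. None of this is asciidocr's actual code: `Tok`, `Node`, `scan`, `parse`, and `render` are hypothetical stand-ins, the scanning is line-based purely for brevity, and there is no HTML escaping.

```rust
/// Toy token: either a line of text or a blank-line paragraph break.
#[derive(Debug, PartialEq)]
pub enum Tok {
    Text(String),
    Break,
}

/// Toy tree node: a paragraph holding its text.
#[derive(Debug, PartialEq)]
pub struct Node {
    pub text: String,
}

/// Step 1: scan the source into tokens.
pub fn scan(source: &str) -> Vec<Tok> {
    source
        .lines()
        .map(|line| {
            if line.trim().is_empty() {
                Tok::Break
            } else {
                Tok::Text(line.to_string())
            }
        })
        .collect()
}

/// Step 2: parse the tokens into a (very flat) tree of paragraphs.
pub fn parse(tokens: Vec<Tok>) -> Vec<Node> {
    let mut nodes = Vec::new();
    let mut current: Vec<String> = Vec::new();
    for tok in tokens {
        match tok {
            Tok::Text(t) => current.push(t),
            Tok::Break => {
                // A blank line closes whatever paragraph is open.
                if !current.is_empty() {
                    nodes.push(Node { text: current.join(" ") });
                    current.clear();
                }
            }
        }
    }
    if !current.is_empty() {
        nodes.push(Node { text: current.join(" ") });
    }
    nodes
}

/// Step 3: do something with the tree; here, emit one <p> per paragraph.
pub fn render(nodes: &[Node]) -> String {
    nodes
        .iter()
        .map(|n| format!("<p>{}</p>", n.text))
        .collect::<Vec<_>>()
        .join("\n")
}
```

The useful part is the separation: because `render` only ever sees the tree, a different output format could be added without touching `scan` or `parse`.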

Scanning, Lexing, Whatever You Want to Call It

I’m still not sure what the difference between "scanning" and "lexing" is, if there is one at all, but anyway I needed to generate some tokens. I don’t plan on going into too much detail about the why/how of this (instead I refer you back to Crafting Interpreters), but there are a few interesting (annoying?) things about asciidoc that I think are worth mentioning here.

Like markdown, asciidoc is essentially a line-based language. The most significant character is therefore the line break, \n, and in some worlds/lights it makes sense to parse asciidoc line-by-line. If I were to go back and do it as a "one-shot" parser (which according to the chatter in the Asciidoc community chat, isn’t possible anyway), I might do it as a line-by-line thing. Instead, however, I did the scanning character-by-character, in part because that’s what the book told me to do, and in part because keeping track of the newline tokens actually made parsing much easier in the end (I think/hope, anyway).

So the scanning.

Maybe the best "new thing I started using a lot" of 2024 was the humble Enum. I started using them in Python for a specific thing, and then started using them more, and one of the things I like best about Rust is that it takes its Enums seriously. So, to wit, the first thing I did was create a big ass TokenType enum:

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenType {
    NewLineChar,
    LineContinuation,
    ThematicBreak,
    PageBreak,
    Comment,
    PassthroughBlock, // i.e., "++++"
    SidebarBlock,     // i.e., "****"
    SourceBlock,      // i.e., "----"

    // ...snip

Note: All source can be found in the GitHub repo. I’m going to condense and remove some comments and things in this post as needed to keep it clean.

And then a Struct for each token:

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub token_type: TokenType,
    pub lexeme: String,          // raw string of code
    pub literal: Option<String>, // our literals are only ever strings (or represented as such)
    pub line: usize,
    pub startcol: usize,
    pub endcol: usize,
    /// The file's stack hierarchy if it's an include, otherwise stays empty
    pub file_stack: Vec<String>,
}

There is a draft official schema for how an asciidoc document should be (able to be) represented, and that’s why we’re keeping track of line, startcol, etc. I think if I were to go back and clean this up, we could probably drop the literal attribute, since we don’t really need it (this was inspired/copied from the Crafting Interpreters way of doing things, which has different requirements than what we have, ultimately).
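The test code further down calls a `Token::new_default` constructor whose body isn't shown in this post. As a guess at its shape (the argument order is inferred from that test call, and the two-variant `TokenType` is trimmed down for the example), it might look roughly like this:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenType {
    NewLineChar,
    Text,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub token_type: TokenType,
    pub lexeme: String,
    pub literal: Option<String>,
    pub line: usize,
    pub startcol: usize,
    pub endcol: usize,
    pub file_stack: Vec<String>,
}

impl Token {
    /// Assumed constructor: takes everything except `file_stack`,
    /// which starts out empty (i.e., the token is not from an include).
    pub fn new_default(
        token_type: TokenType,
        lexeme: String,
        literal: Option<String>,
        line: usize,
        startcol: usize,
        endcol: usize,
    ) -> Self {
        Token {
            token_type,
            lexeme,
            literal,
            line,
            startcol,
            endcol,
            file_stack: Vec::new(),
        }
    }
}
```

Columns here are 1-indexed and inclusive, so a four-byte lexeme starting at column 7 ends at column 10.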

So once we have our Token structs to play with, we can then proceed to actually scanning the document into tokens. We create a Scanner struct to hold some state and the source and things:

#[derive(Debug)]
/// Scans an asciidoc `&str` into [`Token`]s to be consumed by the Parser.
pub struct Scanner<'a> {
    pub source: &'a str,
    start: usize,
    startcol: usize,
    current: usize,
    line: usize,
    file_stack: Vec<String>,
}

And then, because Rust has such good pattern matching, the actual work just becomes a(n admittedly gigantic) match/switch statement:

fn scan_token(&mut self) -> Token {
    let c = self.source.as_bytes()[self.current] as char;
    self.current += 1;

    match c {
        '\n' => self.add_token(TokenType::NewLineChar, false, 1),

        '\'' => {
            if self.starts_repeated_char_line(c, 3) {
                self.current += 2;
                self.add_token(TokenType::ThematicBreak, false, 0)
            } else if ['\0', ' ', '\n'].contains(&self.peek_back()) && self.peek() == '`' {
                self.current += 1;
                self.add_token(TokenType::OpenSingleQuote, true, 0)
            } else {
                self.add_text_until_next_markup()
            }
        }
        // ...snip

In order to keep things moving along speedily (because, in addition to being "cool," Rust is also supposed to be "fast"), the actual scanning function is implemented as an Iterator (a "generator" in Python-speak):

impl<'a> Iterator for Scanner<'a> {
    type Item = Token;

    fn next(&mut self) -> Option<Self::Item> {
        if !self.is_at_end() {
            self.start = self.current;
            return Some(self.scan_token());
        }
        None
    }
}

(It was amazing how easy it was to do that, really.)
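For a feel of what the iterator buys you, here is a stripped-down, hypothetical scanner (`LineScanner` is not part of asciidocr) that only knows two kinds of token: a newline, or a run of anything else. It follows the same `start`/`current` bookkeeping as the real one:

```rust
/// Toy scanner: yields "\n" tokens and runs of non-newline text.
pub struct LineScanner<'a> {
    source: &'a str,
    start: usize,
    current: usize,
}

impl<'a> LineScanner<'a> {
    pub fn new(source: &'a str) -> Self {
        LineScanner { source, start: 0, current: 0 }
    }

    fn is_at_end(&self) -> bool {
        self.current >= self.source.len()
    }

    fn scan_token(&mut self) -> &'a str {
        let bytes = self.source.as_bytes();
        if bytes[self.current] == b'\n' {
            self.current += 1;
        } else {
            // Consume bytes up to (but not including) the next newline.
            while !self.is_at_end() && bytes[self.current] != b'\n' {
                self.current += 1;
            }
        }
        &self.source[self.start..self.current]
    }
}

impl<'a> Iterator for LineScanner<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<Self::Item> {
        if self.is_at_end() {
            return None;
        }
        self.start = self.current;
        Some(self.scan_token())
    }
}

/// Any consumer can pull tokens lazily with the usual iterator adapters.
pub fn count_newlines(source: &str) -> usize {
    LineScanner::new(source).filter(|t| *t == "\n").count()
}
```

Nothing gets scanned until something asks for the next token, and the consumer gets `filter`, `peekable`, `collect`, and the rest of the iterator toolbox for free.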

One fun nuance that came up, because we’re dealing with "text" instead of "code," was character boundaries. Take something like the humble ellipsis (…) or an emoji: these require multiple bytes to represent. This means that sometimes you might try to do something between the bytes it takes to represent the character, which makes the scanner sad (and die, or in Rust-parlance, panic!).

(It occurs to me now that I should have specified earlier that we’re scanning byte by byte, not character-by-character; there are some reasons for doing this that I don’t feel like explaining to do with the way text is encoded and then handled by Rust, so, just, like, trust me that this was a good way to do it.)
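A small illustration of the problem, using only the standard library. The helper names `splits_a_char` and `safe_slice` are made up for this example; `str::is_char_boundary` and `str::get` are the real standard-library methods doing the work:

```rust
/// Returns true if byte offset `idx` lands in the middle of a multi-byte character.
pub fn splits_a_char(s: &str, idx: usize) -> bool {
    idx < s.len() && !s.is_char_boundary(idx)
}

/// Slices `s[start..end]` without panicking: returns `None` if either offset
/// is out of range or falls inside a character.
pub fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}
```

The string "a…b" is three characters but five bytes, because the ellipsis takes three bytes in UTF-8. Offsets 2 and 3 therefore sit inside the ellipsis, and indexing with `&s[1..2]` would panic, whereas `safe_slice("a…b", 1, 2)` just returns `None`.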

Getting around this means that we just check for character boundaries when we look around to see, based on context, what kind of token we should be producing. And we do a lot of looking around! Here are a few, noting the easy-to-use is_char_boundary() function in there:

    fn peek(&self) -> char {
        if self.is_at_end() || !self.source.is_char_boundary(self.current) {
            return '\0';
        }
        self.source.as_bytes()[self.current] as char
    }

    fn peek_back(&self) -> char {
        if self.start == 0 || !self.source.is_char_boundary(self.start - 1) {
            return '\0';
        }
        self.source.as_bytes()[self.start - 1] as char
    }

    fn peeks_ahead(&self, count: usize) -> &str {
        if self.is_at_end()
            || self.current + count > self.source.len()
            || !self.source.is_char_boundary(self.current + count)
        {
            return "\0";
        }
        &self.source[self.current..self.current + count]
    }

This means that, say, if we get a character -, and know it’s the beginning of a new line (i.e., that self.peek_back() is '\n', or '\0' at the very start of the document), and we can peek ahead to see that self.peeks_ahead(4) == "---\n", we know that we should generate a TokenType::SourceBlock delimiter token. Scanning is essentially that, but, like, a bunch of times with a bunch of edge cases and nuances (e.g., because that four-repeated-characters-before-a-newline is such a common pattern, you write a function that checks that for you).
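A helper along those lines could look something like the following. This is a free-standing sketch, not the `starts_repeated_char_line` method from the scanner (whose body isn't shown here); the name `is_repeated_char_line` and its parameters are assumptions made for the example:

```rust
/// Checks whether `count` copies of `ch` begin at byte offset `start`,
/// sit at the start of a line, and are followed by a newline or end of input.
pub fn is_repeated_char_line(source: &str, start: usize, ch: u8, count: usize) -> bool {
    let bytes = source.as_bytes();

    // Must be at the beginning of a line (or of the whole document).
    if start > 0 && bytes.get(start - 1) != Some(&b'\n') {
        return false;
    }

    let end = start + count;
    if end > bytes.len() {
        return false;
    }

    // Exactly `count` copies of `ch`...
    if !bytes[start..end].iter().all(|&b| b == ch) {
        return false;
    }

    // ...and then the line has to end, so "-----" is not mistaken for "----".
    end == bytes.len() || bytes[end] == b'\n'
}
```

Working on bytes is safe here because the delimiter characters are all ASCII, and an ASCII byte never appears inside a multi-byte UTF-8 sequence.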

This, naturally, segues into unit testing!

There are a lot of tests around the scanner! I haven’t yet gotten around to running coverage on it, but I think it’s pretty good. One thing I don’t like about Rust is that, by convention, you keep unit tests in the same file as the code they’re testing. I see why you’d want to do that, but also my scanner/mod.rs file is a whopping 1932 lines long. Coming from Python-land… ouch! Still: it works, especially if you use my new best friend rstest, which works so analogously to our dear friend pytest that I was able to get up and running in a matter of minutes with it, simplifying the test-cases dramatically:

#[rstest]
#[case("NOTE", TokenType::NotePara)]
#[case("TIP", TokenType::TipPara)]
#[case("IMPORTANT", TokenType::ImportantPara)]
#[case("CAUTION", TokenType::CautionPara)]
#[case("WARNING", TokenType::WarningPara)]
fn inline_admonitions(#[case] markup_check: &str, #[case] expected_token: TokenType) {
    let markup = format!("{}: bar.", markup_check);
    let expected_tokens = vec![
        Token::new_default(
            expected_token,
            format!("{}: ", markup_check),
            Some(format!("{}: ", markup_check)),
            1,
            1,
            markup_check.len() + 2, // account for space
        ),
        Token::new_default(
            TokenType::Text,
            "bar.".to_string(),
            Some("bar.".to_string()),
            1,
            markup_check.len() + 3,
            markup_check.len() + 6,
        ),
    ];
    scan_and_assert_eq(&markup, expected_tokens);
}

Easy, right? So let’s now suppose we scan our document-as-a-&str into a bunch of tokens. We then parse them. Yay!

Parser-ing

…and again we use a big-ass match statement. But before we can really get into that, we need to look at what we’re doing all this parsing into, namely a (mostly) spec-compliant Abstract Syntax Graph.

Trees and Graphs

In "normal" parsing you create a node tree, and then do some traversing of that tree and… actually I didn’t get that far in the book. Because, to be compliant with the "Asciidoc Technology Compatibility Kit (TCK)," you need to produce JSON, I… just figured it would be easier to start there. JSON (and more specifically the objects needed to serialize it) would be an easy enough "intermediate representation" from which to then go on and output HTML and other formats (more on this later).

To be completely honest, the spec is good but not great, and frankly not complete yet. If I had more time and energy I would contribute more readily to the ADRs and discussions and so on, but… I don’t. Yet. Maybe someday. Regardless.

This meant essentially that I could go the "super abstract" route, and create generic "block" and "inline" objects and go from there, or I could just go ahead and make a struct for each kind of thing, since it’s a finite set of things. So I went that route.

As you do in Rust, I used serde and serde_json to do the serialization. What this meant, though, was that it was going to be harder to use Traits to create shared functionality (and to make functions accept more generic parameters). I looked at a few crates that ultimately used our old friend the Enum on the backend to make the serialization happen (since you get the serialization more or less for free with an Enum), so I just did that directly. This meant that I had, for example, this hairy-looking thing:

#[derive(Serialize, Clone, Debug)]
#[serde(untagged)]
pub enum Inline {
    InlineSpan(InlineSpan),
    InlineRef(InlineRef),
    InlineLiteral(InlineLiteral),
    InlineBreak(LineBreak),
}

And that I had to do a lot of if let Some(Block::LeafBlock(block)) = foo.last_mut() type stuff, but I’m told that this is part of why my parser is so fast, because enums are so fast, and… if you’re not first you’re last? Anyway, this is one of the design decisions that I’m not sure I would make again (I think it would be more developer-friendly to use Traits), but as (a) I am the only developer and (b) it works, and is fast, it’s fine.
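Here is roughly what that kind of pattern looks like in isolation. The `Block`, `LeafBlock`, and `ParentBlock` types below are simplified stand-ins (the real ones carry far more fields), and `push_text` is a hypothetical helper:

```rust
#[derive(Debug, Default)]
pub struct LeafBlock {
    pub text: String,
}

#[derive(Debug, Default)]
pub struct ParentBlock {
    pub blocks: Vec<Block>,
}

/// One variant per kind of block, each wrapping its own struct.
#[derive(Debug)]
pub enum Block {
    LeafBlock(LeafBlock),
    ParentBlock(ParentBlock),
}

/// Appends text to the last block if it is a leaf; otherwise starts a new leaf.
pub fn push_text(stack: &mut Vec<Block>, text: &str) {
    // Match on the variant and borrow its contents mutably in one step.
    if let Some(Block::LeafBlock(block)) = stack.last_mut() {
        block.text.push_str(text);
        return;
    }
    stack.push(Block::LeafBlock(LeafBlock {
        text: text.to_string(),
    }));
}
```

The trade-off is visible even at this size: every function that cares about one kind of block has to unwrap the enum first, where a trait-based design would let it call a shared method directly.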

So parsing then becomes a matter of looking at a given Token and deciding what to do with it. Because "what to do with it" is often a matter of context, we build a lot of that context into our Parser:

/// Parses a stream of tokens into an [`Asg`] (Abstract Syntax Graph), returning the graph once all
/// tokens have been parsed.
pub struct Parser {
    /// Where the parsing "starts," i.e., the adoc file passed to the script
    origin_directory: PathBuf,
    /// allows for "what just happened" matching
    last_token_type: TokenType,
    /// optional document header
    document_header: Header,
    /// document-level attributes, used for replacements, etc.
    document_attributes: HashMap<String, String>,
    /// holding ground for graph blocks until it's time to push to the main graph
    block_stack: Vec<Block>,
    /// holding ground for inline elements until it's time to push to the relevant block
    inline_stack: VecDeque<Inline>,
    /// holding ground for includes file names; if inside an include push to stack, popping off
    /// once the file's tokens have been accommodated (this allows for simpler nesting)
    file_stack: Vec<String>,
    /// holding ground for a block title, to be applied to the subsequent block
    block_title: Option<Vec<Inline>>,
    /// holding ground for block metadata, to be applied to the subsequent block
    metadata: Option<ElementMetadata>,
    /// counts in/out delimited blocks by line reference; allows us to warn/error if they are
    /// unclosed at the end of the document
    open_delimited_block_lines: Vec<usize>,
    /// appends text to block or inline regardless of markup, token, etc. (will need to change
    /// if/when we handle code callouts)
    open_parse_after_as_text_type: Option<TokenType>,
    // convenience flags
    in_document_header: bool,
    /// designates whether we're to be adding inlines to the previous block until a newline
    in_block_line: bool,
    /// designates whether new literal text should be added to the last span
    in_inline_span: bool,
    /// designates whether, despite newline last_tokens_types, we should append the current block
    /// to the next
    in_block_continuation: bool,
    /// forces a new block when we add inlines; helps distinguish between adding to section.title
    /// and section.blocks
    force_new_block: bool,
    /// Temporarily preserves newline characters as separate inline literal tokens (where ambiguous
    /// blocks, i.e., DListItems, may require splitting the inline_stack on the newline)
    preserve_newline_text: bool,
    /// Some parent elements have non-obvious closing conditions, so we want an easy way to close these
    close_parent_after_push: bool,
    /// Used to see if we need to add a newline before new text; we don't add newlines to the text
    /// literals unless they're continuous (i.e., we never count newline paras as paras)
    dangling_newline: Option<Token>,
}

(As an aside: I’m keeping the comments on this struct, as opposed to many of the others I’ve shown above, in part because it’s useful and in part because I want to shout out to docs.rs for making it SUPER easy to generate really nice documentation for your project. Makes my former technical writer heart happy.)

We keep track of a lot of state, and frankly it got a little over-complicated, but also I didn’t have the time to make it simpler, so: it works, you know?
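Most of those "holding ground" fields follow the same small pattern: stash something in an `Option` when you see it, then move it out when the block it belongs to finally shows up. A minimal sketch, with a made-up `MiniParser` and `Paragraph` rather than the real types:

```rust
#[derive(Debug, Default)]
pub struct Paragraph {
    pub text: String,
    pub title: Option<String>,
}

#[derive(Debug, Default)]
pub struct MiniParser {
    /// Holding ground for a title, to be applied to the next block pushed.
    pending_title: Option<String>,
    pub blocks: Vec<Paragraph>,
}

impl MiniParser {
    /// Called when a block-title token is seen.
    pub fn set_title(&mut self, title: &str) {
        self.pending_title = Some(title.to_string());
    }

    /// Called when a paragraph is complete.
    pub fn push_paragraph(&mut self, text: &str) {
        // `take()` moves the title out and leaves `None` behind,
        // so it is applied to exactly one block.
        let title = self.pending_title.take();
        self.blocks.push(Paragraph {
            text: text.to_string(),
            title,
        });
    }
}
```

`Option::take` is what keeps this tidy: there is no separate "clear the pending title" step to forget.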

Again we have a big match statement with a lot of arms like:

TokenType::QuoteVerseBlock => {
    // check if it's verse
    if let Some(metadata) = &self.metadata {
        if metadata.declared_type == Some(AttributeType::Verse) {
            self.parse_delimited_leaf_block(token);
            return;
        }
    } else if self.open_parse_after_as_text_type.is_some() {
        self.parse_delimited_leaf_block(token);
        return;
    }

    self.parse_delimited_parent_block(token);
}

These, in turn, generate various Block and Inline objects that get added to our Abstract Syntax Graph:

#[derive(Serialize, Debug)]
pub struct Asg {
    pub name: String,
    #[serde(rename = "type")]
    pub node_type: NodeTypes,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub attributes: Option<HashMap<String, String>>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub header: Option<Header>,
    #[serde(skip)]
    /// footnote references
    document_id: String,
    #[serde(skip)]
    /// Hash of all IDs in the document, and the references they point to
    document_id_hash: HashMap<String, Vec<Inline>>,
    /// Document contents
    pub blocks: Vec<Block>,
    pub location: Vec<Location>,
}

So by and by we build our graph, which takes something like:

This document has two paragraphs.

Paragraphs may be separated by one or more empty lines.

Into:

{
  "name": "document",
  "type": "block",
  "blocks": [
    {
      "name": "paragraph",
      "type": "block",
      "inlines": [
        {
          "name": "text",
          "type": "string",
          "value": "This document has two paragraphs.",
          "location": [ { "line": 1, "col": 1 }, { "line": 1, "col": 33 } ]
        }
      ],
      "location": [ { "line": 1, "col": 1 }, { "line": 1, "col": 33 } ]
    },
    {
      "name": "paragraph",
      "type": "block",
      "inlines": [
        {
          "name": "text",
          "type": "string",
          "value": "Paragraphs may be separated by one or more empty lines.",
          "location": [ { "line": 4, "col": 1 }, { "line": 4, "col": 55 } ]
        }
      ],
      "location": [ { "line": 4, "col": 1 }, { "line": 4, "col": 55 } ]
    }
  ],
  "location": [ { "line": 1, "col": 1 }, { "line": 4, "col": 55 } ]
}

Note: All that location stuff is required by the schema; I don’t like it, but hey, it’s not all about me. If ever somebody takes this to create a better asciidoc LSP or something, it’ll be useful information. (Or if I ever start doing more error handling/verification for the user.)

I could perhaps go into more detail about how the parsing actually works, but, you know, it’s just creating objects, and this is getting long. So if you’re curious, look at the code (or holler at me on Bluesky and I’ll do a follow-up post about whichever part you’re interested in). We’ll now turn to doing something with this graph we’ve made.

Turning it Into Something Useful (Templating)

The first, most obvious useful thing for the parser to do is produce HTML, since that can be turned into basically anything else, one way or another. Instead of targeting the kind of HTML that Asciidoctor produces (which I find overly div heavy), I targeted an HTML standard called "HTMLBook", in part because that’s what I use for work and am therefore most comfortable with, and in part because it’s clean and simple and more like what pick-your-favorite-markdown converter produces. So to make HTML, we use templating. Yes! Our old friend templating. From Dreamweaver templates to Liquid templates to Handlebars to Jinja/Django, they’re all more or less the same. More or less usable. Etc. For this project I went with one called tera, after trying one called askama, which was really really cool but ultimately was hard to make work nicely with serde.

tera, on the other hand, is basically just Django templates. I write Django templates at work. Easy:

{% import "inline.html.tera" as inline_macros %}
{% import "block.html.tera" as block_macros %}
<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>{%- if header %}
        {%- for inline in header.title %}
        {{- inline_macros::process_inline(inline=inline) -}}
        {% endfor -%}
        {% endif -%}</title>
</head>

<body>{% for block in blocks %}
    {{ block_macros::process_block(block=block,skip_tag=false) -}}
{% endfor %}
</body>

</html>

There is a pretty annoying recursion issue (not the fault of tera so much as the fault of what I’m trying to do with it), which means that the block and inline macro code is… ugly. But hey, it works to produce nice, clean documents like the following:

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title></title>
</head>

<body>
    <p>What follows is an aside.</p>
    <aside data-type="sidebar">
        <h5>Aside Title</h5>
        <p>Some aside text!</p></aside>
</body>

</html>

Nice.

But Wait! There’s More!

So now we’ve more or less gotten to the point where we’ve duplicated a good chunk of what asciidoctor, the reference implementation, does in terms of parsing and conversion. Of course, asciidocr, this implementation, DOES NOT DO EVERYTHING ASCIIDOCTOR DOES, and doesn’t intend to. It does handle a whole bunch of the language, including nice things like include:: directives (see the limitations doc in the repo for more), but this all started not only because I wanted a non-interpreted-language implementation (with Rust we can generate binaries), but also because I wanted to do other stuff, more easily.

So let’s talk about a little of that.

Docx

If there is a "killer feature" of asciidocr, it is that it will, eventually, produce Word/docx files natively. Creating docx files is a PAIN IN THE ASS, but it’ll be worth it for folks like me who want to write their fictions and whatever else in asciidoc, but then have to send journals and agents and publishers Word documents.

I’m currently rewriting the implementation of the DOCX backend, but even now, if you install the tool with --features docx enabled (for more on what I’m talking about when I talk about installing a Rust feature, see here), you can get a docx created IF:

  • It’s only prose and headings

  • BUT it can include italics and bold and stuff
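For anyone who hasn't met Cargo features: they are compile-time switches, and code behind one simply doesn't exist in the binary unless the feature is turned on. A rough sketch of the mechanism (the `docx` feature name comes from this post; the `available_backends` function and the backend names it returns are invented for illustration):

```rust
/// Compiled only when the crate is built with the `docx` feature enabled.
#[cfg(feature = "docx")]
pub fn available_backends() -> Vec<&'static str> {
    vec!["htmlbook", "docx"]
}

/// Fallback compiled when the `docx` feature is off.
#[cfg(not(feature = "docx"))]
pub fn available_backends() -> Vec<&'static str> {
    vec!["htmlbook"]
}
```

Exactly one of the two functions is compiled in, so there is no runtime check and no unused docx code in a default build.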

The reimplementation will be better and handle more things — tables, lists, etc., — but I wanted to write this post now, instead of waiting for it to be "done," since "done" is a myth when it comes to software. Anyway: go try it out! My hope is for the docx backend to be stable enough that I don’t need to hide it behind a feature flag anymore.

Rust and Python

And, somewhat finally, another feature-flag thing: calling asciidocr from Python, making asciidoc conversions super fast with modern syntax (compared to asciidoc.py).

All the credit for this really goes to the pyo3 project, but building on top of their brilliant work, it’s very easy to do something like:

#![cfg(feature="python")]

use std::path::PathBuf;
use crate::scanner;
use crate::parser;
use crate::backends::htmls::render_htmlbook;
use pyo3::{exceptions::PyRuntimeError, prelude::*};

/// parses a string using the specified backend
#[pyfunction]
fn parse_to_html(adoc_str: &str) -> PyResult<String> {
    let graph = parser::Parser::new(PathBuf::from("-")).parse(scanner::Scanner::new(adoc_str));
    match render_htmlbook(&graph) {
        Ok(html) => Ok(html),
        Err(_) => Err(PyRuntimeError::new_err("Error converting asciidoc string")),
    }
}

#[pymodule]
fn asciidocr(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(parse_to_html, m)?)
}

Build a wheel, install it, and then from within Python:

$ python
Python 3.13.1 (main, Jan  7 2025, 10:41:20) [Clang 16.0.0 (clang-1600.0.26.6)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import asciidocr
>>> asciidoc = "This is _pretty freakin' cool_, right?!"
>>> html = asciidocr.parse_to_html(asciidoc)
>>> print(html)
<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title></title>
</head>

<body>

    <p>This is <em>pretty freakin' cool</em>, right?!</p>
</body>

</html>

So that’s nice, and potentially useful. As a friend pointed out recently, I need to get this up on PyPI, but, you know, in time…

Loose Ends

So there’s writing an asciidoc parser in Rust, in a pretty high-level way (I could in theory go back and add more detail, but this post is far, far too long). And there are plenty of loose ends so far as the project itself goes, like:

  • Actually covering the entirety of the asciidoc language

  • Allowing users to supply stylesheets for HTML builds via the CLI (and I never talked above about the CLI, did I? Or the packaging process? Maybe separate posts; anyway I used clap).

  • Creating an Asciidoctor-compliant HTML backend, because that means that folks can use this more as a "drop-in replacement" if they want

  • Finishing the docx build

  • …other future dream-big builds that I don’t want to talk about yet (OK: PDFs, I’m talking about PDFs).

  • And much, much more!

In any case.

As with all newer skills, the biggest benefit to my Rust knowledge was just having to write an ass-ton of Rust. I also think I learned something about design patterns, about balance (i.e., maybe it would have been more "pure" to keep some things in the Parser, but it was so much easier to just make the Scanner a little bit smarter sometimes), and about writing software more generally. I like Rust, in part, because it makes you really consider what the "right" thing to do is (okay: I really like it mostly because the tooling is so damn good), and this in turn makes me think about writing all code differently (apologies to my coworkers, who now have to put up with me importing Rust-y patterns into Python; I promise I’ll only do it when it makes sense!).

Mostly, though: I’m just happy I now have a tool that does more or less what I want it to do, and quickly (not to brag, but compare some very non-scientific testing that has asciidocr converting a file to HTML in 0.01s user, whereas asciidoctor takes a whole 0.32s user. It’s an admittedly small but noticeable difference, especially for larger documents). So in that sense mission achieved. Yay.

