Advent of Code 2024 Day 2: Parsing Expression Grammars

December 2, 2024

I am fully committed to using Janet this year! At least I hope so. One of my goals is to get really familiar and fluent with using parsing expression grammars (PEG). I’ve always liked parser combinators, which seem to be more or less the same thing. One of the most basic and typical ways to parse input in Advent of Code is to split a string into a lines and split each line on spaces.

1 2 3
4 5 6

This an be achieved quite elegantly with a somewhat more recent addition to the PEG library, split. You give it two patterns, one for the separator and one for the actual contents, and it generally does the right thing. There is just one caveat. In the following snippet, there’s a space at the end of my input string. If you remove the asterisk * from the digit pattern, you get no matches. So split more or less requires you to make the content optional, unless you are absolutely certain that you won’t have a dangling separator at the end of your input.

  (peg/match ~(split :s ':d*) "1 2 3 ")

By making the second pattern optional, the end of the string, “3 “ can now be matched as two digits separated by a space. The second digit is simply empty. Sounds totally logical, I know. But you probably don’t want an empty capture. You can either remove it later through filter or you can call string/trimr on the input, which is my convention for AoC. This removes trailing whitespace, so you no longer need to make the second pattern optional, like this:

(peg/match ~(split :s ':d) (string/trimr "1 2 3 "))

Of course you’ll have to adjust your code for other separators (or filter it out later).

Anyway, back to the real world (of AoC). Here’s the same PEG with and without using split. I think it’s a huge improvement.

# before
(def input-peg
  (peg/compile
    ~{:main (* (some (* :line (? :s))) -1)
      :line (group (some (* :num (any " "))))
      :num (/ (<- :d+) ,scan-number)}))

# after
(def parser
  (peg/compile ~(split "\n" (group (split :s (any (number :d+)))))))