Guide: Expressions with precedence
Section titled “Guide: Expressions with precedence”This walkthrough builds an arithmetic expression parser that handles expressions like 1+2*3 or (1+2)*3 and produces a structured tree that respects operator precedence.
Along the way, we’ll cover the standard recursive-descent technique for precedence, expect for clear error messages, and the chainl1 combinator as a shortcut for left-associative operator chains.
Source: examples/better_errors.ml
What we’re building
Section titled “What we’re building”A parser that handles expressions like:
1+2*3parses as1 + (2 * 3)(multiplication binds tighter)(1+2)*3parses as(1 + 2) * 3(parentheses override precedence)1+2+3parses as(1 + 2) + 3(left-associative)
The result is an AST (abstract syntax tree), not a computed value.
The AST type
Section titled “The AST type”type expr = | Num of int | Add of expr * expr | Mul of expr * exprNum 3 is a literal number. Add (Num 1, Num 2) is 1 + 2. Mul (Add (Num 1, Num 2), Num 3) is (1 + 2) * 3.
The precedence trick
Section titled “The precedence trick”The standard technique for operator precedence in recursive descent is to split the grammar into levels, one per precedence tier. Lower-precedence operators are parsed at higher levels (closer to the entry point), and higher-precedence operators at lower levels (closer to the leaves).
For our two operators:
expr = term (('+' term)*) ← lowest precedence: additionterm = factor (('*' factor)*) ← higher precedence: multiplicationfactor = number | '(' expr ')' ← atomic values and groupingEach level calls the next one for its operands. This structure guarantees that * binds tighter than + without any explicit precedence tables.
The parser
Section titled “The parser”(* Addition level: lowest precedence *)let rec expr () = let left = term () in let rest = Parseff.many (fun () -> Parseff.skip_whitespace (); let _ = Parseff.expect "a '+' operator" (fun () -> Parseff.char '+') in Parseff.skip_whitespace (); term ()) () in List.fold_left (fun acc t -> Add (acc, t)) left rest
(* Multiplication level: higher precedence *)and term () = let left = factor () in let rest = Parseff.many (fun () -> Parseff.skip_whitespace (); let _ = Parseff.expect "a '*' operator" (fun () -> Parseff.char '*') in Parseff.skip_whitespace (); factor ()) () in List.fold_left (fun acc f -> Mul (acc, f)) left rest
(* Atoms and parenthesized groups *)and factor () = Parseff.skip_whitespace (); Parseff.or_ (fun () -> let _ = Parseff.expect "an opening parenthesis" (fun () -> Parseff.char '(') in Parseff.skip_whitespace (); let e = expr () in Parseff.skip_whitespace (); let _ = Parseff.expect "a closing parenthesis" (fun () -> Parseff.char ')') in e) (fun () -> let d = Parseff.expect "a number" Parseff.digit in Num d) ()Each precedence level calls the next one for its operands. expr parses term (('+' term)*), term parses factor (('*' factor)*), and factor handles numbers and parenthesized sub-expressions. Because term is called from within expr, multiplication binds tighter than addition.
many collects zero or more operator-operand pairs, and fold_left builds left-associative AST nodes: 1+2+3 becomes Add(Add(Num 1, Num 2), Num 3). or_ in factor tries the parenthesized path first; if the input doesn’t start with (, it backtracks and tries a digit.
How expect improves errors
Section titled “How expect improves errors”Without expect, a failure on the closing parenthesis would say something like expected ')'. With expect, it says expected a closing parenthesis. This matters when users see the error.
Compare the error messages on "1+":
- Without
expect:expected digit - With
expect:expected a number
And on "(1+2":
- Without
expect:expected ')' - With
expect:expected a closing parenthesis
Tracing a parse
Section titled “Tracing a parse”Let’s trace 1+2*3 through the parser:
expr()callsterm()term()callsfactor(), which matches1→Num 1term()triesmany(* factor), but no*follows, sorest = []term()returnsNum 1toexpr()expr()triesmany(+ term)and sees+- Consumes
+, callsterm() term()callsfactor(), which matches2→Num 2term()triesmany(* factor)and sees*- Consumes
*, callsfactor(), which matches3→Num 3 term()buildsMul(Num 2, Num 3)viafold_leftexpr()buildsAdd(Num 1, Mul(Num 2, Num 3))viafold_left
The key insight: * is consumed inside term, so by the time expr sees the result, 2*3 is already a single Mul node. That’s how precedence works. Each level “claims” its operators before the level above sees them.
An alternative: using chainl1
Section titled “An alternative: using chainl1”The many + fold_left pattern is so common that Parseff provides chainl1 as a shortcut:
let rec expr () = Parseff.chainl1 term (fun () -> Parseff.skip_whitespace (); let _ = Parseff.char '+' in Parseff.skip_whitespace (); fun a b -> Add (a, b)) ()
and term () = Parseff.chainl1 factor (fun () -> Parseff.skip_whitespace (); let _ = Parseff.char '*' in Parseff.skip_whitespace (); fun a b -> Mul (a, b)) ()
and factor () = Parseff.skip_whitespace (); Parseff.or_ (fun () -> let _ = Parseff.char '(' in let e = expr () in Parseff.skip_whitespace (); let _ = Parseff.char ')' in e) (fun () -> Num (Parseff.digit ())) ()chainl1 element op parses one or more elements separated by op, combining them left-to-right. The op parser returns the combining function. This is more concise and expresses the intent directly.
For right-associative operators (like exponentiation), use chainr1 instead.
Printing the AST
Section titled “Printing the AST”For debugging, a simple to_string function:
let rec expr_to_string = function | Num n -> string_of_int n | Add (l, r) -> Printf.sprintf "(%s + %s)" (expr_to_string l) (expr_to_string r) | Mul (l, r) -> Printf.sprintf "(%s * %s)" (expr_to_string l) (expr_to_string r)(* "1+2*3" -> "(1 + (2 * 3))" *)(* "(1+2)*3" -> "((1 + 2) * 3)" *)