Skip to content

This walkthrough builds an arithmetic expression parser that handles expressions like 1+2*3 or (1+2)*3 and produces a structured tree that respects operator precedence.

Along the way, we’ll cover the standard recursive-descent technique for precedence, expect for clear error messages, and the chainl1 combinator as a shortcut for left-associative operator chains.

Source: examples/better_errors.ml

A parser that handles expressions like:

  • 1+2*3 parses as 1 + (2 * 3) (multiplication binds tighter)
  • (1+2)*3 parses as (1 + 2) * 3 (parentheses override precedence)
  • 1+2+3 parses as (1 + 2) + 3 (left-associative)

The result is an AST (abstract syntax tree), not a computed value.

type expr =
| Num of int
| Add of expr * expr
| Mul of expr * expr

Num 3 is a literal number. Add (Num 1, Num 2) is 1 + 2. Mul (Add (Num 1, Num 2), Num 3) is (1 + 2) * 3.

The standard technique for operator precedence in recursive descent is to split the grammar into levels, one per precedence tier. Lower-precedence operators are parsed at higher levels (closer to the entry point), and higher-precedence operators at lower levels (closer to the leaves).

For our two operators:

expr = term (('+' term)*) ← lowest precedence: addition
term = factor (('*' factor)*) ← higher precedence: multiplication
factor = number | '(' expr ')' ← atomic values and grouping

Each level calls the next one for its operands. This structure guarantees that * binds tighter than + without any explicit precedence tables.

(* Addition level: lowest precedence *)
let rec expr () =
let left = term () in
let rest =
Parseff.many
(fun () ->
Parseff.skip_whitespace ();
let _ = Parseff.expect "a '+' operator" (fun () -> Parseff.char '+') in
Parseff.skip_whitespace ();
term ())
()
in
List.fold_left (fun acc t -> Add (acc, t)) left rest
(* Multiplication level: higher precedence *)
and term () =
let left = factor () in
let rest =
Parseff.many
(fun () ->
Parseff.skip_whitespace ();
let _ = Parseff.expect "a '*' operator" (fun () -> Parseff.char '*') in
Parseff.skip_whitespace ();
factor ())
()
in
List.fold_left (fun acc f -> Mul (acc, f)) left rest
(* Atoms and parenthesized groups *)
and factor () =
Parseff.skip_whitespace ();
Parseff.or_
(fun () ->
let _ = Parseff.expect "an opening parenthesis" (fun () -> Parseff.char '(') in
Parseff.skip_whitespace ();
let e = expr () in
Parseff.skip_whitespace ();
let _ = Parseff.expect "a closing parenthesis" (fun () -> Parseff.char ')') in
e)
(fun () ->
let d = Parseff.expect "a number" Parseff.digit in
Num d)
()

Each precedence level calls the next one for its operands. expr parses term (('+' term)*), term parses factor (('*' factor)*), and factor handles numbers and parenthesized sub-expressions. Because term is called from within expr, multiplication binds tighter than addition.

many collects zero or more operator-operand pairs, and fold_left builds left-associative AST nodes: 1+2+3 becomes Add(Add(Num 1, Num 2), Num 3). or_ in factor tries the parenthesized path first; if the input doesn’t start with (, it backtracks and tries a digit.

Without expect, a failure on the closing parenthesis would say something like expected ')'. With expect, it says expected a closing parenthesis. This matters when users see the error.

Compare the error messages on "1+":

  • Without expect: expected digit
  • With expect: expected a number

And on "(1+2":

  • Without expect: expected ')'
  • With expect: expected a closing parenthesis

Let’s trace 1+2*3 through the parser:

  1. expr() calls term()
  2. term() calls factor(), which matches 1Num 1
  3. term() tries many(* factor), but no * follows, so rest = []
  4. term() returns Num 1 to expr()
  5. expr() tries many(+ term) and sees +
  6. Consumes +, calls term()
  7. term() calls factor(), which matches 2Num 2
  8. term() tries many(* factor) and sees *
  9. Consumes *, calls factor(), which matches 3Num 3
  10. term() builds Mul(Num 2, Num 3) via fold_left
  11. expr() builds Add(Num 1, Mul(Num 2, Num 3)) via fold_left

The key insight: * is consumed inside term, so by the time expr sees the result, 2*3 is already a single Mul node. That’s how precedence works. Each level “claims” its operators before the level above sees them.

The many + fold_left pattern is so common that Parseff provides chainl1 as a shortcut:

let rec expr () =
Parseff.chainl1
term
(fun () ->
Parseff.skip_whitespace ();
let _ = Parseff.char '+' in
Parseff.skip_whitespace ();
fun a b -> Add (a, b))
()
and term () =
Parseff.chainl1
factor
(fun () ->
Parseff.skip_whitespace ();
let _ = Parseff.char '*' in
Parseff.skip_whitespace ();
fun a b -> Mul (a, b))
()
and factor () =
Parseff.skip_whitespace ();
Parseff.or_
(fun () ->
let _ = Parseff.char '(' in
let e = expr () in
Parseff.skip_whitespace ();
let _ = Parseff.char ')' in
e)
(fun () -> Num (Parseff.digit ()))
()

chainl1 element op parses one or more elements separated by op, combining them left-to-right. The op parser returns the combining function. This is more concise and expresses the intent directly.

For right-associative operators (like exponentiation), use chainr1 instead.

For debugging, a simple to_string function:

let rec expr_to_string = function
| Num n -> string_of_int n
| Add (l, r) ->
Printf.sprintf "(%s + %s)" (expr_to_string l) (expr_to_string r)
| Mul (l, r) ->
Printf.sprintf "(%s * %s)" (expr_to_string l) (expr_to_string r)
(* "1+2*3" -> "(1 + (2 * 3))" *)
(* "(1+2)*3" -> "((1 + 2) * 3)" *)