Guide: Your first parser
Build a config file parser from scratch, learning Parseff along the way.
Why parser combinators?
Parser combinators let you write small parsers that each handle one thing, then compose them. Instead of a single regex that’s hard to read, test, and extend, you get modular pieces with clear error messages.
There are three ways to compose parsers:
- Sequence: parse A, then parse B (let bindings)
- Choice: try A, if it fails try B (Parseff.or_)
- Repetition: parse A zero or more times (Parseff.many)
This tutorial builds a key-value config file parser from scratch, adding one feature at a time.
The format
    # Server configuration
    host = localhost
    port = 8080
    tags = web,api,v2
    debug = true

Lines starting with # are comments. Blank lines are skipped. Everything else is key = value.
Step 1: parsing a key-value pair
Each parser reads input by calling combinators like Parseff.take_while1 and Parseff.char, and returns a value.

    let key () =
      Parseff.take_while1
        (fun c -> (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c = '_')
        ~label:"key"

    let raw_value () =
      Parseff.take_while1 (fun c -> c <> '\n') ~label:"value"

    let entry () =
      let k = key () in
      Parseff.skip_while (fun c -> c = ' ' || c = '\t');
      let _ = Parseff.char '=' in
      Parseff.skip_while (fun c -> c = ' ' || c = '\t');
      let v = raw_value () in
      (k, v)

take_while1 scans characters while the predicate holds and requires at least one match. The ~label appears in error messages if nothing matches. char '=' matches a single character. skip_while advances past whitespace without allocating a string.
Sequencing is just let bindings. Each line advances the cursor through the input.
To run it:
    match Parseff.parse "host = localhost" entry with
    | Ok (k, v) -> Printf.printf "%s -> %s\n" k v (* "host" -> "localhost" *)
    | Error { pos; error = `Expected msg } -> Printf.printf "Error at %d: %s\n" pos msg
    | Error _ -> print_endline "Parse error"

Parseff.parse returns Ok value on success, or Error { pos; error } on failure.
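The predicates handed to take_while1 and skip_while are ordinary char -> bool functions, so they can be named and checked without running a parser. The sketch below uses only the standard library (String.for_all needs OCaml 4.13 or later). The names is_key_char, is_blank, and is_valid_key are hypothetical; the tutorial writes the predicates inline.

```ocaml
(* Hypothetical names: the tutorial writes these predicates inline. *)

(* Characters allowed in a key: lowercase ASCII letters, digits, underscore. *)
let is_key_char c = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c = '_'

(* Horizontal whitespace that entry skips around '='. *)
let is_blank c = c = ' ' || c = '\t'

(* A string is a valid key if it is non-empty and every character qualifies.
   This mirrors the "at least one match" rule of take_while1. *)
let is_valid_key s = s <> "" && String.for_all is_key_char s

let () =
  List.iter
    (fun k -> Printf.printf "%-10s %b\n" k (is_valid_key k))
    [ "host"; "max_conn2"; "Host"; "log-level" ]
```

With this predicate, Host and log-level are rejected, because uppercase letters and hyphens are not in the accepted set. If your config format needs them, widen the predicate passed to take_while1 in key.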
Step 2: comments and blank lines
A comment line starts with #:

    let comment () =
      let _ = Parseff.char '#' in
      let _ = Parseff.take_while (fun c -> c <> '\n') in
      ()

A line can be a comment, an entry, or blank. We need alternation: try one option, and if it fails, try the next. Parseff.or_ does this for two alternatives. It tries the left parser; if that fails, it backtracks (resets the cursor to where it was) and tries the right, so a failed alternative consumes no input. For more than two alternatives, Parseff.one_of takes a list:
    let line () =
      Parseff.skip_while (fun c -> c = ' ' || c = '\t');
      Parseff.one_of
        [
          (fun () -> comment (); None);
          (fun () -> Some (entry ()));
          (fun () -> None); (* blank line: always succeeds *)
        ]
        ()

one_of tries each parser in order until one succeeds. Here: try a comment, then try an entry, then fall through to None for blank lines. The last branch always succeeds, so line never fails.
take_while (without the 1) can match zero characters. It always succeeds.
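To make the backtracking rule concrete, here is a toy model of ordered choice in plain OCaml. It is an illustration only and makes no claim about how Parseff is implemented: the parser type, literal, and these versions of or_ and one_of are all hypothetical stand-ins that thread the cursor position explicitly.

```ocaml
(* Toy model of ordered choice. NOT Parseff's implementation. *)

(* A parser takes the input and a start position. It returns the parsed
   value and the new position, or None on failure. *)
type 'a parser = string -> int -> ('a * int) option

(* Match a literal string at the current position. *)
let literal (lit : string) : string parser =
 fun input pos ->
  let n = String.length lit in
  if pos + n <= String.length input && String.sub input pos n = lit then
    Some (lit, pos + n)
  else None

(* Two-way choice: both alternatives start from the same [pos], so a failed
   left branch leaves the cursor untouched. *)
let or_ (p : 'a parser) (q : 'a parser) : 'a parser =
 fun input pos ->
  match p input pos with
  | Some _ as ok -> ok
  | None -> q input pos

(* N-way choice: the first alternative that succeeds wins. *)
let rec one_of (ps : 'a parser list) : 'a parser =
 fun input pos ->
  match ps with
  | [] -> None
  | p :: rest -> (
      match p input pos with
      | Some _ as ok -> ok
      | None -> one_of rest input pos)
```

In this model, one_of [literal "a"; literal "ab"] applied to "ab" returns the match for "a" and never tries "ab", which is the same first-success-wins behaviour the tutorial relies on when it orders its alternatives.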
Step 3: the whole file
Parseff.sep_by parses repeated elements with a separator between them:

    let config () =
      let lines = Parseff.sep_by line (fun () -> Parseff.char '\n') () in
      Parseff.end_of_input ();
      List.filter_map Fun.id lines

end_of_input ensures there’s no trailing data. Without it, "host = localhost\ngarbage" could partially succeed.
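The last line of config relies on List.filter_map Fun.id to drop the None results that comments and blank lines produce. A small standalone sketch of that step, using a hand-written list in place of real parser output (the parsed_lines value is made up for illustration):

```ocaml
(* Hand-written stand-in for what sep_by line might return:
   None for comments and blank lines, Some (key, value) for entries. *)
let parsed_lines =
  [ None; Some ("host", "localhost"); None; Some ("port", "8080") ]

(* Fun.id returns each option unchanged, so filter_map keeps the payload of
   every Some and discards every None, preserving order. *)
let entries = List.filter_map Fun.id parsed_lines

let () = List.iter (fun (k, v) -> Printf.printf "%s = %s\n" k v) entries
```

Both List.filter_map and Fun.id are in the standard library from OCaml 4.08 onward.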
Running the full parser:
    let input = {|# Server configuration
    host = localhost
    port = 8080
    tags = web,api,v2
    debug = true|}

    let () =
      match Parseff.parse input config with
      | Ok entries ->
          List.iter (fun (k, v) -> Printf.printf "%s = %s\n" k v) entries
      | Error { pos; error = `Expected msg } ->
          Printf.printf "Error at %d: %s\n" pos msg
      | Error _ -> print_endline "Parse error"

Output:

    host = localhost
    port = 8080
    tags = web,api,v2
    debug = true

Step 4: typed values
Right now all values are strings. We can parse them into typed data with alternation. We already saw one_of in step 2 for choosing between comment/entry/blank. Here we use both or_ (for the two-way true/false choice) and one_of (for the three-way type choice):
    type config_value =
      | Bool of bool
      | Int of int
      | Tags of string list
      | Str of string

    let bool_value () =
      Parseff.or_
        (fun () -> let _ = Parseff.consume "true" in Bool true)
        (fun () -> let _ = Parseff.consume "false" in Bool false)
        ()

    let int_value () =
      let s = Parseff.take_while1 (fun c -> c >= '0' && c <= '9') ~label:"integer" in
      Int (int_of_string s)

    let tag_list () =
      let tags =
        Parseff.sep_by
          (fun () -> Parseff.take_while1 (fun c -> c <> ',' && c <> '\n') ~label:"tag")
          (fun () -> Parseff.char ',')
          ()
      in
      Tags tags

    let typed_value () =
      Parseff.one_of
        [
          (fun () -> bool_value ());
          (fun () -> int_value ());
          (fun () -> tag_list ());
        ]
        ()

consume matches a literal string (for multi-character matches like "true"). or_ is shorthand when you have exactly two alternatives; one_of takes a list for three or more. Order matters: bool_value must come before tag_list, otherwise "true" would match as Tags ["true"].
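Note that typed_value as written never builds a Str: a plain value such as localhost fails the bool and int branches and comes back from tag_list as Tags ["localhost"]. One possible way to put Str to use, sketched here as an assumption rather than as the tutorial's design, is to decide between Str and Tags from the list that sep_by returns. The helper name tags_or_str is hypothetical.

```ocaml
type config_value =
  | Bool of bool
  | Int of int
  | Tags of string list
  | Str of string

(* Hypothetical helper: a single item with no comma is treated as a plain
   string; anything else stays a tag list. *)
let tags_or_str (items : string list) : config_value =
  match items with
  | [ single ] -> Str single
  | several -> Tags several

let () =
  match tags_or_str [ "localhost" ] with
  | Str s -> Printf.printf "string: %s\n" s
  | Tags ts -> Printf.printf "tags: %s\n" (String.concat "," ts)
  | Bool _ | Int _ -> ()
```

To try it, the last line of tag_list would return tags_or_str tags instead of Tags tags. The trade-off is that a one-element tag list can no longer be told apart from a plain string, so whether this fits depends on your format.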
Now swap raw_value for typed_value in the entry parser:
    let typed_entry () =
      let k = key () in
      Parseff.skip_while (fun c -> c = ' ' || c = '\t');
      let _ = Parseff.char '=' in
      Parseff.skip_while (fun c -> c = ' ' || c = '\t');
      let v = typed_value () in
      (k, v)

Step 5: validation with typed errors
Suppose ports must be 0-65535. Use Parseff.error with a polymorphic variant to report a structured error:
    let port_value () =
      let s = Parseff.take_while1 (fun c -> c >= '0' && c <= '9') ~label:"digit" in
      let n = int_of_string s in
      if n >= 0 && n <= 65535 then Int n
      else Parseff.error (`Port_out_of_range n)

The error flows through to the result type. The match below assumes the entry parser calls port_value for the port key; typed_entry as written in step 4 goes through typed_value, whose int_value branch accepts any run of digits:

    match Parseff.parse "port = 99999" typed_entry with
    | Ok _ -> ()
    | Error { error = `Port_out_of_range n; _ } -> Printf.printf "%d is not a valid port\n" n
    | Error { error = `Expected msg; _ } -> Printf.printf "Parse error: %s\n" msg

Parseff.error raises a typed error value. Parseff.fail raises a string message (wrapped as `Expected). Use error when callers need to distinguish different failure modes; use fail for simple messages shown directly to users.
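One caveat with the range check: int_of_string raises Failure when the digit run does not fit in an OCaml int, so a very long number would escape as an exception instead of a parse error. A minimal sketch of a safer conversion using the standard library's int_of_string_opt follows. The helper name port_of_digits is hypothetical, and it assumes its argument contains only the characters 0-9, as the take_while1 predicate guarantees.

```ocaml
(* Hypothetical helper: convert a run of decimal digits to a port number.
   Returns None when the number overflows int or falls outside 0-65535. *)
let port_of_digits (digits : string) : int option =
  match int_of_string_opt digits with
  | Some n when n >= 0 && n <= 65535 -> Some n
  | Some _ | None -> None

let () =
  List.iter
    (fun s ->
      match port_of_digits s with
      | Some n -> Printf.printf "%s -> port %d\n" s n
      | None -> Printf.printf "%s -> rejected\n" s)
    [ "8080"; "99999"; "99999999999999999999999" ]
```

Inside port_value, the None case is where Parseff.error or Parseff.fail would be called. Since an overflowing value has no int to report, the error variant would need to carry the digit string rather than n.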