Guide: Making parsers fast
Section titled “Guide: Making parsers fast”Techniques for writing high-performance parsers with Parseff.
This guide covers practical techniques for getting the most performance out of Parseff. Each section shows a few non-recommended slow patterns and a fast alternative with an approximate expected impact.
Quick wins
Section titled “Quick wins”Pre-compile regexes
Section titled “Pre-compile regexes”Compile regexes once at module level, not inside parser functions. Regex compilation is expensive (dozens of allocations and automaton construction per call) so hoisting it out gives roughly a 100x improvement:
let number () = (* compiles on every call *) let re = Re.compile (Re.Posix.re "[0-9]+") in Parseff.match_regex re
let number_re = Re.compile (Re.Posix.re "[0-9]+")let number () = Parseff.match_regex number_reUse take_while instead of regex
Section titled “Use take_while instead of regex”For simple character classes, Parseff.take_while runs a tight loop with no regex overhead, giving a 5-10x improvement. Use regex only when you need its pattern-matching power (alternation, repetition, grouping). For “all characters matching a predicate,” take_while is the right tool:
let digits_with_regex () = Parseff.match_regex (Re.compile (Re.Posix.re "[0-9]+"))let digits_with_take_while () = Parseff.take_while ~at_least:1 (fun c -> c >= '0' && c <= '9') ~label:"digit"Use skip_whitespace instead of whitespace
Section titled “Use skip_whitespace instead of whitespace”When you just need to move past whitespace, Parseff.skip_whitespace avoids allocating a string you’d immediately discard. The per-call saving is small, but it adds up in tight loops that skip whitespace between every token:
(* allocates a string you immediately discard *)let _ = Parseff.whitespace () inparse_value ()
(* skips without allocation *)Parseff.skip_whitespace ();parse_value ()Use fused operations
Section titled “Use fused operations”Fewer parser round-trips means less work overall. Fused operations combine several steps into one, yielding a 2-4x improvement in hot paths by doing in a single call what would otherwise require multiple:
(* 4 parser steps *)Parseff.skip_whitespace ();let _ = Parseff.char ',' inParseff.skip_whitespace ();let value = Parseff.take_while ~at_least:1 is_digit ~label:"digit" inignore value
(* 1 fused step *)let value = Parseff.fused_sep_take is_whitespace ',' is_digit inignore valueAvailable fused operations:
Operation Replacesskip_while_then_char f c skip_while f; char cfused_sep_take ws sep pred skip_whitespace; char sep; skip_whitespace; take_while ~at_least:1 predsep_by_take ws sep pred sep_by (take_while pred) (skip_ws; char sep; skip_ws)sep_by_take_span ws sep pred Same, but returns spans instead of stringsUse zero-copy spans
Section titled “Use zero-copy spans”Avoid intermediate string allocations when you can work with slices. Spans point into the original input buffer with no allocation until you call Parseff.span_to_string, giving a 2-3x speed improvement and roughly 3x less memory allocation:
(* allocates a string per element *)let parse_values () = Parseff.sep_by (fun () -> int_of_string (Parseff.take_while ~at_least:1 is_digit ~label:"digit")) (fun () -> Parseff.char ',') ()
(* zero-copy: spans point into the original buffer *)let parse_values () = let spans = Parseff.sep_by_take_span is_whitespace ',' is_digit in List.map int_of_span spansSee the Zero-copy API reference for more details.
Advanced techniques
Section titled “Advanced techniques”Minimize or_ in loops
Section titled “Minimize or_ in loops”Each Parseff.or_ sets up a backtracking checkpoint, which is extra work the runtime has to do. In a Parseff.many loop that runs thousands of times, doing less per iteration makes a real difference:
let keyword () = (* installs a checkpoint per alternation, per iteration *) Parseff.or_ (fun () -> Parseff.consume "class") (fun () -> Parseff.or_ (fun () -> Parseff.consume "function") (fun () -> Parseff.consume "let") ()) ()
let keyword () = (* parse then validate, no backtracking *) let w = Parseff.take_while ~at_least:1 (fun c -> c >= 'a' && c <= 'z') ~label:"keyword" in match w with | "class" | "function" | "let" -> w | _ -> Parseff.fail "expected keyword"When or_ is fine:
- Top-level choices (not inside tight loops)
- Complex parsers where each branch has distinct structure
- When the number of alternatives is small
Push work into bulk operations
Section titled “Push work into bulk operations”The less your parser has to bounce back and forth between tiny operations, the faster it runs. Bulk operations do the entire scan in one go instead of making repeated round-trips:
let parse_list () = (* using combinators *) Parseff.sep_by (fun () -> int_of_string (Parseff.take_while ~at_least:1 is_digit ~label:"digit")) (fun () -> Parseff.char ',') ()
let parse_list () = (* runs the entire scan in one operation *) Parseff.sep_by_take is_whitespace ',' is_digit |> List.map int_of_string