Guide: Making parsers fast
Section titled “Guide: Making parsers fast”Techniques for writing high-performance parsers with Parseff.
This guide covers practical techniques for getting the most performance out of Parseff. Each section shows a few non-recommended slow patterns and a fast alternative with an approximate expected impact.
Quick wins
Section titled “Quick wins”Pre-compile regexes
Section titled “Pre-compile regexes”Compile regexes once at module level, not inside parser functions. Regex compilation is expensive (dozens of allocations and automaton construction per call) so hoisting it out gives roughly a 100x improvement:
let number () = (* compiles on every call *) let re = Re.compile (Re.Posix.re "[0-9]+") in Parseff.match_regex re
let number_re = Re.compile (Re.Posix.re "[0-9]+")let number () = Parseff.match_regex number_reUse take_while instead of regex
Section titled “Use take_while instead of regex”For simple character classes, Parseff.take_while runs a tight loop with no regex overhead, giving a 5-10x improvement. Use regex only when you need its pattern-matching power (alternation, repetition, grouping). For “all characters matching a predicate,” take_while is the right tool:
let digits_with_regex () = Parseff.match_regex (Re.compile (Re.Posix.re "[0-9]+"))let digits_with_take_while1 () = Parseff.take_while1 (fun c -> c >= '0' && c <= '9') ~label:"digit"Use skip_whitespace instead of whitespace
Section titled “Use skip_whitespace instead of whitespace”When you just need to move past whitespace, Parseff.skip_whitespace avoids allocating a string you’d immediately discard. The per-call saving is small, but it adds up in tight loops that skip whitespace between every token:
(* allocates a string you immediately discard *)let _ = Parseff.whitespace () inparse_value ()
(* skips without allocation *)Parseff.skip_whitespace ();parse_value ()Use fused operations
Section titled “Use fused operations”Fewer round-trips between your parser and the effect handler means less work overall. Fused operations combine several steps into one, yielding a 2-4x improvement in hot paths by doing in a single call what would otherwise require multiple:
(* 4 effect dispatches *)Parseff.skip_whitespace ();let _ = Parseff.char ',' inParseff.skip_whitespace ();let value = Parseff.take_while1 is_digit ~label:"digit" inignore value
(* 1 effect dispatch *)let value = Parseff.fused_sep_take is_whitespace ',' is_digit inignore valueAvailable fused operations:
Operation Replacesskip_while_then_char f c skip_while f; char cfused_sep_take ws sep pred skip_whitespace; char sep; skip_whitespace; take_while1 predsep_by_take ws sep pred sep_by (take_while pred) (skip_ws; char sep; skip_ws)sep_by_take_span ws sep pred Same, but returns spans instead of stringsUse zero-copy spans
Section titled “Use zero-copy spans”Avoid intermediate string allocations when you can work with slices. Spans point into the original input buffer with no allocation until you call Parseff.span_to_string, giving a 2-3x speed improvement and roughly 3x less memory allocation:
(* allocates a string per element *)let parse_values () = Parseff.sep_by (fun () -> int_of_string (Parseff.take_while1 is_digit ~label:"digit")) (fun () -> Parseff.char ',') ()
(* zero-copy: spans point into the original buffer *)let parse_values () = let spans = Parseff.sep_by_take_span is_whitespace ',' is_digit in List.map int_of_span spansSee the Zero-copy API reference for more details.
Advanced techniques
Section titled “Advanced techniques”Minimize or_ in loops
Section titled “Minimize or_ in loops”Each Parseff.or_ sets up a backtracking checkpoint, which is extra work the runtime has to do. In a Parseff.many loop that runs thousands of times, doing less per iteration makes a real difference:
let keyword () = (* installs a handler per alternation, per iteration *) Parseff.or_ (fun () -> Parseff.consume "class") (fun () -> Parseff.or_ (fun () -> Parseff.consume "function") (fun () -> Parseff.consume "let") ()) ()
let keyword () = (* parse then validate, no backtracking *) let w = Parseff.take_while1 (fun c -> c >= 'a' && c <= 'z') ~label:"keyword" in match w with | "class" | "function" | "let" -> w | _ -> Parseff.fail "expected keyword"When or_ is fine:
- Top-level choices (not inside tight loops)
- Complex parsers where each branch has distinct structure
- When the number of alternatives is small
Push work into handler-level operations
Section titled “Push work into handler-level operations”The less your parser has to bounce back and forth with the effect handler, the faster it runs. Handler-level operations do the entire scan in one go instead of making repeated round-trips:
let parse_list () = (* using combinators *) Parseff.sep_by (fun () -> int_of_string (Parseff.take_while1 is_digit ~label:"digit")) (fun () -> Parseff.char ',') ()
let parse_list () = (* runs the entire scan in one operation *) Parseff.sep_by_take is_whitespace ',' is_digit |> List.map int_of_string