Guide: A JSON parser
Section titled “Guide: A JSON parser”Build a complete recursive-descent JSON parser with depth limiting.
This walkthrough builds a full JSON parser that handles all JSON value types: null, booleans, numbers, strings, arrays, and objects. It demonstrates recursive descent parsing, mutual recursion between parsers, and depth limiting with rec_.
Source: test/test_json.ml
What we’re building
Section titled “What we’re building”A parser that takes any valid JSON string and returns a structured OCaml value. Along the way, we’ll handle:
- Six different value types with alternation
- Recursive nesting (arrays inside objects inside arrays…)
- Whitespace between tokens
- Depth limiting to prevent stack overflows on malicious input
The data type
Section titled “The data type”First, we define what a parsed JSON value looks like:
type json = | Null | Bool of bool | Number of float | String of string | Array of json list | Object of (string * json) listEvery JSON value maps to one of these constructors. This is the return type of our parser.
Primitive value parsers
Section titled “Primitive value parsers”Each JSON primitive type gets its own parser. These are the leaf nodes. They don’t call other parsers recursively.
Null and booleans
Section titled “Null and booleans”let null_parser () = let _ = Parseff.consume "null" in Null
let bool_parser () = Parseff.or_ (fun () -> let _ = Parseff.consume "true" in Bool true) (fun () -> let _ = Parseff.consume "false" in Bool false) ()null_parser matches the literal "null" and returns the Null constructor. bool_parser uses or_ to try "true" first, backtracking to "false" if that fails.
Numbers
Section titled “Numbers”let number_re = Re.compile (Re.Posix.re "-?[0-9]+(\\.[0-9]+)?")
let number_parser () = let s = Parseff.match_regex number_re in Number (float_of_string s)We match the number as a string with a regex, then convert it. The regex handles integers, decimals, and negative numbers.
Strings
Section titled “Strings”let string_content_re = Re.compile (Re.Posix.re "[^\"]*")
let string_parser () = let _ = Parseff.char '"' in let s = Parseff.match_regex string_content_re in let _ = Parseff.char '"' in String sMatch an opening quote, capture everything that isn’t a quote, then match the closing quote. This simplified version doesn’t handle escape sequences (\", \\, etc.). A production parser would need more work here.
The recursive entry point
Section titled “The recursive entry point”This is where it gets interesting. A JSON value can be any of the six types, and arrays/objects contain more JSON values:
let rec json () = Parseff.rec_ (fun () -> Parseff.skip_whitespace (); Parseff.one_of [ array_parser; object_parser; null_parser; bool_parser; number_parser; string_parser; ] () )Three things to notice:
rec_wraps the body. This registers a recursion entry point for depth tracking. Every timejsoncalls itself (through arrays or objects), the depth counter increments. When it exceedsmax_depth, parsing fails cleanly instead of overflowing the stack.one_oftries each parser in order. Array and object parsers come first because they start with distinctive characters ([and{). If those fail, we fall through to the simpler alternatives.- Whitespace is consumed before the value. This means individual parsers don’t need to worry about leading whitespace.
Arrays
Section titled “Arrays”and array_parser () = let _ = Parseff.char '[' in Parseff.skip_whitespace (); let elements = Parseff.or_ (fun () -> let first = json () in let rest = Parseff.many (fun () -> Parseff.skip_whitespace (); let _ = Parseff.char ',' in json ()) () in first :: rest) (fun () -> []) () in Parseff.skip_whitespace (); let _ = Parseff.char ']' in Array elementsAfter the opening [, we use or_ to handle two cases:
- Non-empty array: Parse the first element, then
manyparses zero or more, elementpairs. This pattern avoids a trailing comma problem, since the separator always comes before an element (except the first). - Empty array: The left branch fails (no element to parse), so we backtrack and return
[].
Notice json () is called recursively here. This is where the depth tracking from rec_ matters. Without it, deeply nested arrays like [[[[[...]]]]] would blow the stack.
Objects
Section titled “Objects”and object_parser () = let _ = Parseff.char '{' in Parseff.skip_whitespace (); let pairs = Parseff.or_ (fun () -> let first = key_value () in let rest = Parseff.many (fun () -> Parseff.skip_whitespace (); let _ = Parseff.char ',' in key_value ()) () in first :: rest) (fun () -> []) () in Parseff.skip_whitespace (); let _ = Parseff.char '}' in Object pairsThe structure mirrors the array parser exactly. The only difference is that elements are key-value pairs instead of bare values.
Key-value pairs
Section titled “Key-value pairs”and key_value () = let _ = Parseff.char '"' in let key = Parseff.match_regex string_content_re in let _ = Parseff.char '"' in Parseff.skip_whitespace (); let _ = Parseff.char ':' in Parseff.skip_whitespace (); let value = json () in (key, value)Parse a string key (reusing the same regex as string_parser), a colon separator, and then recursively parse the value.
Running the parser
Section titled “Running the parser”let () = let input = {|{"name": "parseff", "version": 1, "tags": ["parser", "ocaml"]}|} in match Parseff.parse input json with | Ok result -> (* ... process the JSON value ... *) | Error { pos; error = `Expected msg } -> Printf.printf "Error at %d: %s\n" pos msg | Error { pos; error = `Unexpected_end_of_input } -> Printf.printf "Unexpected end of input at %d\n" pos | Error { pos; error = `Depth_limit_exceeded msg } -> Printf.printf "Depth limit exceeded at %d: %s\n" pos msg | Error _ -> print_endline "Parse error"Depth limiting
Section titled “Depth limiting”Without rec_, a malicious input with 10,000 nested arrays would crash your program with a stack overflow. With it, you can set a limit:
Parseff.parse ~max_depth:64 input jsonWhen the limit is exceeded, parsing fails with "maximum nesting depth 64 exceeded", giving a clean error instead of a crash.
Here’s what happens with deeply nested input:
(* 50 levels deep: within the default limit of 128 *)let input = String.make 50 '[' ^ String.make 50 ']' inParseff.parse input json_array(* Ok (Array (Array (Array ...))) *)
(* 256 levels deep: exceeds the limit *)let input = String.make 256 '[' ^ String.make 256 ']' inParseff.parse ~max_depth:128 input json(* Error { error = `Depth_limit_exceeded "maximum nesting depth 128 exceeded" } *)The mutual recursion pattern
Section titled “The mutual recursion pattern”This parser uses five mutually recursive functions:
json ──→ one_of ──→ array_parser ──→ json (recursive) ──→ object_parser ──→ key_value ──→ json (recursive) ──→ null_parser ──→ bool_parser ──→ number_parser ──→ string_parserIn OCaml, mutually recursive functions are defined with let rec ... and ...:
let rec json () = ...and array_parser () = ...and object_parser () = ...and key_value () = ...Only the top-level entry point (json) needs rec_. The other functions participate in the recursion, but json is where a new nesting level begins.