Skip to content

Build a validated IPv4 address parser step by step.

This walkthrough builds an IPv4 address parser from scratch. It covers digit matching, multi-character numbers, range validation, error reporting, and ensuring complete input consumption.

Source: test/test_ip.ml and examples/better_errors.ml

An IPv4 address looks like 192.168.1.1: four numbers from 0 to 255, separated by dots. Simple enough to parse by hand, but it’s a perfect first example because it combines several core concepts:

  • Parsing individual digits and assembling multi-digit numbers
  • Matching literal separators
  • Validating parsed values against domain rules
  • Reporting errors at the right position
(* Parse a single octet: one or more digits, validated to 0-255 *)
let number () =
let digits = Parseff.many1 Parseff.digit () in
let n = List.fold_left (fun acc d -> (acc * 10) + d) 0 digits in
if n >= 0 && n <= 255 then n
else Parseff.fail (Printf.sprintf "number out of range: %d" n)
(* Four octets separated by dots, consuming all input *)
let ip_address () =
let a = number () in
let _ = Parseff.char '.' in
let b = number () in
let _ = Parseff.char '.' in
let c = number () in
let _ = Parseff.char '.' in
let d = number () in
Parseff.end_of_input ();
(a, b, c, d)

Parseff.digit parses a single digit and returns its integer value (0-9). many1 applies it one or more times, producing a list like [1; 9; 2]. The fold_left converts that to 192. Then we validate: if the number isn’t in range, Parseff.fail reports it. This is the parse-then-validate pattern.

The ip_address parser reads like the format itself: number, dot, number, dot, number, dot, number. end_of_input ensures there’s no trailing data — without it, "1.2.3.4 extra" would silently succeed.

match Parseff.parse "192.168.1.1" ip_address with
| Ok (a, b, c, d) ->
Printf.printf "Parsed: %d.%d.%d.%d\n" a b c d
| Error { pos; error = `Expected msg } ->
Printf.printf "Error at %d: %s\n" pos msg
| Error _ -> print_endline "Parse error"

On valid input, you get an Ok with the four octets as a tuple. On invalid input, the error tells you exactly where parsing failed.

Let’s look at what happens with bad input:

Out of range: "1.2.3.256" parses 1, ., 2, ., 3, ., then reads 256. The number parser validates and fails with "number out of range: 256" at position 6 (the start of 256).

Incomplete: "1.2.3" parses 1, ., 2, ., 3, then tries to match '.' and fails because the input is exhausted.

Trailing data: "1.2.3.4 extra" parses all four octets, then end_of_input fails because there are characters remaining.

The basic parser’s error messages are functional but terse. expect lets you replace them with something more human-readable:

let digit_val () =
Parseff.expect "a digit (0-9)" Parseff.digit
let number_0_255 () =
let digits = Parseff.many1 digit_val () in
let n = List.fold_left (fun acc d -> (acc * 10) + d) 0 digits in
if n >= 0 && n <= 255 then n
else Parseff.fail (Printf.sprintf "number %d is out of range (must be 0-255)" n)
let ip_address () =
let a = number_0_255 () in
let _ = Parseff.expect "a dot separator" (fun () -> Parseff.char '.') in
let b = number_0_255 () in
let _ = Parseff.expect "a dot separator" (fun () -> Parseff.char '.') in
let c = number_0_255 () in
let _ = Parseff.expect "a dot separator" (fun () -> Parseff.char '.') in
let d = number_0_255 () in
Parseff.end_of_input ();
(a, b, c, d)

Now on "192.168.1.", instead of expected digit, the error says expected a digit (0-9). On "192.168.1", instead of expected '.', it says expected a dot separator.

For programmatic error handling, polymorphic variants give you structured errors you can pattern match on:

let number_0_255_typed () =
let digits = Parseff.many1 digit_val () in
let n = List.fold_left (fun acc d -> (acc * 10) + d) 0 digits in
if n >= 0 && n <= 255 then n
else Parseff.error (`Out_of_range n)

Now callers can distinguish parsing failures from validation failures:

match Parseff.parse "192.168.1.300" ip_address with
| Ok (a, b, c, d) ->
Printf.printf "%d.%d.%d.%d\n" a b c d
| Error { error = `Out_of_range n; _ } ->
Printf.printf "Octet %d is not in 0-255\n" n
| Error { error = `Expected msg; _ } ->
Printf.printf "Parse error: %s\n" msg
| Error _ -> print_endline "Unknown error"

fail is fine for errors shown directly to users. error is better when code needs to react differently to different failure modes.