yaml12: YAML 1.2 for R and Python

  python, r, rust, yaml

  Tomasz Kalinowski

Today we’re announcing two new packages for parsing and emitting YAML 1.2: yaml12 for R and py-yaml12 for Python.

Both packages are implemented in Rust and built on the excellent saphyr crate. They share the same design goals: predictable YAML 1.2 typing, explicit control over tag interpretation via handlers, and clean round-tripping of unhandled tags.

Before we get into the details, a quick note on how this relates to the existing R yaml package. The R yaml package is now in r-lib, and we’ve taken over maintenance after years of stewardship by its original author, Jeremy Stephens, and later by Shawn Garbett.

If yaml already works for you, there’s no need to switch. yaml12 is an experiment providing consistent R and Python bindings to a new Rust library specifically for YAML 1.2, which, as we’ll see below, has some particular advantages.

Install

Install the R package from CRAN:

install.packages("yaml12")

Install the Python package from PyPI:

pip install py-yaml12

Quick start (R)

library(yaml12)

yaml <- "
title: A modern YAML parser and emitter written in Rust
properties: [fast, correct, safe, simple]
"

doc <- parse_yaml(yaml)
str(doc)
#> List of 2
#>  $ title     : chr "A modern YAML parser and emitter written in Rust"
#>  $ properties: chr [1:4] "fast" "correct" "safe" "simple"

Round-trip back to YAML:

obj <- list(
  seq = 1:2,
  map = list(key = "value"),
  tagged = structure("1 + 1", yaml_tag = "!expr")
)
write_yaml(obj)
#> ---
#> seq:
#>   - 1
#>   - 2
#> map:
#>   key: value
#> tagged: !expr 1 + 1
#> ...

identical(obj, parse_yaml(format_yaml(obj)))
#> [1] TRUE

Quick start (Python)

# Install from PyPI:
#   python -m pip install py-yaml12
from yaml12 import parse_yaml, format_yaml, Yaml

yaml_text = """
title: A modern YAML parser and emitter written in Rust
properties: [fast, correct, safe, simple]
"""

doc = parse_yaml(yaml_text)

assert doc == {
  "title": "A modern YAML parser and emitter written in Rust",
  "properties": ["fast", "correct", "safe", "simple"]
}

assert doc == parse_yaml(format_yaml(doc))

# Tagged values
tagged = parse_yaml("!expr 1 + 1")
assert tagged == Yaml(value="1 + 1", tag="!expr")

Why YAML 1.2?

YAML 1.2 tightened up a number of ambiguous implicit conversions. In particular, plain scalars like on/off/yes/no/y/n are strings in the 1.2 core schema, and YAML 1.2 removed sexagesimal (base-60) parsing, so values like 1:2 are not treated as numbers.

YAML 1.2 also removed !!timestamp, !!binary, and !!omap from the set of core types, which further reduces implicit coercions (for example, getting a date/time object when you expected a string). If you want to interpret those values, you can do so explicitly via tags and handlers.

That makes YAML 1.2 a better default for configuration files, front matter, and data interchange. You get fewer surprises and fewer “why did this become a boolean?” moments (or “why did this become a date?” moments).
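To make these typing rules concrete, here is a minimal Python sketch of how an untagged plain scalar resolves under the YAML 1.2 core schema, using the regular expressions from the specification. It illustrates the schema, not yaml12's Rust implementation, and `resolve_plain_scalar` is a hypothetical name. Quoted scalars never go through this step; they are always strings.

```python
import math
import re

# YAML 1.2 core schema patterns for plain (unquoted, untagged) scalars.
# Only these forms become non-strings; everything else stays a string.
NULL_RE = re.compile(r"(?:null|Null|NULL|~)?")  # the empty scalar is null too
BOOL_RE = re.compile(r"true|True|TRUE|false|False|FALSE")
INT10_RE = re.compile(r"[-+]?[0-9]+")
INT8_RE = re.compile(r"0o[0-7]+")
INT16_RE = re.compile(r"0x[0-9a-fA-F]+")
FLOAT_RE = re.compile(r"[-+]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][-+]?[0-9]+)?")
INF_RE = re.compile(r"[-+]?(?:\.inf|\.Inf|\.INF)")
NAN_RE = re.compile(r"\.nan|\.NaN|\.NAN")


def resolve_plain_scalar(text: str):
    """Resolve a plain scalar's text using core-schema rules (a sketch)."""
    if NULL_RE.fullmatch(text):
        return None
    if BOOL_RE.fullmatch(text):
        return text.lower() == "true"
    if INT10_RE.fullmatch(text):
        return int(text, 10)
    if INT8_RE.fullmatch(text):
        return int(text[2:], 8)
    if INT16_RE.fullmatch(text):
        return int(text[2:], 16)
    if FLOAT_RE.fullmatch(text):
        return float(text)
    if INF_RE.fullmatch(text):
        return -math.inf if text.startswith("-") else math.inf
    if NAN_RE.fullmatch(text):
        return math.nan
    return text  # no rule matched: the scalar is a string
```

Anything that matches no rule, including yes, off, 1:2, and 2024-01-01, falls through to a plain string.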

Highlights

A consistent API in R and Python

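The two packages intentionally share the same high-level functions:

  • parse_yaml() parses YAML from a string.
  • read_yaml() reads YAML from a file.
  • format_yaml() formats a value as a YAML string.
  • write_yaml() writes a value as YAML to a file (or, as in the R example above, to the console).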

Tags and handlers (opt-in, meaning, safe defaults)

In YAML, tags are explicit annotations like !expr or !!timestamp that attach type and meaning to a value.

Tags are preserved by default:

  • In R, tags are kept in a yaml_tag attribute.
  • In Python, tags are kept by wrapping values in a Yaml() object.

Handlers let you opt into custom behavior for tags (including tags on mapping keys) while keeping parsing as a data-only operation by default.

If you used the R yaml package's !expr tag to evaluate expressions, you can recreate that behavior by registering a handler. Only do this when parsing trusted YAML, since evaluating arbitrary code is a security risk. For untrusted input, the default behavior is safer: !expr stays as data and no code is executed.

R example:

# by default, tags are kept as data
dput(parse_yaml("!expr 1 + 1"))
#> structure("1 + 1", yaml_tag = "!expr")

# Add a handler to process tagged nodes (like the {yaml} package does)
handlers <- list("!expr" = \(x) eval(str2expression(x), globalenv()))
parse_yaml("!expr 1 + 1", handlers = handlers)
#> [1] 2

Python example:

from yaml12 import parse_yaml

handlers = {"!expr": eval}  # use with trusted input only
parse_yaml("!expr 1 + 1", handlers=handlers)
#> 2
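Conceptually, handlers are a lookup from tag to function, applied only where a tag has an entry. The Python sketch below models that with a hypothetical `Tagged` class and `apply_handlers` function walking an already-parsed tree; neither is part of yaml12, which takes handlers directly in parse_yaml(). Tags without a handler are kept as data.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tagged:
    """Hypothetical stand-in for a tagged node: a value plus its tag."""

    value: Any
    tag: str


def apply_handlers(node: Any, handlers: Dict[str, Callable[[Any], Any]]) -> Any:
    """Rebuild a parsed tree, calling a handler only for tags that have one."""
    if isinstance(node, Tagged):
        inner = apply_handlers(node.value, handlers)  # children first
        handler = handlers.get(node.tag)
        if handler is None:
            return Tagged(inner, node.tag)  # unhandled tag: keep it as data
        return handler(inner)
    if isinstance(node, list):
        return [apply_handlers(item, handlers) for item in node]
    if isinstance(node, dict):
        return {
            apply_handlers(k, handlers): apply_handlers(v, handlers)
            for k, v in node.items()
        }
    return node


doc = {"sum": Tagged("1 + 1", "!expr"), "note": Tagged("hi", "!unknown")}

# Default: no handlers, so every tag survives as data.
assert apply_handlers(doc, {}) == doc

# Opt in to evaluating !expr (trusted input only: eval runs arbitrary code).
evaluated = apply_handlers(doc, {"!expr": eval})
assert evaluated == {"sum": 2, "note": Tagged("hi", "!unknown")}
```

Applying handlers to children before their parent is an assumption of this sketch; it is one reasonable traversal order, not necessarily the one yaml12 uses.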

Simplification and missing values (R)

In R, parse_yaml() can simplify homogeneous sequences to vectors. When it does, YAML null becomes the appropriate NA type:

parse_yaml("[1, 2, 3, null]")
#> [1]  1  2  3 NA

str(parse_yaml("[1, 2, 3, null]", simplify = FALSE))
#> List of 4
#>  $ : int 1
#>  $ : int 2
#>  $ : int 3
#>  $ : NULL
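The idea behind simplification can be sketched in Python, assuming that "homogeneous" means every non-null element has the same scalar type. The hypothetical `simplify_sequence` below reports that shared type, treating None the way R treats NA; yaml12's actual rules (for example, how it combines integers and doubles) may differ.

```python
from typing import Any, List, Optional


def simplify_sequence(items: List[Any]) -> Optional[str]:
    """Return the shared scalar type name if a sequence can be simplified.

    None elements are treated as missing values (like R's NA) and ignored
    when deciding the type; a None result means "keep a generic list".
    """
    present = [x for x in items if x is not None]
    if not present:
        return None  # empty or all-null: nothing to infer (an assumption)
    kinds = {type(x) for x in present}
    if len(kinds) != 1:
        return None  # mixed types, e.g. [1, "a"] or [1, True]
    (kind,) = kinds
    if kind not in (bool, int, float, str):
        return None  # nested lists/dicts never collapse to a vector
    return kind.__name__


assert simplify_sequence([1, 2, 3, None]) == "int"  # like [1, 2, 3, null]
assert simplify_sequence([1, "a"]) is None
```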

Non-string mapping keys

YAML allows mapping keys that aren’t plain strings (numbers, booleans, tagged scalars, even sequences and mappings). Both packages preserve these safely:

  • In R, you’ll get a regular named list plus a yaml_keys attribute when needed.
  • In Python, unhashable keys (like lists/dicts) are wrapped in Yaml so they can still be used as dict keys and round-trip correctly.

R example:

dput(parse_yaml("{a: b}: c"))
#> structure(list("c"), names = "", yaml_keys = list(list(a = "b")))

Python example:

from yaml12 import parse_yaml, Yaml

doc = parse_yaml("{a: b}: c")
assert doc == {Yaml({'a': 'b'}): 'c'}
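Python dicts require hashable keys, so a list or dict cannot be used as a key directly. A minimal sketch of the wrapping approach, using a hypothetical `FrozenKey` class rather than yaml12's Yaml type (whose internals may differ), hashes a frozen, order-sensitive copy of the wrapped value:

```python
from typing import Any


def freeze(value: Any) -> Any:
    """Recursively convert lists and dicts into hashable tuples.

    The "list"/"dict" labels keep [x, y] and {x: y} from colliding, and
    dict items keep their order, so key order matters for equality.
    """
    if isinstance(value, list):
        return ("list", tuple(freeze(v) for v in value))
    if isinstance(value, dict):
        return ("dict", tuple((freeze(k), freeze(v)) for k, v in value.items()))
    return value


class FrozenKey:
    """Hypothetical wrapper that lets a list or dict serve as a dict key."""

    __slots__ = ("value",)

    def __init__(self, value: Any) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, FrozenKey):
            return NotImplemented
        return freeze(self.value) == freeze(other.value)

    def __hash__(self) -> int:
        return hash(freeze(self.value))

    def __repr__(self) -> str:
        return f"FrozenKey({self.value!r})"


doc = {FrozenKey({"a": "b"}): "c"}  # like parsing "{a: b}: c"
assert doc[FrozenKey({"a": "b"})] == "c"
```

Because the hash is computed from the wrapped value, that value must not be mutated while it is used as a key.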

Mapping order is preserved

The YAML specification does not assign meaning to the order of mapping keys, but yaml12 preserves mapping/dictionary order when parsing and formatting, so the order you see in a YAML file (or emit) round-trips in both R and Python.

Document streams and front matter

Both packages support multi-document YAML streams via the multi argument (multi = TRUE in R, multi=True in Python). With the default, multi = FALSE, parsing stops after the first document, which is handy for extracting YAML front matter from text that continues with non-YAML content.

Example:

yaml <- "
---
title: Extracting YAML front matter
---
This is technically now the second document in a YAML stream
"
str(parse_yaml(yaml))
#> List of 1
#>  $ title: chr "Extracting YAML front matter"
str(parse_yaml(yaml, multi = TRUE))
#> List of 2
#>  $ :List of 1
#>   ..$ title: chr "Extracting YAML front matter"
#>  $ : chr "This is technically now the second document in a YAML stream"
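Documents in a stream are separated by --- marker lines, which is why the trailing prose above becomes a second document. The naive Python sketch below splits raw text on lines consisting only of --- and keeps the first chunk, roughly mirroring multi = FALSE for front matter. `split_documents` and `front_matter` are hypothetical helpers; unlike a real parser, they ignore ... end markers, directives, and content written on the same line as ---, and they do not validate any YAML.

```python
from typing import List


def split_documents(stream: str) -> List[str]:
    """Naively split a YAML stream on lines that are exactly '---'."""
    chunks: List[List[str]] = [[]]
    for line in stream.splitlines():
        if line.rstrip() == "---":
            chunks.append([])  # a new document starts here
        else:
            chunks[-1].append(line)
    docs = ["\n".join(lines).strip("\n") for lines in chunks]
    # Text before the first '---' is often empty; drop it in that case.
    if len(docs) > 1 and not docs[0].strip():
        docs = docs[1:]
    return docs


def front_matter(text: str) -> str:
    """Return only the first document, similar in spirit to multi = FALSE."""
    return split_documents(text)[0]


text = """
---
title: Extracting YAML front matter
---
This is technically now the second document in a YAML stream
"""
assert front_matter(text) == "title: Extracting YAML front matter"
assert len(split_documents(text)) == 2
```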

Performance and safety notes

yaml12 is implemented in Rust and written with performance and safety in mind. It avoids unnecessary allocations, copies, and extra traversals where possible. In Python, py-yaml12 (imported as yaml12) also releases the GIL for large parses and serializations.

In typical usage, the R package yaml12 is ~2× faster than the yaml package, and the Python package py-yaml12 is ≥50× faster than default PyYAML in the benchmarks (see the R benchmarks and the Python benchmarks).

Tags are preserved by default, and interpreting them (including any kind of evaluation) is always an explicit opt-in via handlers. Plain scalars follow the YAML 1.2 core schema rules for predictable typing.

In Python, py-yaml12 ships prebuilt wheels for common platforms. If you do need to build from source, you’ll need a Rust toolchain. In R, yaml12 is available from CRAN (including binaries on common platforms).

Wrapping up

If you work with YAML as a data format for configuration, front matter, or data interchange, we hope yaml12 (R) and py-yaml12 (Python) help you parse and emit YAML 1.2 predictably. If you run into YAML that doesn’t behave as expected, we’d love to hear about it in the issue trackers: r-yaml12 and py-yaml12.

Acknowledgements

Both packages build on the fantastic work in the YAML ecosystem, especially the saphyr Rust crate and the yaml-test-suite.