Today we’re announcing two new packages for parsing and emitting YAML 1.2: yaml12 for R and py-yaml12 for Python.
Both packages are implemented in Rust and built on the excellent saphyr crate. They share the same design goals: predictable YAML 1.2 typing, explicit control over tag interpretation via handlers, and clean round-tripping of unhandled tags.
Before we get into the details, a quick note on how this relates to the existing R yaml package. The R yaml package is now in r-lib, and we’ve taken over maintenance after years of stewardship by its original author, Jeremy Stephens, and later by Shawn Garbett.
If yaml already works for you, there’s no need to switch. yaml12 is an experiment providing consistent R and Python bindings to a new Rust library specifically for YAML 1.2, which, as we’ll see below, has some particular advantages.
Install
Install the R package from CRAN:
install.packages("yaml12")
Install the Python package from PyPI:
pip install py-yaml12
Quick start (R)
library(yaml12)
yaml <- "
title: A modern YAML parser and emitter written in Rust
properties: [fast, correct, safe, simple]
"
doc <- parse_yaml(yaml)
str(doc)
#> List of 2
#> $ title : chr "A modern YAML parser and emitter written in Rust"
#> $ properties: chr [1:4] "fast" "correct" "safe" "simple"
Round-trip back to YAML:
obj <- list(
seq = 1:2,
map = list(key = "value"),
tagged = structure("1 + 1", yaml_tag = "!expr")
)
write_yaml(obj)
#> ---
#> seq:
#> - 1
#> - 2
#> map:
#> key: value
#> tagged: !expr 1 + 1
#> ...
identical(obj, parse_yaml(format_yaml(obj)))
#> [1] TRUE
Quick start (Python)
# Install from PyPI:
# python -m pip install py-yaml12
from yaml12 import parse_yaml, format_yaml, Yaml
yaml_text = """
title: A modern YAML parser and emitter written in Rust
properties: [fast, correct, safe, simple]
"""
doc = parse_yaml(yaml_text)
assert doc == {
"title": "A modern YAML parser and emitter written in Rust",
"properties": ["fast", "correct", "safe", "simple"]
}
assert doc == parse_yaml(format_yaml(doc))
# Tagged values
tagged = parse_yaml("!expr 1 + 1")
assert tagged == Yaml(value="1 + 1", tag="!expr")
Why YAML 1.2?
YAML 1.2 tightened up a number of ambiguous implicit conversions. In particular, plain scalars like on/off/yes/no/y/n are strings in the 1.2 core schema, and YAML 1.2 removed sexagesimal (base-60) parsing, so values like 1:2 are not treated as numbers.
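To make the typing rules concrete, here is a minimal sketch of how untagged plain scalars resolve under the YAML 1.2 core schema. It is written against the regular expressions in the YAML 1.2.2 specification, not taken from yaml12 or saphyr, and the function name `resolve_plain_scalar` is purely illustrative.

```python
import math
import re

# Patterns adapted from the YAML 1.2.2 core schema tag resolution table.
_NULL = re.compile(r"null|Null|NULL|~|")
_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")
_INT = re.compile(r"[-+]?[0-9]+")
_OCT = re.compile(r"0o[0-7]+")
_HEX = re.compile(r"0x[0-9a-fA-F]+")
_FLOAT = re.compile(r"[-+]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][-+]?[0-9]+)?")
_INF = re.compile(r"[-+]?\.(?:inf|Inf|INF)")
_NAN = re.compile(r"\.(?:nan|NaN|NAN)")


def resolve_plain_scalar(text):
    """Resolve an untagged plain scalar as the 1.2 core schema describes."""
    if _NULL.fullmatch(text):
        return None
    if _BOOL.fullmatch(text):
        return text.lower() == "true"
    if _INT.fullmatch(text):
        return int(text)
    if _OCT.fullmatch(text):
        return int(text[2:], 8)
    if _HEX.fullmatch(text):
        return int(text[2:], 16)
    if _FLOAT.fullmatch(text):
        return float(text)
    if _INF.fullmatch(text):
        return -math.inf if text.startswith("-") else math.inf
    if _NAN.fullmatch(text):
        return math.nan
    # Everything else, including yes/no/on/off and 1:2, stays a string.
    return text


for text in ["yes", "on", "1:2", "true", "0o14", "12", "1e3"]:
    print(repr(text), "->", repr(resolve_plain_scalar(text)))
```

Only the six spellings of true and false become booleans, and nothing matches a base-60 pattern, so `yes` and `1:2` fall through to the final `return text`.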
YAML 1.2 also removed !!timestamp, !!binary, and !!omap from the set of core types, which further reduces implicit coercions (for example, getting a date/time object when you expected a string). If you want to interpret those values, you can do so explicitly via tags and handlers.
That makes YAML 1.2 a better default for configuration files, front matter, and data interchange. You get fewer surprises and fewer “why did this become a boolean?” (or “why did this become a date?”) moments.
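As an illustration of interpreting formerly core types explicitly, the sketch below defines two handler functions using only the Python standard library. The function names are hypothetical, the timestamp handler accepts only ISO 8601 strings (not every form the YAML 1.1 timestamp type allowed), and the dictionary keys assume that tags are reported with the `!!` shorthand; check the package docs for the exact tag strings.

```python
import base64
from datetime import date, datetime


def parse_timestamp(text):
    """Turn an ISO 8601 date or date-time string into a Python object."""
    if len(text) == 10:  # plain YYYY-MM-DD
        return date.fromisoformat(text)
    return datetime.fromisoformat(text)


def parse_binary(text):
    """Decode base64 text, dropping any whitespace or line breaks first."""
    return base64.b64decode("".join(text.split()))


# Assumption: these keys must match the tag strings the parser reports.
handlers = {"!!timestamp": parse_timestamp, "!!binary": parse_binary}
```

A dictionary like this could then be passed as the `handlers` argument of `parse_yaml()`, so that only values explicitly tagged in the YAML are converted.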
Highlights
A consistent API in R and Python
The two packages intentionally share the same high-level functions:
- parse_yaml(): Parse YAML from a string
- read_yaml(): Read YAML from a file
- format_yaml(): Format values as YAML (to a string)
- write_yaml(): Write YAML to a file (or stdout)
Tags and handlers (opt-in meaning, safe defaults)
In YAML, tags are explicit annotations like !expr or !!timestamp that attach type and meaning to a value.
Tags are preserved by default:
- In R, tags are kept in a yaml_tag attribute.
- In Python, tags are kept by wrapping values in a Yaml() object.
Handlers let you opt into custom behavior for tags (including tags on mapping keys) while keeping parsing as a data-only operation by default.
If you used R yaml's !expr tag to evaluate expressions, you can recreate that behavior by registering a handler. Do this only when parsing trusted YAML, since evaluating arbitrary code is a security risk. For untrusted input, keep the default behavior, which treats !expr as data and does not execute code.
R example:
# by default, tags are kept as data
dput(parse_yaml("!expr 1 + 1"))
#> structure("1 + 1", yaml_tag = "!expr")
# Add a handler to process tagged nodes (like the {yaml} package does)
handlers <- list("!expr" = \(x) eval(str2expression(x), globalenv()))
parse_yaml("!expr 1 + 1", handlers = handlers)
#> [1] 2
Python example:
from yaml12 import parse_yaml
handlers = {"!expr": eval} # use with trusted input only
parse_yaml("!expr 1 + 1", handlers=handlers)
#> 2
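If you want `!expr`-style arithmetic without the risks of `eval`, one option is a handler that walks the expression's syntax tree and allows only numeric literals and a few operators. The following is a minimal sketch using Python's standard `ast` module. The name `eval_arithmetic` is hypothetical and not part of py-yaml12.

```python
import ast
import operator

# Only these operators are allowed; anything else raises ValueError.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def eval_arithmetic(text):
    """Evaluate a numeric expression without running arbitrary code."""

    def walk(node):
        # Numeric literals only (bool is excluded on purpose).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {text!r}")

    return walk(ast.parse(text.strip(), mode="eval").body)


handlers = {"!expr": eval_arithmetic}
```

Passing this `handlers` dictionary to `parse_yaml()` in place of the `eval` version should still turn `!expr 1 + 1` into 2, while function calls, attribute access, and names are rejected with a `ValueError`.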
Simplification and missing values (R)
In R, parse_yaml() can simplify homogeneous sequences to vectors. When it does, YAML null becomes the appropriate NA type:
parse_yaml("[1, 2, 3, null]")
#> [1] 1 2 3 NA
str(parse_yaml("[1, 2, 3, null]", simplify = FALSE))
#> List of 4
#> $ : int 1
#> $ : int 2
#> $ : int 3
#> $ : NULL
Non-string mapping keys
YAML allows mapping keys that aren’t plain strings (numbers, booleans, tagged scalars, even sequences and mappings). Both packages preserve these safely:
- In R, you’ll get a regular named list plus a yaml_keys attribute when needed.
- In Python, unhashable keys (like lists/dicts) are wrapped in Yaml so they can still be used as dict keys and round-trip correctly.
R example:
dput(parse_yaml("{a: b}: c"))
#> structure(list("c"), names = "", yaml_keys = list(list(a = "b")))
Python example:
from yaml12 import parse_yaml, Yaml
doc = parse_yaml("{a: b}: c")
assert doc == {Yaml({'a': 'b'}): 'c'}
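To see why a wrapper is needed at all: Python lists and dicts are unhashable, so they cannot be dictionary keys directly. The sketch below shows one way a wrapper can make them hashable by hashing a frozen copy. `FrozenKey` is a hypothetical class written for illustration and says nothing about how `Yaml` is actually implemented.

```python
def _freeze(value):
    """Recursively convert lists and dicts into hashable equivalents."""
    if isinstance(value, dict):
        return ("map", frozenset((_freeze(k), _freeze(v)) for k, v in value.items()))
    if isinstance(value, list):
        return ("seq", tuple(_freeze(v) for v in value))
    return value


class FrozenKey:
    """Hypothetical wrapper that lets a list or dict act as a dict key."""

    def __init__(self, value):
        self.value = value
        # Snapshot taken at construction; later mutation of value is not tracked.
        self._frozen = _freeze(value)

    def __eq__(self, other):
        return isinstance(other, FrozenKey) and self._frozen == other._frozen

    def __hash__(self):
        return hash(self._frozen)

    def __repr__(self):
        return f"FrozenKey({self.value!r})"


doc = {FrozenKey({"a": "b"}): "c"}
print(doc[FrozenKey({"a": "b"})])
```

Two wrappers compare equal when their contents do, which is what allows a lookup with a freshly constructed key to succeed.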
Mapping order is preserved
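The YAML specification does not assign meaning to key order in a mapping, but order usually matters to the people reading and editing a file. yaml12 preserves mapping/dictionary order when parsing and formatting, so the order you see in a YAML file (or emit) round-trips in both R and Python.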
Document streams and front matter
Both packages support multi-document YAML streams with multi = TRUE (multi=True in Python). When multi = FALSE (the default), parsing stops after the first document, which is handy for extracting YAML front matter from text that continues with non-YAML content.
R example:
yaml <- "
---
title: Extracting YAML front matter
---
This is technically now the second document in a YAML stream
"
str(parse_yaml(yaml))
#> List of 1
#> $ title: chr "Extracting YAML front matter"
str(parse_yaml(yaml, multi = TRUE))
#> List of 2
#> $ :List of 1
#> ..$ title: chr "Extracting YAML front matter"
#> $ : chr "This is technically now the second document in a YAML stream"
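With the default settings, the parser returns only the front matter. If you also need the body text that follows it, a small helper can separate the two before parsing. The function below is a hypothetical standard-library sketch, not part of either package, and it assumes the front matter is delimited by lines containing only `---` (or `...` as the closing marker).

```python
def split_front_matter(text):
    """Return (yaml_text, body); yaml_text is None if there is no front matter."""
    lines = text.lstrip("\n").splitlines(keepends=True)
    if not lines or lines[0].rstrip() != "---":
        return None, text
    for i, line in enumerate(lines[1:], start=1):
        if line.rstrip() in ("---", "..."):
            # Lines between the markers are YAML; the rest is the body.
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    return None, text  # opening marker without a closing one


yaml_text, body = split_front_matter(
    "---\ntitle: Extracting YAML front matter\n---\nBody text\n"
)
print(repr(yaml_text))
print(repr(body))
```

The returned `yaml_text` can then be handed to `parse_yaml()`, while `body` goes to whatever processes the rest of the document.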
Performance and safety notes
yaml12 is implemented in Rust and written with performance and safety in mind. It avoids unnecessary allocations, copies, and extra traversals where possible. In Python, py-yaml12 (imported as yaml12) also releases the GIL for large parses and serializations.
In typical usage, the R package yaml12 is ~2× faster than the yaml package, and the Python package py-yaml12 is ≥50× faster than default PyYAML in the benchmarks (R benchmarks; Python benchmarks).
Tags are preserved by default, and interpreting them (including any kind of evaluation) is always an explicit opt-in via handlers. Plain scalars follow the YAML 1.2 core schema rules for predictable typing.
In Python, py-yaml12 ships prebuilt wheels for common platforms. If you do need to build from source, you’ll need a Rust toolchain. In R, yaml12 is available from CRAN (including binaries on common platforms).
Wrapping up
If you work with YAML as a data format for configuration, front matter, or data interchange, we hope yaml12 (R) and py-yaml12 (Python) help you parse and emit YAML 1.2 predictably. If you run into YAML that doesn’t behave as expected, we’d love to hear about it in the issue trackers: r-yaml12 and py-yaml12.
Learn more
- R package docs: https://posit-dev.github.io/r-yaml12/
- R package on CRAN: https://cran.r-project.org/package=yaml12
- Python package docs: https://posit-dev.github.io/py-yaml12/
- Python package on PyPI: https://pypi.org/project/py-yaml12/
Acknowledgements
Both packages build on the fantastic work in the YAML ecosystem, especially the saphyr Rust crate and the yaml-test-suite.