processx 3.9.0

  package, processes, processx, system

  Gábor Csárdi

We’re happy to announce the release of processx 3.9.0. processx is an R package to run and manage system processes.

You can install it from CRAN with:

install.packages("processx")

This blog post discusses the major new features in processx 3.9.0. You can see a full list of changes in the release notes.

Pipelines

The new pipeline class lets you connect two or more processes with kernel-level pipes, exactly like a Unix shell pipeline (cmd1 | cmd2 | cmd3): data flows directly between child processes without passing through R.

pl <- pipeline$new(
  list(c("sort"), c("uniq", "-c"), c("sort", "-rn")),
  stdin = "|",
  stdout = "|"
)
pl$write_input("banana\napple\nbanana\norange\napple\nbanana\n")
pl$close_input()
#> NULL
pl$read_all_output_lines()
#> [1] "   3 banana" "   2 apple"  "   1 orange"
pl$wait()
pl$get_exit_statuses()
#> [[1]]
#> [1] 0
#> 
#> [[2]]
#> [1] 0
#> 
#> [[3]]
#> [1] 0

The pipeline$new() constructor takes a list of character vectors, one per command, along with the usual stdin, stdout, and stderr arguments. The first two apply to the ends of the pipeline: stdin connects to the first process and stdout reads from the last. The stderr setting applies to every process in the pipeline.

The key benefit over calling run() in sequence is efficiency: intermediate data never materialises in R. A pipeline processing gigabytes of log lines uses the same small kernel buffers as a shell pipeline would.
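For contrast, here is a rough sketch of the same three steps run one at a time with run(). It assumes a Unix-like system with sort and uniq on the PATH, and it assumes that run() forwards stdin to process$new(). The run_step() helper and its temporary files are hypothetical, made up for this illustration. Every intermediate result is pulled into R and written back out, which is exactly the copying a pipeline avoids.

```r
library(processx)

# Hypothetical helper (not part of processx): run one command with `input`
# supplied on standard input via a temporary file, and return its stdout.
run_step <- function(command, args = character(), input) {
  infile <- tempfile()
  on.exit(unlink(infile), add = TRUE)
  cat(input, file = infile)
  run(command, args, stdin = infile)$stdout
}

input <- "banana\napple\nbanana\norange\napple\nbanana\n"

# Each intermediate result is a character string held in R's memory.
sorted  <- run_step("sort", input = input)
counted <- run_step("uniq", "-c", input = sorted)
ranked  <- run_step("sort", "-rn", input = counted)

cat(ranked)
```

With a few lines of input the difference is negligible; it only starts to matter once the intermediate results are large.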

Because each step in the pipeline is a regular process object under the hood, you can access individual processes via $get_processes() — useful for reading per-process stderr or checking exit codes when a stage fails.
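As a minimal sketch of that idea, the snippet below builds a pipeline whose last stage deliberately exits with status 3, then finds the failing stage by asking each process for its exit status. It assumes a Unix-like system with sh available, and that $get_processes() returns a list of process objects in pipeline order.

```r
library(processx)

# Two-stage pipeline; the second stage discards its input and exits with 3.
pl <- pipeline$new(
  list(c("sort"), c("sh", "-c", "cat > /dev/null; exit 3")),
  stdin = "|",
  stdout = "|"
)
pl$write_input("b\na\n")
pl$close_input()
pl$wait()

# Assumption: each element is a regular process object, in pipeline order.
statuses <- vapply(
  pl$get_processes(),
  function(p) p$get_exit_status(),
  numeric(1)
)

# Positions of the stages that did not exit cleanly.
failed_stages <- which(statuses != 0)
failed_stages
```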

pipeline works on Unix and Windows and is currently experimental: the API may still change.

Pseudo-terminal support

processx::run(pty = TRUE)

Many command-line tools behave differently when their output is not connected to a terminal: they disable colour, turn off progress bars, or buffer output more aggressively. The pty = TRUE option runs a process inside a pseudo-terminal so it sees a real terminal — colour and interactive behaviour included.

run() now supports pty = TRUE directly:

out <- run("ls", c("--color", path.expand("~/works/processx")), pty = TRUE)
cat(out$stdout)
#> DESCRIPTION    NAMESPACE      README.md      inst           tests
#> LICENSE        NEWS.md        _pkgdown.yml   man            tools
#> LICENSE.md     R              air.toml       processx.Rproj vignettes
#> Makefile       README.Rmd     codecov.yml    src

When pty = TRUE, stderr is merged into stdout (the result’s $stderr is always NULL), because a PTY has a single stream. You can also supply a file path as stdin; its contents are fed to the process via the PTY master, followed by an EOF signal.
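A small sketch of the merged streams, assuming a Unix-like system with sh: the command writes one line to each stream, and both should end up in $stdout. Line endings in PTY output may differ from pipe output, so the check below only looks for the text itself.

```r
library(processx)

# The shell writes one line to stdout and one line to stderr.
out <- run(
  "sh",
  c("-c", "echo to-stdout; echo to-stderr >&2"),
  pty = TRUE
)

# With a PTY there is no separate stderr stream to capture.
is.null(out$stderr)

# The line written to stderr shows up in the merged output instead.
grepl("to-stderr", out$stdout, fixed = TRUE)
```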

Windows support

processx 3.9.0 adds support for pseudo-terminals (PTYs) on Windows, starting from Windows 10 version 1809. The Windows implementation uses the ConPTY API (CreatePseudoConsole), loaded dynamically so processx continues to load on older Windows and emits a clear error if pty = TRUE is requested on an unsupported version.
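If your code has to run on machines where a PTY may not be available, one option is to try pty = TRUE first and fall back to ordinary pipes when it errors. The run_prefer_pty() helper below is hypothetical and deliberately crude: it catches any error rather than a specific condition class, because this post does not say which class processx uses. The echo call is just a placeholder for any command on your PATH.

```r
library(processx)

# Hypothetical helper (not part of processx): prefer a PTY, but fall back
# to plain pipes if starting the process with a PTY fails for any reason.
run_prefer_pty <- function(command, args = character()) {
  tryCatch(
    run(command, args, pty = TRUE, error_on_status = FALSE),
    error = function(e) {
      message("PTY unavailable, using pipes: ", conditionMessage(e))
      run(command, args, error_on_status = FALSE)
    }
  )
}

res <- run_prefer_pty("echo", "hello")
res$status
```

Because the fallback triggers on every error, a missing executable will simply fail a second time in the pipe-based call.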

Other improvements

New process cleanup article

A new article, Process cleanup, documents all five mechanisms processx provides for ensuring subprocesses don’t outlive their intended scope:

  1. Explicit cleanup with on.exit() — always deterministic.
  2. Automatic cleanup on garbage collection (cleanup = TRUE, the default).
  3. Process-tree cleanup (cleanup_tree = TRUE).
  4. Linux parent-death signal (linux_pdeathsig) — Linux only, handles R crashes.
  5. Supervisor process (supervise = TRUE) — all platforms, handles R crashes.
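As a sketch of the first mechanism, the hypothetical helper below starts a background process and registers p$kill() with on.exit(), so the process is terminated whether the function returns normally or fails. It assumes a Unix-like system with sleep available; the helper returns the process object only so that the caller can inspect it afterwards.

```r
library(processx)

# Hypothetical helper (not part of processx).
with_background_sleep <- function() {
  # cleanup = TRUE is the default; it is spelled out here as the
  # garbage-collection safety net behind the explicit cleanup below.
  p <- process$new("sleep", "100", cleanup = TRUE)

  # Deterministic cleanup: runs on normal return and on error.
  on.exit(p$kill(), add = TRUE)

  stopifnot(p$is_alive())
  # ... work that needs the background process would go here ...

  p
}

p <- with_background_sleep()
p$is_alive()
```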

Death signal support on Linux

On Linux, you can now tell the kernel to deliver a signal to the child process automatically if the parent R process exits — even if R crashes. Set linux_pdeathsig = TRUE to send SIGTERM, or pass an integer signal number directly:

p <- process$new("sleep", "100", linux_pdeathsig = TRUE)

This is useful when you want child processes to clean up after an R crash, without the overhead of running a supervisor. The argument is silently ignored on macOS and Windows.
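To request a signal other than SIGTERM, you can pass its number. The sketch below uses tools::SIGKILL, base R's platform-specific constant for SIGKILL, instead of hard-coding the integer; choosing SIGKILL here is only an illustration.

```r
library(processx)

# Ask the kernel to send SIGKILL to the child if the parent R process dies.
# tools::SIGKILL is the signal number for SIGKILL on the current platform.
p <- process$new("sleep", "100", linux_pdeathsig = tools::SIGKILL)

p$is_alive()

# Regular cleanup still works as usual.
p$kill()
```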

Record the time when a process exits

process$get_end_time() returns the time when the process exited as a POSIXct, or NULL if it is still running. This makes it straightforward to measure wall-clock duration without having to record timestamps yourself:

p <- process$new("sleep", "1")
p$wait()
p$get_end_time() - p$get_start_time()
#> Time difference of 1.010295 secs
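Since get_end_time() returns NULL for a running process, a helper that works in both states needs a fallback. The elapsed() function below is a hypothetical sketch that substitutes the current time while the process is still running.

```r
library(processx)

# Hypothetical helper (not part of processx): wall-clock time used so far,
# or the total run time once the process has exited.
elapsed <- function(p) {
  end <- p$get_end_time()
  if (is.null(end)) {
    # Still running: measure up to now.
    end <- Sys.time()
  }
  difftime(end, p$get_start_time(), units = "secs")
}

p <- process$new("sleep", "1")
elapsed(p)   # keeps growing while the process runs
p$wait()
elapsed(p)   # stays the same once the process has exited
```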

Append stdout/stderr to files

process$new() and run() now support ">>" as a prefix for stdout and stderr file paths to append output instead of truncating the file:

log <- tempfile()
run("echo", args = "first line", stdout = log)
#> $status
#> [1] 0
#> 
#> $stdout
#> NULL
#> 
#> $stderr
#> [1] ""
#> 
#> $timeout
#> [1] FALSE
run("echo", args = "second line", stdout = paste0(">>", log))
#> $status
#> [1] 0
#> 
#> $stdout
#> NULL
#> 
#> $stderr
#> [1] ""
#> 
#> $timeout
#> [1] FALSE
readLines(log)
#> [1] "first line"  "second line"

This is handy when you run the same process repeatedly and want to accumulate output in a single log file.
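A sketch of that repeated-run pattern, assuming a Unix-like system with echo available. The log file is created up front so the example does not depend on whether ">>" creates a missing file.

```r
library(processx)

log <- tempfile(fileext = ".log")
file.create(log)   # start from an empty file

# Every run appends its output to the same log file.
for (i in 1:3) {
  run("echo", paste("run", i), stdout = paste0(">>", log))
}

readLines(log)
```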

Binary standard output and error

run() and process$new() now support encoding = "binary" to capture raw bytes. In binary mode, run() returns stdout and stderr as raw vectors, and process$read_output() / process$read_error() return raw vectors rather than character strings. All bytes are preserved exactly, including null bytes and non-UTF-8 sequences.

result <- run("cat", args = "/bin/ls", encoding = "binary")
typeof(result$stdout)
#> [1] "raw"
length(result$stdout)
#> [1] 154624

Two new methods, process$read_output_bytes() and process$read_error_bytes(), and the conn_read_bytes() function, provide direct access to raw bytes from processx connections.
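Raw vectors from binary mode combine naturally with base R's binary I/O. The sketch below assumes a Unix-like system with head and /dev/urandom: it captures 16 random bytes and writes them to a file unchanged with writeBin().

```r
library(processx)

# Capture 16 arbitrary bytes; in binary mode stdout is a raw vector.
bytes <- run(
  "head",
  c("-c", "16", "/dev/urandom"),
  encoding = "binary"
)$stdout

# Write the bytes to disk exactly as captured.
out_file <- tempfile()
writeBin(bytes, out_file)

is.raw(bytes)
file.size(out_file) == length(bytes)
```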

Acknowledgements

Thanks to everyone who contributed to processx 3.9.0 through code, issues, testing, and feedback:

@advieser, @cderv, @chwpearse, @HenrikBengtsson, @king-of-poppk, @r2evans, @sckott, @sda030, @stupidpupil, and @Yunuuuu.