processx 3.2.0

  r-lib, processx

  Gábor Csárdi

Introduction

We’re delighted to announce that processx in now on CRAN. processx makes it easy to run external processes from R. It’s an extended version of system() and system2(), that gives you greater control, and more visibility into the running process.

It’s hard to make processx examples work across platforms because system utilities vary from OS to OS. To work around this problem, processx bundles a small program, px, which can perform some basic tasks, like printing to the standard output and error, and waiting for a given amount of time.

px <- processx:::get_tool("px")
px
#> [1] "/Users/gaborcsardi/r_pkgs/processx/bin//px"

processx deals with two kinds of external processes: foreground and background. Foreground processes are synchronous, R waits until they finish, and collects the output and the exit code of the process. Background processes are asynchronous, processx does not wait for them to finish, they run concurrently and can communicate with the R process.

Foreground processes

processx::run() runs a foreground external process. It is somewhat similar to the system2() base R function. Its basic usage is:

processx::run(command, args)

command is a string (length 1 character vector), and args should be a character vector of arguments. command can be an absolute file name, a relative file name, or a command name. For the latter, the current PATH is used to find the command. For example these both work on Unix systems:

run("/bin/ls")
run("ls")

Here is the output of px --help:

pxhelp <- run(px, "--help")
cat(pxhelp$stderr)
#> Usage: px [command arg] [command arg] ...
#> 
#> Commands:
#>   sleep  <seconds>           -- sleep for a number os seconds
#>   out    <string>            -- print string to stdout
#>   err    <string>            -- print string to stderr
#>   outln  <string>            -- print string to stdout, add newline
#>   errln  <string>            -- print string to stderr, add newline
#>   cat    <filename>          -- print file to stdout
#>   return <exitcode>          -- return with exitcode
#>   write <fd> <string>        -- write to file descriptor
#>   echo <fd1> <fd2> <nbytes>  -- echo from fd to another fd
#>   getenv <var>               -- environment variable to stdout

Quoting

processx does not use a shell to start up the external process, so special characters in command and args need not be shell quoted. This makes it much easier to support arbitrary file names (that may contain spaces or special characters) in calls to external programs.

run(px, c("outln", "arg -   with spaces", "outln", "'arg with quote'"))
#> $status
#> [1] 0
#> 
#> $stdout
#> [1] "arg -   with spaces\n'arg with quote'\n"
#> 
#> $stderr
#> [1] ""
#> 
#> $timeout
#> [1] FALSE

Interruption

Unlike system() and system2(), processx::run() is always interruptible, you can use the usual interruption key, e.g. ESC in RStudio, or CTRL+C in a terminal. On interruption, the external process is terminated.

Spinner

run() can show a friendly spinner while the external process is running. If the process takes longer then a few second, it is a good idea to use it. The spinner is automatically hidden if R is non-interactive:

run(px, c("sleep", "5"), spinner = TRUE)

Time limit

You can specify a time limit in run(), in seconds, or as a difftime object:

run(px, c("sleep", "5"), timeout = 1)
#> Error in run(px, c("sleep", "5"), timeout = 1): System command timeout

run() throws an error of class system_command_timeout_error, so you can easily catch timeouts using tryCatch(), if you wish so. By default run() also throws an error if the system command fails, as indicated by its exit status.

run(px, c("return", "10"))
#> Error in run(px, c("return", "10")): System command error

The error_on_status argument can be set to FALSE to avoid errors for non-zero exit statuses. This can be useful if you anticipate a failure and want to handle it without throwing an R error, or if a non-zero exit status does not indicate an error for the given program.

Standard output and error

By default, run() collects all standard output and error of the process and retuns them in two strings. If desired, it can also echo them to the screen while the external process is running. (They are still collected and returned, so you can still compute on them.)

outp <- run("ls", "..", echo = TRUE)
#> _redirects
#> articles
#> contribute.md
#> help-is-on-the-way.jpg
#> help.md
#> learn.md
#> lifecycle.md
#> packages.md
#> reprex-addin.png
#> reprex-addins-menu.png
#> rstudio-logo.svg
#> test-ggplot2-1.png
#> test.md

Setting environment variables

You can set environment variables for the external process via the env argument. Usually you want to add these variables to those already set in the current process, otherwise the external process might fail if some essential environment variables (like PATH) are not set:

run(px, c("getenv", "FOO"), env = c(Sys.getenv(), FOO = "bar"))
#> $status
#> [1] 0
#> 
#> $stdout
#> [1] "bar\n"
#> 
#> $stderr
#> [1] ""
#> 
#> $timeout
#> [1] FALSE

Advanced usage: background processes

processx really shines when it comes to controlling background processes. To start a backgound process, you create an R6 object of class process. The arguments of process$new() mostly correspond to the arguments of run().

proc <- process$new(px, c("sleep", "10"))
proc
#> PROCESS 'px', running, pid 61100.

process objects have methods to query process information and to manipulate the subprocess. See ?process for a complete list of methods.

proc$get_name()
#> [1] "px"
proc$get_cmdline()
#> [1] "/Users/gaborcsardi/r_pkgs/processx/bin//px"
#> [2] "sleep"                                     
#> [3] "10"
proc$get_exe()
#> [1] "/Users/gaborcsardi/r_pkgs/processx/bin/px"
proc$is_alive()
#> [1] TRUE
proc$suspend()
#> NULL
proc$get_status()
#> [1] "stopped"
proc$resume()
#> NULL
proc$get_status()
#> [1] "running"
proc$kill()
#> [1] TRUE
proc$is_alive()
#> [1] FALSE
proc$get_exit_status()
#> [1] -9

Output and polling

The standard output and standard error of a background process are ignored by default. To write them to files, set the stdout and/or stderr arguments to the paths of the files. Alternatively, processx can create connections for standard output and error, and R can read from these connections or poll them. Polling a set of connections or processes means that R waits until data is available on any of the connections, or a timeout expires. This is useful if the R process is waiting on one or more processes.

proc <- process$new(px, c("sleep", "1", "outln", "foo", "sleep", "1",
     "errln", "bar", "sleep", "1"), stdout = "|", "stderr" = "|")
proc$poll_io(-1)
#>   output    error  process 
#>  "ready" "silent" "nopipe"
proc$read_output_lines()
#> [1] "foo"
proc$poll_io(-1)
#>   output    error  process 
#> "silent"  "ready" "nopipe"
proc$read_error_lines()
#> [1] "bar"
proc$poll_io(-1)
#>   output    error  process 
#> "silent"  "ready" "nopipe"
proc$is_alive()
#> [1] FALSE

$poll_io() also returns when the process terminates.

To poll multiple processes, the non-member poll() function can be used, this takes a list of processes:

proc1 <- process$new(px, c("sleep", "0.5", "outln", "foo1", "sleep", "1"),
     stdout = "|", "stderr" = "|")
proc2 <- process$new(px, c("sleep", "1", "outln", "foo2", "sleep", "1"),
     stdout = "|", "stderr" = "|")
poll(list(proc1, proc2), -1)
#> [[1]]
#>   output    error  process 
#>  "ready" "silent" "nopipe" 
#> 
#> [[2]]
#>   output    error  process 
#> "silent" "silent" "nopipe"
proc1$read_output_lines()
#> [1] "foo1"
poll(list(proc1, proc2), -1)
#> [[1]]
#>   output    error  process 
#> "silent" "silent" "nopipe" 
#> 
#> [[2]]
#>   output    error  process 
#>  "ready" "silent" "nopipe"
proc2$read_output_lines()
#> [1] "foo2"

Process tree cleanup

In addition to terminating the subprocess, processx supports terminating all child processes that were started by the subprocess, and the child processes of those, etc.

To request process tree cleanup, set the cleanup_tree argument of run() or the cleanup argument of process$new() to TRUE. (It is the default for process$new().) To clean up manually, use the $kill_tree() method.

Use case: wait for an external process to be ready

When starting up an external process, sometimes you need to wait until the process is ready to receive input. E.g. PhantomJS is a headless browser, used for testing web applications. The headless browser is queried and controlled via an HTTP socket. PhantomJS has some startup time, and to make sure that it is ready for input, you need need to wait until it logs an INFO line to its standard output:

❯ phantomjs  -w
[INFO  - 2018-08-21T19:57:53.957Z] GhostDriver - Main - running on port 8910
^C

So processx must capture the standard output and wait until the message is printed. If the message is not printed within a timeout we throw an error. On success the function returns the PhantomJS process object:

start_program <- function(command, args, message, timeout = 5, ...) {
  timeout <- as.difftime(timeout, units = "secs")
  deadline <- Sys.time() + timeout
  px <- process$new(command, args, stdout = "|", ...)
  while (px$is_alive() && (now <- Sys.time()) < deadline) {
    poll_time <- as.double(deadline - now, units = "secs") * 1000
    px$poll_io(as.integer(poll_time))
    lines <- px$read_output_lines()
    if (any(grepl(message, lines))) return(px)
  }

  px$kill()
  stop("Cannot start ", command)
}

Use start_program like this:

start_program("phantomjs", "-w", "running on port")

Some comments about start_program():

  • It waits for message to show up in the standard output of the process.
  • If this does not happen within 5 seconds, it throws an error.
  • On success, it returns the process object.
  • The returned process object still has a connection to the standard output of the process. This needs to be read out regularly, otherwise its buffer fills up, and the subprocess stops, until the buffer it freed. Alternatively, one can close it with close(px$get_output_connection()).
  • If an error happens, the subprocess is terminated when the process object, referred to by px within the function, is garbage collected.

The ps package

The ps package deals with system processes in general. processx and ps methods overlap, in fact processx uses ps to implement some of its methods. It is also possible to create a ps_handle object from a processx object, with the $as_ps_handle() method. This can then be used with the ps functions directly:

proc <- process$new(px, c("sleep", "3"))
ps <- proc$as_ps_handle()
ps::ps_memory_info(ps)
#>        rss        vms    pfaults    pageins 
#>     663552 2491170816        323          0

The ps package includes a testthat reporter that can be used to check that testthat test cases clean up all their child processes and close their connections and open files. See ?ps::CleanupReporter for details.

Here is a simple example on how to use it. In the testthat.R file of your package, update the test_check() call to use CleanupReporter. Since ps is not supported on all platforms (only Windows, macOS and Linux currently), we also need to check for ps support:

if (ps::ps_is_supported()) {
  reporter <- ps::CleanupReporter(testthat::SummaryReporter)$new()
} else {
  ## ps does not support this platform
  reporter <- "progress"
}

test_check("<package-name>", reporter = reporter)

CleanupReporter will check for leftover child processes, R connections and open files at the end of each test_that() block. If a check fails, it generates a regular testthat test failure.

The callr package

The callr package uses processx to start another R process, and run R code in it. It can start R processes synchonously or asynchronously, and the R processes can be either state-less or stateful. See ?callr::r, ?callr::r_process and ?callr::r_session for details.

Acknowledgements

We’re grateful to the 8 people who contributed issues, code and comments since the last processx release:

@breichholf, @dchiu911, @gaborcsardi, @hadley, @joelnitta, @matthijsvanderloos, @maxheld83, and @wlandau