WebAssembly roundup part 1: webR 0.4.2

We’re totally stoked to announce the release of webR 0.4.2!

It’s been a little while since I’ve written about webR here, and a few releases between my last blog post and this one. In this post I’ll cover some of the exciting changes to the core webR distribution, and also include some interesting tidbits for JavaScript developers using webR in their own applications. You can see a full list of changes in the release notes.

This post is the first in an R for WebAssembly roundup series. The next posts will cover updates to Shinylive for R, and introduce a new Quarto extension that uses the power of webR and WebAssembly to elevate your documents with interactivity.

The base R distribution may be run using nothing but a text console, but some additional options can be implemented by frontends to provide system-dependent display of content. Previously, we implemented the pager option so that R’s help system can be better displayed within the webR application. Using the pager we can show R function and package documentation outside of the text console in dedicated tabbed windows.

In recent releases of webR we have expanded our support for such display systems, providing an implementation both for the View() function and the viewer global option used by htmlwidgets.

This gives us the ability to show a tabular data viewer for data.frame-like R objects and an iframe-based HTML content viewer, enabling dynamic web-based output from R packages like leaflet and gt.

Screenshots of the webR REPL showing a tabular data viewer, an interactive map using the leaflet package, and an HTML table rendered using the gt package.

The implementation of viewer is fairly general, making use of webR’s output messages mechanism to send the required information to the main JavaScript thread for display. That way, any application using webR may listen for those messages and choose how to show the resulting content on the page. We’ll make use of this in a later post where I introduce using webR to generate dynamic content in a new Quarto extension.
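As a rough illustration of the idea, the sketch below routes output messages of the form `{ type, data }` to handlers chosen by the host application. The message type names (`view`, `browse`) and the payload fields are assumptions made for this example, as is the helper `createMessageRouter`; check the webR documentation for the exact message shapes.

```javascript
// Hypothetical helper: route webR-style output messages ({ type, data })
// to display handlers chosen by the host application.
function createMessageRouter(handlers, fallback = () => {}) {
  return function route(message) {
    const known = Object.prototype.hasOwnProperty.call(handlers, message.type);
    if (known && typeof handlers[message.type] === 'function') {
      handlers[message.type](message.data);
      return true; // the message was displayed by a handler
    }
    fallback(message); // e.g. print to a text console instead
    return false;
  };
}

// Example wiring. The type names and payload fields are assumptions.
const shown = [];
const route = createMessageRouter({
  view: (data) => shown.push(`table: ${data.title}`),
  browse: (data) => shown.push(`iframe: ${data.url}`),
});

// In a real application the messages would come from the webR session,
// for example by awaiting webR.read() in a loop and passing each to route().
route({ type: 'browse', data: { url: 'https://example.com/widget.html' } });
```

Keeping the routing separate from the display code means an application can ignore message types it does not care about and fall back to plain console output.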

Improvements to the webR app UI

Recent webR releases have also made some other quality-of-life improvements to the webR app. Some minor improvements include making each UI panel resizeable, and offering .zip download of an entire directory in the Files panel.

Screenshot showing the 'Download directory' feature in the webR REPL app.

R source syntax highlighting and parsing

The webR app’s code editor is powered by CodeMirror with R parsing provided by the codemirror-lang-r package. CodeMirror’s extensibility is excellent, and the library is well suited for integrating into a wider project like this. However, we noticed that the codemirror-lang-r package had a few issues highlighting certain types of R syntax. In particular, in our application highlighting matrix operations such as %*% would crash the parser!

Screenshots comparing syntax highlighting of R source code before and after the changes discussed above.

As well as fixing this bug, we’ve worked to improve the R parser to better support some other types of R syntax and have contributed these changes upstream so as to benefit other users of codemirror-lang-r.

WebAssembly R package binary format

One of R’s greatest strengths is its vibrant community of R packages and their developers, and so one of the development goals of webR is that packages are downloaded and installed as fast as possible. In the latest release of webR, some joint work with Jeroen Ooms improving the performance of loading WebAssembly binary R packages has landed.

R packages and other filesystem data are efficiently made available to the R WebAssembly process using Emscripten’s file packager and the WORKERFS filesystem driver. Previously we used uncompressed filesystem data, with the intention of serving content using HTTP compression. However, web services do not always compress files automatically¹, especially if they are large. So, in the latest release of webR, filesystem data may now be mounted from a gzip compressed file², and the base R filesystem is also distributed in compressed form.

R package developers might recognise that traditional R package binaries are already produced as a gzip compressed archive. And, as pointed out to me by Jeroen, the format of a .tar archive is very similar to Emscripten’s .data files. With some clever arrangement of R package archive data and Emscripten filesystem metadata, pre-processed WebAssembly R package binaries may now be directly mounted to the virtual filesystem by webR.

Mounting R packages in this way is more efficient than installing .tgz archives in the usual manner, for two reasons. First, the decompression step happens in the browser, rather than using R’s slower internal routines. Second, the WORKERFS filesystem driver avoids copying the archive’s files in memory until they are actually opened and read by the WebAssembly R process.
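To make the similarity between the two formats concrete, here is a minimal sketch that walks an uncompressed .tar archive and produces file offsets in the style of Emscripten’s file packager metadata (`{ files: [{ filename, start, end }] }`). The function name `indexTar` is hypothetical, and this is not webR’s actual implementation; it handles only short file names and regular files.

```javascript
// Hypothetical helper: index the regular files inside an uncompressed tar
// archive (a Uint8Array), returning Emscripten-style offset metadata.
// Assumption: names fit in the 100-byte name field (no long-name extensions).
function indexTar(bytes) {
  const decoder = new TextDecoder();
  // Read a NUL-terminated text field from a 512-byte tar header.
  const field = (offset, length) => {
    const text = decoder.decode(bytes.subarray(offset, offset + length));
    const nul = text.indexOf('\0');
    return nul === -1 ? text : text.slice(0, nul);
  };

  const files = [];
  let pos = 0;
  while (pos + 512 <= bytes.length) {
    const name = field(pos, 100);
    if (name === '') break; // a zeroed header block marks the end of the archive
    const size = parseInt(field(pos + 124, 12).trim(), 8) || 0; // octal size
    const typeflag = field(pos + 156, 1);
    const start = pos + 512; // file data begins right after the header
    if (typeflag === '0' || typeflag === '') {
      files.push({ filename: '/' + name, start, end: start + size });
    }
    // File data is padded to a multiple of 512 bytes.
    pos = start + Math.ceil(size / 512) * 512;
  }
  return { files };
}
```

Because every file’s bytes sit contiguously in the archive, metadata like this is enough to locate each file’s contents without unpacking anything.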

Both the webR default repository and R-Universe now serve binary R packages for WebAssembly in this new format. These packages can be installed and loaded interactively in the webR application, or used as dependencies in a deployed Shinylive for R app. For your own custom R packages, the rwasm package can be used to compile WebAssembly binaries using a pre-configured Docker container. However, I’d actually recommend creating a personal R-Universe repository for your packages instead, since this will automatically build binaries for multiple targets including WebAssembly.

A much simpler but effective change has also been made: R packages listed only as LinkingTo dependencies are no longer downloaded by webR on package installation. These packages are required for building an R package from source, but not at runtime. The change saves network resources when installing WebAssembly R packages. In one particular worst-case scenario, this change avoided downloading about 100 megabytes of data!
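The effect can be sketched with a small dependency walk that follows only the Depends and Imports fields and ignores LinkingTo. The package index, the package names, and the function `runtimeDependencies` below are all hypothetical, and the real resolution logic in webR may differ.

```javascript
// Hypothetical dependency resolver: collect the packages needed at runtime,
// following Depends and Imports but deliberately skipping LinkingTo.
function runtimeDependencies(pkg, index) {
  const seen = new Set();
  const queue = [pkg];
  while (queue.length > 0) {
    const entry = index[queue.shift()];
    if (!entry) continue; // unknown package: nothing more to follow
    for (const dep of [...(entry.Depends ?? []), ...(entry.Imports ?? [])]) {
      if (!seen.has(dep)) {
        seen.add(dep);
        queue.push(dep);
      }
    }
  }
  return [...seen].sort();
}

// Made-up package index, for illustration only.
const exampleIndex = {
  pkgA: { Imports: ['pkgB'], LinkingTo: ['headerLib'] },
  pkgB: { Depends: ['pkgC'] },
  pkgC: {},
  headerLib: { Imports: ['bigDataPkg'] },
};

console.log(runtimeDependencies('pkgA', exampleIndex)); // [ 'pkgB', 'pkgC' ]
```

Note that skipping a LinkingTo-only package also skips everything that only it depends on, which is where the larger savings come from.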

Virtual file system drivers

A nice side-effect of the work in the previous section is that mounting filesystem data with WORKERFS now also works correctly under Node.js, fixing a fairly painful and long-standing bug for our server-side users of webR.

We’ve also introduced mounting with Emscripten’s IDBFS filesystem driver when running webR in the browser³. This driver makes use of the low-level IndexedDB API provided by the JavaScript environment to write virtual filesystem contents to a form of local storage on the device.

With this, files that have been written to the virtual filesystem can be persisted over page reloads and automatically made available again to the WebAssembly R process when the page is revisited in the future, without needing to re-download the content.

You can try it out right here! Any files written to the /persist directory in the interactive R console below should be persisted. The first time you load this page, the directory will be empty. However, if files are written they will remain available after you refresh the page or revisit in the future.

It should be noted that filesystem data stored in an IndexedDB database can only be accessed within the same origin, essentially across the current web page’s domain. Also, browsers may decide the amount of storage space provided, what content is deleted when quotas are reached, and when exactly that deletion occurs. In private browsing mode, for example, data is usually removed when the private session ends.

Even with these caveats, I expect developers working with webR will be able to make use of the IDBFS driver to selectively cache content or R packages that are too large to download over the network on every single page load, further improving start-up times in their own apps as a result.
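A minimal sketch of the mounting flow described in this section might look like the following. It assumes a webR session object exposing `FS.mkdir()`, `FS.mount()` and `FS.syncfs()`, following my reading of the webR documentation; the helper name `mountPersistentDir` is hypothetical, and the session is assumed to have been started with the PostMessage communication channel.

```javascript
// Hypothetical helper: mount an IndexedDB-backed directory and load any
// previously persisted files into the virtual filesystem.
// Assumes `webR.FS.mkdir`, `webR.FS.mount` and `webR.FS.syncfs` behave as
// described in the webR documentation.
async function mountPersistentDir(webR, path = '/persist') {
  await webR.FS.mkdir(path);
  await webR.FS.mount('IDBFS', {}, path);
  // populate = true: copy stored data from IndexedDB into the filesystem.
  await webR.FS.syncfs(true);
  // Return a function that writes the current contents back to IndexedDB.
  return () => webR.FS.syncfs(false);
}
```

An application would call the returned function after R has written files it wants to keep, for example `const save = await mountPersistentDir(webR); ... await save();`.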

Developing with webR

Deprecating the `ServiceWorker` channel

The ServiceWorker communication channel, a method webR offered to handle message passing between the main browser thread and the JavaScript Web Worker running the R WebAssembly binary, has been deprecated. The communication channel was originally devised as a way to allow use of webR in cases where the SharedArrayBuffer API is not available. This includes any use of webR with an origin that is not Cross-Origin Isolated, such as when content is hosted by GitHub Pages.

The channel was implemented using a JavaScript Service Worker proxy and synchronous XHR requests. Unfortunately, with the overhead of message serialisation and capturing network requests, performance was significantly impacted. The channel was also not compatible with applications that make use of a service worker for genuine network proxy functionality, such as Shinylive.

An alternative method has since been developed in the form of the PostMessage communication channel. This instead uses the JavaScript PostMessage API, which is designed to handle communication between workers efficiently. It has much better performance and even provides a way to transfer objects using zero-copy operations. There are some minor downsides when using the PostMessage channel, mostly related to taking input using tools like readline(), or nested REPLs like R’s browser(), but for most applications we find that these limitations are acceptable and a reasonable price to pay for what is intended as a fallback method.

If you are working on a webR application where readline() functionality is absolutely required, but you cannot set your web server headers to enable cross-origin isolation, an alternative implementation of using a service worker to solve the problem can be found with the coi-serviceworker package. When enabled, the web page will appear to webR to be cross-origin isolated and so SharedArrayBuffer can be used. This still has the other drawbacks of requiring a service worker, but will have much better performance than using webR’s ServiceWorker channel directly.

For these reasons, the PostMessage communication channel is now the default fallback when the web page is not cross-origin isolated. The ServiceWorker channel will continue to be available in the short-term, if explicitly requested, but will eventually be removed in a future version of webR.
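The fallback rule described above can be written down as a small sketch. webR’s automatic channel selection already does this for you; the hypothetical function `preferredChannel` below only makes the decision explicit, using the standard `crossOriginIsolated` browser global.

```javascript
// Sketch: pick a webR communication channel name from the page's
// cross-origin isolation status.
function preferredChannel(env = globalThis) {
  // `crossOriginIsolated` is a standard browser global. Environments that do
  // not define it are treated as "not isolated" in this sketch.
  return env.crossOriginIsolated === true ? 'SharedArrayBuffer' : 'PostMessage';
}

// With webR this might be passed at start-up along the lines of
//   new WebR({ channelType: ChannelType[preferredChannel()] })
// where the `ChannelType` enum and `channelType` option are taken from my
// reading of the webR documentation.
```

Being explicit can be useful when an application needs a particular channel, for instance when a feature is only available with PostMessage.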

API additions

We’ve made some minor changes to the webR JavaScript API. There’s nothing groundbreaking here, but some new tools that we hope will be useful.

Report current version

With the aim of providing functionality similar to the R.Version() and packageVersion() R functions, the version of the currently running webR session may now be obtained from the JavaScript environment.

> const webR = new WebR();
> webR.version;
// '0.4.3-dev+d1fb4f4'
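One use for the version string is gating features on a minimum release. The sketch below parses a string of the form shown above and compares it against a minimum version; the helper names `parseVersion` and `isAtLeast` are hypothetical, and the sketch assumes the `major.minor.patch[-prerelease][+build]` layout seen in the example output.

```javascript
// Hypothetical helpers for working with a version string such as
// '0.4.3-dev+d1fb4f4'.
function parseVersion(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)(?:-([^+]+))?(?:\+(.+))?$/.exec(version);
  if (!match) throw new Error(`Unrecognised version string: ${version}`);
  const [, major, minor, patch, prerelease = null, build = null] = match;
  return {
    major: Number(major),
    minor: Number(minor),
    patch: Number(patch),
    prerelease,
    build,
  };
}

// Compare only the numeric parts; pre-release and build labels are ignored,
// so '0.4.3-dev' counts as at least '0.4.3' in this sketch.
function isAtLeast(version, minimum) {
  const a = parseVersion(version);
  const b = parseVersion(minimum);
  for (const key of ['major', 'minor', 'patch']) {
    if (a[key] !== b[key]) return a[key] > b[key];
  }
  return true;
}

console.log(isAtLeast('0.4.3-dev+d1fb4f4', '0.4.2')); // true
```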

Discover an object’s class

An R object’s class() may be inspected from an RObject proxy. The returned value is an RCharacter vector of classes from which the object inherits.

> await webR.evalR("mtcars")
    .then(obj => obj.class())
    .then(cls => cls.toArray());
// ['data.frame']

Explicitly construct an R `data.frame`

In a previous version of webR, we introduced creating new R data.frame objects from JavaScript using the generic RObject constructor. WebR will build a data.frame for arguments with compatible shape: either an object with named columns, or an array with objects for each row.

> let source1 = { abc: [1, 2, 3], xyz: [4, 5, 6] };
> await new webR.RObject(source1)
   .then(obj => obj.class())
   .then(cls => cls.toArray());
// ['data.frame']

> let source2 = [ { abc: 1, xyz: 4 }, { abc: 2, xyz: 5 }, { abc: 3, xyz: 6 }];
> await new webR.RObject(source2)
   .then(obj => obj.class())
   .then(cls => cls.toArray());
// ['data.frame']
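The two shapes carry the same information. As an illustration in plain JavaScript, independent of webR, the hypothetical helper below converts the row-oriented form into the column-oriented form; it assumes every row has the same keys as the first one.

```javascript
// Hypothetical helper: convert an array of row objects into an object of
// named columns. Assumes all rows share the keys of the first row.
function rowsToColumns(rows) {
  const columns = {};
  for (const name of Object.keys(rows[0] ?? {})) {
    columns[name] = rows.map((row) => row[name]);
  }
  return columns;
}

const rows = [{ abc: 1, xyz: 4 }, { abc: 2, xyz: 5 }, { abc: 3, xyz: 6 }];
console.log(rowsToColumns(rows)); // { abc: [ 1, 2, 3 ], xyz: [ 4, 5, 6 ] }
```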

You might ask why not create an R list object by default? The reason is that we expect a common situation to be taking datasets defined in the JavaScript environment and processing them using R. With data.frame as the default, JavaScript objects that have been formatted for use with existing JavaScript frameworks can be almost transparently passed to R.

> penguins;
//  Array(344) [
//  0: { species: 'Adelie', island: 'Torgersen', flipper_length_mm: 181, ... }
//   ... more
// ]
> const sample_mass = await webR.evalR(`
    \\(x) x |> dplyr::sample_n(5) |> dplyr::pull("body_mass_g")
  `);
> await sample_mass(penguins);
// { type: 'double', names: null, values: [3300, 3250, 4000, 4700, 3750] }

The generic constructor throws an exception for JavaScript objects that cannot be coerced to a data.frame. If you’d prefer to create an R list, you must instead be explicit by using the RList constructor:

> let source = { def: [123, 456], uvw: 'hello' };
> await new webR.RObject(source);
// Uncaught WebRWorkerError: Can't construct `data.frame`. Source object is not eligible.

> let obj = await new webR.RList(source);
> await obj.type();
// 'list'

The RObject constructor is designed to be a useful default for interactive work at a JavaScript console. However, production applications should be explicit in the choice of constructor. With this in mind we have added a new class RDataFrame, a subclass of RList, so that users may be explicit in their choice of creating a data.frame, rather than relying on the generic RObject constructor.

> let source = { abc: [1, 2, 3], xyz: [4, 5, 6] };
> await new webR.RDataFrame(source)
   .then(obj => obj.class())
   .then(cls => cls.toArray());
// ['data.frame']

Now, if your source object is not quite as you expect, an exception will be thrown rather than webR continuing silently without error. We hope this will reduce the chance of type-related bugs and unexpected behaviour, and aid in debugging when issues do occur.

// Say we _expect_ a JS object here, but something went wrong...
> let bug = undefined;

> const obj1 = await new webR.RObject(bug);
// [No error and webR silently continues with an unexpected R object]

> const obj2 = await new webR.RDataFrame(bug);
// Uncaught WebRWorkerError: Can't construct `data.frame`. Source object is not eligible.

Acknowledgements

Special thanks to @jeroen, for helpful conversations when it comes to packaging for webR. And thank you, as always, to the users and developers contributing to webR in the form of discussion, bug reports, and pull requests.

@027xiguapi, @adrianolszewski, @alekrutkowski, @andrjohns, @baogorek, @bugzpodder, @christianp, @coatless, @codingthemystery, @ColinFay, @derrickstaten, @dipterix, @EduardBel, @gregvolny, @guillaumechaumet, @gyanaranjans, @helgasoft, @HenrikBengtsson, @isbool, @JosiahParry, @luisDVA, @minhaj57sorder, @olivroy, @oranwutang, @pawelru, @psychemedia, @rainer-rq-koelle, @richarddmorey, @richardjtelford, @seanbirchall, @shalom-lab, @StaffanBetner, @stobor827, @SugarRayLua, @tavosansal, @thomascwells, @timelyportfolio, and @zpinocchio.

¹ It depends a lot on how the hosting service has configured their production web server and the files themselves; both size and content type can make a difference to behaviour. Some services allow for pre-compressed content, while others do not. The AWS CloudFront documentation gives a good overview of how this all fits together.
² Emscripten’s file_packager tool also supports built-in LZ4 compression with the --lz4 flag. While generally useful for bundling files for WebAssembly applications, we avoid using this feature since it writes important data to a .js output file that must be executed. Ideally, we’d prefer our package loading mechanism to only require a single file download, similar to traditional R package archives.
³ Note that currently users wanting to make use of IDBFS mounting must configure webR to use the PostMessage Communication Channel.