We’re delighted to announce the release of httr2¹ 1.0.0. httr2 is the second generation of httr: it helps you generate HTTP requests and process the responses, designed with an eye towards modern web APIs and potentially putting your code in a package.
You can install it from CRAN with:
install.packages("httr2")
httr2 has been under development for the last two years, but this is the first time we’ve blogged about it because we’ve been waiting until the user interface felt stable. It now does, and we’re ready to encourage you to use httr2 whenever you need to talk to a web server. Most importantly, httr2 is now a “real” package because it has a wonderful new logo, thanks to a collaborative effort involving Julie Jung, Greg Swineheart, and DALL·E 3.
httr2 is the successor to httr. The biggest difference is that it has an explicit request object which you can build up over multiple function calls. This makes the interface fit more naturally with the pipe, and generally makes life easier because you can iteratively build up a complex request. httr2 also builds on the 10 years of package development experience we’ve accrued since creating httr, so it should all around be more enjoyable to use. If you’re a current httr user, there’s no need to switch, as we’ll continue to maintain the package for many years to come, but if you start on a new project, I’d recommend that you give httr2 a shot.
If you’ve been following httr2 development for a while, you might want to jump to the release notes to see what’s new (a lot!). The most important change in this release is that Maximilian Girlich is now a httr2 author, in recognition of his many contributions to the package. This release also features improved tools for performing multiple requests (more on that below) and a bunch of bug fixes and minor improvements for OAuth.
For the rest of this blog post, I’ll assume that you’re familiar with the basics of HTTP. If you’re not, you might want to start with vignette("httr2"), which introduces you to HTTP using httr2.
Making a request
httr2 is designed around the two big pieces of HTTP: requests and responses. First you’ll create a request, with a URL:
req <- request(example_url())
req
#> <httr2_request>
#> GET http://127.0.0.1:51981/
#> Body: empty
Instead of using an external website, here we’re using a test server that’s built in to httr2. This ensures that this blog post, and many httr2 examples, work independently from the rest of the internet.
You can see the HTTP request that httr2 will send, without actually sending it², by doing a dry run:
req |> req_dry_run()
#> GET / HTTP/1.1
#> Host: 127.0.0.1:51981
#> User-Agent: httr2/0.2.3.9000 r-curl/5.1.0 libcurl/8.1.2
#> Accept: */*
#> Accept-Encoding: deflate, gzip
As you can see, this request object will perform a simple GET request with automatic user agent and accept headers.
To make more complex requests, you modify the request object with functions that start with req_. For example, you could make it a HEAD request, with some query parameters, and a custom user agent:
req |>
  req_url_query(param = "value") |>
  req_user_agent("My user agent") |>
  req_method("HEAD") |>
  req_dry_run()
#> HEAD /?param=value HTTP/1.1
#> Host: 127.0.0.1:51981
#> User-Agent: My user agent
#> Accept: */*
#> Accept-Encoding: deflate, gzip
Or you could send some JSON in the body of the request:
req |>
  req_body_json(list(x = 1, y = "a")) |>
  req_dry_run()
#> POST / HTTP/1.1
#> Host: 127.0.0.1:51981
#> User-Agent: httr2/0.2.3.9000 r-curl/5.1.0 libcurl/8.1.2
#> Accept: */*
#> Accept-Encoding: deflate, gzip
#> Content-Type: application/json
#> Content-Length: 15
#>
#> {"x":1,"y":"a"}
httr2 provides a wide range of req_ functions to customise the request in common ways; if there’s something you need that httr2 doesn’t support, please file an issue!
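As a rough sketch of how several req_ functions compose, here is a request built against a made-up endpoint (https://api.example.com and its paths are placeholders, not a real API). Nothing is sent; the code only builds the request object:

```r
library(httr2)

# Hypothetical endpoint: swap in the API you are actually calling.
req <- request("https://api.example.com") |>
  req_url_path_append("v1", "items") |>        # add path components
  req_url_query(page = 1, per_page = 50) |>    # add query parameters
  req_headers(Accept = "application/json") |>  # set a request header
  req_timeout(10)                              # give up after 10 seconds

# The URL that would be requested
req$url
```

Because each function returns a modified copy of the request, you can stop the pipeline at any point and inspect the intermediate object.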
Performing the request and handling the response
Once you have a request that you are happy with, you can send it to the server with req_perform():
req_json <- req |> req_url_path("/json")
resp <- req_json |> req_perform()
Performing a request will return a response object (or throw an error, which we’ll talk about next). You can see the basic details of the response by printing it, or you can see the raw response with resp_raw()³:
resp
#> <httr2_response>
#> GET http://127.0.0.1:51981/json
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (407 bytes)
resp |> resp_raw()
#> HTTP/1.1 200 OK
#> Connection: close
#> Date: Tue, 14 Nov 2023 14:41:32 GMT
#> Content-Type: application/json
#> Content-Length: 407
#> ETag: "de760e6d"
#>
#> {
#> "firstName": "John",
#> "lastName": "Smith",
#> "isAlive": true,
#> "age": 27,
#> "address": {
#> "streetAddress": "21 2nd Street",
#> "city": "New York",
#> "state": "NY",
#> "postalCode": "10021-3100"
#> },
#> "phoneNumbers": [
#> {
#> "type": "home",
#> "number": "212 555-1234"
#> },
#> {
#> "type": "office",
#> "number": "646 555-4567"
#> }
#> ],
#> "children": [],
#> "spouse": null
#> }
But generally, you’ll want to use the resp_ functions to extract parts of the response for further processing. For example, you could parse the JSON body into an R data structure:
resp |>
  resp_body_json() |>
  str()
#> List of 8
#> $ firstName : chr "John"
#> $ lastName : chr "Smith"
#> $ isAlive : logi TRUE
#> $ age : int 27
#> $ address :List of 4
#> ..$ streetAddress: chr "21 2nd Street"
#> ..$ city : chr "New York"
#> ..$ state : chr "NY"
#> ..$ postalCode : chr "10021-3100"
#> $ phoneNumbers:List of 2
#> ..$ :List of 2
#> .. ..$ type : chr "home"
#> .. ..$ number: chr "212 555-1234"
#> ..$ :List of 2
#> .. ..$ type : chr "office"
#> .. ..$ number: chr "646 555-4567"
#> $ children : list()
#> $ spouse : NULL
Or get the value of a header:
resp |> resp_header("Content-Length")
#> [1] "407"
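If you want to experiment with the resp_ functions without a server, one option is to build a response by hand with response(). The sketch below does that; the header names and body are illustrative values, not output from a real API:

```r
library(httr2)

# A hand-built response standing in for a real server reply (illustrative data).
resp <- response(
  status_code = 200,
  headers = list(
    `Content-Type` = "application/json",
    `X-Request-Id` = "abc123"
  ),
  body = charToRaw('{"firstName": "John", "age": 27}')
)

resp_content_type(resp)            # media type taken from the Content-Type header
resp_header(resp, "X-Request-Id")  # value of a single header
person <- resp_body_json(resp)     # JSON body parsed into an R list
person$age
```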
Error handling
You can use resp_status() to see the returned status:
resp |> resp_status()
#> [1] 200
But this will almost always be 200, because httr2 automatically follows redirects (statuses in the 300s) and turns HTTP failures (statuses in the 400s and 500s) into R errors. The following example shows what error handling looks like using an example endpoint that returns a response with the status defined in the URL:
req |>
  req_url_path("/status/404") |>
  req_perform()
#> Error in `req_perform()`:
#> ! HTTP 404 Not Found.
req |>
  req_url_path("/status/500") |>
  req_perform()
#> Error in `req_perform()`:
#> ! HTTP 500 Internal Server Error.
Turning HTTP failures into R errors can make debugging hard, so httr2 provides the last_request() and last_response() helpers which you can use to figure out what went wrong:
last_request()
#> <httr2_request>
#> GET http://127.0.0.1:51981/status/500
#> Body: empty
last_response()
#> <httr2_response>
#> GET http://127.0.0.1:51981/status/500
#> Status: 500 Internal Server Error
#> Content-Type: text/plain
#> Body: None
httr2 provides two other tools to customise error handling:
- req_error() gives you full control over what responses should be turned into R errors, and allows you to add additional information to the error message.
- req_retry() helps deal with transient errors, where you need to wait a bit and try again. For example, many APIs are rate limited and will return a 429 status if you have made too many requests.
You can learn more about both of these functions in “Wrapping APIs”, as they are particularly important when creating an R package (or script) that wraps a web API.
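Here is a minimal sketch of both tools together, assuming a hypothetical API at https://api.example.com whose error responses carry a JSON message field. To keep it self-contained, with_mocked_responses() stands in for the real server, so nothing touches the network:

```r
library(httr2)

# Stand-in for a real server (an assumption for this sketch): every request
# gets a 404 with a JSON body describing the problem.
fake_server <- function(req) {
  response(
    status_code = 404,
    headers = list(`Content-Type` = "application/json"),
    body = charToRaw('{"message": "No such user"}')
  )
}

req <- request("https://api.example.com/users/999") |>
  # Add the server's own explanation to the R error message
  req_error(body = \(resp) resp_body_json(resp)$message) |>
  # Retry transient failures (429 and 503 by default) up to three times
  req_retry(max_tries = 3)

# Catch the error by its status-specific class instead of letting it bubble up
err <- with_mocked_responses(fake_server, {
  tryCatch(req_perform(req), httr2_http_404 = \(cnd) cnd)
})

class(err)[1]
resp_status(err$resp)
```

The error object keeps the response that triggered it, so you can still inspect the status, headers, and body after the fact.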
Control the request process
There are a number of other req_ functions that don’t directly affect the HTTP request but instead control the overall process of submitting a request and handling the response. These include:
- req_cache(), which sets up a cache so that if repeated requests return the same results, you can avoid a trip to the server.
- req_throttle(), which automatically adds a small delay before each request so you can avoid hammering a server with many requests.
- req_progress(), which adds a progress bar for long downloads or uploads.
- req_cookie_preserve(), which lets you preserve cookies across requests.
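As a sketch of how these fit into a pipeline, the request below combines three of them. The endpoint is a placeholder and the file locations are arbitrary choices for illustration:

```r
library(httr2)

# Hypothetical endpoint; the paths below are temporary locations chosen
# only for this sketch.
req <- request("https://api.example.com/reports") |>
  # Store cacheable responses in a directory
  req_cache(path = file.path(tempdir(), "httr2-cache")) |>
  # Allow at most 30 requests per minute
  req_throttle(rate = 30 / 60) |>
  # Keep cookies in a file so that later requests can reuse them
  req_cookie_preserve(path = file.path(tempdir(), "cookies.txt"))
```

None of these change what is sent in a single request; they only take effect when the request is performed, possibly many times.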
Additionally, httr2 provides rich support for authenticating with OAuth, implementing many more OAuth flows than httr. You’ve probably used OAuth a bunch without knowing what it’s called: you use it when you log in to a non-Google website using your Google account, when you give your phone access to your Twitter account, or when you log in to a streaming app on your smart TV. OAuth is a big, complex topic, and is documented in “OAuth”.
Multiple requests
httr2 includes three functions to perform multiple requests:
- req_perform_sequential() takes a list of requests and performs them one at a time.
- req_perform_parallel() takes a list of requests and performs them in parallel (up to 6 at a time by default). It’s similar to req_perform_sequential(), but is obviously faster, at the expense of potentially hammering a server. It also has some limitations: most importantly, it can’t refresh an expired OAuth token and it doesn’t respect req_retry() or req_throttle().
- req_perform_iterative() takes a single request and a callback function to generate the next request from the previous response. It’ll keep going until the callback function returns NULL or max_reqs requests have been performed. This is very useful for paginated APIs that only tell you the URL for the next page.
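To make the callback idea behind req_perform_iterative() concrete, here is a small sketch. It assumes a hypothetical paginated API that returns a page number and a has_more flag; with_mocked_responses() plays the part of that server so the example runs offline:

```r
library(httr2)

# Stand-in for a paginated API (an assumption for this sketch): three pages,
# each saying whether more pages follow.
fake_server <- function(req) {
  page <- as.integer(url_parse(req$url)$query$page)
  has_more <- if (page < 3) "true" else "false"
  response(
    headers = list(`Content-Type` = "application/json"),
    body = charToRaw(sprintf('{"page": %d, "has_more": %s}', page, has_more))
  )
}

# Callback: build the next request from the previous response,
# or return NULL to stop.
next_page <- function(resp, req) {
  body <- resp_body_json(resp)
  if (!body$has_more) {
    return(NULL)
  }
  req |> req_url_query(page = body$page + 1L)
}

req <- request("https://api.example.com/items") |>
  req_url_query(page = 1)

resps <- with_mocked_responses(fake_server, {
  req_perform_iterative(req, next_req = next_page, max_reqs = 10, progress = FALSE)
})

# Number of pages fetched before the callback returned NULL
length(resps)
```

For common pagination schemes you may not need to write the callback yourself: httr2 also provides helpers such as iterate_with_offset() and iterate_with_link_url().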
For example, imagine we wanted to download each person from the Star Wars API. The URLs have a very consistent structure so we can generate a bunch of them, then create the corresponding requests:
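A minimal sketch of that step, assuming the people endpoints are numbered 1 to 10 under https://swapi.dev/api/people/ (the URL pattern and the count are inferred from the output shown later in this post):

```r
library(httr2)

# One URL per person, then one request object per URL
urls <- paste0("https://swapi.dev/api/people/", 1:10)
reqs <- lapply(urls, request)
```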
Now I can perform those requests, collecting a list of responses:
resps <- req_perform_sequential(reqs)
#> Iterating ■■■■ 10% | ETA: 40s
#> Iterating ■■■■■■■ 20% | ETA: 3m
#> Iterating ■■■■■■■■■■ 30% | ETA: 2m
#> Iterating ■■■■■■■■■■■■■ 40% | ETA: 1m
#> Iterating ■■■■■■■■■■■■■■■■ 50% | ETA: 46s
#> Iterating ■■■■■■■■■■■■■■■■■■■ 60% | ETA: 33s
#> Iterating ■■■■■■■■■■■■■■■■■■■■■■ 70% | ETA: 22s
#> Iterating ■■■■■■■■■■■■■■■■■■■■■■■■■ 80% | ETA: 13s
#> Iterating ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 90% | ETA: 6s
#> Iterating ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s
These responses contain their data in a JSON body:
resps |>
  _[[1]] |>
  resp_body_json() |>
  str()
#> List of 16
#> $ name : chr "Luke Skywalker"
#> $ height : chr "172"
#> $ mass : chr "77"
#> $ hair_color: chr "blond"
#> $ skin_color: chr "fair"
#> $ eye_color : chr "blue"
#> $ birth_year: chr "19BBY"
#> $ gender : chr "male"
#> $ homeworld : chr "https://swapi.dev/api/planets/1/"
#> $ films :List of 4
#> ..$ : chr "https://swapi.dev/api/films/1/"
#> ..$ : chr "https://swapi.dev/api/films/2/"
#> ..$ : chr "https://swapi.dev/api/films/3/"
#> ..$ : chr "https://swapi.dev/api/films/6/"
#> $ species : list()
#> $ vehicles :List of 2
#> ..$ : chr "https://swapi.dev/api/vehicles/14/"
#> ..$ : chr "https://swapi.dev/api/vehicles/30/"
#> $ starships :List of 2
#> ..$ : chr "https://swapi.dev/api/starships/12/"
#> ..$ : chr "https://swapi.dev/api/starships/22/"
#> $ created : chr "2014-12-09T13:50:51.644000Z"
#> $ edited : chr "2014-12-20T21:17:56.891000Z"
#> $ url : chr "https://swapi.dev/api/people/1/"
There are lots of ways to deal with this sort of data (e.g. for loops or functional programming) but to make life easier, httr2 comes with its own helper, resps_data(). This function takes a callback that retrieves the data for each response, then concatenates all the data into a single object. In this case, we need to wrap resp_body_json() in a list, so we get one list for each person, rather than one list in total:
resps |>
  resps_data(\(resp) list(resp_body_json(resp))) |>
  _[1:3] |>
  str(list.len = 10)
#> List of 3
#> $ :List of 16
#> ..$ name : chr "Luke Skywalker"
#> ..$ height : chr "172"
#> ..$ mass : chr "77"
#> ..$ hair_color: chr "blond"
#> ..$ skin_color: chr "fair"
#> ..$ eye_color : chr "blue"
#> ..$ birth_year: chr "19BBY"
#> ..$ gender : chr "male"
#> ..$ homeworld : chr "https://swapi.dev/api/planets/1/"
#> ..$ films :List of 4
#> .. ..$ : chr "https://swapi.dev/api/films/1/"
#> .. ..$ : chr "https://swapi.dev/api/films/2/"
#> .. ..$ : chr "https://swapi.dev/api/films/3/"
#> .. ..$ : chr "https://swapi.dev/api/films/6/"
#> .. [list output truncated]
#> $ :List of 16
#> ..$ name : chr "C-3PO"
#> ..$ height : chr "167"
#> ..$ mass : chr "75"
#> ..$ hair_color: chr "n/a"
#> ..$ skin_color: chr "gold"
#> ..$ eye_color : chr "yellow"
#> ..$ birth_year: chr "112BBY"
#> ..$ gender : chr "n/a"
#> ..$ homeworld : chr "https://swapi.dev/api/planets/1/"
#> ..$ films :List of 6
#> .. ..$ : chr "https://swapi.dev/api/films/1/"
#> .. ..$ : chr "https://swapi.dev/api/films/2/"
#> .. ..$ : chr "https://swapi.dev/api/films/3/"
#> .. ..$ : chr "https://swapi.dev/api/films/4/"
#> .. ..$ : chr "https://swapi.dev/api/films/5/"
#> .. ..$ : chr "https://swapi.dev/api/films/6/"
#> .. [list output truncated]
#> $ :List of 16
#> ..$ name : chr "R2-D2"
#> ..$ height : chr "96"
#> ..$ mass : chr "32"
#> ..$ hair_color: chr "n/a"
#> ..$ skin_color: chr "white, blue"
#> ..$ eye_color : chr "red"
#> ..$ birth_year: chr "33BBY"
#> ..$ gender : chr "n/a"
#> ..$ homeworld : chr "https://swapi.dev/api/planets/8/"
#> ..$ films :List of 6
#> .. ..$ : chr "https://swapi.dev/api/films/1/"
#> .. ..$ : chr "https://swapi.dev/api/films/2/"
#> .. ..$ : chr "https://swapi.dev/api/films/3/"
#> .. ..$ : chr "https://swapi.dev/api/films/4/"
#> .. ..$ : chr "https://swapi.dev/api/films/5/"
#> .. ..$ : chr "https://swapi.dev/api/films/6/"
#> .. [list output truncated]
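To see why the extra list() matters, here is a small offline sketch that compares the two forms on hand-built responses (make_resp() is a hypothetical helper defined only for this example, and the bodies are made-up data):

```r
library(httr2)

# Hypothetical helper: wrap a JSON string in a response object
make_resp <- function(json) {
  response(
    headers = list(`Content-Type` = "application/json"),
    body = charToRaw(json)
  )
}

resps <- list(
  make_resp('{"name": "A", "height": "1"}'),
  make_resp('{"name": "B", "height": "2"}')
)

# Without list(): the fields of every response are concatenated together
flat <- resps_data(resps, resp_body_json)

# With list(): each response contributes exactly one element
nested <- resps_data(resps, \(resp) list(resp_body_json(resp)))

length(flat)
length(nested)
```

The second form is the one to use when you want one record per response.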
Another option would be to convert each response into a data frame or tibble. That’s a little tricky here because of the nested lists that will need to become list-columns⁴, so we’ll avoid that challenge here by focussing on the first nine columns:
sw_data <- function(resp) {
  tibble::as_tibble(resp_body_json(resp)[1:9])
}
resps |> resps_data(sw_data)
#> # A tibble: 10 × 9
#> name height mass hair_color skin_color eye_color birth_year gender
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Luke Skywalker 172 77 blond fair blue 19BBY male
#> 2 C-3PO 167 75 n/a gold yellow 112BBY n/a
#> 3 R2-D2 96 32 n/a white, bl… red 33BBY n/a
#> 4 Darth Vader 202 136 none white yellow 41.9BBY male
#> 5 Leia Organa 150 49 brown light brown 19BBY female
#> 6 Owen Lars 178 120 brown, gr… light blue 52BBY male
#> 7 Beru Whitesun… 165 75 brown light blue 47BBY female
#> 8 R5-D4 97 32 n/a white, red red unknown n/a
#> 9 Biggs Darklig… 183 84 black light brown 24BBY male
#> 10 Obi-Wan Kenobi 182 77 auburn, w… fair blue-gray 57BBY male
#> # ℹ 1 more variable: homeworld <chr>
When you’re performing large numbers of requests, it’s almost inevitable that something will go wrong. By default, all three functions will bubble up errors, causing you to lose all of the work that’s been done so far. You can, however, use the on_error argument to change what happens, either ignoring errors, or returning when you hit the first error. This changes the return value: instead of a list of responses, the list might now also contain error objects. httr2 provides other helpers to work with this object:
- resps_successes() filters the list to find the successful responses. You can then pair this with resps_data() to get the data from the successful requests.
- resps_failures() filters the list to find the failed responses. You can then pair this with resps_requests() to find the requests that generated them and figure out what went wrong.
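The sketch below shows these helpers working together. It assumes a hypothetical API where only ids 1 and 2 exist, simulated offline with with_mocked_responses():

```r
library(httr2)

# Stand-in server (an assumption for this sketch): ids above 2 return a 404.
fake_server <- function(req) {
  id <- as.integer(basename(req$url))
  if (id <= 2) {
    response(
      headers = list(`Content-Type` = "application/json"),
      body = charToRaw(sprintf('{"id": %d}', id))
    )
  } else {
    response(status_code = 404)
  }
}

reqs <- lapply(paste0("https://api.example.com/people/", 1:3), request)

# on_error = "continue" keeps going after a failure and stores the error
# object in place of the missing response.
resps <- with_mocked_responses(fake_server, {
  req_perform_sequential(reqs, on_error = "continue", progress = FALSE)
})

ok <- resps_successes(resps)
failed <- resps_failures(resps)

# Data from the requests that worked
people <- resps_data(ok, \(resp) list(resp_body_json(resp)))

# URLs of the requests that failed
failed_urls <- vapply(resps_requests(failed), \(req) req$url, character(1))
```

From here you could retry only the failed requests, or log them and carry on with the data you have.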
Acknowledgements
A big thanks to all 87 folks who have helped make httr2 possible!
@allenbaron, @asadow, @atheriel, @boshek, @casa-henrym, @cderv, @colmanhumphrey, @cstjohn810, @cwang23, @DavidRLovell, @DMerch, @dpprdan, @ECOSchulz, @edavidaja, @elipousson, @emmansh, @Enchufa2, @ErdaradunGaztea, @fangzhou-xie, @fh-mthomson, @fkohrt, @flahn, @gregleleu, @guga31bb, @gvelasq, @hadley, @hongooi73, @howardbaek, @jameslairdsmith, @JBGruber, @jchrom, @jemus42, @jennybc, @jimrothstein, @jjesusfilho, @jjfantini, @jl5000, @jonthegeek, @JosiahParry, @judith-bourque, @juliasilge, @kasperwelbers, @kelvindso, @kieran-mace, @KoderKow, @lassehjorthmadsen, @llrs, @lyndon-bird, @m-mohr, @maelle, @maxheld83, @mgirlich, @MichaelChirico, @michaelgfalk, @misea, @MislavSag, @mkoohafkan, @mmuurr, @multimeric, @nbenn, @nclsbarreto, @nealrichardson, @Nelson-Gon, @olivroy, @owenjonesuob, @paul-carteron, @pbulsink, @ramiromagno, @rplati, @rressler, @samterfa, @schnee, @sckott, @sebastian-c, @selesnow, @Shaunson26, @SokolovAnatoliy, @spotrh, @stefanedwards, @taerwin, @vanhry, @wing328, @xinzhuohkust, @yogat3ch, @yogesh-bansal, @yutannihilation, and @zacdav-db.
1. Pronounced “hitter 2”.
2. Well, technically, it does send the request, just to another test server that returns the request that it received.
3. This is only an approximation. For example, it only shows the final response if there were redirects, and it automatically uncompresses the body if it was compressed. Nevertheless, it’s still pretty useful.
4. To turn these into list-columns, you need to wrap each list in another list, something like is_list <- map_lgl(json, is.list); json[is_list] <- map(json[is_list], list). This ensures that each element has length 1, the invariant for a row in a tibble.