We’re ecstatic to announce the release of googlesheets4 0.2.0 on CRAN:
googlesheets4 is a package to work with Google Sheets from R. Although version 0.1.0 debuted on CRAN in late 2019, I’ve waited to blog about it until I considered googlesheet4 a replacement for an older package called googlesheets. That day is here! In particular, googlesheets 0.2.0 can create and edit Sheets.
Install googlesheets4 from CRAN like so:
install.packages("googlesheets4")
Then attach it for use via:
library(googlesheets4)
Below we give a brief overview of the basic read and write functionality of googlesheets4. The website has several articles that document more advanced use and special topics, such as auth. As always, you can find detailed notes about all changes in the change log.
googlesheets is dead! Long live googlesheets4!
googlesheets4 is a reboot of an earlier package called googlesheets.
Why 4? Why googlesheets4? Did I miss googlesheets1 through 3? No. The idea is to name the package after the corresponding version of the Sheets API. In hindsight, the original googlesheets should have been googlesheets3.
googlesheets4 wraps v4 of the Sheets API, whereas the original googlesheets wraps v3. The v3 API has been deprecated for a long time. Its shut-off has been postponed several times, but its days are clearly numbered. Gradual shutdown of certain endpoints and scopes is already underway, with complete shutdown expected on September 30, 2020 at the time of writing. If you use googlesheets, you must switch to googlesheets4 and the time is now.
Change is hard, but let’s focus on the positive:
- The v4 API has a better design, which makes it easier to wrap, which translates to a better user experience in client packages.
- I have gotten a lot better at writing R packages. Some of the fiddly parts of the original googlesheets came from the awkward v3 API, but some of it came from me.
- The gargle package ( gargle.r-lib.org) provides infrastructure common to ~250 Google APIs, including Sheets, Drive, Gmail, and BigQuery. By using gargle for auth, we can provide a consistent auth experience across googlesheets4, googledrive, gmailr, and bigrquery.
- The googledrive package ( googledrive.tidyverse.org) provides a full-featured client for working with files on Google Drive, which allows googlesheets4 to focus on operations specific to spreadsheets. In the original googlesheets package, about half of the code and effort was actually related to Drive, not Sheets.
Read a Sheet
Let’s say you’re staring at a Sheet1 in the browser and you want to read it into R. Copy the URL to your clipboard and paste it into a call to read_sheet()
like this:
read_sheet("https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077")
#> Reading from "gapminder"
#> Range "Africa"
#> # A tibble: 624 x 6
#> country continent year lifeExp pop gdpPercap
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Algeria Africa 1952 43.1 9279525 2449.
#> 2 Algeria Africa 1957 45.7 10270856 3014.
#> 3 Algeria Africa 1962 48.3 11000948 2551.
#> 4 Algeria Africa 1967 51.4 12760499 3247.
#> 5 Algeria Africa 1972 54.5 14760787 4183.
#> # … with 619 more rows
If you’re following along at home, you probably just got a prompt to log in with Google. That’s because, in general, you’ll want googlesheets4 to be able to do the same things you can do with Sheets in the browser. If you know you only want to read public Sheets, you can use gs4_deauth()
to tell googlesheets4 that it should not attempt auth.
I’m reading from one of our public example Sheets — specifically, a Sheet that holds Gapminder data. A browser URL is OK for quick-and-dirty work, but there are other ways to target a Sheet that are more robust and easier on the eyes.
You will often want to refer to your Sheets by ID or by name. For the public example Sheets, we offer convenience functions gs4_example()
and gs4_examples()
to get IDs based on name.
gap_id <- gs4_example("gapminder")
class(gap_id)
#> [1] "sheets_id" "drive_id"
unclass(gap_id)
#> gapminder
#> "1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY"
gap_id
#> Spreadsheet name: gapminder
#> ID: 1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY
#> Locale: en_US
#> Time zone: America/Los_Angeles
#> # of sheets: 5
#>
#> (Sheet name): (Nominal extent in rows x columns)
#> Africa: 625 x 6
#> Americas: 301 x 6
#> Asia: 397 x 6
#> Europe: 361 x 6
#> Oceania: 25 x 6
#>
#> (Named range): (A1 range)
#> canada: 'Americas'!A38:F49
The above demonstrates that printing a Sheet ID (literally, an instance of sheets_id
) reveals relevant metadata about the Sheet, such as its name and an overview of its worksheets and named ranges.
read_sheet()
is happy to accept a Sheet ID, instead of a URL. Such a Sheet ID might be an instance of sheets_id
, but you can also provide the ID as a string. Here we show piping a Sheet ID into read_sheet()
and we start to demonstrate other features, e.g. the use of a qualified A1-style range
.
gs4_example("gapminder") %>%
read_sheet(range = "Asia!A:D")
#> Reading from "gapminder"
#> Range "'Asia'!A:D"
#> # A tibble: 396 x 4
#> country continent year lifeExp
#> <chr> <chr> <dbl> <dbl>
#> 1 Afghanistan Asia 1952 28.8
#> 2 Afghanistan Asia 1957 30.3
#> 3 Afghanistan Asia 1962 32.0
#> 4 Afghanistan Asia 1967 34.0
#> 5 Afghanistan Asia 1972 36.1
#> # … with 391 more rows
But how do you get a Sheet ID for your own Sheets? First, you can use as_sheets_id()
to extract the ID from various types of input, such as a browser URL. I use as.character()
here to suppress the default printing behaviour, so we can just stare at the plain string.
as_sheets_id("https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077") %>%
as.character()
#> [1] "1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY"
We can also use googledrive’s ability to address Drive files by name2 to help us identify the Sheet of interest. The account I’m logged in with owns a Sheet named “deaths” and googledrive::drive_get()
retrieves its metadata as a dribble
(“Drive tibble”). as_sheets_id()
is also happy to take a dribble
as input, but it must contain exactly 1 row, describing a Drive file that is also a Sheet.
deaths_dribble <- googledrive::drive_get("deaths")
deaths_dribble
#> # A tibble: 1 x 4
#> name path id drive_resource
#> <chr> <chr> <chr> <list>
#> 1 deaths ~/deaths 1VTJjWoP1nshbyxmL9JqXgdVsimaYty21LGxxs018H2Y <named list [35]>
as_sheets_id(deaths_dribble)
#> Spreadsheet name: deaths
#> ID: 1VTJjWoP1nshbyxmL9JqXgdVsimaYty21LGxxs018H2Y
#> Locale: en_US
#> Time zone: America/Los_Angeles
#> # of sheets: 2
#>
#> (Sheet name): (Nominal extent in rows x columns)
#> arts: 1000 x 26
#> other: 1000 x 26
#>
#> (Named range): (A1 range)
#> other_data: 'other'!A5:F15
#> arts_data: 'arts'!A5:F15
read_sheet()
also accepts a one-row dribble
as input. Here I also show the use of range
to target a named range and specify some of the column types:
deaths_dribble %>%
read_sheet(range = "arts_data", col_types = "??i?DD")
#> Reading from "deaths"
#> Range "arts_data"
#> # A tibble: 10 x 6
#> Name Profession Age `Has kids` `Date of birth` `Date of death`
#> <chr> <chr> <int> <lgl> <date> <date>
#> 1 David Bowie musician 69 TRUE 1947-01-08 2016-01-10
#> 2 Carrie Fisher actor 60 TRUE 1956-10-21 2016-12-27
#> 3 Chuck Berry musician 90 TRUE 1926-10-18 2017-03-18
#> 4 Bill Paxton actor 61 TRUE 1955-05-17 2017-02-25
#> 5 Prince musician 57 TRUE 1958-06-07 2016-04-21
#> # … with 5 more rows
read_sheet()
is the main “read” function of googlesheets4 and should remind you of other table-reading functions, like readr::read_csv()
and readxl::read_excel()
. It also goes by another name: range_read()
, which is the most correct name according to googlesheets4’s naming conventions. Either name is fine! It’s OK if you don’t care about this, I just want to give you a heads up. If you make extensive use of googlesheets4, you’ll notice there are 3 large families of functions, with the prefixes gs4_
, sheet_
, and range_
. The prefix conveys a function’s scope of operation.
Remember there are articles that go into much more depth on reading Sheets, column types, and using googlesheets4 in concert with googledrive.
Write and modify a Sheet
I’ll go out in a blaze of glory, demonstrating just a few of the functions that can create and edit a Sheet.
First I create a new Sheet with gs4_create()
and send some initial data to two well-named worksheets:
ss <- gs4_create(
"able-aardvark",
sheets = list(flowers = head(iris), autos = head(mtcars))
)
#> Creating new Sheet: "able-aardvark"
ss
#> Spreadsheet name: able-aardvark
#> ID: 1OV9Mxf-NdTyB1ChmlVKEFy0e1AaPAJCahzcWBw1w3qQ
#> Locale: en_US
#> Time zone: Etc/GMT
#> # of sheets: 2
#>
#> (Sheet name): (Nominal extent in rows x columns)
#> flowers: 7 x 5
#> autos: 7 x 11
read_sheet(ss)
#> Reading from "able-aardvark"
#> Range "flowers"
#> # A tibble: 6 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> # … with 1 more row
If you’re following along at home, you can use gs4_browse()
to open a Sheet in your default web browser. This is especially helpful when you are developing code that writes to a Sheet.
gs4_browse(ss)
Now I send a third data frame to an entirely new (work)sheet within this existing (spread)Sheet with write_sheet()
.
my_data <- data.frame(
numbers = c(1, 5, 3, 2, 4, 6),
letters = letters[1:6]
)
write_sheet(my_data, ss = ss)
#> Writing to "able-aardvark"
#> Writing to sheet "my_data"
ss
#> Spreadsheet name: able-aardvark
#> ID: 1OV9Mxf-NdTyB1ChmlVKEFy0e1AaPAJCahzcWBw1w3qQ
#> Locale: en_US
#> Time zone: Etc/GMT
#> # of sheets: 3
#>
#> (Sheet name): (Nominal extent in rows x columns)
#> flowers: 7 x 5
#> autos: 7 x 11
#> my_data: 7 x 2
Let’s append a row to the bottom of my_data
and populate it with formulas that summarize the data in each column (hey, a sparkline!).
my_summaries <- data.frame(
x = gs4_formula('=SPARKLINE(A2:A7, {"color", "blue"})'),
y = gs4_formula('=JOIN("-", B2:B7)')
)
ss %>%
sheet_append(my_summaries, sheet = "my_data")
#> Writing to "able-aardvark"
#> Appending 1 row(s) to "my_data"
Notice that we did not have to specify which row to write the new values into. sheet_append()
uses an endpoint that knows how to add rows to the existing table of data. It’s true that we used some row knowledge in our formulas, but more often you will just be sending data. Also notice the use of gs4_formula()
. This is how you indicate that character data should be sent as Sheets formulas, as opposed to regular strings.
Let’s take one last glance at our creation.
ss
#> Spreadsheet name: able-aardvark
#> ID: 1OV9Mxf-NdTyB1ChmlVKEFy0e1AaPAJCahzcWBw1w3qQ
#> Locale: en_US
#> Time zone: Etc/GMT
#> # of sheets: 3
#>
#> (Sheet name): (Nominal extent in rows x columns)
#> flowers: 7 x 5
#> autos: 7 x 11
#> my_data: 8 x 2
In the browser, the my_data
sheet looks something like this at this point:
Finally, we clean up. Note that we (must) use googledrive for this. The Sheets API can create a Sheet, but alas it cannot delete one. For that (and most other “whole file” operations), we must use the Drive API, which is why googlesheets4 is designed to work with googledrive.
googledrive::drive_trash(ss)
#> Files trashed:
#> * able-aardvark: 1OV9Mxf-NdTyB1ChmlVKEFy0e1AaPAJCahzcWBw1w3qQ
Once again, the articles provide much deeper coverage of topics like identifying and modifying Sheets.
Try it out
The googlesheets4 package is marked as experimental, but it’s really somewhere between experimental and maturing. This is the first CRAN release that includes write and edit capability, so I want the flexibility to make some relatively quick, modest changes to the interface in response to user feedback. But such changes get more painful for everyone the longer we wait. Now is a great time to take googlesheets4 out for a test drive.
If you encounter ergonomic problems or spot tasks that were possible with googlesheets but are not yet possible with googlesheet4, please open an issue.
Thank you!
A big thanks to all 64 contributors who’ve helped to bring googlesheets4 to this point via their contributions on GitHub. @4marel, @AaronGullickson, @adisarid, @ahelgason, @alex-steiner-next, @aljrico, @amir2cs, @antoine-sachet, @batpigandme, @behrman, @ben519, @Biomiha, @BriBecker, @ceprdata, @ChemiKyle, @chrowe, @cneskey, @csnardi, @dan-reznik, @dcaley5005, @Flavjack, @GitHubDoug, @gloignon, @grwhumphries, @guhanrv, @ianformanek, @j-Rinehart, @jennybc, @jiaoshuo, @jimhester, @jjankowiak, @jpawlata, @jperkel, @juliangilbey, @jzadra, @karawoo, @kaveh1000, @lucasmation, @MarkEdmondson1234, @mikegunn, @mine-cetinkaya-rundel, @mitchelloharawild, @mlamias, @mountainMath, @mssanjavickovic, @nacnudus, @nathanhwangbo, @nicole-brewer, @ogs22, @pachamaltese, @peeter-t2, @ramirobentes, @realauggieheschmeyer, @RSherwoodJr, @sam-watts, @schmalte04, @seanchrismurphy, @selesnow, @somnambWl, @SridharJagannathan, @Tadge-Analytics, @untergeekDE, @wildcat47, and @yogat3ch.
-
It’s very easy to get confused about vocabulary when talking about spreadsheets and, especially, about Google Sheets. When we say capital-S “Sheet”, we mean a spreadsheet or (spread)Sheet, i.e. the Google equivalent of an Excel workbook or
.xlsx
file. Indeed, the actual Sheets API uses the term “spreadsheet”. An individual “tab” inside a Sheet is a (work)sheet, i.e. small-s “sheet”. So one Sheet contains one or more sheets. ↩︎ -
Believe it or not, files on Drive don’t have to have a unique name. You can have multiple files named “foofy” on Drive, even in the same folder. In fact, historically, a file can also belong to multiple folders (although this is mercifully being phased out). The main takeaway is that your usual expectations about “a name or filepath identifies at most one file” and “a file is identified by exactly one name or filepath” don’t hold on Drive. ↩︎