There are many ways to get your hands on your Sheets, in order to work with them via googlesheets4. They basically range from “ugly, but low effort” to “more humane, but more effort”.
As a regular, interactive user, you can just let googlesheets4 prompt you for anything it needs re: auth.
Since this article is compiled noninteractively on a server, we have arranged for googlesheets4 to use a service account token (not shown).
When you visit a Sheet in the browser, you can copy that URL to your clipboard. Such URLs look something like this:
which breaks down like this:
Notice that this URL contains a (spread)Sheet ID and a (work)sheet ID. This URL happens to link to the official example Sheet that holds Gapminder data.
googlesheets4 accepts such a URL as the
ss argument (think “spreadsheet”) of many functions:
ugly_url <- "https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077" read_sheet(ugly_url) #> Error: Server error: (500) INTERNAL #> • Internal server error. Typically a server bug. #> • Internal error encountered.
At this time, although the URL may contain both a (spread)Sheet and a (work)sheet, we only extract the (spread)Sheet ID. If the function targets a specific (work)sheet, that is typically specified via arguments like
sheet or a default of “first (visible) sheet”.
These URLs are not particularly nice to look at in your code, though.
You can extract the Sheet ID from a URL with
as_sheets_id() (which is what we are doing internally to make the URL work in the first place):
ssid <- as_sheets_id(ugly_url) class(ssid) #>  "sheets_id" "drive_id" unclass(ssid) #>  "1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY"
as_sheets_id() is a generic function, which means it knows what to do with a few different types of input. For character string input,
as_sheets_id() passes a string through, unless it looks like a URL. If it looks like a URL, we use a regular expression to extract the Sheet ID. The returned string bears the classes
drive_id (for playing nicely with googledrive).
Why did we call
unclass(ssid) above to see the naked Sheet ID? Because, by default, when you print an instance of
sheets_id, we attempt to show you some current metadata about the Sheet.
ssid #> Spreadsheet name: gapminder #> ID: 1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY #> Locale: en_US #> Time zone: America/Los_Angeles #> # of sheets: 5 #> # of named ranges: 1 #> #> (Sheet name): (Nominal extent in rows x columns) #> Africa: 625 x 6 #> Americas: 301 x 6 #> Asia: 397 x 6 #> Europe: 361 x 6 #> Oceania: 25 x 6 #> #> (Named range): (A1 range) #> canada: 'Americas'!A38:F49
googlesheets4 also accepts a Sheet ID as the
ss argument of many functions:
read_sheet(ssid) #> Error: Server error: (503) UNAVAILABLE #> • Service unavailable. Typically the server is down. #> • The service is currently unavailable.
I think in a script or app that will endure for a while it is better to refer to a Sheet by its ID than by its URL. The Sheet ID is nicer to look at, it is complete, it is minimal.
ssid <- "1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY" read_sheet(ssid) #> Error: Server error: (503) UNAVAILABLE #> • Service unavailable. Typically the server is down. #> • The service is currently unavailable.
Note this demonstration that a Sheet ID also works when provided as a plain, old string, i.e. it does not have to have the
sheets_id class. In some contexts, you might even prefer to store it as a string, in order to bypass the special printing behaviour for
When the Sheet is specified via a character string, googlesheets4 assumes it is a Sheet ID (or an ID-containing URL). This is NOT the case for googledrive, which assumes a character string is a file name or path. Therefore, for maximum happiness, in a mixed googlesheets4 / googledrive workflow, it’s a good idea to be explicit and declare a string to be a Sheet ID, when that is the case.
ssid <- as_sheets_id("1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY") read_sheet(ssid) #> ✔ Reading from 'gapminder' #> ✔ Range Africa #> # A tibble: 624 x 6 #> country continent year lifeExp pop gdpPercap #> <chr> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 Algeria Africa 1952 43.1 9279525 2449. #> 2 Algeria Africa 1957 45.7 10270856 3014. #> 3 Algeria Africa 1962 48.3 11000948 2551. #> 4 Algeria Africa 1967 51.4 12760499 3247. #> 5 Algeria Africa 1972 54.5 14760787 4183. #> # … with 619 more rows
If your script or app targets a specific Sheet, the most efficient and robust way to address it is by its ID.
A big feature of the googledrive package is the ability to navigate between human-friendly file names and machine-friendly file IDs. Both the Drive and Sheets APIs require the use of IDs, so the illusion that you can identify a Drive file by name is provided by the googledrive package. (A Google Sheet is just a special case of a Drive file … a file that happens to be a spreadsheet.)
If you need to refer to a Sheet by name, i.e. if you need to lookup its file ID based on its name, you must use the googledrive package for that. There are other reasons for using these two packages together: the Sheets API has an intentionally narrow focus on spreadsheet operations involving worksheets and cells. General whole-file operations, like copy / rename / move / share, must be done via the Drive API and, within R, via googledrive. See the article Using googlesheets4 with googledrive for more.
It’s time to attach googledrive (in addition to the already-attached googlesheets4):
(In our hidden auth chunk, we actually put a shared token into force for both googlesheets4 and googledrive, anticipating this moment.)
The Sheet we’ve been working with is named “gapminder” and is owned by the account we’ve logged in as here. We can use
googledrive::drive_get() to identify a file by name:
(gap_dribble <- drive_get("gapminder")) #> ✔ The input `path` resolved to exactly 1 file. #> # A dribble: 1 x 4 #> name path id drive_resource #> * <chr> <chr> <chr> <list> #> 1 gapminder gapminder 1ksUQqF_K5yKbJr_uWVnFVZhVuxwMyZCvK6MZOEb… <named list [33…
gap_id <- as_sheets_id(gap_dribble) unclass(gap_id) #>  "1ksUQqF_K5yKbJr_uWVnFVZhVuxwMyZCvK6MZOEb7Kew" gap_id #> Spreadsheet name: gapminder #> ID: 1ksUQqF_K5yKbJr_uWVnFVZhVuxwMyZCvK6MZOEb7Kew #> Locale: en_US #> Time zone: America/Los_Angeles #> # of sheets: 5 #> # of named ranges: 1 #> #> (Sheet name): (Nominal extent in rows x columns) #> Africa: 625 x 6 #> Americas: 301 x 6 #> Asia: 397 x 6 #> Europe: 361 x 6 #> Oceania: 25 x 6 #> #> (Named range): (A1 range) #> canada: 'Americas'!A38:F49
Since we generally call
as_sheets_id() on whatever the user provides as
ss, you can even pass
gap_dribble straight into googlesheets4 functions.
sheet_properties(gap_dribble) #> # A tibble: 5 x 8 #> name index id type visible grid_rows grid_columns data #> <chr> <int> <int> <chr> <lgl> <int> <int> <list> #> 1 Africa 0 780868077 GRID TRUE 625 6 <NULL> #> 2 Americas 1 45759261 GRID TRUE 301 6 <NULL> #> 3 Asia 2 1984823455 GRID TRUE 397 6 <NULL> #> 4 Europe 3 1503562052 GRID TRUE 361 6 <NULL> #> 5 Oceania 4 1796776040 GRID TRUE 25 6 <NULL>
Two important things to note:
drive_find()(see below). At first, you can just react to the interactive auth prompts and make sure you auth as the same user with googledrive and googlesheets4. Once you get tired of doing auth for both packages, read the article Using googlesheets4 with googledrive. Coordinated auth should get even easier in the future.
drive_get()on a Sheet name that you have. If you don’t have a Sheet named “gapminder”, the code above won’t yield anything. As a rough rule of thumb, if you don’t see it at spreadsheets.google.com, you can’t
What if you want to see all of your Sheets? Or all the Sheets with “gap” in their name?
googledrive::drive_find() is the workhorse function for these tasks for general Drive files. It has lots of bells and whistles and we can use one of them to narrow the search to Google Sheets:
drive_find(type = "spreadsheet") #> # A dribble: 8 x 3 #> name id drive_resource #> * <chr> <chr> <list> #> 1 blubbery-lobo 1foTYXQEknymPpyug1xWOq7ituJ-ywLCaTEHaF2y4uC4 <named list > #> 2 bacciform-booby 15WqCxHBCjY4P_i6-uTL9rUBgpcHRKQFrqheRSSOgHhE <named list > #> 3 childsafe-squid 1uGu_BHwialVz6okcuO5kRgLhaprClnzq5Orgc9BSSOo <named list > #> 4 fiery-hart 18KiiL2AGkL9nTkjHdOXGmeUQzqnOLqWnnOIEnWY6HSo <named list > #> 5 chicken-sheet 1StV8oqHa0SVo58ztLGPH5mDMMHIBsiF6VO5GUJjM7u0 <named list > #> # … with 3 more rows
gs4_find() #> # A dribble: 8 x 3 #> name id drive_resource #> * <chr> <chr> <list> #> 1 blubbery-lobo 1foTYXQEknymPpyug1xWOq7ituJ-ywLCaTEHaF2y4uC4 <named list > #> 2 bacciform-booby 15WqCxHBCjY4P_i6-uTL9rUBgpcHRKQFrqheRSSOgHhE <named list > #> 3 childsafe-squid 1uGu_BHwialVz6okcuO5kRgLhaprClnzq5Orgc9BSSOo <named list > #> 4 fiery-hart 18KiiL2AGkL9nTkjHdOXGmeUQzqnOLqWnnOIEnWY6HSo <named list > #> 5 chicken-sheet 1StV8oqHa0SVo58ztLGPH5mDMMHIBsiF6VO5GUJjM7u0 <named list > #> # … with 3 more rows