Find and identify Sheets
Source:vignettes/articles/find-identify-sheets.Rmd
find-identify-sheets.Rmd
There are many ways to get your hands on your Sheets, in order to work with them via googlesheets4. They basically range from “ugly, but low effort” to “more humane, but more effort”.
Auth
As a regular, interactive user, you can just let googlesheets4 prompt you for anything it needs re: auth.
Since this article is compiled noninteractively on a server, we have arranged for googlesheets4 to use a service account token (not shown).
Use a URL
When you visit a Sheet in the browser, you can copy that URL to your clipboard. Such URLs look something like this:
https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077
which breaks down like this:
https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit#gid=SHEET_ID
Notice that this URL contains a (spread)Sheet ID and a (work)sheet ID. This URL happens to link to the official example Sheet that holds Gapminder data.
googlesheets4 accepts such a URL as the ss
argument
(think “spreadsheet”) of many functions:
ugly_url <- "https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077"
read_sheet(ugly_url)
#> ✔ Reading from gapminder.
#> ✔ Range Africa.
#> # A tibble: 624 × 6
#> country continent year lifeExp pop gdpPercap
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Algeria Africa 1952 43.1 9279525 2449.
#> 2 Algeria Africa 1957 45.7 10270856 3014.
#> 3 Algeria Africa 1962 48.3 11000948 2551.
#> 4 Algeria Africa 1967 51.4 12760499 3247.
#> 5 Algeria Africa 1972 54.5 14760787 4183.
#> # ℹ 619 more rows
At this time, although the URL may contain both a (spread)Sheet and a
(work)sheet, we only extract the (spread)Sheet ID. If the function
targets a specific (work)sheet, that is typically specified via
arguments like range
or sheet
or a default of
“first (visible) sheet”.
These URLs are not particularly nice to look at in your code, though.
Use a Sheet ID
You can extract the Sheet ID from a URL with
as_sheets_id()
(which is what we are doing internally to
make the URL work in the first place):
ssid <- as_sheets_id(ugly_url)
class(ssid)
#> [1] "sheets_id" "drive_id" "vctrs_vctr" "character"
unclass(ssid)
#> [1] "1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY"
as_sheets_id()
is a generic function, which means it
knows what to do with a few different types of input. For character
string input, as_sheets_id()
passes a string through,
unless it looks like a URL. If it looks like a URL, we use a regular
expression to extract the Sheet ID. The returned string bears the
classes sheets_id
and drive_id
(for playing
nicely with googledrive).
Why did we call unclass(ssid)
above to see the naked
Sheet ID? Because, by default, when you print an instance of
sheets_id
, we attempt to show you some current metadata
about the Sheet.
ssid
#>
#> ── <googlesheets4_spreadsheet> ───────────────────────────────────────────
#> Spreadsheet name: gapminder
#> ID: 1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY
#> Locale: en_US
#> Time zone: America/Los_Angeles
#> # of sheets: 5
#> # of named ranges: 1
#>
#> ── <sheets> ──────────────────────────────────────────────────────────────
#> (Sheet name): (Nominal extent in rows x columns)
#> Africa: 625 x 6
#> Americas: 301 x 6
#> Asia: 397 x 6
#> Europe: 361 x 6
#> Oceania: 25 x 6
#>
#> ── <named ranges> ────────────────────────────────────────────────────────
#> (Named range): (A1 range)
#> canada: 'Americas'!A38:F49
This is the same metadata you’ll see when you call
gs4_get()
(but you must call gs4_get()
explicitly if you want to store the returned metadata).
googlesheets4 also accepts a Sheet ID as the ss
argument
of many functions:
read_sheet(ssid)
#> ✔ Reading from gapminder.
#> ✔ Range Africa.
#> # A tibble: 624 × 6
#> country continent year lifeExp pop gdpPercap
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Algeria Africa 1952 43.1 9279525 2449.
#> 2 Algeria Africa 1957 45.7 10270856 3014.
#> 3 Algeria Africa 1962 48.3 11000948 2551.
#> 4 Algeria Africa 1967 51.4 12760499 3247.
#> 5 Algeria Africa 1972 54.5 14760787 4183.
#> # ℹ 619 more rows
I think in a script or app that will endure for a while it is better to refer to a Sheet by its ID than by its URL. The Sheet ID is nicer to look at, it is complete, it is minimal.
ssid <- "1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY"
read_sheet(ssid)
#> ✔ Reading from gapminder.
#> ✔ Range Africa.
#> # A tibble: 624 × 6
#> country continent year lifeExp pop gdpPercap
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Algeria Africa 1952 43.1 9279525 2449.
#> 2 Algeria Africa 1957 45.7 10270856 3014.
#> 3 Algeria Africa 1962 48.3 11000948 2551.
#> 4 Algeria Africa 1967 51.4 12760499 3247.
#> 5 Algeria Africa 1972 54.5 14760787 4183.
#> # ℹ 619 more rows
Note this demonstration that a Sheet ID also works when provided as a
plain, old string, i.e. it does not have to have the
sheets_id
class. In some contexts, you might even prefer to
store it as a string, in order to bypass the special printing behaviour
for sheets_id
.
When the Sheet is specified via a character string, googlesheets4 assumes it is a Sheet ID (or an ID-containing URL). This is NOT the case for googledrive, which assumes a character string is a file name or path. Therefore, for maximum happiness, in a mixed googlesheets4 / googledrive workflow, it’s a good idea to be explicit and declare a string to be a Sheet ID, when that is the case.
ssid <- as_sheets_id("1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY")
read_sheet(ssid)
#> ✔ Reading from gapminder.
#> ✔ Range Africa.
#> # A tibble: 624 × 6
#> country continent year lifeExp pop gdpPercap
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Algeria Africa 1952 43.1 9279525 2449.
#> 2 Algeria Africa 1957 45.7 10270856 3014.
#> 3 Algeria Africa 1962 48.3 11000948 2551.
#> 4 Algeria Africa 1967 51.4 12760499 3247.
#> 5 Algeria Africa 1972 54.5 14760787 4183.
#> # ℹ 619 more rows
If your script or app targets a specific Sheet, the most efficient and robust way to address it is by its ID.
Use the Sheet’s name (uses googledrive)
A big feature of the googledrive package is the ability to navigate between human-friendly file names and machine-friendly file IDs. Both the Drive and Sheets APIs require the use of IDs, so the illusion that you can identify a Drive file by name is provided by the googledrive package. (A Google Sheet is just a special case of a Drive file … a file that happens to be a spreadsheet.)
If you need to refer to a Sheet by name, i.e. if you need to lookup its file ID based on its name, you must use the googledrive package for that. There are other reasons for using these two packages together: the Sheets API has an intentionally narrow focus on spreadsheet operations involving worksheets and cells. General whole-file operations, like copy / rename / move / share, must be done via the Drive API and, within R, via googledrive. See the article Using googlesheets4 with googledrive for more.
It’s time to attach googledrive (in addition to the already-attached googlesheets4):
(In our hidden auth chunk, we actually put a shared token into force for both googlesheets4 and googledrive, anticipating this moment.)
The Sheet we’ve been working with is named “gapminder” and is owned
by the account we’ve logged in as here. We can use
googledrive::drive_get()
to identify a file by name:
(gap_dribble <- drive_get("gapminder"))
#> ✔ The input `path` resolved to exactly 1 file.
#> # A dribble: 1 × 4
#> name path id drive_resource
#> <chr> <chr> <drv_id> <list>
#> 1 gapminder gapminder 1ksUQqF_K5yKbJr_uWVnFVZhVuxwMyZCvK6M… <named list>
drive_get()
returns a dribble
,
a “Drive tibble”, where each row holds info on a Drive file.
as_sheets_id()
also accepts a one-row dribble
,
so we can get right into our normal googlesheets4 workflows:
gap_id <- as_sheets_id(gap_dribble)
unclass(gap_id)
#> [1] "1ksUQqF_K5yKbJr_uWVnFVZhVuxwMyZCvK6MZOEb7Kew"
gap_id
#>
#> ── <googlesheets4_spreadsheet> ───────────────────────────────────────────
#> Spreadsheet name: gapminder
#> ID: 1ksUQqF_K5yKbJr_uWVnFVZhVuxwMyZCvK6MZOEb7Kew
#> Locale: en_US
#> Time zone: America/Los_Angeles
#> # of sheets: 5
#> # of named ranges: 1
#>
#> ── <sheets> ──────────────────────────────────────────────────────────────
#> (Sheet name): (Nominal extent in rows x columns)
#> Africa: 625 x 6
#> Americas: 301 x 6
#> Asia: 397 x 6
#> Europe: 361 x 6
#> Oceania: 25 x 6
#>
#> ── <named ranges> ────────────────────────────────────────────────────────
#> (Named range): (A1 range)
#> canada: 'Americas'!A38:F49
Since we generally call as_sheets_id()
on whatever the
user provides as ss
, you can even pass
gap_dribble
straight into googlesheets4 functions.
sheet_properties(gap_dribble)
#> # A tibble: 5 × 8
#> name index id type visible grid_rows grid_columns data
#> <chr> <int> <int> <chr> <lgl> <int> <int> <list>
#> 1 Africa 0 780868077 GRID TRUE 625 6 <NULL>
#> 2 Americas 1 45759261 GRID TRUE 301 6 <NULL>
#> 3 Asia 2 1984823455 GRID TRUE 397 6 <NULL>
#> 4 Europe 3 1503562052 GRID TRUE 361 6 <NULL>
#> 5 Oceania 4 1796776040 GRID TRUE 25 6 <NULL>
Two important things to note:
- googledrive requires auth for functions like
drive_get()
anddrive_find()
(see below). At first, you can just react to the interactive auth prompts and make sure you auth as the same user with googledrive and googlesheets4. Once you get tired of doing auth for both packages, read the article Using googlesheets4 with googledrive. Coordinated auth should get even easier in the future.- In our hidden auth here, we have taken special measures to use a shared token for googledrive and googlesheets4.
- Remember to use
drive_get()
on a Sheet name that you have. If you don’t have a Sheet named “gapminder”, the code above won’t yield anything. As a rough rule of thumb, if you don’t see it at spreadsheets.google.com, you can’tdrive_get()
it either.
List your Sheets (uses googledrive)
What if you want to see all of your Sheets? Or all the Sheets with “gap” in their name?
googledrive::drive_find()
is the workhorse function for
these tasks for general Drive files. It has lots of bells and whistles
and we can use one of them to narrow the search to Google Sheets:
drive_find(type = "spreadsheet")
#> # A dribble: 9 × 3
#> name id drive_resource
#> <chr> <drv_id> <list>
#> 1 geological-canary 13fmUp01xitQrPvgw7LWDXNMHiMewxHnPuYHai… <named list>
#> 2 blubbery-lobo 1foTYXQEknymPpyug1xWOq7ituJ-ywLCaTEHaF… <named list>
#> 3 bacciform-booby 15WqCxHBCjY4P_i6-uTL9rUBgpcHRKQFrqheRS… <named list>
#> 4 childsafe-squid 1uGu_BHwialVz6okcuO5kRgLhaprClnzq5Orgc… <named list>
#> 5 fiery-hart 18KiiL2AGkL9nTkjHdOXGmeUQzqnOLqWnnOIEn… <named list>
#> # ℹ 4 more rows
This is so handy that we’ve made gs4_find()
in
googlesheets4, which is a shortcut for
drive_find(type = "spreadsheet")
:
gs4_find()
#> # A dribble: 9 × 3
#> name id drive_resource
#> <chr> <drv_id> <list>
#> 1 geological-canary 13fmUp01xitQrPvgw7LWDXNMHiMewxHnPuYHai… <named list>
#> 2 blubbery-lobo 1foTYXQEknymPpyug1xWOq7ituJ-ywLCaTEHaF… <named list>
#> 3 bacciform-booby 15WqCxHBCjY4P_i6-uTL9rUBgpcHRKQFrqheRS… <named list>
#> 4 childsafe-squid 1uGu_BHwialVz6okcuO5kRgLhaprClnzq5Orgc… <named list>
#> 5 fiery-hart 18KiiL2AGkL9nTkjHdOXGmeUQzqnOLqWnnOIEn… <named list>
#> # ℹ 4 more rows
See the examples for drive_find()
and
gs4_find()
for more ideas about how to search Drive
effectively for your Sheets.