Day 2 Session 1: Function Design
Standing on the shoulders of Building Tidy Tools, rstudio::conf(2020) (C. Wickham and Wickham 2021), R Packages (H. Wickham and Bryan 2020)
rstudio::conf code of conduct highlights:
If there is a problem: conf@rstudio.com
The Code of Conduct and COVID policies can be found at https://www.rstudio.com/conference/2022/2022-conf-code-of-conduct/. Please review them carefully.
RStudio requires that you wear a mask that fully covers your mouth and nose at all times in all public spaces. We strongly recommend that you use a correctly fitted N95, KN95, or similar particulate filtering mask; we will have a limited supply available upon request.
You can report Code of Conduct violations in person, by email, or by phone. Please see the policy linked above for contact information.
conf22
together!
stickies
no stupid questions
Function design
Managing side effects
Tidy eval
Functional & object-oriented programming
We want to concentrate on specific concepts, rather than writing entire functions.
We have created a set of checkpoints called states:
btt22::btt_state()
[1] "2.1.1" "2.1.2" "2.2.1" "2.2.2" "2.2.3" "2.3.1" "2.3.2" "2.3.3" "2.3.4"
[10] "2.3.5" "2.3.6" "2.3.7" "2.4.1" "2.4.2" "2.4.3"
For example, "2.1.1" means day 2, session 1, task 1.
To get new files for a state:
# "2.1.1": day 2, session 1, task 1
btt_get("2.1.1")
New files go in R, tests/testthat.
One example builds on another, so it’s important to keep up.
We will do our best to help; in case you need to reset:
btt_reset_hard("2.1.1")
Overwrites:
R, tests/testthat
Imports, Suggests sections of DESCRIPTION
At the end of this section you will be able to:
But first…
When we finished yesterday:
> checking R code for possible problems ... NOTE
uss_make_matches: no visible binding for global variable ‘tier’
uss_make_matches: no visible binding for global variable ‘Season’
uss_make_matches: no visible binding for global variable ‘Date’
uss_make_matches: no visible binding for global variable ‘home’
uss_make_matches: no visible binding for global variable ‘visitor’
uss_make_matches: no visible binding for global variable ‘hgoal’
uss_make_matches: no visible binding for global variable ‘vgoal’
Undefined global functions or variables:
Date Season hgoal home tier vgoal visitor
0 errors ✓ | 0 warnings ✓ | 1 note x
Where does tier, etc. come from?
We know it’s a column in a data frame, but R doesn’t know that.
How do we specify “this comes from a data frame”?
The {rlang} package (Henry and Wickham 2022) provides pronouns.
In a package function, we would write:
mtcars |>
dplyr::mutate(cons_lper100km = 235.215 / .data$mpg)
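To show both pronouns side by side, here is a minimal sketch, assuming {dplyr} is installed. The helper filter_cyl() is hypothetical and not part of {ussie}; it uses .data to refer to a column and .env to refer to a function argument that happens to share the column's name.

```r
# Hypothetical helper: keep the rows of `data` whose `cyl` column
# equals the `cyl` argument.
filter_cyl <- function(data, cyl) {
  data |>
    # .data$cyl is the column in the data frame;
    # .env$cyl is the function argument
    dplyr::filter(.data$cyl == .env$cyl)
}

four_cyl <- filter_cyl(mtcars, cyl = 4)
unique(four_cyl$cyl)
```

Without the pronouns, cyl == cyl would compare the column with itself and keep every row.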
"2.1.1"
usethis::use_package("rlang")
Import .data, .env pronouns:
# adds to R/ussie-package.R
usethis::use_import_from("rlang", c(".data", ".env"))
matches.R: use .data, .env in uss_make_matches().
devtools::check() should be happy now.
A thing I like about tidyverse:
Because:
functions and arguments follow naming conventions
arguments are ordered according to purpose
we know what to expect for return values
The way we approach problems is always evolving; tidyverse is no exception:
If writing a smaller package, consider prefixing your functions:
{ussie}: uss_make_matches()
{btt22}: btt_get()
Use a verb next:
Use a noun if building up a specific type of object:
Tidyverse uses snake_case; Shiny prefers camelCase
Python prefers snake_case
JavaScript prefers:
camelCase for functions
PascalCase for classes, interfaces
Pick a convention according to your domain, follow it.
Here, mtcars is an argument:
head(mtcars)
Here, data is a formal argument:
head <- function(data){
...
}
In R, we sometimes use these terms interchangeably; we sometimes use the term formals.
¯\_(ツ)_/¯
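The distinction can be inspected from R itself. This is a small sketch using base R only; first_rows() is a hypothetical function made up for illustration.

```r
# Hypothetical function with two formal arguments: `data` and `n`.
first_rows <- function(data, n = 6) {
  head(data, n)
}

# formals() returns the formal arguments as a named list;
# the names are what appear in the function definition.
names(formals(first_rows))

# In this call, `mtcars` and `3` are the arguments supplied
# for the formals `data` and `n`.
result <- first_rows(mtcars, n = 3)
nrow(result)
```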
Like naming functions, strive to be:
There are only two hard things in Computer Science: cache invalidation and naming things.
– Phil Karlton
And off-by-one errors – Leon Bambrick
dots (...): stuff that gets passed to other functions
I have seen the order of dots and details reversed.
However, data and descriptors almost always come first.
Which are: data, descriptors, details?
# there are actually more args...
pivot_longer <- function(
data,
cols,
names_to = "name",
names_prefix = NULL
) {
...
}
Which are: data, descriptors, details?
# there are actually more args...
pivot_longer <- function(
data, # data
cols, # descriptor
names_to = "name", # details
names_prefix = NULL # details
) {
...
}
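The same ordering can be sketched with a small base R function. summarise_column() is hypothetical, written only to illustrate data, descriptor, dots, and details in that order.

```r
# Hypothetical function illustrating argument ordering:
#   data   - the data frame to work on
#   col    - descriptor: name of the column, as a string
#   ...    - dots: passed on to mean()
#   digits - detail: optional, has a default
summarise_column <- function(data, col, ..., digits = 2) {
  value <- mean(data[[col]], ...)
  round(value, digits)
}

# Only data and descriptor are needed for a basic call
mpg_mean <- summarise_column(mtcars, "mpg")

# na.rm travels through the dots to mean(); because digits comes
# after the dots, it has to be named
ozone_mean <- summarise_column(airquality, "Ozone", na.rm = TRUE, digits = 1)
```

Placing details after the dots forces callers to name them, which keeps calls readable.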
This is a key to tidyverse.
Type of return-value depends only on the types of the inputs.
Avoid return_tibble = TRUE arguments.
return same type as data (first) argument
return constant type, e.g. double
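A well-known base R illustration of the difference is sapply() versus vapply(). The two wrapper functions below are hypothetical names for this sketch.

```r
# sapply() picks its return type from the *values* it sees,
# so an empty input gives a list rather than an integer vector.
lengths_unstable <- function(x) {
  sapply(x, length)
}

# vapply() is told the return type up front, so the result
# is always an integer vector.
lengths_stable <- function(x) {
  vapply(x, length, FUN.VALUE = integer(1))
}

class(lengths_unstable(list(1:3, 1:2)))
class(lengths_unstable(list()))
class(lengths_stable(list(1:3, 1:2)))
class(lengths_stable(list()))
```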
When I think of tidyverse functions, I can remember type for:
data (first) argument
return value
For example:
dplyr::mutate(): tibble -> tibble
tidyr::pivot_longer(): tibble -> tibble
tibble -> tibble pattern makes it easy to work with the pipe: |>
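As a rough sketch of why this pattern composes well, here are two hypothetical functions that each take a data frame first and return a data frame. Base data frames stand in for tibbles so that the example needs no packages; the conversion constant is the one used earlier in these slides.

```r
# Hypothetical: data frame in, data frame out (adds a column)
add_consumption <- function(data) {
  data$cons_lper100km <- 235.215 / data$mpg
  data
}

# Hypothetical: data frame in, data frame out (keeps some rows)
keep_thrifty <- function(data, max_lper100km) {
  data[data$cons_lper100km <= max_lper100km, , drop = FALSE]
}

# Because each step returns what the next step expects as its
# first argument, the steps chain with the pipe.
thrifty <- mtcars |>
  add_consumption() |>
  keep_thrifty(max_lper100km = 10)
```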
"2.1.2"
Implement a function, uss_get_matches():
given country, return a matches tibble
btt22::btt_reset_hard("2.1.2")
Get new files, btt22::btt_get("2.1.2"):
columns.R, get-matches.R
test-get-matches.R
usethis::use_package("engsoccerdata")
columns.R
In {ussie} we (will) have all sorts of tibbles:
and groupings:
columns.R:
devtools::load_all()
build_time, run_time()
If you need to delay evaluation, try a function.
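The actual contents of columns.R are not shown here, so the definitions below are an assumption about what build_time and run_time() might look like. The sketch only illustrates the difference between a value computed once and a function evaluated on each call.

```r
# Assumed definition: evaluated once, when the file is sourced
# (in a package, when the package is built).
build_time <- Sys.time()

# Assumed definition: the body is evaluated every time the
# function is called.
run_time <- function() {
  Sys.time()
}

# build_time stays fixed; run_time() reflects the moment of the call
build_time
run_time()
```

In a package, a top-level object like build_time is frozen at build time, which is rarely what you want for values that depend on the current session.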
get_soccer_data()
Given name of dataset in {engsoccerdata}, return dataset:
uss_countries()
Return set of valid values for country:
uss_countries <- function() {
c("england", "germany", "holland", "italy", "spain")
}
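One way such a set could be used is to validate a country argument. This is a sketch only: validate_country() is a hypothetical helper, not a function from {ussie}, and it relies on base R's match.arg().

```r
# Copied from the slide above so the example runs on its own
uss_countries <- function() {
  c("england", "germany", "holland", "italy", "spain")
}

# Hypothetical helper: return `country` if it is a valid value,
# otherwise throw an error listing the valid choices.
validate_country <- function(country) {
  match.arg(country, choices = uss_countries())
}

validate_country("italy")

# An invalid value raises an error; try() captures it here
bad <- try(validate_country("france"), silent = TRUE)
inherits(bad, "try-error")
```

Note that match.arg() allows partial matching, so "ital" would also resolve to "italy".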
Safer habit: delay evaluation by wrapping code in a function
best_wins_leeds()
usethis::use_package("engsoccerdata")
devtools::check(), comment best_wins_leeds()
restore commented-out code
uss_get_matches()
Given country, return matches data:
Show an example (or two) of using uss_get_matches() in your package vignette.
A function where:
the return value depends only on argument values
only change is the return value
Examples:
function(x, y) {
x + y
}
cos(pi)
A function where:
the return value can depend on “the outside universe”
there is a change in “the outside universe”
Examples:
readr::read_csv("myfile.csv")
runif(1)
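The contrast can be checked directly. In this base R sketch, add() is a hypothetical pure function, while runif() both depends on and changes the state of the random-number generator.

```r
# Pure: same arguments always give the same return value,
# and nothing else changes.
add <- function(x, y) {
  x + y
}
identical(add(1, 2), add(1, 2))

# Side effects: the same call gives different values, because each
# call reads and then advances the random-number generator state.
set.seed(1)
a <- runif(1)
b <- runif(1)
identical(a, b)
```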
Are these {ussie} functions pure?
uss_countries()
uss_make_matches()
get_soccer_data()
uss_get_matches()
Try to separate tasks into pure functions and side effects:
easier to test the pure functions and side effects separately
use these functions in higher-level functions
For example:
uss_make_matches() is a pure function.
get_soccer_data() uses side effects.
uss_get_matches() calls each of these functions.
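The same separation can be sketched with base R alone. The three functions below are hypothetical stand-ins, not the {ussie} implementations; only the column names hgoal and vgoal are borrowed from the check output earlier.

```r
# Side effect: the result depends on a file on disk
read_scores <- function(path) {
  utils::read.csv(path)
}

# Pure: the result depends only on the data frame passed in
add_goal_diff <- function(scores) {
  scores$goal_diff <- scores$hgoal - scores$vgoal
  scores
}

# Higher-level function: combines the side effect and the pure step
get_goal_diff <- function(path) {
  path |>
    read_scores() |>
    add_goal_diff()
}

# The pure function can be tested with an in-memory data frame...
in_memory <- add_goal_diff(data.frame(hgoal = c(2, 0), vgoal = c(1, 3)))

# ...while the full pipeline needs a file
path <- tempfile(fileext = ".csv")
utils::write.csv(
  data.frame(hgoal = c(2, 0), vgoal = c(1, 3)),
  path,
  row.names = FALSE
)
from_file <- get_goal_diff(path)
```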
Hadley’s keynote at rstudio::conf(2017):
not available on YouTube 😢
talks about tidyverse design
Joe Cheng’s talks (Part 1, Part 2) on reactivity at Shiny Developers Conference (2016), precursor to rstudio::conf():
these were the talks that changed my (Ian’s) perspective on programming
pure functions vs. side effects