rstudio::conf(2022)
Designing the data science classroom
Mine Çetinkaya-Rundel + Maria Tackett
Dr. Mine Çetinkaya-Rundel
Dr. Maria Tackett
Introduce yourselves:
One link for all materials
Time | Activity |
---|---|
09:00 - 10:30 | Hello #teachds |
10:30 - 11:00 | Coffee break |
11:00 - 12:30 | Teaching data science with the tidyverse |
12:30 - 13:30 | Lunch break |
13:30 - 15:00 | Teaching modern modeling with tidymodels |
15:00 - 15:30 | Coffee break |
15:30 - 17:00 | Interactivity and immediate feedback with learnr |
Time | Activity |
---|---|
09:00 - 10:30 | Computing infrastructure with RStudio Cloud |
10:30 - 11:00 | Coffee break |
11:00 - 12:30 | Reproducible workflows: Quarto, Git, GitHub |
12:30 - 13:30 | Lunch break |
13:30 - 15:00 | Making a data package |
15:00 - 15:30 | Coffee break |
15:30 - 17:00 | Organizing teaching materials + Wrap-up / Q&A |
Username: conf22
Password: together!
All details are available at https://www.rstudio.com/conference/2022/2022-conf-code-of-conduct/. Please review them carefully.
You can report Code of Conduct violations in person (to any rstudio::conf staff member), by email (conf@rstudio.com), or by phone (844-448-1212). Please see the policy linked above for contact information.
COVID-19 specific policies:
RStudio requires that you wear a mask that fully covers your mouth and nose at all times in all public spaces.
We strongly recommend that you use a correctly fitted N95, KN95, or similar particulate filtering mask; there is a limited supply available upon request.
There are gender neutral bathrooms by the National Harbor rooms.
The meditation room is located at National Harbor 9. Open 8am - 5pm, Monday - Thursday. The hotel also has a dedicated room behind the reception.
The lactation room is located at Potomac Dressing Room. Open 8am - 5pm, Monday - Thursday.
Participants who do not wish to be photographed have red lanyards. Please note everyone’s lanyard color before taking a photo and respect their choices.
I’m stuck
I’m done
I have a general question
You should have received an email with an invitation and instructions for joining the conference’s Discord server.
This workshop has a private channel under Workshops:
#📚designing-the-data-science-classroom
This is a great place to ask questions, share responses to exercises, post resources, memes, or most anything else before, during, and after the workshop.
Take a minute to
You can use the following link to join the workshop’s RStudio Cloud space:
Once you have joined, navigate to Projects on the top menu.
If you’d like to use your own system, please see https://rstudio-conf-2022.github.io/teach-ds/#install.
Imagine you’re new to baking, and you’re in a baking class. I’m going to present two options for starting the class. Which one gives you a better sense of the final product?
Today we’re going to make a pineapple and coconut sandwich sponge cake with these ingredients
Today we’re going to make a pineapple and coconut sandwich sponge cake with these ingredients
Set goals for educational curriculum before choosing instructional methods + forms of assessment
2016 Guidelines for Assessment and Instruction in Statistics Education
Teach statistics as an investigative process of problem-solving and decision making.
Give students experience with multivariable thinking […] to answer challenging questions that require them to investigate and explore relationships among many variables.
NOT: teach a commonly used subset of tests and intervals and have students produce them with hand calculations
Multivariate analysis requires the use of computing
NOT use technology that is only applicable in the intro course or that doesn’t follow good science principles
Data analysis isn’t just inference and modelling, it’s also data importing, cleaning, preparation, exploration, and visualization
Discuss in pairs and then as a large group.
Go to rstd.io/teach-ds-conf22-cloud to join the RStudio Cloud workspace for this workshop > Log in > Project (top left) > Start “Module 1 - Hello” > ex-1-1.qmd
Open the Quarto document called ex-1-1.qmd, render the document, and view the result. Then, change “Turkey” to another country, and render again.
Discuss with your neighbor:
Change Turkey to a different country, and plot again:
library(tidyverse)  # dplyr verbs and ggplot2
library(lubridate)  # year()
library(unvotes)    # un_votes, un_roll_calls, un_roll_call_issues

un_votes |>
  filter(country %in% c("United States", "Turkey")) |>
  inner_join(un_roll_calls, by = "rcid") |>
  inner_join(un_roll_call_issues, by = "rcid") |>
  # as.character() keeps the issue names if issue is stored as a factor;
  # without it, ifelse() would return the factor's integer codes
  mutate(issue = ifelse(issue == "Nuclear weapons and nuclear material",
                        "Nuclear weapons and materials", as.character(issue))) |>
  group_by(country, year = year(date), issue) |>
  summarize(
    votes = n(),
    percent_yes = mean(vote == "yes")
  ) |>
  # keep only country-year-issue combinations with more than 5 votes
  filter(votes > 5) |>
  ggplot(mapping = aes(x = year, y = percent_yes, color = country)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  facet_wrap(~ issue) +
  labs(
    title = "Percentage of Yes votes in the UN General Assembly",
    subtitle = "1946 to 2015",
    y = "% Yes",
    x = "Year",
    color = "Country"
  )
un_votes |>
  filter(country %in% c("United States", "Turkey")) |>
  inner_join(un_roll_calls, by = "rcid") |>
  inner_join(un_roll_call_issues, by = "rcid") |>
  group_by(country, year = year(date), issue) |>
  summarize(
    votes = n(),
    perc_yes = mean(vote == "yes")
  ) |>
  filter(votes > 5) |>
  ggplot(mapping = aes(x = year, y = perc_yes, color = country)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  facet_wrap(~ issue) +
  labs(
    title = "Percentage of Yes votes in the UN General Assembly",
    subtitle = "1946 to 2015",
    y = "% Yes", x = "Year", color = "Country"
  )
un_votes |>
  filter(country %in% c("United States", "France")) |>
  inner_join(un_roll_calls, by = "rcid") |>
  inner_join(un_roll_call_issues, by = "rcid") |>
  group_by(country, year = year(date), issue) |>
  summarize(
    votes = n(),
    perc_yes = mean(vote == "yes")
  ) |>
  filter(votes > 5) |>
  ggplot(mapping = aes(x = year, y = perc_yes, color = country)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  facet_wrap(~ issue) +
  labs(
    title = "Percentage of Yes votes in the UN General Assembly",
    subtitle = "1946 to 2015",
    y = "% Yes", x = "Year", color = "Country"
  )
RStudio Cloud > “Module 1 - Hello” > ex-1-2.qmd
Use the ggplot2::diamonds or dplyr::starwars dataset, or any dataset from the nycflights13 or gapminder packages.
Compare notes with your neighbor. Share your exercise on Discord.
The following code is used to create the multivariate visualisation we saw earlier. How much of the code would you show or hide when you first start teaching ggplot2?
un_votes |>
  filter(country %in% c("United States")) |>
  inner_join(un_roll_calls, by = "rcid") |>
  inner_join(un_roll_call_issues, by = "rcid") |>
  mutate(
    importantvote = ifelse(importantvote == 0, "No", "Yes"),
    # as.character() keeps the issue names if issue is stored as a factor
    issue = ifelse(issue == "Nuclear weapons and nuclear material", "Nuclear weapons and materials", as.character(issue))
  ) |>
  ggplot(aes(y = importantvote, fill = vote)) +
  geom_bar(position = "fill") +
  facet_wrap(~ issue, ncol = 1) +
  labs(
    title = "How the US voted in the UN",
    subtitle = "By issue and importance of vote",
    x = "Important vote", y = "", fill = "Vote"
  ) +
  theme_minimal() +
  scale_fill_viridis_d(option = "E")
Write it out to your heart’s desire and polish it
Then, split into three parts:
Finally, decide on the pace at which to scaffold and layer
We’ll call the result of the data preparation steps (everything before ggplot()) us_votes
un_votes |>
  filter(country %in% c("United States")) |>
  inner_join(un_roll_calls, by = "rcid") |>
  inner_join(un_roll_call_issues, by = "rcid") |>
  mutate(
    importantvote = ifelse(importantvote == 0, "No", "Yes"),
    issue = ifelse(issue == "Nuclear weapons and nuclear material", "Nuclear weapons and materials", issue)
  ) |>
  ggplot(aes(y = importantvote, fill = vote)) +
  geom_bar(position = "fill") +
  facet_wrap(~ issue, ncol = 1) +
  labs(
    title = "How the US voted in the UN",
    subtitle = "By issue and importance of vote",
    x = "Important vote", y = "", fill = "Vote"
  ) +
  theme_minimal() +
  scale_fill_viridis_d(option = "E")
# A tibble: 5,718 × 14
rcid country country_code vote session importantvote date unres amend para short descr short_name issue
<dbl> <chr> <chr> <fct> <dbl> <chr> <date> <chr> <int> <int> <chr> <chr> <chr> <chr>
1 6 United States US no 1 No 1946-01-04 R/1/107 0 0 DECLARATION OF HUMAN RIGHTS "TO ADOPT A CUBAN PROPOSAL (A/3-C) THAT AN ITEM ON A DEC… hr 4
2 8 United States US no 1 No 1946-01-05 R/1/297 1 0 ECOSOC POWERS "TO ADOPT A SECOND 6TH COMM. AMENDMENT (A/14) TO THE PRO… ec 3
3 11 United States US yes 1 No 1946-02-05 R/1/376 0 0 TRUSTEESHIP AMENDMENTS "TO ADOPT DRAFT RESOLUTIONS I AND II AS A WHOLE, OF THE … co 1
4 11 United States US yes 1 No 1946-02-05 R/1/376 0 0 TRUSTEESHIP AMENDMENTS "TO ADOPT DRAFT RESOLUTIONS I AND II AS A WHOLE, OF THE … ec 3
5 18 United States US no 1 No 1946-02-03 R/1/532 1 0 ECOSOC CONSULTANTS "TO ADOPT USSR (ORAL) AMENDMENT REPLACING THE 1ST COMM. … ec 3
6 19 United States US yes 1 No 1946-02-03 R/1/534 0 0 ECOSOC CONSULTANTS "TO ADOPT THE 1ST COMM. DRAFT RESOLUTION (A/54/REV.1) PR… ec 3
7 24 United States US yes 1 No 1946-12-05 R/1/1229 0 0 ECOSOC ELECTIONS "TO ADOPT BELGIAN ORAL PROPOSAL TO SURRENDER BELGIUM'S S… ec 3
8 26 United States US no 1 No 1946-12-06 R/1/1286 0 0 TRUSTEESHIP AGREEMENTS "TO ADOPT USSR ORAL RESOL. REJECTING 8 DRAFT TRUSTEESHIP… co 1
9 27 United States US yes 1 No 1946-12-06 R/1/1287/A 0 0 NEW GUINEA TRUSTEESHIP "TO ADOPT THE TRUSTEESHIP AGREEMENT FOR NEW GUINEA SUBMI… co 1
10 28 United States US yes 1 No 1946-12-06 R/1/1287/B 0 0 RUANDA-URUNDI TRUSTEESHIP "TO ADOPT THE TRUSTEESHIP AGREEMENT FOR RUANDA-URUNDI SU… co 1
# … with 5,708 more rows
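A minimal sketch of how the split might look in practice, assuming the data come from the unvotes package. The three parts chosen here (data preparation, core plot, polish) and the object names `p_core` and `p_final` are illustrative assumptions, not taken from the slides. The `as.character()` call is a defensive addition: if `issue` is stored as a factor, a plain `ifelse()` would return integer codes instead of issue names.

```r
library(dplyr)
library(ggplot2)
library(unvotes)  # assumed source of un_votes, un_roll_calls, un_roll_call_issues

# Part 1: data preparation, stored once as us_votes (could be hidden from students)
us_votes <- un_votes |>
  filter(country %in% c("United States")) |>
  inner_join(un_roll_calls, by = "rcid") |>
  inner_join(un_roll_call_issues, by = "rcid") |>
  mutate(
    importantvote = ifelse(importantvote == 0, "No", "Yes"),
    issue = ifelse(issue == "Nuclear weapons and nuclear material",
                   "Nuclear weapons and materials", as.character(issue))
  )

# Part 2: the core plot (the part students would work with first)
p_core <- ggplot(us_votes, aes(y = importantvote, fill = vote)) +
  geom_bar(position = "fill") +
  facet_wrap(~ issue, ncol = 1)

# Part 3: polish (labels, theme, color scale), layered on later
p_final <- p_core +
  labs(
    title = "How the US voted in the UN",
    subtitle = "By issue and importance of vote",
    x = "Important vote", y = "", fill = "Vote"
  ) +
  theme_minimal() +
  scale_fill_viridis_d(option = "E")

p_final
```

Because each part is a separate object, the instructor can decide part by part what to show, what to hide, and when to introduce it.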
Today we’re going to do web scraping
Estimate the difference between the average evaluation score of male and female faculty.
Welch Two Sample t-test
data: evals$score by evals$gender
t = -2.7507, df = 398.7, p-value = 0.006218
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
-0.24264375 -0.04037194
sample estimates:
mean in group female mean in group male
4.092821 4.234328
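Output in this format comes from base R's `t.test()`. A sketch of a call that would produce it, assuming `evals` is the course evaluations data from the openintro package (the slides do not say where `evals` comes from):

```r
library(openintro)  # assumption: provides the evals data frame

# Welch two-sample t-test; t.test() does not assume equal variances by default
res <- t.test(evals$score ~ evals$gender)
res

# 95% confidence interval for the difference in means (female minus male)
res$conf.int
```

Note that the interval is reported as female minus male, following the order of the factor levels.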
The objective of this package is to perform statistical inference using an expressive statistical grammar that coheres with the tidyverse design framework.
Response: score (numeric)
Explanatory: gender (factor)
# A tibble: 46,300 × 3
# Groups: replicate [100]
replicate score gender
<int> <dbl> <fct>
1 1 4 female
2 1 3.1 male
3 1 5 male
4 1 4.4 male
5 1 3.5 female
6 1 4.5 female
7 1 4.5 male
8 1 4.9 male
9 1 4.4 male
10 1 3.5 male
# … with 46,290 more rows
set.seed(1234)
evals |>
  specify(score ~ gender) |>
  generate(reps = 100, type = "bootstrap") |>
  calculate(stat = "diff in means", order = c("male", "female"))
Response: score (numeric)
Explanatory: gender (factor)
# A tibble: 100 × 2
replicate stat
<int> <dbl>
1 1 0.230
2 2 0.134
3 3 0.100
4 4 0.230
5 5 0.128
6 6 0.201
7 7 0.168
8 8 0.130
9 9 -0.00490
10 10 0.123
# … with 90 more rows
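To finish the estimation task, the bootstrap distribution can be turned into an interval. The following is a sketch using infer's `get_confidence_interval()`, again assuming `evals` comes from the openintro package. The object names `boot_dist` and `ci` are illustrative, and 100 replicates is kept only to match the code above.

```r
library(infer)
library(openintro)  # assumption: provides the evals data frame

set.seed(1234)

# Bootstrap distribution of the difference in mean scores (male minus female)
boot_dist <- evals |>
  specify(score ~ gender) |>
  generate(reps = 100, type = "bootstrap") |>
  calculate(stat = "diff in means", order = c("male", "female"))

# 95% percentile interval from the bootstrap distribution
ci <- boot_dist |>
  get_confidence_interval(level = 0.95, type = "percentile")
ci

# Histogram of the bootstrap statistics with the interval shaded
visualize(boot_dist) +
  shade_confidence_interval(endpoints = ci)
```

Here the difference is male minus female, so the interval has the opposite sign from the `t.test()` output, which reports female minus male.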
Do it all in R!
🔗 rstd.io/teach-ds-conf22 / Module 1