Designing the data science classroom


🗓️ July 25 and 26, 2022

⏰ 09:00 - 17:00

🏨 Maryland 3

✍️ Click here to register


Overview


There has been significant innovation in introductory statistics and data science courses to equip students with the statistical, computing, and communication skills needed for modern data analysis. Innovating subsequent courses is also important, so students can continue developing these skills beyond the first course. In this session, we’ll present a modern approach to teaching undergraduate regression analysis, the second statistics course for many students. We’ll share strategies for using real-world data sets and examples, teaching modern computing skills, and incorporating non-technical skills such as writing and effective collaboration as part of the course. We’ll share example activities and assignments, along with a demo of the computing toolkit using the R tidymodels package, Quarto for reproducible reports, and Git and GitHub for version control and collaboration. The activities and demo will be hands-on; attendees will also have the opportunity to exchange ideas and ask questions throughout the session.

Success in data science and statistics is dependent on the development of both analytical and computational skills, and the demand for educators who are proficient at teaching both these skills is growing. The goal of this workshop is to equip educators with concrete information on content, workflows, and infrastructure for painlessly introducing modern computation with R and RStudio within a data science curriculum.

In addition to gaining technical knowledge, participants will engage in discussion around the decisions that go into developing a data science curriculum and choosing workflows and infrastructure that best support the curriculum and allow for scalability. Workshop attendees will work through several exercises from existing courses and get first-hand experience with doing and teaching data science with the tidyverse and tidymodels packages. Attendees will also get experience using relevant tool-chains and techniques, including running a course on RStudio Cloud, literate programming with Quarto, and workflows for collaboration, version control, and automated feedback with Git/GitHub. We will also discuss best practices for configuring and deploying classroom infrastructures to support these tools.

This workshop is aimed primarily at participants teaching data science in an academic setting in semester-long courses. However, much of the information and tooling we introduce also applies to shorter teaching experiences like workshops and bootcamps. Basic knowledge of R is assumed and familiarity with the tidyverse and Git is preferred.

Learning objectives

Curriculum, workflow, and infrastructure design for teaching data science with R and RStudio.

Is this course for me?

If your answer to any of the following questions is “yes”, then this is the right workshop for you.

  • Do you want to learn / discuss curriculum, pedagogy, and computing infrastructure design for teaching data science with R and RStudio?

  • Are you interested in setting up your class in RStudio Cloud?

  • Do you want to integrate version control with Git into your teaching and learn about tools and best practices for running your course on GitHub?

Prework

In this workshop you will wear two hats: the educator and the learner. At times we will be demoing workflows for instructors (who I assume are familiar with R, RStudio, and the tidyverse and have taught with or are interested in teaching with RStudio Cloud, Git, and GitHub) and at other times you will be working through the student view (on RStudio Cloud, assuming you’re not using your local setup).

Watch

For the latter, you can mostly assume that you’re a student in an intro data science course and this is the first day of class (i.e., there’s no expectation of prep). However, we’d also like to give you a taste of how the remainder of the semester goes, which often entails students watching background videos to prepare. To prepare for the first module, please watch the following video.

🖥️ Slides

Set up

For the former, you might need to prepare a bit more. If you are already a Git and GitHub user, have a personal access token saved, and can push to and pull from GitHub without having to type your password each time, you’re good to go. If not, please complete the following steps before the workshop¹. If you need help with any of the steps, please ask on GitHub Discussions.
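If you are unsure whether your machine is already set up, a quick check from the R console can help. The following is a minimal sketch, not part of the workshop's official steps: it only checks that Git is on your PATH and that a global user name and email are configured. The helper name `git_config_value` and the `status` list are made up for this illustration. The optional last step assumes the usethis package is installed and calls `usethis::git_sitrep()`, which prints a fuller report that includes whether a GitHub personal access token was found.

```r
# Minimal pre-workshop check (a sketch under the assumptions stated above).

# Is a Git executable on the PATH? Sys.which() returns "" when it is not found.
has_git <- unname(nzchar(Sys.which("git")))

# Hypothetical helper: read one global Git config value.
# Returns NA if Git is missing or the key is not set.
git_config_value <- function(key) {
  if (!has_git) {
    return(NA_character_)
  }
  out <- suppressWarnings(
    system2(
      "git",
      c("config", "--global", "--get", key),
      stdout = TRUE,   # capture the value as a character vector
      stderr = FALSE   # discard error messages
    )
  )
  if (length(out) == 0) NA_character_ else out[[1]]
}

status <- list(
  git_installed = has_git,
  user_name     = git_config_value("user.name"),
  user_email    = git_config_value("user.email")
)
print(status)

# Optional: a fuller situation report, only if usethis is available
# and you are working in an interactive session.
if (interactive() && requireNamespace("usethis", quietly = TRUE)) {
  usethis::git_sitrep()
}
```

If `git_installed` is `FALSE` or either config value is `NA`, the setup steps referenced above are the place to start.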

Install

If you choose to use your local R and RStudio install for the workshop, you should also install the following packages:

  • From CRAN:

    install.packages("devtools")
    install.packages("DT")
    install.packages("fivethirtyeight")
    install.packages("gapminder")
    install.packages("ghclass")
    install.packages("learnr")
    install.packages("lubridate")
    install.packages("nycflights13")
    install.packages("openintro")
    install.packages("rvest")
    install.packages("shiny")
    install.packages("sortable")
    install.packages("tidyverse")
    install.packages("tidymodels")
    install.packages("unvotes")
  • From GitHub:

    devtools::install_github("rundel/learnrhash")
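If you already have some of these packages, you may prefer to install only the ones that are missing. The following is a minimal sketch using base R only. The helper name `missing_packages` is hypothetical, and the package vector simply repeats the CRAN list above.

```r
# CRAN packages listed above for a local install
cran_pkgs <- c(
  "devtools", "DT", "fivethirtyeight", "gapminder", "ghclass",
  "learnr", "lubridate", "nycflights13", "openintro", "rvest",
  "shiny", "sortable", "tidyverse", "tidymodels", "unvotes"
)

# Hypothetical helper: return the subset of `pkgs` that is not installed
missing_packages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

to_install <- missing_packages(cran_pkgs)

# Install only what is missing, and only in an interactive session
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

Running this a second time should leave `to_install` empty once everything has installed successfully.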

RStudio Cloud

You can join the RStudio Cloud workspace for this workshop at rstd.io/teach-ds-conf22-cloud.

Instructors


Dr. Mine Çetinkaya-Rundel (she/her) is Professor of the Practice at Duke University and Developer Educator at RStudio. Mine’s work focuses on innovation in statistics and data science pedagogy, with an emphasis on computing, reproducible research, student-centered learning, and open-source education as well as pedagogical approaches for enhancing retention of women and under-represented minorities in STEM. Mine works on integrating computation into the undergraduate statistics curriculum, using reproducible research methodologies and analysis of real and complex datasets. Mine also works on the OpenIntro project, whose mission is to make educational products that are free and transparent and that lower barriers to education. As part of this project she co-authored four open-source introductory statistics textbooks. She is also the creator and maintainer of datasciencebox.org and she teaches the popular Statistics with R MOOC on Coursera. Mine is a Fellow of the ASA and Elected Member of the ISI as well as the winner of the 2021 Robert V. Hogg Award for Excellence in Teaching Introductory Statistics.


Dr. Maria Tackett is an Assistant Professor of the Practice in the Department of Statistical Science at Duke University. Prior to joining the faculty at Duke, Maria earned a Ph.D. in Statistics from the University of Virginia and worked as a statistician at Capital One. Her current work focuses on understanding how active learning strategies can be used to promote engagement and student motivation in undergraduate statistics courses. She also studies how classroom practices in introductory math and statistics courses impact students’ sense of community, self-efficacy, and learning outcomes. Maria is an RStudio certified trainer and is actively involved in the R and statistics education communities.

Footnotes

  1. These steps will direct you to relevant chapters from Happy Git with R by Jenny Bryan et al. It’s such a great and comprehensive resource that I have not felt the need to replicate the information elsewhere.