Methods for Reproducibility in Biomedical Informatics Research

Lesson Plan

Lesson: Introduction to GitHub

  • Duration: 3 hours
  • Intended class size: ~20 students

Lesson Topic and Context

GitHub is a mainstay tool for supporting reproducibility in data science and research settings. In this lesson, students will be introduced to GitHub, a platform for hosting and sharing Git repositories, as an essential tool for version control and, by extension, scientific reproducibility. This course is designed for students early in biomedical informatics or data science programs who will also be conducting research. The lesson is scheduled early in the course (Lesson #2), after students have received an overview of scientific reproducibility and its importance in biomedical research (Lesson #1). It aims to give students both a conceptual understanding of GitHub and hands-on practice with it.

Lesson Objectives

By the end of the lesson, students will be able to:

  • Describe what GitHub is and how it works, and recognize its role in facilitating scientific reproducibility in biomedical informatics research.

  • Set up Git and GitHub on their own computers and use their core functions, gaining hands-on experience with how these techniques can be applied to their own research and projects.

Summary of Lesson

At the start of class, students will be given a worksheet and asked to reflect on their experiences with reproducibility by completing the first two pre-class exercises.

During the lesson, students will create a GitHub repository for their term-long project. They will then follow along with the instructor through a series of exercises that demonstrate key GitHub terms and concepts (e.g., ‘cloning’, ‘committing’, ‘pushing’, and ‘pulling’). Students will then break into pairs and practice cloning and pulling each other’s sample repositories and scripts.

After the in-class and group exercises, students will complete the remaining worksheet questions individually. These post-lesson questions give students an opportunity to demonstrate their understanding of the lesson's key terms and concepts, along with space to note any remaining points of confusion. Students will also be asked to consider how GitHub can best support their own research and term-long projects.

Lesson Timeline

00:00 - 00:10: The class will begin with a review of concepts from the previous lecture, which covered the importance of scientific reproducibility. Students will receive a worksheet and first answer the pre-class questions, which ask them to describe their own experiences with, or thoughts about, scientific reproducibility. In the last few minutes, students will be invited to share their experiences if they are comfortable doing so.

00:10 - 00:25: A short PowerPoint presentation on what GitHub is, how it is used, and how widely it is adopted in biomedical informatics and other computational research domains.

00:25 - 00:45: Students will have been asked to create a GitHub account before class. The instructor will walk students through connecting GitHub to their local RStudio environment. This process should not take long, but extra time has been built in for troubleshooting.
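Much of the setup troubleshooting in this block tends to come down to two things: Git not being installed, or Git not knowing who the student is. The following is a minimal preflight sketch, assuming Git is invoked from the command line. The function names `git_config_value` and `preflight` are hypothetical, and the sketch does not cover RStudio's own Git settings or GitHub credentials.

```python
import shutil
import subprocess
from typing import Dict, Optional


def git_config_value(key: str) -> Optional[str]:
    """Return a Git configuration value, or None if it is unset or Git is missing."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "config", "--get", key], capture_output=True, text=True
    )
    # `git config --get` exits with a non-zero status when the key is not set.
    return result.stdout.strip() if result.returncode == 0 else None


def preflight() -> Dict[str, bool]:
    """Report whether Git is installed and a commit identity is configured."""
    return {
        "git_installed": shutil.which("git") is not None,
        "user_name_set": git_config_value("user.name") is not None,
        "user_email_set": git_config_value("user.email") is not None,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check}: {'OK' if ok else 'needs attention'}")
```

A missing identity can be set with `git config --global user.name "Your Name"` and `git config --global user.email "you@example.com"` before moving on to the exercises.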

00:45 - 01:30: As a class, students will work through a series of exercises with the instructor to learn GitHub’s basic functions and become familiar with key terms, including repository, clone, commit, push, and pull.

These concepts will be demonstrated using a combination of the GitHub website, RStudio, and the command line. This portion of the class lets students and the instructor work through GitHub’s main functions together, step by step, troubleshooting specific issues as they arise.

Specific exercises include: 

  • Create a new GitHub repository

  • Clone the newly created GitHub repository onto their local computers

  • Add a simple .R script, commit it, and push it to their repository

  • Edit their script and commit the change

  • Push the new commit to their repository
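The exercise sequence above can be sketched end to end as follows. This is a minimal illustration, not the course's prescribed method. It assumes Git is on the PATH and uses a local bare repository as a stand-in for the GitHub repository, so it runs offline without credentials. The file name `hello.R`, the commit messages, the `main` branch name, and the helper names are all hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def git(*args: str, cwd: Path) -> str:
    """Run a git command in `cwd` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def run_exercises(workdir: Path) -> list:
    # 1. Create the repository: a local bare repo stands in for GitHub (assumption).
    remote = workdir / "project.git"
    remote.mkdir()
    git("init", "--bare", cwd=remote)
    git("symbolic-ref", "HEAD", "refs/heads/main", cwd=remote)

    # 2. Clone the (still empty) repository onto the "local computer".
    git("clone", str(remote), "project", cwd=workdir)
    local = workdir / "project"
    git("symbolic-ref", "HEAD", "refs/heads/main", cwd=local)
    git("config", "user.name", "Example Student", cwd=local)
    git("config", "user.email", "student@example.com", cwd=local)

    # 3. Add a simple R script, commit it, and push it to the remote.
    script = local / "hello.R"
    script.write_text('print("Hello, reproducibility!")\n')
    git("add", "hello.R", cwd=local)
    git("commit", "-m", "Add hello.R", cwd=local)
    git("push", "-u", "origin", "main", cwd=local)

    # 4. Edit the script and commit the change (-a stages tracked, modified files).
    script.write_text('greeting <- "Hello, reproducibility!"\nprint(greeting)\n')
    git("commit", "-am", "Store greeting in a variable", cwd=local)

    # 5. Push the new commit; -u above set origin/main as the upstream.
    git("push", cwd=local)

    # Commit subjects on the remote, newest first.
    return git("log", "--format=%s", "main", cwd=remote).splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_exercises(Path(tmp)))
```

In class, step 1 happens on the GitHub website and step 2 clones the repository's HTTPS or SSH URL instead of a local path. The commit, push, and log steps are the same commands.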

01:30 - 01:45: Break

01:45 - 02:15: Students will pair up and practice cloning and pulling each other’s sample repositories and scripts. If any students had trouble with the earlier exercises, the instructor will work with them individually to troubleshoot.
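The pair exercise can be sketched in the same way: one student publishes a script, the partner clones it, the first student pushes an update, and the partner pulls it. This is a minimal sketch, assuming Git is on the PATH and using a local bare repository in place of the partner's GitHub repository. The names `pair_exercise`, `make_student_clone`, `analysis.R`, and the student identities are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def git(*args: str, cwd: Path) -> str:
    """Run a git command in `cwd` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def make_student_clone(remote: Path, dest: Path, name: str) -> Path:
    """Clone `remote` into `dest` and give that copy its own commit identity."""
    git("clone", str(remote), str(dest), cwd=remote.parent)
    git("config", "user.name", name, cwd=dest)
    git("config", "user.email", f"{name.lower()}@example.com", cwd=dest)
    return dest


def pair_exercise(workdir: Path) -> str:
    # Student A's repository; a local bare repo stands in for GitHub (assumption).
    remote = workdir / "student-a.git"
    remote.mkdir()
    git("init", "--bare", cwd=remote)
    git("symbolic-ref", "HEAD", "refs/heads/main", cwd=remote)

    # Student A publishes a first version of the script.
    a = make_student_clone(remote, workdir / "a", "StudentA")
    git("symbolic-ref", "HEAD", "refs/heads/main", cwd=a)
    (a / "analysis.R").write_text("x <- c(1, 2, 3)\n")
    git("add", "analysis.R", cwd=a)
    git("commit", "-m", "First version", cwd=a)
    git("push", "-u", "origin", "main", cwd=a)

    # Student B clones the partner's repository.
    b = make_student_clone(remote, workdir / "b", "StudentB")

    # Student A pushes an update to the shared repository.
    (a / "analysis.R").write_text("x <- c(1, 2, 3)\nprint(mean(x))\n")
    git("commit", "-am", "Print the mean", cwd=a)
    git("push", cwd=a)

    # Student B pulls; with no local changes this is a fast-forward.
    git("pull", cwd=b)
    return (b / "analysis.R").read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(pair_exercise(Path(tmp)))
```

Because student B makes no local edits, the pull cannot conflict. A natural follow-up, if time allows, is to have both partners edit the same line and resolve the resulting merge conflict.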

02:15 - 02:45: Students will take time to reflect on the course and lesson activities by completing the remaining worksheet questions, which give them an opportunity to demonstrate their understanding of the lesson's key terms and concepts. Students will also have space to note any remaining points of confusion and will be asked to consider how GitHub can best support their own research and term-long projects.

02:45 - 03:00: Students will turn in their worksheets and submit a link to their sample repository. The remaining time will be spent answering final questions and helping any students who ran into technical issues during class.

Lesson Assessment

In this lesson, basic working knowledge of GitHub will be assessed by reviewing each student’s repository and its commit history (the log of changes they committed). A more detailed assessment plan and rubric can be found here.
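Reviewing a commit history for roughly 20 students can be partly scripted. The following is a minimal sketch, assuming the instructor has cloned each submitted repository locally. It reads `git log` with a tab-separated format (`%x09` is Git's escape for a tab) and applies a hypothetical "at least two commits" check, matching the initial script plus one edit from the exercises. The actual criteria live in the rubric, not here. The names `read_log`, `parse_log`, `meets_minimum`, and `Commit` are illustrative.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import List

# Short hash, author name, and subject line, separated by tabs.
LOG_FORMAT = "%h%x09%an%x09%s"


@dataclass
class Commit:
    short_sha: str
    author: str
    subject: str


def read_log(repo: Path) -> str:
    """Return the raw commit log of a local clone, newest commit first."""
    return subprocess.run(
        ["git", "log", f"--format={LOG_FORMAT}"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout


def parse_log(raw: str) -> List[Commit]:
    """Turn LOG_FORMAT output into Commit records, skipping blank lines."""
    commits = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        # maxsplit=2 keeps any tabs inside the subject line intact.
        sha, author, subject = line.split("\t", 2)
        commits.append(Commit(sha, author, subject))
    return commits


def meets_minimum(commits: List[Commit], minimum: int = 2) -> bool:
    """Hypothetical check: did the student make at least `minimum` commits?"""
    return len(commits) >= minimum
```

For a single submission, `meets_minimum(parse_log(read_log(Path("student-repo"))))` gives a quick first pass. Reading the commit messages themselves still requires a human reviewer.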

