Methods for Reproducibility in Biomedical Informatics Research
Course Description & Objectives
Course Description
There is an ongoing reproducibility crisis in science. Even computational research, which avoids the experimental variation introduced by animal samples or environmental conditions, has fallen victim to poor reproducibility. For example, a recent study reported that the code from only 24% of Jupyter notebooks shared on GitHub, the most prominent platform for hosting version-controlled code, was successfully re-executed (Pimentel, J. 2019).
This course aims to teach incoming biomedical informatics and data science students to recognize the importance of scientific reproducibility and its crucial role in advancing scientific research. Throughout the quarter-long course, students will develop and execute their own fully reproducible analytic pipeline. Specifically, students will construct a pipeline that runs a predictive model to answer a research question of their choosing. Students may choose to work with publicly available omics or health-based datasets.
Each week the course will introduce a new concept and its utility for ensuring scientific reproducibility. Students will have the opportunity to integrate this knowledge into their course projects. Concepts will be taught sequentially, each building on the last, and will cover the following:
- Acquisition and storage of data
- Structuring their computational workflow
- Incorporating GitHub for version control
- Documenting their work
- Testing to ensure reproducibility
This class ensures that early-stage biomedical informatics students develop a deep understanding of scientific reproducibility before embarking on dissertation research projects and their future careers.
Course Objectives
By the end of the course, students will be able to:
- Recognize the importance of scientific reproducibility and the potential pitfalls when research is not reproducible
- Describe best practices and tools for ensuring scientific reproducibility in biomedical informatics research
- Implement popular tools such as version control (GitHub) and best practices for data acquisition, code documentation, project structure and organization, and testing to ensure scientific reproducibility
- Execute a reproducible computational pipeline that implements the tools and best practices learned throughout the course
Review additional course materials: