UNIVERSITY PARK — Penn State University Libraries is presenting a series of free virtual workshops Nov. 3–17 on research reproducibility and data management using RStudio.
R is a statistical programming language that allows users to wrangle data sets, manage analysis workflows, conduct statistical analyses and create data visualizations. This series for graduate students, postdoctoral scholars, staff and faculty will teach fundamental coding skills and data management best practices to support research reproducibility, transparency and integrity. Participants will gain hands-on experience coding in the RStudio integrated development environment, using R packages and connections to tools such as Git and GitHub to manage R projects. They will also learn to create data visualizations using R packages such as ggplot2.
Because the workshops build upon one another, participants must attend the data wrangling workshop before attending the subsequent sessions on data management and reproducibility and on data visualization. The “Basics of RStudio” session is optional but recommended for those who have never used RStudio.
The series is limited to 25 participants, so advance registration is required.
For more information, contact Briana Ezray, research data librarian for STEM (science, technology, engineering and mathematics), at bde125@psu.edu.
Following is a schedule of workshop topics and dates:
Basics of RStudio — Tuesday, Nov. 3, 2–3 p.m.
This session will introduce R and RStudio, walk through the platform interface, and discuss how the software supports reproducible research practices.
Introduction to Data Wrangling Using RStudio — Wednesday, Nov. 4, 2–4 p.m.
This session will provide an overview of data wrangling. In this workshop, you will learn how to:
- Use basic indexing and functions in R for tasks such as setting a working directory and loading data and packages.
- Wrangle (manage, clean, and transform) data to put it into tidy format or to create new variables.
- Handle string and date/time data.
- Find resources to support the analysis you would like to conduct.
Data Management and Research Reproducibility Using RStudio — Friday, Nov. 13, 2–4 p.m.
This session will provide an overview of data management strategies for building a reproducible analysis and output workflow, which facilitates transparent research and supports open data sharing. In this workshop, you will:
- Learn about file organization, naming, and versioning as well as code documentation best practices.
- Connect RStudio to Git and GitHub.
- Implement a reproducible, project-based data analysis and output workflow in RStudio.
Data Visualization with RStudio — Tuesday, Nov. 17, 2–4 p.m.
The focus of this session is data communication. Sharing results is a critical step in any research project, so communicating them effectively is vital. In R, tools such as R Markdown and ggplot2 can be used to document data exploration, analysis and visualization steps and to create meaningful data visualizations. In this workshop, you will learn to:
- Identify the traits of effective data visualizations.
- Create numerous types of plots using ggplot2.
- Refine plots iteratively using ggplot2.
- Document data exploration, analysis and visualization in publishable documents using R Markdown.