Libraries to offer free virtual workshops on R programming language

UNIVERSITY PARK — Penn State University Libraries is presenting a series of free virtual workshops March 21 to April 20 on research reproducibility in R, a statistical programming language used to wrangle data sets, manage analysis workflows, conduct statistical analyses and create data visualizations.

This series will offer hands-on training in fundamental coding skills, statistical data analysis, data visualization and data management strategies to support research reproducibility. Participants can expect to learn how to wrangle data into an analysis-ready format, use R packages and connections to Git and GitHub to manage projects, use the ggplot2 package to create data visualizations, and implement statistical analyses to understand trends and differences in data.

The workshops are open to Penn State graduate students, postdoctoral scholars, staff and faculty. No previous knowledge of R is required.

Because the workshops build upon one another, participants in each workshop are expected to have attended the previous session. The “Basics of R and RStudio” workshop is optional but recommended for those who have never used R or RStudio before.

Participants will need access to a computer with a Macintosh, Linux or Windows operating system and administrative privileges to download and install applications.

The registration deadline is March 17. Participants must register in advance.

For more information, contact Briana Ezray Wham, research data librarian for STEM (science, technology, engineering and mathematics), at bde125@psu.edu.

Following is a schedule of workshop topics and dates:

Basics of R and RStudio — Monday, March 21, 1–2 p.m.

This session will introduce R and RStudio, walk through the platform interface, and discuss how the software supports reproducible research practices.

Data Wrangling in R — Wednesday, March 23, 1–3 p.m.

This overview of data wrangling will show you how to:

  • Use basic indexing and functions for tasks such as setting a working directory and loading data and packages
  • Wrangle (manage, clean and transform) data into tidy format or create new variables
  • Handle string and date/time data
  • Find resources to support the analysis you would like to conduct

Data Management and Research Reproducibility in RStudio — Wednesday, March 30, 1–3 p.m.

This session will provide an overview of data management strategies for a reproducible analysis and output workflow to facilitate transparent and reproducible research and support open data sharing. Participants will learn:

  • Best practices for code documentation, file organization, naming and versioning
  • How to connect RStudio to Git and GitHub
  • How to implement a reproducible, project-based data analysis and output workflow
  • How to document data exploration, analysis and visualization in publishable documents using RMarkdown

Participants will apply data management and reproducibility strategies from previous workshops to their R workflow.

Data Visualization in R — Wednesday, April 6, 1–3 p.m.

This session focuses on data communication. Because sharing research results is a critical step in any research project, effective communication is vital. Tools such as RMarkdown and ggplot2 can be used to document data exploration, analysis and visualization steps and create meaningful data visualizations. This workshop will show you how to:

  • Identify the traits of effective data visualizations
  • Use ggplot2 to create numerous types of plots and perfect them iteratively

Participants will use their reproducible, project-based data analysis workflow to apply data wrangling tasks learned in previous sessions to prepare data and create visualizations.

Statistical Data Analysis in R — Wednesday, April 13, 1–3 p.m.

Implementing statistical analyses to understand trends and differences in data is an important tool for effective research communication. This workshop will cover exploratory data analysis, including methods for testing hypotheses, computing confidence intervals and reporting results. Participants will test and apply their R programming skills to basic data analyses.

Bring Your Data Day! Wednesday, April 20, time TBD

Workshop participants will have a chance to meet with the instructors and put their new skills to use by applying them to their own data.

Last Updated March 18, 2022
