Collaborating on reproducible manuscripts for dummies: an introduction using git, R Markdown and Zotero

This document is an introduction to a workflow that facilitates collaborating with other researchers if your aim is to apply Open Science principles.

Background

This workflow is consistent with the following Open Science principles:

  • Being as transparent as possible. In scientific publications of empirical endeavours, it is important that people can inspect how the results derive from the collected data. One way to achieve this is to include separate analysis scripts and let the readers (and the future ‘you’) sort things out for themselves. However, this still leaves a lot of room for interpretation. Ideally, every reported result can be traced back unequivocally to analyses and data. Reproducible manuscripts enable you to achieve this with relatively little effort. This will serve your future self, your colleagues, and the scientific community.

  • Building an Open infrastructure for our science. One of the Open Science foundations is inclusiveness. Building an inclusive science requires a number of things. One is writing tutorials such as this one, so that learning the required competences is within everybody’s reach. Another is to use Free/Libre Open Source Software (FLOSS) and open standards as much as possible, and use proprietary software as little as possible. This is important because one of the many divides between scientists worldwide is financial. Choosing to use proprietary software therefore excludes a part of the scientific community, either now or potentially in the future (in those cases where limited free versions of proprietary software exist).

This workflow is designed to be relatively easy to master; although some quite advanced tools are used (e.g. R Markdown and git), you only use very limited functionality from each.

The tools

This workflow uses the following tools:

  • Markdown for writing the manuscript text. Why Markdown? Because it’s much easier than HTML, and therefore much, much easier than LaTeX. Also, in R Markdown you can include HTML if desired, as well as LaTeX math expressions. In this sense, Markdown represents the sweet spot between being accessible to new users and powerful enough to fit most use cases.

  • R for the analyses. Why R? Because R is both very powerful and FLOSS. Furthermore, R has a lot of freely available handbooks and tutorials geared towards research in psychology and statistics. Finally, R has R Markdown, which allows embedding chunks of R code in Markdown documents.

  • R Markdown to integrate the analyses with the manuscript text, pretty much for the above reasons.

  • RStudio as an environment to do the writing in. RStudio allows rendering R Markdown documents with the press of a button, and integrates with git.

  • git for version/revision control and syncing. Git is FLOSS and extremely powerful, but also allows relative novices to play around with it with relatively few challenges.

  • GitLab as online headquarters and to manage the project. GitLab is a FLOSS git management system with a number of nice extras such as issue management and so-called continuous integration.

  • Zotero for reference management. Zotero is a FLOSS reference manager that has an online API, which is exactly what we need to automagically have updated references in our manuscript.

The installation

Of course, before you can collaborate on or start a project with this workflow, you need those tools installed and configured properly. This only needs to happen once on every PC: once everything’s installed and configured, you can use it in all projects.

You need the following:

  • R, git, and RStudio installed on your PC.

  • A GitLab account and a Zotero account.

Once you have installed R, git, and RStudio, RStudio should automatically find both R and git. You can check this by clicking “File” in RStudio, then clicking “New Project”; you should then be able to create a new project using Version Control. Click it, and then click the Git option. If this works, RStudio has found your git. No luck? Reboot your PC. Still no? See https://happygitwithr.com/rstudio-see-git.html.
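If you prefer to check from a terminal (for example, RStudio's Terminal tab), the sketch below reports whether `git` and `R` can be found on your PATH. It is only a rough complement to the RStudio check above: the `check_tool` helper is a hypothetical name introduced here, and a tool that is missing from the PATH may still be found by RStudio through its own settings.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found on PATH"
  fi
}

# The two tools RStudio needs to locate.
check_tool git
check_tool R
```

If either line says "not found", reinstalling the tool or rebooting, as suggested above, is usually the first thing to try.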

The project set-up

Once you have the necessary software and accounts set up, you only have to do a few things to start a new project.

  1. Log in to GitLab. Create a new project. Try to set a name that is as self-explanatory as possible (to others, not to you) but not too long, and set the project to Public (it’s Open Science, after all). Once created, you’re taken to the project overview: click the ‘Clone’ button and copy the “https” link.

  2. Open RStudio. In the File menu, select New Project, and create a new Version Control project, specifically, a Git project. Paste the clone URL that you just copied from GitLab. Pick a good place for the project (mine are all in a “Research” directory) and create it.

  3. RStudio has now created an (as yet mostly empty) directory for you. In that directory, create subdirectories to structure your project.

  4. Log in to Zotero. Create a new public group (but let only admins add and edit entries). I recommend using the same name you used for the Git project.

  5. In RStudio, create a new file of the R Script type. Go to … and copy-paste the contents.

  6. In RStudio, create a new file of the R Markdown type. This will be where you type your manuscript. As a convention, I recommend giving it a filename that’s equal to the git repo name (i.e. your project’s name with only lowercase letters and dashes instead of spaces).

  7. In the project’s root, create a new file called “.gitlab-ci.yml”, and copy-paste the contents from …
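Steps 3 and 6 can also be sketched in a terminal, run from the project directory that RStudio created. This is a minimal sketch under stated assumptions, not part of the original workflow files: the project name, the `slugify` helper, and the subdirectory names (`data`, `scripts`, `results`) are hypothetical, and keeping digits in the name goes slightly beyond the "lowercase letters and dashes" convention.

```shell
# Hypothetical helper: lowercase a project name and replace every run of
# characters other than letters and digits with a single dash.
slugify() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Step 6: derive the repo-style name to reuse as the manuscript filename.
project_name="My Research Project"   # hypothetical project name
repo_name="$(slugify "$project_name")"
echo "Repo name: $repo_name"              # prints: Repo name: my-research-project
echo "Manuscript file: $repo_name.Rmd"

# Step 3: create a hypothetical set of subdirectories in the project root.
# PROJECT_ROOT is an optional variable introduced here; it defaults to
# the current directory.
project_root="${PROJECT_ROOT:-.}"
mkdir -p "$project_root/data" "$project_root/scripts" "$project_root/results"
```

Which subdirectories you need depends on the project; the point is only to settle on a structure before files start to accumulate.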

Now, you have the project set up so that most of the tasks are automated. This means

Working in your project

Once everything’s set up, you can start working in your project.

Gjalt-Jorn Peters
Assistant Professor of Methodology and Statistics

Gjalt-Jorn Peters works at the Dutch Open University, where he teaches methodology and statistics, and does research into health psychology, specifically behavior change, in general and applied to nightlife-related risk behavior. He is involved in Dutch nightlife prevention project Celebrate Safe, where he is responsible for the Party Panel study. In addition, he develops and maintains a number of R packages.