Many of the ideals and methods of reproducible research can be, and often are, applied to data journalism.

At the New Zealand Herald we have been putting some of these principles to work, automation in particular: we try to ensure that any interactive article can be completely rebuilt and deployed with a single command when the source data is updated.

All our interactive articles are based on a template that incorporates most of the logic we need to do this.


We also share the analysis and working behind some of our articles. The most significant of these from last year was this vaccination rate R notebook, which supported this article.

However, I have not yet found a general approach that I am happy with, so I have been experimenting with another R-based reproducible research tool, Workflowr.

Workflowr is one of a number of tools designed to make it easier to share not only the final results of an analysis, but also the steps taken along the way. This is something I would like to be able to do with our data journalism.


Workflowr’s vignettes are fantastic and provide a good guide to getting started with the package. Here I just want to outline the steps I use to get a project up and running with Workflowr, along with drake, renv, and git.

Setup

  • Workflowr can create project directories and set up git repositories, but I prefer to manage my own. This step assumes git is installed.

    git init project-name
  • I use renv to manage a project’s dependencies. This step needs R with the renv package installed.

    cd project-name
    Rscript -e 'renv::init()'
  • Install the R packages that most projects need.

    Rscript -e "renv::install(c('drake', 'workflowr', 'tidyverse'))"

I usually create a new RStudio project pointed at the git repository at this stage.

Initialise Workflowr

  • Initialise the workflowr project

    Assuming you have an R session running in your project directory, run the following commands.

    library(workflowr)
    wflow_start('.', name = 'Project Name', existing = TRUE)

Tweak output

Assuming you follow workflowr’s conventions, Workflowr will build your analysis into a shareable website. The file that controls the appearance of the website is analysis/_site.yml. By default it looks like this:

name: "Project Name"
output_dir: ../docs
navbar:
  title: "Project Name"
  left:
  - text: Home
    href: index.html
  - text: About
    href: about.html
  - text: License
    href: license.html
output:
  workflowr::wflow_html:
    toc: yes
    toc_float: yes
    theme: cosmo
    highlight: textmate

I like to change it to:

name: "Project Name"
output_dir: ../docs
navbar:
  title: "Project Name"
  left:
  - text: Home
    href: index.html
  - text: About
    href: about.html
  - text: License
    href: license.html
  right:
  - icon: fa-github
    text: Source code
    href: https://github.com/nzherald/project-name
output:
  workflowr::wflow_html:
    toc: yes
    theme: readable
    highlight: textmate
    css: nzh-style.css
    dev: svg
    includes:
      in_header: header.html
      before_body: doc_prefix.html
      after_body: doc_suffix.html

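This has the effect of:

  • changing the theme from cosmo to readable and dropping the floating table of contents
  • pulling in the NZ Herald-specific CSS (nzh-style.css)
  • outputting images as SVG rather than PNG
  • adding a "Source code" link to the project’s GitHub repository on the right of the navbar
  • adding custom HTML snippets to the page header and to the start and end of the document body. These are usually empty, although before publication I add a draft warning to the top of each page.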
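The edited _site.yml refers to four files that workflowr does not create for you: nzh-style.css, header.html, doc_prefix.html and doc_suffix.html. Pandoc is likely to fail the build if an include file is missing, so it is worth creating them straight away. The sketch below assumes they live in analysis/ next to the .Rmd files (my understanding is that these relative paths are resolved from there), and the draft banner markup is purely illustrative, assuming the Bootstrap alert classes provided by the theme.

```shell
#!/bin/sh
# Create the extra files referenced from analysis/_site.yml.
# Assumption: they sit in analysis/, next to the .Rmd files.
set -eu

mkdir -p analysis

# touch creates any missing files as empty files and leaves
# existing files (and their contents) untouched.
touch analysis/nzh-style.css \
      analysis/header.html \
      analysis/doc_prefix.html \
      analysis/doc_suffix.html

# Before publication: overwrite doc_prefix.html with a draft banner so it
# appears at the top of every page. The markup below is illustrative only.
cat > analysis/doc_prefix.html <<'EOF'
<div class="alert alert-warning">
  <strong>Draft:</strong> this analysis has not been published yet.
</div>
EOF
```

At publication time the banner can be cleared with `: > analysis/doc_prefix.html`, which empties the file without deleting it, so the include in _site.yml keeps working.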

License

I usually fill in the license page (analysis/license.Rmd) too:

All source code and software in this repository are made available
under the terms of the
[MIT license](https://opensource.org/licenses/mit-license.html). 

Note that the data is released under different agreements; these will be detailed
here prior to publication.
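wflow_start creates analysis/license.Rmd, which is rendered to the license.html page linked from the navbar. A minimal sketch for replacing its template content with the text above, run from the project root, might look like this. The front matter is reduced to a title; workflowr’s own template may carry additional fields that you could keep.

```shell
#!/bin/sh
# Replace the template analysis/license.Rmd created by wflow_start with
# the project's license text. Run from the project root.
set -eu

mkdir -p analysis

# The quoted 'EOF' stops the shell from expanding anything in the text.
cat > analysis/license.Rmd <<'EOF'
---
title: "License"
---

All source code and software in this repository are made available
under the terms of the
[MIT license](https://opensource.org/licenses/mit-license.html).

Note that the data is released under different agreements; these will be
detailed here prior to publication.
EOF
```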

Then commit all the files and push to your git repository.

Next

The next step is to use drake to grab some data to analyse, but that is the topic of the next post.