Reproducible Journalism - using Workflowr

07 Feb, 2020 · 2 min read · In reproducible journalism

Many of the ideals and methods of reproducible research can be, and often are, applied to data journalism.

At the New Zealand Herald we have been putting some of these principles to work, automation in particular: we try to ensure that any interactive article can be completely rebuilt and deployed with a single command when the source data is updated.

All our interactive articles are based on a template that incorporates most of the logic we need to do this.


We also share the analysis and working behind some of our articles. The most significant of these last year was an R notebook on vaccination rates that supported one of our articles.

However, I have not yet found a general approach that I am happy with, so I have been experimenting with another R-based reproducible research tool, Workflowr.

Workflowr is one of a number of tools designed to make it easier to share not only the final results of an analysis, but also the steps taken along the way. This is something I would like to be able to do with our data journalism.


Workflowr's vignettes are fantastic and provide a good guide to getting started with the package. Here I just want to outline the steps I use to get a project up and running with Workflowr, drake, renv, and git.

Setup

  • Workflowr can create project directories and set up git repositories, but I prefer to manage my own. This step assumes git is installed.

    git init project-name
  • I use renv to manage each project's dependencies. This step needs R with the renv package installed.

    cd project-name
    Rscript -e 'renv::init()'
  • Install the R packages that most projects need.

    Rscript -e "renv::install(c('drake', 'workflowr', 'tidyverse'))"

I usually create a new RStudio project pointed at the git repository at this stage.
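The setup steps above assume git is installed and that R has the renv package available. A small preflight check can confirm both before anything is created. This is a minimal sketch: `check_prereqs` and `check_renv` are hypothetical helper names, not part of renv or workflowr.

```shell
# Hypothetical helper: report any command missing from PATH.
# Returns 0 if every command is found, 1 otherwise.
check_prereqs() {
  local missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: succeed only if R can load the renv package.
# Assumes Rscript is on PATH (check_prereqs runs first below).
check_renv() {
  Rscript -e 'if (!requireNamespace("renv", quietly = TRUE)) quit(save = "no", status = 1)' \
    >/dev/null 2>&1
}

if check_prereqs git Rscript && check_renv; then
  echo "ready: git, Rscript and renv found"
else
  echo "install the missing tools before running the setup steps" >&2
fi
```

Run it before `git init`; if something is missing it says so, and nothing in the project directory has been touched yet.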

Initialise Workflowr

  • Initialise the workflowr project

    Assuming you have an R session running in your project directory, run the following commands:

    library(workflowr)
    # existing = TRUE because the directory (and its git repository) already exists
    wflow_start('.', name = 'Project Name', existing = TRUE)
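`wflow_start()` should leave the standard workflowr layout behind, including `analysis/_site.yml` and the index, about and license pages linked from the default navbar. A quick shell check can confirm this before editing anything. This is a sketch: `check_layout` is a hypothetical helper, and the file list is an assumption based on the default workflowr template, so adjust it if your version differs.

```shell
# Hypothetical helper: confirm the files wflow_start() is expected to create.
# The list below is an assumption based on the default workflowr template.
check_layout() {
  local status=0
  for path in _workflowr.yml analysis/_site.yml analysis/index.Rmd \
              analysis/about.Rmd analysis/license.Rmd; do
    if [ ! -f "$path" ]; then
      echo "not found: $path" >&2
      status=1
    fi
  done
  return "$status"
}

# Run from the project root after wflow_start().
if check_layout; then
  echo "workflowr layout looks complete"
fi
```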

Tweak output

Assuming you follow workflowr's conventions, Workflowr builds your analysis into a shareable website. The file that controls the appearance of the website is analysis/_site.yml. By default it looks like this:

name: "Project Name"
output_dir: ../docs
navbar:
  title: "Project Name"
  left:
    - text: Home
      href: index.html
    - text: About
      href: about.html
    - text: License
      href: license.html
output:
  workflowr::wflow_html:
    toc: yes
    toc_float: yes
    theme: cosmo
    highlight: textmate

I like to change it to:

name: "Project Name"
output_dir: ../docs
navbar:
  title: "Project Name"
  left:
    - text: Home
      href: index.html
    - text: About
      href: about.html
    - text: License
      href: license.html
  right:
    - icon: fa-github
      text: Source code
      href: https://github.com/nzherald/project-name
output:
  workflowr::wflow_html:
    toc: yes
    theme: readable
    highlight: textmate
    css: nzh-style.css
    dev: svg
    includes:
      in_header: header.html
      before_body: doc_prefix.html
      after_body: doc_suffix.html

These changes:

  • change the theme from cosmo to readable
  • drop the floating table of contents
  • add a "Source code" link to the GitHub repository on the right of the navbar
  • pull in the NZ Herald-specific CSS (nzh-style.css)
  • output images as SVG rather than PNG
  • add custom HTML snippets to the header, the start of the document and the end. Usually these are empty, although prior to publishing I add a draft warning to the top of each page.
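The modified `_site.yml` refers to four files that the build expects to find: `nzh-style.css`, `header.html`, `doc_prefix.html` and `doc_suffix.html`. Assuming, as with other R Markdown site options, that these paths are resolved relative to `analysis/`, a sketch like the following creates empty placeholders and switches a draft warning in `doc_prefix.html` on or off. The helper names (`init_includes`, `draft_banner`) and the banner markup are hypothetical, not the Herald's actual snippets.

```shell
# Hypothetical helpers; assumes the include files live in analysis/.
SITE_DIR="analysis"

# Create an empty placeholder for every file _site.yml refers to,
# leaving any existing file untouched.
init_includes() {
  mkdir -p "$SITE_DIR"
  for f in nzh-style.css header.html doc_prefix.html doc_suffix.html; do
    [ -e "$SITE_DIR/$f" ] || : > "$SITE_DIR/$f"
  done
}

# Switch the draft warning at the top of each page on or off.
# The banner markup is illustrative only.
draft_banner() {
  case "$1" in
    on)
      cat > "$SITE_DIR/doc_prefix.html" <<'EOF'
<div class="alert alert-warning"><strong>DRAFT:</strong> not yet published.</div>
EOF
      ;;
    off)
      : > "$SITE_DIR/doc_prefix.html"
      ;;
    *)
      echo "usage: draft_banner on|off" >&2
      return 1
      ;;
  esac
}

init_includes
```

While an article is in draft, `draft_banner on` puts the warning on every page at the next build; `draft_banner off` empties the file again before the final build.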

License

I usually fill in the license page (analysis/license.Rmd) too:

All source code and software in this repository are made available
under the terms of the
[MIT license](https://opensource.org/licenses/mit-license.html).
Note that the data is released under different agreements - these will be detailed
here prior to publication.

Then commit all the files and push to your git repository.
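The commit-and-push step can also be wrapped in a small function so it stays a single command. This is a minimal sketch, assuming git is already configured with a user name and email; `commit_all` is a hypothetical helper, and it only pushes when a remote named `origin` has been set up (for example with `git remote add origin <url>`).

```shell
# Hypothetical helper: stage everything, commit, and push if a remote exists.
commit_all() {
  local msg="${1:-Start workflowr project}"
  git add -A || return 1
  git commit -q -m "$msg" || return 1
  # Only push when an "origin" remote has been configured.
  if git remote get-url origin >/dev/null 2>&1; then
    git push -u origin HEAD
  else
    echo "no origin remote; skipping push" >&2
  fi
}
```

From the project root, `commit_all "Start workflowr project"` commits every file and pushes the current branch once a remote exists.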

Next

The next step is to use drake to grab some data to analyse - but that is the topic of the next post.

© 2019–20 by Chris Knox. All rights reserved.
Last build: 2020-11-25