Reproducible Journalism - using Workflowr
Many of the ideals and methods of reproducible research can be, and often are, applied to data journalism.
At the New Zealand Herald we have been putting some of these principles to work, in particular automation: we try to ensure that any interactive article can be completely rebuilt and deployed with a single command when the source data is updated.
All our interactive articles are based on a template that incorporates most of the logic we need to do this.
Workflowr is one of a number of tools designed to make it easier to share not only the final results of an analysis, but also the steps taken along the way. This is something I would like to be able to do with our data journalism.
Workflowr's vignettes are fantastic and provide a good guide to getting started with the package. Here I just want to outline the steps I use to get a project up and running with Workflowr, drake, renv and git.
Workflowr provides a way to create project directories and set up git repositories, but I like to manage my own. This assumes git is installed:

```shell
git init project-name
```

I use renv to manage a project's dependencies. This needs R with the renv package installed:

```shell
cd project-name
Rscript -e 'renv::init()'
```

Install the R packages most projects need:

```shell
Rscript -e "renv::install(c('drake', 'workflowr', 'tidyverse'))"
```
I usually create a new RStudio project pointed at the git repository at this stage.
Initialise the workflowr project
Assuming you have an R session running in your project directory, run the following commands:

```r
library(workflowr)
# existing = TRUE: start workflowr inside a directory that already exists
wflow_start('.', name = 'Project Name', existing = TRUE)
```
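The setup steps so far can be chained into one script. This is a minimal sketch rather than the NZ Herald template: the file name `new-project.sh` and the functions `need` and `new_project` are hypothetical, and it assumes git and R (with renv already installed) are on the PATH. It calls `wflow_start()` through `Rscript` instead of an interactive session.

```shell
#!/usr/bin/env bash
# new-project.sh: hypothetical helper chaining the setup steps above.

# need CMD...: return non-zero (with a message) if any command is missing.
need() {
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      return 1
    fi
  done
}

# new_project DIR NAME
# Creates DIR as a git repository, initialises renv, installs the usual
# packages and starts a workflowr project in the existing directory.
# The body runs in a subshell, so `set -e` and `cd` do not leak out.
new_project() (
  set -euo pipefail
  dir="${1:?usage: new_project DIR NAME}"
  name="${2:?usage: new_project DIR NAME}"
  need git Rscript || exit 1
  git init "$dir"
  cd "$dir"
  Rscript -e 'renv::init()'
  Rscript -e "renv::install(c('drake', 'workflowr', 'tidyverse'))"
  # Pass the project name through an environment variable to avoid quoting issues.
  WFLOW_NAME="$name" Rscript -e \
    "workflowr::wflow_start('.', name = Sys.getenv('WFLOW_NAME'), existing = TRUE)"
)
```

After `source new-project.sh`, a call such as `new_project project-name "Project Name"` repeats the whole setup; the RStudio project still has to be created by hand.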
Workflowr will build your analysis, assuming you follow workflowr's patterns, into a
shareable website. The appearance of the website is controlled by a YAML configuration file.
By default it looks like this:
```yaml
name: "Project Name"
output_dir: ../docs
navbar:
  title: "Project Name"
  left:
  - text: Home
    href: index.html
  - text: About
    href: about.html
  - text: License
    href: license.html
output:
  workflowr::wflow_html:
    toc: yes
    toc_float: yes
    theme: cosmo
    highlight: textmate
```
I like to change it to:
```yaml
name: "Project Name"
output_dir: ../docs
navbar:
  title: "Project Name"
  left:
  - text: Home
    href: index.html
  - text: About
    href: about.html
  - text: License
    href: license.html
  right:
  - icon: fa-github
    text: Source code
    href: https://github.com/nzherald/project-name
output:
  workflowr::wflow_html:
    toc: yes
    theme: readable
    highlight: textmate
    css: nzh-style.css
    dev: svg
    includes:
      in_header: header.html
      before_body: doc_prefix.html
      after_body: doc_suffix.html
```
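Compared with the default, this:
- changes the theme from cosmo to readable
- drops the floating table of contents
- adds a link to the source code on GitHub on the right of the navigation bar
- pulls in the NZ Herald-specific CSS
- outputs images as SVG rather than PNG
- adds custom HTML snippets to the header, at the start of the document and at the end. Usually these are empty, although prior to publishing I add a draft warning at the top of each page.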
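The customised configuration names three HTML snippets and a stylesheet, which rmarkdown expects to find when it renders the pages. A minimal sketch for creating them, assuming they live in workflowr's default `analysis/` directory (where `wflow_start()` puts the site configuration and the `.Rmd` files); the function name `make_includes` and the draft-warning markup are hypothetical:

```shell
# make_includes [--draft]
# Creates the files named in the customised site configuration inside
# analysis/ (assumed location). header.html, doc_suffix.html and
# nzh-style.css are only created if missing, so existing content is kept;
# doc_prefix.html is rewritten every time: empty by default, or holding
# a draft warning when --draft is given.
make_includes() {
  local dir=analysis
  mkdir -p "$dir"
  touch "$dir/header.html" "$dir/doc_suffix.html" "$dir/nzh-style.css"
  if [ "${1:-}" = "--draft" ]; then
    # Hypothetical markup: style the draft-warning class in nzh-style.css.
    printf '%s\n' '<div class="draft-warning">DRAFT: not for publication</div>' \
      > "$dir/doc_prefix.html"
  else
    : > "$dir/doc_prefix.html"
  fi
}
```

Run `make_includes --draft` from the project root while the article is in progress, then plain `make_includes` when it is ready to publish, and rebuild the site.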
I usually set the license file too:
```markdown
All source code and software in this repository are made available
under the terms of the
[MIT license](https://opensource.org/licenses/mit-license.html).
Note that the data is released under different agreements - these will be detailed
here prior to publication.
```
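Writing the license page can be scripted as well. A sketch, assuming the page is workflowr's default `analysis/license.Rmd` (it renders to the `license.html` linked from the navbar); the function name `write_license` is hypothetical, and it overwrites the placeholder text `wflow_start()` generates:

```shell
# write_license: replace the license page with the MIT notice above,
# keeping a YAML header so the page still gets its "License" title.
write_license() {
  mkdir -p analysis
  cat > analysis/license.Rmd <<'EOF'
---
title: "License"
---

All source code and software in this repository are made available
under the terms of the
[MIT license](https://opensource.org/licenses/mit-license.html).
Note that the data is released under different agreements - these will be detailed
here prior to publication.
EOF
}
```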
Then commit all the files and push to your git repository.
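A small helper for this last step, as a sketch: the function name `commit_and_push` and its default message are hypothetical, and it only pushes when a remote called `origin` has been configured (for example with `git remote add origin` and the repository URL).

```shell
# commit_and_push [MESSAGE]
# Stages every file, commits if anything changed, and pushes the current
# branch when an "origin" remote exists. Run from inside the repository.
commit_and_push() {
  local msg="${1:-Set up workflowr project}"
  git add --all
  # git commit fails when nothing is staged, so check first.
  if git diff --cached --quiet; then
    echo "nothing to commit" >&2
  else
    git commit -m "$msg" || return 1
  fi
  if git remote get-url origin >/dev/null 2>&1; then
    git push -u origin HEAD
  fi
}
```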
The next step is to use drake to grab some data to analyse, but that is the topic of the next post.