Data Journalism

Chris Knox

Data Editor at the New Zealand Herald


  • Persist with getting data from NZ.Stat and Infoshare
  • There is a lot of data for stories in

Inflation adjusted median household income

Write out the steps

  • Get the median household income data
  • Get the CPI data
  • Combine the data
  • Calculate the inflation adjusted value


We have the income (NZ.Stat) and the CPI (Infoshare) data

(I forgot how much of a pain it is to open NZ.Stat files - data is here)

  • Load them both into separate workflow tabs

Income data

  • Upload wb 01 upload income
  • Grrr commas in year
    • Convert year column to date
    • Actually not the right thing - we will come back to this wb 02 year to date
  • Results of conversion wb 03 date done
  • Select Year and Value wb 04 select
  • Convert weekly income to annual wb 05 calc
  • Drop value column wb 06 select
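
The same income clean-up can be sketched in a few lines of Python. This is only an illustration: the column names, the sample figures, and the 52-weeks-per-year multiplier are assumptions, not values taken from the NZ.Stat file. It turns the year straight into a plain number, which is where the slides end up once the date conversion turns out to be the wrong thing.

```python
# Hypothetical rows as they might arrive from a spreadsheet export:
# the year has a thousands separator and the value is weekly income.
# All figures are invented placeholders, not NZ.Stat data.
raw_income = [
    {"Year": "2,008", "Value": "1,000"},
    {"Year": "2,018", "Value": "1,500"},
]

WEEKS_PER_YEAR = 52  # assumption: annual income = weekly income x 52


def clean_income(rows):
    """Return rows with an integer Year and an annual income figure."""
    cleaned = []
    for row in rows:
        # Strip the commas, then convert the text to numbers.
        year = int(row["Year"].replace(",", ""))
        weekly = float(row["Value"].replace(",", ""))
        # Keep only the columns we need: the weekly Value is dropped.
        cleaned.append({"Year": year, "Annual income": weekly * WEEKS_PER_YEAR})
    return cleaned


income = clean_income(raw_income)
```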

CPI data

(We did this yesterday)

  • Use a new tab wb 10 tabs
  • Upload wb 11 upload
  • Remove extra row wb 12 remove row
  • And rename while we are cleaning up wb 13 autorename
  • Filter out Q1, Q3 and Q4 (keep Q2)
    • It doesn't matter which quarter you keep - just be consistent wb 14 filter
  • Filter using text contains wb 15 text contains
  • Let's shorten the CPI column name wb 16 add rename
  • Another rename wb 17 short name
  • Oh, the rename lost the filter
    • Set column again wb 18 redo filter
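
As a rough sketch, the "text contains" filter amounts to keeping the rows whose period label includes one quarter. The `Period` and `CPI` column names, the `2008Q2`-style labels, and the index values below are assumptions for illustration, not the Infoshare download itself.

```python
# Hypothetical quarterly CPI rows (invented values, shortened column names).
raw_cpi = [
    {"Period": "2008Q1", "CPI": "990"},
    {"Period": "2008Q2", "CPI": "1000"},
    {"Period": "2008Q3", "CPI": "1010"},
    {"Period": "2018Q2", "CPI": "1250"},
    {"Period": "2018Q4", "CPI": "1260"},
]


def keep_quarter(rows, quarter="Q2"):
    """Keep only rows whose Period text contains the given quarter label."""
    return [row for row in rows if quarter in row["Period"]]


# One row per year remains. The CPI values are still text at this point.
cpi_q2 = keep_quarter(raw_cpi)
```

Filtering on a column by name also shows why the rename broke the filter: once the column is renamed, the filter has to be pointed at the new name.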

Joining time

  • Start a new tab for joining

    • Leave data loading tabs for data loading
  • Start a new tab using an existing tab wb 200 start tab

  • Join to the other tab wb 201 join

  • Uh-oh, not ready yet wb 202 ohoh no columns

  • Columns need to match

    • Type, name, and content wb 203 columns no match
  • Turn the year into a number without formatting

    • Could do this back at beginning wb 204 year format
  • Remove Q2 from years wb 205 remove q2

  • Convert to number without formatting wb 206 convert to num

  • Rename to Year wb 207 rename to match

  • Hooray wb 208 join

  • Oh, the index is text wb 209 convert index to num

  • Calculation time

    • Index is designed so it provides a ratio
    • Think about whether number should get bigger or smaller wb 210 calc
  • Divide by that year's index wb 211 divide by year

  • Multiply by the current index wb 212 multiply by now

  • Tidy up wb 213 select output

  • Look, charts wb 214 graph

  • Does that look right? wb 215 graph done
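
The join and the calculation can be sketched together in Python. The data, the column names, and the choice of 2018 as "now" are all hypothetical. The point is the order of operations: make the year columns match, convert the index to a number, then divide by that year's index and multiply by the current one.

```python
# Hypothetical inputs (invented values): cleaned income and Q2-only CPI.
income = [
    {"Year": 2008, "Annual income": 52000.0},
    {"Year": 2018, "Annual income": 78000.0},
]
cpi_q2 = [
    {"Period": "2008Q2", "CPI": "1000"},
    {"Period": "2018Q2", "CPI": "1250"},
]


def cpi_by_year(rows):
    """Map integer year -> numeric index, dropping the 'Q2' suffix."""
    return {int(row["Period"].replace("Q2", "")): float(row["CPI"]) for row in rows}


def adjust_for_inflation(income_rows, cpi_rows, base_year):
    """Express each year's income in base_year dollars."""
    index = cpi_by_year(cpi_rows)
    index_now = index[base_year]
    adjusted = []
    for row in income_rows:
        year = row["Year"]
        if year not in index:
            continue  # behaves like an inner join: no CPI match, no row
        # Divide by that year's index, then multiply by the current index.
        real = row["Annual income"] / index[year] * index_now
        adjusted.append({"Year": year, "Real annual income": round(real, 2)})
    return adjusted


result = adjust_for_inflation(income, cpi_q2, base_year=2018)
```

A quick sanity check on direction: when prices have risen, an older income expressed in current dollars should get bigger, and the base year's income should come out unchanged.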

Stories in police data

  • Use the little plus to expand pd 100 expand
  • See more levels pd 101 expanded
  • Select a level pd 102 christmas assault
  • Select a sublevel pd 103 whatabout serious resulting in injury
  • What is happening? pd 104 wtf
  • How can we download? pd 105 download
  • Don't download the summary pd 106 dl summary
  • Download the full data pd 107 full
  • What if we want to look at cities? pd 108 police stations for cities
  • Police stations report pd 109 trends again
  • Use the variance to look for big changes pd 110 sort by variance
  • Sort by variance
    • Ignore big changes in small numbers pd 111 sorted serious assault
  • Select the cities you are interested in
    • Lower Hutt and Wellington too pd 112 get wellington cities
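
Once the full data is downloaded, the "sort by variance, ignore small numbers" step could be sketched as below. This assumes "variance" means the change in count between two periods rather than the statistical variance. The station names, counts, and the cut-off of 20 are invented for illustration.

```python
# Hypothetical station-level counts for two periods (invented numbers).
counts = [
    {"Station": "Station A", "Previous": 2, "Current": 6},
    {"Station": "Station B", "Previous": 200, "Current": 260},
    {"Station": "Station C", "Previous": 150, "Current": 140},
]

MIN_COUNT = 20  # assumed cut-off for "small numbers"; pick one that suits the story


def biggest_changes(rows, min_count=MIN_COUNT):
    """Rank stations by absolute change, skipping tiny starting counts."""
    ranked = []
    for row in rows:
        if row["Previous"] < min_count:
            continue  # 2 -> 6 is a 200% rise but only four incidents
        change = row["Current"] - row["Previous"]
        percent = 100 * change / row["Previous"]
        ranked.append({**row, "Change": change, "Percent": round(percent, 1)})
    # Largest movement first, whether it is a rise or a fall.
    return sorted(ranked, key=lambda r: abs(r["Change"]), reverse=True)


top = biggest_changes(counts)
```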



Write a short pitch for a data-supported or data-driven article based on police data.

  • No more than 150 words
  • What is the draft headline? (More a summary than a finished headline)
  • Why is the data interesting/newsworthy?
  • What other data will you use?
  • What data processing do you expect to have to do?


Write the data-supported or data-driven article.


  • Christmas assaults

  • Rise in serious assaults

  • Burglary, town vs town

  • You don't need to interview people

  • But do say who you would interview and why

  • Final article should be 400 to 600 words

    • The actual article could be shorter to leave room for interviews
  • Office hours

  • Reach out to me if you have questions like:

    • I think I should be able to download XX but I can't
    • I want to calculate XX but I'm stuck
© 2019–20 by Chris Knox. All rights reserved.
Last build: 2020-11-25