Paul Bradshaw from Birmingham City University says:
Data can be the source of data journalism, or it can be the tool with which the story is told — or it can be both.
The Bureau of Investigative Journalism says:
Data journalism is simply journalism.
"Data journalism" may be a new and trendy term but ultimately, it is just a way of describing journalism in the modern world.
- Consider journalism
Both in the sense of spending some of your career as a data journalist - and also in the work you produce. Ask yourself:
Are there stories in this data that are of public interest?
Journalists are overworked and deadline-driven.
The easier your data is to understand and consume, the more likely it is to be picked up by a journalist.
Not sure exactly what to call it - but I think it is important.
- More and more stories need (or are enhanced by) analysis and communication of data.
- Data can serve as the primary source for a story.
- Data can also be used to tell the story
- Continuously updated data can be the story - Covid-19
You should be able to run a single command that updates your data, runs your analysis, creates your assets and then publishes your article/report
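That single command could be sketched as a small Haskell driver - the step names and `putStrLn` actions below are hypothetical placeholders, not a real pipeline:

```haskell
import Control.Monad (forM_)

-- Hypothetical pipeline steps: in a real workflow each action would
-- fetch data, run the analysis, render charts, or publish the report.
steps :: [(String, IO ())]
steps =
  [ ("update data",  putStrLn "fetching latest data...")
  , ("run analysis", putStrLn "running analysis...")
  , ("build assets", putStrLn "rendering charts...")
  , ("publish",      putStrLn "publishing article/report...")
  ]

-- Running `main` executes every stage in order - the "single command".
main :: IO ()
main = forM_ steps $ \(name, action) -> do
  putStrLn ("== " ++ name)
  action
```

The point is only that the whole run lives behind one entry point; each stage could just as easily shell out to other tools.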
- Automated workflows can breed inflexibility
Things change all the time - and more interesting things change more often. Don't become King Canute and try to stop the tide coming in.
If you are not in control of data collection, and your workflow tries to control the data collection and collation, your workflow will break.
- example/rant New Zealand Covid Data
- SQL (incl. PostGIS)
Automated workflows can lock you into a single technology, restricting your ability to make use of the best tools for a job.
- You develop a project and publish it
- One year passes
- You get asked to update the project with this year's data
- Has the data changed?
- Can you run your script safely?
- Can you even understand your script without a couple of days work?
Use the compiler, Luke
- Compiled languages
- Dynamic languages
- Strongly typed languages
- Weakly typed languages
Flexible languages (dynamic and often weakly typed) are the mainstay of analysis - especially exploratory analysis.
Haskell is a wonderful language with a steep learning curve - find a mentor if you want to learn it
- Haskell is a strongly typed, compiled, functional language
- In Haskell it is easy to use the type system to capture your assumptions about the state of the system.
You probably don't want to do this as part of your actual analysis workflow - it is possible - but I have not found it to be very efficient.
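A minimal sketch of capturing an assumption in a type - `CaseCount` is a made-up example type for illustration, not something from a real workflow:

```haskell
-- Hypothetical example: encode the assumption "case counts are never
-- negative" in a type, so invalid data cannot be constructed silently.
newtype CaseCount = CaseCount Int
  deriving (Show, Eq)

-- Smart constructor: the only way to get a CaseCount is to pass the check.
mkCaseCount :: Int -> Either String CaseCount
mkCaseCount n
  | n < 0     = Left ("negative case count: " ++ show n)
  | otherwise = Right (CaseCount n)
```

Downstream code that takes a `CaseCount` no longer needs to re-check the invariant; the type carries it.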
- Do your analysis however you like
- Write Haskell code to consume one stage of your pipeline - check it for consistency - and then spit it out for the next stage
- That's all
The point is that your Haskell pipeline will break - on your computer - if your assumptions are no longer true
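A sketch of such a checking stage, assuming a made-up two-column `date,cases` row format (the layout is illustrative, not from the source):

```haskell
-- Consume one stage's rows, check them against our assumptions,
-- and emit them unchanged for the next stage - or fail loudly.
checkRow :: String -> Either String (String, Int)
checkRow line =
  case break (== ',') line of
    (date, ',' : n) ->
      case reads n of
        [(cases, "")] | cases >= (0 :: Int) -> Right (date, cases)
        _ -> Left ("bad case count in row: " ++ line)
    _ -> Left ("malformed row: " ++ line)

-- The whole stage fails on the first row that breaks an assumption.
checkStage :: [String] -> Either String [String]
checkStage rows = do
  parsed <- traverse checkRow rows
  pure [d ++ "," ++ show c | (d, c) <- parsed]
```

If next year's file suddenly contains negative or non-numeric counts, the run stops here instead of silently producing a wrong chart.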
This approach can be implemented in other languages too.