linux.conf.au 2017 | Presentation: Reusable R for automation, small area estimation and legacy systems

Reusable R for automation, small area estimation and legacy systems

Presented by Rhydwyn
Thursday 10:40 a.m.–11:25 a.m.
Target audience: User

Abstract

In order to safely share open public health data for small areas, health researchers employ complex spatial adjustment models called generalised additive models. We need to run these models hundreds of times, with inbuilt checking and debugging.

Proprietary statistical programs are designed for run-once analyses, and they did not meet our needs. We needed something better: well-engineered R.

We also needed to make it easy for analysts to use. We wanted to use the tools of software engineering and reusable research to make statisticians and epidemiologists more efficient, but statisticians and epidemiologists are not computer scientists, and much of this world is new to them.

Therefore, we had to develop not only for good software practice but also for usability, so that others could use our tools even when a tool has a very different focus from what they are used to.

Using the example of batch small area estimation with generalised additive models, I will talk about the project, the tools we used, and how to integrate R into a legacy SAS environment with a minimum of pain, taking up the strengths of R without exposing new users to its complexity.

This case study will show how scientists and statisticians can benefit from open source, and how open source and openness in general can make science better.

Presented by

Rhydwyn

Rhydwyn is a statistician and data nerd currently working in the healthcare system with large and rapidly changing data sets. His interests include open source, open data, population health statistics and reproducible research. Rhydwyn is passionate about open source technology that makes science easier and gets meaningful results into scientists’ and policymakers’ hands.