Introduction to plnr • plnr

Introduction

plnr is a framework for planning and executing analyses in R. It’s designed to help you organize and run multiple analyses efficiently, whether you’re applying the same function with different arguments or running multiple different functions on your data.

Core Concepts

Broad technical terms

Object	Description
argset	A named list containing a set of arguments.
analysis	These are the fundamental units that are scheduled in `plnr`: 1 argset 1 (action) function that takes two arguments data (named list) argset (named list)
plan	This is the overarching “scheduler”: 1 data pull 1 list of analyses

Different types of plans

Plan Type	Description
Single-function plan	Same action function applied multiple times with different argsets applied to the same datasets.
Multi-function plan	Different action functions applied to the same datasets.

Plan Examples

Plan Type	Example
Single-function plan	Multiple strata (e.g. locations, age groups) that you need to apply the same function to to (e.g. outbreak detection, trend detection, graphing).
Single-function plan	Multiple variables (e.g. multiple outcomes, multiple exposures) that you need to apply the same statistical methods to (e.g. regression models, correlation plots).
Multi-function plan	Creating the output for a report (e.g. multiple different tables and graphs).

Basic Usage

Let’s start with a simple example that demonstrates the core concepts:

library(plnr)

## plnr 2025.3.19
## https://www.csids.no/plnr/

library(ggplot2)
library(data.table)

# Create a new plan
p <- Plan$new()

# Add data
p$add_data(
  name = "deaths",
  direct = data.table(deaths=1:4, year=2001:2004)
)

# Add argsets for different years
p$add_argset(
  name = "fig_1_2002",
  year_max = 2002
)

p$add_argset(
  name = "fig_1_2003",
  year_max = 2003
)

# Define analysis function
fn_fig_1 <- function(data, argset) {
  plot_data <- data$deaths[year <= argset$year_max]
  
  ggplot(plot_data, aes(x=year, y=deaths)) +
    geom_line() +
    geom_point(size=3) +
    labs(title = glue::glue("Deaths from 2001 until {argset$year_max}"))
}

# Apply function to all argsets
p$apply_action_fn_to_all_argsets(fn_name = "fn_fig_1")

# Run analyses
p$run_one("fig_1_2002")

Advanced Features

Data Management

The framework ensures efficient data management by: - Loading data once and reusing across analyses - Separating data cleaning from analysis - Providing hash-based tracking of data changes

Debugging Tools

plnr includes several tools to help with development and debugging:

# Access data directly
p$get_data()

## $deaths
##    deaths  year
##     <int> <int>
## 1:      1  2001
## 2:      2  2002
## 3:      3  2003
## 4:      4  2004
## 
## $hash
## $hash$current
## [1] "aa2296ddc2b519c9c2a3d1a39471ac9e"
## 
## $hash$current_elements
## $hash$current_elements$deaths
## [1] "82519debaef80054a7b2ed512f8dfb94"

# Access specific argset
p$get_argset("fig_1_2002")

## $year_max
## [1] 2002

# Access analysis by name or index
p$get_analysis(1)

## $argset
## $argset$year_max
## [1] 2002
## 
## $argset$index_analysis
## [1] 1
## 
## 
## $fn_name
## [1] "fn_fig_1"

# Use is_run_directly() for development
fn_analysis <- function(data, argset) {
  if(plnr::is_run_directly()) {
    data <- p$get_data()
    argset <- p$get_argset("fig_1_2002")
  }
  
  # function continues here
}

Function Naming

When adding analyses, you can use either fn_name or fn:

# Using fn_name (recommended)
p$add_analysis(
  name = "fig_1_2002",
  fn_name = "fn_fig_1",
  year_max = 2002
)

# Using fn (for function factories)
p$add_analysis(
  name = "fig_1_2003",
  fn = fn_fig_1,
  year_max = 2003
)

Hash-based Caching

The framework uses hashing to track data changes:

# Create two plans with same data
p1 <- Plan$new()
p1$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths")
p1$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths2")

p2 <- Plan$new()
p2$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths")
p2$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths2")

# Same data has same hash
identical(p1$get_data()$hash$current_elements, p2$get_data()$hash$current_elements)

## [1] TRUE

# Different data has different hash
p1$add_data(direct = data.table(deaths=1:5, year=2001:2005), name = "deaths3")
p1$get_data()$hash$current_elements

## $deaths
## [1] "82519debaef80054a7b2ed512f8dfb94"
## 
## $deaths2
## [1] "82519debaef80054a7b2ed512f8dfb94"
## 
## $deaths3
## [1] "d740b5c163d702dde31061bcd9e00716"

Best Practices

Data Organization
- Keep data cleaning separate from analysis
- Use meaningful names for datasets
- Document data structure and assumptions
Analysis Functions
- Always accept data and argset parameters
- Use is_run_directly() for development
- Keep functions focused and single-purpose
Plan Structure
- Use meaningful names for argsets and analyses
- Group related analyses together
- Document plan structure and dependencies
Development Workflow
- Start with small examples
- Use debugging tools during development
- Test analyses individually before running full plan

Next Steps

Read the Adding Analyses vignette for more detailed examples
Check out the package website for additional resources
Explore the function documentation with help(package="plnr")