Skip to contents

Introduction

plnr is a framework for planning and executing analyses in R. It’s designed to help you organize and run multiple analyses efficiently, whether you’re applying the same function with different arguments or running multiple different functions on your data.

Core Concepts

Broad technical terms

Object Description
argset A named list containing a set of arguments.
analysis

These are the fundamental units that are scheduled in plnr:

  • 1 argset
  • 1 (action) function that takes two arguments
    1. data (named list)
    2. argset (named list)
plan

This is the overarching “scheduler”:

  • 1 data pull
  • 1 list of analyses

Different types of plans

Plan Type Description
Single-function plan Same action function applied multiple times with different argsets applied to the same datasets.
Multi-function plan Different action functions applied to the same datasets.

Plan Examples

Plan Type Example
Single-function plan Multiple strata (e.g. locations, age groups) that you need to apply the same function to to (e.g. outbreak detection, trend detection, graphing).
Single-function plan Multiple variables (e.g. multiple outcomes, multiple exposures) that you need to apply the same statistical methods to (e.g. regression models, correlation plots).
Multi-function plan Creating the output for a report (e.g. multiple different tables and graphs).

Basic Usage

Let’s start with a simple example that demonstrates the core concepts:

## plnr 2025.3.19
## https://www.csids.no/plnr/
library(ggplot2)
library(data.table)

# Create a new plan
p <- Plan$new()

# Add data
p$add_data(
  name = "deaths",
  direct = data.table(deaths=1:4, year=2001:2004)
)

# Add argsets for different years
p$add_argset(
  name = "fig_1_2002",
  year_max = 2002
)

p$add_argset(
  name = "fig_1_2003",
  year_max = 2003
)

# Define analysis function
fn_fig_1 <- function(data, argset) {
  plot_data <- data$deaths[year <= argset$year_max]
  
  ggplot(plot_data, aes(x=year, y=deaths)) +
    geom_line() +
    geom_point(size=3) +
    labs(title = glue::glue("Deaths from 2001 until {argset$year_max}"))
}

# Apply function to all argsets
p$apply_action_fn_to_all_argsets(fn_name = "fn_fig_1")

# Run analyses
p$run_one("fig_1_2002")

Advanced Features

Data Management

The framework ensures efficient data management by: - Loading data once and reusing across analyses - Separating data cleaning from analysis - Providing hash-based tracking of data changes

Debugging Tools

plnr includes several tools to help with development and debugging:

# Access data directly
p$get_data()
## $deaths
##    deaths  year
##     <int> <int>
## 1:      1  2001
## 2:      2  2002
## 3:      3  2003
## 4:      4  2004
## 
## $hash
## $hash$current
## [1] "aa2296ddc2b519c9c2a3d1a39471ac9e"
## 
## $hash$current_elements
## $hash$current_elements$deaths
## [1] "82519debaef80054a7b2ed512f8dfb94"
# Access specific argset
p$get_argset("fig_1_2002")
## $year_max
## [1] 2002
# Access analysis by name or index
p$get_analysis(1)
## $argset
## $argset$year_max
## [1] 2002
## 
## $argset$index_analysis
## [1] 1
## 
## 
## $fn_name
## [1] "fn_fig_1"
# Use is_run_directly() for development
fn_analysis <- function(data, argset) {
  if(plnr::is_run_directly()) {
    data <- p$get_data()
    argset <- p$get_argset("fig_1_2002")
  }
  
  # function continues here
}

Function Naming

When adding analyses, you can use either fn_name or fn:

# Using fn_name (recommended)
p$add_analysis(
  name = "fig_1_2002",
  fn_name = "fn_fig_1",
  year_max = 2002
)

# Using fn (for function factories)
p$add_analysis(
  name = "fig_1_2003",
  fn = fn_fig_1,
  year_max = 2003
)

Hash-based Caching

The framework uses hashing to track data changes:

# Create two plans with same data
p1 <- Plan$new()
p1$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths")
p1$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths2")

p2 <- Plan$new()
p2$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths")
p2$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths2")

# Same data has same hash
identical(p1$get_data()$hash$current_elements, p2$get_data()$hash$current_elements)
## [1] TRUE
# Different data has different hash
p1$add_data(direct = data.table(deaths=1:5, year=2001:2005), name = "deaths3")
p1$get_data()$hash$current_elements
## $deaths
## [1] "82519debaef80054a7b2ed512f8dfb94"
## 
## $deaths2
## [1] "82519debaef80054a7b2ed512f8dfb94"
## 
## $deaths3
## [1] "d740b5c163d702dde31061bcd9e00716"

Best Practices

  1. Data Organization
    • Keep data cleaning separate from analysis
    • Use meaningful names for datasets
    • Document data structure and assumptions
  2. Analysis Functions
    • Always accept data and argset parameters
    • Use is_run_directly() for development
    • Keep functions focused and single-purpose
  3. Plan Structure
    • Use meaningful names for argsets and analyses
    • Group related analyses together
    • Document plan structure and dependencies
  4. Development Workflow
    • Start with small examples
    • Use debugging tools during development
    • Test analyses individually before running full plan

Next Steps

  1. Read the Adding Analyses vignette for more detailed examples
  2. Check out the package website for additional resources
  3. Explore the function documentation with help(package="plnr")