Introduction
plnr
is a framework for planning and executing analyses in R. It’s designed to help you organize and run multiple analyses efficiently, whether you’re applying the same function with different arguments or running multiple different functions on your data.
Core Concepts
Broad technical terms
Object | Description |
argset | A named list containing a set of arguments. |
analysis |
These are the fundamental units that are scheduled in
|
plan |
This is the overarching “scheduler”:
|
Different types of plans
Plan Type | Description |
Single-function plan | Same action function applied multiple times with different argsets applied to the same datasets. |
Multi-function plan | Different action functions applied to the same datasets. |
Plan Examples
Plan Type | Example |
Single-function plan | Multiple strata (e.g. locations, age groups) that you need to apply the same function to to (e.g. outbreak detection, trend detection, graphing). |
Single-function plan | Multiple variables (e.g. multiple outcomes, multiple exposures) that you need to apply the same statistical methods to (e.g. regression models, correlation plots). |
Multi-function plan | Creating the output for a report (e.g. multiple different tables and graphs). |
Basic Usage
Let’s start with a simple example that demonstrates the core concepts:
## plnr 2025.3.19
## https://www.csids.no/plnr/
library(ggplot2)
library(data.table)
# Create a new plan
p <- Plan$new()
# Add data
p$add_data(
name = "deaths",
direct = data.table(deaths=1:4, year=2001:2004)
)
# Add argsets for different years
p$add_argset(
name = "fig_1_2002",
year_max = 2002
)
p$add_argset(
name = "fig_1_2003",
year_max = 2003
)
# Define analysis function
fn_fig_1 <- function(data, argset) {
plot_data <- data$deaths[year <= argset$year_max]
ggplot(plot_data, aes(x=year, y=deaths)) +
geom_line() +
geom_point(size=3) +
labs(title = glue::glue("Deaths from 2001 until {argset$year_max}"))
}
# Apply function to all argsets
p$apply_action_fn_to_all_argsets(fn_name = "fn_fig_1")
# Run analyses
p$run_one("fig_1_2002")
Advanced Features
Data Management
The framework ensures efficient data management by: - Loading data once and reusing across analyses - Separating data cleaning from analysis - Providing hash-based tracking of data changes
Debugging Tools
plnr
includes several tools to help with development and debugging:
# Access data directly
p$get_data()
## $deaths
## deaths year
## <int> <int>
## 1: 1 2001
## 2: 2 2002
## 3: 3 2003
## 4: 4 2004
##
## $hash
## $hash$current
## [1] "aa2296ddc2b519c9c2a3d1a39471ac9e"
##
## $hash$current_elements
## $hash$current_elements$deaths
## [1] "82519debaef80054a7b2ed512f8dfb94"
# Access specific argset
p$get_argset("fig_1_2002")
## $year_max
## [1] 2002
# Access analysis by name or index
p$get_analysis(1)
## $argset
## $argset$year_max
## [1] 2002
##
## $argset$index_analysis
## [1] 1
##
##
## $fn_name
## [1] "fn_fig_1"
# Use is_run_directly() for development
fn_analysis <- function(data, argset) {
if(plnr::is_run_directly()) {
data <- p$get_data()
argset <- p$get_argset("fig_1_2002")
}
# function continues here
}
Function Naming
When adding analyses, you can use either fn_name
or fn
:
# Using fn_name (recommended)
p$add_analysis(
name = "fig_1_2002",
fn_name = "fn_fig_1",
year_max = 2002
)
# Using fn (for function factories)
p$add_analysis(
name = "fig_1_2003",
fn = fn_fig_1,
year_max = 2003
)
Hash-based Caching
The framework uses hashing to track data changes:
# Create two plans with same data
p1 <- Plan$new()
p1$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths")
p1$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths2")
p2 <- Plan$new()
p2$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths")
p2$add_data(direct = data.table(deaths=1:4, year=2001:2004), name = "deaths2")
# Same data has same hash
identical(p1$get_data()$hash$current_elements, p2$get_data()$hash$current_elements)
## [1] TRUE
# Different data has different hash
p1$add_data(direct = data.table(deaths=1:5, year=2001:2005), name = "deaths3")
p1$get_data()$hash$current_elements
## $deaths
## [1] "82519debaef80054a7b2ed512f8dfb94"
##
## $deaths2
## [1] "82519debaef80054a7b2ed512f8dfb94"
##
## $deaths3
## [1] "d740b5c163d702dde31061bcd9e00716"
Best Practices
-
Data Organization
- Keep data cleaning separate from analysis
- Use meaningful names for datasets
- Document data structure and assumptions
-
Analysis Functions
- Always accept
data
andargset
parameters - Use
is_run_directly()
for development - Keep functions focused and single-purpose
- Always accept
-
Plan Structure
- Use meaningful names for argsets and analyses
- Group related analyses together
- Document plan structure and dependencies
-
Development Workflow
- Start with small examples
- Use debugging tools during development
- Test analyses individually before running full plan
Next Steps
- Read the Adding Analyses vignette for more detailed examples
- Check out the package website for additional resources
- Explore the function documentation with
help(package="plnr")