Concept

The concept behind org is straightforward. Most analyses have three main sections:

  • Code: Analysis scripts and functions
  • Results: Output files and figures
  • Data: Input data files

Each section has unique requirements:

Code Requirements

  • Must be version controlled
  • Should be publicly accessible
  • Needs a single analysis pipeline documenting all steps
  • Should be organized into modular functions

Results Requirements

  • Must be immediately shareable with collaborators
  • Should maintain a history of changes over time
  • Should be organized by date for tracking
  • Should be stored in a shared location (e.g., Dropbox)

Data Requirements

  • Should be encrypted if sensitive
  • Should not be stored on cloud if sensitive
  • Should be organized by project/analysis
  • Should maintain clear separation from code and results

Project Structure

Core Components

1. org::initialize_project

This is the main function that sets up your project structure. It takes two or more folder arguments and saves their locations in org::project for use throughout your analysis:

  • home: Location of Run.R and the R/ folder (accessible via org::project$home)
  • results: Results folder (accessible via org::project$results). A date-based subfolder for the current day is created inside it (accessible via org::project$results_today)
  • ...: Additional folders as needed (e.g., data_raw, data_clean)
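To make the idea of a date-based results folder concrete, here is a minimal base R sketch. It is not org's implementation: the helper name make_results_today is hypothetical, and the YYYY-MM-DD format is assumed from folder names such as 2018-03-12 used elsewhere in this document.

```r
# Hypothetical helper: create (if needed) and return a dated results folder.
# This only mimics the idea behind org::project$results_today.
make_results_today <- function(results, date = Sys.Date()) {
  today_dir <- file.path(results, format(date, "%Y-%m-%d"))
  dir.create(today_dir, recursive = TRUE, showWarnings = FALSE)
  today_dir
}

# Usage: a temporary folder stands in for the shared results location
results <- file.path(tempdir(), "analysis3_results")
results_today <- make_results_today(results, date = as.Date("2019-01-01"))
basename(results_today)
```

Because every run writes into the folder for its own date, earlier results are left untouched and the history of changes is kept.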

2. Run.R

This is your main analysis script that orchestrates the entire workflow:

  • Data cleaning
  • Analysis
  • Result generation

All code sections should be encapsulated in functions in the R/ folder. You should not have multiple main files, as this creates confusion when returning to your code later. However, you can have versioned files (e.g., Run_v01.R, Run_v02.R) where later versions supersede earlier ones.

3. R/ Directory

All analysis functions should be defined in the R/ folder inside org::project$home. The initialize_project function automatically sources all R scripts in this directory.
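Sourcing every script in a folder can be sketched in base R as follows. This is an illustration of the idea only, and org's own implementation may differ. The helper name source_r_folder and the example function double_it are hypothetical.

```r
# Hypothetical helper: source every .R file in a folder into an environment.
source_r_folder <- function(dir, env = globalenv()) {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in files) {
    sys.source(f, envir = env)
  }
  invisible(files)
}

# Usage: write one function file to a temporary R/ folder, then source it
r_dir <- file.path(tempfile("org_sketch_home_"), "R")
dir.create(r_dir, recursive = TRUE, showWarnings = FALSE)
writeLines("double_it <- function(x) 2 * x", file.path(r_dir, "double_it.R"))

sketch_env <- new.env()
sourced <- source_r_folder(r_dir, env = sketch_env)
sketch_env$double_it(21)
```

Keeping one function per file in R/ means that adding a new analysis step only requires adding a file; nothing in Run.R has to change for the function to become available.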

Example Project Structure

Here’s a complete example of how to structure your project:

# Initialize the project
org::initialize_project(
  env = .GlobalEnv,
  home = "/git/analyses/2019/analysis3/",
  results = "/dropbox/analyses_results/2019/analysis3/",
  data_raw = "/data/analyses/2019/analysis3/"
)

# Document changes in archived results
txt <- glue::glue("
  2019-01-01:
    Included:
    - Table 1
    - Table 2
  
  2019-02-02:
    Changed Table 1 from mean -> median
", .trim=FALSE)

org::write_text(
  txt = txt,
  file = fs::path(org::project$results, "info.txt")
)

# Load required packages
library(data.table)
library(ggplot2)

# Run analysis
d <- clean_data()  # Accesses data from org::project$data_raw
table_1(d)         # Saves to org::project$results_today
figure_1(d)        # Saves to org::project$results_today
figure_2(d)        # Saves to org::project$results_today
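The functions called in the example above live in the R/ folder. As a rough sketch of what one of them might look like, here is a hypothetical R/table_1.R. The function body, the results_dir argument, and the CSV output (used here to avoid extra packages) are assumptions for illustration, not part of org.

```r
# R/table_1.R (hypothetical contents)
# Summarise each column by its median and save the table to a results folder.
# The default for results_dir is only evaluated when no folder is supplied,
# so in that case it assumes org::initialize_project() has already been run.
table_1 <- function(d, results_dir = org::project$results_today) {
  tab <- data.frame(
    variable = names(d),
    median = vapply(d, stats::median, numeric(1), na.rm = TRUE)
  )
  out_file <- file.path(results_dir, "table_1.csv")
  utils::write.csv(tab, out_file, row.names = FALSE)
  invisible(out_file)
}

# Usage: a temporary folder stands in for org::project$results_today
demo_data <- data.frame(age = c(30, 40, 50), bmi = c(21, 25, 32))
out_file <- table_1(demo_data, results_dir = tempdir())
saved <- read.csv(out_file)
```

Passing the output folder as an argument with a project-wide default keeps Run.R short while still letting you test the function against a scratch folder.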

Research Article Versioning

When writing research articles, you often need multiple versions (initial submission, resubmissions). org helps manage this by using date-based versioning:

  1. Initial submission:
    • Rename Run.R to Run_YYYY_MM_DD_submission_1.R
    • Rename R/ to R_YYYY_MM_DD_submission_1/
  2. Resubmission:
    • Create new files with updated dates
    • Keep old versions for reference

This preserves the code that produced results for each submission, ensuring all changes are deliberate and intentional.
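The renaming step for a submission can be sketched in base R. The helper archive_submission below is hypothetical and not part of org; it assumes that Run.R and R/ both sit directly inside the home folder.

```r
# Hypothetical helper: rename Run.R and R/ to dated submission names.
archive_submission <- function(home, submission, date = Sys.Date()) {
  suffix <- paste0(format(date, "%Y_%m_%d"), "_submission_", submission)
  new_run <- file.path(home, paste0("Run_", suffix, ".R"))
  new_r <- file.path(home, paste0("R_", suffix))
  ok <- file.rename(
    from = file.path(home, c("Run.R", "R")),
    to = c(new_run, new_r)
  )
  if (!all(ok)) stop("Could not rename Run.R and/or R/ in ", home)
  invisible(c(run = new_run, r_dir = new_r))
}

# Usage: build a throwaway project in a temporary folder and archive it
home <- tempfile("org_sketch_versioning_")
dir.create(file.path(home, "R"), recursive = TRUE)
writeLines("# main script", file.path(home, "Run.R"))

new_names <- archive_submission(home, submission = 1, date = as.Date("2019-03-01"))
basename(new_names)
```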

Best Practices

Store your project components in appropriate locations:

# Code (GitHub)
git/
└── analyses/
    ├── 2018/
    │   ├── analysis_1/          # org::project$home
    │   │   ├── Run.R
    │   │   └── R/
    │   │       ├── clean_data.R
    │   │       ├── descriptives.R
    │   │       ├── analysis.R
    │   │       └── figure_1.R
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/

# Results (Dropbox)
dropbox/
└── analyses_results/
    ├── 2018/
    │   ├── analysis_1/          # org::project$results
    │   │   ├── 2018-03-12/     # org::project$results_today
    │   │   │   ├── table_1.xlsx
    │   │   │   └── figure_1.png
    │   │   ├── 2018-03-15/
    │   │   └── 2018-03-18/
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/

# Data (Local)
data/
└── analyses/
    ├── 2018/
    │   ├── analysis_1/          # org::project$data_raw
    │   │   └── data.xlsx
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/

Alternative Structures

RMarkdown Project

For projects on a shared network drive without GitHub/Dropbox:

project_name/              # org::project$home
├── Run.R
├── R/
│   ├── CleanData.R
│   ├── Descriptives.R
│   ├── Analysis1.R
│   └── Graphs1.R
├── paper/
│   └── paper.Rmd
├── results/              # org::project$results
│   └── 2018-03-12/      # org::project$results_today
│       ├── table1.xlsx
│       └── figure1.png
└── data_raw/            # org::project$data_raw
    └── data.xlsx

Single Folder Project

For projects with limited access:

project_name/              # org::project$home
├── Run.R
├── R/
│   ├── clean_data.R
│   ├── descriptives.R
│   ├── analysis.R
│   └── figure_1.R
├── results/              # org::project$results
│   └── 2018-03-12/      # org::project$results_today
│       ├── table_1.xlsx
│       └── figure_1.png
└── data_raw/            # org::project$data_raw
    └── data.xlsx

Path Naming Conventions

Understanding path components is important:

Component                 Name
/home/richard/test.src    Absolute (file) path
richard/test.src          Relative (file) path
/home/richard/            Absolute (directory) path
./richard/                Relative (directory) path
richard                   Directory
test.src                  Filename

A path specifies a location in a directory structure, while a filename is only the name of the file itself, with no location information. Likewise, a directory is only the name of a single folder.
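The components in the table above can be pulled apart with base R functions. The sketch below uses the same example path. The helper is_absolute_unix is hypothetical and only checks for Unix-style paths that start with a slash.

```r
path <- "/home/richard/test.src"

file_name <- basename(path)            # filename: "test.src"
dir_path <- dirname(path)              # absolute directory path: "/home/richard"
dir_name <- basename(dirname(path))    # directory: "richard"

# Build a relative file path from its components
rel_path <- file.path("richard", "test.src")   # "richard/test.src"

# Hypothetical helper: a simple check for Unix-style absolute paths
is_absolute_unix <- function(p) startsWith(p, "/")
abs_check <- is_absolute_unix(path)       # TRUE
rel_check <- is_absolute_unix(rel_path)   # FALSE
```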