07 December, 2022

iRODS + R

Introduction to R

  • Functional programming language centered around functions (OOP capable)
  • Emphasis on statistics and visualization of data
  • Used by researchers and industry
  • Open source, with an active community of useRs (25,610 packages)

kdnuggets.com

Why iRODS?

Never again wonder: which method did I use to center variable “foo” in my regression model?

  • But what about the data itself?
    • SQLite, MySQL, PostgreSQL, MonetDB via the DBI package and ODBC drivers
    • iRODS?

Opening up the black box with iRODS + R

  • Store unprocessed data
  • Share data and scripts with collaborators
  • Reviewers can trace back the origin of the data and the manipulations applied to it
  • Publish everything = open science

xkcd.com

Design + Implementation

The foundation

Global Design

  • Functional style (modular build-up)
  • Mimic iCommands
  • Strictly user facing
  • Interactive + batch scripts

Advanced R by Hadley Wickham

Design

  • Authentication
    • connect to the iRODS server
    • authenticate



# load the package
library(rirods)
# create a configuration file with the server details
create_irods("<host>", "<zone>")
# authenticate
iauth()
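The create_irods() call above stores the server details in a configuration file so that later calls know where to connect. A minimal base R sketch of that idea follows; the DCF file format, the temporary file location, and the helpers write_irods_config() and read_irods_config() are hypothetical and not taken from the package.

```r
# Hypothetical helpers: persist host and zone so later calls can reuse them.
# The DCF format and file location are assumptions, not the package's internals.
write_irods_config <- function(host, zone, path) {
  # write.dcf() writes "key: value" lines, one record per data frame row
  write.dcf(data.frame(host = host, zone = zone), file = path)
  invisible(path)
}

read_irods_config <- function(path) {
  cfg <- read.dcf(path)  # character matrix with columns "host" and "zone"
  list(host = unname(cfg[1, "host"]), zone = unname(cfg[1, "zone"]))
}

cfg_file <- file.path(tempdir(), "irods_config.dcf")
write_irods_config("<host>", "<zone>", cfg_file)
cfg <- read_irods_config(cfg_file)
cfg$zone  # "<zone>"
```

Keeping the settings on disk means an interactive session and a batch script can share the same connection details without repeating them.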

Design

  • Navigation
    • iCommand-like



# current working collection
ipwd()
# change working collection
icd("<path>")
# list
ils()
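Navigation with ipwd() and icd() implies keeping track of a current working collection and resolving paths against it, as the icd iCommand does. The sketch below shows one way to do that resolution in base R; resolve_collection() and the example collection paths are hypothetical and do not describe how the package implements icd().

```r
# Hypothetical sketch: resolve an icd()-style path against the current collection.
resolve_collection <- function(current, path) {
  # absolute paths replace the current collection; relative ones extend it
  full <- if (startsWith(path, "/")) path else paste(current, path, sep = "/")
  parts <- strsplit(full, "/", fixed = TRUE)[[1]]
  stack <- character(0)
  for (p in parts) {
    if (p == "" || p == ".") next   # skip empty segments and "."
    if (p == "..") {
      stack <- head(stack, -1)      # step up one collection (no-op at the root)
    } else {
      stack <- c(stack, p)
    }
  }
  paste0("/", paste(stack, collapse = "/"))
}

resolve_collection("/tempZone/home/rods", "projects")        # "/tempZone/home/rods/projects"
resolve_collection("/tempZone/home/rods", "..")              # "/tempZone/home"
resolve_collection("/tempZone/home/rods", "/tempZone/trash") # "/tempZone/trash"
```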

Design

  • Objects/files
    • iCommand-like



foo <- 1:10
# store the R object in iRODS
iput(foo)
# retrieve it by name
iget("foo")
# or by passing the object itself? (open design question)
iget(foo) # ?
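The iput(foo) call only works if the function can recover the name "foo" from its argument. The base R sketch below shows one way to do that with substitute(), and one way to turn an object into bytes with serialize() and back with unserialize(). The in-memory store and the helpers put_object() and get_object() are hypothetical stand-ins for the iRODS server; they are not the package's API.

```r
# Stand-in for the remote store; a real client would send the bytes to iRODS
store <- new.env()

put_object <- function(x) {
  name <- deparse(substitute(x))            # capture the symbol name, e.g. "foo"
  bytes <- serialize(x, connection = NULL)  # raw vector representation of x
  assign(name, bytes, envir = store)
  invisible(name)
}

get_object <- function(name) {
  bytes <- get(name, envir = store, inherits = FALSE)
  unserialize(bytes)                        # rebuild the original R object
}

foo <- 1:10
put_object(foo)
identical(get_object("foo"), foo)  # TRUE
```

An iget(foo) variant would need the same substitute() trick, because foo may not exist in the session before it has been downloaded; taking the name as a string avoids that.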

Design

  • Data discovery
    • imeta vs iquest



# add some metadata
imeta(
  "foo", 
  "data_object", 
  operations = 
    list(operation = "add", attribute = "foo", value = "bar", units = "baz")
)
# discover data objects with a query
iquery("SELECT COLL_NAME, DATA_NAME WHERE COLL_NAME LIKE '/tempZone/home/%'")
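When many attributes are attached at once, or queries are built programmatically, small helpers can generate the inputs for imeta() and iquery(). Both helpers below, avu_ops() and genquery(), are hypothetical; whether imeta() accepts a list of several such operations is an assumption.

```r
# Hypothetical helper: one list(operation, attribute, value, units) entry per
# attribute-value-unit triple, matching the single operation passed to imeta() above
avu_ops <- function(attributes, values, units = "", operation = "add") {
  unname(Map(
    function(a, v, u) list(operation = operation, attribute = a, value = v, units = u),
    attributes, values, units
  ))
}

# Hypothetical helper: compose a GenQuery-style string for iquery()
genquery <- function(select, where) {
  sprintf("SELECT %s WHERE %s", paste(select, collapse = ", "), where)
}

ops <- avu_ops(c("foo", "project"), c("bar", "demo"), c("baz", ""))
ops[[1]]$units  # "baz"

q <- genquery(c("COLL_NAME", "DATA_NAME"), "COLL_NAME LIKE '/tempZone/home/%'")
q  # same string as in the iquery() call above
```

The same helper could target the metadata added with imeta(), for example genquery("DATA_NAME", "META_DATA_ATTR_NAME = 'foo'"), assuming the server accepts the standard GenQuery column names.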

Implementation

  • Curl in R
    • curl: R interface to libcurl (Ooms 2022a)
    • httr2 (Wickham 2022): wrapper around curl, with jsonlite (Ooms 2022b) for JSON
  • Development + Testing
    • iRODS demo server, started with docker-compose up -d nginx-reverse-proxy
    • Tests with mocked HTTP responses via httptest2 (Richardson 2022)
    • Automatic snapshot updates with GitHub Actions
    • R CMD check without internet access (simulating CRAN checks)
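Running R CMD check without internet access requires that no test depends on a live server. The base R sketch below illustrates the underlying idea with plain dependency injection: the HTTP call is passed in as a function, so a test can substitute a canned response. This is a swapped-in technique for illustration, not httptest2, which instead serves recorded responses from mock files; list_collection(), fake_perform(), and the response shape are hypothetical.

```r
# Hypothetical client function: the transport is injected via `perform`
list_collection <- function(path,
                            perform = function(path) stop("no network in this sketch")) {
  resp <- perform(path)  # a real client would issue an HTTP request here
  vapply(resp$entries, function(e) e$name, character(1))
}

# Canned response with an assumed shape, used instead of a live server
fake_perform <- function(path) {
  list(entries = list(list(name = "foo"), list(name = "bar")))
}

list_collection("/tempZone/home/rods", perform = fake_perform)
# [1] "foo" "bar"
```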

Future plans

Contribute

Roadmap

  • Submission to CRAN
  • Official release at the iRODS User Group Meeting (UGM, summer 2023)
  • More R packages: datamanager + panacaea


FAIReLABS

References