07 July, 2022

Integrated lab solutions

Modern lab requirements

  • High throughput of samples
  • Multiple parameters
  • Many (partly) automated techniques
  • Software and computer systems
  • Multifaceted and large data streams

A 2020 study of my own: atomic absorption spectroscopy, atomic emission spectroscopy, mass spectrometry, scanning electron microscopy, spectrophotometry, energy-dispersive X-ray spectroscopy, elemental analyser, and high-sensitivity balance

Taking command of lab-data



  • Default commercial software (Cameca, Zeiss, Thermo Fisher)
  • Prevents tracking data from source to publication
  • Fragmented storage
  • Monitoring and troubleshooting are limited to the current analysis


Vendor lock-in

Opening up the black box of lab-data

  • GUI-based dashboards, wizards, and dialogs that hide (part of) the transformations and calculations taking place
  • The reviewer who wants to trace data back to its origin
  • Old/defunct machinery

Is more data better?

  • Innovations
  • Inclusive science
  • More transparent science (proof of final published values)

xkcd.com

The integrated lab

  • Data collecting and harmonization
    • Parsing of unstructured (meta)data
    • Data normalization (SQL-like)
  • Modular processing, analysis, and diagnostics suite
    • Count statistics, spectral analysis, regression, …
  • Online monitoring
    • Dashboards of the lab’s long-term reproducibility
    • Troubleshooting
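
As an illustration of the "SQL-like" normalization step, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, column names, and example records are hypothetical and only serve to show the idea: instrument metadata that is repeated on every parsed row is stored once and referenced by key.

```python
import sqlite3

# Hypothetical flat records, as they might come out of a parsed instrument
# file: the instrument metadata is repeated on every measurement row.
flat_rows = [
    ("SEM-01", "scanning electron microscopy", "sample_A", "Si", 12.5),
    ("SEM-01", "scanning electron microscopy", "sample_A", "O", 48.1),
    ("AAS-02", "atomic absorption spectroscopy", "sample_A", "Fe", 3.2),
]


def normalize(rows):
    """Split flat rows into an instrument table and a measurement table."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE instrument (
            id        INTEGER PRIMARY KEY,
            name      TEXT UNIQUE NOT NULL,
            technique TEXT NOT NULL
        );
        CREATE TABLE measurement (
            id            INTEGER PRIMARY KEY,
            instrument_id INTEGER NOT NULL REFERENCES instrument(id),
            sample        TEXT NOT NULL,
            analyte       TEXT NOT NULL,
            value         REAL NOT NULL
        );
        """
    )
    for name, technique, sample, analyte, value in rows:
        # Store each instrument only once; repeated names are ignored.
        con.execute(
            "INSERT OR IGNORE INTO instrument (name, technique) VALUES (?, ?)",
            (name, technique),
        )
        (instrument_id,) = con.execute(
            "SELECT id FROM instrument WHERE name = ?", (name,)
        ).fetchone()
        con.execute(
            "INSERT INTO measurement (instrument_id, sample, analyte, value) "
            "VALUES (?, ?, ?, ?)",
            (instrument_id, sample, analyte, value),
        )
    con.commit()
    return con


con = normalize(flat_rows)
```

A join on `instrument_id` restores the flat view when needed, while each piece of metadata has a single place where it can be corrected.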

Data collecting and harmonization

Custom solutions

  • Deciphering the vendor’s data-model is labor-intensive
    • Multiple files
    • Many observations
    • Inconsistent syntax
    • Unstructured
  • Must accommodate the vendor’s software/data-model updates



A universal solution?

Parsing lab-data

Text data (encoded or decoded)

Unstructured raw data files from an analytical laboratory

Proposed solutions

Three possible solutions, which require varying degrees of human intervention:

  1. A mechanism to aid the location of variables based on user input
  2. A human-crafted (and adaptable) rule-based system
  3. A natural language processing approach involving self-supervised machine learning

The last two solutions would be preceded by a step entailing text normalization through tokenization.
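
To make the second option more concrete, the following is a minimal sketch of tokenization followed by a human-crafted rule set, written in Python with only the standard library. The example text, the rule table, and all function names are hypothetical; a real instrument file would likely need a richer tokenizer and more rules.

```python
import re

# Hypothetical excerpt of an unstructured instrument file with
# inconsistent separators (":", "=", or none at all).
RAW = """\
Sample name : quartz_01
Beam current= 2.5 nA
dwell time 0.5 ms
"""

# Tokens are words/identifiers, numbers, or the separators ":" and "=".
TOKEN = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|[-+]?\d+(?:\.\d+)?|[:=]")
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")

# Human-crafted, adaptable rules: variable name -> leading keywords.
RULES = {
    "sample_name": ["sample", "name"],
    "beam_current": ["beam", "current"],
    "dwell_time": ["dwell", "time"],
}


def tokenize(line):
    """Normalize a line of text into a list of tokens."""
    return TOKEN.findall(line)


def extract(text, rules=RULES):
    """Return {variable: (value, unit)} for every line matching a rule."""
    found = {}
    for line in text.splitlines():
        tokens = tokenize(line)
        lowered = [t.lower() for t in tokens]
        for variable, keywords in rules.items():
            n = len(keywords)
            if lowered[:n] != keywords:
                continue
            # Drop separators; what remains is the value and maybe a unit.
            rest = [t for t in tokens[n:] if t not in (":", "=")]
            if not rest:
                continue
            if NUMBER.fullmatch(rest[0]):
                unit = rest[1] if len(rest) > 1 else None
                found[variable] = (float(rest[0]), unit)
            else:
                found[variable] = (rest[0], None)
    return found


result = extract(RAW)
```

Because the rules live in a plain dictionary, adapting to a vendor's format change would mean editing the rule table and leaving the parsing code untouched.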

iRODS and user accessibility

Integration with iRODS

  • Sub-system for automated ingest
  • Automated workflows
  • Better collaboration

Accessibility

  • Interfaces for R and Python (standalone usage)

Evan-Amos, 2011. A SanDisk Cruzer USB drive from 2011, with 4 GB of storage capacity. Wikipedia.

Implementation




panacea: Portable ANalytical data Aggregation and Coordination for database Entry and Access

  • C++ for optimal performance with large datasets
  • R and Python bindings for user-friendliness and standalone usage

auxiliary

  • Updating rirods (irods/irods_client_library_r_cpp) to work with iRODS from R
  • Restrictive and complex system requirements are not ideal for R and C++ integration

Roadmap

Long-term goals

The integrated lab will foster:

  • more efficient labs and innovations
  • better open science practices
  • inclusive science

Stimulate a push in the lab-equipment industry towards open-source software solutions

FAIReLABS