Data management and the subsequent downstream reuse of data are increasingly seen as a driver of future innovation and new discoveries. Above all, good data management practices are critical to open science and an inclusive, connected worldwide academic community, providing opportunities for developing countries that do not have the same resources for data generation as wealthy countries.

Solutions for better data management infrastructures, such as those formalised in the Findable, Accessible, Interoperable, and Reusable (FAIR) data guiding principles (Wilkinson et al. 2016), are an active yet still-developing field of research. Hence, researchers at every stage of their career, from beginning PhD candidates to professors running large research groups, often have only limited knowledge of good data management and open science practices, or lack the capacity to invest time and resources in familiarising themselves with current developments in this field.

The task of optimising data management infrastructures can be especially daunting for laboratories populated by a range of analytical instruments, each with its own vendor-supplied software suite for data processing, analysis, and diagnostics. This wealth of commercial analytical instruments results in a variety of proprietary data models that are not easily integrated. The resulting “vendor lock-in” further prevents transparency of the workflow from raw to analysed data.

Low-level access to raw data and insight into workflows is not necessary for every scientist, but it can be important for special-purpose research questions and for the reproducibility of published research. In addition, many disciplines in the natural sciences rely on field observations and on-site experiments besides laboratory output. Ideally, all these data are standardised, integrated, and managed at one central point throughout the life cycle, from raw and analysed to published and archived data.

The problems cited above can be addressed at three levels:

  1. By raising awareness of good open science practices, and their merits for inclusiveness, at the level of the individual
  2. By auditing the existing data management infrastructures of institutes
  3. By researching the actions needed to facilitate universally centralised data management infrastructures and full workflow transparency, and by actively developing open-source solutions


Individuals

Training scientists in how to manage their data effectively, and how to make their science reproducible, is a first step towards improving the open science practices of the community as a whole. This includes the use of metadata and standards for data management, as well as practical and digital solutions for the secure storage and maintenance of data (Briney 2015).
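
As a concrete illustration of working with metadata, the Python sketch below writes a small machine-readable metadata record alongside a raw data file. The field names are hypothetical (loosely inspired by Dublin Core); a real deployment would follow the metadata standard of the relevant discipline or repository.

    import json
    from datetime import datetime, timezone
    from pathlib import Path

    def write_metadata_sidecar(data_file: Path, creator: str,
                               description: str,
                               licence: str = "CC-BY-4.0") -> Path:
        """Write a minimal JSON metadata record next to a raw data file."""
        record = {
            "title": data_file.stem,           # illustrative fields only,
            "creator": creator,                # loosely based on Dublin Core
            "description": description,
            "created": datetime.now(timezone.utc).isoformat(),
            "format": data_file.suffix.lstrip("."),
            "license": licence,
        }
        sidecar = data_file.parent / (data_file.name + ".meta.json")
        sidecar.write_text(json.dumps(record, indent=2))
        return sidecar

    # Example: annotate a hypothetical instrument output file.
    # write_metadata_sidecar(Path("run_042.csv"), "A. Researcher",
    #                        "Mass spectrometry run, sample batch 42")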

Institutes

Providing support for open science solutions in the lab starts with evaluating existing laboratory and data management infrastructures and developing strategies for upgrading them to a robust, centralised structure that can accommodate automated data ingestion from various sources.
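
A minimal sketch of what such automated ingestion could look like, assuming a hypothetical layout in which each instrument writes into its own “drop” directory and a central store keeps one checksummed copy of every file:

    import hashlib
    import shutil
    from pathlib import Path

    # Hypothetical layout: instruments write into /data/drop/<instrument>/;
    # the central store mirrors this under /data/store/raw/<instrument>/.
    DROP = Path("/data/drop")
    STORE = Path("/data/store/raw")

    def sha256(path: Path) -> str:
        """Checksum used to verify file integrity during later audits."""
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def ingest() -> None:
        """Copy new instrument files into the central store, once each."""
        for src in DROP.glob("*/*"):
            if not src.is_file():
                continue
            dest = STORE / src.relative_to(DROP)
            if dest.exists():
                continue  # already ingested
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 preserves timestamps
            dest.with_name(dest.name + ".sha256").write_text(sha256(dest))

In practice such a routine would run periodically or react to filesystem events, and would also register each ingested file, with its metadata, in a searchable catalogue.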

Universally

Universal solutions can be achieved through the active development of data retrieval and harmonisation tools, and through the creation of transparent, customisable, and shareable workflows that can replace vendor-supplied software for data processing, analysis, and diagnostics. These solutions should be grounded in solid research into existing practices and workflow routines, and in the identification of common bottlenecks that prevent data from being FAIR and workflows from being reproducible.
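
In practice, a harmonisation layer often amounts to a set of per-vendor readers that map proprietary exports onto one open, common representation. The Python sketch below assumes two invented vendor export formats (the file suffixes, column names, and units are hypothetical) and normalises both to the same columns and unit:

    import csv
    from pathlib import Path
    from typing import Iterator

    # Common open representation: (sample_id, quantity, value, unit) rows.
    Row = tuple[str, str, float, str]

    def read_vendor_a(path: Path) -> Iterator[Row]:
        """Hypothetical vendor A: semicolon-separated, values in mg/L."""
        with path.open(newline="") as f:
            for rec in csv.DictReader(f, delimiter=";"):
                yield rec["Sample"], rec["Analyte"], float(rec["Conc"]), "mg/L"

    def read_vendor_b(path: Path) -> Iterator[Row]:
        """Hypothetical vendor B: comma-separated, values in ug/L."""
        with path.open(newline="") as f:
            for rec in csv.DictReader(f):
                # Convert ug/L to mg/L so downstream code sees one unit.
                yield rec["id"], rec["analyte"], float(rec["value"]) / 1000.0, "mg/L"

    READERS = {".veda": read_vendor_a, ".vedb": read_vendor_b}  # invented suffixes

    def harmonise(paths: list[Path], out: Path) -> None:
        """Merge heterogeneous vendor exports into one open CSV file."""
        with out.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["sample_id", "quantity", "value", "unit"])
            for p in paths:
                for row in READERS[p.suffix](p):
                    writer.writerow(row)

Because each reader is a small, open function, the full path from vendor export to analysed data remains inspectable and shareable, which is exactly the transparency that vendor-supplied black boxes lack.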


Briney, Kristin. 2015. Data Management for Researchers: Organize, Maintain and Share Your Data for Research Success. Pelagic Publishing Ltd.
Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3: 1–9. https://doi.org/10.1038/sdata.2016.18.