MS-Cheminformatics

Analytical Science meets Computer Science

Data Model & Metadata

Data file definition

Operating an analytical instrument is expected to produce a set of results in electronic form. A “data file” corresponds to one sample and contains both the raw acquisition streams and the processed results. Raw streams may include spectra, voltages, UV/Vis absorption, temperature, solvent composition, pressure, and other signals recorded as functions of time.
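As a rough illustration of this definition, the following Python sketch models one data file per sample, holding named raw streams alongside a list of processed results. The class and field names (DataFile, RawStream, sample_id, and so on) are assumptions made for this example and do not describe an existing file format or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RawStream:
    """One acquisition signal recorded as a function of time (hypothetical type)."""
    name: str   # e.g. "pressure" or "UV/Vis absorption"
    unit: str
    points: List[Tuple[float, float]] = field(default_factory=list)  # (time_s, value)


@dataclass
class DataFile:
    """Hypothetical container: one data file corresponds to one sample."""
    sample_id: str
    raw_streams: Dict[str, RawStream] = field(default_factory=dict)
    processed_results: List[dict] = field(default_factory=list)

    def add_stream(self, stream: RawStream) -> None:
        # Streams are keyed by name so several signals can coexist in one file.
        self.raw_streams[stream.name] = stream


data_file = DataFile(sample_id="sample-001")
pressure = RawStream(name="pressure", unit="MPa")
pressure.points.append((0.0, 10.2))
pressure.points.append((0.5, 10.4))
data_file.add_stream(pressure)
```

Keeping raw streams and processed results in separate members mirrors the distinction the following sections draw between primary evidence and derived data.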

Requirements for raw data

Raw data is the primary evidence and must be accompanied by metadata sufficient for traceability, diagnostics, and long-term scientific discussion. Capturing the original ADC signal (plus scaling parameters) can be preferable to saving only converted user-space values, because it preserves diagnostic information.
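The idea of storing the ADC signal together with its scaling parameters can be sketched as follows. This is a minimal example that assumes a simple linear scaling (gain and offset); real instruments may use other conversions, and the names AdcScaling and RawAdcTrace are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AdcScaling:
    """Assumed linear conversion: value = counts * gain + offset."""
    gain: float    # user units per ADC count
    offset: float  # user units at zero counts
    unit: str


@dataclass
class RawAdcTrace:
    """Stores the untouched ADC counts next to the parameters needed to scale them."""
    counts: List[int]
    scaling: AdcScaling

    def to_user_values(self) -> List[float]:
        # Conversion happens on demand; the stored counts are never modified.
        s = self.scaling
        return [c * s.gain + s.offset for c in self.counts]


trace = RawAdcTrace(
    counts=[0, 1024, 4095],
    scaling=AdcScaling(gain=0.5, offset=-1.0, unit="mV"),
)
values = trace.to_user_values()  # [-1.0, 511.0, 2046.5]
```

Because the counts are kept as acquired, the converted values can be regenerated at any time, and the unscaled signal remains available for diagnostics.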

Raw data metadata should include at least:

Requirements for processed data

Processed data (spectra, chromatograms, quantitation, identification results, etc.) is derived from raw data through algorithms. Processing may be interactive or batch-based, and may involve multiple stages. Processed results should inherit the identifying metadata of the raw data and additionally record the processing operator, the processing computer, and the software versions used during processing.
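One way to express this inheritance is sketched below. The identifying metadata is treated as an opaque key/value mapping because its exact fields are not specified here; ProcessedResult, derive_result, and the sample values are assumptions for illustration only.

```python
import platform
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProcessedResult:
    """Hypothetical record for one processing stage."""
    kind: str                              # e.g. "chromatogram", "quantitation"
    identifying_metadata: Dict[str, str]   # copied from the raw data
    processing_operator: str
    processing_computer: str
    software_versions: Dict[str, str]


def derive_result(
    kind: str,
    raw_metadata: Dict[str, str],
    operator: str,
    software_versions: Dict[str, str],
    computer: Optional[str] = None,
) -> ProcessedResult:
    """Build a processed result that inherits the raw data's identifying metadata."""
    return ProcessedResult(
        kind=kind,
        identifying_metadata=dict(raw_metadata),  # copy, so the raw record stays untouched
        processing_operator=operator,
        processing_computer=computer or platform.node(),  # default to this host's name
        software_versions=dict(software_versions),
    )


raw_metadata = {"sample_id": "sample-001"}  # placeholder identifying metadata
result = derive_result(
    kind="chromatogram",
    raw_metadata=raw_metadata,
    operator="analyst-a",
    software_versions={"peak-finder": "1.2.0"},
    computer="lab-pc-07",
)
```

For multi-stage processing, each stage could produce its own record in the same way, so every derived result carries both its origin and the conditions under which it was computed.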

Audit and regulated environments

In regulated environments, altering derived results is not necessarily prohibited, but every create, modify, and delete action should be recorded together with a reason and a signature (audit trail concepts such as those referenced in FDA 21 CFR Part 11). The data model should therefore support a traceable history of processed results.
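A traceable history of this kind might be modeled as an append-only log, as in the sketch below. It only illustrates the record-keeping idea and is not a compliant implementation of any regulation; the signature is a plain placeholder string, and all names (AuditTrail, AuditEntry, Action) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import List


class Action(Enum):
    CREATE = "create"
    MODIFY = "modify"
    DELETE = "delete"


@dataclass(frozen=True)
class AuditEntry:
    action: Action
    result_id: str
    user: str
    reason: str
    signature: str      # placeholder; a real system would use a proper electronic signature
    timestamp: datetime


class AuditTrail:
    """Append-only log: entries can be added and read back, not edited or removed."""

    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    def record(self, action: Action, result_id: str, user: str,
               reason: str, signature: str) -> AuditEntry:
        if not reason.strip():
            raise ValueError("a reason is required for every audited action")
        entry = AuditEntry(action, result_id, user, reason, signature,
                           datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def history(self, result_id: str) -> List[AuditEntry]:
        # Entries are returned in the order they were recorded.
        return [e for e in self._entries if e.result_id == result_id]


trail = AuditTrail()
trail.record(Action.CREATE, "result-1", "analyst-a", "initial integration", "sig-a")
trail.record(Action.MODIFY, "result-1", "analyst-b", "baseline corrected", "sig-b")
history = trail.history("result-1")
```

Rejecting entries that lack a reason keeps the log useful for later review, since each change to a processed result can be traced to who made it and why.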