Data Model & Metadata
Data file definition
Operating an analytical instrument is expected to produce a set of results in electronic form. A “data file” corresponds to a sample and contains both the raw acquisition streams and the processed results. Raw streams may include spectra, voltages, UV/Vis absorption, temperature, solvent composition, pressure, and other signals recorded as functions of time.
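As an illustration of the definition above, a data file can be sketched as one container per sample that holds time-indexed raw streams alongside processed results. This is only a minimal sketch: the names `Stream`, `DataFile`, and `add_stream` are hypothetical, and it covers scalar signals only (a spectrum per time point would need a richer value type).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stream:
    """One raw signal recorded as a function of time (hypothetical layout)."""
    name: str             # e.g. "pressure"
    unit: str             # e.g. "bar"
    times_s: List[float]  # acquisition time of each point, in seconds
    values: List[float]   # one scalar value per time point

    def __post_init__(self) -> None:
        # A stream is only meaningful if every value has a time.
        if len(self.times_s) != len(self.values):
            raise ValueError("times_s and values must have the same length")


@dataclass
class DataFile:
    """All data belonging to one sample: raw streams plus processed results."""
    sample_id: str
    raw_streams: Dict[str, Stream] = field(default_factory=dict)
    processed_results: Dict[str, object] = field(default_factory=dict)

    def add_stream(self, stream: Stream) -> None:
        # Refuse to overwrite: raw data is primary evidence.
        if stream.name in self.raw_streams:
            raise KeyError(f"stream {stream.name!r} already recorded")
        self.raw_streams[stream.name] = stream
```

Refusing to overwrite an existing stream is a design choice of this sketch, made so that raw evidence cannot be silently replaced.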
Requirements for raw data
Raw data is the primary evidence and must be accompanied by metadata sufficient for traceability, diagnostics, and long-term scientific discussion. Capturing the original ADC signal (plus scaling parameters) can be preferable to saving only converted user-space values, because it preserves diagnostic information.
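The point about keeping the original ADC signal can be illustrated with a short sketch. It assumes a simple linear scaling (gain and offset), which is an assumption made here for illustration and not something the data model prescribes. The names `AdcScaling`, `to_physical`, and `saturated_indices` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AdcScaling:
    """Scaling parameters stored next to the raw counts (assumed linear)."""
    gain: float             # physical units per ADC count
    offset: float           # physical value at zero counts
    unit: str               # unit of the converted value
    full_scale_counts: int  # largest count the converter can report


def to_physical(counts: List[int], scaling: AdcScaling) -> List[float]:
    """Convert stored raw counts to physical values on demand."""
    return [scaling.gain * c + scaling.offset for c in counts]


def saturated_indices(counts: List[int], scaling: AdcScaling) -> List[int]:
    """Find points where the converter was at full scale.

    This check is straightforward on raw counts; once only converted
    values are kept, the full-scale limit is no longer directly visible.
    """
    return [i for i, c in enumerate(counts) if c >= scaling.full_scale_counts]
```

Because the counts and the scaling parameters are stored together, the converted values can be regenerated at any time, while diagnostics such as the saturation check remain available.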
Raw data metadata should include at least:
- Instrument identification (e.g., serial number)
- Instrument configuration (tray, valves, detectors, ion source, etc.)
- Software configuration (versions, configured options)
- Operator and computer identification
- Instrument diagnostic signals (power monitors, detector runtime, reference signals)
- Calibration information (results, calibration sample, operator approval)
- Time stamps (including time since power-on/reset when relevant)
- Event in/out data for synchronization with collaborating devices
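The list above could be captured in a single metadata record with a completeness check before a data file is accepted. The following is a sketch only: the field names, the choice of dictionaries for configuration, and the helper `missing_fields` are hypothetical, and which fields are treated as optional is an assumption.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional

# Assumption: these two items are only required "when relevant".
OPTIONAL_FIELDS = {"seconds_since_power_on", "sync_events"}


@dataclass
class RawMetadata:
    instrument_serial: str             # instrument identification
    instrument_config: Dict[str, str]  # tray, valves, detectors, ion source
    software_config: Dict[str, str]    # versions, configured options
    operator_id: str
    computer_id: str
    diagnostics: Dict[str, float]      # power monitors, detector runtime
    calibration: Dict[str, str]        # results, sample, operator approval
    acquired_at_utc: str               # ISO 8601 time stamp
    seconds_since_power_on: Optional[float] = None
    sync_events: List[Dict[str, str]] = field(default_factory=list)


def missing_fields(meta: RawMetadata) -> List[str]:
    """Return the names of required fields that were left empty."""
    missing = []
    for name, value in asdict(meta).items():
        if name in OPTIONAL_FIELDS:
            continue
        if value is None or value == "" or value == {}:
            missing.append(name)
    return missing
```

An acquisition layer could call `missing_fields` before writing a file and refuse to store raw data whose record is incomplete.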
Requirements for processed data
Processed data (spectra, chromatograms, quantitation, identification results, etc.) is derived from raw data through algorithms. Processing may be interactive or batch-based, and may involve multiple stages. Processed results should inherit the identifying metadata of the raw data and additionally record the processing operator, the processing computer, and the software versions used during processing.
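One way to picture this inheritance is a processed result that carries a copy of the raw identifying metadata plus one provenance entry per processing stage. This is a minimal sketch under that assumption; `ProcessingStep`, `ProcessedResult`, and `derive` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ProcessingStep:
    """Who processed the data, where, and with which software."""
    name: str                          # e.g. "peak integration"
    operator_id: str                   # processing operator
    computer_id: str                   # processing computer
    software_versions: Dict[str, str]  # software used in this stage


@dataclass(frozen=True)
class ProcessedResult:
    raw_identity: Dict[str, str]       # identifying metadata inherited from raw data
    steps: Tuple[ProcessingStep, ...]  # one entry per stage, oldest first
    values: Dict[str, float]           # the derived numbers


def derive(raw_identity: Dict[str, str],
           previous: Optional[ProcessedResult],
           step: ProcessingStep,
           values: Dict[str, float]) -> ProcessedResult:
    """Create a new result and leave any earlier stage untouched."""
    earlier = previous.steps if previous is not None else ()
    # Copy the identity so later edits to the caller's dict cannot leak in.
    return ProcessedResult(dict(raw_identity), earlier + (step,), dict(values))
```

Each stage returns a new object and never mutates the previous one, so a multi-stage result keeps the full chain of operators, computers, and software versions back to the raw data.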
Audit and regulated environments
In regulated environments, altering derived results is not necessarily prohibited, but every create, modify, and delete action should be recorded with a reason and a signature (audit trail concepts such as those referenced in FDA 21 CFR Part 11). The data model should therefore support a traceable history of processed results.
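A traceable history of this kind can be sketched as an append-only log that rejects entries lacking a reason or a signature. This illustrates the concept only and is not a compliant 21 CFR Part 11 implementation. The names `AuditEntry` and `AuditTrail` are hypothetical, and the signature is treated as an opaque string supplied by some external signing mechanism.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple

ALLOWED_ACTIONS = ("create", "modify", "delete")


@dataclass(frozen=True)
class AuditEntry:
    timestamp_utc: str  # when the action was recorded (ISO 8601)
    action: str         # one of ALLOWED_ACTIONS
    result_id: str      # which processed result was affected
    user_id: str        # who performed the action
    reason: str         # why the action was taken
    signature: str      # opaque signature token (assumption of this sketch)


class AuditTrail:
    """Append-only record: entries can be added and read, never edited."""

    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    def record(self, action: str, result_id: str, user_id: str,
               reason: str, signature: str) -> AuditEntry:
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        if not reason.strip() or not signature.strip():
            raise ValueError("a reason and a signature are both required")
        entry = AuditEntry(datetime.now(timezone.utc).isoformat(),
                           action, result_id, user_id, reason, signature)
        self._entries.append(entry)
        return entry

    def history(self, result_id: str) -> Tuple[AuditEntry, ...]:
        """Return every recorded action for one result, oldest first."""
        return tuple(e for e in self._entries if e.result_id == result_id)
```

Note that a delete is itself recorded as an entry, so the history of a result remains readable after the result is gone.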