Scientific Data Principles

The ICAD Principles

FAIR describes what scientific data should be.
ICAD defines how it drives decisions at scale.

In 2016, the FAIR principles gave the scientific community a shared vocabulary for data quality: Findable, Accessible, Interoperable, Reusable. A decade later, many pharma R&D organizations file FAIR data management plans with funding agencies and regulatory bodies.
Yet many still cannot answer a basic operational question: what is the stability trend across our portfolio?
ICAD fills the operational gap that FAIR left open.

Four principles. One compounding sequence. The operational framework that turns governed scientific data into AI-enabled decisions.

The Gap

From FAIR Data to Actionable Intelligence

FAIR (2016)

The FAIR principles defined what good data looks like. They answered: can this data be found? Can it be accessed? Can it interoperate with other data? Can it be reused? These are properties of data at rest — criteria for data quality, not instructions for data action.

The Gap

FAIR does not address what happens next. How does raw scientific output become a governed, contextualized, analyzable dataset that AI can act on? FAIR assumes the infrastructure exists. In most pharma R&D organizations, it does not.

ICAD (2026) defines the operational sequence. Four steps, each building on the last.
Not properties of data, but operations on data. Not a checklist — a compounding sequence.

The Specification

Four Principles. One Compounding Sequence.

Each principle builds on the one before it. Skip a step and the sequence breaks.

Integrate

Connect every scientific data source to a governed pipeline.

I1. Every scientific output is captured at the point of creation, not exported manually, not aggregated after the fact.
I2. Raw data is preserved in its native format with full provenance to source system, timestamp, and operator.
I3. Integration is industrialized — each build creates a reusable asset that reduces the cost and time of the next integration.
I4. New data sources are onboarded in days, not months — the integration pipeline is a factory, not a project.
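
As a rough illustration of I1 and I2, the sketch below wraps raw instrument bytes with provenance at capture time and keeps a checksum so later steps can confirm the native payload was never altered. The names `RawRecord`, `capture`, and `is_intact`, and the sample values, are hypothetical and not part of the ICAD specification. This is one possible shape for such a record, not a reference implementation.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RawRecord:
    """Raw output preserved byte-for-byte together with its provenance (I2)."""
    payload: bytes         # native-format content, stored unmodified
    native_format: str     # label for the vendor format (illustrative)
    source_system: str     # instrument or system that produced the data
    operator: str          # person or service account responsible
    captured_at: datetime  # timezone-aware capture timestamp
    sha256: str            # checksum taken at the point of creation


def capture(payload: bytes, native_format: str, source_system: str,
            operator: str, captured_at: Optional[datetime] = None) -> RawRecord:
    """Attach provenance to raw bytes at the point of creation (I1)."""
    if captured_at is None:
        captured_at = datetime.now(timezone.utc)
    return RawRecord(
        payload=payload,
        native_format=native_format,
        source_system=source_system,
        operator=operator,
        captured_at=captured_at,
        sha256=hashlib.sha256(payload).hexdigest(),
    )


def is_intact(record: RawRecord) -> bool:
    """Return True if the stored bytes still match the capture-time checksum."""
    return hashlib.sha256(record.payload).hexdigest() == record.sha256


# Hypothetical instrument output; the bytes stand in for a vendor file.
record = capture(b"\x00\x01raw-chromatogram", "vendor-binary", "hplc-07", "jdoe")
print(record.source_system, record.native_format, is_intact(record))
```

Making the record immutable (`frozen=True`) is a design choice for this sketch: downstream steps can add context around the record but cannot silently rewrite what the instrument produced.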

What it enables

Scientific data from any vendor, in any format, at any site enters a governed pipeline, eliminating manual exports, CSV transfers, and site-specific data silos. Each newly connected data source enriches the dataset available to every downstream step in the sequence. Governed Electronic Laboratory Notebook (ELN) integration captures experimental records with full provenance, protecting intellectual property by establishing timestamped, traceable evidence of inventorship and priority.

Contextualize

Add scientific meaning to raw data.

C1. Data is linked to its scientific context — method, sample, experiment, study, program, and regulatory submission.
C2. Master data is reconciled across sites, systems, and naming conventions — one identity per entity, globally.
C3. Lineage traces every data point from instrument through transformation to decision, with no gaps.
C4. Context is readable by scientists and machines — not trapped in PDF reports, Electronic Laboratory Notebook (ELN) narratives, or spreadsheet column headers.
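
To make C1 and C2 more concrete, here is a minimal sketch of master data reconciliation: every known alias of an entity resolves to one canonical identifier, and a result is then linked to machine-readable context. The class names, the normalization rule, and all identifiers (`CMPD-0001`, `ZX-101`, and so on) are assumptions made for illustration. A real reconciliation service would need curated synonym sources and review workflows.

```python
from dataclasses import dataclass


def normalize(name: str) -> str:
    """Drop case, spaces, and punctuation so trivial spelling variants match."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


class MasterData:
    """Maps every known alias of an entity to one canonical identifier (C2)."""

    def __init__(self) -> None:
        self._alias_to_id = {}

    def register(self, canonical_id: str, aliases: list) -> None:
        for alias in aliases:
            key = normalize(alias)
            existing = self._alias_to_id.get(key)
            # Refuse to let one alias point at two different entities.
            if existing is not None and existing != canonical_id:
                raise ValueError(f"alias {alias!r} already maps to {existing}")
            self._alias_to_id[key] = canonical_id

    def resolve(self, name: str) -> str:
        try:
            return self._alias_to_id[normalize(name)]
        except KeyError:
            raise KeyError(f"unreconciled name: {name!r}") from None


@dataclass(frozen=True)
class Context:
    """Scientific context linked to a result (C1), readable by machines (C4)."""
    compound_id: str
    method: str
    sample: str
    study: str


master = MasterData()
master.register("CMPD-0001", ["ZX-101", "zx 101", "Compound A (Site B)"])

ctx = Context(compound_id=master.resolve("ZX_101"), method="HPLC-assay",
              sample="S-42", study="STAB-2026-01")
print(ctx.compound_id)
```

Unknown names raise an error instead of passing through, which in this sketch keeps unreconciled identifiers from reaching analysis unnoticed.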

What it enables

Scientists query across programs, sites, and therapeutic areas using scientific terms — not system identifiers. Regulatory teams trace any data point from submission back to the originating instrument run. Master data reconciliation eliminates the "same compound, five names" problem that plagues multi-site R&D. Contextualized ELN records with full lineage strengthen intellectual property claims — tracing from a patent filing back to the original experimental record with governed provenance.

Analyze

Generate intelligence across programs and sites.

A1. Analysis operates on contextualized data, never on raw output alone — context is a prerequisite.
A2. Cross-program comparisons are governed — same method, same context rules, same statistical framework.
A3. Statistical models are traceable to their training data with full provenance — no black-box analytics.
A4. Insights are reproducible — any analyst, any site, same governed dataset produces equivalent results within governed tolerances.
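
The following sketch illustrates A1, A2, and A4 with a simple stability trend: the analysis refuses data that lacks context, refuses mixed methods, and two runs over the same dataset are compared within a stated tolerance. The least-squares slope is a stand-in chosen for brevity, not a statistical framework prescribed by ICAD. The data points, the tolerance, and the names `StabilityPoint`, `trend_slope`, and `equivalent` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StabilityPoint:
    month: float
    assay_pct: float
    method: Optional[str]  # context from the Contextualize step; None means raw


def trend_slope(points: Sequence[StabilityPoint]) -> float:
    """Least-squares slope of assay (%) versus time (months)."""
    if any(p.method is None for p in points):
        raise ValueError("analysis requires contextualized data (A1)")
    if len({p.method for p in points}) != 1:
        raise ValueError("points must share exactly one method (A2)")
    n = len(points)
    mean_x = sum(p.month for p in points) / n
    mean_y = sum(p.assay_pct for p in points) / n
    sxx = sum((p.month - mean_x) ** 2 for p in points)
    if sxx == 0:
        raise ValueError("need at least two distinct time points")
    sxy = sum((p.month - mean_x) * (p.assay_pct - mean_y) for p in points)
    return sxy / sxx


def equivalent(a: float, b: float, abs_tol: float = 1e-9) -> bool:
    """Compare two results within a governed absolute tolerance (A4)."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol)


# Illustrative data only: assay falls 0.2 percentage points per month.
data = [
    StabilityPoint(0, 100.0, "HPLC-assay"),
    StabilityPoint(3, 99.4, "HPLC-assay"),
    StabilityPoint(6, 98.8, "HPLC-assay"),
    StabilityPoint(12, 97.6, "HPLC-assay"),
]

site_a = trend_slope(data)
site_b = trend_slope(list(reversed(data)))  # same dataset, different row order
print(equivalent(site_a, site_b))
```

Comparing within a tolerance instead of testing exact equality reflects the wording of A4: floating-point results can differ in the last digits depending on row order, so "equivalent" has to be defined explicitly.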

What it enables

Out-of-specification (OOS) investigations that previously took weeks can complete in hours because root-cause analysis operates on governed, cross-referenced data. Method transfers between sites carry full lineage. Stability trending spans programs and geographies. Analytical method comparisons are reproducible by any qualified analyst at any site.

Decide

Act with AI grounded in governed data.

D1. AI operates only on data that has passed through I→C→A — never on ungoverned, decontextualized inputs.
D2. Every decision — human, AI-assisted, or automated — is traceable to its source data, analytical provenance, and decision logic.
D3. Autonomy is configurable per decision type — from fully autonomous execution for routine operations to human-in-the-loop approval where regulatory classification or risk profile requires it.
D4. Decisions feed back into the sequence — improving integration targets, context models, and analytical frameworks.
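
One way to picture D1 through D3 is a gate that rejects any input lacking the full I→C→A history, then looks up how much autonomy the decision type allows. The policy table, the decision types, and the choice to default unlisted types to human approval are assumptions for this sketch, as are the names `Evidence`, `Decision`, and `decide`.

```python
from dataclasses import dataclass
from enum import Enum


class Autonomy(Enum):
    AUTONOMOUS = "autonomous"
    HUMAN_APPROVAL = "human_approval"


# Hypothetical policy: autonomy is configured per decision type (D3).
POLICY = {
    "reorder_consumables": Autonomy.AUTONOMOUS,
    "batch_disposition": Autonomy.HUMAN_APPROVAL,
}

REQUIRED_STAGES = ("integrate", "contextualize", "analyze")


@dataclass(frozen=True)
class Evidence:
    dataset_id: str
    stages_passed: tuple  # ordered names of the stages the dataset completed


@dataclass(frozen=True)
class Decision:
    decision_type: str
    action: str
    status: str      # "executed" or "pending_approval"
    dataset_id: str  # link back to the source data (D2)


def decide(decision_type: str, action: str, evidence: Evidence) -> Decision:
    """Gate on I->C->A (D1), then apply the configured autonomy level (D3)."""
    if tuple(evidence.stages_passed) != REQUIRED_STAGES:
        raise PermissionError("input has not passed I->C->A (D1)")
    # Assumption: decision types missing from the policy need human approval.
    autonomy = POLICY.get(decision_type, Autonomy.HUMAN_APPROVAL)
    status = "executed" if autonomy is Autonomy.AUTONOMOUS else "pending_approval"
    return Decision(decision_type, action, status, evidence.dataset_id)


governed = Evidence("ds-001", ("integrate", "contextualize", "analyze"))
routine = decide("reorder_consumables", "order 10 columns", governed)
gated = decide("batch_disposition", "release batch", governed)
print(routine.status, gated.status)
```

Each returned `Decision` carries the dataset identifier, so a reviewer can follow it back to the governed input regardless of whether a human or an automated step acted.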

What it enables

Every decision that acts on governed data, whether human, AI-assisted, or automated, is traceable to its source data and analytical provenance. AI recommendations include full I→C→A lineage, and human decisions on governed data carry the same provenance chain. Investigational New Drug (IND) application compilation draws on governed data across disciplines, with no manual assembly from disconnected systems. Chemistry, Manufacturing, and Controls (CMC) sections reference live, traceable analytical data. Laboratory workflows close the loop from experiment design through execution to result, with configurable autonomy that scales from full automation for routine operations to human-gated approval where GxP classification requires it. Shelf-life predictions and trend analyses operate on complete, multi-program datasets rather than single-study snapshots.

Why Sequence Matters

Each Step Makes the Next One More Valuable

This is what separates ICAD from a checklist. FAIR principles are independent — you can make data Findable without making it Accessible. ICAD principles are sequential and compounding.

Integrated data that has been contextualized is far more valuable than raw integrated data. Analysis built on contextualized data is far more reliable than analysis on raw feeds. Decisions from governed analysis are far more trustworthy than decisions from ungoverned models.

This is why the 50th integration adds more value to the entire dataset than the first one did: every new data source enriches the context, sharpens the analysis, and improves the decisions. It is also why AI without governed data is guesswork: it skips the three steps that come before Decide.

Complementary Principles

FAIR + ICAD

FAIR (2016)

Defines: Data properties
Question: "Is this data good?"
Principles: Findable, Accessible, Interoperable, Reusable
Relationship: Independent — any order
Scope: Data at rest

ICAD (2026)

Defines: Data operations
Question: "How do you drive decisions at scale?"
Principles: Integrate, Contextualize, Analyze, Decide
Relationship: Sequential — compounding
Scope: Data in motion

ICAD assumes FAIR. You cannot contextualize data that is not findable. You cannot analyze data that is not interoperable. FAIR is the prerequisite. ICAD is the operating sequence. Together, they define the full lifecycle from data quality to data action.

Reference

Cite the ICAD Principles

ZONTAL (2026). The ICAD Principles: Integrate, Contextualize, Analyze, Decide — A compounding sequence for scientific AI operations in Pharmaceutical R&D. Available at: https://zontal.io/icad-principles

Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). You are free to share and adapt this material for any purpose, including commercial, with attribution.

Where Are You on the ICAD Sequence?

Most pharmaceutical organizations have integrated some of their scientific data sources. Few have contextualized them with governed scientific meaning. Fewer still analyze across programs and geographies. Almost none make AI-enabled decisions grounded in governed, traceable data.
