A1

A1: Analysis operates on contextualized data, never on raw output alone

Analysis operates on contextualized data, never on raw output alone — context is a prerequisite.

What does this mean?

A1 is the gate between Contextualize and Analyze. It states a hard prerequisite: no analysis — statistical, comparative, predictive, or AI-driven — should operate on data that has not passed through the contextualization sequence (C1–C4). Raw instrument output without scientific context is numbers without meaning. Analysis of numbers without meaning produces conclusions without validity.

The context dependency

Consider a concrete example. A statistical trending algorithm receives 500 chromatographic purity values and fits a regression line. Without context, the algorithm does not know:

  • Whether these 500 values come from the same compound or five different compounds with similar names
  • Whether the same analytical method was used for all measurements, or whether method revisions occurred mid-study
  • Whether the data spans one site or three, and whether inter-site variability is a factor
  • Whether some values were generated during method validation (not for stability trending), and should be excluded
  • Whether a reference standard lot change occurred during the study, introducing a systematic shift

Without context, the trend line is statistically valid but scientifically meaningless. The regression happily fits data from different compounds, different methods, and different purposes into a single trend — and the result is not wrong in a way that a statistical test detects. It is wrong in a way that only scientific context reveals.
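The effect can be illustrated with a minimal sketch. It assumes a hypothetical record type, PurityResult, carrying three of the context fields listed above (compound, method version, purpose); the values are invented for illustration and are not real measurements. The pooled fit runs without complaint on mixed data, while the context-aware fit first excludes non-stability results and then fits each compound and method version separately.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PurityResult:
    """Hypothetical contextualized purity value (field names are assumptions)."""
    compound: str
    method_version: str
    purpose: str   # e.g. "stability" or "validation"
    month: float
    purity: float


def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def pooled_slope(results):
    """Context-free fit: every value goes into one regression."""
    return slope([(r.month, r.purity) for r in results])


def slopes_by_context(results):
    """Context-aware fit: stability data only, one slope per compound and method."""
    groups = defaultdict(list)
    for r in results:
        if r.purpose != "stability":
            continue  # e.g. method validation injections are excluded
        groups[(r.compound, r.method_version)].append((r.month, r.purity))
    return {
        key: slope(pts)
        for key, pts in groups.items()
        if len({x for x, _ in pts}) >= 2
    }


# Invented illustrative values, not real measurements.
SAMPLE = [
    PurityResult("A", "v1", "stability", 0, 99.0),
    PurityResult("A", "v1", "stability", 6, 98.4),
    PurityResult("A", "v1", "stability", 12, 97.8),
    PurityResult("B", "v1", "stability", 0, 95.0),
    PurityResult("B", "v1", "stability", 6, 94.7),
    PurityResult("B", "v1", "stability", 12, 94.4),
    PurityResult("A", "v1", "validation", 3, 99.9),
]

if __name__ == "__main__":
    print("pooled:", pooled_slope(SAMPLE))
    print("by context:", slopes_by_context(SAMPLE))
```

Nothing in the pooled computation signals a problem; the difference only becomes visible once the context fields are used to partition the data.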

Operational test

For every analytical computation in the governed system, verify: does the computation receive its inputs from the contextualized data layer (C1–C4), or does it directly query raw instrument data? If any analysis pathway bypasses contextualization, A1 is violated.
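One way to make this test mechanical is sketched below. It assumes the governed system can list, for each analytical computation, the data layers it reads from; the AnalysisPathway type and the layer labels are hypothetical names introduced for this example, not part of any particular product.

```python
from dataclasses import dataclass

# Hypothetical layer labels; a real system would use its own identifiers.
CONTEXTUALIZED_LAYER = "contextualized"  # data that has passed C1-C4
RAW_LAYER = "raw"                        # direct instrument or CDS output


@dataclass(frozen=True)
class AnalysisPathway:
    """An analytical computation and the data layers it reads from."""
    name: str
    input_layers: tuple


def a1_violations(pathways):
    """Return the names of pathways with any input outside the contextualized layer."""
    return [
        p.name
        for p in pathways
        if any(layer != CONTEXTUALIZED_LAYER for layer in p.input_layers)
    ]


if __name__ == "__main__":
    inventory = [
        AnalysisPathway("stability_trending", (CONTEXTUALIZED_LAYER,)),
        AnalysisPathway("purity_model_training", (RAW_LAYER,)),
        AnalysisPathway("site_comparison", (CONTEXTUALIZED_LAYER, RAW_LAYER)),
    ]
    print(a1_violations(inventory))
```

A pathway that mixes contextualized and raw inputs is reported as well, since a single bypass is enough to violate A1.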

Why this principle exists

A1 exists because the most common failure mode in scientific data analysis is not bad statistics but good statistics applied to the wrong data. Organizations invest heavily in analytical tools (statistical packages, visualization platforms, AI/ML frameworks) and too little in the contextual infrastructure that makes those tools' outputs trustworthy.

A1 inverts the usual priority: context first, then analysis. The analytical tool is only as reliable as the context of the data it operates on.

In pharmaceutical R&D, analysis frequently operates on sparse datasets in which each data point is costly and contextually rich: six stability time points, 30 method validation injections, a handful of PK samples. A1’s requirement that analysis never operate on raw output alone is especially critical in these domains. Statistical volume cannot compensate for missing context, and sparse datasets have no volume to spare, so every dimension of context matters.

External analytical tools (statistical software, multivariate analysis packages, visualization environments) typically operate on de-contextualized data: flat tables with column headers and numeric values. A1 does not require every tool to be ontology-aware. It requires that the governed system maintain the context-to-analysis link even when the tool does not. The mechanism is the context-preserving export, which carries structured metadata (compound, method, study, conditions) alongside the numeric values. The tool may ignore the metadata, but the governed system records what was exported, to which tool, and with which context.
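A context-preserving export might look like the following sketch. The function name export_with_context, the payload layout, and the in-memory export_log list are all assumptions made for illustration; a real governed system would persist the log in its own audit store. The checksum is one possible way to tie a log entry to the exact payload that left the system.

```python
import hashlib
import json
from datetime import datetime, timezone


def export_with_context(values, context, tool, export_log):
    """Serialize values with their context and record the export.

    `context` is a dict of structured metadata (compound, method, study,
    conditions). `export_log` is a plain list standing in for an audit store.
    """
    payload = {"context": context, "values": values}
    serialized = json.dumps(payload, sort_keys=True)
    export_log.append(
        {
            "tool": tool,
            "context": dict(context),
            "n_values": len(values),
            # Checksum of the exact bytes handed to the external tool.
            "sha256": hashlib.sha256(serialized.encode("utf-8")).hexdigest(),
            "exported_at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return serialized


if __name__ == "__main__":
    log = []
    exported = export_with_context(
        values=[99.0, 98.4, 97.8],  # invented illustrative values
        context={
            "compound": "A",
            "method": "HPLC-purity v1",
            "study": "STAB-001",
            "conditions": "25C/60%RH",
        },
        tool="external-stats-package",
        export_log=log,
    )
    print(exported)
    print(log[0]["tool"], log[0]["n_values"])
```

The external tool is free to read only the values field; the log entry is what lets the governed system answer later which context a given analysis was run with.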

Antipattern

A data science team builds a stability prediction model by extracting purity values directly from the CDS database. The model achieves high accuracy on the training set. During validation, the team discovers that the training data included system suitability injections (not stability samples), data from a method version that was subsequently corrected, and measurements from two different compounds that share a project code prefix. The model learned patterns that do not exist in reality. Context would have excluded these data points before the model was trained.

Relationship to other principles

A1 enforces the sequential property of ICAD. You cannot analyze what is not contextualized, just as you cannot contextualize what is not integrated. A1 is the checkpoint that ensures the Analyze phase receives data with scientific meaning attached (C1), reconciled identities (C2), complete lineage (C3), and machine-readable context (C4). Without A1, analysis degrades to computation: technically correct, scientifically unreliable.
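At the level of a single record, the checkpoint can be sketched as a gate that refuses to pass data into analysis unless all four contextualization steps are complete. The ContextStatus flags, the a1_gate function, and the A1Violation exception are hypothetical names; how a real system determines each flag is outside the scope of this example.

```python
from dataclasses import dataclass


class A1Violation(Exception):
    """Raised when data reaches analysis without passing C1-C4."""


@dataclass(frozen=True)
class ContextStatus:
    """Hypothetical per-record flags, one per contextualization step."""
    meaning_attached: bool        # C1
    identities_reconciled: bool   # C2
    lineage_complete: bool        # C3
    machine_readable: bool        # C4


def missing_steps(status):
    """Return the labels of the contextualization steps not yet passed."""
    checks = {
        "C1": status.meaning_attached,
        "C2": status.identities_reconciled,
        "C3": status.lineage_complete,
        "C4": status.machine_readable,
    }
    return [step for step, passed in checks.items() if not passed]


def a1_gate(record_id, status):
    """Return record_id if it may enter analysis, otherwise raise A1Violation."""
    missing = missing_steps(status)
    if missing:
        raise A1Violation(f"{record_id} is missing {', '.join(missing)}")
    return record_id


if __name__ == "__main__":
    print(a1_gate("rec-1", ContextStatus(True, True, True, True)))
    try:
        a1_gate("rec-2", ContextStatus(True, True, False, True))
    except A1Violation as err:
        print("blocked:", err)
```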