A2: Cross-program comparisons are governed
Cross-program comparisons are governed — same method, same context rules, same statistical framework.
What does this mean?
A2 addresses the most valuable and most dangerous capability that scientific data infrastructure enables: comparison across programs, sites, and time. Valuable because cross-program comparison reveals insights invisible within a single study — process trends, method performance patterns, compound class behaviors. Dangerous because ungoverned comparison produces misleading conclusions that look authoritative.
A2 requires that every cross-program comparison operates under explicit governance rules: which data is included, which method version is used as the basis, which statistical framework is applied, and which context filters are enforced. These rules must be defined, versioned, and auditable.
The governance requirement
Governed comparison requires explicit rules in three domains:
- Method equivalence: Are the analytical methods comparable? A dissolution test using USP Apparatus I at 50 rpm cannot be compared with one using Apparatus II at 75 rpm without a method bridging study. The comparison framework must enforce method equivalence checks before permitting cross-site or cross-study aggregation.
- Context alignment: Is the scientific context comparable? Stability data at 25°C/60% RH cannot be pooled with accelerated data at 40°C/75% RH for long-term trending without explicit modeling of the temperature/humidity effect. The comparison framework must verify that context variables are aligned or explicitly modeled.
- Statistical framework: Is the statistical approach appropriate? Comparing assay values across sites requires accounting for inter-site variability (a random effect, not a nuisance factor). The comparison framework must apply the correct statistical model for the comparison type.
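One way to make these three domains explicit is to declare them as data rather than bury them in analysis code. The sketch below is a minimal illustration in Python, not a reference implementation: `ComparisonRule`, `apply_rule`, the record layout, and the method identifier `DISS-001` are all hypothetical names chosen for this example. It shows a versioned rule that names the required method, the required context values, and the declared statistical model, and that records a reason for every excluded record. The statistical model is only recorded here, not fitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComparisonRule:
    """A declared, versioned rule set for one comparison type (hypothetical schema)."""
    rule_id: str
    version: str
    required_method: str      # method ID and version every record must share
    required_context: tuple   # ((key, value), ...) pairs every record must match
    statistical_model: str    # declared model name, recorded for audit only


def apply_rule(rule, records):
    """Split records into included and excluded, with a reason per exclusion."""
    included, excluded = [], []
    for rec in records:
        if rec["method"] != rule.required_method:
            excluded.append((rec["id"], f"method is {rec['method']!r}, "
                                        f"rule requires {rule.required_method!r}"))
            continue
        mismatches = [key for key, value in rule.required_context
                      if rec["context"].get(key) != value]
        if mismatches:
            excluded.append((rec["id"], f"context mismatch on {mismatches}"))
            continue
        included.append(rec["id"])
    return {"rule": f"{rule.rule_id}@{rule.version}",
            "model": rule.statistical_model,
            "included": included,
            "excluded": excluded}


# Invented example records, loosely following the dissolution example above.
rule = ComparisonRule(
    rule_id="dissolution-cross-site",
    version="1.0",
    required_method="DISS-001 v2",
    required_context=(("apparatus", "II"), ("rpm", 75)),
    statistical_model="mixed model, site as random effect",
)
records = [
    {"id": "r1", "method": "DISS-001 v2", "context": {"apparatus": "II", "rpm": 75}},
    {"id": "r2", "method": "DISS-001 v1", "context": {"apparatus": "I", "rpm": 50}},
    {"id": "r3", "method": "DISS-001 v2", "context": {"apparatus": "II", "rpm": 50}},
]
result = apply_rule(rule, records)
for record_id, reason in result["excluded"]:
    print(record_id, "excluded:", reason)
```

Because the result carries the rule identifier and version alongside the inclusion and exclusion lists, the question of which data was used and why can be answered from the output itself rather than by reading the analysis code.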
Take any cross-site or cross-program analysis generated by the system. Can you identify the governance rules that determined: which data was included, which was excluded, why, and which statistical model was applied? If these rules are implicit (embedded in an analyst's Python script) rather than explicit (declared in the governed framework), A2 is not satisfied.
Governance rules themselves have a lifecycle. They are created, versioned, and eventually superseded, and the governed system must record which rules applied when the data was created, not just which rules exist now. In pharmaceutical R&D, multiple GxP frameworks can apply simultaneously: the same dataset may fall under GLP (preclinical), GMP (manufacturing), and GCP (clinical) rules concurrently. The regulatory context (GxP classification, jurisdiction, applicable guidance version) also evolves over time, so data created under one regulatory framework may be inspected years later under updated guidance. A2 requires that this temporal and multi-framework governance be explicit, versioned, and auditable.
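The temporal part of this requirement can be sketched as a lookup against a rule's version history. The example below is an illustration under stated assumptions: `RULE_HISTORY`, `rule_version_at`, the dates, and the version labels are invented, and a real system would keep one such history per rule and per regulatory framework.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical version history for one rule: (effective_from, version),
# sorted by effective date.
RULE_HISTORY = [
    (date(2019, 1, 1), "1.0"),
    (date(2021, 6, 1), "2.0"),
    (date(2023, 3, 15), "2.1"),
]


def rule_version_at(history, created_on):
    """Return the rule version that was in force on the day a record was created."""
    effective_dates = [effective for effective, _ in history]
    idx = bisect_right(effective_dates, created_on)
    if idx == 0:
        raise LookupError(f"no rule version was in force on {created_on}")
    return history[idx - 1][1]


# A record created in 2020 is evaluated against version 1.0,
# even though 2.1 is the current version.
print(rule_version_at(RULE_HISTORY, date(2020, 5, 1)))
```

Storing the resolved version with each comparison result, rather than resolving it again at inspection time, keeps the audit trail stable when the history is later extended.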
Why ungoverned comparison is dangerous
Consider a scenario: a portfolio analytics dashboard shows that Compound A's impurity profile is trending upward across all manufacturing sites. This triggers a CAPA investigation. During the investigation, the team discovers that the "upward trend" is an artifact of pooling data from two different HPLC method versions: the updated method is more sensitive and detects an impurity that previously fell below the limit of detection (LOD). The impurity level has not changed. The method's detection capability has.
Without governed comparison, this artifact is invisible. The dashboard displays the aggregated trend; the underlying method version change is not filtered or flagged. A2 prevents this class of error by requiring method equivalence verification before aggregation.
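The artifact is easy to reproduce with a few numbers. The data below are invented for illustration, and `slope` and `governed_slope` are hypothetical helpers. The sketch assumes that results below the old method's LOD were stored as 0.00, which is one common way such values end up in a dashboard.

```python
def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Invented data: (month, reported impurity %, HPLC method version).
# Method v1 cannot detect the impurity (stored as 0.00); v2 reports 0.04.
results = [
    (0, 0.00, "v1"), (3, 0.00, "v1"), (6, 0.00, "v1"),
    (9, 0.04, "v2"), (12, 0.04, "v2"), (15, 0.04, "v2"),
]

# Ungoverned: pool everything and fit one trend line.
pooled_slope = slope([(month, value) for month, value, _ in results])

# Per method version: fit each group separately.
by_method = {}
for month, value, method in results:
    by_method.setdefault(method, []).append((month, value))
per_method_slopes = {method: slope(pts) for method, pts in by_method.items()}


def governed_slope(rows):
    """Fit a trend only when every row shares one method version."""
    methods = {method for _, _, method in rows}
    if len(methods) > 1:
        raise ValueError(
            f"refusing to pool method versions {sorted(methods)} "
            "without a bridging rule"
        )
    return slope([(month, value) for month, value, _ in rows])


print("pooled slope:", pooled_slope)            # positive: the apparent trend
print("per-method slopes:", per_method_slopes)  # flat within each method version
```

The pooled fit reports a rising trend while each method version on its own is flat, which is the situation the dashboard in the scenario cannot distinguish. The guard in `governed_slope` turns that silent artifact into an explicit failure.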
ICH Q1E defines statistical approaches for stability data evaluation, including criteria for pooling batches. A governed comparison framework for stability trending does four things: (1) verifies that all data points use the same method version, (2) applies the ICH Q1E poolability test before pooling batches, (3) applies the appropriate regression model (linear or nonlinear) based on the data pattern, and (4) flags any data points where context variables (site, method, storage condition) differ without explicit governance rules permitting the comparison.
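Step (2) can be illustrated with the slope-comparison part of a poolability test. The sketch below computes the standard F statistic for "separate slope per batch" versus "one common slope", with a separate intercept per batch in both models. It is an illustration only: the function names and assay values are invented, it covers only the slope comparison, and it stops at the F statistic. Converting F to a p-value needs an F distribution (for example `scipy.stats.f.sf`), and the significance level and order of tests should be taken from the guideline itself.

```python
def _sums(points):
    """Return n, Sxx, Sxy, Syy for a list of (time, value) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return n, sxx, sxy, syy


def slope_equality_f(batches):
    """F statistic: separate slope per batch versus one common slope.

    Both models allow a separate intercept for each batch.
    Returns (F, numerator df, denominator df).
    """
    stats = [_sums(points) for points in batches.values()]
    k = len(stats)
    n_total = sum(n for n, _, _, _ in stats)
    # Residual sum of squares with an individual slope for every batch.
    sse_full = sum(syy - sxy ** 2 / sxx for _, sxx, sxy, syy in stats)
    # Residual sum of squares with a single slope shared by all batches.
    sxx_all = sum(sxx for _, sxx, _, _ in stats)
    sxy_all = sum(sxy for _, _, sxy, _ in stats)
    syy_all = sum(syy for _, _, _, syy in stats)
    sse_common = syy_all - sxy_all ** 2 / sxx_all
    df_num, df_den = k - 1, n_total - 2 * k
    f_stat = ((sse_common - sse_full) / df_num) / (sse_full / df_den)
    return f_stat, df_num, df_den


# Invented assay data (% label claim) at 0, 3, 6, 9, 12 months.
months = [0, 3, 6, 9, 12]
batch_a = list(zip(months, [100.0, 99.6, 99.5, 99.0, 98.9]))
batch_b = list(zip(months, [100.2, 99.9, 99.5, 99.3, 98.9]))
batch_c = list(zip(months, [100.1, 99.0, 98.2, 97.1, 95.9]))  # degrades faster

f_similar, df1, df2 = slope_equality_f({"A": batch_a, "B": batch_b})
f_different, _, _ = slope_equality_f({"A": batch_a, "C": batch_c})
print("A vs B:", f_similar, "A vs C:", f_different)
```

Batches A and B degrade at nearly the same rate and give a small F, while batch C degrades faster and gives a much larger one. A governed framework would record the statistic, the degrees of freedom, the threshold applied, and the resulting pool or do-not-pool decision next to the rule version that required the test.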
Relationship to other principles
A2 depends on C2 (reconciled master data) to confirm that data labeled for comparison actually refers to the same entities. A2 depends on C1 (scientific context linking) to verify method equivalence and context alignment. A2 enables D1 (governed inputs for AI) — an AI system deciding on process adjustments requires that its input comparisons are governed, not artifacts of data heterogeneity.