C2: Master data is reconciled across sites, systems, and naming conventions
Master data is reconciled across sites, systems, and naming conventions — one identity per entity, globally.
What does this mean?
A pharmaceutical R&D organization with multiple sites and multiple instances of laboratory information management systems (LIMS), electronic lab notebooks (ELN), and chromatography data systems (CDS) will inevitably refer to the same entity by different names. The same compound is "ABC-1234" in LIMS-US, "Compound 1234" in LIMS-EU, and "ABC1234-HCl" in the formulation database. The same analytical method is "IMP-HPLC-v3" at one site and "Impurity Profile Method Rev C" at another.
C2 requires that the governed system maintains a reconciled master data layer: one canonical identity per compound, per method, per instrument, per site, per analyst — regardless of how many source systems refer to that entity by different names.
The reconciliation problem in pharma
Master data fragmentation is the single largest barrier to cross-site, cross-program scientific analysis. Its root cause is organizational, not technical: each site, each department, and each data system has its own naming conventions, and no governance mechanism enforces consistency. The consequences compound:
- Duplicate identities: The same compound appears as multiple entities in analytics, inflating counts and fragmenting trend data
- Missed connections: A stability trend for "Compound ABC-1234" at Site A cannot be compared with data for "ABC1234" at Site B because the system does not know they are the same molecule
- Regulatory risk: Submissions reference data by site-local identifiers. If a regulatory query asks for "all stability data for this compound," the organization cannot produce a complete answer without manual reconciliation across systems
A practical test: query the governed system for "all analytical data for compound X." Does the result include data from every site and every system that has produced data for that compound, regardless of the local identifier used? If not, C2 is not satisfied.
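The completeness test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the canonical ID "CMPD-0001", the system name "FORM-DB", the second EU compound, and all measured values are invented here, and a real governed system would query a database rather than in-memory lists.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    system: str     # source system that produced the record
    local_id: str   # identifier exactly as written in that system
    value: float    # a measured result (invented for illustration)

# Hypothetical mapping: (system, local identifier) -> canonical compound ID.
CANONICAL = {
    ("LIMS-US", "ABC-1234"): "CMPD-0001",
    ("LIMS-EU", "Compound 1234"): "CMPD-0001",
    ("FORM-DB", "ABC1234-HCl"): "CMPD-0001",
}

RECORDS = [
    Record("LIMS-US", "ABC-1234", 99.2),
    Record("LIMS-EU", "Compound 1234", 98.9),
    Record("FORM-DB", "ABC1234-HCl", 99.0),
    Record("LIMS-EU", "Compound 5678", 97.5),  # a different, unmapped compound
]

def query_by_canonical(canonical_id, records, mapping):
    """Return every record whose (system, local_id) maps to canonical_id."""
    return [r for r in records
            if mapping.get((r.system, r.local_id)) == canonical_id]

def query_by_name(name, records):
    """Naive baseline: exact match on the local identifier string."""
    return [r for r in records if r.local_id == name]

reconciled = query_by_canonical("CMPD-0001", RECORDS, CANONICAL)
naive = query_by_name("ABC-1234", RECORDS)
systems_covered = {r.system for r in reconciled}
```

In this toy data the naive name match finds only the LIMS-US record, while the reconciled query returns records from all three systems. That gap is what the C2 test is meant to expose.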
Reconciliation, not replacement
C2 does not require organizations to standardize naming conventions across all source systems — a multi-year governance effort that rarely succeeds. It requires a reconciliation layer that maps local identifiers to canonical identities. Source systems continue using their local names. The governed system maintains the mapping.
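One way to picture such a layer is a lookup keyed by source system and local identifier, kept outside the source systems themselves. The sketch below is a minimal illustration, not a reference design: the class name `ReconciliationLayer`, its methods, and the canonical ID "CMPD-0001" are all hypothetical.

```python
from typing import Dict, Optional, Tuple

class ReconciliationLayer:
    """Maps (source system, local identifier) pairs to canonical identities.

    Source systems are never modified; only this mapping is maintained.
    """

    def __init__(self) -> None:
        self._map: Dict[Tuple[str, str], str] = {}

    def register(self, system: str, local_id: str, canonical_id: str) -> None:
        """Record a mapping, refusing to silently overwrite a conflicting one."""
        key = (system, local_id)
        existing = self._map.get(key)
        if existing is not None and existing != canonical_id:
            raise ValueError(
                f"{key} already maps to {existing}, not {canonical_id}")
        self._map[key] = canonical_id

    def resolve(self, system: str, local_id: str) -> Optional[str]:
        """Return the canonical ID, or None if the identifier is unmapped."""
        return self._map.get((system, local_id))

layer = ReconciliationLayer()
layer.register("LIMS-US", "ABC-1234", "CMPD-0001")       # hypothetical canonical ID
layer.register("LIMS-EU", "Compound 1234", "CMPD-0001")
```

Keying on the pair rather than on the name alone is a deliberate choice in this sketch: the same string may mean different things in different systems, so a local identifier only has meaning within its source system.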
System heterogeneity in pharmaceutical R&D is permanent, not transitional. Mergers and acquisitions bring in acquired companies' LIMS and ELN systems, which cannot be decommissioned while ongoing regulated work depends on them. Validated systems cannot be replaced without full revalidation, a multi-year, high-cost effort that carries regulatory risk. Different sites chose different platforms for legitimate operational reasons that predate the current data strategy. ICAD does not require consolidation. It requires reconciliation: a governed layer that unifies identities across systems while each system continues to operate.
This reconciliation layer must handle:
- Synonyms: multiple names for the same entity (ABC-1234, Compound 1234, and ABC1234-HCl all refer to the same compound)
- Hierarchy: a compound has forms (free base, hydrochloride, fumarate), each form has batches, each batch has samples
- Versioning: analytical methods are revised, instruments are calibrated and recalibrated, standard operating procedures (SOPs) are updated
- Cross-domain links: a sample measured by HPLC in the QC lab is the same sample registered in the LIMS, drawn from the same batch recorded in the formulation database
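The four requirements above can be sketched as a small data model. This is an illustrative assumption about one possible shape, not a prescribed schema: the `Entity` and `MasterData` names, every canonical ID, the sample and injection identifiers, the "CDS-QC" and "FORM-DB" system names, and the assignment of the two method names to LIMS-US and LIMS-EU are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

LocalKey = Tuple[str, str]  # (source system, local identifier)

@dataclass
class Entity:
    canonical_id: str
    kind: str                                   # "compound", "form", "batch", ...
    parent_id: Optional[str] = None             # hierarchy link to the parent entity
    version: Optional[str] = None               # versioning, e.g. a method revision
    synonyms: List[LocalKey] = field(default_factory=list)  # synonyms per system

class MasterData:
    def __init__(self) -> None:
        self.entities: Dict[str, Entity] = {}
        self.by_local: Dict[LocalKey, str] = {}

    def add(self, entity: Entity) -> None:
        self.entities[entity.canonical_id] = entity
        for key in entity.synonyms:
            self.by_local[key] = entity.canonical_id

    def resolve(self, system: str, local_id: str) -> Optional[Entity]:
        """Cross-domain link: any system's local ID leads to the same entity."""
        canonical_id = self.by_local.get((system, local_id))
        return self.entities.get(canonical_id) if canonical_id else None

    def lineage(self, canonical_id: str) -> List[str]:
        """Walk parent links upward; stops at the root or on a cycle."""
        chain: List[str] = []
        current = self.entities.get(canonical_id)
        while current is not None and current.canonical_id not in chain:
            chain.append(current.canonical_id)
            current = (self.entities.get(current.parent_id)
                       if current.parent_id else None)
        return chain

md = MasterData()
md.add(Entity("CMPD-0001", "compound", synonyms=[
    ("LIMS-US", "ABC-1234"), ("LIMS-EU", "Compound 1234"),
    ("FORM-DB", "ABC1234-HCl")]))
md.add(Entity("FORM-0001-HCL", "form", parent_id="CMPD-0001"))
md.add(Entity("BATCH-0042", "batch", parent_id="FORM-0001-HCL"))
md.add(Entity("SMP-0007", "sample", parent_id="BATCH-0042", synonyms=[
    ("LIMS-US", "S-2291"), ("CDS-QC", "inj_0815")]))   # same sample, two systems
md.add(Entity("METH-0001", "method", version="3", synonyms=[  # version is assumed
    ("LIMS-US", "IMP-HPLC-v3"),
    ("LIMS-EU", "Impurity Profile Method Rev C")]))
```

With this structure, resolving either the LIMS sample ID or the CDS injection ID yields the same sample entity, and `md.lineage("SMP-0007")` walks from the sample through its batch and form to the compound.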
Three manufacturing sites run the same dissolution method for a solid oral dosage form. In LIMS-US it is "DOL-METH-001 Rev 5," in LIMS-EU it is "EU-DISS-v5.1," and in LIMS-AP it is "AP-Dissolution-5." The reconciliation layer maps all three to a single canonical method identifier, links each to the same SOP version, and enables cross-site method equivalence trending — a regulatory requirement for method transfer validation under ICH Q2(R2).
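A rough sketch of this scenario follows. The three local method names come from the example above; the canonical identifier "METH-DISS-0001", the SOP label "SOP-DISS-v5", and the dissolution values are invented for illustration, and a per-site mean stands in for whatever equivalence statistics an organization actually uses.

```python
from collections import defaultdict
from statistics import mean

# Local method names from the example, mapped to a hypothetical canonical ID.
METHOD_MAP = {
    ("LIMS-US", "DOL-METH-001 Rev 5"): "METH-DISS-0001",
    ("LIMS-EU", "EU-DISS-v5.1"): "METH-DISS-0001",
    ("LIMS-AP", "AP-Dissolution-5"): "METH-DISS-0001",
}

# Hypothetical link from the canonical method to its governing SOP version.
SOP_VERSION = {"METH-DISS-0001": "SOP-DISS-v5"}

# Invented results: (system, local method name, % dissolved).
RESULTS = [
    ("LIMS-US", "DOL-METH-001 Rev 5", 91.0),
    ("LIMS-US", "DOL-METH-001 Rev 5", 93.0),
    ("LIMS-EU", "EU-DISS-v5.1", 90.0),
    ("LIMS-EU", "EU-DISS-v5.1", 92.0),
    ("LIMS-AP", "AP-Dissolution-5", 89.0),
    ("LIMS-AP", "AP-Dissolution-5", 91.0),
]

def site_means(results, method_map):
    """Group results by canonical method, then by site, and average each group."""
    grouped = defaultdict(lambda: defaultdict(list))
    for system, local_method, value in results:
        canonical = method_map.get((system, local_method))
        if canonical is None:
            continue  # skipped here; a real layer would flag it for review
        grouped[canonical][system].append(value)
    return {method: {site: mean(values) for site, values in sites.items()}
            for method, sites in grouped.items()}

trend = site_means(RESULTS, METHOD_MAP)
```

Because all three local names resolve to one canonical method, the per-site averages land under a single key and can be compared directly. Without the mapping, the same data would appear as three unrelated methods.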
Relationship to other principles
C2 depends on C1 — you must first establish what a data point represents (its scientific context) before you can reconcile identities across systems. C2 enables A2 (governed cross-program comparisons) because comparison requires confirmed identity: you cannot compare stability trends for a compound across sites if you are not certain the data refers to the same compound.