Automated Schema Conversion

Use real-world data (RWD) in regulatory submissions and accelerate research with AI-driven schema conversion

Converting healthcare data into standard formats, such as CDISC SDTM, OMOP, or custom schemas, is a significant bottleneck in life sciences. Manual mapping is slow and expensive, and it puts data fidelity at risk.

Cornerstone accelerates the transformation of diverse source data into high-quality, submission- and analysis-ready formats in a fraction of the time, without sacrificing traceability. 

The Challenge

Evaluating quality and cleaning data shouldn’t slow down life sciences research, but it often does:

Mapping Bottleneck

Mapping source data to standard domains, standardizing terminology, and validating compliance can take weeks to months of FTE time.

Fragmented Data Silos

Traditional extract, transform, and load (ETL) pipelines are rigid and often break with slight changes to source data.

Data Integrity Gaps

Errors in source data, such as poor standardization rates, missing values, or anomalies, often carry through to converted outputs.

Auditability

Traceability is critical for regulators, yet preserving a clear, auditable trail from each transformed variable back to source values remains difficult.

Product

Automatically Convert Raw Native Schemas to Analysis-Ready Standards

Mapping

Algorithmic mapping to detect source structure and understand data contents across any native data schema

Standardization

Built-in support for MedDRA, SNOMED, RxNorm, WHODrug, LOINC, and more

Validation

Automated quality validation and error detection before and after conversion

Audit Trail

Produce fully harmonized datasets with a comprehensive audit trail

Labs: Standardization across audiences

OMOP and SDTM both standardize laboratory data, but for different audiences. OMOP uses LOINC for cross-system clinical interoperability, while SDTM uses CDISC controlled terminology for regulatory submission. OMOP relies heavily on concept IDs, which are used to translate between different ontologies, whereas CDISC uses interpretable English descriptions for test names, units, specimens, and so on.

Before: OMOP MEASUREMENT Table

| Field | Value |
| --- | --- |
| MEASUREMENT_ID | 10001 |
| PERSON_ID | 1042 |
| MEASUREMENT_CONCEPT_ID | 3000963 |
| MEASUREMENT_CONCEPT_NAME | Cholesterol [Mass/volume] in Serum or Plasma |
| MEASUREMENT_DATE | 2023-03-15 |
| VALUE_AS_NUMBER | 4.2 |
| UNIT_CONCEPT_ID | 8713 |
| UNIT_CONCEPT_NAME | gram per deciliter |
| RANGE_LOW | 3.5 |
| RANGE_HIGH | 5.5 |
| VISIT_OCCURRENCE_ID | 9901 |

After: SDTM LB (Laboratory Results)

| Variable | Value |
| --- | --- |
| USUBJID | STUDY01-1042 |
| LBSEQ | 1 |
| LBTESTCD | HGB |
| LBTEST | Hemoglobin |
| LBSPEC | SERUM / PLASMA |
| LBCAT | HEMATOLOGY |
| LBORRES | 4.2 |
| LBORRESU | g/dL |
| LBSTRESN | 4.2 |
| LBSTRESU | g/dL |
| LBSTNRLO | 3.5 |
| LBSTNRHI | 5.5 |
| LBNRIND | NORMAL |
| LBDTC | 2023-03-15 |
| LBDY | 15 |
| VISITNUM | 2 |
| VISIT | Week 2 |
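
To make the ID-to-text translation concrete, here is a minimal Python sketch of how one OMOP MEASUREMENT row could be turned into an SDTM LB record. It is an illustration under stated assumptions, not a description of how Cornerstone works: the two lookup dictionaries, the function names, and the lowercase row keys are hypothetical, and the lookups contain only the single entries needed for the example record. Treating a value that sits exactly on a range limit as NORMAL is also an assumption. Visit and study-day variables are omitted because they depend on trial context that the MEASUREMENT row does not carry.

```python
from datetime import date

# Hypothetical lookups standing in for a real vocabulary service.
# Each holds only the entry needed for the example record.
TEST_BY_CONCEPT = {3000963: ("HGB", "Hemoglobin")}
UNIT_BY_CONCEPT = {8713: "g/dL"}


def normal_range_indicator(value: float, low: float, high: float) -> str:
    """Classify a result against its reference range (limits inclusive)."""
    if value < low:
        return "LOW"
    if value > high:
        return "HIGH"
    return "NORMAL"


def measurement_to_lb(row: dict, study_id: str, seq: int) -> dict:
    """Build a partial LB record from one OMOP MEASUREMENT row."""
    testcd, test = TEST_BY_CONCEPT[row["measurement_concept_id"]]
    unit = UNIT_BY_CONCEPT[row["unit_concept_id"]]
    value = row["value_as_number"]
    return {
        "USUBJID": f"{study_id}-{row['person_id']}",
        "LBSEQ": seq,
        "LBTESTCD": testcd,
        "LBTEST": test,
        "LBORRES": str(value),   # original result is kept as text
        "LBORRESU": unit,
        "LBSTRESN": value,       # no unit conversion in this sketch
        "LBSTRESU": unit,
        "LBSTNRLO": row["range_low"],
        "LBSTNRHI": row["range_high"],
        "LBNRIND": normal_range_indicator(
            value, row["range_low"], row["range_high"]
        ),
        "LBDTC": row["measurement_date"].isoformat(),
    }


measurement = {
    "person_id": 1042,
    "measurement_concept_id": 3000963,
    "measurement_date": date(2023, 3, 15),
    "value_as_number": 4.2,
    "unit_concept_id": 8713,
    "range_low": 3.5,
    "range_high": 5.5,
}
lb = measurement_to_lb(measurement, study_id="STUDY01", seq=1)
```

A real conversion would also need to handle concepts that are missing from the lookups, unit conversion between original and standard results, and records without a reference range.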

Medications: Splitting drugs requires protocol context

OMOP models all drug exposures uniformly in a single table, with no distinction between investigational and background medications. SDTM splits these into two separate domains with fundamentally different meanings: EX (Exposure, for study treatment) and CM (Concomitant Medications). Imposing that categorisation requires external protocol knowledge, since it does not exist anywhere in the source data.

One table, split into two domains
Drug Exposure (OMOP)

| Drug Exposure ID | Person ID | Drug Concept ID | Drug Concept Name | Drug Exposure Start | Drug Exposure End | Quantity | Days Supply | Sig | Route Concept ID | Dose Unit Concept ID |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 6101 | 1042 | 1503297 | metformin | 2022-11-01 | 2023-06-30 | 500 | 30 | 500mg BID oral | 4132161 | 8576 |
| 5001 | 1042 | 37396350 | Pembrolizumab | 2023-03-01 | 2023-03-01 | 200 | 1 | 200 | 4171047 | 8576 |
| 6102 | 1042 | 1000560 | ondansetron | 2023-03-01 | 2023-03-03 | 8 | 3 | 8 | 4132161 | 8576 |
| 6103 | 1042 | 1518254 | dexamethasone | 2023-03-01 | 2023-03-02 | 4 | 2 | 4 | 4132161 | 8576 |
| 5002 | 1042 | 37396350 | Pembrolizumab | 2023-03-22 | 2023-03-22 | 200 | 1 | 200 | 4171047 | 8576 |
| 5003 | 1042 | 37396350 | Pembrolizumab | 2023-04-12 | 2023-04-12 | 200 | 1 | 200 | 4171047 | 8576 |

SDTM - CM

| USUBJID | CMSEQ | CMTRT | CMDECOD | CMCAT | CMSCAT | CMDOSE | CMDOSU | CMROUTE | CMSTDTC | CMENDTC | CMENRTPT | CMSTDY | CMENDY |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| STUDY01-1042 | 1 | METFORMIN | metformin | CONCOMI... | DIABET... | 500 | mg | ORAL | 2022-11-01 | 2023-06-30 | AFTER | -120 | 122 |
| STUDY01-1042 | 2 | ONDANSETRON | ondansetron | CONCOMI... | ANTIEM... | 8 | mg | ORAL | 2023-03-01 | 2023-03-03 | BEFORE | 1 | 3 |
| STUDY01-1042 | 3 | DEXAMETHASONE | dexamethas... | CONCOMI... | PREMED... | 4 | mg | ORAL | 2023-03-01 | 2023-03-02 | BEFORE | 1 | 2 |

SDTM - EX

| USUBJID | EXSEQ | EXTRT | EXDOSE | EXDOSU | EXDOSFRM | EXDOSFRQ | CMDOSU | EXROUTE | EXSTDTC | EXENDTC | EXSTDY | VISITNUM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| STUDY01-1042 | 1 | PEMBROLIZUMAB | 200 | mg | SOLUTION FOR INFUSION | Q3W | mg | INTRAVENOUS | 2023-03-01 | 2023-03-01 | 1 | 1 |
| STUDY01-1042 | 2 | PEMBROLIZUMAB | 200 | mg | SOLUTION FOR INFUSION | Q3W | mg | INTRAVENOUS | 2023-03-22 | 2023-03-22 | 22 | 2 |
| STUDY01-1042 | 3 | PEMBROLIZUMAB | 200 | mg | SOLUTION FOR INFUSION | Q3W | mg | INTRAVENOUS | 2023-04-12 | 2023-04-12 | 43 | 3 |
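
The split above can be sketched in a few lines of Python. This is an illustration only, not Cornerstone's implementation: the set of investigational concept IDs stands in for the protocol knowledge that has to come from outside the data, and the function name, row keys, and the choice to number records by start date and then exposure ID are all assumptions. Only a few target variables are populated to keep the example short.

```python
from datetime import date

# Hypothetical protocol-derived input: concept IDs of the investigational
# product(s). 37396350 is the pembrolizumab concept in the example rows.
INVESTIGATIONAL_CONCEPT_IDS = {37396350}


def split_drug_exposures(exposures, investigational_ids):
    """Route each OMOP drug exposure to EX or CM and number each domain."""
    ordered = sorted(
        exposures,
        key=lambda r: (r["drug_exposure_start_date"], r["drug_exposure_id"]),
    )
    ex, cm = [], []
    for row in ordered:
        name = row["drug_concept_name"].upper()
        start = row["drug_exposure_start_date"].isoformat()
        if row["drug_concept_id"] in investigational_ids:
            ex.append({"EXSEQ": len(ex) + 1, "EXTRT": name, "EXSTDTC": start})
        else:
            cm.append({"CMSEQ": len(cm) + 1, "CMTRT": name, "CMSTDTC": start})
    return ex, cm


def _row(exposure_id, concept_id, name, start):
    """Helper that builds a trimmed-down drug exposure row for the example."""
    return {
        "drug_exposure_id": exposure_id,
        "drug_concept_id": concept_id,
        "drug_concept_name": name,
        "drug_exposure_start_date": start,
    }


exposures = [
    _row(6101, 1503297, "metformin", date(2022, 11, 1)),
    _row(5001, 37396350, "Pembrolizumab", date(2023, 3, 1)),
    _row(6102, 1000560, "ondansetron", date(2023, 3, 1)),
    _row(6103, 1518254, "dexamethasone", date(2023, 3, 1)),
    _row(5002, 37396350, "Pembrolizumab", date(2023, 3, 22)),
    _row(5003, 37396350, "Pembrolizumab", date(2023, 4, 12)),
]
ex, cm = split_drug_exposures(exposures, INVESTIGATIONAL_CONCEPT_IDS)
```

Passing the investigational IDs in as a parameter keeps the protocol-specific decision separate from the routing logic, so the same function could be reused across studies with different study drugs.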

Diagnoses: One table, two timing-based domains

OMOP records all conditions as clinical facts in a single table, with no structural distinction between a decades-old diagnosis and a rash that appeared after the first dose. SDTM splits these into two domains with fundamentally different regulatory meanings, MH (Medical History) and AE (Adverse Events), separated by the temporal anchor of first dose. AE additionally requires causality, severity, and seriousness assessments that OMOP has no native fields for, although in some cases these can be derived from other tables (not shown).

One table, split into two domains
OMOP - CONDITION_OCCURRENCE

| CONDITION_OCCURRENCE_ID | PERSON_ID | CONDITION_CONCEPT_ID | CONDITION_CONCEPT_NAME | CONDITION_START_DATE | CONDITION_END_DATE | STOP_REASON |
| --- | --- | --- | --- | --- | --- | --- |
| 3003 | 1042 | 317009 | Asthma | 2010-06-01 | | |
| 3001 | 1042 | 201826 | Type 2 diabetes mellitus | 2018-04-12 | | |
| 3002 | 1042 | 255848 | Pneumonia | 2021-09-03 | 2021-10-15 | Resolved |
| 4001 | 1042 | 31967 | Nausea | 2023-03-05 | 2023-03-09 | Resolved |
| 4002 | 1042 | 45951957 | Rash | 2023-03-22 | | |
| 4003 | 1042 | 378253 | Headache | 2023-04-14 | 2023-04-20 | Resolved |

SDTM - MH

| USUBJID | MHSEQ | MHTERM | MHDECOD | MHBODSYS | MHSTDTC | MHENDTC | MHSTDY |
| --- | --- | --- | --- | --- | --- | --- | --- |
| STUDY01-1042 | 1 | TYPE 2 DIABETES MELLITUS | Type 2 diabetes mellitus | Metabolism and nutrition disorders | 2018-04-12 | | -1784 |
| STUDY01-1042 | 2 | PNEUMONIA | Pneumonia | Infections and infestations | 2021-09-03 | 2021-10-15 | -544 |
| STUDY01-1042 | 3 | ASTHMA | Asthma | Respiratory, thoracic and mediastinal disorders | 2010-06-01 | | -4656 |

SDTM - AE

| USUBJID | AESEQ | AETERM | AEDECOD | AEBODSYS | AEREL | AEOUT | AESTDTC | AEENDTC | AESTDY | AEENDY |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| STUDY01-1042 | 1 | NAUSEA | Nausea | Gastrointestinal disorders | RELATED | RECOVERED/RESOLVED | 2023-03-05 | 2023-03-09 | 5 | 9 |
| STUDY01-1042 | 2 | RASH | Rash | Skin and subcutaneous tissue... | POSSIBLY RELATED | NOT RECOVERED/NOT RE... | 2023-03-22 | | 22 | |
| STUDY01-1042 | 3 | HEADACHE | Headache | Nervous system disorders | NOT RELATED | RECOVERED/RESOLVED | 2023-04-14 | 2023-04-20 | 45 | 51 |
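
As a rough illustration of the timing rule, the Python sketch below routes conditions to MH or AE by comparing their start date with the first dose date, and derives the study-day variables. It is a simplified sketch under assumptions, not the product's logic: the function names and row keys are hypothetical, the first dose date of 2023-03-01 is taken from the first EX record in the medications example, and records are numbered chronologically within each domain. The study-day rule used here follows the usual SDTM convention in which the reference date is day 1 and there is no day 0. Causality, outcome, and seriousness are deliberately left out because they cannot be derived from the start date.

```python
from datetime import date


def study_day(event: date, reference: date) -> int:
    """Study day relative to the reference date (day 1, with no day 0)."""
    delta = (event - reference).days
    return delta + 1 if delta >= 0 else delta


def split_conditions(conditions, first_dose: date):
    """Return (mh, ae): conditions starting before first dose go to MH,
    those starting on or after it become AE candidates."""
    mh, ae = [], []
    for row in sorted(conditions, key=lambda r: r["condition_start_date"]):
        start = row["condition_start_date"]
        target, prefix = (mh, "MH") if start < first_dose else (ae, "AE")
        target.append({
            f"{prefix}SEQ": len(target) + 1,
            f"{prefix}TERM": row["condition_concept_name"].upper(),
            f"{prefix}STDTC": start.isoformat(),
            f"{prefix}STDY": study_day(start, first_dose),
        })
    return mh, ae


FIRST_DOSE = date(2023, 3, 1)  # assumption: first EX record in the example

conditions = [
    {"condition_concept_name": "Asthma", "condition_start_date": date(2010, 6, 1)},
    {"condition_concept_name": "Type 2 diabetes mellitus", "condition_start_date": date(2018, 4, 12)},
    {"condition_concept_name": "Pneumonia", "condition_start_date": date(2021, 9, 3)},
    {"condition_concept_name": "Nausea", "condition_start_date": date(2023, 3, 5)},
    {"condition_concept_name": "Rash", "condition_start_date": date(2023, 3, 22)},
    {"condition_concept_name": "Headache", "condition_start_date": date(2023, 4, 14)},
]
mh, ae = split_conditions(conditions, FIRST_DOSE)
```

In practice a condition that starts after first dose is only a candidate adverse event; whether it is reported as one, and with what relatedness and severity, still depends on clinical review or on data outside the condition table.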

Example Use Cases

CDISC SDTM Conversion

Align and convert RWD and clinical trial schemas to CDISC SDTM

OMOP Conversion

Convert native real-world data formats to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM)

RWD “Stacking” to Single Schema

Combine and harmonize distinct datasets into a single data schema, including some custom formats