General Approach to mapping datafiles

The general approach to mapping health data files relies on predefined functions that translate a code, given its coding system and code type, into a record in the proper table. The difficulty is that a single coding system can map into multiple destination tables. For example:
ICD9 codes may map into Conditions, Observations, or Procedures.
CPT4 codes may map into Observations, Procedures, Measurements, or Drugs.

While each coding system may be thought of as having a default destination (ICD9 = Condition, CPT4 = Procedure), every matching concept must be checked to find the table a code actually belongs in. If a specific code does not match a defined concept, the record must be created in the coding system's default table with a concept_id of 0. If the vocabulary team maps the code later, all existing unmapped records can be updated easily. Because the source code value is kept with the record, unmapped records can still be used by referencing this source value.
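The routing rule above can be sketched in Python as follows. This is a minimal illustration, not an actual implementation: the names `CONCEPT_LOOKUP`, `DEFAULT_TABLE`, and `route_code` are hypothetical, and the codes and concept_ids in the lookup are made-up placeholders. In a real ETL the lookup would be built from the vocabulary tables.

```python
# Hypothetical lookup: (coding system, source code) -> (destination table, concept_id).
# All codes and concept_ids here are invented placeholders for illustration.
CONCEPT_LOOKUP = {
    ("ICD9", "X01"): ("condition", 1001),
    ("ICD9", "X02"): ("observation", 1002),
    ("CPT4", "Y01"): ("procedure", 2001),
    ("CPT4", "Y02"): ("measurement", 2002),
}

# Default destination table for each coding system, as described in the text.
DEFAULT_TABLE = {"ICD9": "condition", "CPT4": "procedure"}


def route_code(coding_system, source_code):
    """Return the destination table and concept_id for one source code.

    A code with no matching concept goes to the coding system's default
    table with concept_id 0. The source value is always kept.
    """
    fallback = (DEFAULT_TABLE[coding_system], 0)
    table, concept_id = CONCEPT_LOOKUP.get((coding_system, source_code), fallback)
    return {"table": table, "concept_id": concept_id, "source_value": source_code}


mapped = route_code("ICD9", "X02")    # matches a concept outside the default table
unmapped = route_code("CPT4", "Z99")  # no match: default table, concept_id 0
```

Keeping `source_value` on every record is what allows unmapped records to be queried, and later updated, by their original code.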

Another factor to consider is the set of ETL conventions used by the PEDSnet PCORI project, along with the suggestions from OMOP. They offer useful guidance on the conversion and are especially relevant if you are involved with PEDSnet.

One issue that must be taken into consideration is ensuring that unmapped codes are still written to the default table with a concept_id of 0. Making sure that every code is handled usually requires multiple passes over the dataset.
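A later pass that fills in codes mapped after the initial load might look like the sketch below. It assumes records are plain dictionaries with `concept_id` and `source_value` keys; the function name `remap_unmapped` and the shape of `new_mappings` are hypothetical. It only updates the concept_id in place and does not handle the case where a newly mapped code belongs in a different table.

```python
def remap_unmapped(records, new_mappings):
    """Update records that still have concept_id 0.

    records: list of dicts with "concept_id" and "source_value" keys.
    new_mappings: dict of source code -> newly assigned concept_id.
    Returns the number of records updated.
    """
    updated = 0
    for record in records:
        if record["concept_id"] == 0 and record["source_value"] in new_mappings:
            record["concept_id"] = new_mappings[record["source_value"]]
            updated += 1
    return updated


records = [
    {"concept_id": 0, "source_value": "Z99"},     # unmapped, now has a mapping
    {"concept_id": 0, "source_value": "Z98"},     # unmapped, still no mapping
    {"concept_id": 2001, "source_value": "Y01"},  # already mapped, left alone
]
count = remap_unmapped(records, {"Z99": 3001, "Y01": 9999})
```

Already mapped records are never touched, so the pass can be repeated whenever the vocabulary gains new mappings.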

Another issue arises when files contain a list of codes (diag_code_1, diag_code_2, diag_code_3). These must be flattened into one record per code, keeping each code's relative position so it can be used for the "type_concept_id" (primary_code, 1st_code, 2nd_code, etc.).
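As a rough sketch of the flattening step, assuming each input row is a dictionary with a `person_id` key and numbered `diag_code_N` columns (the function name `flatten_codes` and the output keys are hypothetical): each non-empty code becomes its own row, and its original position is kept so it can later be translated into a type_concept_id. The translation itself is left out because it depends on the vocabulary in use.

```python
def flatten_codes(record, prefix="diag_code_"):
    """Turn numbered code columns into one row per code.

    Walks diag_code_1, diag_code_2, ... until a column is missing.
    Empty values are skipped, but positions are not renumbered, so the
    relative position in the source file is preserved.
    """
    rows = []
    position = 1
    while f"{prefix}{position}" in record:
        code = record[f"{prefix}{position}"]
        if code:  # skip empty strings and None
            rows.append(
                {
                    "person_id": record["person_id"],
                    "source_value": code,
                    "position": position,
                }
            )
        position += 1
    return rows


source_row = {"person_id": 7, "diag_code_1": "X01", "diag_code_2": "", "diag_code_3": "X02"}
flat = flatten_codes(source_row)
```

Here the code in `diag_code_3` keeps position 3 even though `diag_code_2` is empty, which avoids silently promoting a later code to an earlier slot.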
