TY - JOUR
T1 - Processing of electronic medical records for health services research in an academic medical center
T2 - Methods and validation
AU - Rahman, Nabilah
AU - Wang, Debby D.
AU - Ng, Sheryl Hui Xian
AU - Ramachandran, Sravan
AU - Sridharan, Srinath
AU - Khoo, Astrid
AU - Tan, Chuen Seng
AU - Goh, Wei Ping
AU - Tan, Xin Quan
N1 - Publisher Copyright: © 2021 JMIR Publications Inc. All rights reserved.
PY - 2018/10/1
Y1 - 2018/10/1
AB - Background: Electronic medical records (EMRs) contain a wealth of information that can support data-driven decision making in health care policy design and service planning. Although research using EMRs has become increasingly prevalent, challenges such as coding inconsistency, data validity, and lack of suitable measures in important domains still hinder progress. Objective: The objective of this study was to design a structured way to process records in administrative EMR systems for health services research and assess validity in selected areas. Methods: On the basis of a local hospital EMR system in Singapore, we developed a structured framework for EMR data processing, including standardization and phenotyping of diagnosis codes, construction of a cohort with multilevel views, and generation of variables and proxy measures to supplement primary data. Disease complexity was estimated by the Charlson Comorbidity Index (CCI) and Polypharmacy Score (PPS), whereas socioeconomic status (SES) was estimated by housing type. The validity of modified diagnosis codes and derived measures was investigated. Results: Visit-level (N=7,778,761) and patient-level records (n=549,109) were generated. The International Classification of Diseases, Tenth Revision, Australian Modification (ICD-10-AM) codes were standardized to the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) with a mapping rate of 87.1%. In all, 97.4% of the ICD-9-CM codes were phenotyped successfully using the Clinical Classification Software by the Agency for Healthcare Research and Quality. Diagnosis codes that underwent modification (truncation or zero addition) in standardization and phenotyping procedures had the modification validated by physicians, with validity rates of more than 90%. Disease complexity measures (CCI and PPS) and SES were found to be valid and robust after a correlation analysis and a multivariate regression analysis. CCI and PPS were correlated with each other and positively correlated with health care utilization measures. Larger housing type was associated with lower government subsidies received, suggesting an association with higher SES. Profiles of the constructed cohorts showed differences in disease prevalence, disease complexity, and health care utilization between those aged above 65 years and those aged 65 years or younger. Conclusions: The framework proposed in this study would be useful for other researchers working with EMR data for health services research. Further analyses would be needed to better understand differences observed in the cohorts.
KW - Data curation
KW - Electronic medical records
KW - Health services
KW - Validation studies
UR - http://www.scopus.com/inward/record.url?scp=85066828747&partnerID=8YFLogxK
U2 - 10.2196/10933
DO - 10.2196/10933
M3 - Article
AN - SCOPUS:85066828747
VL - 6
JO - JMIR Medical Informatics
JF - JMIR Medical Informatics
IS - 4
M1 - e10933
ER -