• Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, P. R. China;
QIAN Qing, Email: qian.qing@imicams.ac.cn

Objective To construct a demand model for electronic medical record (EMR) data quality across the lifecycle of machine learning (ML)-based disease risk prediction, in order to guide the implementation of EMR data quality assessment. Methods Following the lifecycle of ML-based predictive modeling, we explored the requirements for EMR data quality. First, through a literature review, we summarized the key data activities involved in each task of ML-based disease risk prediction. Second, we mapped the data activities in each task to their associated data quality requirements. Finally, we clustered those requirements into four dimensions. Results We constructed a three-layer ring structure to represent the demand model for EMR data quality in ML-based disease risk prediction research. The inner layer shows the seven main tasks in ML-based predictive modeling: data collection, data preprocessing, feature representation, feature selection and extraction, model training, model evaluation and optimization, and model deployment. The middle layer shows the key data activities in each task, and the outer layer represents the four dimensions of data quality requirements: operability, completeness, accuracy, and timeliness. Conclusion The proposed model can guide real-world EMR data governance, improve EMR data quality management, and promote the generation of real-world evidence.

Citation: DUAN Yifan, TANG Mingkun, SUN Haixia, HAO Jie, WANG Jiayang, ZHOU Jiayin, LI Jiao, QIAN Qing. Exploring data quality for machine learning-based disease risk predictions with electronic medical records. Chinese Journal of Evidence-Based Medicine, 2023, 23(9): 1072-1080. doi: 10.7507/1672-2531.202301076

Copyright © the editorial department of Chinese Journal of Evidence-Based Medicine of West China Medical Publisher. All rights reserved
