Impact of density of lab data in EHR for prediction of potentially preventable events

Chandrima Sarkar, Jaideep Srivastava

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

4 Citations (Scopus)

Abstract

This paper presents an analysis of sparse and incomplete Electronic Health Record (EHR) data for predicting which patients are at risk of Potentially Preventable Events (PPEs). PPEs are admissions, readmissions, complications and emergency department visits that could have been avoided had the patient received appropriate interventions. Machine learning techniques have made PPE detection easier, but it remains challenging because EHR data are sparse and incomplete. It is therefore important to investigate the factors that affect PPE prediction in EHR data. In this paper we define a measure called density to quantify how sparse and incomplete an EHR data set is. We analyze three important factors that affect PPE prediction in sparse and incomplete EHR data: 1) the effect of varying the domain information in patient records, 2) the impact of rank aggregation based feature selection, a popular data mining preprocessing technique, and 3) the effect of ensemble classification. The results indicate that rank aggregation based feature selection and ensemble classification improve classification accuracy by approximately 3-4% despite the sparse and incomplete nature of the data. However, eliminating patient records with less domain information, in order to reduce incompleteness in the data, does not improve classification accuracy. We conclude that feature selection and ensemble classification are important factors affecting classification accuracy even in sparse and incomplete data sets. We also conclude that randomly decreasing domain information by varying lab values does not increase accuracy for PPE prediction.
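
The abstract does not say how density is computed or which rank aggregation and ensemble methods the paper uses. The Python sketch below is only one illustration under stated assumptions: density is taken to be the fraction of lab tests that have a recorded value (per record, and averaged over the data set), rank aggregation is a plain Borda count over several feature rankings, and the ensemble is simple majority voting. All function names and sample lab names are hypothetical, not taken from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence

# Hypothetical record layout (assumption, not from the paper): each patient
# record maps a lab-test name to its value; None or NaN means "not recorded".
Record = Dict[str, Optional[float]]


def is_observed(value: Optional[float]) -> bool:
    """Return True if a lab value was actually recorded."""
    return value is not None and not (isinstance(value, float) and math.isnan(value))


def record_density(record: Record, lab_tests: Sequence[str]) -> float:
    """Assumed density: share of the listed lab tests present in one record."""
    if not lab_tests:
        return 0.0
    observed = sum(1 for test in lab_tests if is_observed(record.get(test)))
    return observed / len(lab_tests)


def dataset_density(records: Sequence[Record], lab_tests: Sequence[str]) -> float:
    """Assumed data-set density: share of observed cells in the records x tests grid."""
    if not records:
        return 0.0
    # Every record shares the same denominator, so the mean of per-record
    # densities equals the overall fraction of observed cells.
    return sum(record_density(r, lab_tests) for r in records) / len(records)


def filter_by_density(records: Sequence[Record], lab_tests: Sequence[str],
                      threshold: float) -> List[Record]:
    """Drop records whose density is below `threshold` (hypothetical helper)."""
    return [r for r in records if record_density(r, lab_tests) >= threshold]


def borda_aggregate(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Merge several best-first feature rankings with a Borda count.

    A feature at position p in a ranking of length n earns n - p points; a
    feature absent from a ranking earns nothing from it. Ties are broken
    alphabetically so the result is deterministic.
    """
    scores: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - position)
    return sorted(scores, key=lambda f: (-scores[f], f))


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Simple ensemble: per-sample majority label across base classifiers.

    `predictions` holds one list of predicted labels per classifier. On a tie,
    the label that appears first (earliest classifier) wins.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


if __name__ == "__main__":
    labs = ["glucose", "hba1c", "creatinine", "ldl"]  # illustrative names only
    patients = [
        {"glucose": 110.0, "hba1c": 6.1, "creatinine": None, "ldl": None},
        {"glucose": 95.0, "hba1c": None, "creatinine": 1.0, "ldl": 130.0},
    ]
    print(record_density(patients[0], labs))  # 0.5
    print(dataset_density(patients, labs))  # 0.625
    print(len(filter_by_density(patients, labs, 0.6)))  # 1
    rankings = [
        ["hba1c", "glucose", "ldl", "creatinine"],
        ["glucose", "hba1c", "creatinine", "ldl"],
        ["hba1c", "creatinine", "glucose", "ldl"],
    ]
    print(borda_aggregate(rankings))  # ['hba1c', 'glucose', 'creatinine', 'ldl']
    print(majority_vote([[1, 0, 0], [1, 1, 0], [0, 0, 1]]))  # [1, 0, 0]
```

Under this assumed definition, dropping low-density records (as `filter_by_density` does) raises the density of the remaining data, which is the kind of incompleteness reduction the abstract reports as not improving accuracy. The Borda count is used here only because it is simple and purely order-based; other rank aggregation schemes exist.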

Original language: English
Title of host publication: Proceedings - 2013 IEEE International Conference on Healthcare Informatics, ICHI 2013
Pages: 529-534
Number of pages: 6
DOI: 10.1109/ICHI.2013.82
Publication status: Published - 2013
Externally published: Yes
Event: 2013 1st IEEE International Conference on Healthcare Informatics, ICHI 2013 - Philadelphia, PA
Duration: 9 Sep 2013 - 11 Sep 2013



Keywords

  • Domain information
  • Ensemble classification
  • Feature selection
  • Potentially preventable events
  • Sparse and incomplete data

ASJC Scopus subject areas

  • Health Informatics

Cite this

Sarkar, C., & Srivastava, J. (2013). Impact of density of lab data in EHR for prediction of potentially preventable events. In Proceedings - 2013 IEEE International Conference on Healthcare Informatics, ICHI 2013 (pp. 529-534). [6680530] https://doi.org/10.1109/ICHI.2013.82

@inproceedings{849d6bc5cd2f46fe9ec9f07119567480,
title = "Impact of density of lab data in EHR for prediction of potentially preventable events",
keywords = "Domain information, Ensemble classification, Feature selection, Potentially preventable events, Sparse and incomplete data",
author = "Chandrima Sarkar and Jaideep Srivastava",
year = "2013",
doi = "10.1109/ICHI.2013.82",
language = "English",
isbn = "9780769550893",
pages = "529--534",
booktitle = "Proceedings - 2013 IEEE International Conference on Healthcare Informatics, ICHI 2013",

}

TY - GEN

T1 - Impact of density of lab data in EHR for prediction of potentially preventable events

AU - Sarkar, Chandrima

AU - Srivastava, Jaideep

PY - 2013

Y1 - 2013

KW - Domain information

KW - Ensemble classification

KW - Feature selection

KW - Potentially preventable events

KW - Sparse and incomplete data

UR - http://www.scopus.com/inward/record.url?scp=84893456386&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84893456386&partnerID=8YFLogxK

U2 - 10.1109/ICHI.2013.82

DO - 10.1109/ICHI.2013.82

M3 - Conference contribution

AN - SCOPUS:84893456386

SN - 9780769550893

SP - 529

EP - 534

BT - Proceedings - 2013 IEEE International Conference on Healthcare Informatics, ICHI 2013

ER -