A Bayesian Perspective on Early Stage Event Prediction in Longitudinal Data

Mahtab Fard, Ping Wang, Sanjay Chawla, Chandan Reddy

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

Predicting event occurrence at an early stage of a longitudinal study is an important and challenging problem with high practical value. Unlike standard classification and regression problems, where a domain expert can label the data in a reasonably short period of time, training data in longitudinal studies can be obtained only by waiting for a sufficient number of events to occur. Survival analysis aims to directly predict the time to an event of interest using data collected in the past over a certain duration. However, it cannot answer the open question of 'how to forecast whether a subject will experience an event by the end of the study using event occurrence information of other subjects at the early stage of such a longitudinal study?'. The goal of this work is to predict which subjects in a study will experience an event in the future, based on the limited event information available at the initial stages of the longitudinal study. This problem poses two major challenges: (1) the absence of complete information about event occurrence (censoring) and (2) the availability of only a partial set of events that occurred during the initial phase of the study. We propose a novel Early Stage Prediction (ESP) framework for building event prediction models that are trained at early stages of longitudinal studies. First, we address the first challenge by introducing a new method for handling censored data using the Kaplan-Meier estimator. We then extend the Naive Bayes, Tree-Augmented Naive Bayes (TAN) and Bayesian Network methods based on the proposed framework and develop three algorithms, namely ESP-NB, ESP-TAN and ESP-BN, to effectively predict event occurrence using training data obtained at an early stage of the study.
More specifically, our approach integrates Bayesian methods with an Accelerated Failure Time (AFT) model by adapting the prior probability of event occurrence for future time points. The proposed framework is evaluated on a wide range of synthetic and real-world benchmark datasets. Our extensive experiments show that the proposed ESP framework is, on average, 20% more accurate than existing schemes while using only a limited amount of training data.
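
The abstract names the Kaplan-Meier estimator as the basis for handling censored data but does not describe the authors' specific method. As background only, the standard Kaplan-Meier product-limit estimator could be sketched roughly as follows. The function names `kaplan_meier` and `event_probability_by` and the toy data are illustrative assumptions, not taken from the paper, and this is not the ESP framework itself.

```python
from collections import Counter


def kaplan_meier(times, observed):
    """Standard Kaplan-Meier estimate (illustrative, not the paper's method).

    times    -- follow-up time of each subject
    observed -- True if the event was seen at that time, False if censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    events = Counter(t for t, seen in zip(times, observed) if seen)
    leaving = Counter(times)  # subjects leaving the risk set (event or censored)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def event_probability_by(curve, horizon):
    """Estimated probability of an event by `horizon`, i.e. 1 - S(horizon)."""
    survival = 1.0
    for t, s in curve:
        if t > horizon:
            break
        survival = s
    return 1.0 - survival


# Hypothetical toy data: five subjects, two of them censored.
toy_times = [2, 3, 3, 5, 8]
toy_observed = [True, True, False, True, False]
toy_curve = kaplan_meier(toy_times, toy_observed)
print(toy_curve)
print(event_probability_by(toy_curve, horizon=4))
```

Censored subjects still count toward the risk set up to their last follow-up time, which is how the estimator uses incomplete observations instead of discarding them.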

Original language: English
Article number: 7564399
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: PP
Issue number: 99
DOI: 10.1109/TKDE.2016.2608347
Publication status: Published - 2016
Externally published: Yes

Keywords

  • Bayesian network
  • early stage prediction
  • event data
  • Naive Bayes
  • regression
  • survival analysis

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

A Bayesian Perspective on Early Stage Event Prediction in Longitudinal Data. / Fard, Mahtab; Wang, Ping; Chawla, Sanjay; Reddy, Chandan.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. PP, No. 99, 7564399, 2016.

@article{25a2934985af454fbe2aa0efca8116e9,
title = "A Bayesian Perspective on Early Stage Event Prediction in Longitudinal Data",
abstract = "Predicting event occurrence at an early stage of a longitudinal study is an important and challenging problem with high practical value. Unlike standard classification and regression problems, where a domain expert can label the data in a reasonably short period of time, training data in longitudinal studies can be obtained only by waiting for a sufficient number of events to occur. Survival analysis aims to directly predict the time to an event of interest using data collected in the past over a certain duration. However, it cannot answer the open question of 'how to forecast whether a subject will experience an event by the end of the study using event occurrence information of other subjects at the early stage of such a longitudinal study?'. The goal of this work is to predict which subjects in a study will experience an event in the future, based on the limited event information available at the initial stages of the longitudinal study. This problem poses two major challenges: (1) the absence of complete information about event occurrence (censoring) and (2) the availability of only a partial set of events that occurred during the initial phase of the study. We propose a novel Early Stage Prediction (ESP) framework for building event prediction models that are trained at early stages of longitudinal studies. First, we address the first challenge by introducing a new method for handling censored data using the Kaplan-Meier estimator. We then extend the Naive Bayes, Tree-Augmented Naive Bayes (TAN) and Bayesian Network methods based on the proposed framework and develop three algorithms, namely ESP-NB, ESP-TAN and ESP-BN, to effectively predict event occurrence using training data obtained at an early stage of the study.
More specifically, our approach integrates Bayesian methods with an Accelerated Failure Time (AFT) model by adapting the prior probability of event occurrence for future time points. The proposed framework is evaluated on a wide range of synthetic and real-world benchmark datasets. Our extensive experiments show that the proposed ESP framework is, on average, 20{\%} more accurate than existing schemes while using only a limited amount of training data.",
keywords = "Bayesian network, early stage prediction, event data, Naive Bayes, regression, survival analysis",
author = "Mahtab Fard and Ping Wang and Sanjay Chawla and Chandan Reddy",
year = "2016",
doi = "10.1109/TKDE.2016.2608347",
language = "English",
volume = "PP",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "99",

}

TY - JOUR

T1 - A Bayesian Perspective on Early Stage Event Prediction in Longitudinal Data

AU - Fard, Mahtab

AU - Wang, Ping

AU - Chawla, Sanjay

AU - Reddy, Chandan

PY - 2016

Y1 - 2016

AB - Predicting event occurrence at an early stage of a longitudinal study is an important and challenging problem with high practical value. Unlike standard classification and regression problems, where a domain expert can label the data in a reasonably short period of time, training data in longitudinal studies can be obtained only by waiting for a sufficient number of events to occur. Survival analysis aims to directly predict the time to an event of interest using data collected in the past over a certain duration. However, it cannot answer the open question of 'how to forecast whether a subject will experience an event by the end of the study using event occurrence information of other subjects at the early stage of such a longitudinal study?'. The goal of this work is to predict which subjects in a study will experience an event in the future, based on the limited event information available at the initial stages of the longitudinal study. This problem poses two major challenges: (1) the absence of complete information about event occurrence (censoring) and (2) the availability of only a partial set of events that occurred during the initial phase of the study. We propose a novel Early Stage Prediction (ESP) framework for building event prediction models that are trained at early stages of longitudinal studies. First, we address the first challenge by introducing a new method for handling censored data using the Kaplan-Meier estimator. We then extend the Naive Bayes, Tree-Augmented Naive Bayes (TAN) and Bayesian Network methods based on the proposed framework and develop three algorithms, namely ESP-NB, ESP-TAN and ESP-BN, to effectively predict event occurrence using training data obtained at an early stage of the study.
More specifically, our approach integrates Bayesian methods with an Accelerated Failure Time (AFT) model by adapting the prior probability of event occurrence for future time points. The proposed framework is evaluated on a wide range of synthetic and real-world benchmark datasets. Our extensive experiments show that the proposed ESP framework is, on average, 20% more accurate than existing schemes while using only a limited amount of training data.

KW - Bayesian network

KW - early stage prediction

KW - event data

KW - Naive Bayes

KW - regression

KW - survival analysis

UR - http://www.scopus.com/inward/record.url?scp=84992066420&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84992066420&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2016.2608347

DO - 10.1109/TKDE.2016.2608347

M3 - Article

VL - PP

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 99

M1 - 7564399

ER -