RegRocket: Scalable multinomial autologistic regression with unordered categorical variables using Markov logic networks

Ibrahim Sabek, Mashaal Musleh, Mohamed F. Mokbel

Research output: Contribution to journal › Article

Abstract

Autologistic regression is one of the most popular statistical tools to predict spatial phenomena in several applications, including epidemic disease detection, species occurrence prediction, earth observation, and business management. In general, autologistic regression divides the space into a two-dimensional grid, where the prediction is performed at each cell in the grid. The prediction at any location is based on a set of predictors (i.e., features) at this location and predictions from neighboring locations. In this article, we address the problem of building efficient autologistic models with multinomial (i.e., categorical) prediction and predictor variables, where the categories represented by these variables are unordered. Unfortunately, existing methods to build autologistic models are designed for binary variables and are computationally expensive (i.e., they do not scale to large grid data such as fine-grained satellite images). Therefore, we introduce RegRocket: a scalable framework to build multinomial autologistic models for predicting large-scale spatial phenomena. RegRocket considers both accuracy and efficiency when learning the regression model parameters. To this end, RegRocket is built on top of Markov Logic Networks (MLN), a scalable statistical learning framework, with internals and data structures optimized to process spatial data. RegRocket provides an equivalent representation of the multinomial prediction and predictor variables using MLN, where the dependencies between these variables are transformed into first-order logic predicates. Then, RegRocket employs an efficient framework that learns the model parameters from the MLN representation in a distributed manner. Extensive experimental results based on two large real datasets show that RegRocket can build multinomial autologistic models for 1 million grid cells in minutes, with an average F1-score of 0.85.
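The abstract's description of a cell's prediction (local predictors plus neighboring predictions) can be illustrated with a minimal sketch. It assumes a common textbook form of the multinomial autologistic conditional, in which each category's score is a linear function of the cell's features plus a weight `eta` times the number of 4-connected neighbors currently labeled with that category. This is not RegRocket's MLN-based learning procedure; the function names, the neighborhood choice, and all parameter values below are hypothetical.

```python
import math


def neighbor_counts(labels, row, col, num_categories):
    """Count how many 4-connected neighbors of (row, col) carry each category."""
    counts = [0] * num_categories
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        # Skip neighbors that fall outside the grid.
        if 0 <= r < len(labels) and 0 <= c < len(labels[0]):
            counts[labels[r][c]] += 1
    return counts


def category_probabilities(features, betas, counts, eta):
    """Softmax over per-category scores: beta_k . x + eta * (neighbors in k).

    Categories are unordered, so each one gets its own weight vector and
    the neighbor term only rewards agreement with the same category.
    """
    scores = [
        sum(b * x for b, x in zip(beta_k, features)) + eta * n_k
        for beta_k, n_k in zip(betas, counts)
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 3x3 grid with three unordered categories (illustrative values only).
labels = [
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
]
betas = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]  # one weight vector per category
features = [1.0, 2.0]                            # predictors at the center cell

counts = neighbor_counts(labels, 1, 1, 3)
probs = category_probabilities(features, betas, counts, eta=0.8)
probs_no_neighbors = category_probabilities(features, betas, counts, eta=0.0)
```

With `eta=0.0` the model reduces to ordinary multinomial logistic regression on the cell's own features; a positive `eta` pulls the prediction toward the categories that dominate the neighborhood.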

Original language: English
Article number: 27
Journal: ACM Transactions on Spatial Algorithms and Systems
Volume: 5
Issue number: 4
DOI: 10.1145/3366459
Publication status: Published - Dec 2019

Keywords

  • Autologistic models
  • Factor graph
  • First-order logic
  • Markov logic networks
  • Multinomial spatial regression

ASJC Scopus subject areas

  • Signal Processing
  • Information Systems
  • Modelling and Simulation
  • Computer Science Applications
  • Geometry and Topology
  • Discrete Mathematics and Combinatorics

Cite this

@article{e740b9707b1d4b0cb4076f80c4a74e75,
title = "{RegRocket}: Scalable multinomial autologistic regression with unordered categorical variables using {Markov} logic networks",
keywords = "Autologistic models, Factor graph, First-order logic, Markov logic networks, Multinomial spatial regression",
author = "Sabek, Ibrahim and Musleh, Mashaal and Mokbel, {Mohamed F.}",
year = "2019",
month = "12",
doi = "10.1145/3366459",
language = "English",
volume = "5",
journal = "ACM Transactions on Spatial Algorithms and Systems",
issn = "2374-0353",
publisher = "Association for Computing Machinery (ACM)",
number = "4",

}

TY - JOUR

T1 - RegRocket

T2 - Scalable multinomial autologistic regression with unordered categorical variables using Markov logic networks

AU - Sabek, Ibrahim

AU - Musleh, Mashaal

AU - Mokbel, Mohamed F.

PY - 2019/12

Y1 - 2019/12

KW - Autologistic models

KW - Factor graph

KW - First-order logic

KW - Markov logic networks

KW - Multinomial spatial regression

UR - http://www.scopus.com/inward/record.url?scp=85077793061&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85077793061&partnerID=8YFLogxK

U2 - 10.1145/3366459

DO - 10.1145/3366459

M3 - Article

AN - SCOPUS:85077793061

VL - 5

JO - ACM Transactions on Spatial Algorithms and Systems

JF - ACM Transactions on Spatial Algorithms and Systems

SN - 2374-0353

IS - 4

M1 - 27

ER -