### Abstract

Autologistic regression is one of the most popular statistical tools for predicting spatial phenomena in applications such as epidemic disease detection, species occurrence prediction, earth observation, and business management. In general, autologistic regression divides the space into a two-dimensional grid, and a prediction is made at each cell in the grid. The prediction at any location is based on a set of predictors (i.e., features) at this location and on the predictions at neighboring locations. In this article, we address the problem of building efficient autologistic models with multinomial (i.e., categorical) prediction and predictor variables, where the categories represented by these variables are unordered. Unfortunately, existing methods for building autologistic models are designed for binary variables and are computationally expensive (i.e., they do not scale to large grid data such as fine-grained satellite images). Therefore, we introduce RegRocket: a scalable framework for building multinomial autologistic models that predict large-scale spatial phenomena. RegRocket considers both accuracy and efficiency when learning the regression model parameters. To this end, RegRocket is built on top of Markov Logic Networks (MLN), a scalable statistical learning framework, whose internals and data structures are optimized to process spatial data. RegRocket provides an equivalent representation of the multinomial prediction and predictor variables in MLN, where the dependencies between these variables are transformed into first-order logic predicates. Then, RegRocket employs an efficient framework that learns the model parameters from the MLN representation in a distributed manner. Extensive experimental results based on two large real datasets show that RegRocket can build multinomial autologistic models in minutes for 1 million grid cells, with an average F1-score of 0.85.
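As a rough illustration of the model class described in the abstract (not RegRocket's MLN-based learning procedure), the following sketch computes the conditional class probabilities for one grid cell from its categorical predictors and the current labels of its neighbors. The function names, the 4-connected neighborhood, the weight layout `beta[k][j][v]`, and the single neighbor weight `eta` are all assumptions made for this example.

```python
import math


def neighbors(r, c, rows, cols):
    """Yield the 4-connected neighbors of cell (r, c) that lie inside the grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def class_probabilities(labels, features, r, c, beta, eta, num_classes):
    """Conditional probability of each class at cell (r, c).

    labels[r][c]      -- current class index of each cell (hypothetical layout)
    features[r][c][j] -- category index of the j-th unordered predictor
    beta[k][j][v]     -- assumed weight for class k when predictor j takes value v
    eta               -- assumed weight on the number of neighbors labeled k
    """
    rows, cols = len(labels), len(labels[0])
    scores = []
    for k in range(num_classes):
        # Predictor term: indexing by category value acts as a one-hot encoding,
        # so no ordering among the categories is implied.
        score = sum(beta[k][j][v] for j, v in enumerate(features[r][c]))
        # Spatial term: count neighbors whose current label equals class k.
        agree = sum(1 for nr, nc in neighbors(r, c, rows, cols)
                    if labels[nr][nc] == k)
        scores.append(score + eta * agree)
    # Softmax over classes, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With all predictor weights set to zero, the output depends only on the neighbor counts, which makes the spatial dependence easy to inspect in isolation.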

| Field | Value |
|---|---|
| Original language | English |
| Article number | 27 |
| Journal | ACM Transactions on Spatial Algorithms and Systems |
| Volume | 5 |
| Issue number | 4 |
| DOI | 10.1145/3366459 |
| Publication status | Published - Dec 2019 |

### Keywords

- Autologistic models
- Factor graph
- First-order logic
- Markov logic networks
- Multinomial spatial regression

### ASJC Scopus subject areas

- Signal Processing
- Information Systems
- Modelling and Simulation
- Computer Science Applications
- Geometry and Topology
- Discrete Mathematics and Combinatorics

### Cite this

**RegRocket: Scalable multinomial autologistic regression with unordered categorical variables using Markov logic networks.** / Sabek, Ibrahim; Musleh, Mashaal; Mokbel, Mohamed F.

Research output: Contribution to journal › Article

TY - JOUR

T1 - RegRocket

T2 - Scalable multinomial autologistic regression with unordered categorical variables using Markov logic networks

AU - Sabek, Ibrahim

AU - Musleh, Mashaal

AU - Mokbel, Mohamed F.

PY - 2019/12

Y1 - 2019/12

KW - Autologistic models

KW - Factor graph

KW - First-order logic

KW - Markov logic networks

KW - Multinomial spatial regression

UR - http://www.scopus.com/inward/record.url?scp=85077793061&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85077793061&partnerID=8YFLogxK

U2 - 10.1145/3366459

DO - 10.1145/3366459

M3 - Article

AN - SCOPUS:85077793061

VL - 5

JO - ACM Transactions on Spatial Algorithms and Systems

JF - ACM Transactions on Spatial Algorithms and Systems

SN - 2374-0353

IS - 4

M1 - 27

ER -