Penalized regression combining the L1 norm and a correlation based penalty

Mohammed El Anbari, Abdallah Mkhadri

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

We consider the problem of feature selection in the linear regression model with p covariates and n observations. We propose a new method that simultaneously selects variables and favors a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The method is based on penalized least squares with a penalty function that combines the L1 norm and a Correlation-based Penalty (CP); we call it the L1CP method. Like the Lasso penalty, L1CP shrinks some coefficients to exactly zero; additionally, the CP term explicitly links the strength of penalization to the correlation among predictors. A detailed simulation study in small and high-dimensional settings illustrates the advantages of our approach over several alternatives. Finally, we apply the methodology to two real data sets: the US Crime data and the GC-Retention PAC data. In terms of prediction accuracy and estimation error, our empirical study suggests that L1CP is better adapted than the Elastic-Net to situations where p ≤ n (the number of variables is less than or equal to the sample size). If p ≫ n, our method remains competitive and also allows the selection of more than n variables.
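The abstract does not spell out the exact form of the CP term, so the following is only a minimal sketch of the penalized least squares objective it describes, assuming the pairwise correlation-based penalty of Tutz and Ulbricht (the form the L1CP construction builds on): for each pair of predictors, coefficient differences are penalized more heavily the more positively correlated the predictors are, and coefficient sums more heavily the more negatively correlated they are. The function names and tuning parameters `lam1`, `lam2` are illustrative, not from the paper.

```python
import numpy as np

def cp_penalty(beta, R):
    """Assumed correlation-based penalty (CP): for each pair (i, j),
    (b_i - b_j)^2 / (1 - r_ij) + (b_i + b_j)^2 / (1 + r_ij).
    Strong positive correlation r_ij -> 1 forces b_i ~ b_j, so
    correlated predictors enter or leave the model together.
    (Assumes |r_ij| < 1, i.e. no perfectly collinear columns.)"""
    p = len(beta)
    total = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            r = R[i, j]
            total += (beta[i] - beta[j]) ** 2 / (1.0 - r)
            total += (beta[i] + beta[j]) ** 2 / (1.0 + r)
    return total

def l1cp_objective(beta, X, y, lam1, lam2):
    """Penalized least squares sketched in the abstract:
    RSS + lam1 * ||beta||_1 (Lasso term, exact zeros)
        + lam2 * CP(beta)   (grouping term tied to correlations)."""
    R = np.corrcoef(X, rowvar=False)  # sample correlation of predictors
    rss = np.sum((y - X @ beta) ** 2)
    return rss + lam1 * np.sum(np.abs(beta)) + lam2 * cp_penalty(beta, R)
```

With `lam2 = 0` this reduces to the Lasso objective, and with `lam1 = 0` to a pure correlation-based ridge-type penalty; the paper's contribution is the combination of both terms.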

Original language: English
Pages (from-to): 82-102
Number of pages: 21
Journal: Sankhya: The Indian Journal of Statistics
Volume: 76B
Publication status: Published - 2014
Externally published: Yes


Keywords

  • Correlation based penalty
  • Elastic-net
  • Lasso
  • Regression
  • Regularization
  • Variable selection

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Penalized regression combining the L1 norm and a correlation based penalty. / El Anbari, Mohammed; Mkhadri, Abdallah.

In: Sankhya: The Indian Journal of Statistics, Vol. 76B, 2014, p. 82-102.

Research output: Contribution to journal › Article

@article{e9c7bde5ef8b4ccebac319a895a5b636,
title = "Penalized regression combining the L1 norm and a correlation based penalty",
abstract = "We consider the problem of feature selection in linear regression model with p covariates and n observations. We propose a new method to simultaneously select variables and favor a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The method is based on penalized least squares with a penalty function that combines the L1 and a Correlation based Penalty (CP) norms. We call it L1CP method. Like the Lasso penalty, L1CP shrinks some coefficients to exactly zero and additionally, the CP term explicitly links strength of penalization to the correlation among predictors. A detailed simulation study in small and high dimensional settings is performed. It illustrates the advantages of our approach compared to several alternatives. Finally, we apply the methodology to two real data sets: US Crime Data and GC-Retention PAC data. In terms of prediction accuracy and estimation error, our empirical study suggests that the L1CP is more adapted than the Elastic-Net to situations where p ≤ n (the number of variables is less or equal to the sample size). If p ≫ n, our method remains competitive and also allows the selection of more than n variables.",
keywords = "Correlation based penalty, Elastic-net, Lasso, Regression, Regularization, Variable selection",
author = "{El Anbari}, Mohammed and Abdallah Mkhadri",
year = "2014",
language = "English",
volume = "76B",
pages = "82--102",
journal = "Sankhya: The Indian Journal of Statistics",
issn = "0972-7671",
publisher = "Indian Statistical Institute",
}

TY  - JOUR
T1  - Penalized regression combining the L1 norm and a correlation based penalty
AU  - El Anbari, Mohammed
AU  - Mkhadri, Abdallah
PY  - 2014
Y1  - 2014
N2  - We consider the problem of feature selection in linear regression model with p covariates and n observations. We propose a new method to simultaneously select variables and favor a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The method is based on penalized least squares with a penalty function that combines the L1 and a Correlation based Penalty (CP) norms. We call it L1CP method. Like the Lasso penalty, L1CP shrinks some coefficients to exactly zero and additionally, the CP term explicitly links strength of penalization to the correlation among predictors. A detailed simulation study in small and high dimensional settings is performed. It illustrates the advantages of our approach compared to several alternatives. Finally, we apply the methodology to two real data sets: US Crime Data and GC-Retention PAC data. In terms of prediction accuracy and estimation error, our empirical study suggests that the L1CP is more adapted than the Elastic-Net to situations where p ≤ n (the number of variables is less or equal to the sample size). If p ≫ n, our method remains competitive and also allows the selection of more than n variables.
AB  - We consider the problem of feature selection in linear regression model with p covariates and n observations. We propose a new method to simultaneously select variables and favor a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The method is based on penalized least squares with a penalty function that combines the L1 and a Correlation based Penalty (CP) norms. We call it L1CP method. Like the Lasso penalty, L1CP shrinks some coefficients to exactly zero and additionally, the CP term explicitly links strength of penalization to the correlation among predictors. A detailed simulation study in small and high dimensional settings is performed. It illustrates the advantages of our approach compared to several alternatives. Finally, we apply the methodology to two real data sets: US Crime Data and GC-Retention PAC data. In terms of prediction accuracy and estimation error, our empirical study suggests that the L1CP is more adapted than the Elastic-Net to situations where p ≤ n (the number of variables is less or equal to the sample size). If p ≫ n, our method remains competitive and also allows the selection of more than n variables.
KW  - Correlation based penalty
KW  - Elastic-net
KW  - Lasso
KW  - Regression
KW  - Regularization
KW  - Variable selection
UR  - http://www.scopus.com/inward/record.url?scp=84969170232&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84969170232&partnerID=8YFLogxK
M3  - Article
AN  - SCOPUS:84969170232
VL  - 76B
SP  - 82
EP  - 102
JO  - Sankhya: The Indian Journal of Statistics
JF  - Sankhya: The Indian Journal of Statistics
SN  - 0972-7671
ER  -