Optimizing non-decomposable measures with deep networks

Amartya Sanyal, Pawan Kumar, Purushottam Kar, Sanjay Chawla, Fabrizio Sebastiani

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

We present a class of algorithms capable of directly training deep neural networks with respect to popular families of task-specific performance measures for binary classification such as the F-measure, QMean and the Kullback–Leibler divergence that are structured and non-decomposable. Our goal is to address tasks such as label-imbalanced learning and quantification. Our techniques present a departure from standard deep learning techniques that typically use squared or cross-entropy loss functions (that are decomposable) to train neural networks. We demonstrate that directly training with task-specific loss functions yields faster and more stable convergence across problems and datasets. Our proposed algorithms and implementations offer several advantages including (i) the use of fewer training samples to achieve a desired level of convergence, (ii) a substantial reduction in training time, (iii) a seamless integration of our implementation into existing symbolic gradient frameworks, and (iv) assurance of convergence to first order stationary points. It is noteworthy that the algorithms achieve this, especially point (iv), despite being asked to optimize complex objective functions. We implement our techniques on a variety of deep architectures including multi-layer perceptrons and recurrent neural networks and show that on a variety of benchmark and real data sets, our algorithms outperform traditional approaches to training deep networks, as well as popular techniques used to handle label imbalance.
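
As an illustration of the non-decomposability the abstract refers to: measures such as the F-measure cannot be written as a sum of per-example losses, which is the form that standard squared or cross-entropy training assumes. The sketch below (PyTorch-style Python) shows one common, generic way such a measure can be handed to an autograd framework: hard true/false-positive counts are replaced by soft counts over a minibatch, so that a batch-level surrogate for 1 - F1 becomes differentiable. This is a minimal sketch of the general idea only, not the algorithm proposed in the paper; soft_f1_loss, the toy model and the data are hypothetical.

import torch

def soft_f1_loss(scores, labels, eps=1e-8):
    # scores: raw model outputs, shape (batch,); labels: binary targets in {0, 1}, shape (batch,)
    probs = torch.sigmoid(scores)             # soft predictions in (0, 1)
    tp = (probs * labels).sum()               # soft true positives
    fp = (probs * (1.0 - labels)).sum()       # soft false positives
    fn = ((1.0 - probs) * labels).sum()       # soft false negatives
    f1 = 2.0 * tp / (2.0 * tp + fp + fn + eps)
    return 1.0 - f1                           # minimising this maximises the soft F1

# Minimal usage with a linear model; the same loss would drop into an MLP or RNN.
model = torch.nn.Linear(10, 1)
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(64, 10)                       # hypothetical minibatch of features
y = (torch.rand(64) < 0.1).float()            # heavily label-imbalanced binary targets

optimiser.zero_grad()
loss = soft_f1_loss(model(x).squeeze(-1), y)
loss.backward()
optimiser.step()

Because the surrogate depends jointly on all examples in the batch through the shared tp, fp and fn counts, its gradient does not split into independent per-example terms, which is exactly what makes such measures non-decomposable.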

Original language: English
Pages (from-to): 1-24
Number of pages: 24
Journal: Machine Learning
DOI: 10.1007/s10994-018-5736-y
Publication status: Accepted/In press - 2 Jul 2018


Keywords

  • Deep learning
  • F-measure
  • Optimization
  • Task-specific training

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Cite this

Sanyal, A., Kumar, P., Kar, P., Chawla, S., & Sebastiani, F. (2018). Optimizing non-decomposable measures with deep networks. Machine Learning, 1-24. https://doi.org/10.1007/s10994-018-5736-y

@article{eacee7639f154503bc1efcabcae0b53b,
title = "Optimizing non-decomposable measures with deep networks",
abstract = "We present a class of algorithms capable of directly training deep neural networks with respect to popular families of task-specific performance measures for binary classification such as the F-measure, QMean and the Kullback–Leibler divergence that are structured and non-decomposable. Our goal is to address tasks such as label-imbalanced learning and quantification. Our techniques present a departure from standard deep learning techniques that typically use squared or cross-entropy loss functions (that are decomposable) to train neural networks. We demonstrate that directly training with task-specific loss functions yields faster and more stable convergence across problems and datasets. Our proposed algorithms and implementations offer several advantages including (i) the use of fewer training samples to achieve a desired level of convergence, (ii) a substantial reduction in training time, (iii) a seamless integration of our implementation into existing symbolic gradient frameworks, and (iv) assurance of convergence to first order stationary points. It is noteworthy that the algorithms achieve this, especially point (iv), despite being asked to optimize complex objective functions. We implement our techniques on a variety of deep architectures including multi-layer perceptrons and recurrent neural networks and show that on a variety of benchmark and real data sets, our algorithms outperform traditional approaches to training deep networks, as well as popular techniques used to handle label imbalance.",
keywords = "Deep learning, F-measure, Optimization, Task-specific training",
author = "Amartya Sanyal and Pawan Kumar and Purushottam Kar and Sanjay Chawla and Fabrizio Sebastiani",
year = "2018",
month = "7",
day = "2",
doi = "10.1007/s10994-018-5736-y",
language = "English",
pages = "1--24",
journal = "Machine Learning",
issn = "0885-6125",
publisher = "Springer Netherlands",

}
