Non-negative factor analysis of Gaussian mixture model weight adaptation for language and dialect recognition

Mohamad Hasan Bahari, Najim Dehak, Hugo Van Hamme, Lukas Burget, Ahmed Ali, Jim Glass

Research output: Contribution to journal › Article

18 Citations (Scopus)

Abstract

Recent studies show that Gaussian mixture model (GMM) weights carry less, yet complementary, information to GMM means for language and dialect recognition. However, state-of-the-art language recognition systems usually do not use this information. In this research, a non-negative factor analysis (NFA) approach is developed for GMM weight decomposition and adaptation. This modeling, which is conceptually simple and computationally inexpensive, suggests a new low-dimensional utterance representation method using a factor analysis similar to that of the i-vector framework. The obtained subspace vectors are then applied in conjunction with i-vectors to the language/dialect recognition problem. The suggested approach is evaluated on the NIST 2011 and RATS language recognition evaluation (LRE) corpora and on the QCRI Arabic dialect recognition evaluation (DRE) corpus. The assessment results show that the proposed adaptation method yields more accurate recognition results compared to three conventional weight adaptation approaches, namely maximum likelihood re-estimation, non-negative matrix factorization, and a subspace multinomial model. Experimental results also show that the intermediate-level fusion of i-vectors and NFA subspace vectors improves the performance of the state-of-the-art i-vector framework, especially for the case of short utterances.
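As an informal illustration of the weight-adaptation idea summarized above, the sketch below estimates an utterance-level subspace vector r under a model of the form w = b + L r (UBM weights b plus a low-rank correction) by maximizing a multinomial log-likelihood over per-component posterior counts. This is a hypothetical reconstruction for intuition only, not the authors' algorithm: the scipy-based solver, all variable names, and the toy data are assumptions, and non-negativity of the adapted weights is only approximated by clipping rather than enforced as in the paper.

    import numpy as np
    from scipy.optimize import minimize

    def estimate_nfa_vector(counts, b, L):
        """Sketch only: estimate r in the weight model  w = b + L @ r
        by maximizing the multinomial log-likelihood of the per-component
        posterior counts of one utterance.

        counts : (C,) zeroth-order Baum-Welch statistics for the utterance
        b      : (C,) UBM weights (sum to one)
        L      : (C, R) subspace matrix whose columns sum to zero, so that
                 b + L @ r still sums to one for any r
        Note: a full implementation would also enforce w >= 0 explicitly;
        here negative values are merely clipped inside the objective.
        """
        def neg_loglik(r):
            w = b + L @ r
            w = np.clip(w, 1e-10, None)      # guard against log of non-positive weights
            return -np.sum(counts * np.log(w))

        res = minimize(neg_loglik, x0=np.zeros(L.shape[1]), method="L-BFGS-B")
        return res.x

    # Toy usage with random numbers (purely illustrative)
    rng = np.random.default_rng(0)
    C, R = 64, 8                              # mixture components, subspace rank
    b = rng.dirichlet(np.ones(C))             # UBM weights
    L = rng.normal(scale=1e-3, size=(C, R))
    L -= L.mean(axis=0)                       # make each column sum to zero
    counts = rng.integers(0, 50, size=C).astype(float)
    r = estimate_nfa_vector(counts, b, L)     # low-dimensional utterance representation

The resulting vector r plays a role analogous to the i-vector for the mean supervector: a compact, utterance-level representation that can be fed to a back-end classifier or fused with i-vectors, as the abstract describes.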

Original language: English
Article number: 2319159
Pages (from-to): 1117-1129
Number of pages: 13
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 22
Issue number: 7
DOIs: 10.1109/TASLP.2014.2319159
Publication status: Published - 1 Jul 2014

Keywords

  • Dialect recognition
  • Gaussian mixture model weight
  • Language recognition
  • Model adaptation
  • Non-negative factor analysis

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

Non-negative factor analysis of Gaussian mixture model weight adaptation for language and dialect recognition. / Bahari, Mohamad Hasan; Dehak, Najim; Van Hamme, Hugo; Burget, Lukas; Ali, Ahmed; Glass, Jim.

In: IEEE Transactions on Audio, Speech and Language Processing, Vol. 22, No. 7, 2319159, 01.07.2014, p. 1117-1129.

Research output: Contribution to journal › Article

Bahari, Mohamad Hasan ; Dehak, Najim ; Van Hamme, Hugo ; Burget, Lukas ; Ali, Ahmed ; Glass, Jim. / Non-negative factor analysis of Gaussian mixture model weight adaptation for language and dialect recognition. In: IEEE Transactions on Audio, Speech and Language Processing. 2014 ; Vol. 22, No. 7. pp. 1117-1129.
@article{48f48d07486642478e51c38ef18797a0,
title = "Non-negative factor analysis of Gaussian mixture model weight adaptation for language and dialect recognition",
abstract = "Recent studies show that Gaussian mixture model (GMM) weights carry less, yet complementary, information to GMM means for language and dialect recognition. However, state-of-the-art language recognition systems usually do not use this information. In this research, a non-negative factor analysis (NFA) approach is developed for GMM weight decomposition and adaptation. This modeling, which is conceptually simple and computationally inexpensive, suggests a new low-dimensional utterance representation method using a factor analysis similar to that of the i-vector framework. The obtained subspace vectors are then applied in conjunction with i-vectors to the language/dialect recognition problem. The suggested approach is evaluated on the NIST 2011 and RATS language recognition evaluation (LRE) corpora and on the QCRI Arabic dialect recognition evaluation (DRE) corpus. The assessment results show that the proposed adaptation method yields more accurate recognition results compared to three conventional weight adaptation approaches, namely maximum likelihood re-estimation, non-negative matrix factorization, and a subspace multinomial model. Experimental results also show that the intermediate-level fusion of i-vectors and NFA subspace vectors improves the performance of the state-of-the-art i-vector framework, especially for the case of short utterances.",
keywords = "Dialect recognition, Gaussian mixture model weight, Language recognition, Model adaptation, Non-negative factor analysis",
author = "Bahari, {Mohamad Hasan} and Najim Dehak and {Van Hamme}, Hugo and Lukas Burget and Ahmed Ali and Jim Glass",
year = "2014",
month = "7",
day = "1",
doi = "10.1109/TASLP.2014.2319159",
language = "English",
volume = "22",
pages = "1117--1129",
journal = "IEEE Transactions on Audio, Speech and Language Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "7",

}

TY - JOUR

T1 - Non-negative factor analysis of Gaussian mixture model weight adaptation for language and dialect recognition

AU - Bahari, Mohamad Hasan

AU - Dehak, Najim

AU - Van Hamme, Hugo

AU - Burget, Lukas

AU - Ali, Ahmed

AU - Glass, Jim

PY - 2014/7/1

Y1 - 2014/7/1

N2 - Recent studies show that Gaussian mixture model (GMM) weights carry less, yet complementary, information to GMM means for language and dialect recognition. However, state-of-the-art language recognition systems usually do not use this information. In this research, a non-negative factor analysis (NFA) approach is developed for GMM weight decomposition and adaptation. This modeling, which is conceptually simple and computationally inexpensive, suggests a new low-dimensional utterance representation method using a factor analysis similar to that of the i-vector framework. The obtained subspace vectors are then applied in conjunction with i-vectors to the language/dialect recognition problem. The suggested approach is evaluated on the NIST 2011 and RATS language recognition evaluation (LRE) corpora and on the QCRI Arabic dialect recognition evaluation (DRE) corpus. The assessment results show that the proposed adaptation method yields more accurate recognition results compared to three conventional weight adaptation approaches, namely maximum likelihood re-estimation, non-negative matrix factorization, and a subspace multinomial model. Experimental results also show that the intermediate-level fusion of i-vectors and NFA subspace vectors improves the performance of the state-of-the-art i-vector framework, especially for the case of short utterances.

AB - Recent studies show that Gaussian mixture model (GMM) weights carry less, yet complementary, information to GMM means for language and dialect recognition. However, state-of-the-art language recognition systems usually do not use this information. In this research, a non-negative factor analysis (NFA) approach is developed for GMM weight decomposition and adaptation. This modeling, which is conceptually simple and computationally inexpensive, suggests a new low-dimensional utterance representation method using a factor analysis similar to that of the i-vector framework. The obtained subspace vectors are then applied in conjunction with i-vectors to the language/dialect recognition problem. The suggested approach is evaluated on the NIST 2011 and RATS language recognition evaluation (LRE) corpora and on the QCRI Arabic dialect recognition evaluation (DRE) corpus. The assessment results show that the proposed adaptation method yields more accurate recognition results compared to three conventional weight adaptation approaches, namely maximum likelihood re-estimation, non-negative matrix factorization, and a subspace multinomial model. Experimental results also show that the intermediate-level fusion of i-vectors and NFA subspace vectors improves the performance of the state-of-the-art i-vector framework, especially for the case of short utterances.

KW - Dialect recognition

KW - Gaussian mixture model weight

KW - Language recognition

KW - Model adaptation

KW - Non-negative factor analysis

UR - http://www.scopus.com/inward/record.url?scp=84904156635&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84904156635&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2014.2319159

DO - 10.1109/TASLP.2014.2319159

M3 - Article

VL - 22

SP - 1117

EP - 1129

JO - IEEE Transactions on Audio, Speech and Language Processing

JF - IEEE Transactions on Audio, Speech and Language Processing

SN - 1558-7916

IS - 7

M1 - 2319159

ER -