Combined gesture-speech analysis and speech driven gesture synthesis

M. E. Sargin, O. Aran, A. Karpov, Ferda Ofli, Y. Yasinnik, S. Wilson, E. Erzin, Y. Yemez, A. M. Tekalp

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Citations (Scopus)

Abstract

Multimodal speech and speaker modeling and recognition are widely accepted as vital aspects of state-of-the-art human-machine interaction systems. While correlations between speech and lip motion, as well as between speech and facial expressions, have been widely studied, relatively little work has investigated the correlations between speech and gesture. Detection and modeling of the head, hand, and arm gestures of a speaker have been studied extensively, and these gestures have been shown to carry linguistic information; a typical example is the head gesture that accompanies saying "yes" or "no". In this study, the correlation between gestures and speech is investigated. In speech signal analysis, keyword spotting and prosodic accent event detection are performed. In gesture analysis, hand positions and the parameters of global head motion are used as features. Gesture detection is based on discrete, predesignated symbol sets, which are manually labeled during the training phase. The gesture-speech correlation is modeled by examining co-occurring speech and gesture patterns. This correlation can be used to fuse the gesture and speech modalities in edutainment applications (e.g., video games, 3-D animations) in which the natural gestures of talking avatars are animated from speech. A speech-driven gesture animation example has been implemented for demonstration.
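The correlation model described above pairs discrete speech events (spotted keywords, prosodic accents) with manually labeled gesture symbols and examines their co-occurrence. A minimal Python sketch of such co-occurrence counting follows; it illustrates the general technique only, not the authors' implementation, and all event and gesture names are hypothetical.

    # Hedged sketch: co-occurrence modeling of discrete speech events and
    # gesture symbols, followed by a frequency-based speech-to-gesture
    # mapping. Illustrative only; symbol names are invented.
    from collections import Counter, defaultdict

    def build_cooccurrence(aligned_pairs):
        """Tally how often each (speech event, gesture symbol) pair
        co-occurs in time-aligned, manually labeled training data."""
        counts = defaultdict(Counter)
        for speech_event, gesture_symbol in aligned_pairs:
            counts[speech_event][gesture_symbol] += 1
        return counts

    def most_likely_gesture(counts, speech_event):
        """Pick the gesture symbol that most frequently co-occurs with a
        detected speech event, to drive a simple gesture animation."""
        if speech_event in counts and counts[speech_event]:
            return counts[speech_event].most_common(1)[0][0]
        return None

    # Hypothetical aligned training pairs: a spotted keyword or a prosodic
    # accent paired with the gesture label observed in the same window.
    training = [
        ("kw:yes", "head_nod"), ("kw:yes", "head_nod"),
        ("kw:no", "head_shake"),
        ("accent", "hand_raise"), ("accent", "head_nod"),
    ]
    model = build_cooccurrence(training)
    print(most_likely_gesture(model, "kw:yes"))  # -> head_nod

In the paper itself, the speech side comes from keyword spotting and prosodic accent detection and the gesture side from hand positions and global head motion parameters; the sketch abstracts both to pre-labeled symbols.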

Original language: English
Title of host publication: 2006 IEEE International Conference on Multimedia and Expo, ICME 2006 - Proceedings
Pages: 893-896
Number of pages: 4
Volume: 2006
DOI: 10.1109/ICME.2006.262663
ISBN (Print): 1424403677, 9781424403677
Publication status: Published - 1 Dec 2006
Externally published: Yes
Event: 2006 IEEE International Conference on Multimedia and Expo, ICME 2006 - Toronto, ON, Canada
Duration: 9 Jul 2006 - 12 Jul 2006

ASJC Scopus subject areas

  • Media Technology
  • Electrical and Electronic Engineering

Cite this

Sargin, M. E., Aran, O., Karpov, A., Ofli, F., Yasinnik, Y., Wilson, S., ... Tekalp, A. M. (2006). Combined gesture-speech analysis and speech driven gesture synthesis. In 2006 IEEE International Conference on Multimedia and Expo, ICME 2006 - Proceedings (Vol. 2006, pp. 893-896). [4036744] https://doi.org/10.1109/ICME.2006.262663
