Berkeley MHAD

A comprehensive Multimodal Human Action Database

Ferda Ofli, Rizwan Chaudhry, Gregorij Kurillo, Rene Vidal, Ruzena Bajcsy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

190 Citations (Scopus)

Abstract

Over the years, a large number of methods have been proposed to analyze human pose and motion information from images, videos, and, more recently, depth data. Most methods, however, have been evaluated on datasets that were too specific to each application, limited to a particular modality, and, more importantly, captured under unknown conditions. To address these issues, we introduce the Berkeley Multimodal Human Action Database (MHAD), consisting of temporally synchronized and geometrically calibrated data from an optical motion capture system, multi-baseline stereo cameras from multiple views, depth sensors, accelerometers, and microphones. This controlled multimodal dataset provides researchers with an inclusive testbed for developing and benchmarking new algorithms across multiple modalities under known capture conditions in various research domains. To demonstrate a possible use of MHAD for action recognition, we compare the results of the popular Bag-of-Words algorithm adapted to each modality independently with the results of various combinations of modalities fused via Multiple Kernel Learning. Our comparison shows that multimodal analysis of human motion yields better action recognition rates than unimodal analysis.
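
For illustration, below is a minimal Python sketch of the two ideas the abstract names: a Bag-of-Words histogram built independently per modality, and a weighted sum of per-modality kernels feeding an SVM. This is not the authors' implementation; in particular, true Multiple Kernel Learning would learn the kernel weights, whereas this sketch fixes them, and the data variables (depth_descs, accel_descs, labels) are hypothetical placeholders.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def bow_histograms(descriptor_sets, n_words=100, seed=0):
    # Quantize per-sequence local descriptors into Bag-of-Words histograms.
    # descriptor_sets: list of (n_i, d) arrays, one per action sequence.
    codebook = KMeans(n_clusters=n_words, random_state=seed)
    codebook.fit(np.vstack(descriptor_sets))      # learn the codebook over all descriptors
    hists = []
    for desc in descriptor_sets:
        words = codebook.predict(desc)            # assign each descriptor to a word
        h = np.bincount(words, minlength=n_words).astype(float)
        hists.append(h / max(h.sum(), 1.0))       # L1-normalize the histogram
    return np.array(hists)

def combined_kernel(per_modality_features, weights):
    # Fixed-weight sum of per-modality RBF kernels; a simple stand-in for MKL,
    # which would learn these weights from data instead.
    return sum(w * rbf_kernel(X) for X, w in zip(per_modality_features, weights))

# Hypothetical usage: depth_descs and accel_descs hold local descriptors per
# sequence for two modalities; labels holds the action classes.
# H_depth = bow_histograms(depth_descs)
# H_accel = bow_histograms(accel_descs)
# K = combined_kernel([H_depth, H_accel], weights=[0.5, 0.5])
# clf = SVC(kernel="precomputed").fit(K, labels)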

Original language: English
Title of host publication: Proceedings of IEEE Workshop on Applications of Computer Vision
Pages: 53-60
Number of pages: 8
DOI: 10.1109/WACV.2013.6474999
ISBN: 9781467350532
Publication status: Published - 4 Apr 2013
Externally published: Yes
Event: 2013 IEEE Workshop on Applications of Computer Vision, WACV 2013 - Clearwater Beach, FL, United States
Duration: 15 Jan 2013 - 17 Jan 2013

Fingerprint

  • Microphones
  • Testbeds
  • Accelerometers
  • Cameras
  • Sensors

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Ofli, F., Chaudhry, R., Kurillo, G., Vidal, R., & Bajcsy, R. (2013). Berkeley MHAD: A comprehensive Multimodal Human Action Database. In Proceedings of IEEE Workshop on Applications of Computer Vision (pp. 53-60). [6474999] https://doi.org/10.1109/WACV.2013.6474999

@inproceedings{fe83f16c2b984442a17a445185417461,
  title     = "Berkeley MHAD: A comprehensive Multimodal Human Action Database",
  author    = "Ferda Ofli and Rizwan Chaudhry and Gregorij Kurillo and Rene Vidal and Ruzena Bajcsy",
  booktitle = "Proceedings of IEEE Workshop on Applications of Computer Vision",
  year      = "2013",
  month     = "4",
  day       = "4",
  pages     = "53--60",
  doi       = "10.1109/WACV.2013.6474999",
  isbn      = "9781467350532",
  language  = "English",
}
