Hierarchical video content description and summarization using unified semantic and visual similarity

Xingquan Zhu, Jianping Fan, Ahmed Elmagarmid, Xindong Wu

Research output: Contribution to journal › Article

52 Citations (Scopus)

Abstract

Video is increasingly the medium of choice for a variety of communication channels, resulting primarily from increased levels of networked multimedia systems. One way to keep our heads above the video sea is to provide summaries in a more tractable format. Many existing approaches are limited to exploring important low-level feature-related units for summarization. Unfortunately, the semantics, content and structure of the video do not correspond to low-level features directly, even with closed-captions, scene detection, and audio signal processing. The drawbacks of existing methods are the following: (1) instead of unfolding semantics and structures within the video, low-level units usually address only the details, and (2) any important unit selection strategy based on low-level features cannot be applied to general videos. Providing users with an overview of the video content at various levels of summarization is essential for more efficient database retrieval and browsing. In this paper, we present a hierarchical video content description and summarization strategy supported by a novel joint semantic and visual similarity strategy. To describe the video content efficiently and accurately, a video content description ontology is adopted. Various video processing techniques are then utilized to construct a semi-automatic video annotation framework. By integrating acquired content description data, a hierarchical video content structure is constructed with group merging and clustering. Finally, a four-layer video summary with different granularities is assembled to assist users in unfolding the video content in a progressive way. Experiments on real-world videos have validated the effectiveness of the proposed approach.
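The abstract describes group merging and clustering driven by a unified semantic and visual similarity. The sketch below is only a hedged illustration of that general idea, not the authors' algorithm: it assumes keyword sets from semi-automatic annotation as the semantic feature, normalized color histograms as the visual feature, and a simple weighted combination with an assumed weight alpha and merge threshold. ShotGroup, unified_similarity, merge_groups, and all parameter values are hypothetical names and settings introduced here for illustration.

from dataclasses import dataclass
from typing import List, Set


@dataclass
class ShotGroup:
    """A group of temporally adjacent shots plus its annotation keywords."""
    shot_ids: List[int]
    keywords: Set[str]       # semantic labels from (semi-automatic) annotation
    histogram: List[float]   # e.g., an averaged, normalized color histogram


def semantic_similarity(a: ShotGroup, b: ShotGroup) -> float:
    # Jaccard overlap of annotation keywords: one possible semantic measure.
    if not a.keywords or not b.keywords:
        return 0.0
    return len(a.keywords & b.keywords) / len(a.keywords | b.keywords)


def visual_similarity(a: ShotGroup, b: ShotGroup) -> float:
    # Histogram intersection: one possible low-level visual measure.
    return sum(min(x, y) for x, y in zip(a.histogram, b.histogram))


def unified_similarity(a: ShotGroup, b: ShotGroup, alpha: float = 0.5) -> float:
    # Weighted combination; the weight alpha is an assumption, not the paper's value.
    return alpha * semantic_similarity(a, b) + (1.0 - alpha) * visual_similarity(a, b)


def merge_groups(groups: List[ShotGroup], threshold: float = 0.6) -> List[ShotGroup]:
    # Greedy agglomerative merging: repeatedly merge the most similar pair
    # until no pair exceeds the similarity threshold.
    groups = list(groups)
    while len(groups) > 1:
        pairs = [(i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))]
        i, j = max(pairs, key=lambda ij: unified_similarity(groups[ij[0]], groups[ij[1]]))
        if unified_similarity(groups[i], groups[j]) < threshold:
            break
        a, b = groups[i], groups[j]
        merged = ShotGroup(
            shot_ids=a.shot_ids + b.shot_ids,
            keywords=a.keywords | b.keywords,
            histogram=[(x + y) / 2.0 for x, y in zip(a.histogram, b.histogram)],
        )
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups


if __name__ == "__main__":
    g1 = ShotGroup([1, 2], {"presenter", "dialog"}, [0.40, 0.30, 0.30])
    g2 = ShotGroup([3], {"dialog", "indoor"}, [0.35, 0.35, 0.30])
    g3 = ShotGroup([4, 5], {"outdoor"}, [0.10, 0.10, 0.80])
    print([g.shot_ids for g in merge_groups([g1, g2, g3])])
    # g1 and g2 are similar both semantically and visually, so they are merged;
    # g3 remains a separate group.

In the paper's setting, groups produced this way would then feed the higher layers of the hierarchical, multi-granularity summary; the quadratic pair search above is chosen only for brevity.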

Original language: English
Pages (from-to): 31-53
Number of pages: 23
Journal: Multimedia Systems
Volume: 9
Issue number: 1
Publication status: Published - 1 Jul 2003
Externally published: Yes

Keywords

  • Content description
  • Hierarchical video summarization
  • Semi-automatic video annotation
  • Video grouping

ASJC Scopus subject areas

  • Information Systems
  • Theoretical Computer Science
  • Computational Theory and Mathematics

Cite this

Hierarchical video content description and summarization using unified semantic and visual similarity. / Zhu, Xingquan; Fan, Jianping; Elmagarmid, Ahmed; Wu, Xindong.

In: Multimedia Systems, Vol. 9, No. 1, 01.07.2003, p. 31-53.

Research output: Contribution to journal › Article

@article{1ca5e064b0a04688bce57b09b79d5c59,
title = "Hierarchical video content description and summarization using unified semantic and visual similarity",
abstract = "Video is increasingly the medium of choice for a variety of communication channels, resulting primarily from increased levels of networked multimedia systems. One way to keep our heads above the video sea is to provide summaries in a more tractable format. Many existing approaches are limited to exploring important low-level feature related units for summarization. Unfortunately, the semantics, content and structure of the video do not correspond to low-level features directly, even with closed-captions, scene detection, and audio signal processing. The drawbacks of existing methods are the following: (1) instead of unfolding semantics and structures within the video, low-level units usually address only the details, and (2) any important unit selection strategy based on low-level features cannot be applied to general videos. Providing users with an overview of the video content at various levels of summarization is essential for more efficient database retrieval and browsing. In this paper, we present a hierarchical video content description and summarization strategy supported by a novel joint semantic and visual similarity strategy. To describe the video content efficiently and accurately, a video content description ontology is adopted. Various video processing techniques are then utilized to construct a semi-automatic video annotation framework. By integrating acquired content description data, a hierarchical video content structure is constructed with group merging and clustering. Finally, a four layer video summary with different granularities is assembled to assist users in unfolding the video content in a progressive way. Experiments on real-word videos have validated the effectiveness of the proposed approach.",
keywords = "Content description, Hierarchical video summarization, Semi-automatic video annotation, Video grouping",
author = "Xingquan Zhu and Jianping Fan and Ahmed Elmagarmid and Xindong Wu",
year = "2003",
month = "7",
day = "1",
language = "English",
volume = "9",
pages = "31--53",
journal = "Multimedia Systems",
issn = "0942-4962",
publisher = "Springer Verlag",
number = "1",

}

TY - JOUR

T1 - Hierarchical video content description and summarization using unified semantic and visual similarity

AU - Zhu, Xingquan

AU - Fan, Jianping

AU - Elmagarmid, Ahmed

AU - Wu, Xindong

PY - 2003/7/1

Y1 - 2003/7/1

AB - Video is increasingly the medium of choice for a variety of communication channels, resulting primarily from increased levels of networked multimedia systems. One way to keep our heads above the video sea is to provide summaries in a more tractable format. Many existing approaches are limited to exploring important low-level feature-related units for summarization. Unfortunately, the semantics, content and structure of the video do not correspond to low-level features directly, even with closed-captions, scene detection, and audio signal processing. The drawbacks of existing methods are the following: (1) instead of unfolding semantics and structures within the video, low-level units usually address only the details, and (2) any important unit selection strategy based on low-level features cannot be applied to general videos. Providing users with an overview of the video content at various levels of summarization is essential for more efficient database retrieval and browsing. In this paper, we present a hierarchical video content description and summarization strategy supported by a novel joint semantic and visual similarity strategy. To describe the video content efficiently and accurately, a video content description ontology is adopted. Various video processing techniques are then utilized to construct a semi-automatic video annotation framework. By integrating acquired content description data, a hierarchical video content structure is constructed with group merging and clustering. Finally, a four-layer video summary with different granularities is assembled to assist users in unfolding the video content in a progressive way. Experiments on real-world videos have validated the effectiveness of the proposed approach.

KW - Content description

KW - Hierarchical video summarization

KW - Semi-automatic video annotation

KW - Video grouping

UR - http://www.scopus.com/inward/record.url?scp=1442333048&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1442333048&partnerID=8YFLogxK

M3 - Article

VL - 9

SP - 31

EP - 53

JO - Multimedia Systems

JF - Multimedia Systems

SN - 0942-4962

IS - 1

ER -