Sciweavers

ICASSP 2008, IEEE

Caption-aided speech detection in videos

Download www.ee.tsinghua.edu.cn

This paper presents a novel audio-visual fusion method for speech detection, an important front-end for content-based video processing. The approach extracts homogeneous speech segments from the audio stream of real-world movie and TV videos with the help of video captions. Captions are created mainly to help viewers follow the dialog rather than to locate speech regions accurately, so their timing is only approximate. The proposed caption-aided approach therefore combines caption information with audio information: the inaccurate caption positions are refined using audio features (pitch and MFCCs) and BIC-based acoustic change detection. Comparison experiments against several traditional speech detection approaches show that the proposed approach greatly improves speech detection performance.
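The abstract does not spell out the refinement procedure, so the following is only a minimal sketch of the general idea of BIC-based change detection, not the authors' implementation. It assumes a single scalar feature per audio frame (for example one pitch or MFCC value), models each segment as a 1-D Gaussian, and searches near a caption boundary for the split with the largest positive ΔBIC. The names `delta_bic` and `refine_boundary`, and the parameters `search`, `margin`, and `lam`, are hypothetical and chosen for illustration.

```python
import math


def _log_var(xs, floor=1e-8):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return math.log(max(var, floor))


def delta_bic(xs, t, lam=1.0):
    """Delta-BIC for splitting the window xs at frame t (1-D Gaussian models).

    A positive value favors two separate Gaussians, i.e. an acoustic
    change at t. lam is the usual penalty weight (assumed 1.0 here).
    """
    n = len(xs)
    left, right = xs[:t], xs[t:]
    gain = 0.5 * (
        n * _log_var(xs)
        - len(left) * _log_var(left)
        - len(right) * _log_var(right)
    )
    # For dimension d = 1 the extra model has d + d(d+1)/2 = 2 parameters.
    penalty = lam * 0.5 * 2 * math.log(n)
    return gain - penalty


def refine_boundary(xs, caption_idx, search=20, margin=5, lam=1.0):
    """Move a caption boundary to the nearby frame with the largest
    positive delta-BIC; keep caption_idx if no candidate scores above 0."""
    lo = max(margin, caption_idx - search)
    hi = min(len(xs) - margin, caption_idx + search)
    best_t, best_score = caption_idx, 0.0
    for t in range(lo, hi + 1):
        score = delta_bic(xs, t, lam)
        if score > best_score:
            best_t, best_score = t, score
    return best_t


if __name__ == "__main__":
    # Toy feature track: the statistics change at frame 60, while the
    # (inaccurate) caption boundary is assumed to sit at frame 50.
    track = [0.0, 1.0] * 30 + [10.0, 11.0] * 30
    print(refine_boundary(track, caption_idx=50))
```

A practical system would likely use multivariate features with full covariance matrices and a sliding analysis window; the scalar version above only keeps the arithmetic easy to follow.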

Cong Li, Zhijian Ou, Wei Hu, Tao Wang, Yimin Zhang

ICASSP 2008 | Signal Processing | Speech Detection | Speech Detection Approach | Speech Detection Approaches

Related Content

» Video retrieval using speech and image information

» Generating a Time Shrunk Lecture Video by Event Detection

» Speech/Non-Speech Detection in Meetings from Automatically Extracted Low Resolution Visual F...

» Prediction-Based Gesture Detection in Lecture Videos by Combining Visual Speech and Electro...

» Automatic Speech Activity Detection Source Localization and Speech Recognition on the Chil...

» Assembling personal speech collections by monologue scene detection from a news video arch...

» Extraction of Audio Features Specific to Speech Production for Multimodal Speaker Detectio...

» Exploiting Speech Recognition Transcripts for Narrative Peak Detection in Short-Form Docume...

» An audiovideo analysis mechanism for web indexing

Post Info

Added	30 May 2010
Updated	30 May 2010
Type	Conference
Year	2008
Where	ICASSP
Authors	Cong Li, Zhijian Ou, Wei Hu, Tao Wang, Yimin Zhang
