Paper Details

【Title】Complex event detection via attention-based video representation and classification

【Authors】Zhicheng Zhao, Rui Xiang, Fei Su

【Keywords】Multimedia event detection, Visual attention, Salient object, VLAD

【Journal/Conference】
    Multimedia Tools and Applications

【Abstract】
     As an important task in managing unconstrained web videos, multimedia event detection (MED) has attracted wide attention recently. However, MED remains challenging because events are highly abstract, scenes vary widely, and individuals interact frequently. In this paper, we propose a novel MED algorithm based on attention-based video representation and classification. First, inspired by the human selective attention mechanism, an attention-based saliency localization network (ASLN) is constructed to quickly predict the semantically salient objects in video frames. Second, to represent the salient objects and their surroundings in a complementary way, two Convolutional Neural Network (CNN) features, i.e., a local saliency feature and a global feature, are extracted from the salient objects and the whole feature map, respectively. Third, after the two features are bound together, the Vector of Locally Aggregated Descriptors (VLAD) is applied to encode them into the video representation. Finally, linear Support Vector Machine (SVM) classifiers are trained for event classification. We extensively evaluate performance on the TRECVID MED14_10Ex, MED14_100Ex, and Columbia Consumer Video (CCV) datasets. Experimental results show that the proposed single model outperforms state-of-the-art approaches on all three real-world video datasets, demonstrating its effectiveness.
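
     The feature binding and VLAD encoding steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "binding" means per-frame concatenation, that the codebook centers are already available (here they are simply sampled from the frames), and that the common signed square-root plus L2 normalization is applied. The random arrays stand in for CNN features, and the names `bind_features` and `vlad_encode` are hypothetical.

```python
import numpy as np


def bind_features(local_feats, global_feats):
    """Concatenate per-frame local saliency and global features (assumed binding)."""
    return np.concatenate([local_feats, global_feats], axis=1)


def vlad_encode(descriptors, centers):
    """Encode descriptors of shape (n, d) against k centers of shape (k, d)."""
    # Assign every descriptor to its nearest center (squared Euclidean distance).
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)

    # Accumulate residuals (descriptor minus its center) for each center.
    k, d = centers.shape
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members) > 0:
            vlad[i] = (members - centers[i]).sum(axis=0)

    # Signed square-root normalization, then global L2 normalization
    # (a common VLAD variant, assumed here rather than taken from the paper).
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad


rng = np.random.default_rng(0)
local_feats = rng.normal(size=(30, 8))   # placeholder for salient-object CNN features
global_feats = rng.normal(size=(30, 8))  # placeholder for whole-feature-map CNN features
frames = bind_features(local_feats, global_feats)        # shape (30, 16)
centers = frames[rng.choice(30, size=4, replace=False)]  # stand-in for a learned codebook
video_repr = vlad_encode(frames, centers)
print(video_repr.shape)
```

     With 4 centers and 16-dimensional bound features, the video representation has 4 x 16 = 64 dimensions. In the pipeline the abstract outlines, one such vector per video would then be passed to a linear SVM classifier for each event class.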

【Year】2017

【Month】8

【Category】Computer Vision

