Conference Publication Details
Mandatory Fields
Haithem Afli, Feiyan Hu, Jinhua Du, Daniel Cosgrove, Kevin McGuinness, Noel E. O'Connor, Eric Arazo Sanchez, Jiang Zhou, Alan F. Smeaton.
TRECVID2017 - TREC VIDEO RETRIEVAL EVALUATION
Dublin City University Participation in the VTT Track at TRECVid 2017
2017
November
Published
1
Optional Fields
Dublin City University participated in the video-to-text (VTT) caption generation task at TRECVid 2017, and this paper describes the three approaches we took for our four submitted runs. The first approach extracts regularly spaced keyframes from a video, generates a text caption for each keyframe, and then combines the keyframe captions into a single caption. The second approach uses a saliency map to detect image crops in those keyframes that include as much of the visually salient part of the image as possible, generates a caption for each crop in each keyframe, and combines the captions into one. The third approach is an end-to-end deep learning system based on MS-COCO, an externally available set of training captions. The paper describes each approach and presents its official results.
https://www-nlpir.nist.gov/projects/tvpubs/tv17.papers/dcuinsight.pdf
Grant Details