Mandatory Fields

Authors

Haithem Afli, Feiyan Hu, Jinhua Du, Daniel Cosgrove, Kevin McGuinness, Noel E. O'Connor, Eric Arazo Sanchez, Jiang Zhou, Alan F. Smeaton.

Conference Title

TRECVID2017 - TREC VIDEO RETRIEVAL EVALUATION

Title of Paper

Dublin City University Participation in the VTT Track at TRECVid 2017

Year

2017

Month

November

Status

Published

Peer Reviewed

Optional Fields

Abstract

Dublin City University participated in the video-to-text caption generation task in TRECVid, and this paper describes the three approaches we took for our four submitted runs. The first approach extracts regularly spaced keyframes from a video, generates a text caption for each keyframe, and then combines the keyframe captions into a single caption. The second approach detects image crops within those keyframes using a saliency map, so that each crop includes as much of the salient part of the image as possible, then generates a caption for each crop in each keyframe and combines the captions into one. The third approach is an end-to-end system, a true deep learning submission based on MS-COCO, an externally available set of training captions. The paper describes each approach and presents its official results.

URL

https://www-nlpir.nist.gov/projects/tvpubs/tv17.papers/dcuinsight.pdf
