Mandatory Fields

Authors

Haithem Afli, Pintu Lohar and Andy Way

Conference Title

IJCNLP 2017 Workshop on Curation and Applications of Parallel and Comparable Corpora (Cupral 2017)

Title of Paper

MultiNews: A Web collection of an Aligned Multimodal and Multilingual Corpus

Year

2017

Month

November

Status

Published

Peer Reviewed

Times Cited


Optional Fields

Search Keyword

Editors

Start Page

End Page

Location

Taipei, Taiwan

Start Date

27-NOV-17

End Date

01-DEC-17

Abstract

Integrating Natural Language Processing (NLP) and computer vision is a promising effort. However, the applicability of these methods depends directly on the availability of specific multimodal data that includes images and texts. In this paper, we present a collection of a multimodal corpus of comparable documents and their images in 9 languages from web news articles on the Euronews website. This corpus has found widespread use in the NLP community in multilingual and multimodal tasks. Here, we focus on the acquisition of its image and text data and their multilingual alignment.

Funded By

URL

http://aclweb.org/anthology/W17-5602

DOI Link

Grant Details

Funding Body

Science Foundation Ireland (SFI)

Grant Details

13/RC/2106