Abstract
Data sparsity is a common problem for machine translation of minority and less-resourced languages. While collecting standard, grammatical text can be challenging enough, collecting parallel user-generated content can be even more challenging. In this paper, we describe an approach to collecting English↔Irish translations of user-generated content (tweets) that overcomes some of these hurdles. We
show how a crowd-sourced data collection campaign, which was tailored to our target
audience (the Irish language community), proved successful in gathering data for a niche
domain. We also discuss the reliability of crowdsourced English↔Irish tweet translations in terms of quality, reporting on a self-rating approach alongside ratings from qualified reviewers.