In translation, considering the document
as a whole can help to resolve ambiguities
and inconsistencies. In this paper, we propose
a cross-sentence context-aware approach
and investigate the influence of historical
contextual information on the performance
of neural machine translation
(NMT). First, this history is summarized
in a hierarchical way. We then integrate
the historical representation into NMT via
two strategies: 1) a warm-start of encoder
and decoder states, and 2) an auxiliary
context source for updating decoder
states. Experimental results on a large
Chinese-English translation task show that
our approach significantly improves upon
a strong attention-based NMT system by
up to +2.1 BLEU points.
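
The hierarchical summarization and the warm-start strategy described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual model: a simple tanh RNN stands in for the recurrent units, and all dimensions, parameter matrices, and the toy history are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (illustrative)

def rnn_summarize(inputs, W, U, b):
    """Run a simple tanh RNN over a sequence and return the final state."""
    h = np.zeros(d)
    for x in inputs:
        h = np.tanh(W @ x + U @ h + b)
    return h

# Hypothetical parameters for the word-level and sentence-level RNNs.
Ww, Uw, bw = rng.standard_normal((d, d)), rng.standard_normal((d, d)), np.zeros(d)
Ws, Us, bs = rng.standard_normal((d, d)), rng.standard_normal((d, d)), np.zeros(d)

# Toy history: 3 previous sentences, each a list of 5 word embeddings.
history = [[rng.standard_normal(d) for _ in range(5)] for _ in range(3)]

# Level 1: summarize each sentence's words into one sentence vector.
sentence_vecs = [rnn_summarize(sent, Ww, Uw, bw) for sent in history]

# Level 2: summarize the sentence vectors into a document-context vector D.
D = rnn_summarize(sentence_vecs, Ws, Us, bs)

# Strategy 1 (warm start): initialize the encoder/decoder hidden state
# from D rather than from a zero vector.
W_init = rng.standard_normal((d, d))
h0 = np.tanh(W_init @ D)
```

Strategy 2 would instead feed D as an extra input when updating each decoder state, alongside the usual attention context.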