The use of machine translation (MT) has become widespread since statistical machine translation (SMT) became the dominant paradigm. However, there is growing interest in the research community in the possibilities of neural machine translation (NMT), based largely on impressive results in automatic evaluation. To date, no large-scale human evaluation of NMT output has been published. This paper reports on a comparative human evaluation of phrase-based SMT and NMT in four language pairs, using the PET tool to compare output from both systems across a variety of metrics. These metrics comprise automatic evaluation, human rankings of adequacy and fluency, error-type markup, and post-editing effort (technical and temporal effort). This evaluation is part of the work of the TraMOOC project, which aims to create a replicable semi-automated methodology for high-quality MT of educational data. While the primary intention of this evaluation is to identify the best MT paradigm for our proposed methodology for TraMOOC, we believe that our evaluation results will be of interest to the wider research community and to those in the translation industry interested in the deployment of cutting-edge MT systems.