Extending previous risk model backtesting literature, we construct multiple hypothesis testing (MHT) with the stationary bootstrap. We conduct multiple tests which control for the generalized confidence level and employ the bootstrap MHT to design multiple comparison testing. We consider absolute and relative predictive ability to test a range of competing risk models, focusing on value-at-risk and expected shortfall (ExS). In devising the test for the absolute predictive ability, we take the route of recent literature and construct balanced simultaneous confidence sets that control for the generalized family-wise error rate, which is the joint probability of rejecting true hypotheses. We implement a step-down method which increases the power of the MHT in isolating false discoveries. In testing for the ExS model predictive ability, we design a new simple test to draw inference about recursive model forecasting capability. In the second suite of statistical testing, we develop a novel device for measuring the relative predictive ability in the bootstrap MHT framework. The device, which we coin multiple comparison mapping, provides a statistically robust instrument designed to answer the question: Which model is the best model?' Copyright (c) 2016 John Wiley & Sons, Ltd.