... Proceedings of the 12th Conference of the European Chapter of the ACL, pages 112–120, Athens, Greece, 30 March – 3 April 2009. ©2009 Association for Computational Linguistics

Human Evaluation of a ... can only give a rough impression of the quality of the system output. It is unclear, however, what kind of metric would be most suitable for the evaluation of string realisations, so that, ... metrics are, especially at the level of individual sentences. Using automatic evaluation metrics cannot be avoided, but ideally, a metric for the evaluation of realisation rankers would rank alternative...