Rating-scale evaluations are common in NLP, but are problematic for a range of reasons, e.g. they can be unintuitive for evaluators, inter-evaluator agreement and self-consistency...
Software architecture (SA) evaluation is a quality assurance technique that is increasingly attracting significant research and commercial interests. A number of SA evaluation meth...
The IMIRSEL (International Music Information Retrieval Systems Evaluation Laboratory) project provides an unprecedented platform for evaluating Music Information Retrieval (MIR) a...
Accessibility auditors have to choose a method when evaluating accessibility: expert review (a.k.a. conformance testing), user testing, subjective evaluations, barrier walkthrough...
Automatic evaluation of Machine Translation (MT) quality is essential to developing highquality MT systems. Various evaluation metrics have been proposed, and BLEU is now used as ...
Hideki Isozaki, Tsutomu Hirao, Kevin Duh, Katsuhit...