... However, the humans and baseline 1 (the lead baseline) scored in the upper range of 2 to 3, while all others had scores below 2.5. Some of the systems (including B2) fell into the range of 1 to ... assign a score of 4 to "all", 3 to "most", 2 to "some", 1 to "hardly any", and 0 to "none". This value assignment is for convenience in computing averages, since it is more appropriate to treat these measures ... trigrams¹, using the on-topic document collection as the relevant set and the off-topic document collection as the irrelevant set. Figure 1 shows the top 5 concepts with their relevancy scores...
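The value assignment above turns quantifier answers into numbers so that an average can be taken. A minimal Python sketch of that conversion follows. The names `SCALE` and `average_score` and the sample judgments are hypothetical and only illustrate the arithmetic. As the text notes, the mapping is a convenience, and the answers are not truly interval-scaled.

```python
from statistics import mean

# Quantifier-to-value mapping as stated in the text:
# "all" -> 4, "most" -> 3, "some" -> 2, "hardly any" -> 1, "none" -> 0.
SCALE = {"all": 4, "most": 3, "some": 2, "hardly any": 1, "none": 0}


def average_score(responses):
    """Return the mean numeric value of a list of quantifier answers.

    Hypothetical helper for illustration; raises ValueError on an empty
    list and KeyError on an answer that is not one of the five quantifiers.
    """
    if not responses:
        raise ValueError("no responses to average")
    return mean(SCALE[r.strip().lower()] for r in responses)


# Hypothetical judgments for one summary (illustrative only, not study data).
judgments = ["most", "all", "some", "most"]
print(average_score(judgments))
```

Here the four answers map to 3, 4, 2, and 3, so their average is 12 / 4 = 3.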