Rating scales are a popular item format used in many types of assessments. Yet defining which rating is correct is often a challenge. Using expert ratings as benchmarks is one approach to ensuring the quality of a rating instrument. In this paper, such expert ratings are analyzed in detail, taking as an example a video-based test instrument of teachers’ professional competencies from a follow-up study to TEDS-M (the so-called TEDS-FU study). The paper focuses on those items that did not reach sufficient consensus among the experts and analyzes their features in depth by coding the experts’ comments on those items and additionally considering their rating outcomes. The results revealed that the item wording and the composition of the group of experts strengthened or weakened agreement among the experts.


Institute for Learning Sciences and Teacher Education

Document Type

Journal Article

Access Rights

ERA Access

Access may be restricted.