Criterion-referenced assessment in the IB and results

Why is statistical scaling still used?

All IB assessments—whether exams, internal assessments, Theory of Knowledge essays, Extended Essays, or CAS—are assessed using clearly defined criteria. These describe what a student must demonstrate to achieve a particular level of performance. For example, a criterion might state: “The student evaluates the implications of cultural dimensions on behaviour.” The task of the examiner is to judge whether, and to what extent, the student has met this criterion. This is a standards-based approach: performance is measured against fixed descriptors, not against the performance of other students.

How does criterion-based assessment fit with statistical scaling of results?

Under criterion-based assessment, raw marks reflect how well a student met the specified criteria, independently of how other students performed.

The challenge is that, although the criteria remain constant, the exams themselves vary slightly in difficulty from session to session. For example, one year’s psychology paper may include case studies or questions that are more conceptually challenging than the previous year’s.

This is where statistical scaling comes in. Once all papers are marked according to the criteria, the IB uses a process called grade boundary setting, supported by expert judgement, to determine what raw mark range should correspond to each grade (1 to 7). For example, while 65/100 might earn a 6 one year, it might only earn a 5 the next year if that exam was slightly easier and the boundary for a 6 was therefore set higher.
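To make the 65/100 example concrete, here is a minimal Python sketch of how the same raw mark can map to different grades under different boundaries. The boundary values, the session labels and the function name grade_for are all invented for illustration; they are not actual IB boundaries, and the sketch does not show how the IB arrives at its boundaries.

```python
from bisect import bisect_right

# Hypothetical grade boundaries: the minimum raw mark (out of 100) needed for
# grades 2 through 7. These numbers are invented for illustration only.
BOUNDARIES = {
    "harder session": [15, 28, 40, 52, 64, 76],
    "easier session": [17, 30, 43, 55, 67, 79],
}


def grade_for(raw_mark: int, session: str) -> int:
    """Map a raw mark to a grade from 1 to 7 using that session's boundaries."""
    minimums = BOUNDARIES[session]
    # bisect_right counts how many boundaries the mark has reached or passed.
    return 1 + bisect_right(minimums, raw_mark)


for session in BOUNDARIES:
    print(session, "->", grade_for(65, session))
```

With these made-up numbers, a raw mark of 65 is a 6 in the harder session and a 5 in the easier one. The student's performance against the criteria is identical in both cases; only the mapping from marks to grades has moved.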

So, the criteria tell us how many marks a student earns, and statistical scaling determines how those marks map onto grades, ensuring fairness across different cohorts and exam sessions.

Reconciling the two

In short, criterion-based marking ensures validity (students are assessed on what they know and can do), while statistical scaling ensures reliability and comparability (grades mean the same thing year to year).

The two can be reconciled because they operate at different stages of the assessment process:

  • Criteria are used during the marking stage to ensure objective and consistent scoring.
  • Scaling is used at the grade-setting stage to account for differences in exam difficulty across sessions.

This dual approach helps maintain both academic integrity and global consistency in the awarding of IB grades.

