Shared Experience
How many times has the assessment been conducted in this format?
The current digital examination format has been conducted 22 to 24 times, including both regular and repetition exams.
What contributed to the success?
At its core, the assessment functions as a measurement instrument. Its purpose is to determine which students have truly understood the material. To ensure the assessment fulfills this role, the questions used must strike a balance between difficulty and their ability to distinguish stronger from weaker students.
Each year, we conduct a systematic analysis of the questions using two key metrics:
- Facility Index: Measures how difficult a question is, with a target range of 30–75% correct responses.
- Discrimination Index: Measures how effectively a question distinguishes between more and less able students. It is defined as the correlation between the weighted scores on the question and those on the rest of the test. A value above 0.25 is considered desirable.
Questions that fall outside these target ranges are either removed or revised, and in some cases, the teaching approach itself is adapted to better convey the underlying concepts. For example, if a question consistently shows poor discrimination, it may indicate that the concept wasn’t taught clearly enough.
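To make these two metrics concrete, the following is a minimal sketch of how they could be computed from a students-by-questions score matrix. The simulated data, variable names, and thresholds are illustrative assumptions, not the actual analysis pipeline used for this exam; many quiz platforms report comparable statistics directly.

```python
import numpy as np

# --- Illustrative assumptions: simulated exam data, not real results ---
# Rows = students, columns = questions; each entry is the fraction of the
# maximum points achieved on that question (1.0 = fully correct, 0.0 = wrong).
rng = np.random.default_rng(seed=1)
ability = rng.normal(size=(200, 1))        # simulated student ability
difficulty = rng.normal(size=(1, 40))      # simulated question difficulty
p_correct = 1 / (1 + np.exp(-(ability - difficulty)))
scores = (rng.random((200, 40)) < p_correct).astype(float)

def facility_index(scores: np.ndarray) -> np.ndarray:
    """Percentage of the maximum score achieved, per question."""
    return 100 * scores.mean(axis=0)

def discrimination_index(scores: np.ndarray) -> np.ndarray:
    """Correlation between each question's score and the rest-of-test score."""
    n_questions = scores.shape[1]
    totals = scores.sum(axis=1)
    disc = np.empty(n_questions)
    for q in range(n_questions):
        rest = totals - scores[:, q]               # exclude the question itself
        disc[q] = np.corrcoef(scores[:, q], rest)[0, 1]
    return disc

fac = facility_index(scores)
disc = discrimination_index(scores)

# Flag questions outside the target ranges described above
# (facility 30-75 %, discrimination > 0.25) for removal or revision.
for q, (f, d) in enumerate(zip(fac, disc), start=1):
    if not (30 <= f <= 75) or d <= 0.25:
        print(f"Question {q}: facility {f:.0f} %, discrimination {d:.2f} -> review")
```

Treating each entry as the fraction of the maximum points keeps both formulas applicable to weighted and partial-credit questions.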
Over the past three years, this data-driven approach has been applied consistently. Notably, even as the difficulty of the questions increased, student evaluations of the fairness of the assessment remained stable—despite a slightly higher failure rate. This can be interpreted as evidence that individual assessment questions have been validated.
What were the challenges and how were they overcome?
One of the main challenges was the intellectual complexity of designing high-quality multiple-choice questions. We initially underestimated not only the creation of the questions themselves, but also the peer review process, which involves back-and-forth iterations between the lecturers to ensure each question is clear and unambiguous. As in research, the quality of the measurement is paramount in assessment.
Nevertheless, the challenge of clearly defining what makes a multiple-choice question effective persists. As mentioned above, peer feedback plays a crucial role here. While feedback between lecturers can be exchanged quickly, the true validation of a question only comes after it has been used in an actual assessment. This means there is often a delay of six months to a year before the effectiveness of a question can be confirmed.
Are there any further developments planned?
No major changes to the assessment format are currently planned, as the core structure has proven effective. We intend to maintain the existing approach, while building on it with the mindset that there’s always room for improvement.
What tips would you give lecturers who are planning a similar assessment?
We encourage lecturers to examine their assessment questions carefully and to invest sufficient time in their development. It is also important to share questions with colleagues for peer review.
Building on this, initial feedback from students can be extremely valuable when trying out new questions or formats. Such new elements can be introduced in adapted form as exercises, allowing lecturers to observe how students respond and gather meaningful data, particularly in large cohorts, where the volume of feedback provides rich insights.
It is also worth noting that we found students can tell whether an assessment is well-crafted and fair, regardless of the average grade. This means that the perceived fairness of an assessment is not tied solely to outcomes (grades), but also to the quality and transparency of its design.