Shared Experience
How many times has the assessment been conducted in this format?
The current digital examination format has been conducted 22 to 24 times, including both regular and repetition exams.
What contributed to the success?
At its core, the assessment functions as a measurement instrument. Its purpose is to determine which students have truly understood the material. To ensure the assessment fulfills this role, the questions used must strike a balance between difficulty and their ability to distinguish stronger from weaker students.
Each year, we conduct a systematic analysis of the questions using two key metrics:
- Facility Index: Measures how difficult a question is, with a target range of 30–75% correct responses.
- Discrimination Index: Measures how effectively a question distinguishes between more and less able students. It is defined as the correlation between the weighted scores on the question and those on the rest of the test. A value above 0.25 is considered desirable.
Questions that fall outside these target ranges are either removed or revised, and in some cases, the teaching approach itself is adapted to better convey the underlying concepts. For example, if a question consistently shows poor discrimination, it may indicate that the concept wasn’t taught clearly enough.
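To make these two metrics concrete, the following is a minimal sketch of how they could be computed from a students-by-questions score matrix. The simulated data, variable names, and thresholds are illustrative assumptions, not the actual analysis pipeline used for this exam; many quiz platforms report comparable statistics directly.

```python
import numpy as np

# --- Illustrative assumptions: simulated exam data, not real results ---
# Rows = students, columns = questions; each entry is the fraction of the
# maximum points achieved on that question (1.0 = fully correct, 0.0 = wrong).
rng = np.random.default_rng(seed=1)
ability = rng.normal(size=(200, 1))        # simulated student ability
difficulty = rng.normal(size=(1, 40))      # simulated question difficulty
p_correct = 1 / (1 + np.exp(-(ability - difficulty)))
scores = (rng.random((200, 40)) < p_correct).astype(float)

def facility_index(scores: np.ndarray) -> np.ndarray:
    """Percentage of the maximum score achieved, per question."""
    return 100 * scores.mean(axis=0)

def discrimination_index(scores: np.ndarray) -> np.ndarray:
    """Correlation between each question's score and the rest-of-test score."""
    n_questions = scores.shape[1]
    totals = scores.sum(axis=1)
    disc = np.empty(n_questions)
    for q in range(n_questions):
        rest = totals - scores[:, q]               # exclude the question itself
        disc[q] = np.corrcoef(scores[:, q], rest)[0, 1]
    return disc

fac = facility_index(scores)
disc = discrimination_index(scores)

# Flag questions outside the target ranges described above
# (facility 30-75 %, discrimination > 0.25) for removal or revision.
for q, (f, d) in enumerate(zip(fac, disc), start=1):
    if not (30 <= f <= 75) or d <= 0.25:
        print(f"Question {q}: facility {f:.0f} %, discrimination {d:.2f} -> review")
```

Treating each entry as the fraction of the maximum points keeps both formulas applicable to weighted and partial-credit questions.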
Over the past three years, this data-driven approach has been applied consistently. Notably, even as the difficulty of the questions increased, student evaluations of the fairness of the assessment remained stable—despite a slightly higher failure rate. This can be interpreted as evidence that individual assessment questions have been validated.
What were the challenges and how were they overcome?
One of the main challenges was the intellectual complexity of designing high-quality multiple-choice questions. We initially underestimated not only the creation of the questions themselves, but also the peer review process, which involves back-and-forth iterations between the lecturers to ensure each question is clear and unambiguous. As in research, the quality of the measurement is paramount in assessment.
Nevertheless, the challenge of clearly defining what makes a multiple-choice question effective persists. As mentioned above, peer feedback plays a crucial role here. While feedback between lecturers can be exchanged quickly, the true validation of a question only comes after it has been used in an actual assessment. This means there is often a delay of six months to a year before the effectiveness of a question can be confirmed.
Are there any further developments planned?
No major changes to the assessment format are currently planned, as the core structure has proven effective. We intend to maintain the existing approach, while building on it with the mindset that there’s always room for improvement.
What tips would you give lecturers who are planning a similar assessment?
We encourage lecturers to examine their assessment questions carefully and to invest sufficient time in their development. It is also important to share questions with colleagues for peer review.
Building on this, initial feedback from students can be extremely valuable when trying out new questions or formats. Such new elements can be introduced in adapted form as exercises, allowing lecturers to observe how students respond and gather meaningful data, particularly in large cohorts, where the volume of feedback provides rich insights.
It is also worth noting that we found students can tell whether an assessment is well-crafted and fair, regardless of the average grade. This means that the perceived fairness of an assessment is not tied solely to outcomes (grades), but also to the quality and transparency of its design.