An evaluation of the inter-rater reliability in a clinical skills objective structured clinical examination

V de Beer; J Nel; FP Pieterse; A Snyman; G Joubert; MJ Labuschagne

doi:10.7196/AJHPE.2023.v15i2.1574


Published: 2023-05-18


V de Beer

Clinical Simulation and Skills Unit, Faculty of Health Sciences, University of the Free State, Bloemfontein, South Africa

J Nel

Clinical Simulation and Skills Unit, Faculty of Health Sciences, University of the Free State, Bloemfontein, South Africa

FP Pieterse

Clinical Simulation and Skills Unit, Faculty of Health Sciences, University of the Free State, Bloemfontein, South Africa

A Snyman

Clinical Simulation and Skills Unit, Faculty of Health Sciences, University of the Free State, Bloemfontein, South Africa

G Joubert

Department of Biostatistics, Faculty of Health Sciences, University of the Free State, Bloemfontein, South Africa

MJ Labuschagne

Clinical Simulation and Skills Unit, Faculty of Health Sciences, University of the Free State, Bloemfontein, South Africa

Abstract

Background. An objective structured clinical examination (OSCE) is a performance-based examination used to assess health sciences students and is a well-recognised tool to assess clinical skills with or without using real patients.

Objectives. To determine the inter-rater reliability of experienced and novice assessors from different clinical backgrounds on the final mark allocations during assessment of third-year medical students’ final OSCE at the University of the Free State.

Methods. This cross-sectional analytical study included 24 assessors and 145 students. After training and written instructions, two assessors per station (urology history taking, respiratory examination and gynaecology skills assessment) each independently assessed the same student for the same skill by completing their individual checklists. At each station, assessors could also give a global rating mark (from 1 to 5) as an overall impression.

Results. The urology history-taking station had the lowest mean score (53.4%) and the gynaecology skills station the highest (71.1%). Seven (58.3%) of the 12 assessor pairs differed by >5% regarding the final mark, with differences ranging from 5.2% to 12.2%. For two pairs the entire confidence interval (CI) was within the 5% range, whereas for five pairs the entire CI was outside the 5% range. Only one pair achieved substantial agreement (weighted kappa statistic 0.74, urology history taking). There was no consistency within or across stations regarding whether the experienced or novice assessor gave higher marks. For the respiratory examination and gynaecology skills stations, all pairs differed for the majority of students regarding the global rating mark. Weighted kappa statistics indicated that no pair achieved substantial agreement regarding this mark.

Conclusion. Despite previous experience, written instructions and training in the use of the checklists, differences between assessors were found in most cases.
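
Illustrative sketch (not part of the published article). The weighted kappa statistic reported above measures agreement between two assessors on an ordinal scale, such as the 1 to 5 global rating mark, while giving partial credit for near misses. The abstract does not state which weighting scheme or software the authors used, so the Python sketch below assumes linear weights by default; the function name `weighted_kappa` and the example marks are hypothetical and are not data from the study.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters scoring the same students.

    Assumption: disagreement weights are |i - j| / (k - 1) for "linear"
    and the square of that for "quadratic"; the source abstract does not
    say which scheme was used.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of students")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    k = len(categories)
    index = {category: i for i, category in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: observed[i][j] is the share of students
    # given category i by rater A and category j by rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed_dis = sum(disagreement(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(disagreement(i, j) * row[i] * col[j]
                       for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters use a single category")

    # Kappa is 1 minus the ratio of observed to chance-expected disagreement.
    return 1.0 - observed_dis / expected_dis


# Hypothetical global rating marks (1 to 5) from two assessors of one station.
experienced = [3, 4, 2, 5, 3, 4, 1, 3]
novice = [3, 3, 2, 4, 4, 4, 2, 3]
kappa = weighted_kappa(experienced, novice, categories=[1, 2, 3, 4, 5])
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. Under the commonly used Landis and Koch benchmarks, values from 0.61 to 0.80 are labelled substantial agreement, which is consistent with the abstract describing 0.74 as substantial.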


Issue

AJHPE Vol. 15 No. 2

Section

Research Articles

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

The AJHPE is published under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License. Under this license, authors agree to make articles available to users, without permission or fees, for any lawful, non-commercial purpose. Users may read, copy, or re-use published content as long as the author and original place of publication are properly cited.

Exceptions to this license model are allowed for UKRI and for research funded by organisations requiring that research be published open-access without embargo, under a CC-BY licence. As per the journal's archiving policy, authors are permitted to self-archive the author-accepted manuscript (AAM) in a repository.

How to Cite

An evaluation of the inter-rater reliability in a clinical skills objective structured clinical examination. (2023). African Journal of Health Professions Education, 15(2), 13-17. https://doi.org/10.7196/AJHPE.2023.v15i2.1574

