
The Balancing Act of Assessment Validity in Interprofessional Healthcare Education: A Qualitative Evaluation Study

Received 23 Jun 2023, Accepted 27 Oct 2023, Published online: 15 Nov 2023

Abstract

Construct & Background

To determine students’ level of interprofessional competencies, there is a need for well-considered and thoroughly designed interprofessional assessments. The current literature about interprofessional assessments focuses largely on the development and validation of assessment instruments such as self-assessments or questionnaires to assess students’ knowledge or attitudes. Less is known about the design and validity of integral types of assessment in interprofessional education, such as case-based assessments or performance assessments. The aim of this study is to evaluate the evidence for and threats to the validity of the decisions about students’ interprofessional performances based on such an integral assessment task. We investigated whether the assessment prototype is a precursor to practice (authenticity) and whether the assessment provides valid information to determine the level of interprofessional competence (scoring).

Approach

We used a design-based qualitative research design in which we conducted three group interviews with teachers, students, and interprofessional assessment experts. In semi-structured group interviews, participants evaluated the evidence for and threats to the validity of an interprofessional assessment task, which were analyzed using deductive and inductive content analysis.

Findings

Although both evidence for and threats to validity were mentioned, the threats refuting the assessment’s validity prevailed. Evidence for the authenticity aspect was that the assessment task, conducting a team meeting, is common in practice. However, its validity was questioned because the assessment task appeared more structured as compared to practice. The most frequently mentioned threat to the scoring aspect was that the process of interprofessional collaboration between the students could not be evaluated sufficiently by means of this assessment task.

Conclusions

This study showed that establishing interprofessional assessment validity requires three major balancing acts. The first is the balance between authenticity and complexity. As interprofessional practice and competencies are complex, interprofessional tasks require a gradual build-up toward, or guidance in dealing with, this complexity and chaotic practice. The second is that between authenticity and scoring, in which optimal authenticity might lead to threats to scoring and vice versa. Simultaneous optimal authenticity and scoring seems impossible, requiring ongoing evaluation and monitoring of interprofessional assessment validity to ensure authentic yet fair assessments for all participating professions. The third balancing act is between team scoring and individual scoring. As interprofessional practice requires collaboration and synthesis of diverse professions, the team process is at the heart of solving interprofessional tasks. However, to stimulate individual accountability, the individual performance should not be neglected.

Construct

The need for interprofessional education (IPE) and collaborative practice to improve patient outcomes has been widely recognized.Citation1 IPE is defined as “occasions when members or students of two or more professions learn with, from and about each other to improve collaboration and the quality of care and services”.Citation2 To become interprofessional (IP) practitioners, students need the relevant knowledge, skills, and attitudes relating to aspects of collaboration with students from other professions.Citation3 With the introduction of core competencies for interprofessional collaborative practice and new accreditation standards, IPE is being increasingly introduced at an early stage in many different educational programs and has been shown to have positive learning outcomes.Citation4–7 An assessment to guide students’ learning and to determine competence in IPE is fundamental.Citation8,Citation9 Assessments in IPE appear in many forms, such as simulation-based assessments, workplace-based assessments, or portfolio assessments. In these assessments, different constructs of IP collaboration are evaluated, such as knowledge, attitudes, or skills or their combination in competencies.Citation10 Although higher education institutions recognize the importance of IP assessments, inconsistencies in the literature regarding evidence-based assessments of IP tasks remain, and specifically raise questions about validity.Citation11

All assessments require evidence of validity; without it, assessments have little or no intrinsic meaning.Citation12 To make decisions about students’ IP competencies, we need to understand the evidence for the validity of the assessments upon which these decisions are based.Citation13 Literature about IP assessments focuses largely on the development and validation of assessment instruments, such as questionnaires.Citation14–16 Less is known about the design and validity of integral types of assessment in IPE, such as case-based assessments, simulation-based assessments, team assessments, or performance assessments.Citation3,Citation8,Citation10 The aim of this study therefore is to evaluate the evidence for and threats to the validity of the decisions about students’ IP performances after they participated in an assessment task. Also, with this study we aim to contribute to the call for broadening the scope of validation studies by evaluating qualitative validity evidence rather than focusing on commonly studied quantitative forms of validity evidence.Citation17

Background

Validation involves the process of building an argument for how and when an assessment’s scores accurately and adequately represent the construct of interest.Citation13 According to modern validity theories, there are three important assumptions underlying validity: (1) Validation should be viewed as an evaluation of the specific purpose of the assessment procedure involving particular settings and learners.Citation18,Citation19 In other words, validity is an attribute of the assessment purpose and not a characteristic of the assessment instrument itself.Citation20 (2) The process of validation involves accumulating relevant evidence to provide a sound scientific basis for the proposed score interpretations based on a set of relevant claims.Citation21 (3) Depending on the proposed assessment use and interpretation of scores, specific types of evidence may be more or less relevant.Citation22 A claim that an assessment is predictive of a supposedly related criterion variable or outcome can be supported without evidence that the assessment samples a particular content domain. In contrast, a claim that an assessment covers a representative sample of a particular curriculum may be supported without evidence that the assessment predicts a certain outcome.Citation22

In the following sections, we describe two relevant validity aspects in the context of IP assessment, as investigated in the present study. These validity aspects originate from the literature by Kane and AERA et al.Citation18,Citation21,Citation22 The Approach section provides an overview of the specific validity claims investigated.

The first aspect of validity is scoring, by which we mean the way in which all aspects of the IP performance are scored during an assessment.Citation23 Scoring affects one’s ability to make valid decisions about IP performance. The assessment must provide valuable information about the IP performance of the students as it relates to the assessment goal. Too few assessment cases, an assessment that is inappropriately difficult, or flawed rating scales can all pose threats to the validity of decisions about IP competence.Citation24 A unique challenge in IP assessment is that team-based performance is assessed alongside individual performance. Traditional models of assessment in higher education have focused mainly on tasks completed individually.Citation8 In IP assessment, the decision about team IP performance is difficult, since the outcome depends on several individuals who have different domain-specific competencies and who differ in the effort they put into the collaboration.Citation25 One team member’s performance has the potential to influence the performance of the entire team. A decision about IP learning is therefore not a sum of the separate individuals; rather, the whole is greater than the sum of its parts. In current IP literature, it is unclear how assessment should be designed to assess both the complex IP competence of an individual in an IP team and the IP team as a whole.Citation26 Another challenge specific to IP assessment involves the fact that IP assessment tasks should be fair in the sense that all involved professions should have equal opportunities for contributing to solving the specific case. In other words, it should not be easier or more difficult to draw conclusions on the level of IP competencies for one profession in relation to the others, because of unequal opportunities to contribute in the IP team process.

Authenticity is also an important aspect of validity, because in the context of this study, the IP assessment is preparation for real IP tasks undertaken during the students’ internships. Authentic assessments are needed because they provide information about how the student can apply what was learned beyond the classroom to various contexts and situations (e.g., IP internships).Citation27,Citation28 In instructional design, the design of authentic learning tasks is a first step in designing professional education and the most common understanding of authentic assessment in higher education describes it as involving ‘real-world’ tasks.Citation29,Citation30 In IPE, however, educators often start designing an IP activity by formulating IP competencies and learning goals.Citation31 As used by some health professions, the separate constructs of competencies are outcome markers in IP assessment (e.g., knowledge, skills, attitudes) that can be achieved separately and progressively along the continuum of a learner’s professional development. Inauthentic assessment tasks, cases, or fragmented assessments can pose threats to the validity of decisions about IP competence.Citation24

Studies on how to develop IP assessments that enable decision makers to draw valid conclusions about students’ IP performances remain scarce. Evidence was defined for the purposes of this study as the information that supports meaningful interpretations of assessment scores or outcomes, and threats as the information that interferes with this meaningful interpretation.Citation13 Based on the gaps in our knowledge regarding the validity aspects of authenticity and scoring, the goal of this study was to identify the evidence for and threats to the validity elements authenticity and scoring of a prototype assessment task aimed at assessing undergraduate students’ IP competencies.

Approach

Design and setting

Design

This study is part of a larger design-based research project.Citation10,Citation32 Previous studies in this project comprised a scoping review and a consensus study to understand the state of the art in IP assessment practices and to develop design guidelines for IP assessment. In the current study, an assessment procedure in IP education for second-year students was redesigned based on the results of the previous studies.Citation10,Citation32 We set up a qualitative study using group interviews, a technique in which participants in a social context can be interviewed simultaneously.Citation33

Setting

The current study was carried out at a University of Applied Sciences in the Netherlands. Within the domain of Health and Wellbeing, students in bachelor’s programs for various health professions (physiotherapy, nursing, occupational therapy, speech therapy, and arts therapy) participate in one mandatory IP course per educational year. The IP courses in Years One and Two are preparation for students’ internships in Year Three.

This study focused on the IP assessment of second-year students in four-year bachelor’s programs in various healthcare professions. Approximately 450 students per year participate in this course. The aim of this course is for students to develop the following specific IP competence: “to be able to draw up client-centered care plans based on the information from and the interaction with the different healthcare professionals”. Students participate in three IP classes in which they practice conducting an IP team meeting and writing an IP care plan in IP teams of approximately twelve students. The fourth meeting in the course comprises the assessment task, in which students conduct an IP team meeting for two client cases. After this team meeting, they reflect on the process by providing and receiving feedback on each individual student’s contribution to the team meeting. Together, they write an IP care plan, and individually they write reflection reports. The course is followed by a pass/fail decision to which 1 educational credit (EC, 28 study hours) is linked. The learning objectives of this course are focused on the individual student who (1) provides relevant information in an understandable way to other professionals to promote participation in decision-making, (2) contributes to a shared problem analysis conducted by all the professionals involved, in which the personal perspective of the client is central, (3) formulates specific goals in the interprofessional team together with other professionals, and (4) draws up a joint plan of action that includes actions related to the treatment goals. The assessment task embedded in the IP course constituted the main material for the current study.

Participants

Three groups of participants were invited to participate (for a total of 21): teachers with experience in working in IP practice (n = 8), students with prior experience in IP education and an internship (n = 7), and experts in IP education or assessment (n = 6). We included different stakeholders to obtain a comprehensive view of the IP assessment. Participants were sampled using purposive sampling, in which the researcher (HS) specifically asked individuals to participate who met our inclusion criteria (Box 1). The exclusion criterion for all participants was being unable to speak or understand written Dutch. We aimed to have eight participants per group, since the ideal size for a group interview is five to eight participants.Citation34 IP teachers and IP assessment experts were approached via e-mail to participate in the group interview. Students were approached via the electronic learning environment of the IP courses. Interested students contacted the researcher to participate in this study. Students participating in this study previously took part in the IP course (see ‘Setting’), meaning that they had been involved in an assessment task similar to the one evaluated in this study. See Appendix 1 for more detailed information about the participants.

Box 1. Participants and inclusion criteria.

Materials

Our approach to data collection was informed by modern theories of validity.Citation12,Citation18,Citation21,Citation22 Based on the purpose of the IP assessment (i.e., to determine whether students were sufficiently competent in performing an IP team meeting and creating an IP care plan in a team to enter their undergraduate internships), the so-called interpretation-use argument was developed. In this argument, we formulated claims underlying the IP assessment to be evaluated, and organized these according to the two validity aspects: scoring and authenticity (Table 1). Based on these claims, the preparatory questionnaire as well as the group interview guides were developed.

Table 1. Interpretation/use arguments for the validity evaluation of the IP assessment prototype.
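To make the structure of such an interpretation-use argument concrete, the following minimal Python sketch shows one way the claims could be organized by validity aspect and linked to the evidence and threats gathered in the group interviews. It is illustrative only, not a reproduction of Table 1; the claim wording is paraphrased from the questionnaire items and findings reported below and should be treated as hypothetical.

```python
# Illustrative sketch (not a reproduction of the article's Table 1): claims of an
# interpretation-use argument grouped by validity aspect. Claim wording is
# paraphrased from the questionnaire items and findings; treat it as hypothetical.
interpretation_use_argument = {
    "authenticity": [
        "The assessment task (an IP team meeting) reflects real IP practice.",
        "The client cases resemble cases encountered in healthcare practice.",
    ],
    "scoring": [
        "The assessment criteria and decision rules are clear to students and assessors.",
        "The procedure enables assessors to evaluate students' IP collaboration.",
    ],
}

# Each interview statement can then be logged as evidence for or a threat to a claim.
validity_log = {
    claim: {"evidence": [], "threats": []}
    for claims in interpretation_use_argument.values()
    for claim in claims
}
```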

Preparatory questionnaire

Online preparatory questionnaires were designed as a preliminary assignment for participants, using Questback. The aim of the questionnaire was twofold: to inform participants about the IP assessment prototype, and to calculate mean scores and ranges of scores on the validity of the assessment task for all participants, which could then be presented during the group interview.

The questionnaires consisted of seven sociodemographic questions, for example “What is your profession/what do you study?”, and 13 questions regarding the validity aspects of authenticity and scoring, for example “Does this assessment task, performing an IP team meeting, reflect real IP practice?”. Participants were asked to grade each question on a scale from 1 (totally disagree) to 10 (totally agree). The questions’ wording was adjusted to the participants’ groups while the content remained the same. The preparatory questionnaire including the range of scores as scored by the participants can be found in Appendix 2.
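As an illustration of how the reported means and score ranges could be derived, a minimal Python sketch follows. The first item’s wording is taken from the example above; the second item and all scores are hypothetical placeholders rather than study data.

```python
# Hypothetical sketch: per-item mean and score range on the 1-10 agreement
# scale, as presented back to participants during the group interviews.
# The responses below are invented placeholders, not the study data.
from statistics import mean

responses = {
    "Does this assessment task, performing an IP team meeting, reflect real IP practice?":
        [6, 8, 7, 5, 9, 7],
    "Are the assessment criteria clear?":  # hypothetical second item
        [4, 7, 6, 5, 6, 8],
}

for item, scores in responses.items():
    print(f"{item}\n  mean = {mean(scores):.1f}, range = {min(scores)}-{max(scores)}")
```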

Group interview guides

Group interview guides were developed for all three group interviews, based on the validity literature by Kane.Citation21,Citation35 These guides contained information regarding both the time path of the session and the questioning route, and had the same structure as the preparatory questionnaire (Box 2).

Box 2. Group interview guide

Prototype IP assessment

A first prototype of an IP assessment procedure was designed and was input for the validity evaluation. In the assessment task, student groups, consisting of 12 students from five professions, conduct an IP team meeting and write an IP care plan for two client cases. Afterwards, students write an individual reflection about their contribution to the IP collaboration. Finally, the student team is assessed with a pass or a fail on their IP care plan and the individual students is assessed with a pass or fail on their reflection. There are two IP assessors per group, who provide feedback on the process. They assess the group’s IP care plan and the reflection Validations independently based on the assessment criteria. Afterwards, both assessors discuss and calibrate together whether a pass or fail is allocated (see Appendix 3 for the assessment prototype).

Procedure

Students, teachers, and experts who were willing to participate received an e-mail with an information letter, the prototype of the IP assessment, and the preparatory online questionnaire. The information letter comprised information on the set-up of the study, the time investment required of the participants, and details of how the data would be treated and stored. Prior to the group interviews, everyone signed consent forms, while the preparatory questionnaire was completed by almost all participants. Two participants did not complete the questionnaire, but read the IP assessment prototype and participated in the group interview. We conducted three consecutive group interviews with students, teachers, and experts, exploring participants’ ideas regarding evidence for and threats to the claims (Table 1), thus strengthening (or weakening) the decisions and ultimately the overall validity argument. To prevent research bias, an independent moderator conducted the group interviews and HS took an observing role. The moderator was selected because of their experience in moderating interviews in educational research. To guarantee trustworthiness, the moderator had not been involved in other studies in this project. We aimed to create a structured, safe, and inclusive interview setting in which the moderator would listen to participants without imposing their own beliefs or values regarding the topic. At the start of the group interview, the moderator welcomed participants and asked them to introduce themselves. We then discussed several elements of the assessment (e.g., the assessment task and the assessment client cases). HS presented the participants’ mean scores and score range regarding the assessment elements. The moderator then asked participants to identify evidence for and threats to the validity. If answers were contradictory between participants, the moderator explored this contradiction. At the end of each interview, a short summary was given, and participants were asked whether they had anything else to add regarding the IP assessment or this study. Data were collected in March 2022, and due to the restrictions on physical meetings in response to the Covid-19 situation at the time, all group interviews were conducted online via MS Teams. Each interview lasted approximately 90 min.

Analysis

All group interviews were recorded, transcribed verbatim, anonymized, and uploaded into Atlas.ti software (v. 9). For the data analysis of the group interviews, we used a combination of deductive and inductive content analysis.Citation36 To familiarize themselves with the data, HS together with researcher and coauthor LD read each transcript. A deductive approach was adopted first, for which an analysis matrix was developed. It consisted of four main codes, namely authenticity evidence, authenticity threat, scoring evidence, and scoring threat.Citation13 HS and LD independently coded one of the transcripts based on this coding scheme. Subsequently, we used inductive coding, in which we sub-coded text fragments. These subcodes were then grouped into categories consisting of subcodes related to each other through content or context. New categories were added to the four main codes in the analysis matrix. HS coded the other two transcripts line by line, and LD cross-coded both independently. Both researchers discussed their doubts and differences. Together, they redefined the analysis matrix, and HS used this matrix as a final step in the analysis process to analyze all three transcripts again. Throughout the process, doubts and differences were discussed with a third researcher and coauthor (DS). Table 2 presents the final analysis matrix.

Table 2. Analysis matrix.
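As a simplified illustration of how the deductive part of this analysis could be operationalized, the sketch below tallies coded fragments against the four main codes to produce statement counts (k) such as those reported in the Results. This is a hypothetical aid; the authors used Atlas.ti rather than custom code, and the fragments shown are invented placeholders.

```python
# Illustrative sketch (assumed workflow, not the authors' Atlas.ti procedure):
# counting how many coded statements (k) fall under each main code and category
# of the analysis matrix. The coded fragments below are hypothetical placeholders.
from collections import Counter

MAIN_CODES = {"authenticity evidence", "authenticity threat",
              "scoring evidence", "scoring threat"}

coded_fragments = [
    ("authenticity evidence", "team meeting common in practice"),
    ("authenticity threat", "task more structured than practice"),
    ("authenticity threat", "task more structured than practice"),
    ("scoring threat", "IP collaboration process not assessable"),
]

# k per (main code, category)
counts = Counter(frag for frag in coded_fragments if frag[0] in MAIN_CODES)
for (code, category), k in counts.items():
    print(f"{code} | {category}: k={k}")
```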

Ethics

All participants gave written, informed consent prior to the group interviews. Participants were assured of confidentiality and anonymity. Ethical approval was granted by the research ethics committee of the Faculty of Health, Medicine and Life Sciences at Maastricht University (number FHML-REC/2020/126).

Trustworthiness

Following the recommendations of Lincoln and Guba,Citation37 we used different strategies to ensure the trustworthiness of this study. We pursued credibility by using an analysis matrix that was based on literature and adequately represented the concepts of validity.Citation18 We also adopted investigator and analyst triangulation. Investigator triangulation involved working with multiple researchers in the sessions, and analyst triangulation required analyzing sections of the data with at least three researchers. At the end of all interviews, HS gave a short summary as a member check and participants were given the opportunity to reply. Transferability was pursued by providing a well-documented context and research process. We used thick description to allow the reader to make a transferability judgment, assessing whether our research process and findings are transferable to each reader’s educational context.Citation38

Results

In this section, we describe both the evidence for and threats to authenticity and scoring as derived from the group interviews. To provide an overview, we present a visual summary and include the total number of statements made across all group interviews regarding each piece of evidence or threat (k).

Authenticity evidence and threats

Overall, the visual summary of the authenticity evidence and threats, and the number of statements per piece of evidence and per threat, can be found in Figure 1.

Figure 1. Evidence for and threats to authenticity.

The authenticity validity aspect was discussed in relation to the assessment task, client cases and team composition. Team meetings appear to be an authentic part of the task, as indicated by six statements across the participant groups. One student with previous internship experience, for instance, said that “When you for example look at rehabilitation, this assessment task matches IP practice, because an IP team meeting is organized on a weekly basis” (Student M). However, it was more often (k = 14) mentioned that the IP team meeting, as presented in this assessment task, is more structured, detailed, and more perfectly performed than an IP team meeting in real IP practice. This indicates a gap between education and practice and is therefore considered a threat to authenticity. Adding to this threat is the experience (k = 2) that conducting an IP team meeting is not as common in some healthcare settings, such as home care or primary care practice. It was also stated (once) that the assessment product, writing an IP care plan, is not common in all healthcare practice settings either. More specifically, one participant mentioned that the IP care plan is often still written per profession.

Participants across all groups (k = 7) recognized the client cases as being similar to cases in healthcare practice, constituting evidence for authenticity. A threat to case authenticity, however, appeared to be the format (k = 8). In practice, healthcare professionals meet the client before having an IP meeting, while in the assessment task students only get to “meet” the client on paper. Additionally, participants twice indicated that the contexts were not clearly defined (e.g., the setting, such as the hospital), and in real practice, one would always know the context before participating in an IP team meeting.

Teachers and experts (k = 3) recognized that the unbalanced team composition in the assessment procedure, with more physical therapy and nursing students than other professions, is authentic, because in practice a similar pattern occurs. On the other hand, something considered by some (k = 8) to be a threat to the authenticity of the team composition was that some professions were missing in the student team, e.g., social workers, psychologists, or certain medical professionals such as general practitioners or rehabilitation specialists. Participants explained that the medical profession almost always participates in an IP team meeting, where they often have the role of the chairperson. A teacher elaborated that: “[this] is something paramedical or nursing students have to learn how to collaborate with, and now, [healthcare] does not match with what the students practice” (Teacher E).

Scoring evidence & threats

Overall, the visual summary of the scoring evidence and threats, and the number of statements per piece of evidence and per threat, can be found in Figure 2.

Figure 2. Evidence for and threats to scoring.

Regarding scoring validity, the assessment instructions, criteria, purpose, and procedure were discussed in the group interviews. Even though some comments described the instructions and criteria as clear (k = 8), the overall picture seemed to suggest that there is room to improve the clarity of the assessment purpose (k = 23), criteria (k = 12), and decision rules (k = 10). All participant groups believed that a lack of clarity in several aspects of this assessment was a threat to scoring, such as identifying which performance exactly must be assessed (e.g., the care plan or students’ behavior in the team meeting). A teacher explained: “I wondered, what is the means and what is the goal, in other words, when it is the goal that students learn to speak up in an IP team … then that should have a role in education and assessment.” (Teacher G). A student could only receive a pass or a fail; therefore, participants were confused about the existence of ‘excellent’ in the rubrics. The difference between a ‘sufficient’ and ‘excellent’ evaluation for the IP care plan lacks specificity, and was considered a threat to scoring.

Some participants indicated that the procedure and criteria allow for assessing the IP care plan (k = 2), but they mainly wondered whether the current assessment procedure actually enables the assessment of IP collaboration competencies (k = 28). More specifically, they discussed whether the current performance assessment, assessing whether students can write an IP care plan, matches the goal of determining competence before entering an internship. An expert elaborated: “starting the process is much more important than the outcome, […] and evaluating the outcome is dangerous when you want to say something about the acquisition of competence” (Expert Q). Students also believed that this performance was suboptimal for assessing team performance, because interdependency between IP team members is lacking when writing the IP care plan. Although all three participant groups commented that the assessment cases were sufficiently diverse to necessitate an IP approach and enable students from different professions to participate in the assessment task (k = 11), this did not necessarily mean that the students worked interprofessionally. A student explained how they individually performed the IP assessment task: “The nursing student started and we just added what we thought per profession, and in the end you just check if all criteria are included” (Student K).

According to students and teachers, this assessment task can serve the goal of preparation for an internship, if implemented with the formative function of practicing an IP meeting in a safe and controlled setting (k = 4). Participants across all groups believed that students need more practice in IP collaboration and more IP assessments before the IP collaboration can be assessed (k = 5), and that one course with one assessment task is insufficient to expect students to have acquired IP competencies (k = 4). They found it unfair to base a decision about students’ IP competence on merely a single written product, such as the IP care plan. Students themselves believe that more time and effort must be spent on IP collaboration in general before they are competent to enter an internship (k = 2).

Participants stated that the client cases were related to some professional scopes of practice, enabling students to have an active role in the assessment and to be assessed on their IP performance (k = 11). A speech therapy student explained “…when I look at the case with the woman, it says that the woman can’t speak, which is a very clear goal for speech therapists, with which we can work as a student.” (Student N). However, other participants indicated that students could be hindered from participation in the assessment due to little connection to the client cases, especially the students with a less biomedical professional background, such as arts therapists (k = 13). This is a threat to scoring since it could hinder some students from having an equal share in the assessment and from demonstrating IP performance. A final threat to scoring was that students were assessed only by teachers; participants suggested that the students themselves, patients, or professionals from practice should also be involved in assessing (k = 6).

Discussion

The aim of this study was to evaluate the evidence for and threats to the validity of the decisions about students’ IP performances made following an assessment task that comprised part of the preparation for an internship. We specifically investigated whether the assessment task and cases are in line with IP practice (i.e., authenticity) and whether the assessment enables valid decisions to be made about students’ IP competence (i.e., scoring). Results show that both evidence and threats can be found in the current prototype of the IP assessment task. The most important evidence was that several parts of the assessment task were regarded as authentic, such as the team meeting and the client cases. The most frequently mentioned threats were that the purpose of the assessment is unclear, and that the process of IP collaboration cannot be assessed using this assessment task. Overarching results reveal three balancing acts regarding the evaluation of the validity of the IP assessment: between authenticity and complexity, between authenticity and scoring, and between team scoring and individual scoring.

Balancing act 1: Authenticity & complexity

The IP meeting was considered an authentic assessment task, although the prototype is viewed as too structured and too idealized when compared to real IP practice. Authentic assessment tasks are important for driving learning, because they prepare students to apply knowledge in a real-world context. Authentic tasks can also enhance students’ effort due to experienced relevance.Citation39,Citation40 There are two challenges associated with designing authentic assessment tasks. First, we question whether providing second-year students with more structured tasks is problematic. From a learning perspective, complex tasks, like IP tasks, require scaffolding or guidance in dealing with high complexity and chaotic IP practice.Citation29,Citation41 According to that line of reasoning, a task more structured than real-world IP practice might provide a better opportunity for practice, and to learn before starting an internship, than providing students with unstructured tasks.

The second challenge involves the question of whether simulating current healthcare practice and its lack of structure is desirable. The work context of healthcare practice evolves over time and might become more structured through innovation. In that sense, the education and assessment practice studied here might be a forerunner of future IP practice in which professionals organize highly structured IP team meetings.Citation42 The current concept of authentic assessment implies that there is an existing world into which students need to fit, but perhaps healthcare educational institutes should be pushing the possibilities of what healthcare could be and not fearing transformative change.Citation30 Based on the two challenges, authenticity presents the first balancing act in validity evaluation, for which we need to further research the balance between authenticity and the appropriate level of IP assessment complexity. In addition to this, more research is needed regarding how we can design IP curricula and assessments to educate students to become collaboration-ready professionals.

Balancing act 2: Authenticity & scoring

Regarding the scoring validity aspect, the focus on the IP care plan in terms of performance and decision-making appeared to be the biggest threat. The focus on the end product seemed to lead to students using individualistic strategies to solve the IP task. They added their share to the IP care plan individually without any mutual collaboration or alignment to discuss the content and coordination of the care, even though an IP assessment task is supposed to drive learning to solve tasks interprofessionally. It appears that interdependency between the students was lacking, and it is questionable whether students engage in genuine collaboration when positive interdependence is not fostered.Citation25 According to participants, it is the process that should be assessed, not the end product. There seem to be difficulties with assessing both products and processes to determine IP competencies; the production of artifacts, such as care plans, may not represent the achievement of outcomes associated with collaboration.Citation43 That said, directly assessing teamwork and collaboration is also contested.Citation44 Using a group product to assess individual competence is considered a threat in terms of assessment fairness. Some students might receive a pass even though they did not contribute or collaborate sufficiently to justify the grade, while others might receive a fail even though they deployed an interprofessional strategy to generate the end product. A detriment of assessing only a team’s performance of collaborative learning is that it might, for example, elicit free-riding because individual accountability is not fostered.Citation11 On the other hand, assessing IP competencies as a process without taking the final product into account is meritless, as it is not representative of real-world experience: “we worked together well but did not help the patient” is not a successful outcome.Citation45 If the shared goal of the team (the IP care plan) becomes insignificant, it will influence the functioning of the team and consequently the validity of the assessment of collaboration competencies.Citation45 More research is needed to determine how to design IP assessment tasks that ensure interdependence among IP team members and individual accountability, what IP performances (products or processes) should best be assessed using which criteria, and how many IP assessments are needed to make a fair decision about students’ IP competencies.

The two validity aspects of authenticity and scoring appear not to be two separate entities that can be optimized independently; rather, they are intertwined.Citation46 It seems that improving one impacts the other. For example, when considering optimal authenticity, the IP student team has an unbalanced team composition, and it is possible that students with certain professional backgrounds lack connection to the IP client cases. As a consequence, some students are unable to share equally in the collaboration, and no information can be gathered about their IP performance, which is a threat to the scoring of the assessment.Citation18 Alternatively, in a fair scoring situation, all students are equally able to contribute to the IP collaboration. This way, the assessment task is simulated while being “built for” education and the acquisition of competencies, possibly leading to less authenticity. Educational institutes and designers of IP assessments are therefore challenged with a second balancing act, balancing authenticity and scoring to ensure collaboration-ready professionals who work toward a desired state of IP practice.

Balancing act 3: Team vs. individual assessment

Our findings show that the question arose of whom to assess and according to which criteria: the individual students or the team as a whole. The IP competencies that often guide the IP assessment are usually mainly focused on the individual learner and overlook any evaluation of the collaborative learning.Citation7,Citation8,Citation10 It is hardly surprising that students use individualistic strategies in the assessment task, because in most healthcare education, assessment is constructed as an individualistic process in which decisions are made about the performance of individuals instead of a team.Citation44 Individual assessments present a unique difficulty for IP assessment, because IP performance always depends on the context, the task, and the team of students.Citation47 For instance, it is possible for all individuals to score well individually on the competency of IP teamwork, yet perform poorly when working as a team. In the prototype evaluated in this study, students had to individually pass the course, but depended on the client cases fitting their professional background to have a fair share in the collaboration, and on their team to receive a pass. Boud and BearmanCitation44 argue that any individual assessment to assess collaboration is inherently unfair to some extent, because of the context dependency, and therefore it is not sufficient to rely on one assessment task to make decisions about students’ IP competence. This dilemma also presents the third balancing act in validity evaluation: how should we balance the assessment of the team with the assessment of the individual?

Based on the identified threats to authenticity and scoring, the IP assessment prototype would benefit from revision. Results imply that revisions are desirable regarding the performance to be assessed, the summative character of the assessment, and the criteria used to assess the students’ performance. For example, by altering the assessment criteria we could focus more intensely on the student teams’ IP process as the assessed performance.

Strengths and limitations

We acknowledge that there are strengths and limitations to our study. One strength is that we included different stakeholders, which provided us with a detailed and nuanced view of the IP assessment. One limitation is that we did not include patients or professionals from practice in this study; they might have a better view on authentic assessment tasks or performances. It was a strength, however, that we included teachers with experience in practice who know much about authenticity, and experts who know much about scoring. A second limitation of our study concerns the transferability of our findings. The study was context-specific, looking specifically at an IP task within the healthcare domain at a University of Applied Sciences, which could make the results less transferable to other contexts such as vocational education, or other domains such as social work. In the context of this study in the Netherlands, stakeholders who mainly collaborate in practice, such as nurses and doctors, are currently educated and assessed at different levels and different institutes (e.g., the medical profession at a university, registered nurses at a University of Applied Sciences, and licensed vocational nurses in secondary vocational education). The qualitative validation approach in this study was a strength, since there is a need for qualitative evidence, such as evaluating assessment instruments for interpretability and meaning in the population of interest.Citation17 We recommend that educators build a validity argument for their IP assessment rather than rely on existing validations of particular instruments.

Implications for practice

When policymakers, educational advisors, and teachers include IP assessment to assess students’ IP competencies in their curricula, our results have several implications. To ensure positive interdependence between students, the learning goals, the assessment task, and the criteria should be designed so that collaboration between students from different professions is essential. The assessment should focus on both the team performance (using clear criteria to identify the subjects of that focus) and individual performance to stimulate individual accountability and discourage freeriding. The end goal (e.g., the IP care plan) should not be rendered insignificant in the assessment, but should always be combined with the assessment of processes. Validity should be monitored continuously to ensure that students use IP strategies instead of individual strategies to solve the assessment task.

A single assessment is insufficient to determine IP competence, and it is unfair to summatively assess students on one assessment moment. Effective IP assessments entail that the learning is represented within many collaborative experiences. The assessment requires sampling across all these experiences rather than requiring each one to be productive and successful.Citation44 If we want to educate students in IP collaboration, adding on one or two IP courses to existing curricula is not the answer; rather, learning with other students from different professions should become an important feature throughout the curricula. When designing an IP curriculum, educators should be aware of constructive alignment between the learning goals, the IPE, and the IP assessment, since IPE cannot be seen separately from the IP assessment. Lastly, it is crucial to continuously evaluate the validity and constructive alignment of the IP assessment for the assessment’s purpose, students, and context, and to adjust the assessment where necessary.

Conclusion

This study showed that validity evaluation consists of several balancing acts. The first balancing act is between authenticity and complexity. Complex tasks, like IP tasks, require a gradual build-up toward, or guidance in dealing with, the high complexity of chaotic IP practice, and as a result it may be best to introduce IP assessment in a structured way. The second balancing act is that between authenticity and scoring, in which optimal authenticity might lead to threats to scoring and vice versa. Having simultaneous optimal authenticity and scoring seems impossible, so it is important that validity is continuously evaluated to ensure authentic yet fair IP assessments for all participating professions. The third balancing act is between team scoring and individual scoring. In the IP context, collaboration is crucial, which implies that the group process predominates. Yet in the current context, students appear to use more individual strategies to solve the assessment task due to the individual focus in the assessment. The assessment should focus on both the team performance (using clear criteria to identify the subjects of that focus) and on the individual performance to stimulate individual accountability and discourage freeriding.


Acknowledgments

We would like to thank all students, teachers, and experts who participated in this study.

Disclosure statement

The authors report there are no competing interests to declare.

Additional information

Funding

This work was supported by the NWO (Dutch Organization for Scientific Research) under grant number 023.011.026.

References

  • Samarasekera DD, Nyoni CN, Amaral E, Grant J. Challenges and opportunities in interprofessional education and practice. Lancet. 2022;400(10362):1495–1497. doi:10.1016/S0140-6736(22)02086-4.
  • Center for the Advancement of Interprofessional Education. Statement of Purpose. https://www.caipe.org/resource/CAIPE-Statement-of-Purpose-2016.pdf. Updated September, 2019. Accessed October 2, 2022.
  • Skinner K, Robson K, Vien K. Interprofessional education: a unique approach to addressing the challenges of student assessment. J Interprof Care. 2021;35(4):564–573. doi:10.1080/13561820.2020.1780202.
  • Canadian Interprofessional Health Collaborative. A National Interprofessional Competency Framework. CIHC-National-Interprofessional-Competency-Framework.pdf (phabc.org) Updated February 2010. Accessed September 10, 2022.
  • Interprofessional Education Collaborative. Core competencies for interprofessional collaborative practice: 2016 update. Core Competencies for Interprofessional Collaborative Practice: 2016 Update (memberclicks.net) Updated 2016. Accessed October 1, 2022.
  • Barr H, Koppel I, Reeves S, Hammick M, Freeth D. Effective Interprofessional Education: Argument, Assumption and Evidence. Hoboken, NJ: Blackwell Publishing; 2005. doi:10.1002/9780470776445.
  • Reeves S, Fletcher S, Barr H, et al. A BEME systematic review of the effects of interprofessional education: BEME Guide No. 39. Med Teach. 2016;38(7):656–668. doi:10.3109/0142159X.2016.1173663.
  • Rogers GD, Thistlethwaite JE, Anderson ES, et al. International consensus statement on the assessment of interprofessional learning outcomes. Med Teach. 2017;39(4):347–359. doi:10.1080/0142159x.2017.1270441.
  • Van der Vleuten CPM, Schuwirth LWT. Assessment in the context of problem-based learning. Adv Health Sci Educ Theory Pract. 2019;24(5):903–914. doi:10.1007/s10459-019-09909-1.
  • Smeets HWH, Moser A, Sluijsmans DMA, Janssen-Brandt XMC, Van Merrienboer JJG. The Design of Interprofessional Performance Assessments in Undergraduate Healthcare & Social Work Education: A Scoping Review. Health, Interprofessional Practice and Education. 2021;4(2):2144. eP2144. doi:10.7710/2641-1148.2144.
  • Meijer H, Brouwer J, Hoekstra R, Strijbos JW. Exploring Construct and Consequential Validity of Collaborative Learning Assessment in Higher Education. Small Group Res. 2022;53(6):891–925. doi:10.1177/10464964221095545.
  • Downing SM. Validity: on meaningful interpretation of assessment data. Med Educ. 2003;37(9):830–837. doi:10.1046/j.1365-2923.2003.01594.x.
  • Cook DA, Brydges R, Ginsburg S, Hatala R. A contemporary approach to validity arguments: a practical guide to Kane’s framework. Med Educ. 2015;49(6):560–575. doi:10.1111/medu.12678.
  • Lockeman KS, Dow AW, Randell AL. Validity evidence and use of the IPEC Competency Self-Assessment, Version 3. J Interprof Care. 2021;35(1):107–113. doi:10.1080/13561820.2019.1699037.
  • Lunde L, Bærheim A, Johannessen A, et al. Evidence of validity for the Norwegian version of the interprofessional collaborative competency attainment survey (ICCAS). J Interprof Care. 2021;35(4):604–611. doi:10.1080/13561820.2020.1791806.
  • Sick B, Radosevich DM, Pittenger AL, Brandt B. Development and validation of a tool to assess the readiness of a clinical teaching site for interprofessional education (InSITE). J Interprof Care. 2023;37(sup1):S105–S115. doi:10.1080/13561820.2019.1569600.
  • Wolf MG. The Problem with over-Relying on Quantitative Evidence of Validity [dissertation]. Santa Barbara, USA: University of California; 2023. doi:10.31234/osf.io/v4nb2.
  • Kane MT. Validating the Interpretations and Uses of Test Scores. J Educational Measurement. 2013;50(1):1–73. doi:10.1111/jedm.12000.
  • Wiliam D. Reliability, validity, and all that jazz. Education. 2001;29(3):3–13. doi:10.1080/03004270185200311.
  • Royal KD. Four tenets of modern validity theory for medical education assessment and validation. Adv Med Educ Pract. 2017;8:567–570. doi:10.2147/AMEP.S139492.
  • Kane MT. Explicating validity. Assess Educ. 2016a;23(2):198–211. doi:10.1080/0969594X.2015.1060192.
  • American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington DC, USA: AERA Publications Sales; 2014.
  • Clauser BE. Recurrent Issues and Recent Advances in Scoring Performance Assessments. Applied Psychological Measurement. 2000;24(4):310–324. doi:10.1177/01466210022031778.
  • Haladyna TM, Downing SM. Construct-Irrelevant Variance in High-Stakes Testing. Educ Meas. 2005;23(1):17–27. doi:10.1111/j.1745-3992.2004.tb00149.x.
  • Meijer H, Hoekstra R, Brouwer J, Strijbos JW. Unfolding collaborative learning assessment literacy: a reflection on current assessment methods in higher education. Assess Eval High Educ. 2020;45(8):1222–1240. doi:10.1080/02602938.2020.1729696.
  • O’Keefe M, Henderson A, Chick R. Defining a set of common interprofessional learning competencies for health profession students. Med Teach. 2017;39(5):463–468. doi:10.1080/0142159x.2017.1300246.
  • de Nooijer J, Dolmans DHJM, Stalmeijer RE. Applying landscapes of practice principles to the design of interprofessional education. Teach Learn Med. 2022;34(2):209–214. doi:10.1080/10401334.2021.1904937.
  • Wiggins GP. Educative Assessment: Designing Assessments to Inform and Improve Student Performance. Hoboken, NJ: Jossey-Bass; 1998.
  • Van Merrienboer JJG, Kirschner PA. Ten Steps to Complex Learning. A Systematic Approach to Four-Component Instructional Design. 3d ed. New York, NY: Taylor & Francis Ltd; 2018.
  • McArthur J. Rethinking authentic assessment: work, well-being, and society. High Educ (Dordr). 2023;85(1):85–101. doi:10.1007/s10734-022-00822-y.
  • Wagner SJ, Reeves S. Milestones and entrustable professional activities: The key to practically translating competencies for interprofessional education? J Interprof Care. 2015;29(5):507–508. doi:10.3109/13561820.2014.1003636.
  • Smeets HWH, Sluijsmans DMA, Moser A, Van Merrienboer JJG. Design Guidelines for Assessing Students’ Interprofessional Competencies in Healthcare Education: A Consensus Study. Perspect Med Educ. 2022;11(6):316–324. doi:10.1007/s40037-022-00728-6.
  • Frey JH, Fontana A. The group interview in social research. J Soc Sci. 1991;28(2):175–187. doi:10.1016/0362-3319(91)90003-M.
  • Clarke A. Focus group interviews in health-care research. Prof Nurse. 1999;14(6):395–397.
  • Kane MT. Validity as the evaluation of the claims based on test scores. Assess Educ. 2016b;23(2):309–311. doi:10.1080/0969594X.2016.1156645.
  • Elo S, Kyngäs H. The qualitative content analysis process. J Adv Nurs. 2008;62(1):107–115. doi:10.1111/j.1365-2648.2007.04569.x.
  • Lincoln YS, Guba EG. Naturalistic Inquiry. Washington DC, WS: SAGE Publications; 1985.
  • Moser A, Korstjens I. Series: Practical guidance to qualitative research. Part 3: Sampling, data collection and analysis. Eur J Gen Pract. 2018;24(1):9–18. doi:10.1080/13814788.2017.1375091.
  • McTighe J. What happens between assessments? Educ Leadersh. 1997;54(4):6–12.
  • Unwin CG, Caraher J. The Heart of Authenticity: Shared Assessment in the Teacher Education Classroom. Teach Educ Q. 2000;27(3):71–87.
  • Beed PL, Hawkins EM, Roller CM. Moving Learners toward Independence: The Power of Scaffolded Instruction. Read Teach. 1991;44(9):648–655.
  • van Dongen JJJ. Interprofessional Collaboration in Primary Care Teams: development and Evaluation of a Multifaceted Programme to Enhance Patient-Centredness and Efficiency [dissertation]. Maastricht, NL: Maastricht University; 2017. doi:10.26481/dis.20171215jvd.
  • Lockeman KS, Lanning SK, Dow AW, et al. Outcomes of introducing early learners to interprofessional competencies in a classroom setting. Teach Learn Med. 2017;29(4):433–443. doi:10.1080/10401334.2017.1296361.
  • Boud D, Bearman M. The assessment challenge of social and collaborative learning in higher education. Educ Philos Theory. 2022;1–10. doi:10.1080/00131857.2022.2114346.
  • Dijkstra J, Latijnhouwers M, Norbart A, Tio RA. Assessing the "I" in group work assessment: State of the art and recommendations for practice. Med Teach. 2016;38(7):675–682. doi:10.3109/0142159X.2016.1170796.
  • Rotthoff T, Kadmon M, Harendza S. It does not have to be either or! Assessing competence in medicine should be a continuum between an analytic and a holistic approach. Adv Health Sci Educ Theory Pract. 2021;26(5):1659–1673. doi:10.1007/s10459-021-10043-0.
  • Schuur K, Murray K, Maran N, Rhona F, Paterson-Brown S. A ward-round non-technical skills for surgery (WANTSS) taxonomy. J Surg Educ. 2020;77(2):369–379. doi:10.1016/j.jsurg.2019.09.011.