Basic Research Article

Cross-cultural validity and psychometric properties of the International Trauma Questionnaire in a clinical refugee sample

Article: 2172256 | Received 18 Aug 2022, Accepted 04 Jan 2023, Published online: 14 Feb 2023

ABSTRACT

Background: The ICD-11 post-traumatic stress disorder (PTSD) and complex PTSD diagnoses have been examined in several studies using the International Trauma Questionnaire (ITQ). However, the cross-cultural validity of the ITQ has not previously been studied with item response theory (IRT) methods that focus on whether items function equally, and thus whether scores are comparable, across language groups.

Objective: To investigate the cross-cultural validity of the ITQ scales in a cross-cultural sample of refugees, specifically considering local independence of items and differential item functioning (DIF).

Method: Data from 490 treatment-seeking refugees were included, covering Danish, Arabic, and Bosnian languages and different levels of interpreter-assisted administration. Rasch and graphical log-linear Rasch models were used.

Results: There was strong local dependence among items from the same symptom clusters in the PTSD and disorders in self-organization (DSO) scales, except between the two affective dysregulation items. Weak local dependence was found between an item from the affective dysregulation cluster and an item from the disturbed relationships cluster. There was no evidence of DIF related to language or interpreter assistance. There was evidence of DIF for two PTSD items, relative to gender and to time since the traumatic event, respectively. The targeting of the scales to the study population was not optimal. Reliability varied from 0.55 to 0.78 across subgroups.

Conclusions: The PTSD and the DSO scales have stable psychometric properties across the Danish, Arabic, and Bosnian language versions and different levels of assisted administration. Scores are comparable across these groups. However, DIF relative to gender and time since trauma introduces considerable measurement bias. DIF-adjusted summed scale scores or estimated person parameters should be used to avoid measurement bias. Future research should investigate whether scales including more and/or alternative items that require higher levels of PTSD and DSO to be endorsed will improve targeting and measurement precision for refugee populations.

HIGHLIGHTS

  • The first cross-cultural validity study of the ITQ using item response theory (IRT).

  • The PTSD and DSO subscales functioned invariantly across the Danish, Arabic, and Bosnian versions and across degrees of interpreter assistance. Two PTSD items did not function invariantly across gender and time since trauma.

  • The Danish, Arabic, and Bosnian ITQ versions can be used to screen treatment-seeking refugees, provided that users take into account the item bias in the PTSD subscale and the suboptimal targeting and reliability, which call for extended or modified items.

Background: The ICD-11 post-traumatic stress disorder (PTSD) and complex PTSD diagnoses have been examined in several studies using the International Trauma Questionnaire (ITQ). The cross-cultural validity of the ITQ has not previously been studied using item response theory methods focused on the issue of equal item functioning, and therefore neither has the comparability of scores across language groups.

Objective: To investigate the cross-cultural validity of the ITQ scales, specifically considering the local independence of items and differential item functioning (DIF), in a cross-cultural sample of refugees.

Method: Data from 490 treatment-seeking refugees were included, covering the Danish, Arabic, and Bosnian languages and different levels of interpreter-assisted administration. Rasch and graphical log-linear Rasch models were used.

Results: There was strong local dependence between items from the same symptom clusters in the PTSD and disorders in self-organization (DSO) scales, with the exception of the two affective dysregulation items in the DSO scale. In addition, weak local dependence was found between item c2 from the affective dysregulation cluster and item c5 from the disturbed relationships cluster. There was no evidence of DIF relative to language or to the degree of interpreter assistance. There was evidence of DIF for two PTSD items, relative to gender (p6; feeling jumpy or easily startled) and to time since the traumatic event (p2; powerful images or memories that sometimes come into your mind in which you feel the experience is happening again in the here and now). The targeting of the scales to the study population was not optimal, and reliability varied from 0.55 to 0.78 across subgroups.

Conclusions: The PTSD and DSO scales have stable psychometric properties across the Danish, Arabic, and Bosnian versions and across different levels of assisted administration, and scores are comparable across these groups. However, DIF relative to gender and time since trauma introduces considerable measurement bias. Therefore, DIF-adjusted summed scale scores or estimated person parameters should be used to avoid measurement bias. Future research should investigate whether scales including more and/or alternative items that require higher levels of PTSD and DSO to be endorsed will improve targeting and measurement precision for refugee populations.

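Background: Several studies using the International Trauma Questionnaire (ITQ) have examined the ICD-11 post-traumatic stress disorder (PTSD) and complex PTSD diagnoses. The cross-cultural validity of the ITQ has not previously been studied with item response theory methods, which focus on the issue of equal item functioning and the comparability of scores across language groups.

Objective: To examine the cross-cultural validity of the ITQ scales in a cross-cultural refugee sample, with particular consideration of the local independence of items and differential item functioning (DIF).

Method: Data from 490 treatment-seeking refugees were included, covering Danish, Arabic, and Bosnian and different levels of interpreter-assisted administration. Rasch and graphical log-linear Rasch models were used.

Results: Except for the two affective dysregulation items in the DSO scale, there was strong local dependence between items from the same symptom clusters in the PTSD and disorders in self-organization (DSO) scales. In addition, weak local dependence was found between item c2 from the affective dysregulation cluster and item c5 from the disturbed relationships cluster. There was no evidence of DIF related to language or to the degree of interpreter assistance. There was evidence of DIF for two PTSD items, relative to gender (p6; feeling jumpy or easily startled) and to time since the traumatic event (p2; powerful images or memories that sometimes come into your mind in which you feel the experience is happening again in the here and now). The targeting of the scales to the study population was not optimal, and reliability ranged from 0.55 to 0.78 across subgroups.

Conclusions: The PTSD and DSO scales have stable psychometric properties across the Danish, Arabic, and Bosnian versions and across different levels of assisted administration, and scores are comparable across these groups. However, DIF relative to gender and time since trauma introduces considerable measurement bias. Therefore, DIF-adjusted total scale scores or estimated person parameters should be used to avoid measurement bias. Future research should examine whether scales that include more and/or alternative items, requiring higher levels of PTSD and DSO to be endorsed, can improve targeting and measurement precision in refugee populations.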

1. Introduction

According to the United Nations High Commissioner for Refugees, the number of refugees worldwide is increasing and has reached more than 82 million people (UNHCR, 2022). Treatment-seeking refugees face extensive physical, psychological, and social problems, and many have considerable difficulty adapting to life in their new host countries (Palic et al., 2014). A recent meta-analysis of 66 studies including 15,000 refugees (Henkelmann et al., 2020) reported a prevalence of diagnosed and self-reported post-traumatic stress disorder (PTSD) of between 29% and 37%. The 11th revision of the International Classification of Diseases (ICD-11) (WHO, 2018) includes a revised formulation of PTSD and a new, related but distinct disorder, complex post-traumatic stress disorder (CPTSD). ICD-11 PTSD comprises three symptom clusters: re-experiencing the trauma as if it were recurring in the present, avoidance of reminders of the trauma, and a persistent sense of heightened threat. CPTSD is characterized by the PTSD symptoms plus three further symptom clusters reflecting disorders in self-organization (DSO): affective dysregulation, negative self-concept, and disturbances in relationships. CPTSD was introduced in ICD-11 to describe more complex and debilitating symptom presentations associated with repeated or prolonged exposure to traumatic stressors (Cloitre, Garvert, et al., 2013), such as those endured by many refugees. A recent systematic review estimated CPTSD prevalence rates of between 16% and 38% among treatment-seeking refugees (de Silva et al., 2021).

Initial construct validity of the ICD-11 diagnoses has been supported among treatment-seeking (Nickerson et al., 2016; Palic et al., 2016) and resident refugees (Frost et al., 2019; Tay et al., 2019). The ITQ (Cloitre et al., 2018) was developed to screen for symptoms of ICD-11 PTSD and CPTSD, and its factor structure has been investigated across cultures and across community and treatment-seeking samples, including samples from Asia, Lithuania, the USA, and Israel (Redican et al., 2021), Denmark (Vang et al., 2021), and Ghana, Kenya, and Nigeria (Charak et al., 2022). These studies found support for the two-factor second-order model, which fits well with the ICD-11 conceptualization of PTSD and CPTSD as two hierarchical diagnoses, whereas a few studies in other countries, including Israel and Norway, supported a first-order factor model (Redican et al., 2021). To our knowledge, only two studies have examined the latent structure of the ITQ in refugees using factor analysis: Vang et al. (2020, 2021; both based on the same sample) and Vallières et al. (2018). Both found support for a two-factor hierarchical structure, and Vang et al. (2021) also identified a two-class latent structure resembling that found in most other ITQ studies, thus lending support to the cross-cultural validity of the ITQ. However, the validity and precision of the ICD-11 formulations of post-traumatic symptomatology in cross-cultural populations remain understudied. Further, the influence of the degree of interpreter- or clinician-assisted administration, which is a prerequisite for clinical utility with refugee populations, has never been studied.

Adequate treatment requires screening tests that identify diagnostic status and the effect of treatment. Because refugees come to Western countries from many parts of the world, producing valid measures for various cultures and languages is a considerable challenge. Specifically, while evidence suggests that ICD-11 CPTSD has been more successful in describing a ‘global nosology’ (Lewis-Fernández & Kirmayer, 2019) than its predecessor, DESNOS (disorders of extreme stress not otherwise specified), significant cross-cultural differences in the expression and formation of symptoms remain and require further research (Heim et al., 2022). Heim et al. (2022) distinguish between emic and etic research. Emic research (an insider’s view) focuses on cultural concepts of distress and how these might differ in terms of descriptions, explanatory models, and culturally specific syndromes that include symptoms from otherwise separate diagnoses in the ICD or Diagnostic and Statistical Manual of Mental Disorders (DSM) conceptualizations. Conversely, etic research (an outsider’s view) employs universal formulations of syndromes and symptoms and compares their occurrence and prevalence rates across cultures (Heim et al., 2022). As an example of etic research, Vindbjerg et al. (2020) used Rasch measurement models to study the construct validity and psychometric properties of the Harvard Trauma Questionnaire (HTQ), a screening instrument for PTSD, among Arab and Iranian refugees in psychiatric treatment. They recommended a new scoring procedure, with items divided into two subscales and scores adjusted for differential item functioning (DIF) relative to language, to accommodate quantitative differences across cultures. The present study likewise explores the applicability of ICD-11 PTSD and CPTSD from an etic perspective.

As in other PTSD populations, treatment-seeking refugees commonly suffer from PTSD-related cognitive deficits (Buckley et al., 2000). In addition, refugees can have limited reading abilities, sometimes also in their first language (Vallières et al., 2018). To alleviate such difficulties in clinical practice, self-report schedules are regularly administered with assistance from an interpreter or a clinician, and interpretation can also be necessary to convey the meaning of spoken interactions between the refugee and the therapist. In a study of 816 refugee patients with the HTQ as the primary outcome, the subsample in interpreter-mediated psychotherapy improved less in mental health status than the subsample without an interpreter (Jensen et al., 2018). Furthermore, Vindbjerg et al. (2020) noted that the language DIF in the HTQ was most pronounced when an interpreter had been used. Hence, an unresolved challenge in assessing PTSD among refugees is whether the instruments themselves are valid for subgroups assessed with different language versions and with varying degrees of interpreter assistance.

Currently, the ITQ (Cloitre et al., 2018) is the only measure designed to assess symptoms of ICD-11 PTSD and CPTSD. Previous research has relied primarily on confirmatory factor analysis (CFA) to test the dimensional structure implied in the ICD-11 proposal for PTSD and CPTSD. Studies using item response theory (IRT) approaches to investigate the psychometric properties of ICD-11 items in more detail are, however, scarce. IRT models the actual item responses and thus allows a more detailed investigation of how individual items perform as indicators of latent constructs. Of particular importance is DIF across subgroups (e.g. patients who received different language versions or differing degrees of assistance in responding), which, if present, can bias estimates of symptom severity for some subgroups and thereby undermine the validity of assessments based on the instrument (Holland & Wainer, 2009). Of equal importance is the conditional independence of items given the latent construct, as locally dependent items reduce the reliability of a scale (Marais, 2012). Given the ambition of ICD-11 to improve the clinical utility and global applicability of the diagnoses (Maercker et al., 2013), IRT-based studies of different language versions of the ITQ in cross-cultural samples are an important step towards achieving this ambition.

To our knowledge, only three studies have used IRT methods to study the psychometric properties of the ITQ scales (Christen et al., 2021; Cloitre et al., 2018; Shevlin, Hyland, Roberts, et al., 2018). To reduce the number of items to six PTSD items and six DSO items, Cloitre et al. (2018) combined endorsement rates, item discrimination and/or item difficulty, and clinical relevance for 28 dichotomized items of the extended research version of the ITQ in UK clinical and community samples. Before reducing the number of items, they assessed whether a one-parameter model (item difficulty as the only parameter) or a two-parameter model (item difficulty and item discrimination) better described the extended PTSD and DSO scales in the clinical and the community sample, respectively. They found that the more parsimonious one-parameter model was preferable, except for the DSO scale in the community sample. Using the reduced 12-item ITQ, they then assessed whether a two-dimensional model (PTSD and DSO) or a six-dimensional model (three PTSD and three DSO symptom clusters) was more appropriate, and found that neither was superior to the other. Lastly, Cloitre et al. (2018) tested the six-dimensional model across the community and clinical samples for configural invariance (i.e. the same factor structure), scalar invariance (i.e. for the entire set of items, the intercept/difficulty of each item is equal across subgroups), and DIF (i.e. for each grouping variable, the difficulty of each item is equal across subgroups), and concluded that there was no substantial lack of invariance. Their findings concerning dimensionality and invariance for the 12-item ITQ were largely replicated in a German study with nationally representative data, also using dichotomized items (Christen et al., 2021). That study likewise found no difference in how well the one- and two-parameter models described the PTSD scale but, as for the community sample in Cloitre et al. (2018), suggested that the two-parameter model described the DSO scale better. Furthermore, Christen et al. (2021) called for IRT analyses of the ordinal rather than the dichotomized items in both clinical and community samples, as well as tests for DIF in clinical samples, to further understand item functioning. To date, the study by Shevlin, Hyland, Roberts, et al. (2018) is the only IRT study to have analysed the ordinal ITQ items. The authors did so to reduce the number of items in the DSO scale from 16, using a nationally representative sample from North America (N = 1839), and concluded that the DSO items were largely interchangeable in terms of item difficulty and discrimination.

The short-form versions of the ITQ PTSD and DSO scales resulting from the work of Cloitre et al. (2018) are currently used to assess symptoms of ICD-11 PTSD and CPTSD in multiple countries. However, no studies to date have employed IRT methods to: (1) evaluate the psychometric properties of the five-point ordinal items of the short-form scales in clinical samples; (2) focus on local independence of items and DIF, as well as the impact of both on the reliability and validity of measurement; or (3) investigate the targeting of items to diverse clinical patient samples. Such studies are needed to evaluate the short-form ITQ as a screening tool for ICD-11 PTSD and CPTSD in cross-cultural clinical refugee populations, and as a measure for treatment monitoring. Of particular clinical importance in refugee populations are studies of potential DIF associated with different language versions of the ITQ and with interpreter-assisted assessment.

The current study begins to fill these gaps with the first investigation of the Danish, Arabic, and Bosnian translations of the ITQ using Rasch measurement models, in a multicultural sample of treatment-seeking refugees in Denmark. Specifically, we investigated the cross-cultural validity of the ITQ scales with a focus on local independence of items and DIF, using Rasch and other IRT models.

2. Method

2.1. Participants and procedure

Participants were refugees referred to the Rehabilitation Center for Trauma Survivors (RCT), a specialized treatment clinic for PTSD in Haderslev, Denmark. All participants were referred by private practitioners or psychiatrists for assessment and treatment of PTSD. Participants were offered treatment if they met the inclusion criterion of trauma-related mental health problems, often PTSD in combination with comorbid mental and physical health problems. Referred refugees were not offered treatment if they suffered from psychotic disorders, were actively suicidal, or did not have legal residence in Denmark. Referred patients who did not meet the inclusion criteria were referred on to other relevant treatment in coordination with the patient’s private practitioner. All patients offered treatment at RCT from December 2015 to March 2022 were enrolled in an assessment programme that included the ITQ at intake (baseline). Patients who dropped out of the assessment programme were offered the same treatment as those who participated. Data from 490 patients were included in this study. The mean age of the participants was 41.9 years (SD 10.47). Further demographic characteristics of the study sample are shown in Table 1, and the prevalence of the types of trauma reported by the participants on the ITQ is shown in Table 2.

Table 1. Demographic characteristics of the study sample.

Table 2. Type and prevalence of traumatic events in the study sample.

2.2. Instrument

The ITQ is a self-report instrument with 12 items measuring PTSD and DSO (Table 3). The PTSD scale includes two items measuring re-experiencing, two items measuring avoidance, and two items measuring sense of threat. The DSO scale includes two items measuring affective dysregulation, two negative self-concept items, and two items regarding disturbances in relationships. The ITQ also includes six further items tapping functional impairment related to the PTSD and DSO symptoms (three for each). These contribute to the diagnosis of PTSD and CPTSD according to the ITQ guidelines but are not part of the PTSD and DSO scales, and they were therefore not included in the current study. Patients rate the ITQ items on a five-point response scale from 0 to 4 (Table 3). For diagnosis, a symptom is considered endorsed if a patient scores ≥ 2. An ITQ diagnosis of PTSD requires endorsement of at least one of the two symptoms in each of the three PTSD clusters as well as endorsement of at least one of the three PTSD-related functional impairment items. An ITQ diagnosis of CPTSD requires fulfilling the criteria for PTSD, plus endorsement of at least one of the two symptoms in each DSO cluster, as well as endorsement of at least one of the three DSO-related functional impairment items.
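The diagnostic rules above amount to a short decision procedure. The sketch below is a minimal illustration under stated assumptions, not the official ITQ scoring syntax: the item keys (p1 to p6, c1 to c6, and the impairment keys fp1 to fp3 and fc1 to fc3) are hypothetical labels, it assumes that three impairment items relate to PTSD and three to DSO, and it assumes that a person meeting the CPTSD criteria is assigned CPTSD rather than PTSD.

```python
# Sketch of the ITQ diagnostic rules; all item keys below are hypothetical labels.
ENDORSED = 2  # a response of 2 or higher on the 0-4 scale counts as endorsed

PTSD_CLUSTERS = {
    "re-experiencing": ("p1", "p2"),
    "avoidance": ("p3", "p4"),
    "sense of threat": ("p5", "p6"),
}
DSO_CLUSTERS = {
    "affective dysregulation": ("c1", "c2"),
    "negative self-concept": ("c3", "c4"),
    "disturbed relationships": ("c5", "c6"),
}
# Assumption: three PTSD-related and three DSO-related functional impairment items.
PTSD_IMPAIRMENT = ("fp1", "fp2", "fp3")
DSO_IMPAIRMENT = ("fc1", "fc2", "fc3")


def any_endorsed(responses, items):
    """True if at least one of the listed items is endorsed."""
    return any(responses[item] >= ENDORSED for item in items)


def meets_all(responses, clusters):
    """True if every symptom cluster has at least one endorsed item."""
    return all(any_endorsed(responses, items) for items in clusters.values())


def itq_diagnosis(responses):
    """Return 'CPTSD', 'PTSD', or 'none' for a dict mapping item keys to 0-4 responses."""
    ptsd = meets_all(responses, PTSD_CLUSTERS) and any_endorsed(responses, PTSD_IMPAIRMENT)
    cptsd = (ptsd
             and meets_all(responses, DSO_CLUSTERS)
             and any_endorsed(responses, DSO_IMPAIRMENT))
    # Assumption: a person meeting the CPTSD criteria is assigned CPTSD, not PTSD.
    if cptsd:
        return "CPTSD"
    if ptsd:
        return "PTSD"
    return "none"


if __name__ == "__main__":
    keys = "p1 p2 p3 p4 p5 p6 c1 c2 c3 c4 c5 c6 fp1 fp2 fp3 fc1 fc2 fc3".split()
    example = dict.fromkeys(keys, 0)
    example.update(p2=3, p4=2, p6=4, fp1=2)
    print(itq_diagnosis(example))  # PTSD
```

Because every rule is an "at least one endorsed item" check, the function only needs two helpers; swapping in the real item labels of a given ITQ data file would be the only change needed to apply it.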

Table 3. Percentage distribution of responses to post-traumatic stress disorder (PTSD) and disturbances in self-organization (DSO) items of the International Trauma Questionnaire (ITQ).

The ITQ includes an open-ended question assessing the index trauma and time since the trauma. Participants are asked to self-report problems and symptoms based on the index trauma(s).

The ITQ was developed in English (Cloitre, Roberts, et al., 2013), and a Danish translation, whose back-translation was approved by Marylene Cloitre, was used by Møller et al. (2020). The Danish version was further translated into Arabic and Bosnian at RCT using a forward–backward translation procedure involving experienced psychologists and trained, experienced interpreters with an extensive understanding of the constructs. In 2019, the item translations were aligned with the English version by Cloitre et al. (2018). The Danish, Arabic, and Bosnian versions are all available at https://www.traumameasuresglobal.com/itq. In the current study, we used slightly different Danish response categories with the same meaning (‘slet ikke, lidt, noget, en hel del, rigtig meget’ [not at all, a little, somewhat, quite a lot, very much] rather than ‘slet ikke, en lille smule, moderat, en hel del, ekstremt meget’ [not at all, a little bit, moderately, quite a lot, extremely]).

When completing the ITQ, patients were accompanied by a clinician, an interpreter, or both. Some patients chose to complete the ITQ by themselves, using the Danish, Arabic, or Bosnian version that matched the language they spontaneously spoke in the room. If needed, the clinician or interpreter read the items aloud and recorded the answers, thus offering various degrees of assistance and clarification. The degree of assistance was recorded at the end of the assessment (see Table 4). Patients who completed the ITQ alone without any clarifying questions were recorded as having received ‘no interpreter assistance’. When the interpreter provided clarifications to help the patient understand a few of the ITQ items (e.g. some patients asked ‘please explain what upset means’), ‘few clarifications’ was recorded. For patients who required the items to be read aloud and needed clarification of several items, ‘read aloud and many clarifications’ was recorded. If patients required the ITQ to be read aloud and needed extensive assistance from the interpreter or clinician to rate the items, ‘read aloud and helped formulating answers’ was recorded. Some patients chose the Danish version, which was then translated directly into Arabic, resulting in a discrepancy between the spoken language (Arabic) and the ITQ version used (Danish translated into Arabic). The trained interpreters at RCT had 8 to 30 years of experience with translation. The distribution of the different levels of interpreter assistance within language groups is provided in Table 4.

Table 4. Distribution of degree of interpreter assistance provided to patients within language groups.

2.3. Rasch measurement models

The simplest IRT model is the one-parameter logistic model, also known as the Rasch model (RM) for dichotomous items (Rasch, 1960). In this study, we used the partial credit model (PCM) (Masters, 1982), a generalization of the Rasch model for ordinal data, and graphical log-linear Rasch models (GLLRMs) (Kreiner & Christensen, 2002, 2004, 2007), which are extended Rasch models. The dichotomous RM and the PCM share the same requirements for measurement (Kreiner, 2013; Mesbah & Kreiner, 2012); we therefore use the term ‘RM’ for both. The five basic requirements for measurement by the RM are: (1) unidimensionality, i.e. the items of a scale measure a single underlying latent construct; (2) monotonicity, i.e. the expected item scores increase with increasing values of the latent variable; (3) local independence (no local dependence; no LD), i.e. the item responses are conditionally independent given the latent variable; (4) absence of differential item functioning (no DIF), i.e. item responses and relevant background (exogenous) variables are conditionally independent given the latent variable; and (5) homogeneity, i.e. the rank order of the item parameters (item ‘difficulties’) is the same for all persons regardless of their level on the latent variable.
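To make the PCM and requirement (2) concrete, the sketch below computes the category probabilities of a single 0 to 4 item under the standard PCM parameterization, in which the probability of category x is proportional to exp of the sum of (θ − δ_k) over the first x thresholds. The threshold values are hypothetical, and DIGRAM's internal parameterization may differ; the point is only that the expected item score rises with θ, as monotonicity requires.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities P(X = x | theta), x = 0..m, under the partial credit model.

    thresholds[k - 1] is delta_k, the threshold between categories k - 1 and k.
    P(X = x) is proportional to exp(sum_{k=1}^{x} (theta - delta_k)); the sum is empty for x = 0.
    """
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    top = max(exponents)  # subtract the maximum to avoid overflow in exp()
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, thresholds):
    """Expected item score E[X | theta]; it increases with theta (monotonicity)."""
    return sum(x * p for x, p in enumerate(pcm_probabilities(theta, thresholds)))


if __name__ == "__main__":
    deltas = [-1.5, -0.5, 0.4, 1.6]  # hypothetical thresholds for one 0-4 item
    for theta in (-2.0, 0.0, 2.0):
        probs = pcm_probabilities(theta, deltas)
        print(theta, [round(p, 3) for p in probs], round(expected_score(theta, deltas), 3))
```

With a single threshold the function reduces to the dichotomous RM, which is why the two models share the same measurement requirements.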

The first four requirements apply to all parametric IRT models, and fulfilling them provides criterion-related construct validity as defined by Rosenbaum (1989), whereas the RM is the only parametric IRT model that also fulfils the requirement of homogeneity (Rasch, 1960, 1961). When all five requirements are fulfilled, the sum score is a sufficient statistic, i.e. the item response profile provides no information on the latent variable beyond that provided by the total score. Sufficiency of the raw sum score distinguishes scales fitting Rasch models from scales fitting other IRT models (Kreiner, 2013), and it is desirable whenever the summed raw score of a scale is to be used. In addition, fit to the RM permits the use of so-called ‘Rasch scores’, which are the person parameter estimates expressed on the logit scale, on which the distance between any two values is equal (often referred to as ‘interval-scaled scores’). Thus, either the Rasch scores or the raw sum scores can be used in subsequent analyses, as preferred by the individual researcher or practitioner for their specific purpose.
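Sufficiency can be checked numerically. The sketch below uses the dichotomous RM with hypothetical item difficulties and enumerates every response pattern with a given total score: the conditional probability of each pattern given that score comes out the same at a low and a high θ, so once the sum score is known the pattern carries no further information about θ. This is an illustration of the property, not an estimation routine; the PCM case behaves analogously.

```python
import itertools
import math


def pattern_probability(pattern, theta, difficulties):
    """P(pattern | theta) under the dichotomous Rasch model with local independence."""
    prob = 1.0
    for x, b in zip(pattern, difficulties):
        p_endorse = 1.0 / (1.0 + math.exp(-(theta - b)))
        prob *= p_endorse if x == 1 else 1.0 - p_endorse
    return prob


def conditional_pattern_probs(theta, difficulties, score):
    """P(pattern | total score = score, theta) for every pattern with that score."""
    patterns = [p for p in itertools.product((0, 1), repeat=len(difficulties))
                if sum(p) == score]
    joint = {p: pattern_probability(p, theta, difficulties) for p in patterns}
    total = sum(joint.values())
    return {p: v / total for p, v in joint.items()}


if __name__ == "__main__":
    b = [-1.0, 0.0, 0.5, 1.5]  # hypothetical item difficulties
    low = conditional_pattern_probs(-1.0, b, score=2)
    high = conditional_pattern_probs(2.0, b, score=2)
    for pattern in low:
        # The two columns agree: theta cancels once we condition on the sum score.
        print(pattern, round(low[pattern], 4), round(high[pattern], 4))
```

Algebraically, the θ-dependent factors are identical for all patterns with the same score and cancel in the ratio; under a two-parameter model with unequal discriminations they would not.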

When fit to the RM is rejected, close to optimal measurement is still possible if the only departures from the RM are uniform DIF and/or uniform LD (Footnote 1) (Kreiner & Christensen, 2007), as such departures can be adjusted for and tested within the extended GLLRM. If a GLLRM adjusts for uniform LD only, the sum score remains sufficient, but the reliability of the scale is reduced to some degree (Kreiner & Christensen, 2002, 2004, 2007). If a GLLRM adjusts for uniform DIF, the sum score is no longer a sufficient statistic for the latent variable unless DIF equating is carried out in subsequent comparisons of subgroup scores to avoid biased results (Kreiner, 2007).
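The sketch below illustrates why uniform DIF calls for equating. It uses a hypothetical three-item scale in which one item's thresholds are shifted down by 0.8 logits for a focal group, so that group finds the item easier to endorse at every latent level, similar in kind to the gender DIF the Results report for item p6. Person locations are obtained by plain maximum likelihood, solving that the sum of expected item scores equals the observed sum score; this is a simplification, since the study reports weighted ML estimates from DIGRAM. The same raw score then maps to a lower latent estimate in the focal group.

```python
import math


def pcm_expected(theta, thresholds):
    """Expected score of one partial-credit item at theta."""
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    return sum(x * w for x, w in enumerate(weights)) / sum(weights)


def ml_theta(score, items, lo=-10.0, hi=10.0, tol=1e-9):
    """ML person estimate: solve sum_i E[X_i | theta] = score by bisection."""
    max_score = sum(len(t) for t in items)
    if not 0 < score < max_score:
        raise ValueError("the ML estimate is infinite for extreme scores")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(pcm_expected(mid, t) for t in items) < score:
            lo = mid  # expected score too low, so theta must be higher
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical three-item scale, items scored 0-4 (four thresholds each).
    reference = [[-1.0, -0.2, 0.6, 1.4], [-1.4, -0.6, 0.2, 1.0], [-0.8, 0.0, 0.8, 1.6]]
    # Uniform DIF: item 3 is 0.8 logits easier at every threshold in the focal group.
    focal = reference[:2] + [[d - 0.8 for d in reference[2]]]
    for score in (4, 6, 8):
        print(score, round(ml_theta(score, reference), 3), round(ml_theta(score, focal), 3))
```

Comparing raw sum scores across the two groups would therefore overstate the focal group's latent level; working with group-specific person estimates is one way of equating.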

2.3.1. Item analyses by RM and GLLRM

The analysis included several steps in order to test the fit of the set of items in each scale to the RM or a GLLRM as rigorously as possible:

  • overall test of homogeneity of the item parameters across low and high scoring groups

  • overall tests of invariance relative to important background variables (language groups, assistance [Footnote 2], time since trauma, gender, and age groups; see )

  • tests of no DIF for all items relative to the same background variables

  • tests of local independence for all item pairs

  • fit of the individual items to the RM.

The steps were not necessarily taken in the order presented above. For example, if evidence of LD or DIF turned up when testing fit to the RM, log-linear interactions were added to the model one by one, and the steps were repeated iteratively until no further evidence against fit to the model was found.

When the final iteration was completed and fit to an RM or a GLLRM was established, the analyses were concluded by:

  • evaluation of the effect of any DIF discovered

  • evaluation of targeting and reliability relative to the study sample

  • estimation of person parameter estimates, standard error, and bias of measurement.

Overall tests of fit (i.e. global homogeneity, by comparison of item parameters in low- and high-scoring groups, and overall invariance) were conducted using Andersen’s (1973) conditional likelihood ratio (CLR) test. Individual item fit was assessed by conditional infit and outfit statistics (Kreiner & Christensen, 2013; Kreiner & Nielsen, 2013) and tested by comparing the observed item–restscore correlations with those expected under the specified model (Kreiner & Christensen, 2004). The presence of LD and DIF in GLLRMs was tested with Kelderman’s (1984) CLR test of local independence, as well as with tests of conditional independence using partial Goodman–Kruskal gamma coefficients for the conditional association, given the score, between item pairs (presence of LD) or between items and exogenous variables (presence of DIF) (Kreiner & Christensen, 2004).
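The partial gamma coefficient can be sketched as follows: pairs of respondents are compared only within the same score stratum, and concordant and discordant pairs are then pooled across strata. For an LD check, x and y would be two items and the strata, for example, a rest score; for a DIF check, x would be an item, y a background variable, and the strata the total score. This is a plain pair-counting illustration with hypothetical data; DIGRAM's implementation and its significance tests are not reproduced here.

```python
from collections import defaultdict


def partial_gamma(x, y, strata):
    """Partial Goodman-Kruskal gamma between ordinal variables x and y given strata.

    Pairs of respondents are compared only within the same stratum (for example the
    same total or rest score); concordant and discordant pairs are pooled over strata.
    Pairs tied on x or y are ignored, as in the ordinary gamma coefficient.
    """
    by_stratum = defaultdict(list)
    for xi, yi, si in zip(x, y, strata):
        by_stratum[si].append((xi, yi))
    concordant = discordant = 0
    for pairs in by_stratum.values():
        for i in range(len(pairs)):
            for j in range(i + 1, len(pairs)):
                sign = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
                if sign > 0:
                    concordant += 1
                elif sign < 0:
                    discordant += 1
    if concordant + discordant == 0:
        return float("nan")  # no untied pairs within any stratum
    return (concordant - discordant) / (concordant + discordant)


if __name__ == "__main__":
    # Hypothetical data: two items and a rest score for eight respondents.
    item_a = [0, 1, 2, 3, 1, 2, 3, 4]
    item_b = [0, 1, 1, 3, 2, 2, 4, 4]
    rest = [2, 2, 2, 2, 5, 5, 5, 5]
    print(round(partial_gamma(item_a, item_b, rest), 3))
```

Under the model, items should be conditionally independent given the score, so a partial gamma clearly above zero for an item pair points to positive LD, and one clearly different from zero for an item and a background variable points to DIF.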

Dimensionality across the PTSD and DSO scales was tested by comparing the observed γ correlation between the scales with the γ correlation expected under the unidimensional model, as two scales measuring different constructs will correlate significantly more weakly than expected under the unidimensional model (Horton et al., 2013). To avoid falsely rejecting unidimensionality across the two scales, we tested this using the final scale models including any uniform LD, because unrecognized positive LD within scales can create false evidence of multidimensionality (i.e. negative LD between the scales).

The Benjamini–Hochberg procedure was used to control the false discovery rate (FDR) under multiple testing whenever appropriate (Benjamini & Hochberg, 1995). Following the recommendation of Cox et al. (1977), we refrained from using a deterministic 5% critical limit for p-values and instead, as suggested by Poulsen et al. (2018), distinguished between weak to moderate evidence against the model when p-values were larger than .01 and stronger evidence when p-values were less than .01.
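For readers unfamiliar with the procedure, a minimal sketch of Benjamini–Hochberg adjusted p-values follows. The text does not state which families of tests were adjusted together, so the input here is simply a hypothetical list of p-values from one family; in practice a library routine such as statsmodels' multipletests with method 'fdr_bh' gives the same adjustment.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order.

    Rejecting every hypothesis with adjusted p <= q controls the false discovery rate at q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone (and capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values from four tests
    print([round(a, 4) for a in benjamini_hochberg(p)])  # [0.02, 0.04, 0.04, 0.02]
```

The adjusted values can then be read against the same .01 boundary between weak to moderate and stronger evidence described above.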

Reliability was estimated using Hamon and Mesbah’s (2002) Monte Carlo method for the reliability of Rasch scales, as this method can handle violations of the assumption of conditionally independent items. Targeting was assessed both graphically and numerically using two indices. For the numerical evaluation, we estimated the test information target index as the mean test information divided by the maximum test information for theta, and the root mean squared error (RMSE) target index as the minimum standard error of measurement (SEM) divided by the mean SEM for theta (Kreiner & Christensen, 2013). Both indices should preferably be close to one. In addition, we estimated the target of the observed score and the SEM of the observed score. For the graphical representation of targeting and test information, we plotted item maps showing the distribution of item threshold locations against weighted maximum likelihood estimates of the person parameters and against the person parameter distribution for the population (assuming a normal distribution), together with the information function.
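The two targeting indices can be illustrated with a short sketch. It assumes PCM items with hypothetical thresholds, computes scale information as a sum of item information (which presumes locally independent items, unlike the GLLRMs fitted here), takes the maximum information over a θ grid, and averages over a supplied sample of person locations. Both ratios are at most one and approach one when persons sit where the scale is most informative. DIGRAM's exact computation may differ.

```python
import math


def item_information(theta, thresholds):
    """Fisher information of one partial-credit item: the variance of its score at theta."""
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(x * p for x, p in enumerate(probs))
    return sum((x - mean) ** 2 * p for x, p in enumerate(probs))


def scale_information(theta, items):
    """Scale information as a sum over items (assumes locally independent items)."""
    return sum(item_information(theta, t) for t in items)


def targeting_indices(person_thetas, items, grid=None):
    """Return (test information target index, RMSE target index) for a sample of thetas."""
    if grid is None:
        grid = [g / 100 for g in range(-600, 601)]  # theta from -6 to 6 in steps of 0.01
    points = list(grid) + list(person_thetas)
    max_info = max(scale_information(t, items) for t in points)
    min_sem = 1 / math.sqrt(max_info)  # SEM = 1 / sqrt(information), smallest at peak information
    person_info = [scale_information(t, items) for t in person_thetas]
    mean_info = sum(person_info) / len(person_info)
    mean_sem = sum(1 / math.sqrt(i) for i in person_info) / len(person_info)
    return mean_info / max_info, min_sem / mean_sem


if __name__ == "__main__":
    import random

    random.seed(1)
    items = [[-1.5, -0.5, 0.4, 1.6]] * 6  # hypothetical: six identical 0-4 items
    sample = [random.gauss(1.5, 1.0) for _ in range(500)]  # hypothetical person locations
    info_index, rmse_index = targeting_indices(sample, items)
    print(round(info_index, 3), round(rmse_index, 3))
```

A sample located far from the item thresholds drives both indices down, which is the pattern the study describes as suboptimal targeting.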

2.3.2. Software

All analyses were conducted using DIGRAM software (Kreiner, 2003; Kreiner & Nielsen, 2013), and item maps were created with R version 3.6.1.

3. Results

Neither the PTSD nor the DSO scale fitted the pure RM (Table 5, RM columns). However, the item analyses resulted in each scale fitting a GLLRM, although of differing complexity (Table 5, GLLRM columns; Figure 1). There was no evidence against the fit of individual items to the respective GLLRMs (Tables S1 and S2 in the Supplementary material).

Figure 1. Final graphical log-linear Rasch models for the post-traumatic stress disorder (PTSD) scale (top) and the disturbances in self-organization (DSO) scale (bottom). Note: γ-correlations are partial Goodman and Kruskal’s rank correlations for ordinal data. Lang, language versions; TimeT, time since trauma exposure.

Table 5. Global tests of homogeneity and invariance for the post-traumatic stress disorder (PTSD) and disturbances in self-organization (DSO) scales under the Rasch model (RM) and final graphical log-linear Rasch model (GLLRM).

The PTSD scale fitted a relatively complex GLLRM with strong LD between the two re-experiencing items p1 and p2, the two avoidance items p3 and p4, and the two sense of threat items p5 and p6 (see for item content). Item p6 also functioned differentially in relation to gender: female patients were systematically more likely than male patients to report being bothered by feeling jumpy or easily startled, regardless of their level of PTSD. In addition, item p2 functioned differentially in relation to time since the traumatic event: patients with a shorter time since the event were systematically more likely than patients with a longer time since the event to report being bothered by powerful images or memories coming into their mind, regardless of their level of PTSD.
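What "regardless of their level of PTSD" means in model terms can be sketched with a partial credit item that carries a uniform DIF term. In a log-linear Rasch model, uniform DIF adds x·δ to the log-numerator of category x for the focal group; here that is done by adding δ at every step. The thresholds for p6 and the DIF size δ = 0.5 are hypothetical, not the estimates reported in the Supplementary material.

```python
import math


def category_probabilities(theta, thresholds, dif=0.0):
    """Partial credit item with an optional uniform DIF term.

    Adding `dif` at every step adds x * dif to the log-numerator of
    category x. A positive value makes higher categories more likely
    at every theta.
    """
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + theta - tau + dif)
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_item_score(theta, thresholds, dif=0.0):
    """Expected item score at theta, optionally with the DIF term."""
    probs = category_probabilities(theta, thresholds, dif)
    return sum(x * p for x, p in enumerate(probs))


# Hypothetical thresholds for item p6 (scored 0-4) and a hypothetical DIF size for women.
p6_thresholds = [-1.0, -0.2, 0.6, 1.4]
female_dif = 0.5

for theta in (-1.0, 0.0, 1.0, 2.0):
    male = expected_item_score(theta, p6_thresholds)
    female = expected_item_score(theta, p6_thresholds, female_dif)
    print(f"theta={theta:+.1f}  male={male:.2f}  female={female:.2f}")
```

At every PTSD level the female expected score on p6 is higher, so the same raw sum score implies a slightly lower PTSD level for a woman than for a man.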

The DSO scale fitted a simpler GLLRM with locally dependent items but no DIF. There was very strong LD between the two negative self-concept items c3 (‘I feel like a failure’) and c4 (‘I feel worthless’), strong LD between the two disturbances in relationships items c5 (‘I feel distant or cut-off from people’) and c6 (‘I find it hard to stay emotionally close to people’), and weak to moderate LD between one of the affective dysregulation items, c2 (‘I feel numb or emotionally shut down’), and one of the disturbances in relationships items, c6 (see above).
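Local dependence of this kind can be sketched as a uniform interaction term λ·x_i·x_j in the joint distribution of an item pair given θ, which is one common log-linear parametrization of LD in GLLRMs. The thresholds for c3 and c4 and the value λ = 0.8 are hypothetical. At a fixed level of DSO, the pair is independent when λ = 0 (γ = 0) and positively associated when λ > 0.

```python
import math


def step_terms(theta, thresholds):
    """Log-numerators x*theta - (tau_1 + ... + tau_x) for categories 0..k."""
    terms = [0.0]
    for tau in thresholds:
        terms.append(terms[-1] + theta - tau)
    return terms


def pair_joint_probabilities(theta, thr_i, thr_j, ld=0.0):
    """Joint distribution of two items given theta, with LD term ld * x_i * x_j."""
    a = step_terms(theta, thr_i)
    b = step_terms(theta, thr_j)
    weights = [[math.exp(a[x] + b[y] + ld * x * y) for y in range(len(b))]
               for x in range(len(a))]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def gamma_from_table(table):
    """Goodman and Kruskal's gamma for a two-way table of ordered categories."""
    concordant = discordant = 0.0
    n_rows, n_cols = len(table), len(table[0])
    for x1 in range(n_rows):
        for x2 in range(x1 + 1, n_rows):
            for y1 in range(n_cols):
                for y2 in range(n_cols):
                    if y2 > y1:
                        concordant += table[x1][y1] * table[x2][y2]
                    elif y2 < y1:
                        discordant += table[x1][y1] * table[x2][y2]
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical thresholds for items c3 and c4 (scored 0-4).
c3 = [-1.2, -0.4, 0.4, 1.2]
c4 = [-1.0, -0.3, 0.5, 1.3]
theta = 0.5  # condition on one fixed level of DSO
g_independent = gamma_from_table(pair_joint_probabilities(theta, c3, c4, 0.0))
g_dependent = gamma_from_table(pair_joint_probabilities(theta, c3, c4, 0.8))
print(f"gamma given theta, no LD: {g_independent:.3f}")
print(f"gamma given theta, LD = 0.8: {g_dependent:.3f}")
```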

We found no evidence of DIF or lack of overall invariance in relation to the two variables of primary interest. The first was the language variable, which distinguished patients for whom the test language matched the language they naturally spoke in the room (Danish/Danish, Arabic/Arabic, and Bosnian/Bosnian) from those for whom the two languages did not match (Arabic/Danish). The second was the variable defining the degree of interpreter clarification during administration (see Table 5 and Figure 1, and Table S8 in the Supplementary material).

We formally tested whether the PTSD and DSO scales were indeed separate by testing whether the 12 items together formed a unidimensional scale. As expected, unidimensionality was rejected, as the observed correlation between the scales was significantly weaker than the correlation expected under a unidimensional model (total sample: γobserved = 0.424, γexpected = 0.522, SE = 0.026, p < .001).
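As a rough check of the reported p-value, the difference between observed and expected γ can be standardized by the reported SE. This assumes a Wald-type z statistic referred to the standard normal distribution; the exact statistic DIGRAM uses is not stated here. It gives z of about -3.77, well beyond the .001 level whether the test is taken as one- or two-sided.

```python
import math

# Values reported above for the total sample.
gamma_observed = 0.424
gamma_expected = 0.522
se = 0.026

# Assumed Wald-type statistic: standardized difference referred to N(0, 1).
z = (gamma_observed - gamma_expected) / se
# erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided normal p-value.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.1e}")
```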

The Supplementary material documents the initial evidence of local dependence and DIF when testing against the RM (Tables S3 and S4), the necessity of the included interaction terms for LD and DIF in the GLLRMs (Table S5), and item thresholds, locations, difficulties, and target values (Tables S6 and S7).

3.1. Effect of DIF in the PTSD scale

Two items in the PTSD subscale functioned differentially: one relative to gender and one relative to time since the traumatic event. To use unbiased summed scale scores or estimated person parameters in subsequent statistical analyses (e.g. group comparisons or assessment of treatment effects), the DIF must be taken into account by adjusting both types of score accordingly. In Tables 6 and 7, we provide the information needed to convert summed scale scores to estimated person parameters, the estimated person parameters for all subgroups affected by the DIF, and DIF-adjusted sum scores for these subgroups. Importantly, the DIF adjustments needed at the individual level are less than one point on the PTSD sum-score scale (Table 6).
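One way to picture the conversion and adjustment is a sketch that estimates the person parameter from a raw score under the focal group's item parameters and then takes the expected raw score under the reference group's parameters at that estimate. This is an assumption about the logic, not a reproduction of DIGRAM: it uses plain maximum likelihood rather than the weighted maximum likelihood estimates in Tables 6 and 7, and all thresholds and the DIF size are hypothetical.

```python
import math


def category_probabilities(theta, thresholds):
    """Category probabilities (0..k) for one partial credit item."""
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + theta - tau)
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_total(theta, items):
    """Expected sum score at theta for a list of item threshold vectors."""
    return sum(
        sum(x * p for x, p in enumerate(category_probabilities(theta, thr)))
        for thr in items
    )


def ml_theta(raw_score, items, lo=-10.0, hi=10.0, tol=1e-10):
    """Maximum likelihood estimate: solve expected_total(theta) = raw_score."""
    max_score = sum(len(thr) for thr in items)
    if not 0 < raw_score < max_score:
        raise ValueError("the ML estimate is infinite for extreme scores")
    # The expected total score increases strictly with theta, so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_total(mid, items) < raw_score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def dif_adjusted_score(raw_score, focal_items, reference_items):
    """Expected reference-group score at the focal-group person estimate."""
    return expected_total(ml_theta(raw_score, focal_items), reference_items)


# Hypothetical thresholds for six PTSD items scored 0-4 (men as reference group).
men = [[-1.5, -0.6, 0.2, 1.0] for _ in range(6)]
# Hypothetical uniform DIF: for women, every threshold of item p6 is 0.5 lower.
women = [thr[:] for thr in men]
women[5] = [tau - 0.5 for tau in women[5]]

for score in (6, 12, 18):
    print(score, round(dif_adjusted_score(score, women, men), 2))
```

Because women endorse p6 more readily in this toy setup, a woman's raw score maps to a slightly lower adjusted score; the published conversion tables remain the values to use in practice.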

Table 6. Conversion of post-traumatic stress disorder (PTSD) raw scores to weighted maximum likelihood estimates of person parameters and adjustment of the raw score for differential item functioning (DIF) relative to gender and time since trauma, in (A) men and (B) women.

Table 7. Conversion of disturbances in self-organization (DSO) raw scores to weighted maximum likelihood estimates of person parameters.

Table 8 shows the observed and adjusted mean PTSD scores for the subgroups affected by DIF, as well as the bias (confounding effect) introduced by using the unadjusted scores. When comparing PTSD scores across genders, the total effect of DIF related to gender and time since traumatic event does not change the conclusion that there is no gender difference in mean PTSD scores; rather, adjustment reinforces this conclusion. However, when comparing PTSD scores across subgroups defined by time since traumatic event, the evidence for a difference becomes strong after adjustment. Failing to adjust the PTSD score for DIF would thus have led us to conclude that there was only weak evidence that time since traumatic event had an impact on PTSD score, even though the evidence was, in fact, strong.

Table 8. Comparison of observed and differential item functioning (DIF)-adjusted mean post-traumatic stress disorder (PTSD) scores for groups defined by gender and time since trauma.

3.2. Targeting and reliability

Targeting of the PTSD scale to the study sample was poor, with only 41% to 57% of the maximum information obtained on average for the subgroups defined by gender and time since traumatic event, while targeting of the DSO scale was slightly better, with 61% of the maximum information obtained (Table 9). The item maps (Figures S1 and S2 in the Supplementary material) show that most information is available at the lower end of both scales and that items and persons are somewhat misaligned, so that there is insufficient information at the higher end of both scales. In other words, the items are too ‘easy’ for the study sample (i.e. they require too little PTSD or DSO to be endorsed).

Table 9. Targeting and reliability of the post-traumatic stress disorder (PTSD) and disturbances in self-organization (DSO) scales.

The reliability of the PTSD scale varied from 0.55 to 0.78 across subgroups defined by gender and time since traumatic event (Table 9), with the highest values for males with at most 10 years since the traumatic event. The reliability of the DSO scale was 0.79.

4. Discussion

The ITQ was developed to maximize the clinical utility and international applicability of brief measures of PTSD and CPTSD, and it is available in many languages. The present study is the first to assess the validity and detailed measurement properties of the short-form ITQ with methods previously recommended for complex item analyses of PTSD measures in a multicultural context (i.e. GLLRMs; Vindbjerg et al., 2020). The ITQ is recommended as a valid and reliable measure of ICD-11 PTSD and CPTSD (https://www.traumameasuresglobal.com/itq). While the present results support that the PTSD and DSO scales are indeed separate scales, they also reveal several previously undocumented measurement issues with implications for the validity and precision of the scales.

4.1. Valid and unbiased measurement across subgroups

We found no evidence against cross-cultural validity in terms of DIF relative to language groups or the different levels of interpreter assistance offered. However, we did find evidence of DIF in the PTSD scale, for one item relative to gender and for one item relative to time since trauma.

Clinical scales for use in patients with diverse cultural backgrounds are often translated into various language forms, or the native-language form is translated on site by interpreters during assessment (i.e. practice aligned with the etic approach to researching cultural differences) (Heim et al., 2022). Both practices raise concern about the transculturality of such scales (i.e. can they transcend cultures simply by translation?). To some extent, this can be investigated through analyses of cross-cultural invariance and DIF. Oral translation during the assessment raises the further concern that the degree to which the interpreter needs to clarify, explain, or elaborate for the patient to fully understand a question could also affect measurement. If questions do not function equally across languages or degrees of assistance, measurement can be biased, so that some groups score artificially high or low owing to language or assistance effects. Scores would then not be valid measurements of PTSD or DSO unless adjusted for this bias. In the present study, we found no evidence against the cross-cultural validity of the PTSD and DSO scales, and no evidence that the degree of assistance provided during assessment caused bias. For now, we therefore recommend that the Danish, Arabic, and Bosnian language versions continue to be used, with the current practice of interpreter assistance, for patients who spontaneously speak these languages in a session. There is currently no evidence of language bias (not even for patients speaking Arabic who were tested with the Danish version translated into Arabic in the room) and no bias across degrees of interpreter assistance.

For the PTSD scale, we found strong evidence of DIF relative to both gender and time since trauma, leading to biased mean scores for subgroups defined by these variables. The bias ranges from negligible (−0.04 scale points) to a substantial overestimation for female patients with 5–10 years since trauma (0.40 scale points). The latter in particular warrants further investigation: among Danish treatment-seeking refugees, the effect size for improvement in PTSD measured with the HTQ has been found to be 0.33 for a group receiving imagery rehearsal therapy compared with treatment as usual (Sandahl et al., 2021), an effect that may be contaminated by gender and/or time-since-trauma bias. The degree to which the bias discovered in this study would affect the prevalence of PTSD in females is also an issue for further research, as previous research has shown higher prevalence rates in trauma-exposed females than in males, for example in the UK and Israel (McGinty et al., 2021). Are previously reported prevalence rates for females too high? When the ITQ is used with the screening algorithm provided with the instrument (i.e. a score of at least 2 on one item in each cluster of the PTSD scale, and so on), the bias also affects the screening results, and the DIF adjustment cannot readily be transferred to the algorithm. We therefore recommend taking into account that females systematically endorse one of the sense of threat items (feeling jumpy or easily startled) more readily than males, even at the same level of PTSD, and that this affects the scores, all the more so because previous studies have shown females to overreport on this particular item (Palm et al., 2009). Thus, there will be a false difference of some magnitude in the screening results between males and females.
Whether this plays a role at the individual level cannot be determined from this study, but as the largest adjustment for DIF needed at the individual level is 0.59 scale points (Table 6), we recommend further studies to establish whether this is the case.
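Because the screening algorithm works on item-level cut-offs rather than on the sum score, a sum-score DIF adjustment has no direct counterpart in it. The sketch below implements only the symptom part of the PTSD rule as stated above (at least one item scored 2 or higher in each of the three clusters); the functional impairment requirement of the full ITQ algorithm is deliberately omitted, and the response pattern is hypothetical. It shows how a one-category difference on item p6, the item with gender DIF, can change the result for an otherwise identical pattern.

```python
# Item keys follow the p1-p6 labels used in this article.
PTSD_CLUSTERS = {
    "re-experiencing": ("p1", "p2"),
    "avoidance": ("p3", "p4"),
    "sense of threat": ("p5", "p6"),
}


def meets_ptsd_symptom_criterion(responses, cutoff=2):
    """True if at least one item in every PTSD cluster is scored >= cutoff.

    The functional impairment requirement of the full algorithm is not checked.
    """
    return all(
        any(responses[item] >= cutoff for item in items)
        for items in PTSD_CLUSTERS.values()
    )


# Hypothetical response pattern (0 = not at all ... 4 = extremely).
patient = {"p1": 3, "p2": 1, "p3": 2, "p4": 0, "p5": 1, "p6": 2}
print(meets_ptsd_symptom_criterion(patient))  # → True
# The same pattern with p6 one category lower no longer meets the criterion.
print(meets_ptsd_symptom_criterion({**patient, "p6": 1}))  # → False
```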

4.2. Measurement precision

Several items in both the PTSD and DSO scales were found to be locally dependent. In the PTSD scale, LD occurred exclusively between items from the same symptom clusters (p1 and p2, p3 and p4, and p5 and p6), and in the DSO scale, two pairs of locally dependent items were also from the same symptom clusters (c3 and c4, and c5 and c6). Such findings are not unusual in scales measuring clinical conditions, psychological or physical, by the presence or degree of symptoms, as the degree of one symptom may be systematically associated with the degree of another regardless of the severity of the condition. Such LD means that some symptoms are associated beyond what can be explained by the level of the condition measured. For example, the two re-experiencing items (p1 and p2) in the PTSD scale co-determine one another beyond what can be explained by the degree of PTSD: having upsetting dreams replaying parts of the traumatic event (or not) is systematically related to having powerful images coming into one's mind (or not), regardless of the degree of PTSD. The practical implication of LD is that both items in a locally dependent pair have to be administered because of their co-determination. This does not, however, mean that all items from the same symptom cluster will be locally dependent, or that items from different symptom clusters are automatically conditionally independent; in this study, we found evidence of both. First, items c2 and c6, from the affective dysregulation and disturbed relationship clusters, respectively, are locally dependent, although only to a moderate degree. Given the item content, this is not surprising: being emotionally shut down and finding it hard to stay emotionally close to other people may be seen as the same issue by patients, and emotional problems as described in the first item can be co-dependent with relational problems as described in the second.
One might, however, consider whether more reliable measurement could be achieved by rephrasing the question about emotional closeness so that it overlaps less with ‘shutting down’ (e.g. Do you have issues with maintaining close interpersonal relationships?). Second, the lack of LD between items c1 and c2, both from the affective dysregulation cluster, might reflect that these items stem from two separate subscales in the previous 24-item research version of the ITQ, which specified two distinct emotional symptom clusters: hyper-arousal (five items) and hypo-arousal (five items) (Cloitre et al., 2018).

The reliability of the PTSD scale varied across gender and time since traumatic event owing to DIF, and was considered low for an instrument intended for individual assessment in all subgroups. The reliability of the DSO scale was 0.79 for all patients and thus closer to an acceptable level for individual assessment. Previous research has reported varying reliability of the ITQ subscales in clinical and population-based samples. For instance, Choi et al. (2021) reported reliabilities of 0.92 and 0.91 for the PTSD and DSO subscales, respectively, for the Korean ITQ in a general population sample, while in the Danish context, Vang et al. (2021) reported reliabilities ranging from 0.73 to 0.91 (PTSD) and from 0.77 to 0.86 (DSO) in diverse clinical samples. The lower and more variable reliabilities in our study can mainly be attributed to our taking into account the LD between items in each scale, as this is known to lower reliability estimates: LD that is not modelled in some manner falsely inflates reliability (Marais, 2012). In addition, chronicity in this refugee patient group probably contributes to reduced variability and thus reduced reliability, as in the study by Vindbjerg et al. (2020) on the 16-item HTQ PTSD scale. Reliability of both the ITQ and the HTQ is most often reported without accounting for LD (Cloitre et al., 2018; Shevlin, Hyland, Roberts, et al., 2018; Vindbjerg et al., 2020), and more research is thus needed to establish reliability across various target populations.

The targeting of both the PTSD and the DSO scales was poor. Targeting was best for the DSO scale, yet only 61% of the maximum information was obtained on average. The targeting of the six-item PTSD and DSO scales has not previously been investigated. Christen et al. (2021) evaluated item information within the two scales in a representative sample of the German general population, but did not relate it to the distribution of items and persons (i.e. targeting). Thus, the current finding of poor targeting of the PTSD scale and moderate targeting of the DSO scale is the first to document a lack of items requiring higher levels of PTSD/DSO to be endorsed (i.e. the items are too ‘easy’ for this study population to endorse). The reason for the seemingly too ‘easy’ items in both scales might, at least partially, lie in the high prevalence of PTSD and CPTSD symptoms in the study sample: while the majority of person scores fall within the range of the scale covered by the items (i.e. high information), a substantial proportion of scores fall above this range, particularly for the PTSD scale (see Figures S1 and S2 in the Supplementary material). Figure S1 also suggests a further, again at least partial, explanation: the targeting of the PTSD scale depends to some degree on the time that has passed since the traumatic event, as targeting appears poorest for those with the longest interval since the event, regardless of gender. The natural next step would be to investigate whether adding items, or exchanging items with others from the longer ITQ, would improve targeting and measurement precision in the target populations.

4.3. Limitations

The study is limited by the sample size and by the high prevalence of patients who would be diagnosed with either PTSD or CPTSD using the ITQ diagnostic algorithms. A further limitation is that the ITQ has not been validated against a gold-standard instrument, such as a diagnostic interview, in refugee populations (Heim et al., 2022); consequently, it is not possible to assess the accuracy of the diagnoses (i.e. sensitivity and specificity), and for this reason we did not test for DIF related to diagnosis. Lastly, the current study used trained, experienced interpreters working at the rehabilitation clinic, who understood the ITQ constructs and their use in clinical practice. Interpreters’ knowledge of PTSD and CPTSD may vary across studies and clinics. Although there was no bias across degrees of interpreter assistance in the clinical context of this study, previous cross-cultural studies, e.g. on the HTQ, have reported a lack of invariance across languages (Choi et al., 2006; Vindbjerg et al., 2020). Thus, variation in the use of trained, experienced interpreters across studies is to be expected, and direct comparisons across studies using interpreters must always be made with caution.

4.4. Conclusions and recommendations

The gender and time-since-trauma bias discovered in the PTSD scale leads us to recommend that DIF-adjusted sum scores or person parameter estimates be used for research purposes (i.e. group comparisons) as well as for any evaluation of treatment effects at the group level using the ITQ. At the individual level, the effect of the bias needs further study, including in relation to the screening algorithm provided with the ITQ. Within the clinical context, we recommend continuing the current practice of using several language versions and varying degrees of interpreter assistance, as it does not introduce measurement bias. Interpreters in the present study were trained in administering the ITQ; if the ITQ is administered by interpreters without such training and experience, results should be interpreted with caution. Patients are sometimes exposed to multiple traumas over extended periods of time, e.g. partner violence or war and genocide, which risk being overlooked without a supplementary event list or a clinical interview. Furthermore, cultural idioms of CPTSD symptoms, in particular, may be overlooked without a clinical evaluation. We therefore recommend that the ITQ not be used in isolation, but together with a supplementary event list and a clinical interview.

Further studies are needed to obtain adequately detailed knowledge of the psychometric properties of the six-item PTSD and DSO scales cross-culturally and diagnostically in diverse patient and target populations, for screening and research purposes. In addition, further studies aimed at improving targeting and the measurement precision of the scales are needed before it can be deemed prudent to consolidate the short-form ITQ for screening or monitoring purposes with treatment-seeking refugees.

Supplemental material


Acknowledgements

We thank the patients and interpreters who participated in the study.

Authors' contributions

TN and AE planned the study. SBN and MAH acquired the data. TN conducted the analyses. All authors contributed to the first draft of the article and all authors contributed to and approved the revisions.

Disclosure statement

No potential conflict of interest was reported by the authors.

Data availability statement

Data can be made available for research purposes upon reasonable request to the corresponding author.

Notes

1 Uniform/non-uniform refers to the way in which items depend on either exogenous variables (in the case of DIF) or other items (in the case of LD), i.e. uniform implies that this dependence is the same across all levels of the latent variable, while non-uniform implies that it is not.

2 As we did not have complete information for the degree of assistance by the interpreter or clinician (n = 457), the tests of overall invariance and DIF relative to this were conducted after the final models had been established in the main analyses.

References

  • Andersen, E. B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38(1), 123–140. https://doi.org/10.1007/BF02291180
  • Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
  • Brewin, C. R., Cloitre, M., Hyland, P., Shevlin, M., Maercker, A., Bryant, R. A., & Reed, G. M. (2017). A review of current evidence regarding the ICD-11 proposals for diagnosing PTSD and complex PTSD. Clinical Psychology Review, 58, 1–15. https://doi.org/10.1016/j.cpr.2017.09.001
  • Buckley, T. C., Blanchard, E. B., & Trammell Neill, W. (2000). Information processing and PTSD: A review of the empirical literature. Clinical Psychology Review, 20(8), 1041–1065. https://doi.org/10.1016/S0272-7358(99)00030-6
  • Charak, R., Cano-Gonzalez, I., Shevlin, M., Ben-Ezra, M., Karatzias, T., & Hyland, P. (2022). Dimensional latent structure of ICD-11 posttraumatic stress disorder, complex PTSD, and adjustment disorder: Evidence from Ghana, Kenya, and Nigeria. Traumatology, 28(2), 223–234. https://doi.org/10.1037/trm0000311
  • Choi, H., Lee, W., & Hyland, P. (2021). Factor structure and symptom classes of ICD-11 complex posttraumatic stress disorder in a South Korean general population sample with adverse childhood experiences. Child Abuse & Neglect, 114, 104982. https://doi.org/10.1016/j.chiabu.2021.104982
  • Choi, Y., Mericle, A., & Harachi, T. T. W. (2006). Using Rasch analysis to test the cross-cultural item equivalence of the Harvard Trauma Questionnaire and the Hopkins Symptom Checklist across Vietnamese and Cambodian immigrant mothers. Journal of Applied Measurement, 7, 16–38.
  • Christen, D., Killikelly, C., Maercker, A., & Augsburger, M. (2021). Item response model validation of the German ICD-11 International Trauma Questionnaire for PTSD and CPTSD. Clinical Psychology in Europe, 3(4), Article e5501. https://doi.org/10.32872/cpe.5501
  • Cloitre, M., Garvert, D. W., Brewin, C. R., Bryant, R. A., & Maercker, A. (2013). Evidence for proposed ICD-11 PTSD and complex PTSD: A latent profile analysis. European Journal of Psychotraumatology, 4(1), 20706. https://doi.org/10.3402/ejpt.v4i0.20706
  • Cloitre, M., Roberts, N., Bisson, J. I., & Brewin, C. R. (2013). Self-Report Community Version 1.0. Translated into Danish by Elklit, A. (2015). Unpublished manuscript.
  • Cloitre, M., Shevlin, M., Brewin, C. R., Bisson, J. I., Roberts, N. P., Maercker, A., & Hyland, P. (2018). The International Trauma Questionnaire: Development of a self-report measure of ICD-11 PTSD and complex PTSD. Acta Psychiatrica Scandinavica, 138(6), 536–546. https://doi.org/10.1111/acps.12956
  • Cox, D. R., Spjøtvoll, E., Johansen, S., van Zwet, W. R., Bithell, J. F., Barndorff-Nielsen, O., & Keuls, M. (1977). The role of significance tests [with discussion and reply]. Scandinavian Journal of Statistics, 4(2), 49–70. https://www.jstor.org/stable/4615652
  • de Silva, U., Glover, N., & Katona, C. (2021). Prevalence of complex post-traumatic stress disorder in refugees and asylum seekers: Systematic review. BJPsych Open, 7(6), e194. https://doi.org/10.1192/bjo.2021.1013
  • Frost, R., Hyland, P., McCarthy, A., Halpin, R., Shevlin, M., & Murphy, J. (2019). The complexity of trauma exposure and response: Profiling PTSD and CPTSD among a refugee sample. Psychological Trauma: Theory, Research, Practice, and Policy, 11(2), 165–175. https://doi.org/10.1037/tra0000408
  • Gelezelyte, O., Roberts, N. P., Kvedaraite, M., Bisson, J. I., Brewin, C. R., Cloitre, M., … Kazlauskas, E. (2022). Validation of the International Trauma Interview (ITI) for the clinical assessment of ICD-11 posttraumatic stress disorder (PTSD) and complex PTSD (CPTSD) in a Lithuanian sample. European Journal of Psychotraumatology, 13(1), 2037905. https://doi.org/10.1080/20008198.2022.2037905
  • Hamon, A., & Mesbah, M. (2002). Questionnaire reliability under the Rasch model. In M. Mesbah, B. F. Cole, & M.-L. T. Lee (Eds.), Statistical Methods for Quality of Life Studies (pp. 155–168). Kluwer. https://doi.org/10.1007/978-1-4757-3625-0_13
  • Heim, E., Karatzias, T., & Maercker, A. (2022). Cultural concepts of distress and complex PTSD: Future directions for research and treatment. Clinical Psychology Review, 93, 102143. https://doi.org/10.1016/j.cpr.2022.102143
  • Henkelmann, J. R., de Best, S., Deckers, C., Jensen, K., Shahab, M., Elzinga, B., & Molendijk, M. (2020). Anxiety, depression and post-traumatic stress disorder in refugees resettling in high-income countries: Systematic review and meta-analysis. BJPsych Open, 6(4), e68, 1–8. https://doi.org/10.1192/bjo.2020.54
  • Holland, P. W., & Wainer, H. (2009). Differential Item Functioning. Routledge.
  • Horton, M., Marais, I., & Christensen, K. B. (2013). Dimensionality. In K. B. Christensen, S. Kreiner, & M. Mesbah (Eds.), Rasch Models in Health (pp. 137–158). ISTE & Wiley.
  • Jensen, R., Laugesen, H., Skammeritz, S., Mortensen, E., & Carlsson, J. (2018). Interpreter-mediated psychotherapy with trauma-affected refugees – A retrospective cohort study. Psychiatry Research, 271, 684–692. https://doi.org/10.1016/j.psychres.2018.12.058
  • Karatzias, T., Cloitre, M., Maercker, A., Kazlauskas, E., Shevlin, M., Hyland, P., & Brewin, C. R. (2017). PTSD and complex PTSD: ICD-11 updates on concept and measurement in the UK, USA, Germany and Lithuania. European Journal of Psychotraumatology, 8(sup7), 1418103. https://doi.org/10.1080/20008198.2017.1418103
  • Kelderman, H. (1984). Loglinear Rasch model tests. Psychometrika, 49(2), 223–245. https://doi.org/10.1007/BF02294174
  • Kreiner, S. (2003). Introduction to DIGRAM. Department of Biostatistics, University of Copenhagen.
  • Kreiner, S. (2007). Validity and objectivity: Reflections on the role and nature of Rasch models. Nordic Psychology, 59(3), 268–298. https://doi.org/10.1027/1901-2276.59.3.268
  • Kreiner, S. (2013). The Rasch model for dichotomous items. In K. B. Christensen, S. Kreiner, & M. Mesbah (Eds.), Rasch Models in Health (pp. 5–26). ISTE & Wiley. https://doi.org/10.1002/9781118574454.ch1
  • Kreiner, S., & Christensen, K. B. (2002). Graphical Rasch models. In M. Mesbah, B. F. Cole, & M.-L. T. Lee (Eds.), Statistical Methods for Quality of Life Studies: Design, Measurements and Analysis (pp. 187–203). Springer US. https://doi.org/10.1007/978-1-4757-3625-0_15
  • Kreiner, S., & Christensen, K. B. (2004). Analysis of local dependence and multidimensionality in graphical loglinear Rasch models. Communications in Statistics – Theory and Methods, 33(6), 1239–1276. https://doi.org/10.1081/STA-120030148
  • Kreiner, S., & Christensen, K. B. (2007). Validity and objectivity in health-related scales: Analysis by graphical loglinear Rasch models. In M. von Davier & C. H. Carstensen (Eds.), Multivariate and Mixture Distribution Rasch Models (pp. 329–346). Springer. https://doi.org/10.1007/978-0-387-49839-3_21
  • Kreiner, S., & Christensen, K. B. (2013). Person parameter estimation and measurement in Rasch models. In K. B. Christensen, S. Kreiner, & M. Mesbah (Eds.), Rasch Models in Health (pp. 63–78). ISTE & Wiley. https://doi.org/10.1002/9781118574454.ch4
  • Kreiner, S., & Nielsen, T. (2013). Item Analysis in DIGRAM 3.04. Part I: Guided Tours. Research Report 2013/06. University of Copenhagen, Department of Public Health. https://ifsv.sund.ku.dk/biostat/annualreport/images/0/01/Research_Report_13-06-ny.pdf
  • Lewis-Fernández, R., & Kirmayer, L. J. (2019). Cultural concepts of distress and psychiatric disorders: Understanding symptom experience and expression in context. Transcultural Psychiatry, 56(4), 786–803. https://doi.org/10.1177/1363461519861795
  • Maercker, A., Brewin, C. R., Bryant, R. A., Cloitre, M., Reed, G. M., Van Ommeren, M., & Saxena, S. (2013). Proposals for mental disorders specifically associated with stress in the International Classification of Diseases-11. The Lancet, 381(9878), 1683–1685. https://doi.org/10.1016/S0140-6736(12)62191-6
  • Marais, I. (2012). Local dependence. In K. B. Christensen, S. Kreiner, & M. Mesbah (Eds.), Rasch Models in Health (pp. 111–130). ISTE & Wiley. https://doi.org/10.1002/9781118574454.ch7
  • Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. https://doi.org/10.1007/BF02296272
  • McGinty, G., Fox, R., Ben-Ezra, M., Cloitre, M., Karatzias, T., Shevlin, M., & Hyland, P. (2021). Sex and age differences in ICD-11 PTSD and complex PTSD: An analysis of four general population samples. European Psychiatry, 64(1), e66. https://doi.org/10.1192/j.eurpsy.2021.2239
  • Mesbah, M., & Kreiner, S. (2012). The Rasch model for ordered polytomous items. In K. B. Christensen, S. Kreiner, & M. Mesbah (Eds.), Rasch Models in Health (pp. 27–42). ISTE & Wiley. https://doi.org/10.1002/9781118574454.ch2
  • Møller, L., Augsburger, M., Elklit, A., Søgaard, U., & Simonsen, E. (2020). Traumatic experiences, ICD-11 PTSD, ICD-11 complex PTSD, and the overlap with ICD-10 diagnoses. Acta Psychiatrica Scandinavica, 141(5), 421–431. https://doi.org/10.1111/acps.13161
  • Nickerson, A., Cloitre, M., Bryant, R. A., Schnyder, U., Morina, N., & Schick, M. (2016). The factor structure of complex posttraumatic stress disorder in traumatized refugees. European Journal of Psychotraumatology, 7(1), 33253. https://doi.org/10.3402/ejpt.v7.33253
  • Palic, S., Kappel, M. L., Nielsen, M. S., Carlsson, J., & Bech, P. (2014). Comparison of psychiatric disability on the Health of Nation Outcome Scales (HoNOS) in resettled traumatized refugee outpatients and Danish inpatients. BMC Psychiatry, 14(1), 330. https://doi.org/10.1186/s12888-014-0330-8
  • Palic, S., Zerach, G., Shevlin, M., Zeligman, Z., Elklit, A., & Solomon, Z. (2016). Evidence of complex posttraumatic stress disorder (CPTSD) across populations with prolonged trauma of varying interpersonal intensity and ages of exposure. Psychiatry Research, 246, 692–699. https://doi.org/10.1016/j.psychres.2016.10.062
  • Palm, K. M., Strong, D. R., & MacPherson, L. (2009). Evaluating symptom expression as a function of a posttraumatic stress disorder severity. Journal of Anxiety Disorders, 23(1), 27–37. https://doi.org/10.1016/j.janxdis.2008.03.012
  • Poulsen, I., Kreiner, S., & Engberg, A. W. (2018). Validation of the Early Functional Abilities scale: An assessment of four dimensions in early recovery after traumatic brain injury. Journal of Rehabilitation Medicine, 50(2), 165–172. https://doi.org/10.2340/16501977-2300. PMID: 29313872.
  • Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests. Danish Institute for Educational Research.
  • Rasch, G. (1961). On General Laws and the Meaning of Measurement in Psychology. The Regents of the University of California. http://projecteuclid.org/euclid.bsmsp/1200512895
  • Redican, E., Nolan, E., Hyland, P., Cloitre, M., McBride, O., Karatzias, T., Murphy, J., & Shevlin, M. (2021). A systematic literature review of factor analytic and mixture models of ICD-11 PTSD and CPTSD using the International Trauma Questionnaire. Journal of Anxiety Disorders, 79, 102381. https://doi.org/10.1016/j.janxdis.2021.102381
  • Rosenbaum, P. R. (1989). Criterion-related construct validity. Psychometrika, 54(4), 625–633. https://doi.org/10.1007/BF02296400
  • Sandahl, H., Jennum, P., Baandrup, L., Lykke Mortensen, E., & Carlsson, J. (2021). Imagery rehearsal therapy and/or mianserin in treatment of refugees diagnosed with PTSD: Results from a randomized controlled trial. Journal of Sleep Research, 30, e13276. https://doi.org/10.1111/jsr.13276
  • Shevlin, M., Hyland, P., Roberts, N. P., Bisson, J. I., Brewin, C. R., & Cloitre, M. (2018). A psychometric assessment of disturbances in self-organization symptom indicators for ICD-11 complex PTSD using the International Trauma Questionnaire. European Journal of Psychotraumatology, 9(1), 1419749. https://doi.org/10.1080/20008198.2017.1419749
  • Shevlin, M., Hyland, P., Vallières, F., Bisson, J., Makhashvili, N., Javakhishvili, J., & Roberts, B. (2018). A comparison of DSM-5 and ICD-11 PTSD prevalence, comorbidity and disability: An analysis of the Ukrainian Internally Displaced Person’s Mental Health Survey. Acta Psychiatrica Scandinavica, 137(2), 138–147. https://doi.org/10.1111/acps.12840
  • Tay, A. K., Rees, S., Tam, N., Kareth, M., & Silove, D. (2019). Defining a combined constellation of complicated bereavement and PTSD and the psychosocial correlates associated with the pattern amongst refugees from West Papua. Psychological Medicine, 49(9), 1481–1489. https://doi.org/10.1017/S0033291718002027
  • UNHCR. (2022). https://www.unhcr.org/figures-at-a-glance.html
  • Vallières, F., Ceannt, R., Daccache, F., Daher, R. A., Sleiman, J. B., Gilmore, B., Byrne, S., Shevlin, M., Murphy, J., & Hyland, P. (2018). ICD-11 PTSD and complex PTSD amongst Syrian refugees in Lebanon: The factor structure and the clinical utility of the International Trauma Questionnaire. Acta Psychiatrica Scandinavica, 138(6), 547–557. https://doi.org/10.1111/acps.12973
  • Vang, M. L., Brødsgaard Nielsen, S., Auning-Hansen, M., & Elklit, A. (2020). Testing the validity of ICD-11 PTSD and CPTSD among refugees in treatment using latent class analysis. Torture Journal, 29(3), 27–45. https://doi.org/10.7146/torture.v29i3.115367
  • Vang, M. L., Dokkedahl, S. B., Løkkegaard, S. S., Jakobsen, A. V., Møller, L., Auning-Hansen, M. A., & Elklit, A. (2021). Validation of ICD-11 PTSD and DSO using the International Trauma Questionnaire in five clinical samples recruited in Denmark. European Journal of Psychotraumatology, 12(1), 1894806. https://doi.org/10.1080/20008198.2021.1894806
  • Vindbjerg, E., Carlsson, J., Mortensen, E. L., Makransky, G., & Nielsen, T. (2020). A Rasch-based validity study of the Harvard Trauma Questionnaire. Journal of Affective Disorders, 277, 697–705. https://doi.org/10.1016/j.jad.2020.08.071
  • WHO. (2018). https://icd.who.int/en