Research Articles

The Bivalent Shape Task in a Dutch primary school population: A pilot study for a first psychometric assessment


Abstract

Objective

The Bivalent Shape Task (BST) tests the ability to suppress interfering information. The purpose of this study was to assess some psychometric properties of the BST in 5–11-year-old children, using multilevel analysis.

Methods

The present study was initiated in a Dutch primary school in October 2019. The BST was administered as part of a larger neuropsychological assessment. The outbreak of Covid-19 and the subsequent lockdown in the Netherlands led to a premature termination of the study in March 2020. Data from 38 children were available. This dataset was analyzed and is reported as a pilot study.

Results

Significant main effects of age, time components, level, and answer correctness on reaction time (RT) were found in the predicted direction, as well as several significant interactions. Random effects could also be modeled. A final statistical combination model is described.

Conclusion

Despite the small sample, it seems justified to conclude that the BST is a potentially valuable instrument for testing interference suppression in 5–11-year-old children. Multilevel analysis proved very useful in analyzing the BST. Since the present study examined only a small part of the reliability and validity aspects, further psychometric research is needed.

Introduction

Neuropsychological tests are used internationally to assess cognitive domains, in both screening and diagnostics. Most of these tests are designed to investigate one specific neuropsychological domain, such as concentration, attention, or memory. These domains are assessed not only in clinical practice but also in research. However, the selection, construction, and application of neuropsychological tests are not without challenges, and neuropsychology has been criticized on several grounds. One of these criticisms is the lack of, or inadequate, incorporation of psychometric advances in test use and construction (Cicchetti, Citation1994). A psychometric evaluation should address at least reliability and validity and, in clinical settings, also sensitivity and specificity (C. Reynolds & Mason, Citation2009). In addition, the extent and nature of normative data for many measures used in the neuropsychological testing of children are often very limited (C. R. Reynolds, Citation1986).

Despite these challenges, a considerable number of neuropsychological tests are available, including tests designed specifically for children. The development of neuropsychological functioning in children is a sensitive and vulnerable process, and the period between 7 and 11 years of age appears to be critical for cognitive development (Richardson et al., Citation2018). It is therefore important to detect potentially impaired neuropsychological functioning, preferably at an early stage, and neuropsychological tests can be used for this purpose. Such tests often include documented validation of their psychometric properties. This documented quality comes at a price: these tests are generally expensive and often do not allow the user to customize them. Over the years, software development initiatives have produced free neuropsychological software databases containing a wide variety of cognitive tests. Most of these tests can be freely modified by the user, who can add, delete, or change parameters (e.g., the color of the stimuli). Researchers can thus adapt a test to answer their specific research question. A downside of such free databases is that they often lack documentation of the psychometric properties of their tests.

One of these free neuropsychological software databases is the Psychology Experiment Building Language (PEBL) (Mueller & Piper, Citation2014). It includes the Bivalent Shape Task (BST), which is inspired by the Stroop task and was designed by S. T. Mueller and A. Esposito (Mueller & Esposito, Citation2014). The BST is used to test cognitive interference and the ability to suppress interference. As explained by Esposito and colleagues, interference suppression “requires the participant to ignore salient perceptual information in a bivalent task while attending to the less salient conflicting information” (Esposito et al., Citation2013). The BST tests this ability by presenting a shape, either a circle or a square, at the center of the screen; circles always require the left response and squares always require the right response (given via the keyboard) (Mueller & Esposito, Citation2014). These response options are visualized as a red circle on the left and a blue square on the right side of the screen. The participant is asked to focus only on matching the geometric shape of the stimulus to the geometric shape of the response option. To add interference, color is added to the stimuli, which makes it harder to focus solely on their shape. The color of the stimulus is always irrelevant to the correct decision and should be ignored.

The developers of the task propose the following arguments for the use and applicability of the BST: “the task is similar at a high level to a number of traditional attentional interference tests (e.g., Dimensional Change Card Sort, the color-word Stroop test, the Simon interference task, and Eriksen’s flanker task), but introduces new aspects that make it appropriate for testing children and adults in a nonlinguistic setting. In addition, the task can be used across the lifespan in developmental studies.” (Mueller & Esposito, Citation2014). The task is appropriate for testing in a nonlinguistic setting because it does not require the ability to read. In contrast to the Stroop test, the BST can therefore be used to test young primary school children who cannot yet read. Another feature that distinguishes the BST from, for example, the Stroop task is the addition of a “mixed” level designed by the PEBL developers. In this level, 10 neutral, 10 congruent, and 10 incongruent stimuli are presented to the participant in random order. The mixed level is presented at the end of the task and can be seen as a form of replication of the previous three levels, with the incongruent level being of most interest. Although the BST as distributed with PEBL has already been used directly as a research instrument in several studies (Czapka et al., Citation2019; Esposito et al., Citation2013), no extensive documentation of its psychometric properties could be found. Since psychometric validation is essential for the use of neuropsychological tasks, these studies appear to assume implicitly that the psychometric quality of the test is adequate and has been sufficiently investigated (Sherman et al., Citation2011). Nonetheless, the developers state that “future behavioral research must be done to establish the task’s validity in these and other contexts” (Mueller & Esposito, Citation2014). A closer look at its psychometric qualities is therefore justified and formed the basis of the present study.

Consequently, we decided to reassess the basic psychometric qualities of the BST, using multilevel modeling rather than, for example, ANOVA. Multilevel analysis has existed for many years but has only become practical in the last two decades, owing to improvements in processor speed. Multilevel modeling is particularly useful for repeated measures, which are common in neuropsychological testing (Snijders & Bosker, Citation2011), and this makes it better suited than ANOVA techniques. First, ANOVA models the same fixed baseline level (intercept) for all individuals, together with a fixed relation (slope) between the outcome variable and one or more predictor variables. Such a model is not realistic in research or clinical practice, since both the baseline level and the relationship between the outcome and a predictor can vary between individuals. In the case of the BST, for example, it is very realistic to assume that the general RT level, measured in milliseconds, differs between children (modeled by random intercepts) and that learning processes differ from child to child. Likewise, it is very plausible that the relationship between the outcome variable and a certain predictor differs between individuals (modeled by random slopes); more specifically, interference is likely not expressed equally in every child. Second, multilevel analysis allows the covariance structure of the repeated measures to be modeled, which ANOVA does not. Various articles have shown that an autoregressive structure is suitable for repeated measures (Snijders & Bosker, Citation2011). Third, multilevel analysis allows nonlinear effects over the repeated measures to be modeled. This is preferable to modeling only linear effects, because complex neuropsychological processes such as interference suppression are most likely nonlinear.
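As a minimal sketch of the model class described above, the equations below use generic two-level notation for trial $i$ of child $j$ with a single predictor $x_{ij}$. The symbols are ours and only illustrate the ideas of random intercepts, random slopes, and an autoregressive residual structure; they are not the authors' specification.

```latex
\begin{aligned}
y_{ij} &= (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,x_{ij} + \varepsilon_{ij}, \\
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} &\sim
  \mathcal{N}\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \tau_0^2 & \tau_{01} \\ \tau_{01} & \tau_1^2 \end{pmatrix}\right), \\
\operatorname{Cov}(\varepsilon_{ij}, \varepsilon_{i'j}) &= \sigma^2 \rho^{\lvert i - i' \rvert}.
\end{aligned}
```

Here $u_{0j}$ is the random intercept (the child-specific baseline RT), $u_{1j}$ is the random slope (the child-specific deviation in the effect of $x_{ij}$), and $\rho$ is the first-order autoregressive (AR(1)) correlation between trials of the same child. Setting $u_{0j} = u_{1j} = 0$ and $\rho = 0$ gives the fixed-intercept, fixed-slope, independent-error structure that the text attributes to ANOVA.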
In the case of the BST, RT can very likely be modeled by two different time effects: a learning effect and a slowing-down effect. The learning effect could appear as a significant inverse time effect within level, and older children might show a faster learning curve, since they are expected to get the hang of the task sooner. Furthermore, the learning effect may be least pronounced in the incongruent level, since this is the most challenging level of the BST. Such an inverse time effect could not be modeled with ANOVA. The slowing-down effect, possibly caused by mental fatigue or weariness, may be modeled by a linear time effect. A study in students showed that fatigue induced by cognitively demanding tasks negatively affects cognitive flexibility and leads to slower RTs (Plukaard et al., Citation2015). Although a general slowing-down effect can be expected, we hypothesized that this effect would be less pronounced in the incongruent level. As stated before, the incongruent level is the biggest challenge for the participants. We therefore expected the children to be more focused during the incongruent level than during the other three levels, leading to a longer attention span and a smaller increase in RT.
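The two hypothesized time effects can be illustrated with a small sketch: within a level, predicted log RT is written as an intercept plus a linear term in the counter within level $t$ (slowing down) plus an inverse term $1/t$ (learning). The coefficients below are made up purely for illustration and are not estimates from this study; the function name is hypothetical.

```python
import math

def predicted_log_rt(t, b0, b_lin, b_inv):
    """Hypothetical within-level profile: intercept + linear drift + inverse (learning) term."""
    if t < 1:
        raise ValueError("the counter within level starts at 1")
    return b0 + b_lin * t + b_inv / t

# Illustrative coefficients only; they are NOT estimates from the study.
B0, B_LIN, B_INV = 6.8, 0.004, 0.25

items = range(1, 21)  # 20 items per level
profile = [predicted_log_rt(t, B0, B_LIN, B_INV) for t in items]

# With b_inv > 0 and b_lin > 0 the curve first falls (learning dominates) and then
# rises (slowing down dominates); for continuous t the minimum lies at sqrt(b_inv / b_lin).
turning_point = math.sqrt(B_INV / B_LIN)

for t, y in zip(items, profile):
    print(f"item {t:2d}: log RT = {y:.3f} (about {math.exp(y):.0f} ms)")
print(f"turning point near item {turning_point:.1f}")
```

The inverse term changes quickly for the first few items and then flattens, whereas the linear term grows steadily, which is why the two components can capture a fast early learning effect and a slow later slowing-down effect at the same time.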

In addition to these hypothesized time effects, other factors that may influence RT on the BST should be considered in its psychometric assessment. One basic factor of interest in the psychometric assessment of neuropsychological tasks is age: increasing age from early childhood to adulthood is associated with decreasing RT (Kail, Citation1991; Śmigasiewicz et al., Citation2021) across a wide range of cognitive tasks (Ratcliff et al., Citation2012). For example, a 2007 study that administered a spatial Stroop task to participants aged 5 to 76 found that increasing age was associated with decreasing mean choice RT on correct trials until peak performance was reached, in the group of 16–19-year-old participants (Williams et al., Citation2007). Another study, in children aged five to seven, demonstrated a negative relation between age and RT on correct Go trials in a Go/No-go task (Torpey et al., Citation2012). In addition, correct answers can be presumed to take the longest in the incongruent level, since these items are considered the most difficult to solve.

Besides age, gender is another basic variable of interest in performance on neuropsychological tasks, and gender differences in cognitive abilities have been addressed in both the psychological and the neuropsychological literature (e.g., Hyde, Citation1981; Weiss et al., Citation2003; Wirsén Meurling et al., Citation2000). Since the BST is inspired by the Stroop task, potential gender effects on the two tasks may be comparable. The Stroop task has been investigated for gender effects, and several studies found a small overall advantage for females. However, this advantage seems to be caused by a female advantage in verbal ability rather than in inhibition skills. Since the BST does not test verbal skills, no gender effect was expected (Sjoberg, Citation2013).

Another factor influencing RT on the BST is the congruency effect: the incongruent level should show a higher RT than the neutral, congruent, and mixed levels, because the interfering stimuli in the incongruent level impose a higher cognitive demand (Mahr & Wentura, Citation2013). Lastly, a positive association between a correct answer and RT could be expected, a phenomenon known as the speed-accuracy tradeoff (SAT) (Duckworth et al., Citation2018). In addition, age could potentially moderate the SAT, with increasing age being related to a smaller SAT. A 2010 study suggested that “young adults attempt to balance speed and accuracy to achieve the most correct answers per unit time, whereas older adults attempt to minimize errors even if they must respond quite slowly to do so” (Starns & Ratcliff, Citation2010). An age-related speed-accuracy tradeoff was also demonstrated in an interception task, favoring older age groups over younger children (Rothenberg-Cunningham & Newell, Citation2013). Besides a moderating role of age in the RT of a correct answer, we hypothesized that the SAT would be greater in the incongruent level than in the other levels, since this level is the most challenging part of the task.

Reliability, a very important test quality, refers to consistency in measurement. We planned to analyze three reliability components: (1) reliability within the task, (2) reliability based on a follow-up test, and (3) inter-experimenter reliability. However, due to the premature ending of the study (see Methods), the follow-up test could not be administered. To derive reliability estimates within the task, the mean RT and the number of correct responses on the items of the incongruent level were compared with those on the incongruent items of the mixed level. In addition, as an indirect measure of reliability, we predicted a significant autoregressive component in the covariance structure of the test, reflecting a (strong) association between consecutive items. Finally, inter-experimenter reliability was examined by comparing the scores of the participants tested by one researcher with those tested by the other, after ensuring that the distributions of age, gender, and moment of assessment (morning or afternoon) did not differ significantly between the two researchers. We expected no difference in outcomes between the two researchers.
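The within-task comparison can be sketched per child: summarize the incongruent block and, separately, the incongruent items of the mixed level, then compare the two summaries across children. This is a minimal sketch under stated assumptions: the record fields (`child`, `level`, `stimulus`, `rt`, `correct`) are hypothetical, and the Pearson correlation and mean differences are only one possible way to express agreement; the text does not specify which statistic was used.

```python
import math
from collections import defaultdict
from statistics import mean

def per_child_summary(trials, select):
    """Mean RT (ms) and proportion correct per child over the trials accepted by `select`."""
    rts, acc = defaultdict(list), defaultdict(list)
    for tr in trials:
        if select(tr):
            rts[tr["child"]].append(tr["rt"])
            acc[tr["child"]].append(1.0 if tr["correct"] else 0.0)
    return {c: (mean(rts[c]), mean(acc[c])) for c in rts}

def pearson(x, y):
    """Plain Pearson correlation; assumes both lists vary."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def within_task_agreement(trials):
    """Compare the incongruent block with the incongruent items of the mixed level, per child."""
    block = per_child_summary(trials, lambda t: t["level"] == "incongruent")
    mixed = per_child_summary(
        trials, lambda t: t["level"] == "mixed" and t["stimulus"] == "incongruent")
    kids = sorted(set(block) & set(mixed))
    rt_b = [block[c][0] for c in kids]
    rt_m = [mixed[c][0] for c in kids]
    return {
        "rt_correlation": pearson(rt_b, rt_m),
        "mean_rt_difference": mean(b - m for b, m in zip(rt_b, rt_m)),
        "mean_accuracy_difference": mean(block[c][1] - mixed[c][1] for c in kids),
    }

# Made-up toy trials for three children, only to show the record layout.
toy = []
for child, base in [("c1", 900), ("c2", 1100), ("c3", 1300)]:
    toy += [
        {"child": child, "level": "incongruent", "stimulus": "incongruent", "rt": base, "correct": True},
        {"child": child, "level": "incongruent", "stimulus": "incongruent", "rt": base + 20, "correct": True},
        {"child": child, "level": "mixed", "stimulus": "incongruent", "rt": base + 20, "correct": True},
        {"child": child, "level": "mixed", "stimulus": "neutral", "rt": 500, "correct": True},
    ]
result = within_task_agreement(toy)
print(result)
```

A high correlation with a small mean difference would indicate that the two sets of incongruent items rank and scale the children similarly; with real data, the mixed level would first be restricted to the items retained for analysis.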

The aim of the study was to provide a clear overview of the psychometric properties of the BST and to give a recommendation on its suitability for use in a primary school population. We also aimed to propose a basic statistical model. In sum, the main purpose of this article was to explain as much variance in RT as possible with experiment-related variables and to provide a basic psychometric assessment of the BST.

Overview of the a priori hypotheses

Based on the factors described above that may influence RT on the BST, seven hypotheses were formulated and refined by five sub-hypotheses. The main hypotheses concern main effects, whereas the sub-hypotheses concern interaction effects.

Hypothesis 1:

Age is negatively associated with RT.

Hypothesis 2:

Gender is no significant predictor of RT.

Hypothesis 3:

A linear and an inverse time effect can be modeled as significant predictors of RT.

Hypothesis 3A:

The higher the age, the faster the initial inverse time effect.

Hypothesis 3B:

The incongruent level will show a slower inverse time effect than the other levels.

Hypothesis 3C:

The incongruent level will show a slower linear time effect than the other levels.

Hypothesis 4:

The incongruent level will show a higher RT than the other levels.

Hypothesis 5:

A correct answer is preceded by a longer RT.

Hypothesis 5A:

The higher the age, the faster a correct answer is given.

Hypothesis 5B:

A correct answer is preceded by a longer RT in the incongruent level than the other levels.

Two reliability hypotheses

Hypothesis 6: The variable “researcher” will demonstrate no statistically significant effect in the model of the multilevel analysis, indicating a sufficient inter-experimenter reliability.

Hypothesis 7: Reliability within the task will be confirmed by the comparison of the accuracy and RT of the items of the incongruent level to the incongruent items of the mixed level. Also, a significant autoregressive covariance structure is expected in the multilevel analysis.

Materials and methods

Approval for the study was obtained from the Medical Ethical Committee of Maastricht University (2019–1068).

Sample size considerations

Before the study started, a sample of at least 70 subjects was estimated to be sufficient for a psychometric assessment with multilevel analyses. The aim was for each age category to contain at least 10 subjects, since age was thought to be a crucial variable for modeling RT. Furthermore, the within-subject power was relatively high, with 90 repeated measures per subject.

Selection of the school

To be able to generalize the findings of the present study to other schools in The Netherlands, a representative school was selected. The included school was comparable to other schools in the Netherlands with respect to the following characteristics: average income of the parents, the level of performance on the final school exam in the final grade (AlleCijfers.nl, Citation2022) and degree of urbanization (CBS et al., Citation2020).

Selection of the participants

Children aged 5–11 attending a primary school were approached. The head of the school instructed all schoolteachers to hand out an information package to every child in their classroom, preferably at the end of a school day. The information package included a short explanation of the study and two informed consent forms: one for the child and one for the parents. The parents were asked to return the forms to the schoolteacher via their child, even if they did not want to participate. When the children returned their packages, the schoolteacher could not see who had consented and who had not. The teachers did not remind their students to return their packages. The response rate was 29.2% (92/315 forms were returned). In 58 cases, all documentation was in order and the children were included.

No questions about chronic diseases, psychological complaints and the intake of medication were asked. No financial or other forms of reward were offered for participation.

This pilot study is part of a larger study, which consisted of an interview and a more extensive neuropsychological assessment.

Data collection and premature termination of the study

Data were collected from October 2019 to March 2020. Only 38 of the 58 included participants were tested, because the outbreak of Covid-19 led to the simultaneous closure of all primary schools in the Netherlands. Resuming data collection after the Covid-19 crisis was not possible for two reasons. First, the ethical approval of the study was limited to a single school year. Second, it was unclear how long it would take before testing could resume, and in the meantime the children would grow older and no longer be comparable. In short, the abrupt discontinuation of the study led to a sample of 38 children, for whom a complete dataset was obtained. We were aware that this small sample meant that the study had to be classified as a pilot.

Materials

The BST, used to test cognitive interference and the ability to suppress interference, was developed in the Psychology Experiment Building Language. The BST software was installed on an ASUS laptop (Windows 10) with a 13.3-inch color touch screen. The laptop was kept connected to its charger to prevent the screen from switching to energy-saving mode. Sound was disabled.

An extensive description of the task, with visual examples of the levels, is provided in an earlier article (Mueller & Esposito, Citation2014); the basic procedure is summarized here. Two geometric shapes are shown on the screen: a red circle on the left and a blue square on the right. Stimuli appear above these shapes, in the center of the screen. The participants were asked to pay attention to the shape of the stimulus and to select, by touching it, the one of the two geometric shapes that matched the stimulus in shape, as quickly and as accurately as possible. The task began with a practice round of six exercises covering the six different stimuli. The task consists of four levels with different types of stimuli: neutral stimuli (n = 20), which were empty circles and squares; congruent stimuli (n = 20), which matched one of the geometric shapes in both color and shape; incongruent stimuli (n = 20), which matched one geometric shape in shape but the other in color; and mixed stimuli (n = 30), a mix of neutral, congruent, and incongruent stimuli in random order.

A short break was possible between levels. All tests were performed between October 2019 and March 2020.

Procedure

The task was administered at school in a quiet, low-stimulus room of approximately three by six meters. The researchers ensured that there were no loud noises during the procedure. Three time slots were available for testing: early morning, late morning, and early afternoon. The 38 children were tested alternately by the two researchers and were, in this way, randomly distributed between them. Beforehand, the researchers were trained to standardize the test procedure and received a detailed standardized test protocol with step-by-step instructions. The aim was to create a situation in which each child could perform to the best of their ability. The researchers allowed the children to take a short break when necessary and made sure the children felt at ease. Both researchers were instructed to pay attention to the child's body language and facial expressions during the task. If a child seemed nervous or anxious, a glass of water was offered.

To start the procedure, the researcher collected the child from the classroom. The researcher was seated diagonally opposite the child, and the laptop was placed on a table in front of the child. Apart from the laptop, no other items were on the table. The children were seated in a comfortable, upright position. After a short introduction and some small talk, the researcher verbally checked the child's consent. The administration of the BST was part of a more comprehensive study, including a neuropsychological test battery as well as several health-related assessments. Immediately before the BST, a short interview and a “time estimation task” were administered. Tests were given in a fixed order.

Before the BST started, the child was verbally instructed in Dutch, with an example exercise of the BST as visual support. The exact verbal instruction (translated) was: “We are now going to start with the task. This task consists of four rounds, and we will start with some practice exercises. You can see that there are two symbols on the screen: a circle and a square. The circle is red, and the square is blue. In every exercise, a new symbol will appear above the two symbols that you already know. The goal of the exercise is to match the symbol that appeared with one of the two symbols on the screen. To match them, it is important that you look at the shape of the symbol, and not the color. Also, it is important that you tap the answer on the screen as fast and as accurately as you can. Do you have any questions? If not, are you ready to start?” Once the child answered affirmatively, the task began. The children were also told that it was not an exam and that their performance or score would have no consequences. Completion of the BST took at most 10 minutes. Once the task was completed, the answers were automatically saved, and the researcher escorted the child back to the classroom. No feedback on performance was provided.

Data processing

All data were fully anonymized. Each child received a unique subject number, which was used as the subject key in the data files. The PEBL software saved the raw data on the laptop's hard disk as comma-separated (.csv) files, and a backup of all raw data files was made immediately after every test session. The raw data files were imported into a licensed statistical software package (SPSS version 26) and merged into a single SPSS data file. The following variables were added to this data file: gender, group, age, a counter across all trials of a subject (range 1–90), and a counter within level (range 1–20 for levels 1–3 and 1–30 for level 4). Since inverse time effects within level were anticipated a priori, we also added an inverse time component (1/counter within level). Every individual data file consisted of 90 records, giving a total of 3420 records in the merged data file. To obtain comparable counters within level, the last 10 items of the mixed level were excluded from the analysis. The researchers were blinded during the analysis.
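The construction of the counters can be sketched as follows. The authors did this in SPSS; this Python version is only an illustration, and it assumes (not stated in the text) that levels 1–3 are the neutral, congruent, and incongruent levels in that order. The record field names are hypothetical.

```python
# Assumed presentation order of the levels and their lengths (levels 1-4).
LEVELS = [("neutral", 20), ("congruent", 20), ("incongruent", 20), ("mixed", 30)]
TRIALS_PER_SUBJECT = sum(n for _, n in LEVELS)  # 90

def add_counters(subject_id, trials):
    """Attach the overall counter, the counter within level and its inverse to each trial."""
    if len(trials) != TRIALS_PER_SUBJECT:
        raise ValueError(f"expected {TRIALS_PER_SUBJECT} trials, got {len(trials)}")
    records, i = [], 0
    for level, n_items in LEVELS:
        for k in range(1, n_items + 1):
            rec = dict(trials[i])
            rec.update(subject=subject_id, level=level, counter=i + 1,
                       counter_in_level=k, inv_counter=1.0 / k)
            records.append(rec)
            i += 1
    return records

def drop_mixed_tail(records, keep=20):
    """Keep only the first `keep` items of the mixed level so all levels have 20 items."""
    return [r for r in records
            if not (r["level"] == "mixed" and r["counter_in_level"] > keep)]

# Usage with placeholder trials (RT values are dummies): 38 subjects x 90 trials.
merged = []
for sid in range(1, 39):
    merged += add_counters(sid, [{"rt": 1000.0} for _ in range(TRIALS_PER_SUBJECT)])
analysed = drop_mixed_tail(merged)
print(len(merged), len(analysed))  # 3420 3040
```

Dropping the tail of the mixed level after the counters are assigned keeps the inverse component identical ($1, 1/2, \dots, 1/20$) across all four levels, which is what makes the within-level counters comparable.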

Statistical analysis

Descriptive analyses were performed with frequency analyses, chi-square tests, and t-tests.

Multilevel analysis was used because the data were nested. A two-level structure was applied, in which consecutive stimuli were nested within subjects. The dependent variable in all analyses was RT, since RT is one of the non-introspective measures of the working of the mind (Heitz, Citation2014). Before the multilevel analysis was carried out, RT was tested for normality. Because RT was slightly skewed and had an unacceptably high kurtosis, it was log-transformed. The log-transformed variable was approximately normally distributed (skewness = 0.446, kurtosis = 0.559).
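The normality check before and after the log transformation can be sketched with the bias-adjusted sample skewness ($G_1$) and excess kurtosis ($G_2$). To our understanding these are the estimators reported by SPSS and by Excel's SKEW and KURT, but this should be verified against the package actually used. The RTs below are synthetic log-normal values, not study data.

```python
import math
import random
from statistics import fmean

def skewness_kurtosis(x):
    """Bias-adjusted skewness (G1) and excess kurtosis (G2) of a sample."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    m = fmean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    g1 = m3 / m2 ** 1.5          # moment-based skewness
    g2 = m4 / m2 ** 2 - 3.0      # moment-based excess kurtosis
    G1 = math.sqrt(n * (n - 1)) / (n - 2) * g1
    G2 = ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))
    return G1, G2

# Synthetic, right-skewed "RTs" in ms (log-normal, made up; not study data).
rng = random.Random(1)
rts = [rng.lognormvariate(6.9, 0.35) for _ in range(2000)]
raw_skew, raw_kurt = skewness_kurtosis(rts)
log_skew, log_kurt = skewness_kurtosis([math.log(r) for r in rts])
print(f"raw: skew={raw_skew:.2f}, kurt={raw_kurt:.2f}; "
      f"log: skew={log_skew:.2f}, kurt={log_kurt:.2f}")
```

Because RT distributions have a long right tail, the log transformation compresses large values more than small ones, which typically pulls both skewness and kurtosis toward zero.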

In all multilevel analyses, we attempted to model not only the fixed effects but also random effects: a random intercept as well as random slopes. As explained in the Introduction, children may differ in performance and learning curves, which justifies the expectation of random effects.

All analyses were performed with SPSS version 26. A p-value of ≤0.05 was considered statistically significant.

Description of the combination model

The combination model was computed to test whether all hypothesized effects held independently of each other. It consisted of the following variables: age, gender, a linear counter within level, an inverse counter within level, level (coded as dummy variables, with the incongruent level as the reference category), correct answer, and researcher. The following random effects were modeled: intercept, age, linear counter within level, and inverse counter within level.
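Written out, the combination model can be sketched as below. This is a hedged transcription in our own notation: $y_{ij}$ is the log-transformed RT of trial $i$ of child $j$, $t_{ij}$ is the counter within level, and the three level dummies are coded against the incongruent level. The random effects are transcribed as listed in the text; the AR(1) residual structure is an assumption based on hypothesis 7 rather than a reported specification.

```latex
\begin{aligned}
y_{ij} ={}& \beta_0 + \beta_1\,\mathrm{age}_j + \beta_2\,\mathrm{gender}_j
  + \beta_3\,t_{ij} + \beta_4\,t_{ij}^{-1} \\
&+ \beta_5\,\mathrm{neutral}_{ij} + \beta_6\,\mathrm{congruent}_{ij} + \beta_7\,\mathrm{mixed}_{ij}
  + \beta_8\,\mathrm{correct}_{ij} + \beta_9\,\mathrm{researcher}_j \\
&+ u_{0j} + u_{1j}\,\mathrm{age}_j + u_{2j}\,t_{ij} + u_{3j}\,t_{ij}^{-1} + \varepsilon_{ij},
\qquad \operatorname{Cov}(\varepsilon_{ij}, \varepsilon_{i'j}) = \sigma^2 \rho^{\lvert i - i' \rvert}.
\end{aligned}
```

In this form, $\beta_3$ and $\beta_4$ correspond to the linear (slowing-down) and inverse (learning) time effects of hypothesis 3, and $\beta_5$ to $\beta_7$ are the contrasts of the other levels with the incongruent level tested under hypothesis 4.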

We were unsure whether to include age or school grade in the model, because the two were highly correlated (r = .95). Since age is more comparable across countries than grade, given differences in school systems, we included age as the predictor. School grade was used only for descriptive purposes, as was postal code, which was also recorded.

Results

As described in the Methods section, the response rate was low (29.2%). Before carrying out further analyses, we examined whether responders differed significantly from non-responders with respect to age, gender, and postal code. For all three variables, the two groups showed almost identical figures, so we decided to perform the planned analyses.

All tasks were administered in a calm and relaxed atmosphere. No protocol deviations or incidents occurred.

Outlier detection

Outliers were identified with the “Outlier Detection Method”: BST trials with an RT more than 3 standard deviations above or below the mean were considered outliers and excluded from the analysis. This resulted in the deletion of RTs ≤ 355 milliseconds and ≥ 2818 milliseconds, which is in reasonable agreement with the 300-millisecond cutoff used in another study (Esposito et al., Citation2013).
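A minimal sketch of a mean ± 3 SD filter is given below. Two details are assumptions, because the text does not specify them: the SD is taken as the sample standard deviation, and the rule can be applied either to raw RTs or to log RTs (`log_scale=True`). For reference, the reported limits of 355 and 2818 ms have an arithmetic midpoint of about 1587 ms but a geometric midpoint of about 1000 ms, i.e., they are close to symmetric on the log scale; we cannot confirm which scale was used.

```python
import math
from statistics import fmean, stdev

def sd_bounds(rts, k=3.0, log_scale=False):
    """Lower/upper RT limits at mean +/- k SD (SD = sample standard deviation)."""
    vals = [math.log(r) for r in rts] if log_scale else list(rts)
    m, s = fmean(vals), stdev(vals)
    lo, hi = m - k * s, m + k * s
    return (math.exp(lo), math.exp(hi)) if log_scale else (lo, hi)

def drop_outliers(rts, k=3.0, log_scale=False):
    """Keep only RTs strictly inside the limits, matching deletion of RTs <= lower and >= upper."""
    lo, hi = sd_bounds(rts, k, log_scale)
    return [r for r in rts if lo < r < hi]

# Toy example: 50 identical RTs of 1000 ms and one extreme RT of 10000 ms.
rts = [1000.0] * 50 + [10000.0]
print(len(drop_outliers(rts)), len(drop_outliers(rts, log_scale=True)))  # 50 50
```

Applying the rule on the log scale gives asymmetric limits in milliseconds (farther above the center than below it), which fits the right-skewed shape of RT distributions.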

Excluding the last 10 items of the mixed level (level 4), as described in Methods, left 3040 trials (380/3420 = 11.1% of the data removed). The RT filter then removed a further 1.12% of the data, so the final analyses were performed on 3006 trials.

Sample description and task completion

Descriptive characteristics of the participating children are provided in Table 1. All 38 participants completed the task. A summary of basic results is given in Table 2. The numbers of wrong answers in the neutral, congruent, incongruent, and mixed levels were 21 (2.8%), 17 (2.3%), 50 (6.7%), and 29 (3.8%), respectively. These differences were statistically highly significant (all p’s < .001, assessed with a generalized linear mixed model analysis). Contrasting the incongruent level with the neutral, congruent, and mixed levels yielded odds ratios (ORs) of 0.386, 0.295, and 0.532, respectively.
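To make the odds ratios easier to interpret, the sketch below computes crude (unadjusted) ORs from 2 × 2 tables. It is not the generalized linear mixed model used in the study, so its values will not match the reported estimates exactly. The denominator of 760 trials per level (38 children × 20 items) is an assumption that ignores outlier removal. The check in the loop shows that the OR of an error in one level versus the incongruent level equals the OR of a correct answer in the incongruent level versus that level, so both readings of the contrast give the same number.

```python
def odds_ratio(events_a, n_a, events_b, n_b):
    """Crude (unadjusted) odds ratio of an event in group A relative to group B."""
    for events, n in ((events_a, n_a), (events_b, n_b)):
        if not 0 < events < n:
            raise ValueError("need 0 < events < n in both groups")
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return odds_a / odds_b

# Error counts from the text; N is an assumed denominator (38 x 20, ignoring outlier removal).
errors = {"neutral": 21, "congruent": 17, "incongruent": 50, "mixed": 29}
N = 760
for level in ("neutral", "congruent", "mixed"):
    or_err = odds_ratio(errors[level], N, errors["incongruent"], N)
    # Same value: OR of a *correct* answer, incongruent level vs. this level.
    or_cor = odds_ratio(N - errors["incongruent"], N, N - errors[level], N)
    print(f"{level:>9} vs incongruent: crude OR = {or_err:.3f} (check: {or_cor:.3f})")
```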

Table 1. Descriptive characteristics of study sample (n=38).

Table 2. Summary of basic results on the BST.

Table 3. Effects of the main effects as predicted by the a priori hypotheses on the RT of the BST.

Testing the a priori main hypotheses (Table 3)

Hypothesis 1:

Age was negatively related to RT (T = −7.46, p < .001), confirming hypothesis 1.

Hypothesis 2:

No effect of gender on RT was observed, confirming hypothesis 2.

Hypothesis 3:

The linear time effect, which might represent slowing down toward the end of each level, was significant (T = 2.08, p = .045). The inverse time effect, defined by the inverse counter within level and reflecting a fast learning effect, was also significant (T = 5.57, p < .001), confirming hypothesis 3.

Hypothesis 4:

To test this hypothesis, the RTs of the neutral, congruent, and mixed level were compared to the RT of the incongruent level. These RTs were significantly lower than the RT of the incongruent level (all p’s < .001). This finding confirmed hypothesis 4.

Hypothesis 5:

A correct answer was positively related to RT (T = 9.99, p < .001): on average, RT was 1.2 milliseconds longer when a correct answer was given, confirming hypothesis 5.

Hypothesis 6:

No main effect of researcher was observed, confirming hypothesis 6.

Testing the a priori hypotheses concerning interaction effects

To test the five a priori interaction hypotheses for independence, the interactions were added to the combination model described in the previous paragraph, which contained all expected main effects. The two non-significant main effects, "researcher" and "gender," could have been removed, but it was decided to keep them in the interaction model. Three of the five interactions were found to be independent, and removing the two non-significant interactions did not change the effects of the other three (Table 4).

Table 4. Effects of the interactions as predicted by the a priori sub-hypotheses on the RT of the BST.

Hypothesis 3A: Inverse time effect within level*age

A negative inverse time effect within level*age interaction was expected: the older the child, the faster the learning effect. A negative interaction between age and the inverse time component was indeed found (T = −2.38, p = .023): for every year of age, the inverse time component decreases by 1 millisecond on average. Hypothesis 3A was therefore confirmed.
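
Writing the interaction out shows how age changes the weight of the inverse time component. The notation below is introduced here for illustration and is not the article's own: t_ij is the position of trial i of child j within its level, and only the terms involved in this interaction are shown.

$$
\widehat{RT}_{ij} = \cdots + \gamma_{\text{inv}}\,\frac{1}{t_{ij}} + \gamma_{\text{inv}\times\text{age}}\,\text{age}_j\,\frac{1}{t_{ij}} + \cdots
= \cdots + \left(\gamma_{\text{inv}} + \gamma_{\text{inv}\times\text{age}}\,\text{age}_j\right)\frac{1}{t_{ij}} + \cdots
$$

If the reported decrease of about 1 millisecond per year corresponds to γ_inv×age ≈ −1 (assuming age enters the model in years), the weight of the inverse time term would be roughly k milliseconds smaller for a child who is k years older.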

Hypothesis 3B: Inverse time effect within level*incongruent level

The hypothesis that a more difficult and challenging level (the incongruent level) would show a less pronounced learning curve than the easier levels (neutral, congruent, and mixed) could not be confirmed (all p's > .314); hypothesis 3B was therefore rejected.

Hypothesis 3C: Linear time effect within level*incongruent level

This hypothesis stated that the incongruent level would show a smaller slowing-down effect than the neutral, congruent, and mixed levels, which was indeed found: the interaction was positive for all three levels (p's .0015–.002), validating hypothesis 3C.

Hypothesis 5A: Correct answer*age

This interaction was significant and negative (T = −2.33, p = .020), validating hypothesis 5A.

Hypothesis 5B: Correct answer*incongruent level

The expectation that a more difficult and challenging level (the incongruent level) would show a longer RT for correct answers than the easier levels (neutral, congruent, and mixed) could not be confirmed (all p's > .059); hypothesis 5B was therefore rejected.

Plots of the RT were made that take into account the age effects, time effects, level effects, and the interactions between these variables described above. Figure 1 shows the plot for 5-year-old children and Figure 2 the plot for 11-year-old children.

Figure 1. Interaction model for 5-year-olds.

Figure 2. Interaction model for 11-year-olds.

Reliability

Inter-experimenter reliability was examined by comparing the participants' scores between the two independent researchers. There were no significant differences between the two researchers with respect to scores, moment of assessment, gender, and age (all p's > .327), validating hypothesis 6. The autoregressive component of the interaction model was highly significant (Wald Z = 36.01, p < .0001). The mean RT of the incongruent level was 1,143 milliseconds, and that of the incongruent items of the mixed level was 1,119 milliseconds. Multilevel analysis showed that this difference of 24 milliseconds was not significant (T = .945, p = .341). With regard to correct responses, 93.3% of the items in the incongruent level were answered correctly, compared with 91.0% of the incongruent items in the mixed level. This difference, analyzed with a multilevel model for a binary dependent variable, was also not statistically significant (T = .632, p = .527). These findings confirmed hypothesis 7.
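
The autoregressive component describes how residuals of consecutive trials of the same child are correlated. Assuming a first-order autoregressive, AR(1), structure (the text reports an autoregressive component; the order is an assumption here), the correlation between the residuals of trials s and t is ρ^|s−t|. A minimal sketch with a hypothetical ρ:

```python
def ar1_correlation(n: int, rho: float) -> list[list[float]]:
    """AR(1) correlation matrix: corr(e_s, e_t) = rho ** |s - t|."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [[rho ** abs(s - t) for t in range(n)] for s in range(n)]


# Hypothetical rho for illustration; not an estimate from the BST data.
R = ar1_correlation(5, rho=0.4)
for row in R:
    print(" ".join(f"{x:.3f}" for x in row))
```

The correlation decays geometrically with the lag between trials, so adjacent trials share the most residual variance; a large Wald Z for the autoregressive parameter indicates that this dependence clearly differs from zero.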

Modeling of random effects

The random intercept was highly significant, as were all three random slopes, so modeling random effects was of value in this analysis. The random slopes of the linear and inverse time effects indicate that the shapes of the fast initial learning curve and the late slowing-down curve differ between children. Age also produced a significant random slope, which shows that not only do learning effects differ between individuals, but children of the same age also show different slopes.
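
The idea of a random intercept and random time slopes can be illustrated by giving each child its own deviations from the fixed coefficients. All numbers below are invented for illustration and are not estimates from the BST data; the time coding assumes a 1-based item position t, and the random slope for age is not represented in this sketch.

```python
from dataclasses import dataclass

# Invented fixed coefficients (ms); NOT estimates from the study.
FIXED = {"intercept": 1000.0, "linear": 2.0, "inverse": 300.0}


@dataclass
class ChildEffects:
    """Hypothetical child-specific random deviations (ms)."""
    u_intercept: float = 0.0
    u_linear: float = 0.0
    u_inverse: float = 0.0


def child_curve(effects: ChildEffects, n_items: int) -> list[float]:
    """Predicted RT at item positions 1..n_items for one child."""
    b0 = FIXED["intercept"] + effects.u_intercept
    b_lin = FIXED["linear"] + effects.u_linear
    b_inv = FIXED["inverse"] + effects.u_inverse
    return [b0 + b_lin * t + b_inv / t for t in range(1, n_items + 1)]


average_child = child_curve(ChildEffects(), 20)
steep_learner = child_curve(ChildEffects(u_intercept=-50.0, u_inverse=150.0), 20)
print([round(x) for x in average_child[:3]])
print([round(x) for x in steep_learner[:3]])
```

Because each child has its own intercept and slopes, the predicted curves differ in both level and shape; a significant random-slope variance indicates that such between-child differences in shape are genuinely present.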

Most parsimonious model after testing for independence of hypotheses 1–6

Because gender and researcher (as main effects) and the inverse time effect within level*levels and correct answer*levels interactions were not significant in the final model, they were not included in the most parsimonious model. This model therefore consists of the following main effects: the three level dummies, the linear time effect within level, the inverse time effect within level, correct answer, and age. In addition, it includes the inverse time effect within level*age, linear time effect within level*levels, and correct answer*age interactions. The intercept, linear time effect within level, inverse time effect within level, and age are modeled as random effects.
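
The fixed part of this most parsimonious model can be written as one row of predictors per trial. The sketch is an illustration only: it assumes dummy coding with the incongruent level as reference (consistent with the contrasts used for hypothesis 4), t as the 1-based item position within the level, correct answer coded 1/0, and age in years. The function and column names are hypothetical, centering choices from the original analysis are not reproduced, and random effects are not part of the row.

```python
LEVELS = ("neutral", "congruent", "incongruent", "mixed")
REFERENCE = "incongruent"  # assumed reference category
DUMMY_LEVELS = [lv for lv in LEVELS if lv != REFERENCE]


def design_row(level: str, t: int, correct: bool, age: float) -> dict[str, float]:
    """Fixed-effect predictors for one trial (hypothetical coding)."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    if t < 1:
        raise ValueError("t is the 1-based item position within the level")
    linear, inverse, corr = float(t), 1.0 / t, float(correct)
    row = {"intercept": 1.0}
    # Main effects
    for lv in DUMMY_LEVELS:
        row[f"level_{lv}"] = float(level == lv)
    row["time_linear"] = linear
    row["time_inverse"] = inverse
    row["correct"] = corr
    row["age"] = age
    # Interactions retained in the most parsimonious model
    row["time_inverse:age"] = inverse * age
    for lv in DUMMY_LEVELS:
        row[f"time_linear:level_{lv}"] = linear * row[f"level_{lv}"]
    row["correct:age"] = corr * age
    return row


print(design_row("congruent", t=4, correct=True, age=8.0))
```

The predicted RT for a trial would be the sum of these predictors weighted by the estimated fixed-effect coefficients, plus the child-specific random intercept and random slopes.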

Post-hoc analysis

Post hoc, we examined whether the stimuli were comparable. Six different stimuli were used in the task; to investigate a possible stimulus effect, levels one to three were analyzed individually. No influence of the type of stimulus on the RT was found, indicating that the stimuli themselves do not affect response speed.

Discussion

The main goal of this study was to psychometrically assess the Bivalent Shape Task in a way that takes the nested structure of the stimuli (repeated trials within each child) into account. The present study demonstrated that age, a learning effect, a slowing-down effect, level, and answer correctness, as well as several interactions, were significant predictors of the RT of the BST. Furthermore, the reliability seemed sufficient.

The negative relation between age and RT found in the present study is in line with previous studies and literature stating that RT decreases from early childhood to adulthood (e.g., Bucsuházy & Semela, 2017; Ratcliff et al., 2012; Śmigasiewicz et al., 2021). No effect of gender was found, indicating that performance on the BST is not significantly influenced by the gender of the subject.

A potential learning effect, sometimes also referred to as a practice effect, has been the object of interest in several studies (e.g., Bartels et al., 2010; Oliveira et al., 2014), but it has more frequently been studied as an effect between completely repeated tasks than between repeated exercises within a single task, as in the present study. A learning effect within the task was demonstrated in the present study and can be visualized as a non-linear (inverse) curve. This learning effect was significantly moderated by age, with older children showing a faster learning curve and a lower baseline RT than younger children. A learning effect between completely repeated tasks can potentially be (partly) explained by increased familiarity with the testing environment, procedural learning (Oliveira et al., 2014), and increased self-esteem of the subject. A possible explanation for the age-moderated learning effect found in the present study is that older children get the hang of the task sooner and are therefore able to speed up their responses earlier than younger children.

Interestingly, as Figure 3 shows, there was also a slowly increasing linear component: the late slowing-down effect. Although this cannot be established with certainty, it may be interpreted as a "fatigue" or "weariness" effect. Furthermore, we showed that the incongruent level is the only level that does not show a "late slowing-down effect." A hypothesized reason is that the incongruent level is where the interference takes place and where participants face the biggest challenge of the task. This may lead to more sustained attention, possibly combined with the child's motivation to perform well, and therefore to a more constant RT throughout the level and less loss of attention. In addition to the absence of a slowing-down effect, the incongruent level displayed the highest RT of all levels, whereas, for example, the neutral and congruent levels were comparable to one another. The longer RT in the incongruent level than in the congruent level, a response delay, is known as the congruency effect. Our results are consistent with other studies (Cothran & Larsen, 2008; Mahr & Wentura, 2013; Yuan et al., 2013).

Figure 3. Basic model with inverse and linear time effects.

Our study showed that a correct answer is preceded by a longer RT than a wrong answer. A possible explanation is the speed-accuracy tradeoff (SAT): slowly made decisions lead to high accuracy, whereas quick decisions lead to a high error rate (Duckworth et al., 2018). There is consensus that the SAT arises because people make choices based on a sequential analysis of sensory evidence: faster responses entail less accumulated evidence and hence less informed decisions (Heitz, 2014). Slower RTs give the participant the chance to collect more sensory evidence, which leads to a more informed decision and therefore a greater chance of a correct answer. In addition to this main effect, the correct answer*age interaction showed a negative estimate, indicating that each additional year of age is associated with a decrease in the time needed to give a correct answer. A possible explanation is that younger children attempt to minimize the number of errors, even at the cost of a slower RT, whereas older children attempt to maximize the number of correct answers per unit of time by balancing RT and accuracy. Since the present study demonstrated that each year of age is associated with a decrease in the SAT, we propose that this finding may reflect a process that develops and becomes clearer over the years.
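
The sequential-sampling account of the speed-accuracy tradeoff can be illustrated with a toy random walk: evidence accumulates in unit steps that favor the correct response with probability p, and a response is given once the total reaches +a (correct) or −a (error). This is a simplified discrete stand-in for models such as the drift-diffusion model, not a model used in the present study; p, a and the number of simulated trials are arbitrary choices.

```python
import random


def simulate(threshold: int, p_correct_step: float, n_trials: int, seed: int = 1):
    """Return (accuracy, mean number of steps) for a symmetric-bound random walk."""
    rng = random.Random(seed)
    n_correct = 0
    total_steps = 0
    for _ in range(n_trials):
        evidence, steps = 0, 0
        # Accumulate evidence until one of the two response bounds is reached.
        while abs(evidence) < threshold:
            evidence += 1 if rng.random() < p_correct_step else -1
            steps += 1
        if evidence > 0:
            n_correct += 1
        total_steps += steps
    return n_correct / n_trials, total_steps / n_trials


for a in (2, 6):
    acc, mean_steps = simulate(threshold=a, p_correct_step=0.6, n_trials=5000)
    print(f"threshold {a}: accuracy {acc:.3f}, mean steps {mean_steps:.1f}")
```

A higher threshold means more evidence is collected before responding, so responses take more steps (slower) but are more often correct. For this walk the accuracy can also be checked analytically with the gambler's-ruin result 1 / (1 + (q/p)^a), with q = 1 − p, which gives about 0.69 for a = 2 and 0.92 for a = 6.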

With regard to reliability, the present study demonstrated that the inter-experimenter reliability is sufficient and that the items of the incongruent level were highly comparable to the incongruent items of the mixed level, which can be seen as a form of reliability. Furthermore, the highly significant autoregressive component of the covariance structure underlined the reliability between consecutive items of the task.

Putting the findings of the present study in a broader perspective, three points are worth mentioning. First, this study shows that multilevel analysis is very rewarding in neuropsychological research, especially in tasks that present repeated measures to the subject: both significant random slopes and intercepts were demonstrated. Second, the present study demonstrated clear (non-linear) time effects on the RT of the BST that may represent psychological processes such as weariness, fatigue, boredom, or learning. It is important to include these time effects in the analyses of neuropsychological tasks, since they may significantly influence the outcome variable of interest. Third and last, the present study demonstrated that several components of the validity and reliability of the BST, a free and modifiable interference task, seem to be of sufficient quality for the BST to be considered a valuable instrument in children aged 5 to 11.

Limitations

The present study is not without limitations. First, there were some issues concerning sampling and inclusion. The response rate of this study was 29.2%, which may indicate selection bias. In addition, 58 of the 92 responses contained all required informed consents. However, a comparison of the characteristics of responders and non-responders, as described under Results, showed no significant differences in age, gender, and postal code. Finally, because the study was terminated prematurely due to the Covid-19 pandemic, only 38 children were tested. This undoubtedly reduced the statistical power of the study, particularly the between-subjects power; the within-subjects power, however, was considered relatively high. All in all, most findings of the statistical model pointed in the expected direction and several psychometric qualities could be demonstrated, despite the desired sample size not being reached. Unfortunately, not all psychometric properties could be examined; for example, convergent, ecological, and discriminant validity could not be determined. Convergent validity could have been examined by administering another interference task, such as the Stroop task, but it was decided to keep the burden for the subject to a minimum, and therefore no second task was administered.

Second, it might be challenging to compare the results of this study, obtained with children "before Covid-19," with those of later studies with children "after Covid-19." The Covid-19 crisis may have influenced the way children spend their time and, consequently, their development and performance.

Third, the test was administered by two researchers. One could argue that the results may differ between researchers. However, adding the researcher variable to the regression models did not change the results, and both researchers followed the same strict step-by-step protocol. Any effects of the researchers are therefore thought to be minimal. A benefit of having two different researchers conduct the protocol was the possibility of making a statement about the inter-experimenter reliability.

Furthermore, other factors may influence how children perform. For example, whether a child has slept well and the child's level of intelligence can lead to a different outcome than expected; in children in particular, many factors can increase or decrease performance. Since the literature states that intelligence, measured as IQ score, may influence and correlates highly with performance on the Stroop task (Imbrosciano & Berlach, 2005), it could be an interesting addition to the (psychometric) analysis of the BST in future research.

In addition to these factors, as stated in the Method section, the present study did not exclude children who, for example, have ADHD, take medication, or have a neuromuscular disorder. It is theoretically possible that part of the variability in test performance could be explained by one or more of these variables. However, it is very unlikely that the main effects, which should be independent of these variables, would disappear after correction for them. Furthermore, significant random intercepts and slopes were found, which already take interindividual differences into account. In addition, color blindness and visual acuity were not medically assessed before administration of the BST. However, a practice round was performed to check whether the child understood the assignment. Had a child been color blind or had reduced visual acuity, the researcher would have noticed this immediately and testing would not have been continued.

In conclusion, more research is needed to explain more of the variance of the task and to psychometrically assess other aspects of validity. When interpreting the results of this study, one must be aware that many factors may be of influence that may have been underexposed because this was a pilot study.

Conclusion

In conclusion, this article contributes to the documentation of the psychometric properties of the BST. The task is capable of showing the result of the brain's confrontation with interference, and several components of its validity and reliability seem to be of good quality, which makes it a valuable instrument for investigating the ability to suppress interfering information in primary school children aged 5–11. However, more research is needed to explain more variance and to describe more factors that are of influence in a pediatric population.

Ethics statement

The study was approved by the medical ethics committee of Maastricht University (2019–1068).

Consent form

Written informed consent was obtained for all participants and their parents.

Acknowledgements

The authors would like to thank primary school “De Wereldster” for their cooperation and enthusiasm.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

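  • AlleCijfers.nl. (2022). Basisschool De Wereldster in Meerssen: Héél véél informatie uit officiële onderwijs bronnen over de leerlingen, het personeel en de resultaten van de school [Primary school De Wereldster in Meerssen: A great deal of information from official education sources about the pupils, the staff and the results of the school]. https://allecijfers.nl/basisschool/de-wereldster-meerssen/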
  • Bartels, C., Wegrzyn, M., Wiedl, A., Ackermann, V., & Ehrenreich, H. (2010). Practice effects in healthy adults: A longitudinal study on frequent repetitive cognitive testing. BMC Neuroscience, 11(1), 118. https://doi.org/10.1186/1471-2202-11-118
  • Bucsuházy, K., & Semela, M. (2017). Case study: Reaction time of children according to age. Procedia Engineering, 187, 408–413. https://doi.org/10.1016/j.proeng.2017.04.393
  • CBS, PBL, RIVM, & WUR. (2020). Bevolkingsgroei, 2015-2020 (indicator 2102, versie 07, 20 oktober 2020) [Population growth, 2015–2020 (indicator 2102, version 07, 20 October 2020)]. https://www.clo.nl/indicatoren/nl2102-bevolkingsgroei-nederland-
  • Cicchetti, D. V. (1994). Multiple comparison methods: Establishing guidelines for their valid application in neuropsychological research. Journal of Clinical and Experimental Neuropsychology, 16(1), 155–161. https://doi.org/10.1080/01688639408402625
  • Cothran, D. L., & Larsen, R. (2008). Comparison of inhibition in two timed reaction tasks: The color and emotion Stroop tasks. The Journal of Psychology, 142(4), 373–385. https://doi.org/10.3200/JRLP.142.4.373-385
  • Czapka, S., Klassert, A., & Festman, J. (2019). Executive functions and language: Their differential influence on mono- vs. multilingual spelling in primary school. Frontiers in Psychology, 10, 97. https://doi.org/10.3389/fpsyg.2019.00097
  • Duckworth, R. A., Potticary, A. L., & Badyaev, A. V. (2018). Chapter one – On the origins of adaptive behavioral complexity: developmental channeling of structural trade-offs. In M. Naguib, L. Barrett, S. D. Healy, J. Podos, L. W. Simmons, & M. Zuk (Eds.), Advances in the study of behavior (Vol. 50, pp. 1–36). Academic Press.
  • Esposito, A. G., Baker-Ward, L., & Mueller, S. (2013). Interference suppression vs. response inhibition: An explanation for the absence of a bilingual advantage in preschoolers’ Stroop task performance. Cognitive Development, 28(4), 354–363. https://doi.org/10.1016/j.cogdev.2013.09.002
  • Heitz, R. P. (2014). The speed-accuracy tradeoff: History, physiology, methodology, and behavior. Frontiers in Neuroscience, 8, 150. https://doi.org/10.3389/fnins.2014.00150
  • Hyde, J. S. (1981). How large are cognitive gender differences? A meta-analysis using ω² and d. American Psychologist, 36(8), 892–901. https://doi.org/10.1037//0003-066X.36.8.892
  • Imbrosciano, A., & Berlach, R. G. (2005). The Stroop test and its relationship to academic performance and general behaviour of young students. Teacher Development, 9(1), 131–144. https://doi.org/10.1080/13664530500200234
  • Kail, R. (1991). Developmental change in speed of processing during childhood and adolescence. Psychological Bulletin, 109(3), 490–501. https://doi.org/10.1037/0033-2909.109.3.490
  • Mahr, A., & Wentura, D. (2013). Time-compressed spoken word primes crossmodally enhance processing of semantically congruent visual targets. Attention, Perception & Psychophysics, 76, 575–590. https://doi.org/10.3758/s13414-013-0569-z
  • Mueller, S. T., & Esposito, A. G. (2014). Computerized testing software for assessing interference suppression in children and adults: The Bivalent Shape Task (BST). Journal of Open Research Software, 2(1), e3. https://doi.org/10.5334/jors.ak
  • Mueller, S. T., & Piper, B. J. (2014). The Psychology Experiment Building Language (PEBL) and PEBL Test Battery. Journal of Neuroscience Methods, 222, 250–259. https://doi.org/10.1016/j.jneumeth.2013.10.024
  • Oliveira, R. S., Trezza, B. M., Busse, A. L., & Jacob-Filho, W. (2014). Learning effect of computerized cognitive tests in older adults. Einstein (Sao Paulo, Brazil), 12(2), 149–153. https://doi.org/10.1590/s1679-45082014ao2954
  • Plukaard, S., Huizinga, M., Krabbendam, L., & Jolles, J. (2015). Cognitive flexibility in healthy students is affected by fatigue: An experimental study. Learning and Individual Differences, 38, 18–25. https://doi.org/10.1016/j.lindif.2015.01.003
  • Ratcliff, R., Love, J., Thompson, C. A., & Opfer, J. E. (2012). Children are not like older adults: A diffusion model analysis of developmental changes in speeded responses. Child Development, 83(1), 367–381.
  • Reynolds, C. R. (1986). Clinical acumen but psychometric naivete in neuropsychological assessment of educational disorders. Archives of Clinical Neuropsychology, 1(2), 121–137. https://doi.org/10.1016/0887-6177(86)90012-0
  • Reynolds, C., & Mason, B. (2009). Measurement and statistical problems in neuropsychological assessment of children. In C. R. Reynolds & E. Fletcher-Janzen (Eds.), Handbook of clinical child neuropsychology (pp. 203–230). Springer.
  • Richardson, C., Anderson, M., Reid, C. L., & Fox, A. M. (2018). Development of inhibition and switching: A longitudinal study of the maturation of interference suppression and reversal processes during childhood. Developmental Cognitive Neuroscience, 34, 92–100.
  • Rothenberg-Cunningham, A., & Newell, K. M. (2013). Children’s age-related speed–accuracy strategies in intercepting moving targets in two dimensions. Research Quarterly for Exercise and Sport, 84(1), 79–87. https://doi.org/10.1080/02701367.2013.762307
  • Sherman, E., Brooks, B., Iverson, G., Slick, D., & Strauss, E. (2011). Reliability and validity in neuropsychology. In M. R. Schoenberg & J. G. Scott (Eds.), The little black book of neuropsychology. A syndrome-based approach. Springer Science + Business Media, LLC. https://doi.org/10.1007/978-0-387-76978-3_30
  • Sjoberg, E. (2013). Gender differences in cognitive inhibition: Results from a meta-analysis, a negative priming Stroop task, and a stop-signal task. https://www.researchgate.net/profile/Espen-Sjoberg/publication/286869652_Gender_differences_in_cognitive_inhibition_Results_from_a_meta-analysis_a_negative_priming_Stroop_task_and_a_stop-signal_task/links/566ea04708ae430ab5002e62/Gender-differences-in-cognitive-inhibition-Results-from-a-meta-analysis-a-negative-priming-Stroop-task-and-a-stop-signal-task.pdf
  • Śmigasiewicz, K., Servant, M., Ambrosi, S., Blaye, A., & Burle, B. (2021). Speeding-up while growing-up: Synchronous functional development of motor and non-motor processes across childhood and adolescence. PloS One, 16(9), e0255892. https://doi.org/10.1371/journal.pone.0255892
  • Snijders, T. A. B., & Bosker, R. J. (2011). Multilevel analysis: An introduction to basic and advanced multilevel modeling. SAGE Publications.
  • Starns, J., & Ratcliff, R. (2010). The effects of aging on the speed-accuracy compromise: Boundary optimality in the diffusion model. Psychology and Aging, 25(2), 377–390. https://doi.org/10.1037/a0018022
  • Torpey, D. C., Hajcak, G., Kim, J., Kujawa, A., & Klein, D. N. (2012). Electrocortical and behavioral measures of response monitoring in young children during a Go/No-Go task. Developmental Psychobiology, 54(2), 139–150. https://doi.org/10.1002/dev.20590
  • Weiss, E. M., Kemmler, G., Deisenhammer, E. A., Fleischhacker, W. W., & Delazer, M. (2003). Sex differences in cognitive functions. Personality and Individual Differences, 35(4), 863–875. https://doi.org/10.1016/S0191-8869(02)00288-X
  • Williams, B. R., Strauss, E. H., Hultsch, D. F., & Hunter, M. A. (2007). Reaction time inconsistency in a spatial Stroop task: Age-related differences through childhood and adulthood. Neuropsychology, Development, and Cognition. Section B, Aging, Neuropsychology and Cognition, 14(4), 417–439. https://doi.org/10.1080/13825580600584590
  • Wirsén Meurling, A., Tonning-Olsson, I., & Levander, S. (2000). Sex differences in strategy and performance on computerized neuropsychological tests as related to gender identity and age at puberty. Scandinavian Journal of Psychology, 41(2), 81–90. https://doi.org/10.1111/1467-9450.00175
  • Yuan, K., Cheng, P., Dong, T., Bi, Y., Xing, L., Yu, D., Zhao, L., Dong, M., von Deneen, K. M., Liu, Y., Qin, W., & Tian, J. (2013). Cortical thickness abnormalities in late adolescence with online gaming addiction. PLoS One, 8(1), e53055. https://doi.org/10.1371/journal.pone.0053055