High Stakes Assessments in Primary Schools and Teachers’ Anxiety About Work

ABSTRACT

High-stakes assessments are a common feature of many education systems. One argument often made against their use, however, is that they have a negative impact on wellbeing across the education sector, including teachers. We present new evidence on this matter by examining how the Statutory Assessment Tests (SATs) conducted in England’s primary schools are linked to how anxious teachers feel about work. Drawing on unique panel data from around 1,000 primary school teachers, we illustrate how the SATs are associated with a short increase in teachers’ anxiety levels during the week the tests take place. Yet there is little evidence that those most exposed to the pressures of SATs suffer from especially prolonged periods of high anxiety. We thus conclude that, although there may be other reasons to lower the stakes attached to primary school assessments, the benefits for teachers’ anxiety levels are likely to be marginal.

Introduction

High-stakes assessments are a prominent feature of many education systems. These measure young people’s academic competencies in key areas, with the results potentially having important consequences for pupils, schools, and/or staff. Data from such assessments are increasingly used to hold educators to account. This includes via the publication of school “league tables,” performance-related pay initiatives and threats to change the school management structure in response to results. Although high-stakes tests are now common in secondary schools, they are increasingly used in the primary sector as well. England – the empirical setting for this paper – is a prime example. Here, 10/11-year-olds sit Key Stage 2 Statutory Assessment Tests (SATs) at the end of primary education, with the results potentially having important consequences for schools, leaders, teachers and – to a lesser extent – individual pupils.

The proponents of high-stakes tests argue that assessments such as England’s SATs help to monitor and maintain standards in schools, leading to improvements in achievement (Saminsky, 2011). Yet it is widely recognized that they can have unintended negative impacts as well. These include increasing teacher workload (Perryman & Calvert, 2020), narrowing the curriculum to focus on only those subjects tested (Berliner, 2011), incentivizing schools to “cream-skim” the best pupils (West & Pennell, 2006) and excluding those deemed at greatest risk of failure (Ofsted, 2019). Schools may then not always make decisions in the best interest of pupils (Ofsted, 2019). Such issues have led some to call for high-stakes assessments to be scrapped (National Education Union, 2019).

One particularly pertinent topic that has recently come to the fore is how high-stakes assessments impact mental health and wellbeing in the education sector. Although much work in this area has focused on pupils, concern has been shown about the effects of high stakes testing on staff as well. For instance, by encouraging teaching-to-the-test, and subsequent narrowing of the curricula, it has been argued that teachers may become less satisfied in their jobs (Smith & Holloway, 2020). Given they and/or their school may suffer negative consequences if the results are poor, high-stakes tests may lead teachers to become more anxious or stressed about work. They may also increase teachers’ workloads – and thus subsequently reduce their wellbeing. Such arguments have been used as a further platform to support the case for ending high-stakes assessments, particularly in primary schools (Bradbury, 2019).

Why might high-stakes assessments impact the wellbeing of teachers?

Several reasons have been put forward as to why high-stakes assessments may be linked to teacher wellbeing. For instance, test scores are sometimes used by authority figures within schools (e.g. governors, managers, senior leaders). Although they may not be designed – and indeed may be ill-suited – for this purpose, these authority figures may use such data to make judgments about the performance of individual teachers. This may in turn lead to increased anxiety about such tests if teachers feel they are being monitored (Page, 2017) – particularly if negative consequences are attached. Relatedly, in some countries – including England – the government publishes “league tables” facilitating comparisons of performance across schools. This may cause stress for those directly involved in such tests (e.g. teachers, school leaders) whose “performance” will be publicly known. The results may also have consequences for teachers’ colleagues, with previous research suggesting that individuals may become anxious if they feel they may let others down (Tucker & Horton, 2019). For instance, teachers may believe that poor performance in a high-stakes test could trigger a school (Ofsted) inspection, or that this could lead to a downgrade in the school’s inspection judgment, leading to added pressure from colleagues as well.

Given these consequences, previous research has argued that headteachers put their “best” or most trusted staff to teach the pupils in the academic year that high-stakes tests take place (Bradbury, 2019). Yet being chosen to teach what is often deemed to be the most important class also comes with pressure – potentially leading to anxiety about needing to live up to the expectations of others. It is also likely to mean an increase in workload, particularly in the buildup to the assessments when they are helping young people to prepare. All these issues will be worsened if teachers do not believe outcomes from the tests are reliable. This will lead staff to believe they have only limited control over the outcomes on which they (and their school) are judged, leading them to feel anxious or stressed. Unfortunately, many primary school teachers in England feel this way about the Year 6 SATs (Teacher Tapp, 2022a).

Previous literature

Much of the previous work on the link between high stakes assessments and staff wellbeing has focused on the secondary education sector. For instance, using data from a survey of 145 Texan teachers – most of whom taught middle or high school pupils – Gonzalez et al. (2017) found there to be a “significant mean difference in overall stress among high school teachers who taught high-stakes subject matter compared with their non-high-stakes subject matter counterparts.” Focusing on assessment reforms in Ontario, Hargreaves (2020) argues that even “mid-stakes” tests are perceived to have a range of negative consequences, including leading to emotional ill-health amongst both students and staff. Qualitative interviews with 13 middle school mathematics teachers by Demir and Keleş (2021) found high-stakes testing to be associated with teachers’ anxiety and stress levels, which in turn impacted their instruction. Using data from a survey of 18 elementary school teachers, Gunn et al. (2016) detailed how most teachers feel pressured by high-stakes tests, with this stemming from demands within the local community and by managers, as well as individuals putting pressure on themselves. In a survey of more than 900 teachers about the Texas Assessment of Academic Skills, Reese et al. (2004) conclude that high stakes testing leads to the entire educational system becoming stressed. In a survey of 300 teachers in Georgia, Brockmeier et al. (2014) argue that teachers perceive the pressures of high stakes testing to be closely related to their stress levels at work. Discussing the evidence on high stakes testing and teacher stress, Mathison and Freeman (2006) suggest that such assessment regimes are likely to lead to stressed teachers who either suffer from illness or choose to leave the profession.

Turning to England, a survey commissioned by the National Education Union claimed that 90% of teachers believe the end-of-primary SATs negatively impact the mental health of teaching staff (TES, 2018). Similarly, in a survey completed by 188 teachers in England conducted by Bradbury (2019) and funded by the interest group More Than a Score, 92% agreed or strongly agreed that SATs have a negative impact on teachers’ wellbeing.

The unintended positive benefits of high-stakes assessment

While the narrative around high-stakes assessments often tends to be negative – particularly in debates about wellbeing – they may also have some positive benefits in this area. For instance, Pekrun (2006) notes how anxiety – the focus of this paper – is an “activating emotion.” That is, anxiety can spur an individual into action, encouraging them to address the situation they feel anxious about. A prime example is a student (or teacher) who feels anxious about an upcoming test. Rather than being a negative, this feeling may encourage them to work harder to make sure they (or their pupils) achieve the grade they want. This could, in turn, lead to later benefits for their wellbeing. Thus, while excessive anxiety about assessments is clearly problematic, some level of anxiety may actually be desirable.

There is some evidence in the literature that high-stakes testing can trigger motivation amongst teachers – at least some of the time. Nichols and Harris (2016) provide one such example, discussing how accountability labels assigned to underperforming schools (e.g. those with low test scores) lead to renewed efforts amongst staff – at least initially. This includes through the introduction of new approaches, enhanced professional development opportunities and greater collaboration in tackling the challenges faced. Likewise, the same authors note the work of Au (2007) whose review suggests that a quarter of studies into high-stakes testing found they led to positive curriculum changes being made (though they also note that most studies highlight negative outcomes – like curriculum narrowing – as well). Similarly, Klinger and Rogers (2011) remark how large-scale assessments can positively influence teaching and learning in schools.

More broadly, Cizek (2001) attempts to bring more balance to what is often quite a negative literature on the unintended consequences of high stakes testing, pointing to 10 often underappreciated benefits. These include potential positive impacts upon teacher professional development, teachers’ intimacy with the discipline and improved learning outcomes. These could plausibly have spillover benefits for teacher wellbeing.

The present study

Despite the insights provided by the aforementioned studies, there continue to be gaps in our knowledge about the relationship between high stakes tests and the wellbeing of teachers. Four notable issues stand out. First, much existing research is small-scale and cross-sectional. There is a notable dearth of large-scale longitudinal quantitative evidence into how high-stakes tests are associated with teachers’ wellbeing. Second, there has been little attempt to investigate how the wellbeing of teachers who are most affected by such tests compares against a reasonable “comparison group” (teachers less likely to be affected by such tests). Third, no existing study investigates whether staff become more anxious about work around the time that high-stakes tests take place, as the “threat” from them draws near (Lotz & Sparfeldt, 2017). Finally, in general, less attention has been paid to the link between high stakes tests and staff wellbeing in primary schools than in secondary schools.

This paper provides new insights into such issues. Using longitudinal data gathered from around 1,000 teachers in England, we examine how work-related anxiety fluctuated over the 2021/22 academic year. Particular attention is paid to Year 6 teachers – the group most affected by the high-stakes SATs regime in England – in comparison to those providing instruction to other primary school classes (e.g. teachers of Year 3/5 pupils who do not sit any national assessments). This includes an analysis of how the work-related anxiety of these groups compares in the buildup to – and during – the week the SATs take place. In doing so, we provide unique new insights into how high-stakes assessment is linked to the work-related anxiety of staff in primary schools. Specifically, the following research questions are addressed:

Research question 1. Do Year 6 teachers – whose pupils sit the high-stakes SATs – have higher levels of work-related anxiety than primary teachers providing instruction to other school year groups (whose pupils do not sit any national assessments)?
Research question 2. Does the work-related anxiety of Year 6 teachers increase in the buildup to – and during – SATs test week? How does this compare to teachers providing instruction to other school year groups?

Data and methodology

Primary school assessment in England

Primary school in England runs from Year R (when children are age 4/5) through to Year 6 (age 10/11), after which they move into secondary school. National tests and assessments are conducted at several points during primary school, but these are mostly low stakes for both pupils and schools. For instance, upon entry into Year R, 4/5-year-olds complete a reception baseline test – measuring their early abilities as they start school. In Year 1 (age 5/6) pupils complete an assessment of their early reading (phonics) skills, with this followed by English and mathematics tests in Year 2 (age 6/7) and their knowledge of multiplication tables in Year 4 (age 8/9). The Key Stage 2 SATs are then taken during one week at the end of Year 6 (age 10/11), totaling around four hours of assessment material taken over four days. There are no direct consequences of these tests for pupils, although they are shared with secondary schools who may use them to assign pupils into different ability groups. They are, however, high stakes for schools, with their performance published in league tables and used as part of the school inspection process.

Comparison groups

If one wants to know more about how high-stakes assessments (such as the Year 6 SATs) impact teacher wellbeing, what groups should be compared? Those leading Year 6 classes – whose pupils sit the SATs – are likely to be amongst the most affected. They will be tasked with preparing pupils for the tests and then held accountable for the results. Moreover, if schools drill pupils for SATs – as is often claimed – then this is likely to impact the workload of Year 6 teachers the most. Likewise, if SATs lead to a narrowing of the curriculum, then this is likely to be most apparent in Year 6. They will also be working with pupils taking the SATs on a day-to-day basis, and thus be the staff who are most exposed to their emotions if they show any signs of distress. This is backed up by the qualitative research of Bradbury (2019), who notes that Year 6 teachers face “different pressures” to those teaching other year groups.

Who, then, should Year 6 teachers be compared to? The most logical comparators – similar in spirit to the approach of Gonzalez et al. (2017) – are teachers less exposed to the pressures of the SATs (at least directly). Our primary comparison group is therefore primary school teachers leading Year 3 or Year 5 classes. Children in these year groups do not sit the SATs tests, nor any other national assessments in that academic year. In contrast, Year 1 students have a phonics screening test, Year 2 pupils the Key Stage 1 SATs and Year 4 pupils the multiplication tables check. These are all much lower stakes assessments for teachers and schools than the Year 6 SATs. However, to get the cleanest possible comparator group, we focus on teachers of Year 3 and Year 5 pupils when no national assessments take place. Indeed, even More Than a Score (an anti-assessment interest group) notes that there are no high-pressure tests in Year 3/Year 5 (More Than a Score, 2022). Year 3 and Year 5 teachers will thus clearly be less affected by high-stakes assessments – and the SATs in particular – than Year 6 teachers. As a comparator, Year 3 and Year 5 teachers also have the advantage of working in primary schools, and thus otherwise have similar working environments to their Year 6 counterparts.

To test the robustness of our findings, we broaden our comparator group to also include primary teachers providing instruction to other school year groups. First, we add Year 4 teachers into our comparison group; although pupils in this academic year take a “multiplication tables check,” this test takes just five minutes and is a lot lower stakes than the SATs (Department for Education, 2022). Second, we draw comparisons between Year 6 teachers and primary teachers working with all other school years. This has the advantage of maximizing the sample size of our comparator group, though now includes teachers working with pupils of very different ages (between ages 4 and 10).

Data

The data we use were collected via the Teacher Tapp survey app (https://teachertapp.co.uk/). This organization is based in England – though now also operates in other countries – with it possible for researchers and other organizations to ask their panel of teachers up to three questions each day. We commissioned Teacher Tapp (via a research grant) to collect data from teachers on our behalf and to supply us with an anonymized data file, stripped of any personal or school identifying information.

Each day at 1530, teachers received a notification asking three short questions. On most days, around 5,000–8,000 teachers responded, so the panel provides longitudinal information. Although the demographic characteristics of the Teacher Tapp panel are broadly in line with the population of teachers in England (see Teacher Tapp, 2020 for further details), participants are self-selecting – and hence best considered a convenience sample. The survey company (Teacher Tapp) that collected the data also provided a set of survey weights. These are post-stratification weights (Kolenikov, 2016) that balance the sample in terms of gender, age, leadership status, primary/secondary phase and independent/state sectors. See Teacher Tapp (2022b) for further details.
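To illustrate the general idea behind post-stratification weights, the sketch below computes weights for a single stratifying variable as the ratio of each cell's population share to its sample share. It is a simplified illustration only: the weights supplied by Teacher Tapp balance several characteristics at once, their exact construction is not described here, and the function name `post_stratification_weights` and all shares in the example are hypothetical.

```python
from collections import Counter


def post_stratification_weights(sample_cells, population_shares):
    """Return one weight per respondent: population share / sample share of their cell.

    sample_cells: list with the cell label (e.g. gender) of each respondent.
    population_shares: dict mapping each cell label to its share in the population.
    """
    counts = Counter(sample_cells)
    n = len(sample_cells)
    weights = []
    for cell in sample_cells:
        sample_share = counts[cell] / n
        weights.append(population_shares[cell] / sample_share)
    return weights


# Hypothetical example: women are over-represented in the sample (80%)
# relative to an assumed population share of 75%.
sample = ["female"] * 8 + ["male"] * 2
population = {"female": 0.75, "male": 0.25}
weights = post_stratification_weights(sample, population)
```

With these invented shares, over-represented respondents receive a weight below one and under-represented respondents a weight above one, so that the weighted sample matches the assumed population shares.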

At 16 points during the 2021/22 academic year, the Teacher Tapp panel were asked about their work-related anxiety. This was based on the UK Office for National Statistics general anxiety question, which has been subject to extensive validation and is widely used:

On a scale where 0 is “not at all anxious” and 10 is “completely anxious,” overall, how anxious did you feel about work today?

We have added “about work” to the standard question used by the ONS to ensure that teachers focus solely on their anxiety about work. Teachers were consistently asked this question on Tuesday afternoons to avoid possible day-of-week effects. We did not ask directly about SATs so as (a) not to lead teachers into certain responses; (b) to ensure the question was also clearly relevant to our comparison groups and (c) to capture the impact on work-related anxiety overall. Appendix A provides further details on the precise dates this question was asked. As the SATs took place between May 9th and May 12th, 2022, we ensured our work-related anxiety question was asked on May 3rd, 10th and 17th to capture variation just before, during and after the test week.

As with any longitudinal data collection – and particularly intensive panel surveys – not all teachers responded to the work-related anxiety question at all timepoints. There are various ways this challenge can be handled, each with their own strengths and limitations. At one extreme, one could base the analysis on only those teachers that responded every time the question was asked. This would ensure that the sample has a high degree of consistency over time (i.e. that the analysis includes the same group of teachers at each time point) but at the cost of significant information loss (i.e. many teachers will be dropped from the analysis) and a reduction in statistical power. On the other hand, one could include teachers in the analysis whenever they provided a response, even if this was only once. This would maximize the information available, but also lead to inconsistencies in the sample over time (i.e. a different group of teachers included in the analysis each time the question was asked).

To manage these trade-offs, the analysis presented in the main body of the paper is based upon the sample of teachers that responded to the work-related anxiety question on at least eight of the 16 occasions it was asked during the 2021/22 academic year. In other words, if a teacher answered the work-related anxiety question eight or more times they were included in the analysis, while if they responded seven or fewer times they were dropped. This balances the desire for there to be a reasonably consistent sample at each timepoint while also not excluding too many data points. The analytic sample thus includes up to 1,500 teachers, with specific details regarding sample sizes reported in Table 1. Note that teachers who work in independent schools have been excluded from the analysis, due to their very different work environments to state schoolteachers (including many not administering SATs).
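The inclusion rule described above can be sketched as a simple filter. This is a minimal illustration rather than the code used for the analysis; the function name `select_analytic_sample` and the `(teacher_id, survey_date, score)` row layout are assumptions made for the example.

```python
from collections import defaultdict

MIN_RESPONSES = 8  # threshold stated in the text (out of 16 occasions)


def select_analytic_sample(responses, min_responses=MIN_RESPONSES):
    """Keep rows from teachers who answered on at least `min_responses` dates.

    responses: list of (teacher_id, survey_date, score) tuples (hypothetical layout).
    """
    dates_by_teacher = defaultdict(set)
    for teacher_id, survey_date, _score in responses:
        dates_by_teacher[teacher_id].add(survey_date)
    keep = {t for t, dates in dates_by_teacher.items() if len(dates) >= min_responses}
    return [row for row in responses if row[0] in keep]
```

Passing a different `min_responses` value gives the kind of alternative sample definition examined in the robustness checks, for example using every teacher who responded at least once.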

Table 1. Descriptive statistics for the analytic sample.

We note, however, there are different approaches one could take to addressing this issue. It is hence important to test the robustness of our results. Appendix B is dedicated to this concern. Specifically, it presents four sets of alternative estimates: (a) using all available data at all time-points; (b) restricting the sample to teachers that responded on at least 8 occasions; (c) restricting the sample to teachers that responded on at least 13 occasions and (d) using multiple imputation by chained equations to impute missing information at each timepoint. Very similar results are obtained under each of these different approaches.

Descriptive information about the sample is presented in Table 1. This includes information previously collected about Teacher Tapp respondents such as demographic characteristics (e.g. age, gender, children at home, job role) and about the school in which they work (e.g. percent of disadvantaged pupils, Ofsted inspection rating). Figures are presented separately for each primary school year group. There are some clear differences in terms of experience, job role (e.g. whether the teacher holds a leadership position) and whether the teacher provides instruction across multiple year groups. This is consistent with the qualitative work of Bradbury (2019); certain types of teachers – most notably those who are more senior and experienced – are more likely to be assigned to teach Year 6 classes.

Relatedly, Table 2 provides some descriptive information about how responses to the work-related anxiety question compare across Year 6 (our group of interest) and Year 3/5 (our main comparison group) teachers.

Table 2. Mean, standard deviation and number of observations at each timepoint for year 3, year 5 and year 6 teachers.

Methodology

Our analysis begins by plotting average work-related anxiety scores for our key comparator groups (e.g. Year 6 versus Years 3/5 teachers) at each survey date, with particular interest paid to variation around the SATs test week (May 9th–12th, 2022). Appendix B illustrates how the response rate to the work-related anxiety question is similar across our comparison groups across the academic year. These results provide a first descriptive account of how work-related anxiety compares across groups of teachers with different exposures to the SATs.
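The quantities being plotted are simply group means at each survey date. The sketch below shows one way to compute them; it is unweighted, whereas the published figures may apply the survey weights, and the function name and row layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_anxiety_by_group_and_date(rows):
    """Return {(group, survey_date): mean score}.

    rows: list of (group, survey_date, score) tuples (hypothetical layout),
    where group is e.g. "Year 6" or "Year 3/5".
    """
    scores = defaultdict(list)
    for group, survey_date, score in rows:
        scores[(group, survey_date)].append(score)
    return {key: mean(values) for key, values in scores.items()}
```

Each resulting `(group, date)` mean corresponds to one marker on a line chart of the two groups over the academic year.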

These initial findings will then be formalized via regression modeling. Specifically, we will estimate OLS regression models of the form:

$$\text{Anx}_{i} = \alpha + \beta \cdot \text{Yr6}_{i} + \gamma \cdot D_{i} + \delta \cdot S_{i} + \varepsilon_{i} \qquad (1)$$

Where:

$\text{Anx}_{i}$ = Scores on the 0–10 work-related anxiety scale, averaged over the period under investigation (see below for further details).

$\text{Yr6}_{i}$ = A dummy variable capturing whether the teacher provides instruction to Year 6 pupils. Separate models will be estimated using Year 3/5, Year 3/4/5 and all other primary teachers as the reference group.

$D_{i}$ = A vector of controls for teacher background characteristics. This includes age, gender, experience, job role, children at home and whether they teach across multiple year groups.

$S_{i}$ = A vector of controls for school background characteristics. This includes Ofsted inspection rating and the percentage of pupils eligible for Free School Meals.

$\varepsilon_{i}$ = Error term. Adjustments are made to the estimated standard errors to account for potential heteroskedasticity.

The parameter of interest is $β$ . This captures the difference in average work-related anxiety between teachers most affected by the high-stakes SATs (Year 6 teachers) and those that are likely to be affected much less (the reference group – e.g. Year 3/5 teachers).
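To make the interpretation of the Year 6 coefficient concrete, the sketch below fits an OLS model with an intercept, a Year 6 dummy and a single control using the normal equations. It is a simplified illustration, not the estimation code used in the paper: it omits the full vectors of teacher and school controls, survey weights and heteroskedasticity-robust standard errors, and the helper name `ols` and the data are invented.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X: list of rows, each starting with 1.0 for the intercept.
    y: list of outcomes. Assumes X'X is invertible (no collinear columns).
    """
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Invented data: columns are [intercept, Year 6 dummy, years of experience].
X = [[1.0, 1, 10], [1.0, 1, 20], [1.0, 0, 2], [1.0, 0, 10], [1.0, 1, 5], [1.0, 0, 15]]
y = [3.5, 2.5, 4.8, 4.0, 4.0, 3.5]
alpha, beta, gamma = ols(X, y)
```

In this invented example the second coefficient plays the role of the Year 6 parameter: the difference in average anxiety between Year 6 teachers and the reference group, holding experience fixed.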

The model will be estimated several times, with the outcome ($\text{Anx}_{i}$) measured over different periods. To begin, for each teacher, we take the average anxiety score across each time they answered the question during the 2021/22 academic year (up until pupils take the SATs). The $β$ estimate will thus capture whether Year 6 teachers generally feel more anxious about work throughout the academic year than our comparator groups. We then subsequently narrow the dates used to those closer to the SATs test week. For instance, we will re-estimate model (1) but with our work-related anxiety measure only being taken across the mid-February to mid-May period – i.e. capturing the immediate build up to and during the SATs test week. This is of course likely to be one of the busiest and most stressful periods for Year 6 teachers as they hurriedly prepare pupils for the assessment. Likewise, we will also present separate estimates using only responses to the anxiety question in the week before, during and after the SATs take place. Together, these estimates reveal whether teachers who are most exposed to the pressure of SATs differ in their anxiety levels to those groups with less exposure, and when in the academic year any difference is most acute.
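The outcome for each specification is a per-teacher average taken over a different date window. A minimal sketch of that step follows. The May 3rd, 10th and 17th survey dates come from the text; the other dates and the exact boundary used for "mid-February" are assumptions made for illustration, as is the function name.

```python
from datetime import date
from statistics import mean


def average_in_window(responses, start, end):
    """Mean anxiety score for one teacher over survey dates in [start, end].

    responses: list of (survey_date, score) tuples for a single teacher.
    Returns None if the teacher gave no responses inside the window.
    """
    in_window = [score for survey_date, score in responses if start <= survey_date <= end]
    return mean(in_window) if in_window else None


# One hypothetical teacher; the first two dates are invented.
teacher = [
    (date(2021, 10, 5), 4),
    (date(2022, 3, 1), 5),
    (date(2022, 5, 3), 3),   # week before the SATs
    (date(2022, 5, 10), 7),  # SATs week
    (date(2022, 5, 17), 2),  # week after the SATs
]
sats_week = average_in_window(teacher, date(2022, 5, 10), date(2022, 5, 10))
build_up = average_in_window(teacher, date(2022, 2, 15), date(2022, 5, 10))  # assumed boundary
```

Re-running the regression with each of these window averages as the outcome gives one estimate per row of the results table.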

All data analysis was conducted using Python 3.8 and STATA 17.

Robustness tests

Several robustness tests will be conducted – reported in the online supplementary material – to explore the sensitivity of the results. First, we will exclude headteachers and senior leadership team members from the sample, thus focusing on how the SATs are related to the work-related anxiety of class teachers and middle leaders only. Second, we will exclude from the sample any teacher who teaches across multiple year groups. This will allow us to focus on teachers who only teach Year 6 pupils, and how they compare to teachers who only teach (for instance) Year 3 or Year 5 pupils. Third, as noted above, in Appendix B we conduct multiple sensitivity analyses regarding sample selection criteria and missing data. Finally, alternative descriptive results will be presented when we dichotomize the outcome variable, investigating how the SATs are associated with the percentage of teachers who report high anxiety levels (scores of 7 and above on the 0–10 anxiety scale). This follows the cutoff for high anxiety levels used by the UK’s Office for National Statistics (ONS) using this scale.
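The dichotomized outcome used in the final robustness test can be sketched as follows. The cutoff is passed explicitly and the example uses the value stated in the text; the function name and scores are hypothetical.

```python
def share_high_anxiety(scores, cutoff):
    """Percentage of scores at or above `cutoff` on the 0-10 anxiety scale."""
    if not scores:
        raise ValueError("scores must not be empty")
    high = sum(1 for s in scores if s >= cutoff)
    return 100.0 * high / len(scores)


# Invented scores for a small group of teachers, using the cutoff given in the text.
example_share = share_high_anxiety([2, 7, 8, 5], cutoff=7)
```

Comparing this percentage across teacher groups at each survey date gives the alternative descriptive results described above.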

Results

Figure 1 compares mean work-related anxiety scores for Year 6 teachers (black solid line with square markers) to Year 3/5 teachers (dashed gray line with circular markers) at 16 points during the 2021/22 academic year. There are three key points to note.

Figure 1. Average work-related anxiety scores of year 6 and year 3/5 teachers during the 2021/22 academic year.

First, in general, Year 6 teachers had slightly lower levels of work-related anxiety than Year 3/5 teachers at most timepoints. In other words, teachers who were most exposed to the pressures of SATs were slightly less anxious about work than their peers with less exposure. The difference stands at around ½ of one point on the 0–10 anxiety scale, with a mean score of approximately 4 for Year 6 teachers and 4.5 for Year 3/5 teachers. This is equivalent to an effect size (Cohen’s d) of around 0.2 standard deviations (the difference of around 0.5 divided by a standard deviation of around 2.8 produces an effect size of 0.18). Caution is needed when interpreting these unconditional results, however, because they could be driven by selection. For instance, as Table 1 illustrates, only 16% of Year 6 teachers in the sample have five years of experience or less, compared to 29% of Year 3 or Year 5 teachers. Indeed, one plausible explanation for the generally lower levels of anxiety of Year 6 teachers in Figure 1 is that headteachers are assigning less anxious staff to teach this year group.

Second, the only point where the opposite holds true – i.e. where Year 6 teachers have higher levels of anxiety than our Year 3/5 comparison group – is during the week the SATs take place. The magnitude of the difference is similar to before – around ½ an anxiety point (Cohen’s d ~ 0.2 standard deviations) – but now in the opposite direction. This provides the first suggestion that the SATs negatively impact the wellbeing of Year 6 teachers, but only during the narrow window when the tests take place.

Finally, the week after the SATs have finished – and through to the end of the academic year – the status quo returns. The work-related anxiety of Year 6 teachers has quickly returned to “normal” levels, being slightly lower than the Year 3/5 comparison group. This in turn suggests that the negative impact of the SATs on the anxiety levels of Year 6 teachers is relatively short-lived; there is a temporary increase when the tests take place, but this is then quickly reversed.
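The effect sizes quoted for these descriptive comparisons follow from dividing the difference in means by the standard deviation. The sketch below reproduces that arithmetic with the approximate figures given in the text; the function name is illustrative, and a textbook Cohen's d would use the pooled standard deviation of the two groups rather than a single rounded value.

```python
def effect_size(mean_difference, standard_deviation):
    """Standardized difference: mean difference divided by the standard deviation."""
    return mean_difference / standard_deviation


# Approximate values quoted in the text: a 0.5-point gap and a standard deviation of 2.8.
d = effect_size(0.5, 2.8)
d_rounded = round(d, 2)
```

Because both inputs are rounded, the result should be read as an approximation of the standardized gap rather than a precise estimate.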

Table 3 formalizes these results by presenting estimates from our regression models. Figures capture the difference in average anxiety scores between Year 6 teachers and each comparator group, conditional on the background characteristics controlled. In other words, these estimates refer to the (unstandardized) $β$ parameters from the regression model presented in Equation (1). The top row refers to differences in average anxiety levels across groups throughout the 2021/22 academic year (up until the week of May 10th when the SATs took place), while the second row takes average anxiety levels during the three-month period building up to the test week. The bottom three rows present analogous estimates of teacher anxiety the week before, during and after the SATs took place.

Table 3. OLS regression model estimates of differences in work-related anxiety between year 6 teachers and selected comparison groups.

On average, there is no evidence that Year 6 teachers felt more anxious about work throughout the academic year than any of the three comparator groups. Although the point estimates are negative (suggesting, as per Figure 1, that Year 6 teachers might have slightly lower anxiety levels than Year 3/5 teachers) these do not reach statistical significance at the five percent level. The same holds true with respect to the three-month period leading up to the SATs. Thus, overall, there is no evidence that those teachers most exposed to the pressures of the SATs are generally more anxious about work than their peers with less exposure during most of the academic year, including in the three-month period leading up to the tests.

Turning to the bottom three rows – focusing on the week immediately before, during and following the SATs – a clear result emerges. There is no obvious difference between the work-related anxiety of Year 6 teachers and that of any of our comparator groups in the week before or the week after the SATs. However, during the SATs week, a substantial difference is found. The anxiety of Year 6 teachers is around 0.7–0.8 anxiety score points above that of our comparator groups, which is equivalent to an effect size (Cohen’s d) of around 0.25 standard deviations (calculated as 0.7 divided by a standard deviation of approximately 2.8). These are the only estimates that are statistically significant at the five percent level. This is consistent with the descriptive results presented above.
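The effect size arithmetic can be reproduced with a short Python sketch. It uses only the figures quoted in the text (differences of 0.7 and 0.8 points and a standard deviation of approximately 2.8); the function name `standardized_difference` is our own, and dividing the larger 0.8 estimate by the same standard deviation is our extension of the calculation rather than a figure reported in the paper.

```python
def standardized_difference(diff, sd):
    """Express a raw difference in standard deviation units."""
    return diff / sd


sd_anxiety = 2.8  # approximate standard deviation quoted in the text

# Lower and upper ends of the 0.7-0.8 point range quoted in the text.
d_low = standardized_difference(0.7, sd_anxiety)
d_high = standardized_difference(0.8, sd_anxiety)

print(round(d_low, 2), round(d_high, 2))
```

On these inputs the standardized difference runs from about 0.25 to just under 0.3 standard deviations.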

To conclude, Figure 2 presents estimates of the conditional difference in anxiety levels between Year 6 and Year 3/5 teachers at each survey date. Positive values indicate where Year 6 teachers have higher conditional average anxiety levels, with solid markers indicating where the difference is statistically significant at the five percent level. Thin gray bars running through each data point represent the estimated 95% confidence interval.

Figure 2. Difference in conditional work-related anxiety scores of year 6 and year 3/5 teachers during the 2021/22 academic year.

Notes: Figures refer to the difference in work-related anxiety between Year 6 teachers and Year 3/5 teachers. Positive figures are where Year 6 teachers report higher levels of work-related anxiety than Year 3/5 teachers. Circular markers with solid fill illustrate where the difference is statistically significant at the five percent level (hollow markers indicate where the difference is not statistically significant). Thin gray bars running through each data point represent the estimated 95% confidence interval.

Overall, the story from these results is similar to before. Through most of the academic year, Year 6 teachers’ anxiety levels are around 0.3 points (on the 0–10 scale) below those of Year 3/5 teachers. However, during the week of the SATs, the anxiety levels of Year 6 teachers are 0.7 anxiety points above those of Year 3/5 teachers. The p-value for this 0.7 difference in anxiety points is 0.0059. This is well below the usual 0.05 threshold, but slightly above the threshold of 0.0031 if one adjusts for the 16 comparisons being presented in Figure 2 using a Bonferroni correction. Although Bonferroni corrections are known to be overly conservative (Bender, 1999), we note the difference between Year 3/5 and Year 6 teachers only reaches statistical significance at the 0.10 threshold if such a correction is made. Together, this provides evidence that the SATs deliver a short, sharp shock to teachers’ anxiety levels during the test week, but there is little sign of a prolonged effect.
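The Bonferroni thresholds quoted above follow from dividing the chosen significance level by the number of comparisons. A minimal Python sketch of that arithmetic, using only the p-value (0.0059) and the number of comparisons (16) given in the text, is shown below; the function name `bonferroni_threshold` is our own.

```python
p_value = 0.0059     # p-value for the SATs-week difference, from the text
n_comparisons = 16   # number of comparisons, from the text


def bonferroni_threshold(alpha, m):
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return alpha / m


for alpha in (0.05, 0.10):
    threshold = bonferroni_threshold(alpha, n_comparisons)
    print(alpha, threshold, p_value < threshold)
```

With 16 comparisons the corrected thresholds are 0.05/16 = 0.003125 and 0.10/16 = 0.00625, so a p-value of 0.0059 falls between the two.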

Heterogeneity across sub-groups

Table 4 repeats the analysis presented in Table 3, but now with separate estimates presented for six sub-groups: (a) male teachers; (b) female teachers; (c) teachers who believe the SATs are reliable; (d) teachers who believe SATs are unreliable; (e) teachers who work in schools holding SATs booster lessons and (f) teachers who work in schools that do not hold SATs booster lessons. Note that in this table we focus on comparisons between Year 6 and Year 3/4/5 teachers to ensure these sub-group estimates are based on a sufficient sample size. These estimates again refer to the (unstandardized) $β$ parameters from the regression model presented in equation 1.

Table 4. OLS regression model estimates of differences in work-related anxiety between year 6 and year 3/4/5 teachers. Estimates for sub-groups.

Overall, the pattern of results is very similar across each of these sub-groups. In each case, there is no difference in anxiety levels between Year 6 and Year 3/4/5 teachers, except during the SATs test week. This absence of a difference extends to the weeks and months leading up to the SATs. Thus, overall, it seems that our key finding – that the SATs increase anxiety levels but only during the test week – holds across each of the sub-groups considered.

Sensitivity analyses

Appendix B presents alternative estimates where we use different sample selections or use multiple imputation to impute missing observations. Results are very similar to those presented above. Appendices C and D provide alternative versions of Figure 2, but now drawing comparisons between Year 6 teachers and (a) Year 3/4/5 teachers and (b) all other primary teachers. These demonstrate that our key results are robust to whichever comparison group is used. When headteachers and senior leaders are removed from the analytic sample in Appendix E, an even sharper peak in work-related anxiety is observed during the SATs test week. Otherwise, the patterns observed remain unchanged. Likewise, our findings remain intact when the sample is restricted to only those teachers who provide instruction to a single school year group (see Appendix F). Finally, Appendix G provides alternative descriptive results focusing on the percent of teachers reporting high anxiety levels (rather than looking at mean scores). This demonstrates how the percent of Year 6 teachers with high levels of work-related anxiety increases from around 25% during most of the academic year to around 35% during the week of the SATs.

Discussion

High-stakes assessments are an important part of education accountability systems around the world. Yet their use is not without controversy, particularly in primary schools. Proponents of high-stakes tests argue that they provide the information needed to hold schools, managers and education leaders to account, raising standards as a result (Supovitz, 2009). Their detractors, however, point toward the many unintended negative consequences that they believe more than offset any of the potential gains. One such claim is that high-stakes assessments negatively impact mental health and wellbeing in the education sector (Högberg & Horn, 2022). This includes not only those pupils taking the tests, but also the staff who teach them (Gonzalez et al., 2017).

Yet there are some key issues with existing evidence in this area. Much of the previous work is small scale, cross-sectional and only captures opinions about such tests from school staff. Few studies have attempted to measure mental health objectively using large-scale longitudinal data, or to look at differences in wellbeing levels across meaningful comparison groups (e.g. those teachers more versus less impacted by the tests). This paper has started to make inroads into this gap in the evidence base, presenting new empirical insights into the link between the Year 6 SATs and the anxiety levels of teachers in England.

Our results provide some evidence that high-stakes assessments such as SATs are related to how anxious teachers feel about work. In particular, there seems to be a short, sharp shock to teachers’ anxiety levels during the week when the tests take place. At the same time, there is little evidence of a substantial increase in work-related anxiety in the period leading up to the tests. Hence the negative impact of SATs on teachers’ anxiety levels seems to be over relatively quickly. This could be due to either the SATs not inherently being stressful to teachers – outside of the test week – or to headteachers managing the pressures through how they allocate staff. Either way, the result seems to be that the “impact” of SATs on teacher wellbeing is probably quite limited.

Our findings contribute to the broader body of literature on teacher stress and wellbeing. Previous studies have noted how teachers face several different interconnected sources of stress in their job, including marking, lesson planning and managing challenging pupil behavior (Agyapong et al., 2022; Nwoko et al., 2023). Amongst these, teachers often point toward accountability – and being held responsible for pupil achievement – as being amongst the most stressful work-related factors (Jerrim & Sims, 2019). Our findings – that there is only a relatively weak and short-lived association between national assessments and teacher anxiety – run somewhat counter to these claims, and thus suggest other sources of stress identified in the literature (such as workload) may play a more important role. One potential explanation for this discrepancy is that our outcome measure captures teachers’ work-related anxiety in general, rather than directly asking about their views of high-stakes tests (as is common in the existing literature). This is arguably both a strength and a limitation of our approach; while we are less likely to lead teachers into providing a particular response, we are unable to directly measure teachers’ opinions of the tests and how they change over the course of an academic year.

Limitations/Threats to validity

While we believe that our approach has some important advantages over previous work – most notably in thinking about and drawing explicit comparisons to meaningful “control” groups – we also recognize there remain some limitations. For instance, although the 2021/22 academic year had largely returned to normal, schools still faced some disruption between December 2021 and mid-February 2022 from the Omicron COVID wave. Although schools remained open, there were quite high levels of staff and pupil absence during this period. Despite our comparison groups probably being affected in similar ways to Year 6 teachers, one cannot completely discount this as having some impact on our results. Importantly, though, in the period leading up to the SATs – from around mid-February onwards – schools in England had largely returned to normal.

Perhaps more important was the lasting impact that COVID had on how the results from the SATs were to be used. In July 2021 – almost a year before the tests took place – the Department for Education in England announced that it would not publish SATs performance tables in 2022. This means that one of the key factors potentially making SATs stressful – the open publication of results – did not take place. The results, however, were still accessible to school leaders, managers, governors and Ofsted (the school inspectorate). Nevertheless, in 2022, SATs were arguably slightly lower stakes than in a “normal” year.

As noted above, teachers are not randomly assigned to year groups. Rather, senior leaders allocate staff based on their attributes and skills. It could be that headteachers choose not to ask staff to teach Year 6 if they are prone to anxiety issues or are not thought able to cope with the pressure surrounding the SATs. The controls included in our regression models are only likely to partially address the potential confounding effect of teachers with different characteristics being deployed to teach different year groups. Thus, like other existing studies into the link between high-stakes assessments and teacher wellbeing, our estimates capture conditional associations rather than establishing cause and effect. Further work – both qualitative and quantitative – is needed to understand who school leaders deploy to teach those pupils who are about to take high-stakes tests.

Our outcome measure is also based upon a single question that focuses on work-related anxiety in general, rather than specifically asking teachers about the SATs. Although this has certain advantages – e.g. by not mentioning the SATs explicitly, teachers are unlikely to be led into providing a particular response – it is possible that asking a question more directly about the anxiety induced by this assessment would lead to a different pattern of responses. More generally, we have focused on one specific dimension of teachers’ wellbeing at work (anxiety). Although this is one of the dimensions of wellbeing most likely to be impacted by high-stakes assessments, one cannot rule out the association being different for other areas (e.g. happiness, life-satisfaction). Future work should thus seek to generalize our results using other (and ideally multiple) questions and measuring other aspects of teachers’ wellbeing.

Finally, like most previous studies in this area, participants are self-selected and are not drawn from a national probability sample. Future work using data from such a sample design – and achieving a high response rate – would further increase one’s confidence in the generalizability of the results.

Conclusion

What might our findings nevertheless imply for the future of high-stakes assessments in primary schools? On the one hand, one may argue that we have shown there to be a link between the presence of such tests and teachers’ anxiety levels, even if the negative effects appear relatively short-lived. On the other hand, one might argue that we all experience stressful periods at work, and that in this respect teachers are little different from other professionals. For accountants it is the end of the financial year, for those working in the emergency services it is Friday and Saturday nights and – as we have shown – for Year 6 teachers in England it is the week the SATs take place. There are hence likely to be stronger cases to be made against the use of high stakes assessments in primary schools than the limited link observed with teacher wellbeing.

Acknowledgments

The Nuffield Foundation is an independent charitable trust with a mission to advance social wellbeing. It funds research that informs social policy, primarily in Education, Welfare, and Justice. It also funds student programs that provide opportunities for young people to develop skills in quantitative and scientific methods. The Nuffield Foundation is the founder and co-funder of the Nuffield Council on Bioethics and the Ada Lovelace Institute. The Foundation has funded this project, but the views expressed are those of the authors and not necessarily the Foundation. Visit www.nuffieldfoundation.org. We are grateful for their support. Helpful comments have been received on the draft from our project advisory group, whom we would like to thank for their input and support.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/10627197.2024.2350961.

Additional information

Funding

The work was supported by the Nuffield Foundation.

References

Agyapong, B., Obuobi-Donkor, G., Burback, L., & Wei, Y. (2022). Stress, burnout, anxiety and depression among teachers: A scoping review. International Journal of Environmental Research and Public Health, 19(17), 10706. https://doi.org/10.3390/ijerph191710706
Au, W. (2007). High stakes testing and curricular control: A qualitative meta synthesis. Educational Researcher, 36(5), 258–267. https://doi.org/10.3102/0013189X07306523
Bender, R. (1999). Multiple test procedures other than Bonferroni’s deserve wider use. BMJ, 318(7183), 600. https://doi.org/10.1136/bmj.318.7183.600a
Berliner, D. (2011). Rational responses to high stakes testing: The case of curriculum narrowing and the harm that follows. Cambridge Journal of Education, 41(3), 287–302. https://doi.org/10.1080/0305764X.2011.607151
Bradbury, A. (2019). Pressure, anxiety and collateral damage. The headteachers’ verdict on SATs. More than a score. https://www.morethanascore.org.uk/wp-content/uploads/2019/09/SATs-research.pdf
Brockmeier, L., Green, R., Pate, J., Tsemunhu, R., & Bochenko, M. (2014). Teachers’ beliefs about the effects of high stakes testing. Journal of Education & Human Development, 3(4), 91–104. https://doi.org/10.15640/jehd.v3n4a9
Cizek, G. J. (2001). More unintended consequences of high-stakes testing. Educational Measurement Issues & Practice, 20(4), 19–27. https://doi.org/10.1111/j.1745-3992.2001.tb00072.x
Demir, C., & Keleş, O. (2021). The impact of high-stakes testing on the teaching and learning processes of mathematics. Journal of Pedagogical Research, 5(2), 119–137. https://doi.org/10.33902/JPR.2021269677
Department for Education. (2022). Information for parents: 2022 multiplication tables check. Department for Education, England. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1031901/2022_Information_for_parents_Multiplication_tables_check.pdf
Gonzalez, A., Peters, M., Orange, A., & Grigsby, B. (2017). The influence of high-stakes testing on teacher self-efficacy and job-related stress. Cambridge Journal of Education, 47(4), 513–531. https://doi.org/10.1080/0305764X.2016.1214237
Gunn, J., Al-Bataineh, A., & Abu Al-Rub, W. (2016). Teachers’ perceptions of high-stakes testing. International Journal of Teaching and Education, 4(2), 49–62. https://doi.org/10.20472/TE.2016.4.2.003
Hargreaves, A. (2020). Large-scale assessments and their effects: The case of mid-stakes tests in Ontario. Journal of Educational Change, 21(3), 393–420. https://doi.org/10.1007/s10833-020-09380-5
Högberg, B., & Horn, D. (2022). National high-stakes testing, gender, and school stress in Europe: A difference-in-differences analysis. European Sociological Review, 38(6), 975–987. https://doi.org/10.1093/esr/jcac009
Jerrim, J., & Sims, S. (2019). The teaching and learning international survey (TALIS) 2018: June 2019. Department for Education (DFE).
Klinger, D., & Rogers, T. (2011). Teachers’ perceptions of large-scale assessment programs within low-stakes accountability frameworks. International Journal of Testing, 11(2), 122–143. https://doi.org/10.1080/15305058.2011.552748
Kolenikov, S. (2016). Post-stratification or a non-response adjustment? Survey Practice, 9(3), 1–12. https://doi.org/10.29115/SP-2016-0014
Lotz, C., & Sparfeldt, J. (2017). Does test anxiety increase as the exam draws near? – Students’ state test anxiety recorded over the course of one semester. Personality and Individual Differences, 104, 397–400. https://doi.org/10.1016/j.paid.2016.08.032
Mathison, S., & Freeman, M. (2006). Teacher stress and high stakes testing. How using one measure of academic success leads to multiple teacher stressors. Understanding Teacher Stress in an Age of Accountability, 43–63. https://www.infoagepub.com/index.php?id=18&p=1-59311-474-5
More Than a Score. (2022). A parents’ guide to primary testing. https://www.morethanascore.org.uk/what-we-do/parents-guide/year-5/?scrollto=content
National Education Union. (2019). Too much testing. https://NEU1303 Scrap the SATs ballot crib sheet for web v1_0.pdf
Nichols, S. L., & Harris, L. R. (2016). Accountability assessment’s effects on teachers and schools. In G. T. L. Brown & L. R. Harris (Eds.), Handbook of human and social conditions in assessment (pp. 40–56). Routledge.
Nwoko, J. C., Emeto, T. I., Malau-Aduli, A. E. O., & Malau-Aduli, B. S. (2023). A systematic review of the factors that influence teachers’ occupational wellbeing. International Journal of Environmental Research and Public Health, 20(12), 6070. https://doi.org/10.3390/ijerph20126070
Ofsted. (2019). Exploring the issue of off-rolling. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/936524/Ofsted_offrolling_report_YouGov_090519.pdf
Page, D. (2017). Conceptualising the surveillance of teachers. British Journal of Sociology of Education, 38(7), 991–1006. https://doi.org/10.1080/01425692.2016.1218752
Pekrun, R. (2006). The control-value theory of achievement emotions: Assumptions, corollaries, and implications for educational research and practice. Educational Psychology Review, 18(4), 315–341. https://doi.org/10.1007/s10648-006-9029-9
Perryman, J., & Calvert, G. (2020). What motivates people to teach, and why do they leave? Accountability, performativity and teacher retention. British Journal of Educational Studies, 68(1), 3–23. https://doi.org/10.1080/00071005.2019.1589417
Reese, M., Gordon, S., & Price, L. (2004). Teachers’ perceptions of high-stakes testing. Journal of School Leadership, 14(5), 464–496. https://doi.org/10.1177/105268460401400501
Saminsky, A. (2011). High-stakes standardized testing: A panacea or a pest? Inquiries Journal/Student Pulse, 3(1). http://www.inquiriesjournal.com/articles/373/high-stakes-standardized-testing-a-panacea-or-a-pest
Smith, W., & Holloway, J. (2020). School testing culture and teacher satisfaction. Educational Assessment, Evaluation and Accountability, 32(4), 461–479. https://doi.org/10.1007/s11092-020-09342-8
Supovitz, J. (2009). Can high stakes testing leverage educational improvement? Prospects from the last decade of testing and accountability reform. Journal of Educational Change, 10(2–3), 211–227. https://doi.org/10.1007/s10833-009-9105-2
Teacher Tapp. (2020). Who is on Teacher Tapp? (And why?!). https://teachertapp.co.uk/not-like-others-teacher-tapp-who-sample-representative-reweights/
Teacher Tapp. (2022a). All things SATs and what’s going on with study leave? This, and more … https://teachertapp.co.uk/all-things-sats-and-whats-going-on-with-study-leave-this-and-more/
Teacher Tapp. (2022b). How we ‘weight’ the Teacher Tapp sample. https://teachertapp.co.uk/articles/some-tappers-are-worth-more-than-others-how-we-weight-the-teacher-tapp-sample-2/
TES. (2018). Exclusive: Sats take toll on teachers’ mental health. Retrieved from https://www.tes.com/magazine/archive/exclusive-sats-take-toll-teachers-mental-health
Tucker, F., & Horton, J. (2019). “The show must go on!” Fieldwork, mental health and wellbeing in geography, earth and environmental sciences. Area, 51(1), 84–93. https://doi.org/10.1111/area.12437
West, A., & Pennell, H. (2006). Market-oriented reforms and “high stakes” testing: Incentives and consequences. In A. Vinokur Ed. Pouvoirs et Mesure En Éducation. Cahiers de la recherche sur l’education and savoirs (1/2005) (pp. 181–199). Maison Des Sciences De L’homme.