Research Article

Towards more robust evaluation of policies and programmes in education: identifying challenges in evaluating DEIS and Reading Recovery

Received 21 Aug 2023, Accepted 21 Dec 2023, Published online: 29 Mar 2024

ABSTRACT

In recent years, countries including the UK and USA have seen advancements in the use of Randomised Controlled Trials in education, progress that has not been mirrored in Ireland. Ireland does not have a strong tradition of using experimental or quasi-experimental evaluation designs for monitoring and evaluation of education policy despite well-articulated commitments to the latter activities. An example is the evaluation of the Delivering Equality of Opportunity in Schools (DEIS) programme where little is known about the causal impact of DEIS on student outcomes, despite considerable State investment over almost 20 years. There appears to have been comparatively little critical review of the literacy and numeracy programmes offered to DEIS schools, with those included at the outset largely still on offer. Here, we draw on the wider literature on the evaluation of complex interventions and consider what lessons may be learned for Ireland, illustrating challenges and opportunities by focusing on DEIS and Reading Recovery (a literacy support programme widely used in DEIS schools). We advocate for greater use of experimental designs where feasible and appropriate; further gathering of comparable measures of achievement at pupil level to facilitate monitoring; and detailed consideration of quasi-experimental methods that can support causal conclusions.

Introduction

Policy makers in Ireland and elsewhere are placing increased focus on evidence-informed policy-making, which emphasises the need for robust evaluation that can provide causal evidence of policy impact (Department of Education and Skills [DES] Citation2017a; Global Commission on Evidence to Address Societal Challenges Citation2022; Government of Ireland Citation2019; Organisation for Economic Co-operation and Development [OECD] Citation2022a; Citation2022b).Footnote1 In the education sphere internationally, use of high-quality research to inform policy is supported by the activities of organisations such as the Education Endowment Foundation (EEF Citation2023) and the What Works Clearinghouse (WWC Citation2023) (see also Edovald and Nevill Citation2020; Hedges and Schauer Citation2018).

A number of recent developments reflect the Irish government’s commitment to evidence-informed policy and high-quality research and evaluation of policy. The recently published Impact 2030: Ireland’s Research and Innovation Strategy (Government of Ireland Citation2022) highlights the importance of research and innovation to addressing Ireland’s societal, economic and environmental challenges. It aims to support better links between policymakers and researchers. A series of Evidence in Policy Guidance Notes has been developed in recent years by the Department of Children and Youth Affairs (Citation2019) and its successor (Department of Children, Equality, Disability, Integration and Youth Citation2021a; Citation2021b; Citation2021c) through the Research and Evaluation Unit’s Evidence in Policy Programme, designed to support evidence-informed policy making. In 2012, the Irish Government Economic and Evaluation Service was established to enhance the role of economics and value for money analysis in public policy making (Government of Ireland, Citation2023).

The Department of Education (DoE), for its part, similarly emphasises the need for evidence-informed policies (Government of Ireland Citation2019). In the Action Plan for Education 2018, a commitment was made to developing a research-based framework for the evaluation of the impact of teachers’ professional learning in Ireland (Government of Ireland Citation2018), which has since been published (Gilleece, Surdey, and Rawdon Citation2023). A strong commitment to evaluation has also been evident from the outset of the Delivering Equality of Opportunity in Schools (DEIS) programme (Department of Education and Science [DES] Citation2005) – Ireland’s main policy response to educational disadvantage and a focus of this paper – although deficits in the availability of data and information were recognised in the updated 2017 plan (DES Citation2017a).

While nationally and internationally there is a strong emphasis on evidence-informed policy and the role of evaluation, the challenges of evaluating complex interventions (Skivington et al. Citation2021), and educational interventions in particular (Berliner Citation2002), have been recognised. A decrease over time has been documented in the use of experimental methods in educational psychology internationally (Brady et al. Citation2023) as well as a lack of replication in education and psychology (Plucker and Makel Citation2021). In education, the lack of methodologically rigorous evaluation of policies and initiatives has been highlighted in Europe (European Commission Directorate-General for Education Youth Sport and Culture Citation2022) while a dearth of experimental research in the United States was observed from the 1980s to the early 2000s, although the situation there has improved since 2002 (Hedges and Schauer Citation2018).

In the United States, policy-makers have been criticised for displaying a lack of recognition that ‘educational science is unusually hard to do’ and it has been claimed that ‘the government may not be serious about wanting evidence-based practices in education’ (Berliner Citation2002, 18). However, it may be argued that these views reflect an oversimplification of the role of research and evidence in informing the policy-making process (Cairney Citation2016) and the socio-political challenge is recognised as one important influence on educational evaluation (Golden Citation2020). Regarding the use of research or evaluation findings in Irish education specifically, one critique has argued that policy makers

want answers, solutions, and to know ‘what works’ or to be told that what is already in play is working, and researchers that do this, regardless of how devoid of context their research is, are more likely to be valued by the makers of policy. (Skerritt Citation2023, 10)

In this paper, we examine some current challenges in Irish educational evaluation and present some suggestions for future strengthening in this area. Specifically, we advocate for the greater use of Randomised Controlled Trials (RCTs) where appropriate and highlight current data gaps that limit opportunities to employ quasi-experimental methods. The paper firstly introduces key issues associated with evaluation, including evidence hierarchies, RCTs, and the evaluation of complex interventions. We then turn to the evaluation of the DEIS programme and one of its components, Reading Recovery, to illustrate some of the issues associated with evaluating a complex intervention in the Irish context. Finally, we outline some suggestions for more robust evaluation of educational policy in Ireland. This analysis is timely given the recently announced OECD review of DEIS and the stated intention of the DoE (Citation2023a) to consider ‘what has worked well, and what we need to continue to improve’.

Monitoring, evaluation, and evidence

Monitoring refers to

a continuing function that uses systematic collection of data on specified indicators to provide management and the main stakeholders … with indications of the extent of progress and achievement of objectives and progress in the use of allocated funds. (OECD Citation2002, 27–28)

In contrast, evaluation goes beyond monitoring and refers to ‘the process of determining the merit, worth and value of things’ (Scriven Citation1991, 139). In the domain of public policy evaluation, which is the main focus of this paper, the OECD (Citation2020, Section: Defining policy evaluation) describes policy evaluation as a

structured and objective assessment of an ongoing or completed policy or reform initiative, its design, implementation and results. Its aim is to determine the relevance and fulfilment of objectives, efficiency, effectiveness, impact and sustainability as well as the worth or significance of a policy.

Evaluation takes place for many purposes, including accountability; the generation of new knowledge about what works; and the improvement of agency capability through embedding a culture of improvement (Chelimsky and Shadish Citation2006). Challenges with policy evaluation include issues related to evaluation timing (the temporal challenge); resourcing; establishing causality; and socio-political considerations (Golden Citation2020). In addition, understanding the context in which reform is implemented is highlighted as a key consideration for evaluators (Golden Citation2020).

When considering the findings of a policy evaluation, it is important to assess the quality of the available evidence. With a strong history in medical literature, evidence hierarchies have been widely used to support judgements about the level of certainty of evidence provided by various research methodologies (Atkins et al. Citation2004; Guyatt et al. Citation1995; Oxford Centre for Evidence-Based Medicine Citation2011; Schünemann et al. Citation2022). In general terms, more rigorous approaches provide greater levels of certainty and a lower risk of bias (Murad et al. Citation2016), and as a consequence lend themselves to the drawing of stronger conclusions. Weaker study designs (such as case series) are typically placed at the bottom of the evidence hierarchy (often represented by a pyramid), and stronger designs (such as RCTs) feature higher up. Reviews or composites of other studies – e.g. high-quality systematic reviews and meta-analyses – are regarded as among the strongest forms of evidence (Murad et al. Citation2016).

Of relevance to the current discussion from the perspective of policy-makers, the New South Wales Department of Communities and Justice (Citation2020) indicate that evidence derived from a source near the bottom of an evidence hierarchy, such as a case study or expert opinion, can support conclusions in the vein of ‘there are signs that … ’ or ‘experts believe … ’. The authors note that ‘while evidence in this category cannot establish the effectiveness of programs, it can provide an indication of “what might work” when there is a lack of more robust research evidence’ (New South Wales Department of Communities and Justice Citation2020, 3). A higher ‘second level’ of the evidence pyramid comprises quasi-experimental designs such as regression discontinuity, instrumental variables, fixed effects, or difference-in-difference analyses. Cohort studies and case-controlled studies are also placed at this level, with findings used to report that ‘it is likely that … ’ (New South Wales Department of Communities and Justice Citation2020).
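To make the quasi-experimental logic at this level more concrete, the sketch below (a minimal illustration in Python, using invented scores and group labels rather than any real pupil or DEIS data) estimates a simple difference-in-differences effect: the change in outcomes for a group exposed to a hypothetical programme is compared with the change for an unexposed comparison group over the same period.

```python
# Minimal difference-in-differences sketch on hypothetical data.
# The coefficient on treated:post estimates how much more the exposed group's
# scores changed, from baseline to follow-up, than the comparison group's scores.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "score":   [48, 50, 47, 49, 55, 58, 54, 57, 52, 53, 51, 52, 54, 55, 53, 54],
    "treated": [1] * 8 + [0] * 8,                      # 1 = exposed to the programme
    "post":    [0] * 4 + [1] * 4 + [0] * 4 + [1] * 4,  # 0 = baseline, 1 = follow-up
})

model = smf.ols("score ~ treated * post", data=df).fit()
print(model.params["treated:post"])  # the difference-in-differences estimate
```

The same comparison-of-changes logic underlies more elaborate panel specifications; as with any method at this level of the hierarchy, the credibility of the estimate rests on the assumption that the two groups would have followed parallel trends in the absence of the programme.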

The highest levels of evidence (often deemed the ‘gold standard’) include RCT designs for single studies (Shadish, Cook, and Campbell Citation2002; Torgerson and Torgerson Citation2008), as well as high-quality meta-analyses and systematic reviews. Meta-analyses and systematic reviews combine results from many studies and are thus generally considered the strongest form of evidence. Individual RCTs – featuring random assignment of participants to a treatment or control group – are placed below high-quality meta-analyses and systematic reviews because their findings are based only on a single study. Findings from meta-analyses, systematic reviews and RCTs are said to support statements such as ‘it is shown that … ’ (New South Wales Department of Communities and Justice Citation2020).

While evidence hierarchies can provide users with general information about the risk of bias (with greater risk of bias likely in studies using methods towards the base of the pyramid), it is recognised that the pyramid is a simplistic representation that does not fully take into account more nuanced problems that may be associated with various methods (Murad et al. Citation2016). For example, systematic reviews and meta-analyses rely on the quality of included studies, the appropriateness of combining those studies, and the methods used in the review (Hattie, Rogers, and Swaminathan Citation2014; Higgins et al. Citation2022; Newman and Gough Citation2020; Paul and Leibovici Citation2014). A particularly pessimistic view suggests that as few as 3 in every 100 systematic reviews have adequate methods and are clinically useful (Moore, Fisher, and Eccleston Citation2022).

Recognising limitations with the traditional pyramid, various alternatives have been proposed (Evans Citation2003; Ho, Peterson, and Masoudi Citation2008; Petticrew and Roberts Citation2003). The GRADE approach is intended to support clinicians grading the quality of evidence and strength of recommendations on which clinical guidelines are based. It advocates for judgements about the strength of a recommendation to take into account the balance between benefits and harms, the quality of the evidence, translation of evidence into specific circumstances and the certainty of the baseline risk. It also highlights the need to take into account costs. Looking in more detail at the quality of evidence, this is said to be related to study design (e.g. observational or RCT), study quality (methods and execution of a study), consistency (in estimates of effects across studies) and directness (in the extent to which people, interventions and outcome measures are similar to those of interest) (Atkins et al. Citation2004). While evidence based on RCTs begins as ‘high quality’, confidence in the evidence can be reduced because of factors such as study limitations, inconsistency of results, indirectness of evidence, imprecision or reporting bias. In contrast, observational studies may be graded upwards under certain circumstances (Guyatt et al. Citation2008). A key feature of GRADE is the separation of judgement regarding the quality of evidence from decisions regarding the strength of recommendations, recognising that high-quality evidence does not necessarily imply strong recommendations and vice versa (Guyatt et al. Citation2008).

In considering the types of evidence being used to inform Irish educational policy, a welcome increase is noted in the use of systematic reviews to inform recent curricular and policy developments (e.g. French and McKenna Citation2022; Kennedy et al. Citation2023; Leavy et al. Citation2022). It may be noted that systematic reviews and meta-analyses rely on bringing together an existing body of evidence. In contrast, the focus of this paper is primarily on generating new knowledge on the effectiveness or otherwise of educational policies or programmes in Ireland, where primary research is typically required. For this reason, we do not consider in further detail specific issues associated with conducting or understanding systematic reviews or meta-analyses, but interested readers may wish to consult Hattie, Rogers, and Swaminathan (Citation2014), Higgins et al. (Citation2022) or Newman and Gough (Citation2020).

Randomised controlled trials

RCTs are currently not a common feature of the evaluation landscape in Irish education, although some recent exceptions to this are provided by Hickey et al. (Citation2017) and Ruttledge et al. (Citation2016). An early example of a large-scale experimental study conducted in Ireland by the Educational Research Centre examined the impact of standardised testing at primary level (Kellaghan, Madaus, and Airasian Citation1982). Internationally, too, it is recognised that the use of RCTs is relatively rare in wider programme evaluation (McDavid, Huse, and Hawthorn Citation2018) although influential and widely cited examples from the USA include the Perry Preschool Project (Schweinhart et al. Citation1985) and the Tennessee class-size experiment (Project STAR) (Mosteller Citation1995). These are of relevance to the current paper given the focus of the former on improving equity through the provision of high-quality pre-school education and of the latter on reduced class sizes, a key element of supports for Urban DEIS primary schools. Increased usage of RCTs in educational research has been documented in the period 1980–2016, with over half of RCTs in the period conducted in North America and under a third in Europe (Connolly, Keenan, and Urbanska Citation2018). A particular increase in the use of RCTs in the USA has been observed from 2002 onwards (Hedges and Schauer Citation2018).

From an evaluation perspective, RCTs hold several appealing characteristics which enable stronger (ideally, causal) conclusions to be drawn (for a more technical discussion, see e.g. Angrist and Pischke Citation2015). These include design features such as randomisation, blinding, prospective data collection, and pre-registration of analyses and outcomes of interest (Torgerson and Torgerson Citation2008). Briefly, these features are valuable to the evaluator for the following reasons.

Randomised allocation of participants to (one or more) treatment and control groups

The random assignment of participants to groups means that chance is (should be) the only difference between the groups before the treatment or intervention begins. This means that challenges to causal inference associated with selection bias that frequently occur in other study designs are obviated (Torgerson and Torgerson Citation2008). Where randomisation is not possible or not implemented, there are statistical methods which try to retrospectively simulate randomisation to ensure balance across groups – e.g. propensity score matching – but these techniques cannot avoid the risk of systematic differences on unmeasured variables and so are inferior to true randomisation (Deeks et al. Citation2003). It should be noted that randomisation is not in itself a silver bullet for evaluation, as important features such as statistical power, sample size and balance across groups must still be considered carefully, especially if more than one intervention is being tested. Control groups can be assigned a placebo intervention to mirror the type of activities taking place in the treatment group, or can receive the same treatment on a delayed basis (after comparison with the ‘initial’ treatment group).
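As a deliberately simplified illustration of the allocation step described above, the sketch below (using invented school identifiers) randomly assigns schools to a treatment or a control arm with a fixed seed so that the allocation is reproducible and auditable; a real trial would also attend to stratification, sample size and balance checks, as noted above.

```python
# Minimal sketch of random allocation of (hypothetical) schools to trial arms.
import random

random.seed(2024)  # fixed seed so the allocation can be reproduced and audited

schools = [f"school_{i:02d}" for i in range(1, 41)]  # 40 invented school identifiers
random.shuffle(schools)                              # chance alone determines the ordering

treatment_arm = sorted(schools[:20])  # first half after shuffling
control_arm = sorted(schools[20:])    # remaining half (e.g. a waiting-list control)

print(len(treatment_arm), len(control_arm))  # 20 20
```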

Blinded group assignment

This refers to the idea that participants should not know whether they are assigned to a ‘treatment’ or ‘control’ group. Ideally, trials are double-blinded (meaning that the experimenter or evaluator is also unaware which group is which), although this can sometimes be difficult to accomplish in the social sciences.

Prospective data collection

This means that relevant data can be collected from both groups at baseline – at a time pre-intervention – and then again at specified intervals during and/or after the intervention. The relevant timespan may depend on the nature of the intervention, its targeted or expected effects, and expectations for ‘fade-out’ or longer-term effects, as well as administrative and budgetary concerns. In practice, many trials or evaluations tend to cease data collection at the end of or relatively soon after the end of the intervention, leading to uncertainty about any longer-term effects. (For an example, see the discussion of Reading Recovery, below.)

Pre-registration

It should be possible for evaluators and sponsoring bodies to specify the goals of the RCT – including outcomes of interest, comparisons of interest, and planned analyses – at the beginning of the trial before ever collecting data. This clarity of purpose encourages reporting of negative or non-significant findings, making it more difficult to ‘hide’ findings that may be unexpected or unwanted from the perspective of a funder, and promotes fidelity to the planned trial and clear reporting of adaptations by making it easy for reviewers or outside readers to spot and seek explanation for any deviations from the original plan. For studies funded by the EEF, there is a requirement to publish on the EEF website a pre-specified protocol and statistical analysis plan for every trial. Also, the trial is registered in a specified registry and all EEF findings are published, reducing the risk of publication bias (Edovald and Nevill Citation2020).

In an educational setting, it is also particularly important to bear in mind the special features of cluster-randomised RCTs which apply when classes or schools, rather than individuals, are assigned to an intervention. These require specific analytical adjustments to account for the clustering (Higgins et al. Citation2022). Other challenges that can arise with RCTs in educational settings include low statistical power, limitations to the generalisability of the study and availability of suitable outcome measures. Issues with statistical power can arise when too few students, teachers or schools are sampled to detect the expected or realistic effects of an intervention. Problems can occur with the generalisability of a study when a small-scale intervention with specific resourcing and/or input from the intervention’s designer is evaluated, the findings of which are unlikely to be directly comparable to a scaled-up version of the same programme. A further challenge lies in the availability or collection of suitable outcome measures by which to evaluate the impact of an intervention given that, in educational settings, such outcomes often involve concepts or constructs (e.g. wellbeing, engagement, literacy) that have no clear shared definition, no clear shared metric for assessment, or are otherwise contested or contentious.
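One way to see why the clustering adjustment matters for statistical power is the standard design-effect calculation; the short sketch below uses purely illustrative numbers (not drawn from any Irish study) to show how an intra-cluster correlation shrinks the effective sample size available to detect an effect.

```python
# Sketch of the standard design effect for a cluster-randomised design:
# DEFF = 1 + (m - 1) * ICC, where m is the average cluster size (e.g. pupils
# per class) and ICC is the intra-cluster correlation. Values are illustrative.
def effective_sample_size(n_pupils: float, cluster_size: float, icc: float) -> float:
    design_effect = 1 + (cluster_size - 1) * icc
    return n_pupils / design_effect

# 2,000 pupils in classes of 25 with an ICC of 0.15 carry roughly the same
# information, for power purposes, as a simple random sample of about 435 pupils.
print(round(effective_sample_size(2000, 25, 0.15)))
```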

Critics of RCTs suggest that the approach fails to give sufficient consideration to how different intervention components interact with each other and with the local context (Berliner Citation2002). The importance of context has received particular attention in studies examining the impact of teachers’ professional development on student learning and other outcomes. Findings from qualitative studies highlight variation across schools in the implementation and sustainability of learning from professional development opportunities. Such variation may be associated with factors including levels of principal support, access to resources, collegial support, teacher personal commitment and systemic factors such as the influence of school policy on school autonomy (Haymore Sandholtz and Ringstaff Citation2016; Moynihan and O’Donovan Citation2022).

A simple example can be used to illustrate how intervention components might interact with each other (the evaluation of complex interventions is discussed in more detail in the next section). Suppose a literacy intervention targets comprehension skills and vocabulary knowledge for students, as well as parent outreach to support enjoyment of reading in the home. An evaluation of the intervention might usefully examine how outcomes for students who received vocabulary and comprehension support as well as home support compared with outcomes for students who received different combinations of supports. Walsh et al. (Citation2023) provide a sophisticated means of examining multiple components of literacy interventions to assess how these may work to improve student outcomes.

In response to criticisms of RCTs, some suggestions have been advanced outlining proposed extensions (Bonell et al. Citation2012; Marchal et al. Citation2013). The importance of embedding a process evaluation to verify the extent to which an intervention was implemented as planned, alongside the ‘core’ evaluation of targeted outcomes, is recognised (Edovald and Nevill Citation2020; Siddiqui, Gorard, and See Citation2018; Skivington et al. Citation2021). Furthermore, while RCTs are good for assessing if an intervention ‘works’ in a narrow sense – i.e. as defined by a specific targeted outcome – it is acknowledged that there is a need to also ask a broader range of questions, including complex questions with uncertain answers (which may be those most useful to policy-makers or practitioners) rather than only narrower questions that can be answered with greater certainty (McDavid, Huse, and Hawthorn Citation2018; Skivington et al. Citation2021). At the same time, it remains the case that clear answers to narrow questions also have a role to play, and are often lacking in the education and broader social science evidence bases.

Complex interventions

In practical terms, it is not always feasible to design and implement an RCT, especially in an educational setting. This therefore necessitates the use of other forms of evidence to inform decision-making. In their guidance on conducting reviews of the effects of interventions, CochraneFootnote2 recognises that large-scale public health interventions or organisational changes represent occasions where randomised studies are unlikely to be available (Higgins et al. Citation2022, Section i-2-2-1). It is also recognised that there are particular complexities and challenges involved in evaluating complex interventions (Skivington et al. Citation2021; Thomas et al. Citation2022). In the UK, the Medical Research Council (MRC) and National Institute for Health Research have recently updated their guidance on developing and evaluating complex interventions, which they describe as follows:

An intervention might be considered complex because of properties of the intervention itself, such as the number of components involved; the range of behaviours targeted; expertise and skills required by those delivering and receiving the intervention; the number of groups, settings, or levels targeted; or the permitted level of flexibility of the intervention or its components. (Skivington et al. Citation2021, Section: What are complex interventions?)

Thomas et al. (Citation2022) prefer to refer to ‘intervention complexity’, recognising that no intervention is simple and thereby suggesting that complexity should be viewed as a multidimensional continuum. In this paper, we retain the term ‘complex intervention’ for consistency with the MRC guidance. Skivington et al. (Citation2021) recognise the need for the evaluation of complex interventions to go beyond asking whether or not the intervention achieves its intended outcome(s) to asking a broader range of questions, such as identifying any other impacts of the intervention, assessing its value relative to the resources required to deliver it, theorising how it works, taking account of its interaction with the context in which it is implemented, and its contribution to system change. For discussion of some of these and related issues in educational evaluation, see Anders et al. (Citation2017). The MRC framework describes four common perspectives underpinning most complex health intervention research projects:

  • Efficacy: the extent to which the intervention produces the intended outcomes in experimental or ideal settings;

  • Effectiveness: the extent to which the intervention produces the intended outcomes in real world settings;

  • Theory based: the circumstances under which the intervention works and how it works; and

  • Systems: the ways in which the system and the intervention adapt to one another.

Cochrane identifies three ways of understanding intervention complexity (Thomas et al. Citation2022):

  • the number of components in the intervention;

  • interactions between intervention components or interactions between the intervention and its context, or both; and

  • the wider system within which the intervention is introduced.

Importantly for the example of DEIS discussed below, Thomas et al. (Citation2022) note that having higher numbers of components in the intervention can make it difficult to understand which are most important and which (if any) are responsible for intervention effects. They also highlight that components can have synergistic (rather than additive) effects, that complexities in implementation may be relevant, and that the intervention and the system within which it operates may have bi-directional impacts on each other. Such complexities have given rise to modified versions of the classic RCT for behavioural interventions, including differentiation of a range of possible ‘active’ or ‘inactive’ control groups (with the former engaging in some activities or tasks during the intervention period, whereas the latter may be on a waiting list for delayed exposure to the intervention or may simply follow their usual patterns of behaviour) (Tock, Maheu, and Johnson Citation2022). In addition, multi-arm RCTs – i.e. testing more than one intervention condition at the same time – are common in the medical field, but require careful analysis and reporting for valid conclusions to be drawn on the efficacy of one intervention relative to, or in combination with, another (Juszczak et al. Citation2019).

Also relevant to the DEIS example discussed next, it has been recognised internationally that in educational programme evaluations, insufficient time is often given to understanding the evaluation problem and context, with data collection often taking place before there is a good understanding of the nature of the project to be evaluated; the key audiences and stakeholders; why the evaluation is taking place; the kinds of information that should be collected; and the criteria to be used for evaluation (Nevo Citation2006).

An Irish example of evaluating a complex intervention: lessons from the evaluation of DEIS

In this section, we draw on the evaluation of the DEIS programme in Ireland to illustrate some of the issues associated with the evaluation of complex interventions. We first describe key elements of DEIS (DES Citation2005; DES Citation2017a) and then consider aspects of its evaluation.

Introduced in 2005/2006 and updated in 2017, DEIS is the main policy response of the DoE to educational disadvantage in Ireland (DES Citation2005; DES Citation2017a). A key aim of DEIS was to integrate under one umbrella an assortment of earlier schemes and programmes. These schemes included Early Start; Giving Children an Even Break; the Support Teacher Project (primary level); aspects of the Early Literacy Initiative including Reading Recovery, the Junior Certificate School Programme Literacy Strategy and Demonstration Library Project; the Home-School Community Liaison (HSCL) Scheme; the School Completion Programme (SCP); and the Disadvantaged Areas Scheme for second-level schools (DES Citation2005). In this way, DEIS is a complex intervention which may be viewed as comprising several complex interventions in their own right, with varying degrees of complexity associated with each.

Originally, two separate systems for school identification were used at primary and post-primary levels, but these have been harmonised since 2017 using an identification system based on centrally-held data. The identification system underwent further refinement in 2022 and now includes adjustments for the percentages of students from Traveller or Roma backgrounds, living in International Protection Accommodation Services centres, or experiencing homelessness (DoE Citation2022c). At primary level only, a distinction is made between urban and rural schools, with the most disadvantaged urban schools assigned to Urban Band 1 and disadvantaged urban schools assigned to Urban Band 2. Primary schools in rural areas with the highest concentrations of pupils from disadvantaged backgrounds are described as DEIS Rural. There is some variation between the supports provided to primary schools in different DEIS bands, with the highest levels of support provided to Band 1 schools. One key support provided to Band 1 schools is the provision of reduced class sizes, while both Urban Band 1 and Band 2 schools have access to HSCL services. There is no banding of DEIS post-primary schools, where supports include access to HSCL, the SCP, and more favourable staffing arrangements and funding; see DoE (Citation2023b) for the full list of supports. With limited exception (e.g. in the event of school closure), all schools included in DEIS at the outset remain in the programme regardless of current levels of disadvantage. The programme was substantially extended in 2022 and now includes about 25% of the school-going population. Prior to the 2022 extension, approximately 20% of the overall school population were enrolled in DEIS schools, either at primary or post-primary level (DoE Citation2022c).

Although both the 2005 and 2017 versions of the DEIS plan were supported by literature reviews (Archer and Weir Citation2005; Weir et al. Citation2017), there has been comparatively limited articulation of programme theory which ‘describes how an intervention is expected to lead to its effects and under what conditions’ (Skivington et al. Citation2021, Section: Programme theory) or use of visual representations such as logic models. Skivington et al. (Citation2021, Section: What are complex interventions?) describe the mechanisms of change as ‘the causal links between intervention components and outcomes; and contextual factors, which determine and shape whether and how outcomes are generated’. In contrast to other aspects of DEIS, the evaluation of the School Meals Programme outlines the theory of change underpinning the programme (i.e. it articulates how the programme is intended to yield the desired effects) using a logic model that specifies inputs, activities, outputs, outcomes and impacts, which can be used to trace how specific inputs and activities are expected to give rise to specific outcomes or impacts (RSM Citation2022).

Extensive work has been undertaken to examine levels of student achievement (Gilleece et al. Citation2020; Kavanagh and Weir Citation2018; Nelis and Gilleece Citation2023; Weir and Kavanagh Citation2018; Weir, Archer, and Millar Citation2009), retention (Weir and Kavanagh Citation2018), attendance (e.g. Denner and Cosgrove Citation2020; Denner, Cosgrove, and Millar Citation2022; Millar Citation2017) and some non-cognitive outcomes (Nelis et al. Citation2021) in DEIS schools. Consideration has also been given to implementation issues such as reduced class sizes (Department of Education Citation2021; Kelleher and Weir Citation2017; Weir and McAvinue Citation2012), DEIS planning (Weir et al. Citation2014) and the work of HSCL coordinators (Weir et al. Citation2018).

A central feature of the DEIS programme is schools’ local flexibility (within specified parameters) in spending their DEIS grant. Schools are also required to establish locally relevant targets under several specified domains such as literacy, numeracy and attendance (DES Citation2017a). This leads to accepted variation in the local implementation of DEIS across participating schools.

Challenges in evaluating outcomes of DEIS

While monitoring and evaluation were presented as central to DEIS, an inability to draw causal conclusions has been a long-standing feature of evaluation work in this area (Denny Citation2015). The 2017 DEIS plan provided for the development of a comprehensive Monitoring and Evaluation framework, intended to gather information on, and assess, all aspects of the programme. It was intended to ‘improve transparency and to determine which interventions are having the greatest impact in terms of delivering better outcomes for learners’ (DES Citation2017a, 19). At the time of writing, this Monitoring and Evaluation framework is unpublished. This ‘causality’ challenge is not unique to the evaluation of DEIS or to educational evaluation in Ireland, being recognised as a challenge to the evaluation of educational policy initiatives internationally (European Commission Directorate-General for Education Youth Sport and Culture Citation2022; Golden Citation2020).

In considering the evaluation of DEIS, there are several relevant issues to note. Firstly, there is known variation in the student profile across schools in Ireland; i.e. not all disadvantaged students attend DEIS schools and conversely, not all students in DEIS schools are disadvantaged (Fleming and Harford Citation2021). This issue is relevant when specifying indicators of success for DEIS (Smyth, McCoy, and Kingston Citation2015): for example, is the aim of the programme to improve outcomes for all students in DEIS schools, or specifically for students from disadvantaged backgrounds? There appears to be increasing recognition by the Department of Education of the need to support disadvantaged students in non-DEIS schools, who are not currently targeted by the DEIS programme (DoE Citation2023a). A change in this regard would represent a move away from the rationale articulated in the 2017 DEIS Plan which referred to the ‘multiplier effect’ (the idea that students tend to have poorer academic outcomes in the context of concentrated disadvantage, even taking into account individual social background). The sixth challenge below considers the implications of this issue for data analysis.

Secondly, the length of time spent in the DEIS programme varies across schools, with some current DEIS schools also having participated in earlier schemes such as Giving Children an Even Break. It is known that organisational change and school improvement take time to be reflected in better outcomes (Smyth, McCoy, and Kingston Citation2015), so a school’s length of time in receipt of additional supports may be expected to be associated with outcomes. This is described as the temporal challenge of evaluation by Golden (Citation2020, 17) who notes that evaluators need to measure ‘the right outcome in the right place at the right time’. There are a number of implications of this challenge for the purposes of evaluation related to data availability and the enactment of DEIS in schools. While the length of time in DEIS could be considered as variation in ‘dosage’, it would also be necessary to take into account school participation in earlier or complementary schemes such as the HSCL scheme, as many of these components subsequently became part of the DEIS supports. With reference to the period prior to the introduction of DEIS, Weir and Archer (Citation2005) show that while there was considerable overlap of schools participating in the HSCL scheme, the Disadvantaged Areas Scheme and the SCP, there was substantial variation in the numbers of schools with one, two or all three of these supports. There is also potential for spill-over effects of professional development and experiences of DEIS supports caused by movement of staff between schools. These issues mean that it is unrealistic to consider length of time in DEIS as a simple indicator of exposure to the intervention.

Thirdly, it is recognised that school self-evaluation is enacted differently across schools (Skerritt et al. Citation2023). In DEIS schools, the school self-evaluation process is replaced by DEIS action planning, so it is reasonable to anticipate similar variation in DEIS action planning across participating schools. Indeed, findings from Inspectorate reports show some variation across DEIS schools in the effectiveness of data usage and target setting as part of DEIS action planning (DoE Citation2022b). From an evaluation perspective, variation in DEIS planning across participating schools may be considered to reflect implementation variation. Variation in implementation across schools was also identified in a review of the SCP, where findings highlighted variation across participants in the perceived aims of the programme, variation in the proportion of the school population targeted across clusters, and variation across clusters in the emphasis placed on in-school, after-school, holiday and out-of-school supports (Smyth et al. Citation2015).

Fourthly, DEIS provides for several different supports for schools, with some variation between supports provided to schools in different DEIS bands. It is recognised in the 2017 DEIS Plan that there is very little information for analysis on the individual elements of the plan (DES Citation2017a, 53) and the challenge of disentangling which elements may work best has been highlighted (Smyth, McCoy, and Kingston Citation2015). It has also been noted that there has been a stronger focus on evaluating some components compared with others (Smyth et al. Citation2015). In addition to the range of supports provided directly under DEIS, school DEIS status is often used by other organisations such as universities or voluntary sector organisations to select schools or students for inclusion in extra-curricular opportunities or university access programmes (Bray, Hannon, and Tangney Citation2022; Fenwick, Kinsella, and Harford Citation2022; Gilleece and Nelis Citation2023). Thus, students in DEIS schools may benefit from supports that are not directly provided through the DEIS programme. To our knowledge, the issues of additive versus synergistic effects outlined by Thomas et al. (Citation2022) have not been considered in work on the DEIS evaluation.

Fifthly, we are not aware of detailed analysis of the role of wider (e.g. the National Literacy and Numeracy Strategy; DES Citation2011; Citation2017b) or complementary (e.g. the National Traveller and Roma Inclusion Strategy 2017–2021; Department of Justice and Equality Citation2017) system developments on DEIS, or vice versa. Karakolidis et al. (Citation2021a; Citation2021b) and Duggan et al. (Citation2023) examined educational inequalities in Ireland on national and international assessments at primary level before and after the introduction of the National Literacy and Numeracy Strategy (DES Citation2011). Mixed findings emerged, with results from the National Assessments of Mathematics and English Reading (NAMER) suggesting greater improvements in equity than results from international assessments. While their work takes into account the timing of the National Literacy and Numeracy Strategy in a general sense, it is not designed for causal interpretations of impact. In the absence of a more detailed focus on the potential interplay between the DEIS scheme and other policies, strategies and supports affecting the education system, it is difficult to disentangle any effects that may be attributable to DEIS on its own or in conjunction with other schemes.

A sixth challenge for monitoring and evaluation of DEIS relates to limitations with existing data. For example, there are currently no achievement data at the individual level for the population of primary pupils. While the Department of Education receives aggregated data from standardised testing in primary schools (DoE Citation2023c), schools are free to choose tests from different providers, the results of which are unlikely to be directly comparable (DES Citation2016). Furthermore, current data do not allow consideration of the achievement levels of important pupil subgroups, such as pupils from Traveller or Roma backgrounds or other pupil subgroups known to be at risk of educational disadvantage. This issue has been raised recently by the UN Committee on the Rights of the Child (Citation2023, 12), which recommended that Ireland

collect and analyse data disaggregated by ethnic origin, socioeconomic background and residence status on attendance and completion rates, educational outcomes, use of reduced timetables and participation in after-school activities to inform policies and programmes aimed at ensuring equal access of children in disadvantaged groups to quality education.

Similarly, at post-primary level, currently available data were not designed for monitoring achievement in DEIS schools. For example, data from international large-scale assessments are available for samples of students rather than the population (Gilleece et al. Citation2020), while State examination data do not reflect the full range of achievement documented in the Junior Cycle Profile of Achievement, which describes State-certified examination grades as well as achievement in other assessment modes and areas of learning (DoE Citation2022a). Since 2020, due to the onset of COVID-19, examination fees have not been charged for State examinations for any student which means that the examination fee-waiver indicator (related to family medical card possession) is not available for more recent datasets. This means that it is not possible to explore the interaction between individual socio-economic status and school profile using recent examination data. A related concern is that focusing on average achievement in DEIS schools provides an incomplete picture as there may be considerable variation across students in DEIS schools (Flannery, Gilleece, and Clavel Citation2023; Smyth, McCoy, and Kingston Citation2015).

In contrast to the situation with achievement data, Ireland has made greater progress with analysing attendance and retention data for subgroups of pupils at risk of educational disadvantage; recent analyses drew on linked data to examine the educational attendance and attainment of children in care (Central Statistics Office Citation2023). Other relevant considerations with respect to data that have been identified include the limitations of self-reported data (Fleming and Harford Citation2021), the need for greater availability of longitudinal data (RSM Citation2022; Smyth et al. Citation2015), and the need for further parent data (Nelis and Gilleece Citation2023; O’Toole et al. Citation2019).

As an illustration of some of the challenges identified here – in particular, the fourth and sixth challenges – we next take a closer look at the Reading Recovery (RR) programme, which is used in many DEIS schools, and draw some contrasts with two other programmes that are also offered under DEIS (FRIENDS For Life, henceforth FRIENDS, and Incredible Years Teacher Classroom Management; IYTCM).

Reading Recovery, FRIENDS For Life, and Incredible Years Teacher Classroom Management

In this section, we turn to RR, FRIENDS and IYTCM as examples of programmes offered to DEIS schools. RR is one of the programmes specified in the 2005 DEIS plan intended to tackle literacy and numeracy problems in DEIS primary schools (others listed were First Steps, Mathematics Recovery and Ready, Steady, Go Maths; DES Citation2005). The 2017 DEIS plan and Action Plan for Education 2016–2019 provide for the roll-out of both FRIENDS and IYTCM to all DEIS primary schools and FRIENDS to all DEIS post-primary schools. FRIENDS is a cognitive behavioural based programme designed to reduce childhood anxiety and promote emotional resilience (Friends Resilience, Citation2023). IYTCM is a prevention programme to strengthen teacher classroom management strategies and promote children’s prosocial behaviour and school readiness (The Incredible Years Citation2023). Both programmes are delivered by the National Educational Psychological Service (NEPS) in Ireland. Findings from NAMER 2021 show that by Spring 2021, all Sixth class pupils in Urban Band 2 schools, 87% in Urban Band 1 schools and 77% in Urban Non-DEIS schools had principals who indicated that the FRIENDS programme was available in their school; corresponding percentages for IYTCM were 82%, 81% and 50% respectively (Gilleece and Nelis Citation2023).

In this section, we contrast the level of evaluation applied to RR in Ireland with that applied to FRIENDS and IYTCM. The rationale for focusing on RR in particular here is that its merits are the focus of ongoing debate in the international literature (e.g. May et al. Citation2023) and that there is a dearth of Irish-specific evidence. Notwithstanding the difficulties inherent to evaluating the effects of any individual components of the DEIS programme, the long-standing use of the RR programme as a key literacy support within DEIS schools provides an illustrative example demonstrating the need for such evaluative efforts.

RR is a programme which delivers one-to-one literacy instruction to identified pupils for 30 minutes per day for 12–20 weeks. It is expected to be available in DEIS schools, with ‘every school identified as designated disadvantaged under the DEIS initiative [being] asked to nominate one member of staff to train as the specialist Reading Recovery teacher’ (Professional Development Service for Teachers [PDST] Citation2023). Pupils are selected for participation in RR following testing and observation by the RR teacher, who identifies the lowest-performing pupils in Senior Infants and First Class (DES Citation2009). DEIS schools’ implementation of RR has been described as being ‘very successful’ in a report by the Inspectorate of the DoE on effective literacy practices (DES Citation2009, 106). While RR was specifically named in the 2005 DEIS plan, no particular programmes are named in the 2017 plan. Rather, schools are advised to ‘give particular consideration to currently provided evidence-informed provision which is well evaluated and is delivering measurable improvements in the outcomes sought for the pupil cohort’ (DES Citation2005, 48).

Data from NAMER 2021 support the contention that RR is widely used in DEIS primary schools and is more widely available in DEIS schools than in non-DEIS schools. Findings show that nearly 70% of participating Second class pupils in urban DEIS schools had principals who reported that RR was available in their school, compared to about 40% in urban non-DEIS schools (authors’ unpublished analyses). In schools where RR was reported to be available, it was generally considered by principals to be of high value for pupils. For example, focusing on Urban Band 1 schools where RR was available, over four-fifths of Second class pupils had principals who indicated that the value of the programme to pupils was high.

RR has been in use for several decades in New Zealand, Australia, the UK, the USA, Canada, and Denmark, as well as in Ireland. Over this period many studies on the impacts of RR have been conducted which have found positive short-term impacts of RR on aspects of literacy such as decoding (D’Agostino and Harmey Citation2016; Harmey and Anders Citation2023). Critiques of this literature have noted methodological limitations to some studies, such as the lack of an appropriate comparison group or the intervention not being implemented as planned (U.S. Department of Education, Institute of Education Sciences, What Works Clearinghouse Citation2013), as well as relatively weak evidence of any long-term positive impact (D’Agostino and Harmey Citation2016; May et al. Citation2023). Other critiques have focused on potential biases within the RR programme itself. These include the exclusion of the lowest-performing readers, and a misalignment between what is counted as success in RR and the skills that children need to read successfully outside RR (Cook, Rodes, and Lipsitz Citation2017; Grossen, Coulter, and Ruggles Citation1997).

Nonetheless, of 202 studies examined by the What Works Clearinghouse in 2013, 3 were deemed to meet the requisite standards and, overall, these studies showed small positive or potentially positive effects of RR on student outcomes (U.S. Department of Education, Institute of Education Sciences, What Works Clearinghouse Citation2013). Similarly, the Irish Government’s WhatWorks hub gives RR a cautiously positive ratingFootnote3 while noting that their verdict on RR ‘does not include evaluation conducted in Ireland’ (What Works Citation2023).

The latter observation – the absence of any high-quality evaluations on the literacy outcomes targeted by RR in Ireland – is noteworthy given the status of RR as a component of the DEIS programme’s efforts (inter alia) to improve literacy standards, almost 20 years after the initiation of DEIS.Footnote4 RR is included in a guide to effective interventions for struggling readers compiled by the National Educational Psychological Service (NEPS Citation2019), albeit that the guide indicates that inclusion in the publication is not an endorsement of a programme. Nonetheless, school staff and parents may expect that, wherever possible, consideration has been given to the demonstrated or likely impacts of the programme in Ireland, particularly in DEIS schools.

This absence of local evidence may be of some concern to policymakers given some of the questions that have been raised about RR. A recent study by May et al. (Citation2023) in the USA tested the long-term impacts of RR by examining literacy outcomes in Third grade and Fourth grade.Footnote5 The conclusions of this study are pessimistic: ‘Results suggest that the long-term impact of Reading Recovery … in 3rd and 4th grades is statistically significant and substantially negative’ (1). Specifically, May et al. (Citation2023) found that students who had participated in RR while in First grade subsequently had reading scores that ‘were, on average, .19 to .43 standard deviations (about one-half to one full grade level) below the state test scores of similar students who did not participate in Reading Recovery’. Similarly, an evaluation in Australia reported that although their findings showed

short-term positive effects of RR on reading outcomes for the lowest performing students, they do not support the effectiveness of the intervention on other aspects of literacy achievement or the longer-term sustainability through the early years of school. (Bradford and Wan Citation2015, 21)

In Ireland, a state-of-the-art review of literacy in early childhood and primary education commissioned by the NCCA (Kennedy et al. Citation2012) noted the, at that point, ‘limited evidence’ for the effectiveness of any literacy intervention programmes in early childhood in Ireland. The authors called for greater focus in this area and specifically highlighted a paucity of evidence on the literacy outcomes of interventions in DEIS schools, noting that

no evidence on the impact of First Steps on the reading and writing development of children in urban DEIS schools has been published to date. It seems important that initiatives such as this would be evaluated intensively and their findings made known. (194)

The Government’s own WhatWorks hub indicates that, as of November 2023, no suitably high-quality evaluation of RR in Ireland had been conducted in the decade since the NCCA’s review (WhatWorks Citation2023).

The purpose of highlighting RR’s position as one among many components of the DEIS programme in this paper is not to single out RR specifically. As noted above, the international research on RR is extensive and contested, and merits a more complete review than can be provided here. However, this brief overview is enough to illustrate the point that the evidence that is currently available on one of the key components of DEIS – one of the DoE’s flagship policy initiatives – is of variable quality and somewhat mixed. Equally importantly, such evidence is largely lacking in the specific context which policymakers and stakeholders in Ireland care about (i.e. schools in Ireland). The best way to address that gap in knowledge would be to design and carry out a suitable, high-quality evaluation of the implementation and outcomes of RR that can draw conclusions about the effectiveness of the programme in DEIS schools or in schools in Ireland more broadly.

In contrast to RR, FRIENDS and IYTCM – introduced to support pupil wellbeing (Friends Resilience, Citation2023) and classroom management (The Incredible Years Citation2023) respectively – have been subject to greater levels of evaluation in Ireland (Davey and Egan Citation2021; Hickey et al. Citation2017; Kennedy et al. Citation2021; McGilloway et al. Citation2009; Citation2010; Citation2012; Ruttledge et al. Citation2016).

A key strength of the study of FRIENDS undertaken by Ruttledge et al. (Citation2016) was the use of an RCT design intended to replicate international evaluations in an Irish setting. Schools that applied to participate in the programme were allocated to either an intervention group or a waiting-list control group. Fidelity of implementation was examined through use of a checklist for teachers. Findings show that the programme was implemented successfully by teachers and positive outcomes for students were observed. However, an important limitation of the study was that the clustering of students within schools (a key consideration, as noted above) was not accounted for at the analysis stage, with the authors recognising the need for hierarchical linear modelling in future work to adequately account for school effects.
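To make concrete the adjustment that Ruttledge et al. identified as missing, the sketch below fits a hierarchical (multilevel) model with a random intercept for school to simulated data in which pupils are nested within schools; all data, effect sizes and variable names are invented for illustration, and this is not a re-analysis of that study.

```python
# Minimal multilevel-model sketch: pupils nested within schools, with the
# intervention assigned at school level and a random intercept for school so
# that standard errors respect the clustering. All values are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_schools, pupils_per_school = 20, 25

school = np.repeat(np.arange(n_schools), pupils_per_school)
intervention = np.repeat(rng.integers(0, 2, n_schools), pupils_per_school)
school_effect = np.repeat(rng.normal(0, 2, n_schools), pupils_per_school)
score = 50 + 3 * intervention + school_effect + rng.normal(0, 5, school.size)

df = pd.DataFrame({"score": score, "intervention": intervention, "school": school})

# The fixed effect for 'intervention' is the quantity of interest; the random
# intercept absorbs between-school differences.
model = smf.mixedlm("score ~ intervention", data=df, groups=df["school"]).fit()
print(model.summary())
```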

An Irish-based study of IYTCM also used random assignment of teachers to either an intervention group or waiting-list control group, with a further strength being consideration of the costs of the programme (Hickey et al. Citation2017). A recent review by the European Commission Directorate-General for Education Youth Sport and Culture (Citation2022) noted that comparisons of the costs of different interventions and programmes are still comparatively rare in the education literature, but emphasised that focusing on costs is as important as focusing on effectiveness, given limited public resources.

Despite the wide national roll-out of the FRIENDS and IYTCM programmes, some of the evaluations have been limited by small numbers of participants, or by unrepresentative samples. For example, work by Leckey et al. (Citation2016) was based on a sample size of 11 teachers and the authors noted that ‘ … participants were exceptionally motivated and driven, both towards continuing professional development and, crucially, towards the creation of positive child outcomes’ (51). The study was therefore limited both in terms of statistical power to detect any effects (due to the small sample) and in terms of generalisability to the wider population of teachers (due to the unrepresentative sample).
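As a rough indication of the scale problem (an illustrative calculation of our own, with hypothetical parameters, not a figure reported in the studies cited), a conventional power analysis for a simple two-group comparison suggests why a sample of 11 teachers is very unlikely to detect effects of plausible size:

```python
# Illustrative power calculation (hypothetical parameters, not from the cited studies):
# participants per group needed to detect a standardised effect of 0.4
# with 80% power at alpha = 0.05 in a simple two-group comparison.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.4, alpha=0.05, power=0.8)
print(round(n_per_group))  # approximately 100 participants per group

# Clustering of pupils within classrooms or schools would increase the
# required sample further, so this is a lower bound on the numbers needed.
```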

Concluding remarks: moving forward with educational evaluation in Ireland

Golden (Citation2020) examines recent trends in educational evaluation and contextual factors that can promote robust education policy evaluation in OECD countries. She also outlines building blocks for a strong evaluation culture. The first of these relates to understanding complex contexts and includes reference to microdata infrastructures for contextual analysis. The second calls for evaluative thinking across the education policy cycle. The third building block relates to regulation, funding and guidance. We use these building blocks to frame our concluding remarks.

Building block 1: understanding complex contexts

With regard to working to understand complex contexts, we suggest that the evaluation of complex interventions would be supported by greater articulation of the theory of change underpinning the intervention, routine gathering of data to support monitoring, and use of high-quality experimental methods where appropriate. With regard to DEIS, the limited availability of achievement data at the individual student level, at both primary and post-primary levels, is a constraint which could usefully be addressed, particularly amidst ongoing Senior Cycle reform and a foreseeable need to evaluate the eventual impacts of that reform. Addition of achievement measures (that are comparable across pupils) to the Primary Online Database and its post-primary counterpart for research purposes would go some way towards supporting robust evaluation. These challenges are not unique to DEIS; for example, similar issues regarding a lack of theoretical clarity and limited monitoring of outcomes have previously been raised as challenges to evaluation of the Transition Year programme, another long-standing and prominent feature of the Irish education system (Clerkin Citation2018; Citation2020; Clerkin, Jeffers, and Choi Citation2022).

In the absence of strong evidence for a particular programme or initiative, it may be useful to revisit the design used in the 1980s to examine the impact of standardised testing (Kellaghan, Madaus, and Airasian Citation1982), whereby schools were randomised at the population level and an RCT was conducted. This requires not only funding but also sustained political will, related to the ‘socio-political challenge’ described by Golden (Citation2020). The use of RCTs in education has been shown to be feasible in the UK and the USA, from which important lessons can be drawn (Edovald and Nevill Citation2020; Hedges and Schauer Citation2018). Siddiqui, Gorard, and See (Citation2018) outline issues related to the feasibility of aggregated trial studies in the UK, with trials managed and conducted by school staff. They suggest that participation in a school-led trial can offer teachers a valuable alternative to action research; this approach may also be worthy of consideration in Ireland.
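When schools rather than pupils are the unit of randomisation, the effective sample size is reduced according to the standard design-effect formula from the cluster-randomised trial literature (a general result, noted here for illustration rather than drawn from any study discussed above):

```latex
\mathrm{DEFF} = 1 + (m - 1)\,\rho, \qquad n_{\mathrm{eff}} = \frac{n}{\mathrm{DEFF}}
```

Here, m is the average number of pupils sampled per school, ρ is the intraclass correlation of the outcome within schools, n is the total number of pupils, and n_eff is the effective sample size. For example, with m = 25 and ρ = 0.15, DEFF is 4.6 and the effective sample is just over a fifth of the nominal pupil count, which is why school-randomised trials typically need to recruit many schools rather than many pupils per school.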

When RCTs are not feasible, consideration should be given to the most rigorous designs in the evidence hierarchy that remain feasible. Ideally, any programme evaluation should use study designs that can support causal inferences, including quasi-experimental designs such as fixed effects, regression discontinuity, instrumental variables or difference-in-differences approaches (Angrist and Pischke Citation2015; European Commission Directorate-General for Education Youth Sport and Culture Citation2022), although it is recognised that all of these methods rely on having access to appropriate datasets. In common with all statistical approaches, these methods also rest on underlying assumptions and support the examination of narrow questions, again underscoring the need for effective communication between evaluators and policymakers on the purpose of evaluation. Skivington et al. (Citation2021, Section: Summary points) emphasise that

a trade-off exists between precise unbiased answers to narrow questions and more uncertain answers to broader, more complex questions; researchers should answer the questions that are most useful to decision makers rather than those that can be answered with greater certainty.

In addition, issues of ethics are often raised in the context of RCTs (Edovald and Nevill Citation2020; Torgerson and Torgerson Citation2008). It has been argued in Ireland that it would not have been ethical to conduct an RCT of DEIS, as that would have involved withholding ‘treatment’ from pupils who could benefit from it (Weir and Kavanagh Citation2018). A further practical obstacle to evaluating DEIS using an experimental design was the existence of a range of earlier schemes – involving different sets of schools for varying lengths of time – which were all subsequently incorporated under the single umbrella of DEIS. Defining a clear ‘treatment’ group is therefore difficult, given that some DEIS schools had received supports under earlier schemes but not all had received all of the supports. A regression-discontinuity (RD) design can serve as an alternative when randomised designs are not implemented, and it has been noted that RD may be relevant to the evaluation of DEIS given that assignment to DEIS depends on a school’s score on a continuous variable (Denny Citation2015). While the sample sizes required for RD have been shown to limit its viability for impact evaluations of educational interventions (Schochet Citation2008), there would nonetheless be considerable merit in further examining the potential of these and similar methods, and the requisite data, in Irish educational evaluation.
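For illustration only (a generic sharp RD specification, not a model proposed by Denny Citation2015 or used in any DEIS evaluation), the local effect of eligibility at the cut-off could be estimated from a model of the form:

```latex
Y_i = \alpha + \tau D_i + \beta_1 (X_i - c) + \beta_2 D_i (X_i - c) + \varepsilon_i,
\qquad D_i = \mathbf{1}[X_i \geq c]
```

where Y_i is a school-level (or pupil-level) outcome, X_i is the school’s score on the continuous assignment variable, c is the eligibility cut-off, and τ estimates the effect for schools close to the cut-off, typically within a chosen bandwidth around c. The specification is schematic; in practice the direction of eligibility relative to the cut-off and the functional form on either side would need to reflect how DEIS assignment actually operates.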

Building block 2: evaluative thinking

The second building block for a strong evaluation culture refers to the need for evaluative thinking across the education policy cycle (Golden Citation2020). Irish educational evaluation may benefit from wider use of evaluability assessment to determine whether and how an intervention can usefully be evaluated (Skivington et al. Citation2021) and/or more detailed ex-ante evaluation. For example, in their evaluation of JUMP Math, Eivers et al. (Citation2014) indicated that the programme materials were a good match for the Irish curriculum but that the amount of professional development fell far short of what was needed to effect behaviour change in the classroom. Ex-ante evaluation would likely have identified these shortcomings prior to implementation of the programme.

Undoubtedly, educational evaluation would benefit from greater focus on evaluation at the outset of policy initiatives, as well as clear definition of the key metrics associated with success. Positive examples of a focus on evaluation include the roll-out of the IYTCM and FRIENDS programmes, which were designed to support evaluation (as described above) by using the waiting-list group as a control group. This model may usefully be applied to other programmes. The evaluation of the School Meals Programme included financial modelling (RSM Citation2022) which is rare in the Irish educational evaluation literature. The need for cost–benefit analysis and economic evaluation such as this has been identified by international scholars (European Commission Directorate-General for Education Youth Sport and Culture Citation2022; Skivington et al. Citation2021) and may merit further consideration in Irish educational evaluation alongside wider efforts at evaluation.

An important point is the need to build in periodic review of the evidence supporting the use of various interventions. As Berliner (Citation2002) describes, social and cultural changes give rise to a need for updated research and, in some instances, new knowledge and evidence may support adapting or replacing interventions that were once deemed appropriate but which may have served their time. Also, given limited resources, it is important to examine instances where it could be appropriate to divert resources from less effective programmes to those that are more promising (Edovald and Nevill Citation2020). At the time of writing (August 2023), a revised Literacy, Numeracy and Digital Literacy Strategy is under development (DoE Citation2023d). We strongly recommend that the final strategy includes explicit reference to, and support for, appropriate evaluation of any initiatives or programmes – such as RR – that are tasked with yielding stronger literacy, numeracy, or digital literacy outcomes in Irish schools.

Building block 3: resources, funding and guidance

Finally, Golden (Citation2020) refers to the need for resources, regulation and guidance as key considerations for evaluation efforts, while the European Commission Directorate-General for Education Youth Sport and Culture (Citation2022) references the need for a common framework for policy evaluation for EU Member States. The latter would enable comparisons of evaluation outcomes across EU countries and aid the identification of successful policies. In the United States and Canada, educational evaluation is supported by evaluation standards pertaining to utility, feasibility, propriety, accuracy and evaluation accountability (Joint Committee on Standards for Educational Evaluation Citation2018; Yarbrough et al. Citation2011). In Ireland, considerable work has been done by the Department of Children, Equality, Disability, Integration and Youth (Citation2021a; Citation2021b; Citation2021c) and its predecessor, the Department of Children and Youth Affairs (Citation2019), in developing practical guides to support high-quality evaluation. Adherence to appropriate standards and guides should promote high-quality evaluation.

A further example of a specific framework intended to support high-quality evaluation in education in Ireland is the recently published evaluation framework for teachers’ professional learning (Gilleece, Surdey, and Rawdon Citation2023). A new Monitoring and Evaluation framework for DEIS is also planned (DES Citation2017a). The publication of evaluation frameworks such as these is a useful step towards more robust and informative methods of evaluating policies and educational innovations, but must be supported by a concerted effort to plan (and resource) suitable evaluations at an early stage in the lifecycle of initiatives as they are introduced. From a range of perspectives – including effectiveness, efficiency, and improvement – it is important that policy makers and school leaders have access to the most reliable evidence available in order to adequately meet the needs of pupils, their teachers, and their families.

International research has highlighted the issue of conflicts of interest potentially weakening evaluative efforts (Grossen, Coulter, and Ruggles Citation1997; Macnamara and Burgoyne Citation2023) and has emphasised the need for independent, impartial research and evaluation (Edovald and Nevill Citation2020; Hedges and Schauer Citation2018). In recent years, this issue has been raised particularly with respect to digital technologies in the classroom where it is argued that much of the current evidence on educational outcomes is provided by technology companies (UNESCO Citation2023). While the increased use of RCTs, as argued for here, would not eliminate potential bias, it is argued that the forms of bias that may threaten the validity of RCTs are even more likely to occur in non-randomised trials (Torgerson and Torgerson Citation2008). The example of the US shows how increasing capacity amongst education researchers is central to improving the quality of evidence, and how upskilling the scientific workforce has been an important part of the expansion of RCTs there in recent years (Hedges and Schauer Citation2018).

Acknowledgements

An earlier version of this paper was presented by the first author at the Randomised Controlled Trials in Education workshop held in Dublin City University in May 2023. The authors gratefully acknowledge helpful feedback received at the workshop and from anonymous reviewers of this paper.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Lorraine Gilleece

Lorraine Gilleece is a Research Fellow at the Educational Research Centre (ERC). She currently oversees work on educational disadvantage at the Centre, in particular work related to the Delivering Equality of Opportunity in Schools (DEIS) programme. She has recently led the ERC’s work on the development of an evaluation framework for teachers’ professional learning. Her research interests centre on using large-scale assessment data to explore equity in education.

Aidan Clerkin

Aidan Clerkin is a Research Fellow at the Educational Research Centre (ERC). He oversees aspects of ERC’s work on large-scale assessment, such as the international TIMSS study, and leads other strands of work including longitudinal study of outcomes associated with the Transition Year programme. He recently led the development of an online platform for delivery and reporting of standardised tests to schools. His research interests include social-emotional development, student engagement and wellbeing, and their relationships with academic achievement; intervention and programme evaluations; and longitudinal research methods.

Notes

1 The name of the Government department with responsibility for education has changed on a number of occasions; it is currently known as the Department of Education. It was most recently renamed in October 2020, having previously been the Department of Education and Skills. From 1997 to 2010, it was known as the Department of Education and Science.

2 Cochrane (Citation2024) is a global network that gathers and analyses the best available evidence to support informed decisions about health and healthcare. While its focus is on systematic reviews in health, its guidance has wider relevance and application and is informative when considering evaluation in education.

3 From https://whatworks.gov.ie/hub-search/report/45/Reading%20Recovery:

‘Level 3 indicates evidence of efficacy. This means the programme can be described as evidence-based: it has evidence from at least one rigorously conducted RCT or QED demonstrating a statistically significant positive impact on at least one child outcome. This programme does not receive a rating of 4 as it has not yet replicated its results in another rigorously conducted study, where at least one study indicates long-term impacts, and at least one uses measures independent of study participants’ (as of 24th November 2023).

4 Professional development for teachers in Reading Recovery has been provided by the Professional Development Service for Teachers (PDST) for several years; see https://www.pdst.ie/primary/literacy/reading-recovery. From 1st September 2023, the PDST has become part of the new integrated support service Oide.

5 Equivalent to Third Class and Fourth Class in Ireland.

References