Review Article

Qualitative evaluations of women’s leadership programs: a global, multi-sector systematic review

Article: 2213781 | Received 25 Jan 2023, Accepted 09 May 2023, Published online: 24 May 2023

ABSTRACT

Objective

The contribution of women’s leadership programs to gender change in organisations is controversial, and evidence of programs’ effectiveness is siloed across countries, sectors and industries. This systematic review aimed to provide a summary of current global efforts to evaluate women’s leadership programs.

Method

A systematic review protocol was registered with Open Science Framework prior to data extraction. Eight databases from multidisciplinary fields including (but not limited to) behavioural, social, physical, health and life sciences, management and business, and gender and women’s studies were searched for academic papers examining the outcomes of women’s leadership programs. Twenty-four studies were appraised for methodological quality using Joanna Briggs Institute guidelines and 16 studies (11 peer reviewed articles and five theses) were included in the review.

Results

Data were synthesised using an updated Kirkpatrick typology with seven categories used to classify evaluation outcomes. Subjective outcome levels were addressed more frequently than objective levels. Promotion to a leadership position was the sole objective outcome addressed, but methodological limitations of the included studies weaken any claimed link between programs and women’s career advancement.

Conclusions

Currently, the global evidence of women’s leadership programs’ impact on individuals and organisations is inconclusive. This systematic review emphasises the need for enhanced methodological and theoretical rigour to guide the development of future women’s leadership programs and their evaluation.

Key Points

What is already known about this topic:

  1. Reasons for the persistent underrepresentation of women in leadership are multiple and complex, and women leaders face both structural and individual barriers in their efforts to ascend to the top levels of organisations.

  2. Research has largely moved away from a micro-level focus on how best to assimilate women into existing, male-dominated workplaces, to a focus on dismantling structural barriers to women’s leadership such as meso-level organisational policies and practices and macro-level national and societal factors such as culture and legislation. In practice, however, micro-level strategies to address gender imbalance in leadership are frequently employed, and there is much debate regarding their capacity to contribute to gender change in organisations.

  3. Our knowledge about the impact of women’s leadership programs is limited and fragmented, as is our understanding of how this impact is assessed.

What this paper adds:

  1. This systematic review is the first to map the quality and nature of women’s leadership program evaluations globally, demonstrating the applicability of a systematic review methodology to leadership, management, and organisational psychology research.

  2. Our paper challenges the premise that individual level strategies can affect organisational and cultural change by examining the evidence of the effectiveness of micro-level approaches to women’s leadership development.

  3. The paper includes practical recommendations to advance women’s leadership program evaluation practice and research, emphasising the need for enhanced methodological rigour and realistic expectations regarding both the program and the evaluation.

Despite compelling ethical (Kalev et al., Citation2006; Skaggs, Citation2008) and economic (Alcázar et al., Citation2013; Desvaux et al., Citation2008; Noland et al., Citation2016; Woetzel, Citation2015) arguments for more women in leadership, and decades of equal employment opportunity legislation and affirmative action programs, women hold less than a quarter (24%) of senior roles globally (Grant Thornton, Citation2018). Reasons for the persistent underrepresentation of women in leadership are multiple and complex, and women leaders face both structural and internal barriers in their efforts to ascend to the top levels of organisations. Structural factors such as normative perceptions of men as leaders (Ibarra et al., Citation2013), gendered career paths and gendered work (Hoobler et al., Citation2011), and lack of access to networks and sponsors (Davey, Citation2008), and individual factors such as a relative lack of confidence, communication skills, self-efficacy and willingness to self-promote (Kay & Shipman, Citation2014; O’Neil & Hopkins, Citation2015; Pelfrey et al., Citation2022) are commonly cited barriers to women’s career progression.

Metz and Kumra (Citation2019) identify a clear evolution in the academic literature on women’s leadership from a focus on fixing the women (1967–1976), to fixing the organisation (1977–1996), to fixing the environment (1998-present). These periods describe approaches to gender equality in leadership that can also be categorised as micro-, meso-, and macro-level, respectively (van Esch et al., Citation2017). Research has largely moved away from a micro-level focus on how best to assimilate women into existing, male-dominated workplaces, to a focus on dismantling structural barriers to women’s leadership such as meso-level organisational policies and practices and macro-level national and societal factors such as culture and legislation (Metz & Kumra, Citation2019).

In practice, however, micro-level strategies to address gender imbalance in leadership are frequently employed (J. A. de Vries & van den Brink, Citation2016; European Commission, Citation2010; Leimon et al., Citation2010; Nentwich, Citation2006; Zahidi & Ibarra, Citation2010). Women’s leadership programs are an enduring example of a micro-level approach, encouraging personal agency, self‐awareness, self‐confidence, and the development of leadership skills and networks within an all-female environment (O’Neil & Bilimoria, Citation2005; Weyer, Citation2007). Rather than being a forgotten relic of the “fix the women” era, demand for women’s leadership programs is high (J. de Vries, Citation2010; Debebe et al., Citation2016; Kassotakis, Citation2017; Kolhatkar, Citation2016), and many major organisations have a women’s leadership program (Anderson et al., Citation2008; Sandler, Citation2014). Taking a global perspective, over the last two decades 36% of Australian universities had implemented staff development programs exclusively for women (Tessens, Citation2008). In the UK, this figure was 31% (Bagilhole, Citation2002), and in the US, over one third of the top-ranking business schools offer women’s-only leadership programs (Sugiyama et al., Citation2016).

Leadership programs cost organisations billions of dollars annually (Gurdjian et al., Citation2014; Kassotakis & Rizk, Citation2015) and it is reasonable to argue that investment of resources to develop women leaders should be focused on evidence-based results (Kassotakis, Citation2017). However, there is a dearth of leadership development frameworks specifically for women (Gipson et al., Citation2017) and management journals rarely publish leadership development studies with woman-only participants (Madsen & Scribner, Citation2017; Ngunjiri & Gardiner, Citation2017). As such, there is little empirical evidence to suggest if and when women’s leadership programs are most beneficial (Harris & Leberman, Citation2012; Kulik & Roberson, Citation2008b) and there is much debate regarding their capacity to contribute to gender change in organisations (J. A. de Vries & van den Brink, Citation2016; R. J. Ely & Meyerson, Citation2000; Nentwich, Citation2006; Zanoni et al., Citation2010).

Much of the scholarly literature that informs our current understanding of women’s leadership development is descriptive (J. de Vries, Citation2010), or focuses on practical recommendations for organisations to implement (R. J. Ely et al., Citation2011; Hopkins et al., Citation2008) rather than implementing and evaluating these recommendations within organisational settings (Gipson et al., Citation2017). There are few studies that both describe and evaluate women’s leadership programs, let alone compare their outcomes (Bierema, Citation2017; Debebe, Citation2009; Debebe et al., Citation2016; Hayward & Voller, Citation2010) and any evidence that does exist for the effectiveness of women’s leadership programs is siloed within individual programs, industries, sectors, and countries (Kulik & Roberson, Citation2008a; Ngunjiri & Gardiner, Citation2017). As such, our knowledge about the impact of women’s leadership programs is limited and fragmented, as is our understanding of how this impact is assessed. Responding to calls in the literature for research to inform the improved evaluations of women’s leadership programs (Debebe et al., Citation2016; Gipson et al., Citation2017; Kassotakis, Citation2017), we identify a need to universally appraise, synthesise and report on the current nature of women’s leadership programs and the quality of women’s leadership program evaluations.

For the purpose of the current work, evaluation is defined as a systematic determination of the merit, worth or value of a program and includes “the collection of descriptive and judgemental information that is necessary to make decisions about the utility of training efforts and identify areas for modification and improvement” (K. Ely et al., Citation2010, p. 588). Critically, evaluations need to be of sufficient quality to inform learning and accountability regarding a program’s effectiveness (Hirsh et al., Citation2011; Raifman et al., Citation2017; Stufflebeam, Citation2001).

Evaluation may serve a number of purposes and may be conducted through a number of methods (Easterby-Smith, Citation1994; Voller, Citation2010). Women’s leadership programs vary in their design, target audience, organisational and cultural contexts, and intended outcomes (Clarke, Citation2011; J. de Vries, Citation2010; Debebe et al., Citation2016; Vinnicombe & Singh, Citation2002), and their evaluation may consequently vary in response to the nature and emphasis of the program itself (Debebe et al., Citation2016; Grove et al., Citation2005). Evaluations may be conducted internally (i.e., the evaluators are affiliated with the organisation) or externally (Hirsh et al., Citation2011; Torres & Preskill, Citation2001), and for summative purposes (i.e., to assess the results of a women’s leadership program after completion) and/or formative purposes (i.e., to improve program development and implementation during the intervention; Easterby-Smith, Citation1994; Ford & Sinha, Citation2008; Voller, Citation2010). Data may be collected once, or at multiple times pre-, mid-, post-program, and/or longitudinally, and from one or multiple stakeholders (Grove et al., Citation2005; Hirsh et al., Citation2011).

Although there is no standard evaluation model for women’s leadership programs, Kirkpatrick’s (Citation1979, Citation1996) four-step taxonomy (measuring reactions, learning, behaviour and results) dominates the general training evaluation literature (K. Ely et al., Citation2010; Ford & Sinha, Citation2008; Hirsh et al., Citation2011), and provides a useful framework for mapping program outcomes. Reactions (level 1) refers to individuals’ subjective evaluations of their training experiences. Learning (level 2) refers to the retention of training material. Behaviour (level 3) refers to the influence of a program on an individual’s work-related behaviours. Lastly, Results (level 4) refers to the influence of training on organisational objectives. Examining the frequency with which these outcome levels are measured, as well as the purposes and methods of women’s leadership program evaluations, will aid our understanding of the function of evaluation in different contexts, which outcome levels are considered pertinent and possible to measure, the validity of reported outcomes, and the sustainability of outcomes over time.

Evaluations can be based on qualitative or quantitative data, or both (mixed methods). Qualitative data from participant observations, case studies, in depth interviews, narrative and document analysis, and focus groups can be particularly useful for evaluators to explore and explain the effectiveness or otherwise of interventions (Farmer et al., Citation2006; Foster-Fishman et al., Citation2005; Lockwood et al., Citation2017; Marecek, Citation2003). Questions relating to the utility of women’s leadership programs and their impact can be addressed through open-ended responses from participants, facilitators, managers and other relevant stakeholders (Farquhar et al., Citation2006; Lockwood et al., Citation2017). Not only can these responses provide evaluators with participant reactions to the program, but they can also provide information regarding knowledge gained and transfer of learning once participants return to work (Basl, Citation2000). To our knowledge, there are no existing qualitative reviews in this area.

This systematic review focuses on women’s leadership program evaluations based on qualitative data. We have included descriptive and demographic information relating to the included publications and women’s leadership programs to provide a complete and exhaustive summary of the current global status of academic efforts to evaluate women’s leadership programs within organisations. Though commonly used in the health sciences to inform best practice, systematic reviews in the field of Industrial and Organisational psychology are rare (Briner & Rousseau, Citation2011; Rojon et al., Citation2011) and those which exist are often poorly executed (Schalken & Rietbergen, Citation2017). For these reasons, the methodology for this qualitative systematic review is informed by guidelines from the Joanna Briggs Institute (Lockwood et al., Citation2017), to ensure that the review is rigorous and defensible.

Review questions

The main questions for this systematic review are:

  1. What is the methodological quality of women’s leadership program evaluations?

  2. What are the descriptive and demographic characteristics of women’s leadership program evaluations and of the programs themselves?

  3. How are women’s leadership programs evaluated? Secondary questions include: What data collection methods and time points are used? For what purposes are program evaluations conducted? What evaluation methodologies are used? What levels of outcomes are measured?

Method

Review registration

The systematic review protocol was registered with the Open Science Framework prior to the commencement of data extraction.

Inclusion and exclusion criteria

This systematic review considered peer reviewed and non-peer reviewed academic papers written in English from multidisciplinary fields including (but not limited to) behavioural, social, physical, health and life sciences, management and business, and gender and women’s studies. The focus was on qualitative data including (but not limited to) research designs and methods such as grounded theory, ethnography, phenomenology, action research, and feminist research. Included studies were those that evaluated women’s leadership programs conducted within organisational settings. There were no limits applied in terms of country, industry, or sector. Included participants were women who had completed a women’s leadership program, as well as program providers and other relevant stakeholders from organisations that had hosted a program. Excluded studies were non-academic publications, papers using quantitative or mixed methods, and papers that did not evaluate women’s leadership programs or identify women’s leadership program outcomes.

Search strategy

A specialist librarian assisted in developing the search strategy. Initially developed in PsycINFO, the search strategy was then adapted for all other databases to suit local syntax, thesaurus terms and subject headings. A fully mapped search strategy for PsycINFO appears in Appendix A. The following electronic databases were systematically searched from the date of database inception to December 2021: PsycINFO, Web of Science, Emerald Insight, Business Source Complete, Health Business Elite, Academic Search Complete, Sociological Abstracts, and Scopus. All relevant journal articles were manually reviewed for forward and backward citations to ensure no relevant papers were missed. That is, the reference lists of all relevant papers were checked for additional relevant papers (backward citations), and a Google Scholar search looked for additional relevant papers that had cited the original journal articles (forward citations). Academic grey literature was sought from Google Scholar and ProQuest. Each database was searched, and citations were imported into reference management software (EndNote). Duplicate articles were identified by EndNote and removed by hand. The titles and abstracts of all remaining articles were screened by the first author and irrelevant articles were excluded.

Assessment of methodological quality

Full text qualitative papers selected for retrieval were assessed by two independent reviewers for methodological rigour prior to inclusion in the review using the JBI Critical Appraisal Checklist for Qualitative Research (Lockwood et al., Citation2015), a standardised critical appraisal instrument from the Joanna Briggs Institute System for the Unified Management, Assessment and Review of Information (JBI SUMARI). All papers were independently reviewed by the first author and a second reviewer; the second and third authors independently reviewed half the papers each. Any disagreement that arose between authors was to be resolved by the third reviewer; however, the reviewers were able to resolve all disagreements within their original pairings.

Data extraction

Qualitative data extracted from papers included information regarding the publications, the women’s leadership programs, and the program evaluations. Descriptive and demographic features of the included studies included paper type (peer reviewed journal article or thesis), authors, country of origin, and the industries in which these studies were conducted. Program-related data included program aims, participant and cohort descriptions, recruitment procedures, program length, and whether the program was discrete or embedded in wider gender change efforts. Information pertaining specifically to women’s leadership program evaluations included whether the evaluation was conducted internally or externally, for summative or formative purposes, the methods, sources and time points of data collection, and the levels of outcomes measured.

Data synthesis

Many researchers have supplemented Kirkpatrick’s model with objective outcome categories designed to better inform the modification and improvement of programs and gauge program impact at the structural level. Previous systematic reviews examining the effectiveness of general leadership development programs (Frich et al., Citation2015; Voller, Citation2010) have adopted the typology of evaluation outcomes outlined in Table 1. In keeping with this typology, seven categories were used to classify evaluation outcomes: Reaction (1), Knowledge (subjective) (2A), Knowledge (objective) (2B), Behaviour/Expertise (subjective) (3A), Behaviour/Expertise (objective) (3B), System results/Performance (subjective) (4A) and System results/Performance (objective) (4B).

Table 1. Adapted Kirkpatrick typology of evaluation outcomes for leadership programs.
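The seven-category coding scheme can be represented as a simple data structure. The following is a minimal sketch only, not the authors' procedure: the level codes (1, 2A, 2B, 3A, 3B, 4A, 4B) come from the review, while the enum member names, the helper functions and the two example papers are hypothetical and were invented purely for illustration.

```python
from collections import Counter
from enum import Enum


class OutcomeLevel(Enum):
    """Adapted Kirkpatrick outcome levels; member names are illustrative."""
    REACTION = "1"
    KNOWLEDGE_SUBJECTIVE = "2A"
    KNOWLEDGE_OBJECTIVE = "2B"
    BEHAVIOUR_SUBJECTIVE = "3A"
    BEHAVIOUR_OBJECTIVE = "3B"
    RESULTS_SUBJECTIVE = "4A"
    RESULTS_OBJECTIVE = "4B"


def is_objective(level: OutcomeLevel) -> bool:
    # In this typology the "B" suffix marks the objective variant of a level.
    return level.value.endswith("B")


def tally_levels(coded: dict[str, set[OutcomeLevel]]) -> Counter:
    # Count how many papers address each outcome level.
    counts: Counter = Counter()
    for levels in coded.values():
        counts.update(levels)
    return counts


# Hypothetical coding of two made-up papers (not data from the review).
example = {
    "paper_a": {OutcomeLevel.REACTION, OutcomeLevel.KNOWLEDGE_SUBJECTIVE},
    "paper_b": {OutcomeLevel.REACTION, OutcomeLevel.RESULTS_OBJECTIVE},
}
counts = tally_levels(example)
```

Coding each paper as a set of levels means a level is counted once per paper, however many times the paper mentions it, which matches reporting of the form "addressed by n papers".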

Results

Study inclusion

A Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) flow diagram was created to demonstrate the number of studies remaining at each stage of the review process, and to document the reasons for study exclusion (see Figure 1). There were 42,994 studies identified from the search strategy and a total of 34,678 were eligible for inclusion after duplicates were removed. All papers were screened for inclusion based on title and abstract and 34,589 were excluded. A total of 89 publications were assessed for inclusion through full text review using the inclusion/exclusion criteria. Sixty-six studies were then excluded after full text review, leaving 24 studies that were appraised for inclusion. Of these, eight were excluded due to extremely poor methodological quality (i.e., few items on the JBI tool were endorsed or program outcomes were unclear). Appendix B contains information on excluded studies and the reasons for exclusion.
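Several of the stage counts reported above follow from one another by subtraction. As a minimal sketch, assuming only the figures stated in the text, the duplicate-removal, title and abstract screening, and quality-appraisal stages can be tallied as follows. The variable names are illustrative and are not taken from the review.

```python
# Counts as reported in the text of the review.
identified = 42_994               # records returned by the search strategy
after_duplicates = 34_678         # records remaining once duplicates were removed
excluded_on_screening = 34_589    # excluded on title and abstract
appraised = 24                    # studies appraised with the JBI checklist
excluded_on_quality = 8           # excluded for extremely poor methodological quality

# Derived counts for each stage.
duplicates_removed = identified - after_duplicates
full_text_reviewed = after_duplicates - excluded_on_screening
included = appraised - excluded_on_quality

print(duplicates_removed, full_text_reviewed, included)  # 8316 89 16
```

The derived values of 89 full-text publications and 16 included studies agree with the figures given in the text.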

Figure 1. PRISMA diagram.

Alt Text: A flow diagram depicts the flow of information through the different phases of this systematic review. It maps out the number of records identified, included and excluded, and the reasons for exclusions.

Methodological quality

Table 2 contains the quality appraisals of the 16 included studies. Eleven of these were peer reviewed articles and five were theses. The most common weaknesses in the study designs were failure to address the influence of the researcher on the research, and vice-versa (nine papers), and a lack of evidence of ethical approval (nine papers). Six papers lacked a statement locating the researcher culturally or theoretically, and five lacked congruity between the stated philosophical perspective and the research methodology. Participants and their voices were adequately represented, and conclusions drawn in the research report flowed from the analysis, or interpretation, of the data in 14 papers. There was congruence between the research methodology and the methods used to collect data, the research methodology and the interpretation of data, and the research methodology and the interpretation of results in 15 papers. The research methodology and research questions or objectives were congruent in all 16 papers.

Table 2. JBI-Qualitative critical appraisal checklist.

Characteristics of included studies

Table 3 contains descriptive and demographic features of the evaluation studies and details of the women’s leadership programs.

Table 3. Descriptive and demographic characteristics of the studies and details of the women’s leadership programs.

Descriptive and demographic features

A total of 16 papers, evaluating 12 programs, were reviewed. The same women’s leadership program is discussed in Debebe (Citation2009, Citation2011, Citation2017), J. de Vries (Citation2010), J. A. de Vries and van den Brink (Citation2016), Calizo (Citation2011) and Selzer et al. (Citation2017).

Of the 12 programs evaluated, five were delivered in the USA, two in Australia, and one program each in New Zealand and Ireland. A further three programs were delivered across multiple countries; one paper evaluated a collaborative program between Ethiopia and the USA, another between South Africa and Australia, and one program included global participants. Three papers evaluated women’s leadership programs in more than one industry. Six programs were set in the context of higher education, two each in academic medicine and STEM (Science, Technology, Engineering and Maths), one program each in policing and government, and one program was open to participants from multiple industries.

Women’s leadership program descriptions

Program aims varied considerably. Two programs sought to identify and explore issues that impacted on women’s personal and professional development (Brue & Brue, Citation2016; Monks & Barker, Citation1999) whereas four stated the explicit purpose of increasing the percentage of women leaders (Calizo, Citation2011 [WILD program]; Clarke, Citation2011; Debebe, Citation2009, Citation2011, Citation2017; Harris & Leberman, Citation2012). The development of personal and professional skills required for leadership featured in seven programs (Broyles, Citation2019; Brue & Brue, Citation2016; Gibbs, Citation2016; Harris & Leberman, Citation2012; Selzer et al., Citation2017; Vinas, Citation2018). The onus was on participants to act as champions for gender equity within their organisation in Kvach et al. (Citation2017), but the program examined in J. de Vries (Citation2010) and J. A. de Vries and van den Brink (Citation2016) had a concurrent focus on gendered organisational change. Other program aims included establishing a strong network of women (Calizo, Citation2011), encouraging the retention of women (Vinas, Citation2018), and an increase in the quality and quantity of research publications (Louw & Zuber-Skeritt, Citation2009).

Samples ranged from three to 76 participants. Participants were recruited from a single program cohort (i.e., one in which the participants all start and finish at the same time) in two papers (Louw & Zuber-Skeritt, Citation2009; Selzer et al., Citation2017). The remaining 14 papers recruited participants from multiple program cohorts. There were single papers recruiting participants from two (Clarke, Citation2011), three (Brue & Brue, Citation2016) and four (Vinas, Citation2018) program cohorts, and 11 papers recruited participants from unspecified multiples of program cohorts (Broyles, Citation2019; Calizo, Citation2011; J. de Vries, Citation2010; J. A. de Vries & van den Brink, Citation2016; Debebe, Citation2009, Citation2011, Citation2017; Gibbs, Citation2016; Harris & Leberman, Citation2012; Kvach et al., Citation2017; Monks & Barker, Citation1999). Participants varied between studies by way of their position, department, industry, sector, and nationality. Eight papers did not clearly refer to how women were selected for the leadership program (Broyles, Citation2019; Brue & Brue, Citation2016; Debebe, Citation2009, Citation2011, Citation2017; Kvach et al., Citation2017; Louw & Zuber-Skeritt, Citation2009; Monks & Barker, Citation1999). Where recruitment details were stated, women were invariably selected for their existing leadership performance or potential (Calizo, Citation2011; Clarke, Citation2011; Gibbs, Citation2016; Harris & Leberman, Citation2012; Selzer et al., Citation2017; Vinas, Citation2018). The single exception to this was J. de Vries (Citation2010) (and, by extension, J. A. de Vries & van den Brink, Citation2016), who placed emphasis on a diverse and broadly representative cohort group, eschewing a “merit” based and competitive selection process.

Program length ranged from two days (Monks & Barker, Citation1999) to two years (Louw & Zuber-Skeritt, Citation2009). Six of the 12 programs ran for two weeks or less (Broyles, Citation2019; Debebe, Citation2009, Citation2011, Citation2017; Harris & Leberman, Citation2012; Kvach et al., Citation2017; Monks & Barker, Citation1999; Vinas, Citation2018), five programs ran between six and 12 months (Brue & Brue, Citation2016; Clarke, Citation2011; J. de Vries, Citation2010; J. A. de Vries & van den Brink, Citation2016; Gibbs, Citation2016; Selzer et al., Citation2017), and one program length was indeterminable (Calizo, Citation2011).

Five programs were embedded as part of a broader organisational strategy to address gender inequity in leadership, as explicitly discussed in the papers (Calizo, Citation2011; Debebe, Citation2009, Citation2011, Citation2017; J. A. de Vries & van den Brink, Citation2016; J. de Vries, Citation2010; Kvach et al., Citation2017; Selzer et al., Citation2017; Vinas, Citation2018). Of the remaining seven programs, five papers did not state whether the program was discrete or embedded with other gender equity practices (Broyles, Citation2019; Brue & Brue, Citation2016; Clarke, Citation2011; Louw & Zuber-Skeritt, Citation2009; Monks & Barker, Citation1999), and two papers made only vague references to wider gender equality strategies (Gibbs, Citation2016; Harris & Leberman, Citation2012).

Program evaluation

Table 4 lists the modes (internal/external), purposes (summative/formative), methods and outcome levels of the included evaluations. Six of the 12 women’s leadership programs were evaluated internally. Three programs were designed, delivered, and evaluated by the authors (J. de Vries, Citation2010; J. A. de Vries & van den Brink, Citation2016; Kvach et al., Citation2017; Monks & Barker, Citation1999), and three were evaluated by past participants or staff of the organisation (Broyles, Citation2019; Louw & Zuber-Skeritt, Citation2009; Selzer et al., Citation2017). Evaluations were conducted externally in seven studies (Calizo, Citation2011; Clarke, Citation2011; Debebe, Citation2009, Citation2017; Gibbs, Citation2016; Harris & Leberman, Citation2012; Vinas, Citation2018). It was unclear whether evaluations were internal or external in two papers; one author was a former employee of the organisation in which the program was run (Brue & Brue, Citation2016) and another received research funding from the host organisation (Debebe, Citation2011).

Table 4. Evaluation purpose, methods and outcomes from 16 qualitative evaluations of women’s leadership programs.

All papers employed evaluation for summative purposes only. All data were collected after program completion (i.e., none were collected pre-program), with nine studies collecting data at a single time point (Broyles, Citation2019; Brue & Brue, Citation2016; Clarke, Citation2011; J. de Vries, Citation2010; J. A. de Vries & van den Brink, Citation2016; Debebe, Citation2009, Citation2011, Citation2017; Kvach et al., Citation2017; Louw & Zuber-Skeritt, Citation2009; Selzer et al., Citation2017; Vinas, Citation2018). Four studies collected data at two time points (Calizo, Citation2011; Gibbs, Citation2016; Harris & Leberman, Citation2012; Monks & Barker, Citation1999). Data collection time points varied considerably between studies, ranging from immediately post-program (Monks & Barker, Citation1999) to 5.5 years post-completion (Gibbs, Citation2016). All but two studies (Louw & Zuber-Skeritt, Citation2009; Selzer et al., Citation2017) included participants from multiple program cohorts, resulting in considerable ambiguity in terms of the time that had elapsed between the completion of the program and data collection.

A total of eight studies used data gleaned exclusively from participant interviews, including face to face in three studies (Kvach et al., Citation2017; Louw & Zuber-Skeritt, Citation2009; Vinas, Citation2018) and via telephone in five studies (Brue & Brue, Citation2016; Clarke, Citation2011; Debebe, Citation2009, Citation2017; Harris & Leberman, Citation2012). Data triangulation featured in the evaluation of six programs, with studies eliciting responses from both participants and secondary informants such as supervisors, program administrators, instructors, or mentors (Broyles, Citation2019; Calizo, Citation2011; J. de Vries, Citation2010; J. A. de Vries & van den Brink, Citation2016; Debebe, Citation2011), and through autoethnography (Selzer et al., Citation2017). There were single cases of studies utilising written responses from participants (Monks & Barker, Citation1999), and secondary data from interview transcripts (Gibbs, Citation2016).

No formal models of program evaluation were used. Therefore, qualitative data from all papers were coded as per the seven outcome levels listed in Table 1. Subjective outcome levels (1, 2A, 3A, 4A) were addressed with greater frequency than objective outcome levels (2B, 3B, 4B). Of the subjective outcome levels, all 16 papers addressed Reaction (1), Knowledge (subjective) (2A) and Behaviour/Expertise (subjective) (3A). System results/Performance (subjective) (4A) was addressed by five studies. The only objective outcome level addressed was System results/Performance (objective) (4B). Notably, nine of the 16 studies evaluated outcomes at this level, as indicated by references to promotions data. However, of these studies, three acknowledged that it was difficult (Clarke, Citation2011) or impossible (Calizo, Citation2011; Clarke, Citation2011) to establish a direct causal link between the leadership program and advancement to leadership positions.

Discussion

Despite the proliferation of women’s leadership programs globally, there have been limited attempts to investigate the impact of these programs. Fewer attempts have been made to examine how this impact is actually assessed, and this systematic review is the first to map the quality and nature of women’s leadership program evaluations globally. We identified 11 peer reviewed articles and five theses that described outcomes of women’s leadership programs. There was considerable heterogeneity in the methodological quality of the articles, in the women’s leadership programs, and in the quality and nature of the evaluations.

Almost thirty years ago, Gray (1994, p. 203) lamented that most of the literature on women’s leadership programs was “decontextualized, unreflective and pragmatic”. The current state of the literature still warrants the same criticism. A large body of qualitative literature excluded from this review either simply described a program (13), or unreflectively stated the benefits of women’s leadership programs generally (20). Furthermore, eight papers evaluating women’s leadership programs were excluded due to extremely poor methodological quality (as per Appendix B).

Of the studies included in the review, methodological shortfalls related to a lack of authors’ philosophical, cultural and theoretical statements, acknowledgement of the influence of the researcher on the research, and vice versa, and evidence of ethics approval. This is in line with previous studies examining program evaluation in women’s leadership (Tessens, 2008) and diversity management (Alhejji et al., 2016) which found only a quarter of papers explicitly stated a philosophical approach or theoretical perspective and the contribution of the paper to theory. An absence of women’s leadership theory in programs and evaluation studies is problematic. Echoing Gray’s (1994) criticisms, a-theoretical interventions have been accused of being cosmetic, fragmented and ineffective (Sandler, 2014), resulting in (at best) slow progress or (at worst) harm, in that they reinforce inequality (Janssens & Zanoni, 2014).

Women’s leadership program evaluations are conducted in multiple countries around the world; however, US and Australian programs dominate the literature. Nine of the 12 programs were either conducted or co-developed in the US or Australia. This has considerable implications for our understanding of intersectionality in women’s leadership development, and how global economic and cultural complexities affect how women become leaders (Ngunjiri & Gardiner, 2017). Just as masculine theories of leadership development may not be applicable or beneficial to all women, leadership interventions perpetuating US or Australian cultural norms may be ineffective or inappropriate for women from other countries.

In addition to US and Australian domination, programs conducted in higher education, academic medicine and STEM comprise the bulk of the literature. This skew towards the public sector and science industries is congruent with the general leadership program evaluation literature (Voller, 2010), though multiple explanations from the leadership and diversity management literature are proposed for cross-sector differences in evaluation practices. An emphasis on evidence-based decision-making, obligations to demonstrate public investment value, and the academic outputs required of researchers (Hayward & Voller, 2010; Voller, 2010) may account for this discrepancy. Conversely, in the private sector, a lack of incentive to publish results (Voller, 2010), commercial sensitivity (Hayward & Voller, 2010), different managerial priorities regarding diversity (Johansen & Zhu, 2017), or fear of unfavourable results (Alhejji et al., 2016), may explain a lack of published evaluations.

Programs varied considerably in terms of their aims, relationship to other gender strategies, length, participants, frequency of contact with cohort and instructors, and recruitment methods. Five programs were acknowledged as being embedded as part of a wider gender change strategy. The concurrent focus of J. de Vries (2010) and J. A. de Vries and van den Brink’s (2016) program on organisational level change was particularly noteworthy. While no programs were explicitly identified as being stand-alone, it is concerning that there was not more focus placed on the integration of women’s leadership programs with simultaneous meso- and macro-level strategies, as there is strong evidence that bundling interventions is the most effective way to tackle organisational problems generally (Becker & Gerhart, 1996; Kulik, 2014), and when addressing women’s leadership development specifically (Hopkins et al., 2008; Ngunjiri & Gardiner, 2017).

Limitations of the evaluation studies reflect those found in the general leadership development literature in terms of their summative focus, lack of longitudinal design, minimal focus on learning transfer, and an overreliance on self-reported data and subjective outcome levels (Voller, 2010). Leadership development occurs over time (Day et al., 2014), as does learning decay (Riggio, 2008), and training impact does not necessarily occur in a linear progression from input to results via knowledge and behaviour (Hirsh et al., 2011). As such, the evaluation of women’s leadership programs requires a long-term perspective (K. Ely et al., 2010). More than two-thirds of studies in this review evaluated a women’s leadership program at a single time point, leading to indeterminate conclusions about the sustainability of program impact.

In addition to being a continuous process, leadership development is also intensely personal (K. Ely et al., 2010), making participants logical sources of program feedback. However, self-assessments are notoriously subject to bias (Collins & Holton, 2004; Kruger & Dunning, 1999) and may be significantly different to outcomes reported by other stakeholders (Alhejji et al., 2016). Also, self-assessments are related only moderately to other measures of knowledge and performance (Ely & Sitzmann, 2007; Mabe & West, 1982) and are not intended to substitute for learning or behavioural assessments (K. Ely et al., 2010). As only six papers triangulated self-report data, it is possible that program outcomes may be overemphasised in some studies.

There are multiple complexities to consider when evaluating women’s leadership programs and multiple factors that may influence women’s advancement to leadership. The influence of factors such as “innate talent, individual motivation, luck, and competing home-life issues” (Kassotakis, 2017, p. 405) can fluctuate over time and be difficult to measure. These factors can also be difficult to disentangle from the complexities of evaluation, and to link causally to program outcomes. Sampling issues in the reviewed evaluations further complicated this task. The vast majority of studies examined program experiences and outcomes across multiple cohorts and multiple program evolutions, using purposeful sampling techniques or giving little detail as to how participants were selected for the studies. As a result, it is difficult to determine which program cohorts or which versions of the program related to which outcomes, or whether a sampling bias towards those who had benefitted from the program magnified reports of program impact.

Clarke (2011, p. 508) recognised that “outcomes may also have been influenced by where participants were in their careers when they undertook the program”. Although it may be intuitively appealing to refer to promotions data as evidence for program effectiveness, it can be problematic to assume a link between a leadership program and the promotion of women. It could be argued that a leadership program had little or no impact on the career trajectories of those women who were selected for the program based on their existing or potential leadership abilities. Indeed, this was acknowledged by three of the nine papers that referred to promotions data when examining program outcomes. Of greater concern, when considering how many studies attempted to measure program outcomes at the system level (4A = 5/16; 4B = 9/16), and how few programs were explicitly embedded as part of a wider gender change strategy (5/12), is the assumption of a link between the development of individual women and organisational level change. Fundamentally, many academics argue that increasing the “body count” of senior women is a necessary but insufficient condition of organisational transformation towards gender equity (Due Billing & Alvesson, 2009).

A final concern regarding evaluations was that only eight were clearly conducted by researchers who were not involved or invested in the program or organisation. When considering the failures of multiple authors to address the influence of the researcher on the research (and vice versa), the subsequent risk of author bias towards affirming their own program, the risk of participant response bias when reporting their experiences to internal evaluators, inflated participant self-assessments compared to others’ or objective assessments, and the conflation of promotions with program effectiveness, we return to Gray’s original criticism of the women’s leadership program literature. Apart from Calizo (2011), J. de Vries (2010), and Selzer et al. (2017), who highlighted the ineffectiveness of their evaluated programs, this systematic review concludes that qualitative evaluations of women’s leadership programs globally are largely affirming but unreflective.

These findings should be interpreted in light of two main limitations. First, studies were restricted to those from the academic literature. As discussed, women’s leadership evaluations may go unpublished or be inaccessible for various reasons, and we therefore cannot claim to have exhausted the grey literature on women’s leadership program evaluation practices. Second, this review was limited to studies published in English. Due in part to this limitation, US and Australian leadership norms dominate the studies in this review and the importance of intersectionality in women’s leadership development was not fully examined.

The following recommendations to advance women’s leadership program evaluation practice and research emphasise the need for enhanced methodological rigour and realistic expectations regarding both the program and the evaluation. A greater focus on formative evaluation will provide valuable information regarding program development and improvement in addition to summative conclusions about whether the program worked. Data collection from multiple sources at multiple outcome levels and time points will allow for longitudinal and triangulated evidence of program impact. Evidence of participants’ knowledge and behavioural change resulting from the program should be sought via multi-source data collection before and after the intervention. Evidence should also be sought that participants advance more rapidly as a result of the program than do comparable women who did not complete it. The development of partnerships between external evaluators and internal stakeholders will aid in avoiding bias and potential conflicts of interest and provide more objective assessments of program outcomes.

Future research needs to consider and test the wider theories of women’s leadership that guide scholarship and understanding (van Esch et al., 2017). First, women’s leadership programs need to be viewed through a wider lens of intersectionality (Ngunjiri & Gardiner, 2017) to address the barriers to women’s leadership on a global level. Second, there is a dearth of theories focussed specifically on women’s leadership development (van Esch et al., 2017). Critical tests of theory are therefore vital to our understanding of women’s leadership. Opening the black box of evaluations, a theory of change plots a “logic model” of individual program steps and explains “how the intervention is expected to bring about the desired results rather than just describing the results” (Mayne, 2012, Section 3, para. 1). In future, approaches to evaluation that utilise a theory of change can be used to draw conclusions about whether and how a women’s leadership program contributed to observed outcomes.

As per the diversity management literature (Hirsh et al., 2011), academic debate is polarised between positivist and post-modern paradigms with vastly different perspectives and assumptions guiding research into women’s leadership development (Storberg-Walker & Natt och Dag, 2017). Multiplism is a “mixed method design strategy that stresses how validity of results can be enhanced via convergence of results from multiple methods, theoretical orientations, and political or value perspectives” (Ford & Sinha, 2008, p. 16). This paper has examined qualitative approaches to women’s leadership program evaluation. Future research examining quantitative and mixed methods approaches is required to further extend the evidence base for women’s leadership programs.

Multiple barriers to evaluation exist, such as inadequate funding for rigorous evaluation, insufficient knowledge about administering an evaluation, and a lack of long-term evaluation priorities (Russon & Reinelt, 2004), reflecting a general perception that the “perfect” evaluation is unattainable (Hirsh et al., 2011). However, it is imperative that stakeholders place greater priority on the validity of their approaches to women’s leadership development and the evaluations of programs. Funding for rigorous evaluation should be factored into the initial development of a new program. This will place a high standard of evidence on providers to justify programs and demonstrate that an organisation’s investment pays off (Avolio et al., 2009; Riggio, 2008). It will also allow organisations to demonstrate that their efforts towards creating more gender equitable workplaces are legitimate. Currently, the global evidence of women’s leadership programs’ impact on individuals and organisations is far from conclusive.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data that support the findings of this study are available from the corresponding author, upon reasonable request.

Supplementary material

Supplemental data for this article can be accessed at https://doi.org/10.1080/00049530.2023.2213781

References

  • * Denotes study included in this review
  • Alcázar, F. M., Fernández, P. M. R., & Gardey, G. S. (2013). Workforce diversity in strategic human resource management models: A critical review of the literature and implications for future research. Cross Cultural Management: An International Journal, 20(1), 39–49. https://doi.org/10.1108/13527601311296247
  • Alhejji, H., Garavan, T., Carbery, R., O’Brien, F., & McGuire, D. (2016). Diversity training programme outcomes: A systematic review. Human Resource Development Quarterly, 27(1), 95–149. https://doi.org/10.1002/hrdq.21221
  • Anderson, D., Vinnicombe, S., & Singh, V. (2008). Women only leadership development: A conundrum. In K. James & J. Collins (Eds.), Leadership learning (pp. 147–160). Palgrave Macmillan.
  • Avolio, B. J., Reichard, R. J., Hannah, S. T., Walumbwa, F. O., & Chan, A. (2009). A meta-analytic review of leadership impact research: Experimental and quasi-experimental studies. The Leadership Quarterly, 20(5), 764–784. https://doi.org/10.1016/j.leaqua.2009.06.006
  • Bagilhole, B. (2002). Challenging equal opportunities: Changing and adapting male hegemony in academia. British Journal of Sociology of Education, 23(1), 19–33. http://www.jstor.org/stable/1393095
  • Basl, M. (2000). Comparative analysis of quantitative and qualitative methods in French non-experimental evaluation of regional and local policies: Three cases of training programmes for unemployed adults. Evaluation, 6(3), 323–334. https://doi.org/10.1177/13563890022209316
  • Becker, B., & Gerhart, B. (1996). The impact of human resource management on organizational performance: Progress and prospects. Academy of Management Journal, 39(4), 779–801. https://doi.org/10.5465/256712
  • Bierema, L. L. (2017). No woman left behind: Critical leadership development to build gender consciousness and transform organisations. In S. Madsen (Ed.), Handbook of research on gender and leadership (pp. 145–162). Edward Elgar Publishing.
  • Briner, R. B., & Rousseau, D. M. (2011). Evidence-based I–O psychology: Not there yet but now a little nearer? Industrial and Organisational Psychology, 4(1), 76–82. https://doi.org/10.1111/j.1754-9434.2010.01301.x
  • *Broyles, W. G. H. (2019). Profiles of a Non-Profit Statewide College Women’s Leadership Training Program’s Effectiveness [Doctoral dissertation]. Liberty University.
  • *Brue, K. L., & Brue, S. A. (2016). Experiences and outcomes of a women’s leadership development program: A phenomenological investigation. Journal of Leadership Education, 15(3), 75–97. https://doi.org/10.12806/V15/I3/R2
  • *Calizo, L. S. H. (2011). A case analysis of a model program for the leadership development of women faculty and staff seeking to advance their careers in higher education [Unpublished doctoral dissertation]. University of Maryland.
  • *Clarke, M. (2011). Advancing women’s careers through leadership development programs. Employee Relations, 33(5), 498–515. https://doi.org/10.1108/01425451111153871
  • Collins, D. B., & Holton, E. F. (2004). The effectiveness of managerial leadership development programs: A meta‐analysis of studies from 1982 to 2001. Human Resource Development Quarterly, 15(2), 217–248. https://doi.org/10.1002/hrdq.1099
  • Davey, K. M. (2008). Women’s accounts of organisational politics as a gendering process. Gender, Work & Organisation, 15(6), 650–671. https://doi.org/10.1111/j.1468-0432.2008.00420.x
  • Day, D. V., Fleenor, J. W., Atwater, L. E., Sturm, R. E., & McKee, R. A. (2014). Advances in leader and leadership development: A review of 25 years of research and theory. The Leadership Quarterly, 25(1), 63–82. https://doi.org/10.1016/j.leaqua.2013.11.004
  • *Debebe, G. (2009). Transformational learning in women’s leadership development training. Advancing Women in Leadership, 29(7), 2–12. https://doi.org/10.21423/awlj-v29.a264
  • *Debebe, G. (2011). Creating a safe environment for women’s leadership transformation. Journal of Management Education, 35(5), 679–712. https://doi.org/10.1177/1052562910397501
  • *Debebe, G. (2017). Navigating the double bind: Transformations to balance contextual responsiveness and authenticity in women’s leadership development. Cogent Business & Management, 4(1), 1–28. https://doi.org/10.1080/23311975.2017.1313543
  • Debebe, G., Anderson, D., Bilimoria, D., & Vinnicombe, S. M. (2016). Women’s leadership development programs [special issue of journal of management education]. Journal of Management Education, 40(3), 608–611. https://doi.org/10.1177/1052562914533088
  • Desvaux, G., Devillard-Hoellinger, S., & Meaney, M. C. (2008). A business case for women. The McKinsey Quarterly, 4, 26–33.
  • *de Vries, J. (2010). A realistic agenda?: Women only programs as strategic interventions for building gender equitable workplaces [Unpublished doctoral dissertation]. University of Western Australia.
  • *de Vries, J. A., & van den Brink, M. (2016). Transformative gender interventions: Linking theory and practice using the “bifocal approach”. Equality, Diversity & Inclusion: An International Journal, 35(7/8), 429–448. https://doi.org/10.1108/EDI-05-2016-0041
  • Due Billing, Y., & Alvesson, M. (2009). Understanding gender and organizations: An introduction to epistemology. Understanding Gender and Organizations, 1–272.
  • Easterby-Smith, M. (1994). Evaluating management development, training and education (2nd ed.). Gower Publishing Company.
  • Ely, K., Boyce, L. A., Nelson, J. K., Zaccaro, S. J., Hernez-Broome, G., & Whyman, W. (2010). Evaluating leadership coaching: A review and integrated framework. The Leadership Quarterly, 21(4), 585–599. https://doi.org/10.1016/j.leaqua.2010.06.003
  • Ely, R. J., Ibarra, H., & Kolb, D. M. (2011). Taking gender into account: Theory and design for women’s leadership development programs. Academy of Management Learning & Education, 10(3), 474–493. https://doi.org/10.5465/amle.2010.0046
  • Ely, R. J., & Meyerson, D. E. (2000). Advancing gender equity in organisations: The challenge and importance of maintaining a gender narrative. Organisation, 7(4), 589–608. https://doi.org/10.1177/135050840074005
  • Ely, K., & Sitzmann, T. (2007). Self-reported learning: What are we really measuring. In 22nd Annual Conference of the Society for Industrial-Organizational Psychology Conference, New York, NY.
  • European Commission. (2010). More women in senior positions: Key to economic stability and growth. Publications Office of the European Union. https://doi.org/10.2767/92882
  • Farmer, T., Robinson, K., Elliott, S. J., & Eyles, J. (2006). Developing and implementing a triangulation protocol for qualitative health research. Qualitative Health Research, 16(3), 377–394. https://doi.org/10.1177/1049732305285708
  • Farquhar, S. A., Parker, E. A., Schulz, A. J., & Israel, B. A. (2006). Application of qualitative methods in program planning for health promotion interventions. Health Promotion Practice, 7(2), 234–242. https://doi.org/10.1177/1524839905278915
  • Ford, J. K., & Sinha, R. (2008). Advances in training evaluation research. In S. Cartwright & C. Cooper (Eds.), The Oxford handbook of personnel psychology. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199234738.003.0013
  • Foster-Fishman, P., Nowell, B., Deacon, Z., Nievar, M. A., & McCann, P. (2005). Using methods that matter: The impact of reflection, dialogue, and voice. American Journal of Community Psychology, 36(3–4), 275–291. https://doi.org/10.1007/s10464-005-8626-y
  • Frich, J. C., Brewster, A. L., Cherlin, E. J., & Bradley, E. H. (2015). Leadership development programs for physicians: A systematic review. Journal of General Internal Medicine, 30(5), 656–674. https://doi.org/10.1007/s11606-014-3141-1
  • *Gibbs, P. J. (2016). Self-efficacy and the leadership development of women in academic medicine: A study of women alumnae of the Hedwig van Ameringen Executive Leadership in Academic Medicine (ELAM) program [Unpublished doctoral dissertation]. The George Washington University.
  • Gipson, A. N., Pfaff, D. L., Mendelsohn, D. B., Catenacci, L. T., & Burke, W. W. (2017). Women and leadership: Selection, development, leadership style, and performance. The Journal of Applied Behavioral Science, 53(1), 32–65. https://doi.org/10.1177/0021886316687247
  • Grant Thornton. (2018). Women in business: Beyond policy to progress. https://www.grantthornton.global/en/insights/women-in-business-2018/
  • Gray, B. (1994). The gender-based foundations of negotiation theory. Research on Negotiation in Organisations, 4, 3–36.
  • Grove, J., Hass, T., & Kibel, B. M. (2005). EvaluLEAD: A guide for shaping and evaluating leadership development programs. Sustainable Leadership Initiative. Public Health Institute. https://www.wkkf.org/resource-directory/resource/2005/05/evalulead-a-guide-for-shaping-and-evaluating-leadership-development-programs
  • Gurdjian, P., Halbeisen, T., & Lane, K. (2014). Why leadership-development programs fail. The McKinsey Quarterly, 1(1), 121–126.
  • Harris, C. A., & Leberman, S. I. (2012). Leadership development for women in New Zealand universities: Learning from the New Zealand women in leadership program. Advances in Developing Human Resources, 14(1), 28–44. https://doi.org/10.1177/1523422311428747
  • Hayward, I., & Voller, S. (2010). How effective is leadership development. The Ashridge Journal, 360, 8–13. http://www.ashridge.org.uk
  • Hirsh, W., Tamkin, P., Garrow, V., & Burgoyne, J. (2011). Evaluating management and leadership development: New ideas and practical approaches. Key research findings. Institute for Employment Studies.
  • Hoobler, J. M., Lemmon, G., & Wayne, S. J. (2011). Women’s underrepresentation in upper management: New insights on a persistent problem. Organisational Dynamics, 40(3), 151–156. https://doi.org/10.1016/j.orgdyn.2011.04.001
  • Hopkins, M. M., O’Neil, D. A., Passarelli, A., & Bilimoria, D. (2008). Women’s leadership development strategic practices for women and organisations. Consulting Psychology Journal: Practice and Research, 60(4), 348. https://doi.org/10.1037/a0014093
  • Ibarra, H., Ely, R., & Kolb, D. (2013). Women rising: The unseen barriers. Harvard Business Review, 91(9), 60–66. https://web.stanford.edu/dept/radiology/cgi-bin/raddiversity/wp-content/uploads/2017/12/WomenRising_TheUnseenBarriers.pdf
  • Janssens, M., & Zanoni, P. (2014). Alternative diversity management: Organizational practices fostering ethnic equality at work. Scandinavian Journal of Management, 30(3), 317–331. https://doi.org/10.1016/j.scaman.2013.12.006
  • Johansen, M., & Zhu, L. (2017). Who values diversity? Comparing the effect of manager gender across the public, private, and nonprofit sectors. The American Review of Public Administration, 47(7), 797–809. https://doi.org/10.1177/0275074016634201
  • Kalev, A., Dobbin, F., & Kelly, E. (2006). Best practices or best guesses? Assessing the efficacy of corporate affirmative action and diversity policies. American Sociological Review, 71(4), 589–617. https://doi.org/10.1177/000312240607100404
  • Kassotakis, M. E. (2017). Women-only leadership programs: A deeper look. In S. R. Madsen (Ed.), Handbook of research on gender and leadership (pp. 395–408). Edward Elgar Publishing. https://doi.org/10.4337/9781785363863.00036
  • Kassotakis, M. E., & Rizk, J. B. (2015). Advancing women’s executive development: Effective practices for the design and delivery of global women’s leadership programs. In F. W. Ngunjiri & S. R. Madsen (Eds.), Women and leadership: Research, theory, and practice. Women as global leaders (pp. 163–185). IAP Information Age Publishing.
  • Kay, K., & Shipman, C. (2014). The confidence gap. The Atlantic, 14(1), 1–18.
  • Kirkpatrick, D. (1996). Great ideas revisited. Training & Development, 50(1), 54–60.
  • Kirkpatrick, D. L. (1979). Techniques for evaluating training programs. Training & Development Journal, 33(6), 78–92.
  • Kolhatkar, S. (2016). The female solidarity have-it-all, feel-good machine. Bloomberg Businessweek, February(4462), 48–55.
  • Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality & Social Psychology, 77(6), 1121–1134. https://doi.org/10.1037/0022-3514.77.6.1121
  • Kulik, C. T. (2014). Working below and above the line: The research-practice gap in diversity management. Human Resource Management Journal, 24(2), 129–144. https://doi.org/10.1111/1748-8583.12038
  • Kulik, C. T., & Roberson, L. (2008a). Common goals and golden opportunities: Evaluations of diversity education in academic and organisational settings. Academy of Management Learning & Education, 7(3), 309–331. https://doi.org/10.5465/amle.2008.34251670
  • Kulik, C. T., & Roberson, L. (2008b). Diversity initiative effectiveness: What organisations can (and cannot) expect from diversity recruitment, diversity training, and formal mentoring programs. In A. P. Brief (Ed.), Cambridge companions to management. Diversity at work (pp. 265–317). Cambridge University Press. https://doi.org/10.1017/CBO9780511753725.010
  • *Kvach, E., Yesehak, B., Abebaw, H., Conniff, J., Busse, H., & Haq, C. (2017). Perspectives of female medical faculty in Ethiopia on a leadership fellowship program. International Journal of Medical Education, 8, 314–323. https://doi.org/10.5116/ijme.5985.f644
  • Leimon, A., Moscovici, F., & Goodier, H. (2010). Coaching women to lead. Routledge. https://doi.org/10.4324/9780203841013
  • Lockwood, C., Munn, Z., & Porritt, K. (2015). Qualitative research synthesis: Methodological guidance for systematic reviewers utilizing meta-aggregation. International Journal of Evidence Based Healthcare, 13(3), 179–187. https://doi.org/10.1097/XEB.0000000000000062
  • Lockwood, C., Porrit, K., Munn, Z., Rittenmeyer, L., Salmond, S., Bjerrum, M., Loveday, H., Carrier, J., & Stannard, D. (2017). Chapter 2: Systematic reviews of qualitative evidence. In E. Aromataris & Z. Munn (Eds.), Joanna Briggs Institute reviewer’s manual. The Joanna Briggs Institute. https://reviewersmanual.joannabriggs.org/
  • *Louw, I., & Zuber-Skeritt, O. (2009). Reflecting on a leadership development programme: A case study in South African higher education. Perspectives in Education, 27(3), 237–246.
  • Mabe, P. A., & West, S. G. (1982). Validity of self-evaluation of ability: A review and meta-analysis. Journal of Applied Psychology, 67(3), 280–296. https://doi.org/10.1037/0021-9010.67.3.280
  • Madsen, S. R., & Scribner, R. T. (2017). A perspective on gender in management: The need for strategic cross-cultural scholarship on women in management and leadership. Cross Cultural & Strategic Management, 24(2), 231–250. https://doi.org/10.1108/CCSM-05-2016-0101
  • Marecek, J. (2003). Dancing through minefields: Toward a qualitative stance in psychology. In P. Camic, J. Rhodes, & L. Yardley (Eds.), Qualitative research in psychology: Expanding perspectives in methodology and design (pp. 49–69). American Psychological Association. https://doi.org/10.1037/10595-004
  • Mayne, J. (2012). Theory-based approaches to evaluation: Concepts and practices. Treasury Board of Canada Secretariat. https://www.canada.ca/en/treasury-board-secretariat/services/audit-evaluation/centre-excellence-evaluation/theory-based-approaches-evaluation-concepts-practices.html
  • Metz, I., & Kumra, S. (2019). Why are self-help books with career advice for women popular? Academy of Management Perspectives, 33(1), 82–93. https://doi.org/10.5465/amp.2016.0152
  • Monks, K., & Barker, P. (1999). Management development for women academics and administrators. Journal of Management Development, 18(6), 531–542. https://doi.org/10.1108/02621719910279635
  • Nentwich, J. C. (2006). Changing gender: The discursive construction of equal opportunities. Gender, Work & Organisation, 13(6), 499–521. https://doi.org/10.1111/j.1468-0432.2006.00320.x
  • Ngunjiri, F. W., & Gardiner, R. A. (2017). Future strategies for developing women as leaders. In S. R. Madsen (Ed.), Handbook of research on gender and leadership (pp. 423–437). Edward Elgar Publishing. https://doi.org/10.4337/9781785363863.00038
  • Noland, M., Moran, T., & Kotschwar, B. R. (2016). Is gender diversity profitable? Evidence from a global survey (Working Paper, (16-3)). Peterson Institute for International Economics. https://doi.org/10.2139/ssrn.2729348
  • O’Neil, D. A., & Bilimoria, D. (2005). Women’s career development phases: Idealism, endurance, and reinvention. Career Development International, 10(3), 168–189. https://doi.org/10.1108/13620430510598300
  • O’Neil, D. A., & Hopkins, M. M. (2015). The impact of gendered organisational systems on women’s career advancement. Frontiers in Psychology, 6, 905. https://doi.org/10.3389/fpsyg.2015.00905
  • Pelfrey, C. M., Cola, P. A., Gerlick, J. A., Edgar, B. K., & Khatri, S. B. (2022). Breaking through barriers: Factors that influence behavior change toward leadership for women in academic medicine. Frontiers in Psychology, 13, 854488. https://doi.org/10.3389/fpsyg.2022.854488
  • Raifman, J. G., Lam, F., Keller, J. M., Radunsky, A., & Savedoff, W. D. (2017). Evaluating evaluations: Assessing the quality of aid agency evaluations in global health (Working Paper No. 461). Center for Global Development. https://doi.org/10.2139/ssrn.3025057
  • Riggio, R. E. (2008). Leadership development: The current state and future expectations. Consulting Psychology Journal: Practice and Research, 60(4), 383–392. https://doi.org/10.1037/1065-9293.60.4.383
  • Rojon, C., McDowall, A., & Saunders, M. N. K. (2011). On the experience of conducting a systematic review in industrial, work, and organizational psychology: Yes, it is worthwhile. Journal of Personnel Psychology, 10(3), 133–138. https://doi.org/10.1027/1866-5888/a000041
  • Sandler, C. (2014). Developing female leaders: Helping women reach the top. Industrial and Commercial Training, 46(2), 61–67. https://doi.org/10.1108/ICT-11-2013-0077
  • Schalken, N., & Rietbergen, C. (2017). The reporting quality of systematic reviews and meta-analyses in industrial and organisational psychology: A systematic review. Frontiers in Psychology, 8, 1–12. https://doi.org/10.3389/fpsyg.2017.01395
  • Selzer, R., Howton, A., & Wallace, F. (2017). Rethinking women’s leadership development: Voices from the trenches. Administrative Sciences, 7(2), 18. https://doi.org/10.3390/admsci7020018
  • Skaggs, S. (2008). Producing change or bagging opportunity? The effects of discrimination litigation on women in supermarket management. The American Journal of Sociology, 113(4), 1148–1182. https://doi.org/10.1086/522808
  • Storberg-Walker, J., & Natt Och Dag, K. (2017). Creativity in theorizing for women and leadership: A multi-paradigm perspective. In S. R. Madsen (Ed.), Handbook of research on gender and leadership (pp. 65–84). Edward Elgar Publishing. https://doi.org/10.4337/9781785363863.00012
  • Stufflebeam, D. L. (2001). Evaluation checklists: Practical tools for guiding and judging evaluations. American Journal of Evaluation, 22(1), 71–79. https://doi.org/10.1177/109821400102200107
  • Sugiyama, K., Cavanagh, K. V., van Esch, C., Bilimoria, D., & Brown, C. (2016). Inclusive leadership development: Drawing from pedagogies of women’s and general leadership development programs. Journal of Management Education, 40(3), 253–292. https://doi.org/10.1177/1052562916632553
  • Tessens, L. (2008). A review of current practices in women-only staff development programmes at Australian universities. In M. Barrow & K. Sutherland (Eds.), HERDSA: Engaging communities (Vol. 1, pp. 329–340). HERDSA.
  • Torres, R. T., & Preskill, H. (2001). Evaluation and organisational learning: Past, present, and future. American Journal of Evaluation, 22(3), 387–395. https://doi.org/10.1177/109821400102200316
  • van Esch, C., Assylkhan, K., & Bilimoria, D. (2017). Using organisational and management science theories to understand women and leadership. In S. R. Madsen (Ed.), Handbook of research on gender and leadership (pp. 127–144). Edward Elgar Publishing. https://doi.org/10.4337/9781785363863.00016
  • *Vinas, K. L. (2018). Narratives of women’s leadership identity development: An assessment of senior-level information technology (IT) leaders following participation in a women-only training program [Unpublished doctoral dissertation]. Boston University.
  • Vinnicombe, S., & Singh, V. (2002). Women-only management training: An essential part of women’s leadership development. Journal of Change Management, 3(4), 294–306. https://doi.org/10.1080/714023846
  • Voller, S. (2010). The role of programme evaluation in organisational decision-making about management and leadership development [Unpublished doctoral dissertation]. Cranfield University.
  • Weyer, B. (2007). Twenty years later: Explaining the persistence of the glass ceiling for women leaders. Women in Management Review, 22(6), 482–496. https://doi.org/10.1108/09649420710778718
  • Woetzel, J. (2015). The power of parity: How advancing women’s equality can add $12 trillion to global growth. McKinsey Global Institute. http://www.mckinsey.com/mgi
  • Zahidi, S., & Ibarra, H. (2010). The corporate gender gap report 2010. World Economic Forum. http://www.weforum.org
  • Zanoni, P., Janssens, M., Benschop, Y., & Nkomo, S. (2010). Guest editorial: Unpacking diversity, grasping inequality: Rethinking difference through critical perspectives. Organization, 17(1), 9–29. https://doi.org/10.1177/1350508409350344