Abstract

AI has been used for years to improve collection and analysis in signals intelligence, but this article explores the range of tasks that generative AI can perform for strategic intelligence analysts. It argues that the most prudent integration of generative AI into intelligence assessment is as a ‘co-pilot’ for human analysts. Notwithstanding issues of inaccuracy, imported bias and ‘hallucination’, generative AI can liberate time-poor analysts to focus on tasks where humans add most value – applying their expertise, tacit knowledge and ‘sense of reality’.

There is no escaping artificial intelligence (AI). Each of us interacts with it, directly or indirectly, every day. It permeated society with little fanfare, save the odd inflection point such as Ke Jie’s loss of a game of Go to Google’s AlphaGo. But the debate surrounding AI has now become highly salient, largely because of the release and widespread adoption of user-friendly generative AI software, most famously ChatGPT and Google Bard. The potential of these powerful programs is such that many commentators consider their impact analogous to another industrial revolution. Indeed, the application of AI to various fields, particularly medicine, may be revolutionary; equally, it poses significant potential risks – security, economic, social and cultural. Prime Minister Rishi Sunak wants the UK to master both sides of this equation: to lead the world in AI regulation and safety, laying a marker with November’s AI Safety Summit at Bletchley Park, whilst also seizing the opportunities the technology offers. It is a conundrum that former occupants of Bletchley Park – the codebreakers, linguists, mathematicians and engineers who grappled with the Enigma machine and pioneered computing some eight decades ago – would have appreciated and wrestled with energetically. This article is concerned with the opportunities and challenges generative AI poses for their heirs in the intelligence community, particularly those focused on the craft of intelligence assessment. It argues that generative AI has the potential to supplement analysis significantly, but that its most useful application for now is as an aid – a co-pilot that can augment analysts’ work considerably but should be used with care.

Intelligence and technology are old companions. They have driven each other’s development over the decades, perhaps nowhere more visibly than in electronics and computing. Working in secret, agencies have pushed the boundaries of technology. They have also frequently been early adopters of new technologies, using them to develop, maintain and enhance capabilities; adaptability is one of the hallmarks of a successful intelligence agency, after all. GCHQ transformed successfully from an analogue to a digital organisation, even styling itself today as an ‘intelligence, security and cyber agency’.Footnote1 AI already complements intelligence work in a variety of ways. States routinely use AI-augmented systems to aid collection, and many of the private sector contractors at work in the secret world make much of their AI credentials.Footnote2 CCTV camera networks supported by AI software are widely used to identify and track individuals or objects in urban environments, or in areas where the risk of terrorism is higher, such as railway stations.Footnote3 The technology also offers authoritarian governments unparalleled opportunities to repress difference or dissent, as illustrated in Xinjiang and elsewhere.Footnote4 Beyond data collection, much of this activity involves discriminating or selecting between data more easily and efficiently, thereby facilitating the work of time-poor analysts who need to assess what it all means. AI is widely used for laborious tasks such as translation, voice recognition, reducing the volume of intercepted internet traffic to a manageable level, or scraping the open internet for subjects’ associations and contacts.Footnote5 In the UK, the INDEX system allows analysts to search across government and external reports, with core information extracted and summarised by a natural language processing system.Footnote6 But, as Sir Simon Gass, the recently retired Chair of the UK Joint Intelligence Committee, noted in June, ‘we’re in the foothills of this’.Footnote7

Extending Gass’s metaphor, the higher peaks probably entail integrating generative AI and large language models (LLMs) into the normal business of intelligence assessment. Generative AI, simply put, refers to ‘deep-learning models that can generate high-quality text, images, and other content based on the data they were trained on’.Footnote8 These technologies are already being taken very seriously in defence and intelligence. John Ridge, the UK Ministry of Defence’s Director of Defence Innovation, noted recently that ‘the one thing we do know for certain is that these sorts of capabilities are going to be absolutely crucial’.Footnote9 Whether they prove revolutionary, or just another evolutionary stage in the intelligence business, remains to be seen, but their potential to alter elements of the business model is apparent. Whereas previous generations of AI have focused largely on collecting data more efficiently and on curating more effectively the material placed before civilian and military intelligence analysts, generative AI has demonstrated the potential to undertake tasks that have hitherto been possible only for a human analyst. The key selling point of LLM-based tools like ChatGPT is that they can respond to prompts, in the form of questions or commands, and use the material available to them to produce a response within specific parameters. Put another way, they can be instructed to write human-like reports to particular specifications, to offer insights or draw inferences, at the speed of a computer and based on an immense amount of data.

Figure 1: Definition of Generative AI According to Google Bard

Source: Generated by Google Bard, 3 August 2023, <https://bard.google.com/chat>.
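What ‘responding to a prompt within specific parameters’ looks like in practice can be illustrated with a short sketch. The example below is purely notional: the `complete` function is a placeholder for whichever LLM interface an organisation actually deploys (a commercial API or an on-premises model), and the prompt wording is illustrative rather than a recommended template.

```python
# Notional sketch: asking an LLM to draft a short, specification-bound summary.
# `complete` is a placeholder for whichever LLM interface is actually deployed
# (commercial API, on-premises model, etc.); it is not a real library call.

def complete(prompt: str) -> str:
    """Placeholder: send `prompt` to the chosen LLM and return its text response."""
    raise NotImplementedError("Connect this to the model or API in use.")

def draft_summary(topic: str, source_texts: list[str], word_limit: int = 300) -> str:
    """Request a summary written to a specification, grounded only in supplied sources."""
    sources = "\n\n".join(f"[Source {i + 1}]\n{text}" for i, text in enumerate(source_texts))
    prompt = (
        f"You are assisting an intelligence analyst. Using ONLY the sources below, "
        f"write a summary of no more than {word_limit} words on: {topic}. "
        f"Note any gaps or contradictions in the sources rather than filling them in.\n\n"
        f"{sources}"
    )
    return complete(prompt)
```

The point of the sketch is the shape of the interaction – a natural language command, a specification (word limit, permitted sources) and a machine-generated draft – rather than any particular wording.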


Intelligence analysis and assessment is, in this sense, in a similar position to other research-based fields of work that may (and almost certainly will) be disrupted. These include the medical and legal professions, where the prospect of a rapid and clear report or filing based on the entire corpus of digitised literature on a particular topic is alluring. The education sector is also affected: its traditional models are being upended by the challenge of detecting machine-generated work, and by the more philosophical question of what exactly constitutes legitimate research in the age of AI. Nonetheless, in each of these fields a fundamental task that was once human can, in theory, be outsourced to a significant extent to a machine, albeit with prudent vigilance. Doing so has produced impressive and sometimes thought-provoking results, such as an academic paper on the implications of ChatGPT for detecting plagiarism that was submitted to, and accepted by, a peer-reviewed academic journal, but that was ‘written’ using ChatGPT.Footnote10 However, if the anecdotal evidence from the widespread adoption of LLMs in various professions is indicative, the days of human analysts are far from numbered. In the immediate future, LLMs should be considered additional tools for intelligence analysts – aids to efficiency and effectiveness. They are ‘co-pilots’, there to evaluate arguments, perform data analysis or proofread, rather than potential replacements. For the time being, the stakes are too high to proceed otherwise in any of these fields. Intelligence is no different: the imperative to integrate these tools will only intensify in a globally competitive environment, but there are clear risks in moving too quickly or recklessly. The prudent approach is for intelligence assessment organisations to use AI to empower human analysts by creating more time and space for them to employ their indispensable tacit knowledge and ‘sense of reality’ – by which Isaiah Berlin meant the empathetic understanding that is a key feature of historical interpretation – to make sense of the whole picture.Footnote11

Reassuringly, perhaps, Google Bard agrees (see Figure 2). When asked what benefits it could bring to intelligence analysis, the program replied that it could perform many useful tasks, including collecting information, analysing information, generating reports, communicating findings, generating intelligence requirements, managing intelligence resources and overseeing intelligence operations to ensure they comply with legal and ethical standards. But, when asked to identify the risks of using LLMs for strategic intelligence analysis, it noted: ‘it is important to integrate machine outputs with human analysis and interpretation, as well as a comprehensive understanding of the geopolitical environment’. Clearly, if the system is to be taken ‘at its word’, it has significant potential. But before this potential can be fully exploited, all involved will need to consider and resolve several fundamental challenges.

Figure 2: Bard’s View of Its Ability to Help with Strategic Intelligence Analysis

Source: Generated by Google Bard, 3 August 2023, <https://bard.google.com/share/5a6e3cfdc8b7?hl=en>.


These include the usual worries about the security and robustness of IT networks: ensuring that integrated software is vetted for secure architecture and supply-chain risk; that data stores are protected; and that queries submitted to any system are encrypted or otherwise impossible for a hostile party to reconstruct. Other notable security issues arise from the large volume of training data, the billions of parameters and the training processes needed to devise a workable tool. Currently, this takes place in a cloud-based system, which adds to the customary cybersecurity concerns the additional issue of data sovereignty.Footnote12 In addition, to maximise its value and utility, particularly in rapidly evolving situations, an LLM will need frequent or constant access to the internet. It would clearly be necessary to separate those systems that maintain a link to the open internet from the closed, classified networks on which intelligence analysts handle more sensitive material and produce intelligence assessment products.

None of the above is an insuperable problem, but the list of challenges highlights the need to address the issue methodically, coordinating across the relevant institutional stakeholders in government, to achieve successful implementation of what will be a critical IT project. Nor are the challenges all centred on securing the system from hostile actors: there are also regulatory issues to consider. Indeed, Lord David Anderson noted in a House of Lords debate on AI that ‘in a world where everybody is using open-source datasets to train large language models, UKIC is uniquely constrained by Part 7 of the Investigatory Powers Act’,Footnote13 and that these constraints ‘impinge in certain important contexts on UKIC’s agility, on its co-operation with commercial partners, on its ability to recruit and retain data scientists, and, ultimately, on its effectiveness’.Footnote14

Provided satisfactory solutions can be identified, LLMs could prove extremely useful to analysts in many aspects of their work. These include conventional yet laborious tasks – acting as a research assistant that provides near-instantaneous summaries, of varying length and detail, on particular topics such as the background to an international dispute; constructing timelines; penning profiles; summarising or analysing lengthy texts; or (assuming copyright and subscription issues are addressed) integrating the latest academic work into the mix. While the first LLMs were trained on English-language corpora, efforts to develop multilingual models are progressing well.Footnote15 Of course, given the identified issues with the accuracy and completeness of responses generated by generative AI, it would be prudent for any such product to be checked by a subject matter expert, in an analogue of the cross-Whitehall Current Intelligence Group system. This would probably improve both robustness and efficiency and, over time, lead to institutional learning and reformed processes.
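To illustrate how a task such as constructing a timeline might be delegated, the notional sketch below asks the model for structured output and refuses to pass anything malformed downstream. As before, `complete` is a placeholder for whichever model interface is actually used, and the prompt is illustrative only.

```python
import json

def complete(prompt: str) -> str:
    """Placeholder for the deployed LLM interface (as in the earlier sketch)."""
    raise NotImplementedError

def build_timeline(reports: list[str]) -> list[dict]:
    """Ask the model to extract dated events from a set of reports as a JSON list."""
    prompt = (
        "Extract every dated event from the reports below. Respond ONLY with a JSON list "
        'of objects of the form {"date": "YYYY-MM-DD", "event": "...", "report": n}.\n\n'
        + "\n\n".join(f"[Report {i + 1}]\n{r}" for i, r in enumerate(reports))
    )
    raw = complete(prompt)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Model output is not guaranteed to be well-formed JSON; fail loudly rather
        # than silently passing malformed material to an analyst.
        raise ValueError("Model returned non-JSON output; manual review required.")
```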

But there is clearly more potential than this: generative AI could also support more advanced and significant work. Analysts could, for instance, use LLMs to review and validate their written reports, augmenting existing routines for auditing the analytical process and its products. This could be done by asking for any data that challenges or falsifies a key judgement; by querying a body of reports generated over time to identify assumptions that have hardened into conventional wisdom; or by using a tool to generate a ‘red-team’ assessment. In theory, such a capacity could help analysts identify or root out some of the biases that have contributed to past intelligence failures, and ensure that reports are as up to date as possible. It is easy to conceive of ways in which the availability and appropriate use of such tools would improve the analytical community’s speed, reach and ability to reflect critically on its conduct and performance.
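A ‘red-team’ request of this kind is, at bottom, just another prompt. The notional sketch below shows one way it might be framed; the structure of the questions, not the exact wording, is the point, and `complete` again stands in for the deployed model interface.

```python
def complete(prompt: str) -> str:
    """Placeholder for the deployed LLM interface (as in the earlier sketches)."""
    raise NotImplementedError

def red_team(draft_assessment: str, key_judgement: str) -> str:
    """Ask the model to argue against a key judgement and surface buried assumptions."""
    prompt = (
        "Act as a red team reviewing the draft assessment below.\n"
        f"Key judgement under challenge: {key_judgement}\n\n"
        "1. Set out the strongest arguments and evidence against this judgement.\n"
        "2. List assumptions the draft treats as settled fact.\n"
        "3. State what evidence, if it existed, would falsify the judgement.\n\n"
        f"Draft assessment:\n{draft_assessment}"
    )
    return complete(prompt)
```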

The current generation of LLMs can also write reports or assessments, and outsourcing the early phases of drafting to such a tool could create economies of effort for resource- and time-poor intelligence analysts. There is doubtless a case for prudent adoption of LLMs. But the technology remains limited and in need of careful monitoring. These limits pose risks, as has been demonstrated and well documented following the general public’s widespread experimentation with LLMs in 2023 (until Meta rolled out Threads, no application had been adopted as quickly as ChatGPT, which reached one million users within five days of its launch).Footnote16 For intelligence analysts and the recipients of their products, many of these challenges are very problematic. They include concerns about the accuracy and reliability of the information these tools provide. The systems are very good at generating plausible text, declarations and conclusions, but these may have no basis in reality, or even in the training data upon which the LLM is built. Such ‘hallucinations’ have been widely observed; in academic work, the frequent ‘tell’ is the generation of non-existent sources (for example, citing plausible-sounding webpages that do not, in fact, exist) in footnotes to support a generated claim.Footnote17 Whether this is a feature or a bug of LLMs is debated. Either way, it poses a significant challenge to the adoption of LLMs for aspects of intelligence assessment. Systematic checking of the underlying source data would need to be mandatory for analysts deriving material from these tools for incorporation into analytical products. The technology thus presents a paradox: saving time at one end of the process while generating work at the other.
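Parts of that checking can themselves be partially automated. The sketch below – a crude illustration, not a verification method – simply tests whether cited URLs exist at all, which catches the most obvious class of fabricated references but says nothing about whether a real page actually supports the claim attached to it.

```python
import requests  # third-party HTTP library

def urls_resolve(cited_urls: list[str], timeout: float = 5.0) -> dict[str, bool]:
    """First-pass check on machine-generated citations: does each URL exist at all?

    A resolving URL does not prove the page supports the claim, and a failed
    request may be transient, so this only flags candidates for the human
    source-checking that remains mandatory.
    """
    results: dict[str, bool] = {}
    for url in cited_urls:
        try:
            response = requests.head(url, timeout=timeout, allow_redirects=True)
            results[url] = response.status_code < 400
        except requests.RequestException:
            results[url] = False
    return results
```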

As with other AI systems, LLMs can also embed biases in whatever they generate. The appeal and potential of these systems lie in their capacity to ingest and query a vast body of material – essentially the open internet in its entirety – but the corollary is that they also ingest the biases and nonsense that are available, which may constitute the dominant narrative on a particular topic, or on a particular topic in a particular language. Equally, there is no doubt that disruptive or malign actors will use LLMs to generate and flood the web with vast amounts of disinformation quickly and cheaply, nor that hostile actors will attempt to poison public or proprietary LLMs. Currently, most open generative AI applications are essentially black boxes: systems that do not (or will not) allow their users to examine the process by which they have arrived at a particular judgement. This is due to the very nature of neural networks, which rely on multiple layers of nodes to process data. This lack of observability, coupled with a certain brittleness in LLM-based systems when it comes to replicability – that is, their dependence on the exact wording of the prompt – poses risks and challenges.Footnote18 Indeed, given the importance of an auditable process for analytical assessments in a professional intelligence community, this issue poses a significant barrier to be overcome – or challenge to be mastered – before these tools are integrated into normal business. As in the era before AI, conclusions will need to be checked and verified, and the whole process audited, by experienced and well-trained personnel.
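The replicability problem can at least be measured, even if it cannot yet be explained. A simple discipline – again sketched notionally below, with `complete` standing in for the deployed model – is to pose the same question in several wordings and compare the answers; exact string matching is a crude proxy for substantive agreement, but wide divergence across paraphrases is a warning sign.

```python
def complete(prompt: str) -> str:
    """Placeholder for the deployed LLM interface (as in the earlier sketches)."""
    raise NotImplementedError

def consistency_check(paraphrases: list[str]) -> dict:
    """Pose the same question in several wordings and compare the responses."""
    answers = [complete(p).strip().lower() for p in paraphrases]
    distinct = len(set(answers))
    return {
        "answers": answers,           # full responses, for human review
        "distinct_answers": distinct,
        "consistent": distinct == 1,  # crude: identical strings only
    }
```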

It is possible that many, perhaps all, of these risks can be sufficiently mitigated to enable relatively swift integration of these tools into the analytical process. Many researchers are developing AI systems that identify AI-generated content in various contexts, such as academic essays or video files.Footnote19 Others are working on auditable LLM systems;Footnote20 still others on the challenge of developing secure systems that allow analysts to search across classified systems and the open internet. But even if the risks can be mitigated, there remains a more fundamental question: can these systems be anything other than derivative, given that they are, essentially, grounded entirely in computational models of already extant material? Can the insight they offer approach anything like ‘imagination’, or will their contribution remain limited to exercises in grammar and style, with the occasional hallucination thrown in? Put another way, they may offer extremely (or superficially) plausible discussions of an issue, but given that these are derived from a statistical model concerned with the likelihood of a particular word, concept or ‘token’ being linked to another on the basis of the training material, will there be an inherent conservatism – or other bias – in the results? Notwithstanding all this, the pace of change in the field has been such that forecasting even the relatively near-term impact of these tools on intelligence assessment is fraught with uncertainty, and highlights the need to keep developments under continuous review.
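The ‘statistical model’ in question can be stated compactly. In the standard, simplified formulation underlying most current LLMs (individual architectures differ in detail), text is generated one token at a time according to conditional probabilities learned from the training corpus:

```latex
% Simplified autoregressive formulation: a model with parameters \theta assigns
% each next token x_t a probability conditioned on the tokens generated so far,
% and the probability of a whole sequence is the product of these conditionals.
P_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} P_\theta(x_t \mid x_1, \dots, x_{t-1})
```

Seen this way, the conservatism question is concrete: every factor in the product is estimated from associations present in the training data, so outputs gravitate towards what that corpus already contains.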

While the contributions of other types of AI are already proven, excessive techno-optimism about generative AI carries risks. Though not a precise analogy, the neglect of human intelligence (HUMINT) skills in favour of high technology in the US intelligence community before 9/11 should provide a cautionary tale for anyone tempted, for example, to see the advent of LLMs as an opportunity to reduce the human workforce in intelligence communities. Ill-chosen shortcuts can make for long delays. Clearly, government must engage with LLMs and keep the utility of available technologies under review, expanding the use of such systems as they prove themselves. But as well as investing in (owning or accessing) LLMs, government should retain and redouble its investment in people. A key factor in maximising the benefits and minimising the risks of adopting LLMs will be maintaining and developing the training of intelligence analysts, enabling them to make the best possible use of these powerful new tools. This may include specialist pathways, nurturing a cadre of officers adept at integrating generative AI into a ‘new normal’ of analytical practice, capable of mastering existing systems to maximise their utility whilst minimising the risks they pose. But it should also mean continuing to nurture experts in subject matter and analytical tradecraft, who can complement the immense power of generative AI with experience and wisdom, tacit knowledge and that quintessentially human ‘sense of reality’. This should take place alongside a broader programme of education within government (not to mention the broader public) about the uses and limits of generative AI. Consumers, particularly self-professed technophiles and visionary disrupters of the staid ‘deep state’ or ‘blob’, should be carefully briefed about the limits and risks of bypassing their analytical machinery for the convenience of LLMs. The world does not need a ChatGPT version of Donald Rumsfeld’s pre-Iraq Office of Special Plans (inevitably, ChatOSP).Footnote21 For the time being, the most reasonable use case for integrating LLM-derived tools into the analytical process is as ‘co-pilots’ operated by experienced and well-trained human analysts, embedded in organisations still willing to deliver unwelcome news to their consumers.

Additional information

Notes on contributors

Joe Devanny

Joe Devanny is a Lecturer in the Department of War Studies at King’s College London. He was a 2022–23 British Academy Innovation Fellow at the UK Foreign, Commonwealth and Development Office, where he researched cyber diplomacy and emerging technology.

Huw Dylan

Huw Dylan is Reader in Intelligence and International Security at the Department of War Studies, King’s College London, and a visiting Lecturer at the Norwegian Intelligence School. He has published widely on intelligence affairs, including ‘The Autocrat’s Intelligence Paradox’, an examination of Russia’s intelligence failure before invading Ukraine.

Elena Grossfeld

Elena Grossfeld is a PhD candidate at the Department of War Studies, King’s College London, researching intelligence organisations, their strategic culture and technologies. Her recent publications include ‘What Israeli Intelligence Got Wrong About Hamas’, an analysis of Israeli intelligence failure to anticipate Hamas’s attack, published in Foreign Policy.

Notes

1 David Pepper, ‘The Business of Sigint: The Role of Modern Management in the Transformation of GCHQ’, Public Policy and Administration (Vol. 25, No. 1), pp. 85–97; Richard James Aldrich, GCHQ: The Uncensored Story of Britain’s Most Secret Intelligence Agency (London: Harper Press, 2010); GCHQ, ‘Welcome to GCHQ’, <https://www.gchq.gov.uk>, accessed 16 October 2023.

2 Palantir is a notable example.

3 Railtech, ‘Renfe to Analyse Video Surveillance in nearly 500 Stations with AI’, 11 February 2022, <https://www.railtech.com/digitalisation/2022/02/11/renfe-to-analyse-video-surveillance-in-nearly-500-stations-with-ai/>, accessed 15 October 2023.

4 Human Rights Watch, ‘China’s Algorithms of Repression’, 1 May 2019, <https://www.hrw.org/report/2019/05/01/chinas-algorithms-repression/reverse-engineering-xinjiang-police-mass>, accessed 15 October 2023.

5 This is widely discussed, but see for instance Hansard, House of Lords Debate, Vol. 832, Cols. 23–68, 23 July 2023.

6 The Economist, ‘The Boss of Britain’s Spies Speaks’, 29 June 2023.

7 Ibid.

8 This definition is offered by IBM, ‘What is Generative AI?’, <https://research.ibm.com/blog/what-is-generative-AI>, accessed 2 November 2023. We also asked Google Bard for a more developed definition, which is reproduced in Figure 1.

9 Stew Magnuson, ‘Pentagon’s Top AI Official Addresses ChatGPT’s Possible Benefits, Risks’, National Defense, 8 March 2023, <https://www.nationaldefensemagazine.org/articles/2023/3/8/pentagons-top-ai-official-addresses-chatgpts-possible-benefits-risks>, accessed 9 September 2023.

10 Anna Fazackerley, ‘AI Makes Plagiarism Harder to Detect, Argue Academics – in Paper Written by Chatbot’, The Guardian, 19 March 2023.

11 Isaiah Berlin, The Sense of Reality (New York, NY: Farrar, Straus and Giroux, 1998).

12 The Intelligence and Security Committee is currently conducting an enquiry into cloud technologies and their implications, see <https://isc.independent.gov.uk/>. The Committee also noted in its Annual Report 2021–22 that the UK Security Service was preparing for the adoption of certain cloud platforms and the implications for working practices and the use of data: Intelligence and Security Committee, Annual Report 2021–22, HC922 (London: The Stationery Office, 2022), <https://isc.independent.gov.uk/wp-content/uploads/2022/12/ISC-Annual-Report-2021%E2%80%932022.pdf>, accessed 1 October 2023.

13 Hansard, House of Lords Debate, Col. 37.

14 Ibid.

15 See, for example, Sean Michael Kerner, ‘Large Language Model Expands Natural Language Understanding, Moves Beyond English’, Venturebeat, 12 December 2022, <https://venturebeat.com/ai/large-language-model-expands-natural-language-understanding-moves-beyond-english/>, accessed 22 October 2023.

16 Cindy Gordon, ‘ChatGPT is the Fastest Growing App in the History of Web Applications’, Forbes, 2 February 2023.

17 Haziqa Sajid, ‘What are LLM Hallucinations? Causes, Ethical Concern, and Prevention’, 29 April 2023, <https://www.unite.ai/what-are-llm-hallucinations-causes-ethical-concern-prevention/>, accessed 2 August 2023.

18 Nature, ‘ChatGPT is a Black Box: How AI Research Can Break it Open’, 25 July 2023, <https://www.nature.com/articles/d41586-023-02366-2>, accessed 2 August 2023.

19 Lydia Morrish, ‘Fact-checkers are Scrambling to Fight Disinformation with AI’, Wired, 1 February 2023, <https://www.wired.co.uk/article/fact-checkers-ai-chatgpt-misinformation>, accessed 3 August 2023.

20 See Jakob Mökander et al., ‘Auditing Large Language Models: A Three-layered Approach’ (2023), doi:10.1007/s43681-023-00289-2.

21 Rumsfeld’s OSP is widely discussed; see Seymour M Hersh, ‘Selective Intelligence’, New Yorker, 4 May 2003; Julian Borger, ‘The Spies Who Pushed for War’, The Guardian, 17 July 2003.