The Design Journal
An International Journal for All Aspects of Design
Research Article

Using Generative AI Midjourney to Enhance Divergent and Convergent Thinking in an Architect’s Creative Design Process

Linus Tan & Max Luhrs
Received 22 May 2023, Accepted 25 Apr 2024, Published online: 13 May 2024

Abstract

Architects use a range of tools, from the traditional pencil to Virtual Reality technologies, to prototype and articulate their creative designs. In recent years, Generative Artificial Intelligence (GenAI) software has reached the mainstream, and GenAI images portraying architectural designs have appeared at an exponential rate. This article documents an architectural design methodology that uses Midjourney, a text-to-image GenAI software, as a design tool to enhance architects’ creativity. Prompted by a design brief, we used Midjourney to accelerate (1) the identification and refinement of a prospective user and (2) the ideation of different desirable spaces for that user.

Introduction

Artificial Intelligence (AI) has created innovative opportunities to enhance both life and work, such as optimising one’s schedule and creating predictive models of pedestrian movement within a city area. Contrastingly, there have also been many discussions regarding the potential dangers of AI technology, such as the replacement of human jobs through automation and authorship concerns regarding AI-generated outcomes. In recent years, there has been a surge of discussions on how AI will shape the future of architecture. On the popular architecture blog ArchDaily, journalists have written about how AI can reimagine traditional architectural workflows and analyse urban environments (Iñiguez 2022), and have even provoked readers to wonder whether AI can assist architects with creative tasks (Florian 2022). Such discussions have re-emerged in architectural research too, ranging from the macro scale, such as a review of AI topics in architectural research (Özerol and Arslan Selçuk 2022) and an editorial on the impacts of AI in architecture (Wit et al. 2018), to the micro scale, such as demonstrations of how AI can generate design concepts (As, Pal, and Basu 2018), spatial compositions (Dzieduszyński 2022), and architectural forms (del Campo et al. 2019). These publications demonstrate an emerging interest in, and point to the potential value of, AI in the architectural process.

This study describes an architectural design method that uses Midjourney, a text-to-image Generative AI (GenAI) software, as a design tool. It contributes to explorations and discussions of how GenAI influences the designer’s divergent and convergent thinking, a key process in creativity (Mumford 2001). We first used Midjourney to accelerate convergent thinking, demonstrated through the identification and refinement of a user persona. We then used Midjourney to accelerate divergent thinking, demonstrated through the ideation and design of five different spaces and, eventually, an architectural proposal.

Background

Basic digital technologies are useful tools for performing and optimising repetitive tasks, for example automating workflows. On the one hand, more advanced technologies such as AI are capable of dealing with vast and complex information, arguably more so than humans. On the other hand, humans lead in tasks that involve creativity, strategy and compassion (Amershi et al. 2019; Lee 2018). To capitalise on the complementary strengths of AI technologies and human capabilities, Wu et al. (2021) proposed a human-AI co-creation model to enhance human creativity. To further understand how AI can be used to enhance an architectural designer’s creativity, we (1) describe visual creativity and creative cognition, (2) describe creative cognition in the design process, and (3) highlight how AI technologies have been used in architectural design in recent years.

Visual creativity and creative cognition

There are many ways to describe creativity (see e.g. Brown 2008; de Bono 1985; Design Council UK 2019), though there are generally two recognised forms: visual and verbal creativity. Visual/figural creativity (Hetrick, Lilly, and Merrifield 1968) is defined as the creation of novel visual forms such as drawings and paintings (see e.g. Aziz-Zadeh, Liew, and Dandekar 2013; Dake 1991; Herman and Hwang 2022; Palmiero et al. 2015). Verbal creativity is defined as the creation of novel verbal forms such as written and spoken words, like lyrics and poems (see e.g. Gold, Faust, and Ben-Artzi 2012; Landry 1973; Preiss et al. 2019; Torrance 1962; Yemez and Dikilitaş 2022). Although some studies examine both forms of creativity concurrently (e.g. Dău-Gaşpar 2013; Fink et al. 2020; Petsche 1996; Studente, Seppala, and Sadowska 2016; Zhu et al. 2017), Palmiero et al. (2015) found a non-significant correlation between the two forms. As our study focuses on architectural designers, we situate it in the domain of visual creativity.

Creative thinking can be described as the unconscious processing of information to reach new insights (Miller 2019), the emergence of an idea (Wertheimer 2020), or unexpectedly reaching a ‘eureka’ moment (Csikszentmihalyi 2013, 2014). In this article, we rely on the established understanding that creativity is the generation of an outcome deemed novel by people at a certain point in time (Amabile 2019; Mackinnon 1962; Stein 1953) and can be attributed to the person, the process and the outcome (Kaufman and Baer 2004, 12). This research uses the divergent-convergent framework to explore the creative cognition of architectural designers when using AI tools.

Divergent thinking is characterised as the exploration of different possibilities and the consideration of alternative perspectives. In contrast, convergent thinking is characterised as being logical and analytical to find the most effective idea to solve a problem. When coupled, divergent thinking generates a wide range of ideas and convergent thinking identifies one idea, among those generated, to develop further (Kaufman and Beghetto 2009). We adopted this specific framework for three reasons. Firstly, these two processes are necessary for creative cognition (Mumford 2001) and have been widely used to demonstrate creativity (see e.g. Guilford 1956; Kaufman, Plucker, and Baer 2008; Torrance and Ball 1966). Secondly, the divergent-convergent framework references Guilford’s (1956) Structure of Intellect (SOI) theory, which has greatly influenced research on creative cognition (Guilford 1950; Mumford 2001). Specifically, the framework aligns with divergent and convergent production on the theory’s operations dimension, and the text-to-image GenAI explorations of this research operationalise the visual and semantic categories on the theory’s mental contents dimension. Thirdly, the divergent-convergent framework parallels how designers develop designs based on the project brief. Thus, we integrated this framework with the Co-evolution of Design model (Maher, Poon, and Boulanger 1996), which we describe below.

Creative cognition in the design process

According to Maher, Poon, and Boulanger (1996, 2), ‘design is an iterative interplay to “fix” a problem from the problem space and to “search” plausible solutions from the corresponding solution space.’ This implies that when a designer focuses on the problem/brief, the designer is analytical (i.e. convergent thinking) and in the process of finding a logical solution to the problem. Inversely, when the designer focuses on the solution/outcome, the designer is exploratory (i.e. divergent thinking) and in the process of searching for different solutions. It is through iterating between these thinking spaces that a design evolves and matures from an idea into a solution. Our research synthesises the divergent-convergent framework and the co-evolution model into a single framework (see Figure 1, Combining the Co-evolution model of design with the divergent-convergent framework), which frames our use of GenAI in architectural design.

Figure 1. Co-evolution Model of Design combined with the Divergent-Convergent framework.


Both factors—(1) working in the brief-outcome dimension and (2) performing the associated thinking process—are mutual and necessary for creative design (Wiltschnig, Christensen, and Ball 2013). This co-evolution has been argued to be the core of creative practice (Dorst 2019), not just in design but in other disciplines as well (Dorst 2018). Hence, our study uses GenAI to visualise the thinking (or perhaps, in the case of machines, interpretation) in the brief and outcome dimensions, so that the designer, instead of oscillating between dimensions and different mindsets, can focus their cognitive load within a single dimension and its associated thinking process. Additionally, cognitively offloading the interpretation process to digital technologies capitalises on GenAI’s rapid generation of imagery to iterate quickly through ideas.
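To make this offloading concrete, the loop below is a minimal sketch in Python of a designer-AI iteration under our framework: the GenAI call is stubbed out, and the function names (generate_images, select_features, co_evolve) are our own illustration, not the authors’ implementation or Midjourney’s API.

```python
from typing import List


def generate_images(keywords: List[str]) -> List[str]:
    """Hypothetical stand-in for a text-to-image GenAI call such as
    Midjourney; returns identifiers for the four options produced per prompt."""
    return [f"option {i} for: {', '.join(keywords)}" for i in range(1, 5)]


def select_features(images: List[str]) -> List[str]:
    """Placeholder for the designer's step: inspect the images and name the
    visual qualities worth carrying into the next prompt."""
    return ["staggered multi-platforms"]  # e.g. a quality spotted in an output


def co_evolve(keywords: List[str], iterations: int = 3) -> List[str]:
    """The designer stays in one thinking mode while the GenAI supplies the
    interpretation that would otherwise force a change of mindset."""
    for _ in range(iterations):
        images = generate_images(keywords)             # offloaded interpretation
        keywords = keywords + select_features(images)  # features seed the next prompt
    return keywords
```

In this reading, the designer’s cognitive load stays with select_features (one thinking mode), while the oscillation into the other mode is delegated to the generator.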

Examples of using AI in architecture and design

Using digital tools to expedite design is not novel. Design researchers have used AI as a tool to support ideation in engineering design (e.g. Camburn et al. 2020), early-stage aerospace design (e.g. Tan et al. 2023), digital environment design (e.g. Serra and Miralles 2021), and even speculative design (Simeone, Mantelli, and Adamo 2022). In architecture, researchers have used AI to analyse designs (Tamke et al. 2017; Tamke, Nicholas, and Zwierzycki 2018; Wilkinson, Bradbury, and Hanna 2014), generate architecture and design ideas (As, Pal, and Basu 2018; del Campo et al. 2019; Dzieduszyński 2022; Tan 2023; Tan and Luke 2024), and even introduce unexpectedness to the design (del Campo and Manninger 2022; Luhrs and Tan 2022).

This article contributes to the studies mentioned above by discussing the potential role of GenAI in enhancing architects’ creative cognition. Our research differs from these exemplars in that we focus on the relationship between GenAI and creative cognition, not on how AI contributed to the design outcome. Finally, our research contextualises GenAI outcomes in our synthesised creative and design framework (refer to Figure 1) to demonstrate GenAI’s influence on creative cognition. To do so, we examined how architectural designers can use GenAI to:

  1. Enhance their convergent thinking process, particularly during the development of a project brief, and inversely,

  2. Enhance their divergent thinking process, particularly during the exploration of spatial design ideas.

Research design

We conducted research through design (Frankel and Racine 2010; Frayling 2012) to analyse how GenAI images influence creative cognition during design. The most common text-to-image GenAI tools available during the project were Midjourney, Dall-E and Stable Diffusion. We used Midjourney because it was publicly released in July 2022 (Midjourney 2022), which was when we started our research; Stable Diffusion was released in August (Stability.ai 2022) and Dall-E in September (OpenAI 2022), after we commenced our project. The first design task was to create a user persona, and the second was to create an architectural design for that persona. The design project lasted approximately 13 weeks, and our use of Midjourney was split into two phases:

  1. The discovery process of defining a user persona (i.e. the designer’s convergent thinking process), and

  2. The exploratory process of developing spatial designs (i.e. the designer’s divergent thinking process).

As a design project, there were naturally biases in our decisions regarding how we developed the design. We relied on Midjourney’s generated images to examine how they responded to our biased design choices and how the outputs influenced our creative cognition. As a research project, however, we maintained a systematic approach to collecting and analysing our experimentation with Midjourney. We therefore designed our research to explore Midjourney’s convergent and divergent capabilities in two separate phases (each approximately six weeks long), to avoid overlapping the two processes and to support an unbiased examination of the GenAI tool’s effect on creative cognition.

Research positionality

Both authors are architectural designers. Author A also works as an architectural researcher focusing on design cognition and behaviour, whereas Author B works in an architecture practice. In this paper, Author B employed Midjourney as a tool in the architectural design process, whereas Author A examined creative cognition through the designer, the design process and the design outcome. While both authors brought different perspectives and had different responsibilities in this project, they engaged in reflective conversations (Schön 1983) to capture insights throughout the project.

Data collection

We collected three data sets: (1) the keywords used to generate images, (2) the images created by Midjourney, and (3) the relationship and co-evolutionary development between iterations of the keywords given and images produced. These three data sets were collected separately in both phases of the project. We focused on these three data sets because they are associated with the assessment of creative cognition through the three components highlighted in the section above: (1) the person, (2) the creative outcome and (3) the overall creative process.
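One way to picture how these three data sets relate per iteration is the record sketched below; the field names are our own illustration, not the authors’ actual instrument.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Iteration:
    phase: str                  # "persona" (Phase 1) or "spaces" (Phase 2)
    keywords: List[str]         # data set 1: prompt keywords used
    images: List[str]           # data set 2: images returned by Midjourney
    features_carried: List[str] = field(default_factory=list)
    # data set 3: qualities extracted from these images that seeded the next
    # iteration's keywords, i.e. the co-evolutionary link between iterations
```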

Tool limitation

We used Midjourney v3 throughout this study to avoid any changes to the GenAI process caused by software updates. Although updates generally improve the performance of the software, they also change the default method of generating visuals. Maintaining a single version throughout the project ensured consistent results over the 13 weeks. We acknowledge that Midjourney requires text to generate images and hence there is an extent of creativity in the words used as prompts (i.e. creative cognition in the verbal creativity domain). Aside from the textual prompts used, the categorised images that Midjourney and other GenAI tools use as databases may not be consistent, which can result in inaccurate images. For example, hands are not captured uniformly in the image database; they can be pictured clenched, open, making finger symbols, gesturing, etc. This makes Midjourney less able to generate images of hands accurately. However, the scope of our research is specifically on how GenAI outcomes sparked new ideas (i.e. divergent thinking), helped the designer refine their designs (i.e. convergent thinking) and, in general, influenced the designer’s creative cognition.
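Pinning the study to a single model version means every prompt carries the same version request. A minimal sketch, assuming Midjourney’s ‘--v’ model-version prompt parameter (the exact flag syntax may differ across client versions, so treat this as illustrative):

```python
def pin_version(keywords: list[str], version: int = 3) -> str:
    """Build a prompt string that requests a fixed Midjourney model version."""
    return f"{', '.join(keywords)} --v {version}"


print(pin_version(["Hoddle Grid", "birds eye view", "skyscrapers"]))
# Hoddle Grid, birds eye view, skyscrapers --v 3
```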

Results

We started our analysis by reflecting on and assessing how we designed our user persona. Representing a fictitious user was necessary for creating a project brief to inform the subsequent architectural proposal. As we were exploring the converging capabilities of Midjourney, we were broad, almost artistic, in our approach and textual inputs. This process consisted of three tasks: (1) speculating on a future Melbourne context, (2) designing the user attributes, and (3) contextualising the user in Melbourne to determine their experience. Below, we show our Midjourney experimentations and describe the key interactions between Midjourney and us that were significant in developing the persona.

Offloading the converging process to Midjourney

To speculate on a future Melbourne, we used Hoddle Grid, birds eye view, skyscrapers (see Figure 2, Image #1) to generate images. Hoddle Grid is the colloquial name given to the grid-like streets of Melbourne’s Central Business District. Despite Hoddle Grid being specific to Melbourne, the GenAI image appeared generic. Figure 2, Image #2 used Melbourne city, desert, filmmaking. While the Melbourne skyline appeared in the background, the visualised desert was unlike the Australian desert. Film making was a textual device added to maintain a cohesive image style. Hence, Image #3 used Melbourne City, Australian desert, Mad Max Fury Road, filmmaking. Additionally, an image source depicting Australia’s urban wetland was used as a visual reference.

Figure 2. Interacting with Midjourney to speculate on the context.


This was a creative exercise to think divergently and generate different images that speculated on a future Melbourne. The goal of this task was not to reproduce a feasible or authentic image of a future Melbourne with Midjourney. While this visualisation was not used directly in the following two tasks, the exercise helped us understand Midjourney’s process and implicitly informed the following two tasks, as we explain below.
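For reference, the Task 1 iterations can be summarised as an ordered sequence of keyword lists; the keywords are verbatim from the study, while the list structure is our own summary.

```python
# Task 1 prompt iterations, from generic to contextualised.
TASK1_PROMPTS = [
    ["Hoddle Grid", "birds eye view", "skyscrapers"],  # Image #1: output too generic
    ["Melbourne city", "desert", "filmmaking"],        # Image #2: skyline appears
    ["Melbourne City", "Australian desert",
     "Mad Max Fury Road", "filmmaking"],               # Image #3: plus a wetland image reference
]
```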

To develop the user persona, an early GenAI iteration used the keywords couple of girlfriends, catching up for a coffee, café (see Figure 3, Image #1). Despite Melbourne having a strong coffee culture, the GenAI image still appeared generic. This is similar to the Hoddle Grid keyword outcome of Task 1, whereby a non-visual characteristic of a city’s culture was unsuccessful in producing a contextualised GenAI image. Image #2 added the keyword in Melbourne, but the café interior in the GenAI image was still unclear in suggesting the Melbourne context. Image #3 used Film making, in Melbourne, friends. While filmmaking in Task 1 was used as a style-controlling keyword, it was re-used in Task 2 to annotate what the persona does in a typical week. Since this GenAI image depicted an exterior scene (which we continued to develop in Task 3), we focused the next iteration on generating interior scenes, to develop a balanced understanding and portrayal of the user’s daily activities. Hence, Image #4 used watching Netflix, sitting on her bean bag, film making, in Melbourne. We used this designer-AI process to visualise different scenes of how our user might perform different activities during the day (see Figure 3, Developing the scene). Then, we cross-integrated keywords and used visual cues from previously generated images to develop specific activities (see Figure 3, Developing the activity). Through this process, Midjourney portrayed the persona in a consistent visual manner (see Figure 3, Developing the journey), which began giving a particular visual aesthetic to the persona. Though this second task was exploratory (i.e. divergent thinking) in designing a particular persona to inform the architectural project later on, the Midjourney outputs converged the explorations towards a particular portrayal of the character.

Figure 3. Interacting with Midjourney to develop the character, scene, activity, and journey.


While we attempted to diverge our thinking by using different keywords to explore and represent a persona, Midjourney was converging those keywords into visuals in which we could identify key features to inform the next iteration of keywords. This was an iterative process, where the almost-instant visual response from Midjourney enabled us to maintain a continuous focus on divergent thinking, as opposed to oscillating between different thinking mindsets. We point out that while the keywords we used to prompt these persona visualisations were mostly based on personal design preferences, the resurfacing of similar keywords demonstrated the influence of Midjourney in helping us converge a breadth of design ideas into a curated persona visualisation. Through these tasks, of designing the persona and discovering the persona’s activities and experiences in Melbourne, we gradually developed, with Midjourney, an overarching snapshot of what this persona experiences in a typical week.

Offloading the diverging process to Midjourney

While the earlier section described how we used Midjourney to narrow down our persona exploration (i.e. convergent interpretation), the following section describes how we used Midjourney to scope out ideas for different spatial designs (i.e. divergent interpretation). This process consisted of two tasks: (1) running design charrettes to produce five different spaces, and (2) configuring the different spaces into an architectural proposal. We visualise this process and describe the key interactions between us and Midjourney that were significant to the design development in the section below.

Design charrettes

To develop Spatial Design #1, we started with multiple keywords (see Figure 4), including the phrase film making in Melbourne as a reference to the prior persona development exercise. With each iteration from Midjourney, we reduced the keywords to converge the Midjourney visualisations towards a single visual idea, which we analysed and used as conceptual images for the architectural design. In this example, the visual idea was staggered multi-platforms for observation.

Figure 4. Interacting with Midjourney to develop space #1.


We then used a single description of the space, ornithological habitat, and provided more geometrical descriptions, repetitive space frame matrix, symmetrical geometry, endless cubic space division, to develop Spatial Design #2 (see Figure 5). Unlike Spatial Design #1, this example added more keywords to each image generation. Nonetheless, the aim was to inform Midjourney with heavy geometrical descriptions, to channel the generated images towards more architectural-like spaces.

Figure 5. Interacting with Midjourney to define the space #2.


We used targeted keywords to converge towards an overall idea for a space. With each interpretation, we analysed and used key features in the images to generate the next iteration of images. However, these additional keywords also enabled Midjourney to make ‘divergent interpretations’, thus providing more spatial options each time. Again, the almost-instant visual response from Midjourney enabled us to maintain a continuous and unbroken focus on convergent thinking, as opposed to oscillating between different thinking mindsets. We repeated the above-mentioned processes with different keywords to create five spatial designs (see Figure 6).
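The two charrette strategies can be contrasted in a short sketch: pruning keywords (Spatial Design #1) versus enriching them with geometrical descriptions (Spatial Design #2). The keyword lists for Spatial Design #2 are verbatim from the text above; the helper functions are our own illustration.

```python
def prune(keywords: list[str], keep: list[str]) -> list[str]:
    """Spatial Design #1 strategy: drop keywords each iteration so the
    generations converge on a single visual idea."""
    return [k for k in keywords if k in keep]


def enrich(keywords: list[str], extra: list[str]) -> list[str]:
    """Spatial Design #2 strategy: add geometrical descriptions so the
    divergent interpretations stay architectural."""
    return keywords + extra


space2_prompt = enrich(
    ["ornithological habitat"],
    ["repetitive space frame matrix", "symmetrical geometry",
     "endless cubic space division"],
)
```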

Figure 6. Interacting with Midjourney to define the five spaces.


We then aggregated and translated these five designs into an architectural proposal (see Figure 7). The design is a residential apartment where the previously developed persona can watch birds in different environments.

Figure 7. Architectural proposal developed using Midjourney in the design process.


Discussion

This study used the Divergent-Convergent framework (Mumford 2001) and Maher, Poon, and Boulanger’s (1996) Co-evolution of Design model to explore the role of Midjourney in enhancing architects’ creativity. In doing so, the significance of this research lies in synthesising the Divergent-Convergent framework with the Co-evolution of Design model through a GenAI tool (which we describe below and in Figure 8). In the sections below, we (1) contextualise our work with prior research, (2) reflect on our design process failures, (3) discuss the broader implications of AI on designers, and (4) propose research opportunities to build on this study.

Figure 8. Designer-AI interaction, through the co-evolution of the design model.


Wilkinson, Bradbury, and Hanna (2014) used AI to accelerate simulations to analyse a design, whereas we used AI to accelerate the generation of visual ideas based on given keywords. Since Midjourney takes approximately one minute to generate images, the rapid visualisation of ideas gave us a speed advantage in developing our ideas. While del Campo et al. (2019) used AI to generate imagery of architectural spaces, we used Midjourney to suggest spatial qualities, then translated these qualities into design features and eventually into an architectural proposal. del Campo and Manninger (2022) showed how glitches and defamiliarisation could spark new approaches to design. Similarly, we first allowed Midjourney to interpret our keywords, then analysed the imagery to find unexpected visual qualities and carry them into subsequent keyword iterations, which generated images with different spatial qualities. While Dzieduszyński (2022) used AI to develop spatial compositions, we used AI to generate imagery that enabled us to interrogate our ideas, and then attempted to recreate the compositions while maintaining the desirable spatial quality produced by Midjourney. Through this interplay of our ideas and AI visualisations, we created a designer-AI process that we used to develop an architectural outcome.

Using AI software to enhance a designer’s creative process

According to Maher, Poon, and Boulanger (1996, 2), design is an iterative interplay between problem and solution. When we designed our persona (which acted as a brief that informed the architectural proposal), we used keywords that explored different persona scenarios (i.e. divergent exploration), whereas Midjourney made ‘convergent interpretations’ and visualised our keywords into a persona (see Figure 8, Convergent interpretations). Miller (2019) described creativity as reaching new insights. In this exercise, Midjourney interpreted our keywords into unexpected visualisations. These visualisations helped inform the next set of keywords (refer to the Figure 3 example, where the image of the crow resulted in the star of the show keyword), and we iterated through this interplay until we reached the final portrayal of the persona. As Wiltschnig, Christensen, and Ball (2013) argued, convergent and divergent processes are mutual and necessary for creative design, and Midjourney played an integral role in our creative design process. While we provided Midjourney with descriptions focused on actions, locations, and time (refer to the Figure 3 examples), the imagery gave us visual cues that guided us to discover our defined persona. Without Midjourney, the overall idea of the persona would not have emerged as quickly and would arguably have been different. As Wertheimer (2020) posited, creativity is also about the emergence of an idea; our study demonstrated that Midjourney enhanced our creative process by providing faster and unexpected visualisations. Instead of taking the time and relying on sketching skills to visualise an idea, Midjourney produced four options within one minute for us to review (refer to the Figure 3 examples). More importantly, it synthesised a variety of different keywords into one image, a task that may have been challenging to sketch manually. This enabled us to push our divergent thinking even further, going beyond our usual exploration boundary.

When we created our five design features (refer to Figure 6) and subsequently our architectural proposal (refer to Figure 7), we started with a range of keywords but gradually concentrated the keywords with each iteration. Midjourney made ‘divergent interpretations’ and visualised different kinds of imagery based on the concentrated keywords as well as prior iterations (refer to Figure 8, Divergent interpretations). As in the design of our persona, Midjourney provided unexpected visualisations from which we analysed and extracted interesting spatial features into keywords for the next iteration of inputs (refer to the Figure 5 example, where two of the four generated images informed the lots of vegetation and cubic matrix room keywords), and we iterated through this interplay until we reached the final spatial design. While we provided Midjourney with descriptions focused on functions, quality, and geometry, to channel the generated images towards specific architectural spaces, the images contained unexpected qualities which we used as keywords to enhance the specificity of subsequent image generations (refer to Figures 4–6). While we did not apply this design process to create the final architectural proposal, the proposal was a derivative of the five spatial designs. Midjourney is nonetheless a two-dimensional image-generating tool and so does not consider how the represented spaces work in three dimensions. Without Midjourney’s role in the development of the five spaces, the overall architectural proposal would have been different.

Reflecting on our ‘failures’

As an exploration of how Midjourney can be used as an architect’s design tool, many early prompt attempts failed to produce images significant enough for us either to generate more ideas or to identify insights from the Midjourney outputs. These ‘failed’ images were either too abstract to suggest design ideas or directions, or too concrete to provoke further interrogation of the design project. An example of an abstract image was the output based on our Hoddle Grid and skyscrapers prompt (see Figure 2, Image #1). While the image did not prompt any creative thinking, it indicated to us that the original text might have been too abstract, informing us to provide a more tangible keyword (i.e. Melbourne city) in our second iteration (see Figure 2, Image #2). This reflection and change in prompt were part of our process of learning the tool through active experimentation.

GenAI implications on the creative design process

At the start of our paper, we briefly discussed the difference between verbal and visual creativity and postulated that architects’ creative cognition is often in the visual creativity domain. However, text-to-image GenAI tools rely heavily on the textual prompts used as well as the semantics of those prompts. This implies that designers looking to wield GenAI as a design tool may need to sharpen their verbal creativity to use the software effectively. In the Tool limitation section above, we described how clear prompts may still generate inaccurate visuals because of GenAI’s inconsistent image database, giving the example of visualising hands. For designers to exert more control over GenAI, they will need to rethink how their desired GenAI images are described in the image database, and prompt accordingly. For example, using the prompt ‘colouring book’ generates a more precise black and white line drawing than prompting GenAI with ‘black and white line drawing’ itself. While these ‘imitation’ prompts may raise concerns about the authenticity of reference materials used in GenAI image creation, these GenAI tools are simply image-making software and should be treated, for the time being, as visualisation software, similar to Adobe Photoshop. Using such GenAI tools effectively may require designers to improve their verbal creativity, as it relies on the designer’s creative ability to reframe their desired GenAI outcomes from different perspectives and use less literal but more precise textual prompts to represent their desired outcomes.
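This reframing move can be sketched as a simple substitution table: map a literal description of the desired visual to the label under which such images are plausibly categorised in the model’s database. Only the ‘colouring book’ entry comes from the article; the function and any further entries would be hypothetical.

```python
REFRAMES = {
    "black and white line drawing": "colouring book",  # the article's own example
}


def reframe(prompt: str) -> str:
    """Swap literal visual descriptions for database-style labels."""
    for literal, label in REFRAMES.items():
        prompt = prompt.replace(literal, label)
    return prompt


print(reframe("a black and white line drawing of a terrace house"))
# a colouring book of a terrace house
```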

While designers may purchase and use pre-written prompts (see e.g. PromptBase 2023; Prompti.ai 2023), our research demonstrated that initial prompts are often only the starting point. Designers will still need to fine-tune their textual prompts to generate a broad variety of visual ideas and to curate a set of generated images that suggests a single idea. Additionally, GenAI software is constantly evolving, and the speed of change might disrupt designers’ ability to curate a coherent series of Midjourney outputs for their design process. Software changes range from revisions to the AI algorithms used to translate text to image to changes in the software’s image database. Midjourney version 4, released during our project, gave the GenAI access to a larger image database to reference and changed how prompts are analysed and translated to images. Version 5, released during the review of our article, produced more photorealistic and sharper images by default and could create new surrounding imagery (the zoom-out function). Such abrupt changes can break the coherence of Midjourney outcomes, which may make the designer’s GenAI-embedded process appear disjointed.

GenAI limitations and implications on the architectural design process

As mentioned above, Midjourney and other similar GenAI tools are currently two-dimensional image-generating software, which has limitations and implications for the architectural design process. Architectural design is a three-dimensional endeavour, where images represent only certain elements of the full architectural design or the space from limited perspectives. Therefore, when such GenAI tools reference their database of architectural images to propose an architectural design, the GenAI image may be evocative but is unlikely to have accounted for how the spaces would work in real life before representing them in the generated image.

However, GenAI tools are rapidly evolving, and these limitations may well be addressed to an extent. For example, Midjourney v4 introduced a blending function, where multiple images can be hybridised with keywords. This function can give architects greater control over the accuracy of architectural representation at the design concept stage. Additionally, Midjourney v5 introduced the vary region function, which enables users to regenerate only parts of an image. This can give architects the ability to iterate aesthetic features quickly, simply through text. An example would be iterating the style of the windows of a building photographed on a street. This can immensely reduce the time taken to propose design options to clients. While GenAI tools are not new to designers, their recent rapid development and mainstream accessibility have demonstrated their potential to shape how architects design.

Limitations

This study has several limitations. We only showed the key Midjourney images created in the design process to demonstrate its role in Maher, Poon, and Boulanger’s (1996) Co-evolution of Design model. Showing and analysing all the images created by Midjourney within our design process would have revealed significant repetition of similar visualisations. These repetitions were partly the result of our experimenting with its capabilities. While we deemed the repetitions insignificant in progressing our divergent-convergent thinking process and the overall creative design, we acknowledge that the repetitions might have made the unexpected visualisations (of persona experiences and spatial design features) more apparent to us. By documenting each iteration of keywords and generated images, as well as the time between iterations, we might have achieved more measurable insights into how many iterations of keywords and generated images were needed to achieve an observable advancement of an idea.

Recommendations

Building on the findings from this study, we propose the following research. Keywords can adhere to a pre-defined list of word types. While we used action, location, and time in the development of the persona, future research could implement a greater range of descriptions (e.g. textures, image composition, mood). Synonyms could also be used to replace keywords (e.g. watching replaced with seeing, observing, looking) to generate different images for comparison and analysis. This may reveal how semantics affects AI-generated images, and possibly open new avenues to discover the significance of verbal creativity for architects.
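One way such a synonym study could be operationalised is sketched below: hold the rest of the prompt fixed and substitute one keyword with its synonyms, then compare the resulting image sets. The base prompt is the persona prompt from our study; the harness itself is our own illustration.

```python
# The article's example synonym set and a base prompt drawn from the
# persona development task (Figure 3, Image #4).
SYNONYMS = ["watching", "seeing", "observing", "looking"]
BASE = "{verb} Netflix, sitting on her bean bag, film making, in Melbourne"

for verb in SYNONYMS:
    print(BASE.format(verb=verb))  # submit each variant and compare the image sets
```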

As this project used a Research through Design methodology and the findings are based on our hands-on insights, this process can be tested with multiple architectural designers working on the same project brief. The variations in design outputs between designers may reveal greater richness in the opportunities and challenges of using GenAI in the architectural design process. This future research could also include a creativity self-assessment component, to compare how architects assess their level of creativity pre- and post-working with GenAI. Additionally, the assessment could target the impact on divergent and convergent thinking, by surveying participants on how GenAI enabled them to ideate more broadly and quickly, and how GenAI narrowed down their ideas and helped them make decisions, respectively.

Conclusion

GenAI tools hold great potential to help architects visualise their design ideas quickly. In the last year alone, there has been an exponential increase in architectural-like AI visualisations made by the public. This has raised questions about the usefulness and potential disruptions of AI for architects. Our study captured how we used Midjourney, a text-to-image GenAI, as a tool to enhance our creative cognition during design. The first part of our study demonstrated how Midjourney generated imagery that converged our exploratory thinking towards a cohesive overarching idea, which was used as the brief for our design project. The second part demonstrated how Midjourney generated imagery that produced a diverse range of spatial ideas based on a set of targeted keywords. These demonstrations, visually summarised in our synthesised creativity and design framework, show designers and researchers when in a design project they can use GenAI to enhance their divergent and convergent thinking. Further research that replaces keyword inputs with alternatives is needed to interrogate how semantics influences architectural spaces and the visual interpretations created by AI software.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Linus Tan

Dr Linus Tan is an architecture lecturer in the School of Design and Architecture at Swinburne University of Technology, Australia. He researches how designers think, act, and learn (design cognition, behaviour and professional learning), and how using generative AI software changes how designers and design teams think, act, and learn.

Max Luhrs

Max Luhrs recently completed his master’s degree in Architecture and Urban Design at Swinburne University of Technology. He works in practice at Ewert Leaf Architects, and his current design research explores and applies text-to-image Generative AI tools in architecture, with specific attention to Midjourney and non-linear design processes.

References

  • Amabile, T. M. 2019. Creativity in Context: Update to the Social Psychology of Creativity. Routledge.
  • Amershi, S., D. Weld, M. Vorvoreanu, A. Fourney, B. Nushi, P. Collisson, J. Suh, et al. 2019. “Guidelines for Human-AI Interaction.” In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–13. https://doi.org/10.1145/3290605.3300233.
  • As, I., S. Pal, and P. Basu. 2018. “Artificial Intelligence in Architecture: Generating Conceptual Design Via Deep Learning.” International Journal of Architectural Computing 16 (4): 306–327. https://doi.org/10.1177/1478077118800982.
  • Aziz-Zadeh, L., S.-L. Liew, and F. Dandekar. 2013. “Exploring the Neural Correlates of Visual Creativity.” Social Cognitive and Affective Neuroscience 8 (4): 475–480. https://doi.org/10.1093/scan/nss021.
  • Brown, T. 2008. “Design Thinking.” Harvard Business Review 86 (June): 84–92. https://doi.org/10.5437/08956308X5503003.
  • Camburn, B., R. Arlitt, D. Anderson, R. Sanaei, S. Raviselam, D. Jensen, and K. L. Wood. 2020. “Computer-Aided Mind Map Generation Via Crowdsourcing and Machine Learning.” Research in Engineering Design 31 (4): 383–409. https://doi.org/10.1007/s00163-020-00341-w.
  • Csikszentmihalyi, M. 2014. The Systems Model of Creativity, the Collected Works of Mihaly Csikszentmihalyi. CA: Springer. https://doi.org/10.1007/978-94-017-9085-7.
  • Csikszentmihalyi, M. R. 2013. Creativity: The Psychology of Discovery and Invention. New York: Harper Perennial.
  • Dake, D. M. 1991. “The Visual Definition of Visual Creativity.” Journal of Visual Literacy 11 (1): 100–104. https://doi.org/10.1080/23796529.1991.11674461.
  • Dău-Gaşpar, O. 2013. “Verbal and Figural Creativity in Contemporary High-School Students.” Procedia - Social and Behavioral Sciences 78: 662–666. https://doi.org/10.1016/j.sbspro.2013.04.371.
  • de Bono, E. 1985. Six Thinking Hats: An Essential Approach to Business Management. New York: Little, Brown.
  • del Campo, M., and S. Manninger. 2022. “Strange, But Familiar Enough: The Design Ecology of Neural Architecture.” Architectural Design 92 (3): 38–45. https://doi.org/10.1002/ad.2811.
  • del Campo, M., S. Manninger, M. Sanche, and L. Wang. 2019. "The Church of AI - an Examination of Architecture in a Posthuman Design Ecology." In Proceedings of the 24th CAADRIA Conference, 767–772. Wellington, New Zealand: Association for Computer-Aided Architectural Design Research in Asia. https://doi.org/10.52842/conf.caadria.2019.2.767.
  • Design Council UK. 2019. “What Is the Framework for Innovation? Design Council’s Evolved Double Diamond.” Design Council UK, September 10. https://www.designcouncil.org.uk/news-opinion/what-framework-innovation-design-councils-evolved-double-diamond
  • Dorst, K. 2018. Notes on Design: How Creative Practice Works. Amsterdam, the Netherlands: BIS Publishers.
  • Dorst, K. 2019. “Co-Evolution and Emergence in Design.” Design Studies 65: 60–77. https://doi.org/10.1016/j.destud.2019.10.005.
  • Dzieduszyński, T. 2022. “Machine Learning and Complex Compositional Principles in Architecture: Application of Convolutional Neural Networks for Generation of Context-Dependent Spatial Compositions.” International Journal of Architectural Computing 20 (2): 196–215. https://doi.org/10.1177/14780771211066877.
  • Fink, A., T. Reim, M. Benedek, and R. H. Grabner. 2020. “The Effects of a Verbal and a Figural Creativity Training on Different Facets of Creative Potential.” The Journal of Creative Behavior 54 (3): 676–685. https://doi.org/10.1002/jocb.402.
  • Florian, M.-C. 2022. “Can Artificial Intelligence Systems like DALL-E or Midjourney Perform Creative Tasks?” ArchDaily, August 15. https://www.archdaily.com/987228/can-artificial-intelligence-systems-like-dall-e-or-midjourney-perform-creative-tasks
  • Frankel, L., and M. Racine. 2010. “The Complex Field of Research: For Design, Through Design, and About Design.” In Proceedings of the Design Research Society (DRS) International Conference (No. 043), 12. Montreal, Canada: Université de Montréal.
  • Frayling, C. 2012. “Research in Art and Design.” In Mapping Design Research, 95–108, edited by S. Grand and W. Jonas. Vol. 1. Germany: Birkhäuser.
  • Gold, R., M. Faust, and E. Ben-Artzi. 2012. “Metaphors and Verbal Creativity: The Role of the Right Hemisphere.” Laterality: Asymmetries of Body, Brain and Cognition 17 (5): 602–614. https://doi.org/10.1080/1357650X.2011.599936.
  • Guilford, J. P. 1950. “Creativity.” American Psychologist 5 (9): 444–454. https://doi.org/10.1037/h0063487.
  • Guilford, J. P. 1956. “The Structure of Intellect.” Psychological Bulletin 53 (4): 267–293. https://doi.org/10.1037/h0040755.
  • Herman, L. M., and A. H.-C. Hwang. 2022. “In the Eye of the Beholder: A Viewer-Defined Conception of Online Visual Creativity.” New Media & Society 26 (5): 2721–2747. https://doi.org/10.1177/14614448221089604.
  • Hetrick, S. H., R. S. Lilly, and P. R. Merrifield. 1968. “Figural Creativity, Intelligence, and Personality in Children.” Multivariate Behavioral Research 3 (2): 173–187. https://doi.org/10.1207/s15327906mbr0302_3.
  • Iñiguez, A. 2022. “The Use of Artificial Intelligence as a Strategy to Analyse Urban Informality” (A. P. Bravo, Trans.). ArchDaily, March 15. https://www.archdaily.com/978356/the-use-of-artificial-intelligence-as-a-strategy-to-analyse-urban-informality
  • Kaufman, J. C., and J. Baer. 2004. “Hawking’s Haiku, Madonna’s Math: Why It is Hard to Be Creative in Every Room of the House.” In Creativity: From Potential to Realization, edited by R. J. Sternberg, E. L. Grigorenko, and J. L. Singer, 3–19. Washington, DC: American Psychological Association. https://doi.org/10.1037/10692-001.
  • Kaufman, J. C., and R. A. Beghetto. 2009. “Beyond Big and Little: The Four C Model of Creativity.” Review of General Psychology 13 (1): 1–12. https://doi.org/10.1037/a0013688.
  • Kaufman, J. C., J. A. Plucker, and J. Baer. 2008. Essentials of Creativity Assessment. Hoboken, NJ: John Wiley & Sons.
  • Landry, R. G. 1973. “The Relationship of Second Language Learning and Verbal Creativity.” The Modern Language Journal 57 (3): 110–113. https://doi.org/10.1111/j.1540-4781.1973.tb04676.x.
  • Lee, K.-F. 2018. “How AI Can Save Our Humanity.” TedTalks, Vancouver, British Columbia, April. https://www.ted.com/talks/kai_fu_lee_how_ai_can_save_our_humanity
  • Luhrs, M., and L. Tan. 2022. “Incorporating Artificial Intelligence Generated Visualisations in the Architectural Design Process.” In Architecture, Media, Politics and Society: Representing Pasts Visioning Futures Conference, Online, December 1. https://amps-research.com/event/visioning/schedule/digitality-architecture-urban-space/incorporating-artificial-intelligence-generated-visualisations-in-the-architectural-design-process/
  • Mackinnon, D. W. 1962. “The Nature and Nurture of Creative Talent.” American Psychologist 17 (7): 484–495. https://doi.org/10.1037/h0046541.
  • Maher, M. L., J. Poon, and S. Boulanger. 1996. “Formalising Design Exploration as Co-Evolution: A Combined Gene Approach.” In Advances in Formal Design Methods for Computer-Aided Design, 3–30. https://doi.org/10.1007/978-0-387-34925-1_1.
  • Midjourney. 2022. “We’re Officially Moving to Open-Beta!” [Tweet]. Twitter, July 13. https://twitter.com/midjourney/status/1547108864788553729
  • Miller, A. I. 2019. The Artist in the Machine: The World of AI-Powered Creativity. Cambridge, MA: MIT Press.
  • Mumford, M. D. 2001. “Something Old, Something New: Revisiting Guilford’s Conception of Creative Problem Solving.” Creativity Research Journal 13 (3–4): 267–276. https://doi.org/10.1207/S15326934CRJ1334_04.
  • OpenAI. 2022. “DALL·E Now Available Without Waitlist.” OpenAI, September 28. https://openai.com/blog/dall-e-now-available-without-waitlist/
  • Özerol, G., and S. Arslan Selçuk. 2022. “Machine Learning in the Discipline of Architecture: A Review on the Research Trends Between 2014 and 2020.” International Journal of Architectural Computing 21 (1): 23–41. https://doi.org/10.1177/14780771221100102.
  • Palmiero, M., R. Nori, V. Aloisi, M. Ferrara, and L. Piccardi. 2015. “Domain-Specificity of Creativity: A Study on the Relationship Between Visual Creativity and Visual Mental Imagery.” Frontiers in Psychology 6: 1870. https://doi.org/10.3389/fpsyg.2015.01870.
  • Petsche, H. 1996. “Approaches to Verbal, Visual and Musical Creativity by EEG Coherence Analysis.” International Journal of Psychophysiology: Official Journal of the International Organization of Psychophysiology 24 (1–2): 145–159. https://doi.org/10.1016/s0167-8760(96)00050-5.
  • Preiss, D. D., M. Ibaceta, D. Ortiz, H. Carvacho, and V. Grau. 2019. “An Exploratory Study on Mind Wandering, Metacognition, and Verbal Creativity in Chilean High School Students.” Frontiers in Psychology 10: 1118. https://doi.org/10.3389/fpsyg.2019.01118.
  • PromptBase. 2023. “Prompt Marketplace.” PromptBase. https://promptbase.com
  • Prompti.ai. 2023. “Prompt Marketplace for AI.” Prompti.ai, January 18. https://prompti.ai/
  • Schön, D. A. 1983. The Reflective Practitioner: How Professionals Think in Action. NY: Basic Books, Inc.
  • Serra, G., and D. Miralles. 2021. “Human-Level Design Proposals by an Artificial Agent in Multiple Scenarios.” Design Studies 76: 101029. https://doi.org/10.1016/j.destud.2021.101029.
  • Simeone, L., R. Mantelli, and A. Adamo. 2022. “Pushing Divergence and Promoting Convergence in a Speculative Design Process: Considerations on the Role of AI as a Co-creation Partner.” In DRS2022. Bilbao: Design Research Society. https://doi.org/10.21606/drs.2022.197.
  • Stability.ai. 2022. “Stable Diffusion Public Release.” Stability.Ai. https://stability.ai/blog/stable-diffusion-public-release
  • Stein, M. I. 1953. “Creativity and Culture.” The Journal of Psychology 36 (2): 311–322. https://doi.org/10.1080/00223980.1953.9712897.
  • Studente, S., N. Seppala, and N. Sadowska. 2016. “Facilitating Creative Thinking in the Classroom: Investigating the Effects of Plants and the Colour Green on Visual and Verbal Creativity.” Thinking Skills and Creativity 19: 1–8. https://doi.org/10.1016/j.tsc.2015.09.001.
  • Tamke, M., P. Nicholas, and M. Zwierzycki. 2018. “Machine Learning for Architectural Design: Practices and Infrastructure.” International Journal of Architectural Computing 16 (2): 123–143. https://doi.org/10.1177/1478077118778580.
  • Tamke, M., M. Zwierzycki, A. H. Deleuran, Y. S. Baranovskaya, I. F. Tinning, and M. R. Thomse. 2017. “Lace Wall—Extending Design Intuition Through Machine Learning.” In Fabricate 2017, edited by A. Menges, B. Sheil, R. Glynn, and M. Skavara, 98–105. London, UK: UCL Press. https://doi.org/10.2307/j.ctt1n7qkg7.
  • Tan, L. 2023. "Using textual GenAI (ChatGPT) to extract design concepts from stories." In The 7th International Conference for Design Education Researchers. London, UK: Design Research Society. https://doi.org/10.21606/drslxd.2024.054.
  • Tan, L., and T. Luke. 2024. “Accelerating Future Scenario Development for Concept Design with Text-Based GenAI (ChatGPT).” In Proceedings of the 29th CAADRIA Conference, 39–48. Singapore: Association for Computer-Aided Architectural Design Research in Asia.
  • Tan, L., T. Luke, A. di Pietro, and A. Kocsis. 2023. “Using Machine Learning as a Material to Generate and Refine Aircraft Design Prototypes.” In EKSIG 2023: From Abstractness to Concreteness, 796–813. Milano, Italy: Politecnico di Milano.
  • Torrance, E. P. 1962. “Testing and Creative Talent.” Educational Leadership 20 (1): 7–10.
  • Torrance, E. P., and O. E. Ball. 1966. Torrance Tests of Creative Thinking. Norms-Technical Manual. Princeton, NJ: Personnel Press.
  • Wertheimer, M. 2020. Max Wertheimer Productive Thinking (V. Sarris, Ed.). Switzerland: Springer International Publishing. https://doi.org/10.1007/978-3-030-36063-4.
  • Wilkinson, S., G. Bradbury, and S. Hanna. 2014. “Approximating Urban Wind Interference.” In Proceedings of the Symposium on Simulation for Architecture & Urban Design (SimAUD ’14), 143–150. FL: Simulation Councils, Inc.
  • Wiltschnig, S., B. T. Christensen, and L. J. Ball. 2013. “Collaborative Problem-Solution Co-Evolution in Creative Design.” Design Studies 34 (5): 515–542. https://doi.org/10.1016/j.destud.2013.01.002.
  • Wit, A. J., L. Vasey, V. Parlac, M. Marcu, W. Jabi, D. Gerber, M. Daas, and M. Clayton. 2018. “Artificial Intelligence and Robotics in Architecture: Autonomy, Agency, and Indeterminacy.” International Journal of Architectural Computing 16 (4): 245–247. https://doi.org/10.1177/1478077118807266.
  • Wu, Z., D. Ji, K. Yu, X. Zeng, D. Wu, and M. Shidujaman. 2021. “AI Creativity and the Human-AI Co-creation Model.” In Human-Computer Interaction. Theory, Methods and Tools, Vol. 12762, 171–190, edited by M. Kurosu. Switzerland: Springer International Publishing. https://doi.org/10.1007/978-3-030-78462-1_13.
  • Yemez, N., and K. Dikilitaş. 2022. “Development of Verbal Creativity by Bilingual and English as Foreign Language Learners in Kindergarten to 8th Grade Schools.” Creativity Studies 15 (1): 25–39. https://doi.org/10.3846/cs.2022.12603.
  • Zhu, W., Q. Chen, L. Xia, R. E. Beaty, W. Yang, F. Tian, J. Sun, et al. 2017. “Common and Distinct Brain Networks Underlying Verbal and Visual Creativity: Brain Networks Underlying Verbal and Visual Creativity.” Human Brain Mapping 38 (4): 2094–2111. https://doi.org/10.1002/hbm.23507.