Review

Transcriptional condensates and phase separation: condensing information across scales and mechanisms

Article: 2213551 | Received 10 Jan 2023, Accepted 10 May 2023, Published online: 22 May 2023

ABSTRACT

Transcription is the fundamental process of gene expression, which in eukaryotes occurs within the complex physicochemical environment of the nucleus. Decades of research have provided extreme detail in the molecular and functional mechanisms of transcription, but the spatial and genomic organization of transcription remains mysterious. Recent discoveries show that transcriptional components can undergo phase separation and create distinct compartments inside the nucleus, providing new models through which to view the transcription process in eukaryotes. In this review, we focus on transcriptional condensates and their phase separation-like behaviors. We suggest differentiation between physical descriptions of phase separation and the complex and dynamic biomolecular assemblies required for productive gene expression, and we discuss how transcriptional condensates are central to organizing the three-dimensional genome across spatial and temporal scales. Finally, we map approaches for therapeutic manipulation of transcriptional condensates and ask what technical advances are needed to understand transcriptional condensates more completely.

This article is part of the following collections:
Phase Separation in Nuclear Biology

Introduction

Transcription is a core biochemical process of life. This DNA-dependent synthesis of RNA molecules by RNA polymerase enzymes forms the basis of gene expression [Citation1]. Regulation of transcription is a fundamental requirement of cell biology and must be tightly controlled in space, time and genomic location for organisms to develop properly and avoid disease. This process occurs in the extremely complicated and physicochemically diverse environment of the cell nucleus in eukaryotes. While decades of careful molecular and structural biology have given us unparalleled understanding of the individual components of the transcriptional machinery, and many aspects of transcriptional dynamics, genomics, and biology are well explained, there are outstanding questions about how and where transcription occurs and is maintained by the cell. The recently appreciated process of phase separation leading to membraneless biomolecular condensates may help explain many aspects of transcription [Citation2,Citation3]. Broadly defined, transcriptional condensates (TCs) are local assemblies of DNA-binding transcription factors (TFs), transcription co-activators, the transcriptional machinery (such as RNA polymerases and their cofactors), the RNA products of transcription, associated signaling or metabolite molecules and the immediate chromatin environment. Here we focus on the role of TFs, co-activators, and RNA polymerases in TC formation, but the additional components are equally important.

While ‘transcriptional condensate’ is the most inclusive term for molecular assemblies driving transcription, the formation and dissolution of these condensates are generally explained by the process of phase separation. Phase separation describes two or more components in a mixture separating into distinct phases, such as liquid and gel, in response to physical and energetic conditions such as concentration or temperature, and does not require stable or stoichiometric molecular interactions to achieve phase transition [Citation2,Citation4]. Classical models of phase separation invoke saturating concentrations and have many requirements for molecular transport, such as a clear boundary between the phase-separated droplet and surrounding solvent, and are independent of specific protein–protein interactions [Citation5,Citation6]. These constraints, while useful, may not reflect the nature or biological utility of condensate-based molecular organization. Instead, the recently developed model of Phase Separation Coupled to Percolation (PSCP/PS++) considers condensates as complex viscoelastic network fluids at the mesoscale, with both specific and nonspecific interactions leading to emergent biophysical properties [Citation7]. Additionally, the concept of microphase separation is particularly useful to describe transcription, where the chromatin is considered to be a microemulsion with other biomolecules such as RNA and associated proteins preventing full phase separation through linkage of otherwise separated phases [Citation8]. These concepts that go beyond classical phase separation can explain the molecular behavior of the multicomponent ribo-nucleoprotein complexes that dynamically self-organize to power transcription.
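As an illustrative aside, the saturation behavior invoked by classical models can be sketched numerically. The minimal Python example below assumes an idealized two-phase binary mixture with fixed coexistence concentrations, in which the dilute phase is pinned at the saturation concentration and the dense-phase volume fraction follows from mass conservation (the lever rule). The function name, parameter names, and numerical values are hypothetical and are not taken from the cited studies; real transcriptional condensates are multicomponent and, as discussed above, need not follow this idealized picture.

```python
def classical_two_phase(c_total, c_sat, c_dense):
    """Idealized classical phase separation of one solute in a solvent.

    c_total : overall solute concentration (arbitrary units)
    c_sat   : saturation concentration of the dilute phase (assumed fixed)
    c_dense : solute concentration inside the dense phase (assumed fixed)

    Returns (dilute_concentration, dense_volume_fraction).
    """
    if not 0 < c_sat < c_dense:
        raise ValueError("expected 0 < c_sat < c_dense")
    if not 0 <= c_total < c_dense:
        raise ValueError("expected 0 <= c_total < c_dense")

    # Below saturation the mixture stays as a single dilute phase.
    if c_total <= c_sat:
        return c_total, 0.0

    # Above saturation the dilute phase is pinned at c_sat; the excess
    # solute is absorbed by a growing dense phase (lever rule).
    dense_fraction = (c_total - c_sat) / (c_dense - c_sat)
    return c_sat, dense_fraction


if __name__ == "__main__":
    # Hypothetical values chosen only to show the qualitative behavior.
    for c in (0.5, 1.0, 2.0, 5.5):
        dilute, fraction = classical_two_phase(c, c_sat=1.0, c_dense=10.0)
        print(f"c_total={c:4.1f}  dilute={dilute:4.1f}  dense fraction={fraction:.3f}")
```

In this idealized picture, adding more solute above saturation enlarges the dense phase without changing the concentration of either phase, which is the concentration-buffering signature that classical models predict and that heterogeneous nuclear assemblies are not required to show.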

The last decade’s advances in next-generation sequencing capabilities, super-resolution imaging techniques, protein structure determination, and conceptual understanding of the nuclear environment have combined to offer a new view of transcription, with 3D genome organization, spatial chromatin remodeling, and condensate formation as central to transcriptional regulation. The cooperative and dynamic properties of TCs and their regulatory functions may provide a long-sought link between genome structure and biological function [Citation9]. Importantly, an increasing number of studies show that transcription-dependent condensate formation remodels the 3D genome to sustain changes in gene expression, from osmotic shock [Citation10] to heat response [Citation11] to cancer-causing gene fusion products [Citation12]. Using a condensate-based model of transcription can reconcile conflicting observations about transcription factor behavior, and also explain the relationship between many possible regulatory sequences and their specific effects on gene expression. Therefore, TCs are essential for our understanding of genome organization, cell fate specificity and responses, and how these processes can be modulated to improve human health.

The detailed biophysics and cellular organization of phase separation are beyond the scope of this review and have been described in detail elsewhere [Citation13], as have the roles of biomolecular condensates in diseases including cancer, neurodegeneration, and systemic protein dysregulation [Citation14,Citation15]. We also do not systematically describe the roles of condensates in transcription mediated by stress granules, paraspeckles, or histone locus bodies, or by polymerases other than RNA Polymerase II (RNAPII) such as rRNA transcription in the nucleolus mediated by RNA Polymerase I, transcription in mitochondria, transcription-coupled DNA repair, or transcription in the bacterial nucleoid. This review focuses on condensate-based mechanisms of RNAPII-mediated transcription and the role of TCs in regulating 3D genome organization to achieve cell-type specific gene expression programs. We also consider future directions in modulating these condensates for human health, and the potential for technological breakthroughs to better understand this fundamental aspect of molecular and cellular biology.

Condensates are a fundamental aspect of nuclear organization that occur from the nano- to mesoscales

The eukaryotic nucleus is one of the most information-dense objects in nature. The crowded and extremely heterogeneous environment necessitates a variety of mechanisms to separate biochemical functions in space, time and within the genome. Therefore, compartmentalization of the different biochemical functions is a fundamental process. The spatially, temporally and genomically discrete nature of transcription has been recognized for decades, arguably from the discovery of sub-nuclear bodies in the 19th century, and from early electron micrographs of nuclear structures in the 1970s [Citation16], as well as in seminal studies of nuclear organization with early high-resolution light microscopy [Citation17,Citation18]. Improvements in experimental resolution have revealed a distinct series of sub-nuclear channels throughout the nucleus, together called the Active Nuclear Compartment, which is relatively free of chromatin and where most biochemical and enzymatic activity occurs [Citation19]. Observations that RNAPII forms large discrete clusters in human cells gave rise to the ‘transcription factory’ hypothesis that many genes are transcribed at a limited number of sites with high concentrations of RNAPII [Citation20–22]. This historical understanding foreshadowed the emergence of condensates as a field of study, but the rise of genomic technologies has encouraged transcription to be understood as a collection of individual genome-binding events, instead of as coordinated multimeric complexes with emergent properties. Importantly, chromatin is itself an exceptionally large and complex condensate that behaves similarly to a solid on the mesoscale, despite its more dynamic properties on the microscale [Citation23].
In concert with the constraints imposed by the 3D organization of the genome and the channel-like structure of the nucleoplasm [Citation19], chromatin and implicitly transcription can be regulated by both intrinsic physicochemical and biologically responsive condensation mechanisms [Citation24].

The classical physical definition of phase separation leading to macromolecular assemblies requires a saturating concentration of at least one type of molecule in a solvent, which is not well suited for the heterogeneous nucleoplasmic and genomic environment of transcription. While we have defined TCs in terms of their specific components, a broader definition is generally necessary for biomolecular condensates. We consider condensates to encompass any non-stoichiometric assembly of more than two individual molecules, which then recruit additional similar or distinct molecules to drive biological function beyond what individual components could achieve alone. This functional definition does not require strict parameters of condensate morphology, biophysical properties, or number of molecules. Alternative terminologies for the phase-separation-like, multi-protein complexes driving transcription have included assemblages [Citation25], hubs [Citation26], or condensates [Citation27], all of which could be used to describe TCs. These molecular assemblies (described below in roughly increasing order of physical size) occur throughout the genome and take place on multiple scales of spatial and temporal organization, representing diverse ways genome structure and function are controlled within the cell.

Biomolecular condensates occur at drastically different physical and temporal scales throughout the cell [Citation13]. Condensate-like behavior can be observed on scales starting at the single-molecule level, such as the sequence-dependent condensation of the Klf4 TF on its recognition sequence at sub-saturated concentrations to reinforce selective binding, which operates at a larger scale than adsorption of proteins onto DNA but at a smaller scale than true phase separation [Citation28]. At the scale of single or dimerized molecules, the binding of TFs to DNA can be a form of ‘wetting’, as protein complexes adhere to nucleosome-free regions of chromatin [Citation29]. This is the initiating factor of any TC, as the sequence-specific binding of one or more TFs to their target motifs throughout the genome is obligatory before recruitment of RNAPII and other factors is possible.

Once TC formation is initiated by TF binding, the size and duration of the TC increase through homo- and heterotypic recruitment of RNAPII and cofactors. As they grow and transcription is initiated, there may be clustering of multiple genomically distant sites of transcription, and increasing local concentrations of transcriptional machinery at sharply upregulated genes. These TCs then evolve into larger and more stable clusters as RNA is produced, shifting their functional output from transcription initiation to elongation and splicing. In some cases, such as the continuous transcription necessary at histone locus bodies or rDNA clusters, relatively permanent TCs may form and be recognizable as distinct sub-nuclear bodies [Citation30,Citation31]. All of these steps of transcriptional activation depend on condensate mechanisms, but occur at physical sizes, timescales, and numbers of molecules that can range over orders of magnitude. We characterize the presence of condensates at multiple physical, temporal, and molecular scales, and trace increasing levels of stability and transcriptional output as a function of their duration and size, as shown in Figure 1.

Figure 1. Transcriptional condensates occur across spatial, temporal, and genomic scales.

Note: Transcriptional condensates are illustrated at multiple scales, from TF binding and clustering, to RNAPII recruitment and fully activated transcription, to clusters of transcription and stable nuclear bodies. In general, the level of order in the TC increases as a function of physical size, temporal stability, and molecular concentration.

Condensation facilitates TF binding and RNAPII function

A transcription factor contains a sequence-specific DNA-binding domain, accompanied by a regulatory domain that may bind ligands, interact with other protein partners, or self-associate. This small-scale clustering may not operate at the physical or concentration scales necessary to be defined as phase separation. How these single binding events are translated into stable and coordinated gene expression is a fundamental question that has led to condensate-based models of transcriptional regulation. The answer may lie in the capacity of TFs (and their co-activators) to facilitate condensate formation through the intrinsically disordered regions (IDRs) in their protein structures. An enrichment of IDRs, which lack higher order structure and can have diverse functions, is a characteristic of eukaryotic TFs, indicating that both TFs and many of their partners may interact with weak multivalent interactions and form condensates [Citation32–34]. These TF IDRs are necessary for enhancer clustering and transcription activation [Citation35]. Further results have supported the role of IDR-mediated condensation in shaping both the 3D genome and transcriptional response [Citation12,Citation34,Citation36]. Together these studies reinforce an important role for IDR-mediated condensate mechanisms as a general force in models of transcriptional activation [Citation37].

One important mechanism that IDR-mediated interactions may utilize in establishing TCs is the alteration of TF target search kinetics. Recent single-molecule studies have demonstrated that transcription factors in general move quickly throughout the nucleoplasm and slow down significantly at sites of binding, nascent transcription, and cofactor assembly [Citation38–40]. Interestingly, the glucocorticoid receptor (GR) when tracked in live cells reveals two sub-diffusion populations, one that represents chromatin-bound GR, and another that likely represents GR condensates that are transcriptionally active but not necessarily chromatin-bound [Citation41]. The transition of a TF from a circulating nucleoplasmic pool to being either bound and driving transcription, or unbound but still involved in transcriptional activity, is a critical step in TC formation and likely tunes the expression level of the locus where the TC is functioning. Differentiating between these populations and understanding which protein-binding partners, post-translational modifications, or other factors typify each population or initiate transitions between them represents an essential problem for future study.

Once TF binding at a given genomic location is established and TC formation initiated, the recruitment and regulation of the transcriptional machinery itself becomes of prime importance. RNAPII, as the central unit of the transcriptional apparatus, associates with its multi-protein cofactor complex Mediator at sites of transcription, and forms phase-separable condensates [Citation42]. RNAPII clustering is mediated through the phosphorylation status of the C-terminal domain (CTD) and the condensation properties thereof [Citation43]. Increasing phosphorylation levels of Serine 5 and then Serine 2 of the CTD promote initiation and elongation, respectively, and subsequently the transition from transcription to splicing. All of these processes constitute unique TCs that are temporally, spatially, and genomically distinct from each other [Citation44,Citation45]. Further, the IDR-mediated interactions between TFs, their coactivators, and RNAPII can influence kinetics, search time, and clustering behavior without relying on classical phase-separating mechanisms [Citation46]. The IDRs of hypoxia-induced TFs are mainly responsible for target search dynamics, diffusion capacity, and chromatin binding, while the recruitment of RNAPII is driven through more stable protein–protein interactions [Citation47]. RNAPII clustering is observed at the sites of the TAF15 co-activator protein, driving amplification of transcription and lowering the energetic barrier for further TAF15 accumulation in a positive feedback loop [Citation48]. RNAPII clusters in zebrafish embryos undergo ‘micro-phase’ separation through the continued association of RNA transcripts with RNAPII clusters [Citation8]. The transcription factor Yin Yang 1 (YY1) also forms TCs, dependent on its IDR-containing activation domain, that include the transcriptional machinery components p300/CBP, Mediator (MED1), bromodomain-containing protein 4 (BRD4) and RNAPII [Citation49].
However, in some cases, TF-mediated activation of transcription is independent of, and even reduced by, the condensate-forming capacity of the disordered TF activating domain [Citation50].

This process also links the activation function of TCs to their counterbalancing function of repressing transcription in response to reaching necessary expression levels or to external signals such as stress. In the case of stress response, condensates containing negative elongation factor can form to downregulate transcription [Citation51], illustrating that control of TC formation can work in both activating and repressive directions. Many TFs and co-activators show condensate-based tuning of their activity at both basal and high levels of gene expression, whereby excessive condensate formation leads to a reduction in gene expression as the condensate becomes more stable [Citation39,Citation52]. The production of RNA transcript itself can also function as a regulatory feedback mechanism for TC formation, where low levels of transcript stimulate TCs and transcriptional initiation, but higher levels of transcript cause TC dissolution and transcriptional arrest [Citation53]. Additionally, a recent study using RD-SPRITE, a proximity-based approach to simultaneously detect 3D genome organization and nascent mRNA, indicates that transcription is still present in chromatin regions previously considered inactive [Citation54]. While not identifying TCs directly, these findings underline the multi-functional nature of TCs, and suggest roles for them not only in activating genes within a permissive chromatin environment but also in repressing excessive transcript production and in regulating transitions from repressive to active chromatin and vice versa. The capacity of many components of transcription to form condensates, and reach peak activity only once a critical mass of components has been assembled, may be a cellular strategy to reduce abundant transcriptional noise [Citation55]. 
The emergent and non-linear properties of condensates likely provide a way for the cell to achieve quantitative control of gene expression, post-transcriptional processing, and genome organization through the physical separation of macromolecules [Citation56]. Together, these studies demonstrate that biomolecular condensate formation is a fundamental regulatory mechanism for transcription.

Chromatin adaptor proteins drive condensate formation for responsive 3D genome organization and transcriptional regulation

Transcriptional activity must be finely regulated in both space and time in response to developmental, environmental, and disease-related stimuli. The genome usually contains many thousands of potential binding sites for any given transcription factor, and transcriptional machinery is a constant component of the nucleoplasm and engages in pervasive low-level transcription throughout the genome [Citation57]. Despite the general accessibility of transcriptional components, transcription levels are not evenly distributed among genomic regions. Thus, another factor is necessary to direct local concentrations of transcriptional machinery, chromatin remodelers, and necessary cofactors to specific genomic locations indicated by transcription factor binding. The formation of TCs is a plausible mechanism to explain this uneven distribution. Chromatin adaptor proteins, that is, transcriptional coactivators that bind to transcription factors or transcriptional machinery but do not bind DNA, can function as this physical and informational link to drive TC formation at specific genomic locations (Figure 2).

Figure 2. Chromatin adaptor proteins provide a physical and informational hub for condensate formation and diverse biological processes.

Note: An illustration of the process of chromatin adaptor proteins functioning to coordinate TF binding, transcriptional machinery recruitment, and condensate formation. Each step of the process can be self-perpetuating and recruit additional components to create functional condensates, but the linkage of genomic specificity with general transcription and remodeling processes through chromatin adaptors provides a central link between TF binding, chromatin remodeling, condensate formation, 3D genome organization, and the final gene expression and cell fate outcomes.

There are several examples of chromatin adaptors driving condensate formation. The intrinsically disordered transcriptional co-activator Yes-associated protein 1 (YAP1) forms active transcriptional hubs that reorganize the 3D genome in response to osmotic stress [Citation10]. These condensates also contain the YAP1 partner TEA-domain transcription factor 1 (TEAD1) and are indicative of condensate formation by YAP1 and its paralog WW domain-containing transcription regulator protein 1 (WWTR1 or TAZ) in several other biological contexts [Citation58,Citation59]. The chromatin adaptor protein LIM-domain-binding protein 1 (LDB1) is also critical to forming TF complexes that drive specific outcomes in systems including erythropoiesis and olfactory neuron specification, and these events are likely driven by LDB1’s disordered dimerization domain [Citation60]. Together, this implies a mechanism where the initial, smaller chromatin-associated TF-containing TCs are energetically favored, but cannot form and engage the overall transcriptional machinery without the contributing force of a chromatin adaptor protein or a ligand- or partner-induced change in the local environment to form larger transcriptionally permissive TCs. There are likely many other chromatin adaptor proteins essential for forming TCs with their TF partners that have yet to be described. Exploring the roles of chromatin adaptor proteins and their effects on partner TFs to induce TCs remains an important and under-explored area. Understanding these relationships is essential to unify the classical protein–protein binding, 3D genome organization, and TC formation mechanisms that the cell uses to respond to developmental, environmental, or disease-related external signals.

The connections between TC formation and 3D genome organization are most clear in the case of enhancer–promoter interactions. The necessity of enhancer–promoter contacts for gene expression has been widely accepted after rigorous debate in the field [Citation61,Citation62]. There are many examples of specific enhancer–promoter contacts being developmentally required and maintained despite the loss of genome structural proteins [Citation63,Citation64], and recent whole-genome enhancer studies have shown a general dependency on enhancer–promoter interactions [Citation65]. However, individual loci can also show increasing enhancer–promoter distances during transcriptional activation and be robust to cis-regulatory perturbations [Citation66,Citation67]. This mechanistic variability implies that active regulatory sites can contact each other over long distances and that enhancer–promoter proximity can be temporally separated from their effects on transcription [Citation68]. The clustering of multiple regulatory sites and gene loci together in a TC to increase transcriptional output and enhance co-regulation may underlie a fundamental mechanism of coordinating gene expression programs toward a specific functional outcome for the cell.

Sites of high binding of regulatory factors to distal elements of genes, particularly with high levels of activating histone acetylation, are referred to as ‘superenhancers’, that is, enhancers driving particularly high transcription of their target genes [Citation69]. The formation of superenhancers that increase transcriptional output has been linked to the condensation of regulatory factors including MED1, BRD4, and others in some of the original demonstrations of TCs [Citation35,Citation70,Citation71]. The Mediator complex in particular has been shown to regulate enhancer–promoter interactions [Citation72]. This interaction is deeply conserved through evolution, with yeast cells utilizing an IDR-dependent interaction between heat-shock factor 1 (HSF1) and MED1 to create heat-shock-responsive RNAPII-containing TCs linking groups of target genes [Citation11]. MED1 condensates can also incorporate signaling molecules that direct transcription to specific response genes, performing a type of chromatin adaptor-like function [Citation73]. TCs are increasingly observed to be present both at superenhancers that are experiencing active transcription [Citation70,Citation71,Citation74] and at latent enhancers being decommissioned [Citation75]. Estrogen-stimulated enhancers are activated through ligand-mediated condensation of homotypic enhancer elements and chromatin restructuring, but they can then become down-regulated and increasingly solidified if the ligand signal is prolonged [Citation76]. Superenhancers can also be created de novo through the action of TF fusion proteins that drive aberrant TC formation, such as NUP98-HOXA9 fusion proteins that reorganize 3D genome structure and gene expression in hematological malignancies [Citation12].
The direct effect of IDR-dependent TC formation on 3D genome organization is also shown by the effect that mutations ablating the core IDR in UTX/KDM6A have on reducing its tumor suppressive activity, creating disorganized histone modification patterns and reducing 3D genome contacts at promoters [Citation77]. These results indicate that formation of superenhancer clusters is a TC-driven phenomenon and link together enhancer control and 3D genome organization through TCs with both evolutionary and disease-related implications.

This understanding of TC formation at enhancers guided by chromatin adaptor proteins requires a return to the constraints of 3D genome organization and nuclear architecture. Chromatin organization is regulated by the structure of nucleosomes and large nonconcatenated loops of chromatin at multiple scales, as well as epigenetic histone modifications and structural genome organizing proteins including CTCF, cohesin, and others [Citation78]. Any discussion of TCs must take these overriding physical constraints into consideration, and they should ultimately be modeled and understood in the context of the nuclear environment. Modeling true-to-scale ensembles of chromatin finds that actively transcribed chromatin is enriched at interchromatin regions and exposed to the milieu of the nucleoplasm, and that synthetic beads, roughly the size of many proposed TCs at dozens of nanometers in diameter, are confined to the interchromatin space [Citation79]. The development of 3D-SIM super-resolution microscopy, particularly suited for imaging nuclear organization and sensitive enough to capture slight changes in chromatin intensity, has enabled several segmentation-based approaches to delineate the interchromatin space [Citation80]. These approaches have also been adapted to chromatin marks and structural proteins, demonstrating that the most active molecular sites are significantly enriched at the interface between compacted chromatin and the RNA-, protein- and solvent-rich interchromatin space [Citation81]. Other super-resolution microscopy analyses of chromatin states in individual cells have shown similar gradation of epigenetic marks and sites of transcription [Citation82]. Further, many aspects of chromatin organization such as topological domains may correspond to condensate formation at the boundaries between heterochromatin, active chromatin, and the nuclear milieu [Citation83]. 
It is possible that TCs can function to remodel repressive into open chromatin at the interchromatin boundary in response to developmental cues. The integration by chromatin adaptor proteins of specific TF binding, general transcription machinery recruitment and 3D genome re-organization through TCs at interchromatin boundaries is a previously unappreciated process that highlights how this mode of gene expression control is only beginning to be understood.

Targeting TCs for human health

The fundamental importance of biomolecular condensates makes them an intriguing but challenging target for therapeutic manipulation. The misregulation of condensates is implicated by pathogenic mutations of condensate-promoting proteins across the entire range of human diseases [Citation84]. In several types of cancer, various TFs are fused to condensate-forming proteins, driving aberrant TC formation at the TF binding sites [Citation12,Citation85]. TCs can also be utilized by endogenous retroviruses to increase their expression and can lead to pathological phenotypes [Citation36]. Accordingly, a number of both new and established pharmaceutical companies have programs to target condensates, but with varying levels of success. This may be because in vitro assays that are suitable for drug screening miss relevant cellular contexts, highlighting the importance of cell-based assays of condensate formation in drug discovery.

One avenue of targeting condensates is to dissolve them, particularly by blocking interactions between condensate components with small molecules. For example, the TCs formed by YAP1 and TEAD1 can be disrupted by small-molecule inhibitors of the YAP-TEAD interaction [Citation86]. This strategy is promising because overactive YAP1 condensates are thought to lead to cancer, and aberrant condensate formation by YAP1 is also indicative of PD-1 immunotherapy resistance [Citation87]. Another method of utilizing TCs in therapy is using them to concentrate therapeutics at the relevant cellular or genomic location without dissolving the underlying condensate. This is particularly relevant in diseases involving hormone receptors such as the estrogen (ER), androgen (AR), and glucocorticoid (GR) receptors, as these proteins readily form ligand-induced condensates that can be harnessed for drug delivery [Citation76,Citation88,Citation89]. In one case, the nuclear receptor TF ERα can form phase-separated condensates in vitro with an ER-targeting drug [Citation90]. Additionally, the biophysical properties of disease-implicated condensates can be altered through drug treatment or genetic modifications to have a therapeutic effect, maintaining the presence of the condensate but reversing their deleterious biological function. For example, AR and several of its coregulators including MED1 demonstrate aberrant TC formation in types of castration-resistant prostate cancer [Citation91]. These TCs can be chemically targeted, and depending on the drug will in some cases dissolve AR condensates [Citation92], or in other cases physically harden them, sequestering their component TFs away from genomic regulatory sites and reducing the expression of androgen-responsive genes both in vitro and in vivo [Citation89]. 
Similarly, mutations in the YEATS domain of the chromatin remodeling protein eleven-nineteen leukemia (ENL) can lead to carcinogenic condensate formation, but over-expression of the same mutants can change the biophysical characteristics of these condensates from driving gene expression to being transcriptionally nonfunctional [Citation93]. Taken together, these observations imply that disease-associated condensates can be attenuated, through either dissolution or hardening, by small-molecule inhibitors. Other potential therapeutic interventions, including custom-designed protein constructs or changes to external signaling environments, while not discussed in this review, could be avenues to interrupt pathogenic TC formation [Citation94,Citation95]. These preliminary studies create several future directions for therapeutic intervention via TC function and should be explored vigorously through both academic and industrial research initiatives.

Technical advances necessary to understand the condensate nature of transcription

The nature of condensates, as dynamic, multicomponent local concentrations of biomolecules that are bound together by transient weak forces, makes them uniquely challenging to study with traditional biochemical, cell biology, and genomic tools [Citation96]. Their presence in the cell can be lost easily, as the protein-denaturing methods or chemical fixation necessary for traditional genomics technologies can disrupt or dissolve condensates, and they are often sensitive to temperature and experimental timing [Citation97]. As the field evolves, new experimental standards and practices are needed to rigorously declare whether a process is mediated by phase separation or condensation, determine the biological relevance of a condensate, and reconcile in vitro with in vivo data [Citation6,Citation98]. The condensate-like nature of TFs in the nuclear space could also be influenced by common treatments for cellular imaging and molecular biology, such as formaldehyde fixation, which can easily cause spontaneous and aberrant condensates, particularly of TFs [Citation97]. The aliphatic alcohol 1,6-hexanediol is a common tool for disrupting condensates and is frequently used with a control of 2,5-hexanediol, but both of these compounds contribute to global immobilization and condensation of chromatin at low concentrations and induce a global decrease in transcription, which can introduce systematic artifacts to the study of chromatin-related condensates [Citation99,Citation100]. These effects demonstrate that a more refined toolbox of chemical reagents to study condensate biology is urgently needed.

Current techniques to study TCs in cells usually involve labeling proteins and nucleic acids with fluorescent or enzymatic tags that may influence their normal functions, and the tagged molecules often must be overexpressed. The CRISPR-mediated tagging of proteins expressed at endogenous levels is preferable and should reduce exogenous condensate formation or tag-mediated enzymatic activity, as long as the CRISPR tagging does not influence the protein’s function or condensate-forming ability. Additionally, orthogonal label-free strategies to assess nuclear architecture such as cryo-preservation, as used in the GAM technique [Citation101], or pooled approaches to detecting multimolecular interactions in the nucleus, such as SPRITE [Citation102], can bring unique insights to TC biology and identify multi-way contacts with both DNA and RNA [Citation103,Citation104]. There is also wide variation between research groups and industry in how to reliably assess what constitutes a condensate, how many condensates are present and where they are located in the cell, and the effects that any intervention may have on them. Field-wide transparency and reproducibility for imaging workflows associated with condensate counting [Citation105], localization dynamics [Citation106], and association with other protein factors should also be a goal for the community. Many commonly used techniques for identifying condensate-like behavior, such as FRAP, are not purpose-built for this task. New variations of these techniques, such as MOCHA-FRAP, can differentiate LLPS-driven condensation from other forms of transient binding, and provide a framework for how to tailor specific techniques to be more appropriate for studying condensate biology [Citation107].

Many of the advances in understanding TC behavior have come from high-resolution or single-molecule microscopy studies of RNAPII in various states [Citation108,Citation109]. As imaging technologies, particularly single-molecule and super-resolution techniques, continue to advance, we will be able to visualize, count, and track the individual components of condensates and systematically determine their interrelations. Testing the biophysical properties of TCs is possible with optical trapping techniques in vitro [Citation110], and new techniques such as CRISPR-targeted magnetic DNA-binding proteins can extend these methods into living cells [Citation111]. New developments in targeted proximity-labeling-based proteomic techniques can also reveal the specific protein composition of condensates with increasing specificity and sensitivity [Citation112,Citation113]. In combination with advances in sequencing techniques and structural biology, these technologies should provide the level of temporal and spatial resolution necessary to observe condensates in their natural state and increase the level of confidence in in vivo results.

The potential for targeting condensates to treat disease is substantial, particularly in some of the most intractable conditions including cancer, neurodegeneration, and infectious diseases [Citation114]. However, most evidence on the therapeutic potential of condensate biology has been generated in cell culture models, with a few studies proceeding into murine cancer models [Citation89]. The translational gap between current understanding and future therapeutic uses may be bridged by advances in 3D organoid/spheroid biology, which recapitulate organ formation with the benefits of controlled conditions and manipulations possible in cell culture [Citation115,Citation116]. There is also increasing potential for truly in vivo animal methods, including intravital imaging, to assess condensate formation and response to therapies in the biologically relevant organs within whole living organisms [Citation117]. Combining biological models of increasing complexity and biological relevance with our growing understanding of the fundamental mechanisms of TC formation and pharmacological vulnerability will promote a new era in targeted therapeutics and drive significant improvements in our ability to treat disease where it starts, at the level of transcriptional regulation.

Conclusions

This review focuses on the wide range and scale of biomolecular condensates involved in transcription – some of which fall within physically definable phase-separation phenomena, and many of which do not. The forces involved in orchestrating the entire lifecycle of transcription, from the pioneering recognition of a single gene’s enhancer by a single TF to the formation of large-scale transcriptional hotspots, are often driven by specific protein–protein binding, but are also exposed to a nucleoplasmic environment that encourages spatial separation of biomolecular activity through condensate-based mechanisms.

There are several general principles that can be drawn from the current state of research into TCs. First, TCs are a fundamental property of transcription, driven by the chromatinized nuclear environment, the topology of interchromatin space, and the properties of specific and nonspecific protein–protein and protein–nucleic acid interactions. The combined actions of transcription factor activating domains, chromatin adaptors, transcriptional machinery, and their combined effects on RNA production all contribute to TC formation. Second, there is a ‘sweet spot’ in TC formation, below which factors will not assemble, and above which excess transcriptional product and condensate hardening decrease gene expression. This sweet spot is likely on the order of dozens of molecules, although the size of the condensate is likely related to the stability and volume of transcription at a specific locus. Third, there is no need to invoke classical phase separation to describe TCs; other models such as PSCP and microphase emulsions are sufficient. Fourth, dissolving, inducing, or stabilizing pathogenic TCs, and using them to concentrate drugs where the drug target is performing its function, can be promising strategies for health interventions, which will improve as they are tested in more biologically relevant disease models. Last, the field needs new techniques, experimental approaches, and biological models to fully exploit the manipulation of TCs for improving human health.

As the field progresses and new technologies are adopted, our theoretical and molecular understanding of TCs is increasing rapidly. The next step to apply these discoveries for the benefit of human health is to find new ways to specifically and reversibly manipulate TC formation in targeted tissue types to achieve specific gene expression outcomes. This may come through small molecules, targeted biomolecules, or more general changes to the cellular environment. This is already an area of great interest for companies and researchers, and we hope to see the rapid application of condensate-informed therapies entering clinics in the near future.

Acknowledgments

The authors have no conflicts to declare. The authors apologize to all researchers whose relevant work was not able to be included. Figures were created with BioRender.com. This work is supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM142837 (D.C.) and by a National Cancer Institute training grant T32CA009110 (J.D.).

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

The work was supported by the National Cancer Institute [T32 CA 009110] and National Institute of General Medical Sciences [R35GM142837].

References