1,497
Views
0
CrossRef citations to date
0
Altmetric
Tuberculosis(TB)-what is new

Genomic analysis of lineage-specific transmission of multidrug resistance tuberculosis in China

, , ORCID Icon, , , , , , & show all
Article: 2294858 | Received 20 Jun 2023, Accepted 11 Dec 2023, Published online: 13 Feb 2024

ABSTRACT

Objectives

We investigated the genetic diversities and lineage-specific transmission dynamics of multidrug-resistant tuberculosis (MDR-TB), with the goal of determining the potential factors driving the MDR epidemics in China.

Methods

We curated a large nationwide Mycobacterium tuberculosis (M. tuberculosis) whole genome sequence data set, including 1313 MDR strains. We reconstructed the phylogeny and mapped the transmission networks of MDR-TB across China using Bayesian inference. To identify drug-resistance variants linked to enhanced transmissibility, we employed ordinary least-squares (OLS) regression analysis.

Result

The majority of MDR-TB strains in China belong to lineage 2.2.1. Transmission chain analysis has indicated that the repeated and frequent transmission of L2.2.1 plays a central role in the establishment of MDR epidemic in China, but no occurrence of a large predominant MDR outbreak was detected. Using OLS regression, the most common single nucleotide polymorphisms (SNPs) associated with resistance to isoniazid (katG_p.Ser315Thr and katG_p.Ser315Asn) and rifampicin (rpoB_p.Ser450Leu, rpoB_p.His445Tyr, rpoB_p.His445Arg, rpoB_p.His445Asp, and rpoB_p.His445Asn) were more likely to be found in L2 clustered strains. Several putative compensatory mutations in rpoA, rpoC, and katG were significantly associated with clustering. The eastern, central, and southern regions of China had a high level of connectivity for the migration of L2 MDR strains throughout the country. The skyline plot showed distinct population size expansion dynamics for MDR-TB lineages in China.

Conclusion

MDR-TB epidemic in China is predominantly driven by the spread of highly transmissible Beijing strains. A range of drug-resistance mutations of L2 MDR-TB strains displayed minimal fitness costs and may facilitate their transmission.

Introduction

Multidrug-resistant tuberculosis (MDR-TB) remains a significant health and economic burden to global TB control. There were approximately 558,000 MDR-TB globally in 2017. Drug resistance is particularly acute in China, and the MDR-TB burden ranks second worldwide. MDR-TB occurred in 7.1% of new TB cases in China compared to 4.1% worldwide [Citation1]. Although inadequate therapy may lead to acquired drug resistance which may be the early origin of MDR-TB, it is currently recognized that recent transmission is an important driving force for drug resistance TB [Citation2]. Previous epidemiologic study in China found the relative transmission rate for MDR-TB is higher compared to drug-susceptible TB [Citation3]. However, to what extent is the MDR-TB epidemic in China is driven by acquisition of drug resistance during anti-TB treatment or by direct transmission of MDR strains has not been well studied.

Mycobacterium tuberculosis (M. tuberculosis) has been classified into seven well-differentiated lineages, which have distinct geographic distribution, host specificity, drug resistance, and transmissibility [Citation4]. For M. tuberculosis strains, drug-resistance strains were initially believed to be inherently weaker, less fit and with lower transmissibility, than drug-susceptible strains [Citation5]. Whereas, some MDR-TB strains were shown to be highly transmittable and caused large outbreaks. There is still limited data regarding the M. tuberculosis lineage variation in drug resistance and transmissibility in China. Previous research on MDR-TB in China mainly focused on the epidemiology [Citation3,Citation6], drug-resistance mutations [Citation7], and evolution of MDR-TB [Citation8,Citation9]. The analysis of lineage-specific transmission dynamics and genetic diversities of the MDR M. tuberculosis in this high-burden country could provide insights into the key drivers contributing to the MDR-TB epidemic prevalence and persistent in China.

In recent years, improved whole genome sequencing (WGS) technologies, and its application to M. tuberculosis isolates, provided novel insights to estimate the acquisition of drug resistance and to construct the phylogeny. WGS is also a superior tool to explore genomic clustering and timing of transmission chains of M. tuberculosis isolates [Citation10,Citation11]. Here, relying on nationwide whole genome dataset of 3204 TB isolates in China, which includes thousands of novel genomes, we set out to reconstruct the evolutionary history of MDR-TB in China, to explore the lineage-specific nationwide spread dynamics and migration pattern of MDR-TB, and to identify the drug-resistance mutations contributing to increased transmissibility.

Method

Study population and data collection

The M. tuberculosis strains analysed in the present study include two sampled sets. (1) A countrywide collection of publicly available M. tuberculosis isolates. The M. tuberculosis were genomes annotated with a minimum of metadata (sampling year and province of origin), or this information could be obtained from the published articles were included. Isolates were further filtered according to the procedures described below, resulting in a data set of 1870 genomes from 21 provinces, 4 municipalities, and 5 autonomous regions in previously published 9 articles [Citation6,Citation9,Citation12–18]. These areas covered 30 out of the 34 provincial regions of China. Illumina M. tuberculosis genome sequences were downloaded from the Sequence Read Archive repository. (2) Newly sequenced whole genome dataset of 1550 M. tuberculosis isolates in Shandong province. During 2013–2017, 1565 culture confirmed TB cases with drug-susceptibility testing (DST) results from Shandong Public Health Clinical Center (SPHCC) and Weifang Respiratory Disease Hospital (WRDH) notified to the TB Surveillance System in Shandong province were recruited to the present study. DST was conducted for 4 first-line anti-TB drugs (isoniazid, rifampicin, ethambutol, streptomycin) using an MGIT 960 system by the reference laboratory in SPHCC. Shandong is a coastal province in the east of China region, which consists of 17 municipalities and 140 counties (districts) with about 100 million inhabitants. SPHCC is one of the province-level and WRDH is one of the 13 municipal-level health departments responsible for the surveillance of TB in Shandong province. Patients under 18 years of age, HIV infected, or with a prior history of TB antibiotic therapy were excluded. Patients’ demographic and clinical characteristics were collected from the TB Surveillance System and the hospital-based medical record system. Genomic DNA was successfully extracted from the 1550 M. tuberculosis isolates. The study protocol was approved by the Center for Ethics in Human Research, Shandong Provincial Hospital Affiliated to Shandong First Medical University (No.2017-337). The H37Rv reference genome sequence was used for all reference-driven analyses. As a result, 3204 M. tuberculosis genomes were included in the analysis. The full list of isolates and provinces of isolation is provided in Supplementary Table 1.

Whole-genome sequencing and SNP Identification

The genomes of the Shandong 1550 M. tuberculosis isolates were obtained by sequencing using an Illumina HiSeq 4000 system, and the sequence data were deposited in NCBI BioProject. Low-quality raw reads (sequencing base ≤20 or sequencing fragments length ≤20) were excluded from the paired-end sequencing. We aligned the reads of the 1550 strains, along with the downloaded countrywide 1870 strains, to the H37Rv reference genome (NC_000962.3) using BWA-MEM (version 0.7.17-r1188). To remove clipped alignments and duplicated reads, we utilized samclip (v 0.4.0) and samtools markdup (v1.15). Samples with coverage less than 98% and depth less than 20 were excluded. Strains with mixed lineages were considered as mixed infections and were also excluded from the analysis. A total of 3204 isolates passed sequence quality criteria including 1775 strains from countrywide collection and 1449 strains from the Shandong dataset. The variant calling is done by Freebayes (v1.3.2) with an included filter parameter “FMT/GT = "1/1” && QUAL> = 100 && FMT/DP> = 10 && (FMT/AO)/(FMT/DP)> = 0.” The SNPs in previously defined repetitive regions were excluded, including PPE, PE-PGRS genes, and mobile elements or repeat regions and repeat bases generated by TRF (v4.09) and Repeatmask (v4.1.2-p1) [Citation19]. The filtered vcf files were annotated by snpEff (v4.3t) to get the final samples SNPs [Citation20].

In silico lineage and genotypic drug-resistance prediction

We used the web-based tool TBProfiler (v4.3.0) to analyse 3204 M. tuberculosis WGS data to assign lineages and predict drug resistance [Citation21]. Drug resistance is predicted in TBProfiler using the curated drug-resistance database called tbdb. This database was tested using over 17,000 samples with genotypic and phenotypic data. The resistance polymorphisms (SNPs and indels) identified by TBProfiler were further reassessed based on the WHO-endorsed catalogue of molecular targets for M. tuberculosis drug susceptibility testing for resistance interpretation [Citation22]. Genotypic MDR was defined by identification of resistance mutations to both isoniazid and rifampicin.

Phylogenetic analysis

We use snippy-core (v4.6.0) to get alignments of SNPs for all MDR isolates [Citation23]. The phylogenetic tree analysis was performed exclusively using SNP data, without incorporating indel data. The bcftools consensus tool was employed to generate sample sequences based on the SNP data. Maximum-likelihood (ML) phylogenetic trees were built and dated by the IQ-TREE (v1.6.12) model “JC + I + G4” with 1000 ultrafast bootstrap replicates and treetime (v0.9.0) [Citation24,Citation25]. The trees were constructed using the highest likelihood model selected by automatic model selection in IQ-TREE (v1.6.12), which utilized the JC model of nucleotide substitution and invariable site plus discrete Gamma model of rate heterogeneity to analyse the genome samples with only substitution variants replaced by reference sequence.

Sampling dates were used to construct a temporal phylogeny using TreeTime (v0.9.0) with a relaxed clock rate of 4.684e–07 according to previous studies [Citation24,Citation25], and tip-randomization was performed to confirm the presence of a strong temporal signal. Bayesian evolutionary analyses were conducted to identify the best substitution, clock, and demographic models, with marginal likelihood estimates used for model selection. The analysis favoured a GTR model with a relaxed clock and a Skyride demographic model.

Dating of drug resistance-conferring mutations

We used marginal and joint maximum likelihood-based phylogenetic inference in treetime to reconstruct dated phylogenetic trees and infer the time to the most recent common ancestor for isolates carrying drug-resistance mutations of interest. We used a strict molecular clock with an informative prior distribution on the mutation rate and the GTR nucleotide-substitution model with among-site rate heterogeneity defined by a Γ distribution. Nodes corresponding to the most recent common ancestor for relevant drug-resistance mutations were identified via maximum-likelihood ancestral state reconstruction.

Transmission cluster analysis

Initially, we defined broad putative clusters with a maximum genetic distance less than 15 SNPs. The threshold to determine potential transmission links between patients has previously typically been fixed between 5 and 15 SNP differences [Citation26–28]. We chose broad initial SNP cut-off to define the cluster for further analysis of transmission cluster to avoid missing cases and incorporating recent and old transmission events. Next, we use R package phybreak [Citation29] to get an intensive understanding of how stains are transmitted. Phybreak uses Bayesian inference with MCMC for devise novel proposal steps to effectively explore the posterior distribution while considering all unobserved processes simultaneously. This enables efficient sampling of transmission trees from the posterior distribution and robust estimation of the consensus transmission tree. The analysis parameter settings were set as sample.mean and gen.mean set to 14, nsample set to 5000. To analyse the spread of MDR-TB strains, we divided the China’s map into seven geographic regions. The cross-regional cluster was defined as 2 or more M. tuberculosis strains in a transmission cluster where the isolates were sampled from at least 2 geographic regions in China.

Drug resistance mutations and clustering of MDR strains

We use the ordinary least square (OLS) model from statsmodel [Citation30] to analyse the linear regression and identify the drug-resistance mutations contributing to an increased clustering rate. All the first-line and second-line drug-resistance mutations (including SNPs, indels, or large deletions) detected by TBProfiler in more than two isolates were included in the association analysis. We use the OLS function in statsmodels to fit a linear regression model, with the clustering rate as the dependent variable and all the mutation sites as the independent variables. The model coefficients were interpreted for determining the relationship between each drug-resistance mutation and clustering rate of MDR strains, and assess the statistical significance using a threshold of a p-value <0.05.

Treetime phylogeographic analysis

To understand the countrywide transmission patterns of L2 MDR-TB strains, the migration patterns between seven administrative regions in China and the migration rate of L2 strains were analysed by “treetime migration” analysis [Citation31]. By treating administrative regions as discrete characters, these transitions are plausibly modelled as a time reversible process with comparable sampling probabilities of the different states. Such models can infer migration dynamics between regions. The symmetrized migration rate indicates the weight of bidirectional migration. The actual migration rate indicates the weight of unidirectional migration.

Compensatory mutations and transmission

To identify genetic targets potentially facilitated the transmission for MDR strain, we examined compensatory mutations for rifampicin and isoniazid resistance respectively. We screened all 3204 clinical M. tuberculosis strains for nonsynonymous changes in rpoA and rpoC for rifampicin compensatory mutations and in ahpC, fabG1, inhA, and katG for isoniazid compensatory mutations [Citation18,Citation32]. Putative compensatory mutations comply with the following standards (i) for rifampicin-resistance refers to nonsynonymous mutations occur frequently in rifampicin-resistance isolates but do not to occur in rifampicin-susceptible isolates; (ii) for isoniazid-resistant refers to nonsynonymous mutations occur frequently in isoniazid-resistant isolates but do not occur in isoniazid-susceptible isolates; and (iii) two or more independent mutations occurred at least two times in different branches in the phylogeny tree. Genome-wide association study analyses were performed using PLINK (v1.9) [Citation33] to assess the statistically significant of the association between compensatory mutations with transmission clusters.

Coalescent-based skyline plot

We calculated the changes in the effective population size dynamics of MDR-TB isolates since 1974 (when rifampicin came into the Chinese market). These calculations were performed using the treetime skyline methods [Citation31] with the n_points parameter set to 50. In order to estimate population sizes or coalescent time scales, we maximized the likelihood contribution of the coalescent likelihood for a fixed tree. This analysis allowed us to determine the changes in the dynamics of the effective population size. Specifically, the Skyline coalescent model was applied separately for the L2 and L4 MDR strains.

Result

MDR population structure

According to TBProfiler drug-resistance prediction, we identified 1313 MDR-TB isolates from the 3204 M. tuberculosis isolates. Of which, 172 MDR strains were from the Shandong dataset (172/1449, 11.9%) and 1141 MDR strains from the countrywide collection (1141/1775, 64.3%) (). Through DST conducted in the Shandong data set, 193 MDR isolates were identified as phenotypically MDR. The MDR-TB proportion in this data set does not represent the national incidence. Of the nine published articles, four studies only included MDR isolates according to DST diagnosed for genome sequencing [Citation6,Citation9,Citation12–18], so MDR proportion is higher. Among MDR strains, the proportion of fluoroquinolone resistance was 32.7%, only one isolate with genotype bedaquiline resistance was detected, and none MDR isolate with linezolid resistance. According to the 2021 WHO definition for pre-XDR and XDR [Citation1], 430 strains (32.7%) were classified as pre-XDR, and only one strain was classified as XDR-TB.

Figure 1. Study flowchart. The process of identification and exclusion of genomic data included in the study is shown. M tuberculosis, Mycobacterium tuberculosis; TB, tuberculosis; MDR, multidrug resistance.

Figure 1. Study flowchart. The process of identification and exclusion of genomic data included in the study is shown. M tuberculosis, Mycobacterium tuberculosis; TB, tuberculosis; MDR, multidrug resistance.

Mapping reads for all the 1313 MDR-TB isolates against the reference sequence H37Rv identified a total of 229551 SNPs in nonrepetitive regions. These variable sites were used to reconstruct a maximum-likelihood phylogeny (). A total of 1142 MDR-TB isolates (87.4%) belong to L2, 168 isolates (12.1%) belong to L4 and only 3 isolates (0.2%) belong to L3 were detected. The majority of L2 was assigned to sub-L2.2.1 (n = 1061, 80.7%), which accounted for 94.4% of all L2 isolates; and followed by sub-L2.2.2 (n = 73, 5.6%) and sub-2.1 (n = 8, 0.6%). L4 including sub-L4.1 (n = 1, 0.1%), 4.2 (n = 23, 1.8%), 4.4 (n = 64, 4.9%), and 4.5 (n = 80, 6.1%). L2 MDR isolates were interspersed throughout all the 30 provincial regions in China, and L4 were drawn from 19 of the 30 provincial regions.

Figure 2. The Phylogeny tree and resistance profile of 1313 MDR M. tuberculosis strains. Blue and purple branches indicated L2 and L4 strains, respectively. The inner vertical bar indicated sampled geographic regions, respectively. The outermost coloured dots showed the resistance to first-line and second-line anti-TB drugs.

Figure 2. The Phylogeny tree and resistance profile of 1313 MDR M. tuberculosis strains. Blue and purple branches indicated L2 and L4 strains, respectively. The inner vertical bar indicated sampled geographic regions, respectively. The outermost coloured dots showed the resistance to first-line and second-line anti-TB drugs.

Time for drugs resistance mutations acquisition

Using treetime to reconstruct dated phylogenetic trees, we inferred the time for drug-resistance mutations acquisition. Overall, the first-line drug-resistance mutations emerged earlier than second-line. The earliest resistance-conferring mutations to emerge in the phylogeny tree were the acquisition of isoniazid (katG_p.Ser315Thr) and streptomycin (rpsL_p.Lys43Arg) mutations around 1973 and 1975, respectively. MDR level resistance via additional mutations in rpoB450 (p.Ser450Leu) was acquired in 1976. The most frequently acquired isoniazid resistance mutations fabG1_c.−15C > T were acquired around 1985. The most frequent acquired ethambutol resistance mutations (embB_p.Met306Val) emerged around 1980. Quinolones (gyrA_p.Asp94Gly) and amikacin/kanamycin resistance mutations (rrs_n.1401A > G) were acquired around 1997 and 2002, respectively (Supplementary Figure 1).

Transmission networks

A total of 705 MDR strains (53.7%, 705/1313) were grouped into 249 transmission clusters. (a and b) shows two examples of inferred transmission cluster. Transmission clusters sizes ranged from 2 to 20 isolates (median 3 cases, IQR 2–5). The majority of the clusters (191 clusters) were separated by <5 SNPS and 40 clusters were separated by 5–10 SNPS. L2 MDR strains were associated with larger clusters sizes (median 3 isolates, IQR 2–5) and with the lower genetic diversity within clusters (2.27 SNPs; IQR 0–5) compared with L4 (median 2 isolates, IQR 2–3; 4.86 SNPs within clusters; IQR 1–7) (Wilcoxon all P < 0.001). The proportion of clustered MDR strains also varied between lineages, with 58.3% of MDR strains in L2 were clustered compared to 36.9% of MDR strains in L4 (P < 0.001).

Figure 3. Examples of inferred transmission clusters according to Phybreak, a change of colour indicating a transmission event.

Figure 3. Examples of inferred transmission clusters according to Phybreak, a change of colour indicating a transmission event.

We further analysed the genotype mutations associated with isoniazid and rifampicin resistance within the transmission clusters. Nearly all clusters (244/249 clusters with 691 strains) had the same rifampicin and isoniazid resistance mutations within one cluster, which might represent direct transmission of MDR strains. Among the 244 clusters, 4 clusters including 17 MDR strains had developed additional rpoB mutations besides existing rifampicin-resistant mutations, 4 clusters with 18 MDR strains having additional developed isoniazid mutations. Only 5 clusters with 14 MDR strains had distinct isoniazid and/or rifampicin-resistance mutations within one cluster, which suggests recently developed resistance mutations from susceptible strains after transmission. Overall, for all the clustered MDR isolates, 98.0% (691/705) probably reflect direct transmission of MDR strains, and the estimated the overall burden of MDR-TB due to transmission in China was about 52.6%.

Transmissibility of lineage-specific drug-resistance mutations

We conducted OLS regression stratification by lineage to identify lineage-specific drug-resistance SNP associated with clustering. For L2, a large number of lineage-specific mutations (a total of 43 variants) associated with clustering in MDR strains were identified ( and Supplementary Table 2). The most frequent SNPs conferring resistance to rifampicin, isoniazid, streptomycin, and ethambutol were all significantly more likely to be found in transmission clusters of L2, implying enhanced transmissibility. Mutations in katG (encoding p. Ser315Asn and p. Ser315Thr) for isoniazid resistance, the most frequent mutations (isoniazid, 68.8%) across all the resistance-associated loci identified, were significantly more frequent in clustered MDR strains. The rifampicin-resistance rpoB 450 (encoding p. Ser450Leu) (rifampicin 57.0%) and rpoB 445 (encoding p. His445Tyr, p. His445Arg, p. His445Asp, p.His445Asn) (rifampicin, 11.65%), which were two of the most frequent MDR-TB-associated rifampicin mutations, all showed strong association with clustering. A total of 16 nonsynonymous SNPs in rpoB reached statistical significance, which was the highest ranked association with clustering (). For pyrazinamide, many low-frequency mutations in pncA locus confer resistance, and 11 variants were significantly more likely to be found in clustered strains (Supplementary Table 2). In contrast, for L4, only five variants reach statistical significance for association with transmission cluster, including those encoding (rpoB_ p. His445Tyr, ahpC_c.-52C > T, embA_c.-12C > T, embB_p.Gln497Lys, embB_p.Asp328Tyr).

Table 1. Significant associations between isoniazid and rifampicin-resistance variants and transmission cluster identified by OLS regression.

Transmission phylogeographic inference

The spatial distribution of MDR clusters in the seven geographic regions was shown in the map (Supplementary Figure 2). The percentage of clustered MDR isolates was highest in central China (69.5%), the next is Eastern (62.0%), and Southern (49.9%) China. Eastern, Central, and Southern China were also concentrated with high population density and economy.

The 249 MDR transmission clusters include 187 regional and 62 cross-regional clusters. The pairwise geographic distances within regional clusters range from 0 to 2581 km (1290 km on average) for cross-regional cluster. The longest pairwise geo-distance was 2581 km: one strain was isolated in Northeast China (Jilin province) and the other one was isolated in Southern China (Shenzhen).

The 62 cross-regional clusters included a total of 233 MDR strains and ranged in size from 2 to 20 strains, with a geographic distribution from 2 to 3 regions and 2 to 4 provinces (). All seven geographic regions were involved in cross-regional MDR clusters. Only 6 of the 30 provinces, Gansu, Ningxia, Inner Mongolia, Guangxi, Hebei, nor Yunnan, were not involved in any cross-regional MDR cluster. Guangdong (Southern China), Shandong (Eastern China), and Zhejiang province (Central China), and two municipalities including Beijing (Northern China) and Shanghai (Eastern China) were hot spots of cross-regional clustered MDR-TB strains, which were all areas of high population density and high migrant population. The proportion of L2 strains in cross-regional clusters was significantly higher than regional clusters (97.0% vs 88.9%, P < 0.001). Accordingly, the proportion of L4 MDR strains in cross-regional clusters was significantly lower than regional clusters (3.0% vs 11.1%, P < 0.001). Only eight isolates in the cross-regional clusters belong to L4. The main characteristics related to cluster composition, lineage data, and regions involved in the cross-regional clusters are reported in , while the corresponding minimum spanning trees are shown in .

Figure 4. Geographic distribution of eight cross-regional transmission clusters of MDR-TB in China. Each of the eight cross-regional transmission clusters is associated with particular geographic regions, as shown in a–h.

Figure 4. Geographic distribution of eight cross-regional transmission clusters of MDR-TB in China. Each of the eight cross-regional transmission clusters is associated with particular geographic regions, as shown in a–h.

Table 2. Cross-regional MDR transmission clusters identified according to phybreak.

Compensatory mutation and transmission clusters

We identified 77 potential compensatory mutations in rpoA and rpoC, and the majority of these mutations (51/77) have not been previously reported (Supplementary Tables 3a and 3b). Thirty variants are potential compensatory mutations for isoniazid, including 2 variants in ahpC, 21 variants in inhA, and 7 variants in katG. We further performed plink on individual variants to identify compensatory mutations that are significantly associated with clustering. In total, plink analysis detected three variants in rpoA, eight variants in rpoC, and two variants in katG that were significantly associated with clustering (P < 0.05) ().

Table 3. Significant associations between putative compensatory mutations and transmission cluster identified by plink.

Countrywide L2 migration analysis

As L2 MDR-TB strains found to be geographically widely dispersed, we performed a treetime “mugration” analyses to formally assess the phylogeographic migration of L2 between discrete geographic regions in China. The symmetrized migration rate indicates the weight of bidirectional migration. Overall, Eastern, Southern, and Northern China were associated with the highest levels of internal TB migration, followed by Central China. The southwest and northeast exhibited the lowest levels of internal TB migration. The migration phylogeographic analysis supports the presence of five frequent migration routes with high symmetrized migration rate in the transmission processes of L2 MDR-TB in China, including Eastern to Southern China (symmetrized migration rates = 27.02), Southern to Northern (6.63), Northern to Eastern (4.47), Central to Southern (4.15), and Eastern to Northeast (2.29). The actual migration rate indicates the weight of unidirectional migration. The higher actual rates of migration routes were seen from Eastern to Southern China (actual migration rates = 2.65), Southern to Eastern (1.27), and Northern to Southern (0.83) () (Supplementary Figure 3). As a result, as shown in , Southern, Eastern, and Northern were highly connected regions for L2 MDR-TB countrywide migration in China.

Figure 5. Heatmap summarizing the actual L2 migration rate between geographic regions as inferred by treetime “mugration” analysis. The numbers in the blocks represent the calculated migration rate between regions (from regions in the column to regions in the row).

Figure 5. Heatmap summarizing the actual L2 migration rate between geographic regions as inferred by treetime “mugration” analysis. The numbers in the blocks represent the calculated migration rate between regions (from regions in the column to regions in the row).

Effective population size according to lineages

We used the Coalescent-Based skyline analysis in treetime toolsets to compare the estimated changes in the effective population size for MDR strains belong to different lineages. For L2 MDR-TB, we detected a sharp population growth phase from 1974 to 2000 with a significant population size rapidly expansion compared to L4. Whereas for L4 MDR-TB, a temporary downward trend was observed between 1974 and 1985, and then a reversal of this downward trend was noticed and, followed by a relatively moderate population size increase until 2000 (Supplementary Figure 4). Subsequently, a decrease in population size was observed on the skyline plot for L2 and L4 MDR-TB after 2000, which coincided with the Directly Observed Treatment, Short Course (DOTS) to be included in the National TB Control Program in China.

Discussion

We report on the WGS and phylogenomic analysis of the largest collection of MDR M. tuberculosis sequenced to date in China. Through analysis of these genomes, we determined the order of genotypic resistance acquisition to eight drugs. The development of isoniazid and streptomycin resistance was found to arise firstly, followed by rifamycin, ethambutol, ethionamide, fluoroquinolones, and second-line injectables. The vast majority of MDR-TB strains in China belong to the Beijing lineage (the predominant sublineage was 2.2.1), and the next is L4 (the major sublineages were L4.2, L4.4, L4.5). Beijing MDR strains were found to be more geographically dispersed compared to L4. We reconstructed MDR-TB transmission networks throughout China using a Bayesian inference and showed Beijing lineage was significantly associated with MDR isolates clustering, large size clusters formation and cross-regional transmission. Transmission chain analysis indicated a central role of repeated, frequently transmission of highly prevalent L2 strains in the establishment of MDR epidemic in China, but no occurrence of a large predominant MDR outbreak was detected. OLS regression identified a series of drug-resistance mutations in L2 that were significantly more likely to be found in clustered strains. Furthermore, we revealed Eastern, Central, and Southern China were hot spots of MDR-TB clustering, as well as highly connected regions for L2 MDR migration in China. Lastly, we depicted the distinct lineage's effective population size expansion dynamic of MDR-TB in China.

Previous literature supports the idea that once drug resistance of M tuberculosis develops through resistance acquisition in the host, clonal spread drives the transmission of MDR-TB in multiple settings of the world [Citation2,Citation34]. Modelling data also suggested that transmission of MDR-TB will fuel global increases of drug-resistant TB in the coming decades [Citation35]. However, it is still not well established whether MDR-TB strains are associated with decreased or increased transmissibility compared with drug-susceptible strains [Citation34,Citation36]. Molecular epidemiologic studies in different geographic setting reported variable results, which range from 25% to 80% of MDR-TB strains being genetically clustered [Citation4,Citation37]. Indeed, lineage variability among strains might influence the results of the different studies [Citation38]. Our nationwide phylogenomic analysis in China showed lineage-specific disparities in transmissibility of MDR-TB strains. We found more than half of the L2 MDR strains are genetically clustered (60.1%), which is significantly higher compared to L4 MDR strains (36.9%), and even higher than susceptible M tuberculosis strains (40%) in our previous study. In particular, Beijing sublineage 2.2.1 was associated with a wider geographic dispersal, larger genomic clusters, and frequent cross-regional transmission. Notably, 97.0% of MDR strains in cross-regional clusters belong to the Beijing lineage.

As the fitness of the drug-resistant TB strains is a key parameter determining the resistance epidemic, frequent geographic spread of L2 MDR strains suggests high fitness. The observed enhanced capacity for L2 MDR-TB strains to spread supports that even the L2 strains acquire resistance, the intrinsic of compensatory mechanism might offsets the fitness cost caused by resistance acquisition. According to OLS regression, we provide evidence that Beijing strains harbouring certain isoniazid and rifampicin-resistance genotypes have a negligible fitness cost. Multiple resistance-conferring variants in L2 were found to be strongly associated with clustering that are suggestive of enhanced transmissibility. These variants warrant characterization studies to fully elucidate their associated compensatory mutations. As we observed the most frequent SNPs conferring resistance to rifampicin and isoniazid were significantly associated with clustering of L2, the majority of the Beijing MDR strains might associate with innate transmission advantages. No particular set of drug-resistance mutations confers MDR that leads to large outbreaks was observed. In contrast, the limited geographic transfer and drug-resistance mutations with transmission advantage of L4 MDR strains suggest lower compensatory mechanism. The associations identified may shed light on further understanding of the genetic mechanisms underlying the spread of different M. tuberculosis lineage strains.

The treetime “mugration” analysis of L2 indicated that Southern, Eastern, and Northern might be the epicentres for the nationwide migration of MDR-TB isolates; this was supported by the high symmetric (bidirectional) and actual (unidirectional) migration rates. Eastern and Southern China were also associated with the high percentage of clustered MDR strains among the seven geographic regions. These three regions had high levels of socioeconomic development, and with a high migrant population density in China. In the past four decades, China’s cities and small towns – especially those in the Southern and Eastern regions an underwent unprecedented urbanization transformation with an average of more than 15 million migrants who moved from villages to cities annually. By 2000, it was estimated that the migrant population in China reached between 100 and 120 million [Citation39]. The dominant migration tendency for migrants in China is the bidirectional move, which is closely tied with their “homeward” focus or agricultural seasonality [Citation39]. Previous studies showed population migration could contribute to MDR-TB spread cross the city or the country’s boundaries, which also making MDR-TB surveillance more difficult [Citation40,Citation41]. The migrant population in China is restricted in access to the city’s public health and free services, and complicates efforts to control infectious diseases such as TB [Citation39]. It is believed that internal migration plays an important role in extending MDR-TB spread in China.

According to the population size skyline estimation, our study showed the two major lineages in China exhibited distinct effects population size changes. For L2 MDR, we observed a rapid population size expansion between 1974 and 2000, which might be caused by the inadequate tuberculosis control linked to a malfunctioning public health system during that period of socioeconomic transition. From 1978 to 2002, the government’s share of total health expenditure fell from 32% to 16% [Citation42]. Many patients who ran out of money discontinued treatment. Additionally, the rapid economic development was accompanied by increased internal immigration flows [Citation42]. Whereas for L4 MDR, a relatively fluctuating and moderate population size increase was observed. As a result, it is probable that the rapidly increasing epidemic of MDR-TB before 2000 in China was predominantly attribute to L2 MDR-TB strains expansion. The WHO drug-resistance surveillance reported that the rates of MDR-TB in previously untreated cases which were 5–10 times higher in China than the global median, which reflected the high transmissibility of MDR-TB in China [Citation1]. In 2000, the central government revitalized the TB control program and DOTS began to expand to all provinces [Citation42]. DOTS was shown to rapidly reduce the incidence and transmission of drug-resistant TB [Citation43]. We observed a gradual decrease in both the L2 and L4 MDR-TB population size since 2000.

The present study has several limitations: firstly, the WGS data in the present analysis were from nine published studies [Citation6,Citation9,Citation12–18] and newly sequenced strains in Shandong, they could not represent the entire population and cover the distribution of TB cases across the whole country. Secondly, the collected metadata of the national dataset were incomplete, especially regarding basic clinical and epidemiological information. The lack of published datasets with the relevant epidemiological data highlights the need to incorporate these variables in prospective TB epidemiological studies.

Conclusion

In conclusion, this large sample WGS data analysis indicated MDR-TB epidemic in China is to a large extent driven by the wide geographic spread of L2 MDR strains. Multiple resistance-conferring variants in L2 were found to be associated with enhanced transmissibility. The Beijing lineage MDR strains exhibited rapid population size expansion, and they are the main composite in the cross-regional transmission clusters. In the future, the observed patterns should be reassessed to examine whether the lineage-specific genomic diversities exhibit distinct transmission dynamics.

Supplemental material

supplement_figures

Download Zip (1,023.7 KB)

Supplement_tables

Download Zip (630.8 KB)

Acknowledgments

The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. Authors' contribution: YFL and XLK analysed the data and drafted the manuscript. WMS, YML, YYL, WWF, JYY, and CBY commented and revised the manuscript. HCL and YL conceptualized, designed the study, and acquired funding for the present study. All authors approved the publication of the manuscript.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The newly sequenced whole genome dataset of 1,449 M. tuberculosis strains was deposited in the NCBI Bio Project (https://www.ncbi.nlm.nih.gov/sra/), and 1755 other isolates were downloaded from the European Nucleotide Archive repository. Any additional data are available from the corresponding authors upon reasonable request.

Additional information

Funding

This work was supported by the Department of Science & Technology of Shandong Province (CN) [grant number 2007GG30002033], [grant number 2017GSF218052], Natural Science Foundation of Shandong Province (CN) [grant number ZR2020KH013], [grant number ZR2021MH006], and Jinan Science and Technology Bureau (CN) [grant number 201704100].

References

  • Global Tuberculosis Report 2021. Available from: https://www.who.int/teams/global-tuberculosis-programme/tb-reports/global-tuberculosis-report-2021
  • Kendall EA, Fofana MO, Dowdy DW. Burden of transmitted multidrug resistance in epidemics of tuberculosis: a transmission modelling analysis. Lancet Respir Med. 2015;3(12):963–972. doi:10.1016/S2213-2600(15)00458-0
  • Yang C, Shen X, Peng Y, et al. Transmission of Mycobacterium tuberculosis in China: a population-based molecular epidemiologic study. Clin Infect Dis. 2015;61(2):219–227. doi:10.1093/cid/civ255
  • Freschi L, Vargas R, Husain A, et al. Population structure, biogeography and transmissibility of Mycobacterium tuberculosis. Nat Commun. 2021;12(1):6099. doi:10.1038/s41467-021-26248-1
  • Knight GM, Colijn C, Shrestha S, et al. The distribution of fitness costs of resistance-conferring mutations is a key determinant for the future burden of drug-resistant tuberculosis: a model-based analysis. Clin Infect Dis. 2015;61(Suppl 3):S147–S154. doi:10.1093/cid/civ579
  • Huang H, Ding N, Yang T, et al. Cross-sectional whole-genome sequencing and epidemiological study of multidrug-resistant mycobacterium tuberculosis in China. Clin Infect Dis. 2019;69(3):405–413. doi:10.1093/cid/ciy883
  • Zhao Y, Xu S, Wang L, et al. National survey of drug-resistant tuberculosis in China. N Engl J Med. 2012;366(23):2161–2170. doi:10.1056/NEJMoa1108789
  • Zhou Y, Anthony R, Wang S, et al. The epidemic of multidrug resistant tuberculosis in China in historical and phylogenetic perspectives. J Infect. 2020;80(4):444–453. doi:10.1016/j.jinf.2019.11.022
  • Liu Q, Ma A, Wei L, et al. China’s tuberculosis epidemic stems from historical expansion of four strains of Mycobacterium tuberculosis. Nat Ecol Evol. 2018;2(12):1982–1992. doi:10.1038/s41559-018-0680-6
  • Coll F, Phelan J, Hill-Cawthorne GA, et al. Genome-wide analysis of multi- and extensively drug-resistant Mycobacterium tuberculosis. Nat Genet. 2018;50(2):307–316. doi:10.1038/s41588-017-0029-0
  • Nebenzahl-Guimaraes H, Van Laarhoven A, Farhat MR, et al. Transmissible Mycobacterium tuberculosis strains share genetic markers and immune phenotypes. Am J Respir Crit Care Med. 2017;195(11):1519–1527. doi:10.1164/rccm.201605-1042OC
  • Chen XC, He GQ, Wang SY, et al. Evaluation of whole-genome sequence method to diagnose resistance of 13 anti-tuberculosis drugs and characterize resistance genes in clinical multi-drug resistance Mycobacterium tuberculosis isolates from China. Front Microbiol. 2019;10:1741. doi:10.3389/fmicb.2019.01741
  • Yang CG, Luo T, Shen X, et al. Transmission of multidrug-resistant Mycobacterium tuberculosis in Shanghai, China: a retrospective observational study using whole-genome sequencing and epidemiological investigation. Lancet Infect Dis. 2017;17(3):275–284. doi:10.1016/S1473-3099(16)30418-2
  • Zhang HT, Li DF, Zhao LL, et al. Genome sequencing of 161 Mycobacterium tuberculosis isolates from China identifies genes and intergenic regions associated with drug resistance.
  • Hicks ND, Yang J, Zhang XB, et al. Clinically prevalent mutations in Mycobacterium tuberculosis alter propionate metabolism and mediate multidrug tolerance. Nat Microbiol. 2018;3(9):1032–1042. doi:10.1038/s41564-018-0218-3
  • Yang CG, Lu LP, Warren JL, et al. Internal migration and transmission dynamics of tuberculosis in Shanghai, China: an epidemiological, spatial, genomic analysis. Lancet Infect Dis. 2018;18(7):788–795. doi:10.1016/S1473-3099(18)30218-4
  • Luo T, Comas I, Luo D, et al. Southern East Asian origin and coexpansion of Mycobacterium tuberculosis Beijing family with Han Chinese. Proc Natl Acad Sci USA. 2015;112(26):8136–8141. doi:10.1073/pnas.1424063112
  • Jiang Q, Liu QY, Ji LC, et al. Citywide transmission of multidrug-resistant tuberculosis under China's rapid urbanization: a retrospective population-based genomic spatial epidemiological study. Clin Infect Dis. 2019:71(1):142–151.
  • RepeatMasker Home Page: RepeatMasker Open-4.0. http://www.repeatmasker.org/.
  • Cingolani P, Platts A, Wang LL, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6(2):80–92.
  • Phelan JE, O'Sullivan DM, Machado D, et al. Integrating informatics tools and portable sequencing technology for rapid detection of resistance to anti-tuberculous drugs. Genome Med. 2019;11(1):41. doi:10.1186/s13073-019-0650-x
  • Walker TM, Miotto P, Köser CU, et al. The 2021 WHO catalogue of mycobacterium tuberculosis complex mutations associated with drug resistance: a genotypic analysis. Lancet Microbe. 2022;3(4):e265–e273. doi:10.1016/S2666-5247(21)00301-3
  • Nguyen LT, Schmidt HA, Von Haeseler A, et al. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–274. doi:10.1093/molbev/msu300
  • GitHub - neherlab/treetime: Maximum likelihood inference of time stamped phylogenies and ancestral reconstruction. https://github.com/neherlab/treetime.
  • Zelner JL, Murray MB, Becerra MC, et al. Identifying hotspots of multidrug-resistant tuberculosis transmission using spatial and molecular genetic data. J Infect Dis. 2016;213(2):287–294. doi:10.1093/infdis/jiv387
  • Guerra-Assunção JA, Fine PEM, Crampin AC, et al. Large-Scale whole genome sequencing of M. tuberculosis provides insights into transmission in a high prevalence area. Elife. 2015;2015:1–17.
  • Walker TM, Ip CLC, Harrell RH, et al. Whole genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect Dis. 2013;13:137–146. doi:10.1016/S1473-3099(12)70277-3
  • Jajou R, de Neeling A, van Hunen R, et al. Epidemiological links between tuberculosis cases identified twice as efficiently by whole genome sequencing than conventional molecular typing: a population-based study. PLoS One. 2018;13:1–11.
  • Klinkenberg D, Backer J. Xavier Didelot, Caroline Colijn, JaccoWallinga bioRxiv 069195.
  • Seabold S, Perktold J. Statsmodels: econometric and statistical modeling with python). Proceedings of the 9th python in science conference, p. 92–96; 2010. doi:10.25080/Majora-92bf1922-011
  • Sagulenko P, Puller V, Neher RA. Treetime: maximum-likelihood phylodynamic analysis. Virus Evol. 2018;4(1). doi:10.1093/ve/vex042
  • Comas I, Borrell S, Roetzer A, et al. Whole-genome sequencing of rifampicin-resistant Mycobacterium tuberculosis strains identifies compensatory mutations in RNA polymerase genes. Nat Genet. 2011 Dec 18;44(1):106–110. doi:10.1038/ng.1038
  • Purcell S, Neale B, Todd-Brown K, et al. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet. 2007 Sep;81(3);559–575. doi:10.1086/519795.
  • Zhdanova S, Heysell SK, Ogarkov O, et al. Primary multidrug-resistant Mycobacterium tuberculosis in 2 regions, Eastern Siberia, Russian Federation. Emerg Infect Dis. 2013;19(10):1649–1652. doi:10.3201/eid1910.121108
  • Alame Emane AK, Guo X, Takiff HE, et al. Highly transmitted M. tuberculosis strains are more likely to evolve MDR/XDR and cause outbreaks, but what makes them highly transmitted? Tuberculosis. 2021;129:102092.
  • Merker M, Blin C, Mona S, et al. Evolutionary history and global spread of the Mycobacterium tuberculosis Beijing lineage. Nat Genet. 2015;47(3):242–249. doi:10.1038/ng.3195
  • Dantas NGT, Suffys PN, Carvalho W da S, et al. Genetic diversity and molecular epidemiology of multidrug-resistant Mycobacterium tuberculosis in Minas Gerais State, Brazil. BMC Infect Dis. 2015;15(1):306.
  • Mou J, Griffiths SM, Fong HF, et al. Defining migration and its health impact in China. Public Health. 2015;129(10):1326–1334. doi:10.1016/j.puhe.2014.01.010
  • Dhavan P, Dias HM, Creswell J, et al. An overview of tuberculosis and migration. Int J Tuberc Lung Dis. 2017;21(6):610–623. doi:10.5588/ijtld.16.0917
  • Jia ZW, Jia XW, Liu YX, et al. Spatial analysis of tuberculosis cases in migrants and permanent residents, Beijing, 2000-2006. Emerg Infect Dis. 2008;14(9):1413–1419.
  • Wang L, Liu J, Chin DP. Progress in tuberculosis control and the evolving public-health system in China. Lancet. 2007;369(9562):691–696. doi:10.1016/S0140-6736(07)60316-X
  • DeRiemer K, García-García L, Bobadilla-del-Valle M, et al. Does DOTS work in populations with drug-resistant tuberculosis? Lancet. 2005;365(9466):1239–1245. doi:10.1016/S0140-6736(05)74812-1
  • Chakaya J, Khan M, Ntoumi F, et al. Global tuberculosis report 2020 - reflections on the global TB burden, treatment and prevention efforts. Int J Infect Dis. 2021;113(Suppl 1):S7–S12. doi:10.1016/j.ijid.2021.02.107