Microbiology (Medical)

Metagenomics approaches for studying the human microbiome

Article: 2350166 | Received 01 Mar 2023, Accepted 02 Mar 2024, Published online: 09 May 2024

Abstract

The human body harbors an extremely complex and dynamic microbial community of 10–100 trillion bacteria, archaea, viruses, and eukaryotes. Under normal circumstances, the human microbiota plays a crucial role in the environment and human health; dysfunction of the human microbiome, however, has been associated with illnesses ranging from inflammatory bowel disease to multidrug-resistant infections. Culture-dependent approaches are inadequate for discovering the diversity, abundance, and full genetic and metabolic potential of the whole microbial ecosystem. Metagenomics analysis is a powerful technology for comprehensively studying human microbial composition and diversity and for exploring novel and antibiotic resistance genes, microbial metabolic pathways, functional dysbiosis, and co-evolution of the microbiome with the host. Owing to the increase in human microbiome sequencing projects in healthy and diseased people worldwide, it is now feasible to explore the human microbiome using a metagenomics approach. This review focuses on the advancement of metagenomics for exploring the human microbiome and on human microbiome metagenomics data processing and analysis strategies. Currently used bioinformatics tools and their approaches are also discussed in the context of human microbial metagenomics.

Introduction

The human body harbors an extremely complex and dynamic microbial community (10–100 trillion cells) that includes bacteria, archaea, fungi, protists, and viruses (bacteriophages) (Jovel et al. 2018). Most of these microbes live in the gut, particularly in the large intestine, where they play a fundamental role in well-being. Because culture-dependent methods can explore only part of the human microbiome, powerful culture-independent methodologies such as metagenomics have recently been developed; these improve the discovery of the human microbiome through global analysis of ecosystems (Nogueira and Botelho 2021).

Currently, there are two advanced sequence analysis technologies. Amplicon sequencing (16S rDNA for bacteria and archaea, ITS sequencing for fungi, and 18S rDNA for protists) reveals ‘which microorganisms are there?’ (taxonomic diversity) in a given microbial community, whereas shotgun metagenomics sequencing sequences the whole genomic content of the sample and mainly answers the question ‘what are they doing?’ (Quince et al. 2017). Together, taxonomic profiling and functional analysis can address taxonomic and community composition, the diversity and similarity of functional profiles, analysis of variation, and differential abundance (Wang et al. 2015).
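The diversity and similarity measures mentioned above can be illustrated with a minimal sketch. It assumes that each sample is summarized as a table of read counts per taxon; the sample names and counts are invented for illustration, and the Shannon index and Bray-Curtis dissimilarity are common choices rather than measures prescribed by the cited studies.

```python
import math


def relative_abundance(counts):
    """Convert raw taxon counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(counts):
    """Within-sample (alpha) diversity: H = -sum(p * ln p) over observed taxa."""
    props = relative_abundance(counts)
    return -sum(p * math.log(p) for p in props.values() if p > 0)


def bray_curtis(counts_a, counts_b):
    """Between-sample (beta) diversity on relative abundances.

    Because each profile sums to 1, Bray-Curtis reduces to
    1 - sum(min(p_a, p_b)); 0 means identical profiles, 1 means no shared taxa.
    """
    pa, pb = relative_abundance(counts_a), relative_abundance(counts_b)
    shared = sum(min(pa.get(t, 0.0), pb.get(t, 0.0)) for t in set(pa) | set(pb))
    return 1.0 - shared


# Hypothetical read counts per genus for two samples (illustrative only).
gut_sample = {"Bacteroides": 60, "Faecalibacterium": 30, "Escherichia": 10}
skin_sample = {"Staphylococcus": 70, "Cutibacterium": 25, "Escherichia": 5}

print("Shannon (gut):", round(shannon_index(gut_sample), 3))
print("Bray-Curtis (gut vs skin):", round(bray_curtis(gut_sample, skin_sample), 3))
```

Working on relative abundances removes differences in sequencing depth between samples, which is one simple form of the normalization discussed later in this review.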

The National Institutes of Health Road Map for Biomedical Research financed the Human Microbiome Project (HMP) in 2007 as a sizable, genome-scale community research project (Gevers et al. 2012). The project's primary goal was to create a baseline database of the microorganisms that are commonly present in and on human hosts, including descriptions of their typical phylogenetic, taxonomic, biogeographic, ecological, metabolic, and functional patterns. This baseline information is important for studying the diversity, novel and antibiotic resistance genes, and pathways of the human microbiota using amplicon (specific known marker genes) and shotgun metagenomics (entire metagenome) approaches (Gordon et al. 2007). The development of big data and bioinformatics has become one of the main engines fueling the burgeoning microbiome field, yet no scholarly journal specifically addressed this intersection. In this context, iMeta was established to support academic work that integrates big data, state-of-the-art bioinformatics approaches, and microbiome research (Liu et al. 2022).

This paper reviews advances in metagenomics applications for human microbiome exploration, mainly gene amplicon and shotgun metagenomics sequencing approaches, the data processing workflow, and analysis strategies with bioinformatics tools.

Microbiome association in human health and disease mechanism

The human body harbors approximately 10–100 trillion microbial cells that form extremely complex and dynamic communities of bacteria, archaea, fungi, protists, and viruses (bacteriophages) (Althani et al. 2016). The greatest microbial diversity is observed in the gut. The interaction between the human microbiota and human health is still not well understood, but many epidemiological studies (Figure 1) (Krishna et al. 2019) show that a global decline in the diversity of the human microbiome is associated with various human diseases such as gastrointestinal cancers, diabetes, metabolic diseases, inflammatory bowel disease, respiratory diseases, and irritable bowel syndrome (Bakhtiar et al. 2013; Malla et al. 2019).

Figure 1. Representation of the human-associated microbiome (Krishna et al. 2019).


These findings suggest that the human microbiome plays important roles, including defending the host against microbial attack, regulating metabolic processes, producing immunomodulatory metabolites, maintaining the immune system, synthesizing essential nutrients for the body, breaking down toxins and drugs, and promoting mucosal structure and function (Rooney et al. 2020).

A metagenomics approach can identify microbiome-based associations with diseases such as adenoma, colorectal cancer (CRC), liver cirrhosis (CIRR), inflammatory bowel disease (IBD), lung infections, respiratory RNA viral infections (Picornaviridae, Coronaviridae, Paramyxoviridae, and Orthomyxoviridae), type 2 diabetes (T2D), bloodstream infections, otitis, atherosclerotic cardiovascular disease (ACVD), and central nervous system infections, with a novel emphasis on gene-level, cross-disease associations. However, metagenomics in the medical diagnosis of pathogenic microorganisms faces many challenges: it also detects dead or dormant microorganisms, which can lead to incorrect decisions; an unsuitable cell wall disruption method can produce false-negative results in intracellular bacterial infections; and DNA from environmental microorganisms or the host can contaminate samples before data analysis (Tierney et al. 2021).

Metagenomics application in exploration of human microbiome

Countless species of the human microbiome remain poorly characterized, and challenging questions are still open: what these microbes are and what they do, how our association with them has evolved, which forces shape it, how this co-evolution affects our health, and how changes in the biosphere may affect it. Recently, powerful culture-independent methodologies, such as metagenomics and next-generation sequencing (NGS) technology, have been developed to facilitate the discovery of the human microbiome through global analysis of ecosystems (Allaband et al. 2019). Metagenomics-based exploration of human microbes (identification and genotyping) uses two main strategies (Figure 2) (Chiu and Miller 2019): targeted metagenomics (target gene amplification) and shotgun metagenomics (whole-genome shotgun sequencing) (Morgan et al. 2013).

Figure 2. General metagenomics strategy for human microbial discovery and analysis (Chiu and Miller 2019). Figure reproduced with permission from Chiu Charles (2024).


Targeted gene amplification (16S rRNA) based human microbial exploration

Currently, the targeted metagenomics (16S/18S/ITS amplicon sequencing) approach is well established (Gao et al. 2021) and has become an increasingly common method for microbial identification as well as phylogeny and taxonomy studies of samples from complex microbiomes or environments. The small subunit ribosomal RNA (16S rRNA) gene is a housekeeping genetic marker of approximately 1,500 bp. It contains highly conserved regions, which are shared by all bacteria and serve as targets for universal primers, and nine hypervariable regions (V1–V9), which differ between bacterial taxa and allow classification (Kameoka et al. 2021).

Universal primers have therefore been designed against the 16S rRNA gene, which comprises conserved, variable, and hypervariable regions, and are used to study bacterial phylogeny and taxonomy. Most 16S rRNA-based microbiota profiling and genotyping protocols amplify and sequence the V1–V2, V3–V4, V3–V5, V4, V6–V9, or V1–V9 hypervariable regions using NGS technologies as a universal taxonomic marker to identify and catalog microbial profiles (Mancabelli et al. 2020). Of these, the V1–V2 and V1–V3 regions are the most reliable regions in the full-length 16S rRNA sequences, whereas most V3 to V6 regions (including V3, V4, V5, V6, V3–V4, V4–V5, V3–V6, V4–V6, and V5–V6) are more closely aligned with the SILVA SSU Ref 123NR database. Overall, V4 is the most prominent variable region of 16S rRNA for achieving good domain specificity, higher coverage, and a broader spectrum in the bacterial domain (Zhang et al. 2019). The V4 region has been suggested by the MetaHIT consortium as the gold standard 16S rRNA region for general human microbial community assessment across a range of very different environments (Osman et al. 2018). The V3 region is optimal for community profiling of archaea, and other regions, such as V1–V2 and V3–V4, have been suggested for genotyping archaeal species in complex microbial communities (Bharti and Grimm 2021).

Fungal rDNA comprises coding regions (the 18S, 5.8S, and 28S rRNA genes), noncoding internal transcribed spacer regions (ITS1 and ITS2, which flank the 5.8S gene), and intergenic spacer sequences; the large subunit (LSU, 25–28S) gene and its D1/D2 domain (26S rDNA) are further marker regions. The variable regions of the internal transcribed spacer have been recommended for genotyping fungal species (Impullitti and Malvick 2013).

For microbiome profiling, the 16S rRNA gene (the gold standard for typing bacteria and archaea) or the 18S rRNA/ITS region (for fungi) is first amplified by PCR with universal primers (Table 1) that anneal to conserved regions, and the amplicons are then sequenced. The sequencing data are analyzed with freely available tools for taxonomic classification, such as Quantitative Insights into Microbial Ecology (QIIME), Mothur, DADA2, Phyloseq, and METAGENassist (Shahi et al. 2019), in three important steps: data pre-processing and quality management, taxonomic profiling, and community characterization, in which the variable regions are used to discriminate between bacterial taxa (Matsuo et al. 2021).
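The idea that universal primers anneal to conserved regions and flank a variable region can be sketched as an in-silico PCR. The example below is a simplified illustration, not a primer design tool: the template and both primers are short, invented sequences, matching is exact apart from IUPAC degenerate bases, and mismatch tolerance and melting temperature are ignored.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement that also handles degenerate bases."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_amplicon(template, forward, reverse):
    """Return the region between the two primer sites (primers excluded).

    The reverse primer binds the opposite strand, so its reverse complement
    is searched for downstream of the forward primer. Returns None when
    either primer site is missing.
    """
    fwd = re.search(primer_regex(forward), template)
    if fwd is None:
        return None
    downstream = template[fwd.end():]
    rev = re.search(primer_regex(reverse_complement(reverse)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


# Hypothetical toy sequences: two "conserved" primer sites around a variable insert.
FORWARD_PRIMER = "ACGYTG"   # invented, Y matches C or T
REVERSE_PRIMER = "TTCRAG"   # invented, R matches A or G
TEMPLATE = "GGGACGCTGTTTAAACCCCTTGAAGGG"

print(extract_amplicon(TEMPLATE, FORWARD_PRIMER, REVERSE_PRIMER))
```

In practice the extracted region would correspond to a hypervariable region such as V4, and primer removal is usually handled by dedicated trimming software.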

Table 1. Standard 16S/18S/ITS amplicon metagenomics sequencing platform.

Presently, there are two approaches for analyzing amplicon sequence data: operational taxonomic unit (OTU) analysis, and amplicon sequence variant (ASV) or zero-radius OTU (zOTU) analysis, which has been proposed as an alternative to OTUs because it corrects for sequencing errors using various de-noising methods (Maruyama et al. 2020). In OTU-based analysis, similar sequences are clustered into OTUs at a threshold of 97% sequence similarity. After clustering, sequences are subjected to microbial taxonomic profiling by comparison with well-known 16S rRNA databases such as Greengenes (GG), SILVA, GenBank (http://www.ncbi.nlm.nih.gov), Ribosomal Database Project (RDP) II (http://rdp.cme.msu.edu), Genomic-based 16S rRNA Database (GRD), and All-Species Living Tree (LTP) (Abellan-Schneyder et al. 2021).
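Clustering at 97% similarity can be sketched with a greedy, abundance-sorted procedure. This is a simplified illustration under stated assumptions: reads are assumed to be trimmed to the same length so that identity can be computed position by position, whereas real tools compute identity from alignments; the function names and toy reads are invented for this example.

```python
from collections import Counter


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(reads, threshold=0.97):
    """Cluster reads into OTUs around abundance-sorted centroids.

    Unique sequences are visited from most to least abundant. Each one joins
    the first centroid it matches at >= threshold identity; otherwise it
    becomes the centroid of a new OTU. Returns {centroid: total read count}.
    """
    otus = {}
    for seq, count in Counter(reads).most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count
                break
        else:
            otus[seq] = count
    return otus


# Toy 100-bp reads (illustrative only).
base = "ACGT" * 25
close_variant = "TT" + base[2:]        # 2 mismatches, 98% identical to base
far_variant = "T" * 10 + base[10:]     # 8 mismatches, 92% identical to base
toy_reads = [base] * 5 + [close_variant] * 2 + [far_variant] * 3

for centroid, size in greedy_otu_clustering(toy_reads).items():
    print(centroid[:12] + "...", size)
```

ASV methods differ in that they model and remove sequencing errors instead of applying a fixed similarity radius, so single-nucleotide variants can be kept apart.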

Comparison of complete 16S rRNA gene sequences with software packages such as BLAST and CLUSTAL X is widely used to establish taxonomic relationships between microbiota, with 98.65% similarity now accepted as the cut-off for distinguishing species. To investigate the relationships between the microbiota in a community, phylogenetic trees can be created using the EzTaxon server (http://www.ezbiocloud.net/eztaxon), MEGA11.0, PHYLIP, and EzEditor (Yoon et al. 2017).

Amplicon (targeted) metagenomics approaches are fast and low-cost and are supported by curated reference databases. They are widely used for taxonomic classification (what is there), for recognizing novel pathogens, and for determining the abundance (how much) of microbial species in complex samples. The computational approaches PICRUSt and Genometraits have been successfully used to predict the functional composition of metagenomes using marker genes and a reference genome database (Ji and Nielsen 2015). Similarly, 18S rRNA and ITS sequencing follow the same strategy but target the 18S gene and the ITS region found in the genomes of fungi and other eukaryotic organisms. Data from the ITS region are advantageous for fungal phylogenetic analysis because this region is more variable than the 18S ribosomal sequence. The standard and recommended parameters for designing primers for 16S/18S/ITS amplicon metagenomics sequencing, according to https://en.novogene.com/services/research-services/metagenome-sequencing/16s-18s-itsamplicon-metagenomic-sequencing/, are listed in Table 1.

Shotgun metagenomics approaches for discovering human microbiome

Shotgun metagenomics is a culture-independent, high-throughput sequencing method that examines all microbes in an environmental sample by directly sequencing their total DNA (the metagenome). It is a rapid and powerful tool for obtaining the genetic information of all organisms within a microbial community (Joseph and Pe’er 2021). The technique is complex and more expensive, but it is useful for understanding both which microbes are present in the community and what their functional roles are (Dai et al. 2018), because it provides insight into taxonomic profiles and the functional potential of the microbial community. It also provides genetic information on potentially novel biocatalysts, genomic relationships between function and phylogeny for uncultured organisms, evolutionary profiles, and the function and composition of the community, using functionally annotated databases or gene catalogs such as SEED or KEGG (Kyoto Encyclopedia of Genes and Genomes), which compile the gene products of biological processes and microbial metabolic pathways (Mitra et al. 2011).

Metagenomics studies have also been used to explore novel antibiotic resistance genes (ARGs) and their evolution in unknown, uncultivable bacteria by detecting the antibiotic resistance profile of the microbial community (Wang et al. 2020) through alignment-based homology searches against ARG reference databases, including ARDB, SARG, CARD, and ResFinder (Berglund et al. 2019). Typically, after the initial study design, a shotgun metagenomics study includes five main steps: (I) collection, processing, and sequencing of the samples; (II) pre-processing of the sequencing reads; (III) sequence analysis to profile taxonomic, functional, and genomic features of the microbiome; (IV) statistical and biological post-processing analysis; and (V) validation (Quince et al. 2017).

Human microbiome metagenomics data processing and analysis using bioinformatics tools

Analyzing the metagenome of the human microbiome using NGS methods has significantly improved our understanding of microbial diversity and abundance, microbe-associated metabolic pathways, and the biological and ecological functions of microbiome communities (Bansal and Boucher 2019). However, microbiome analysis is affected by experimental conditions and by computationally intensive downstream analyses, which can introduce biases and errors. As Figure 3 shows (Martin et al. 2018, with some modification), each step introduces technical sources of variation, such as sampling and storage methods, DNA extraction, or library preparation protocols. These can be reduced by adopting standardized procedures and by calibrating measurements and normalizing data, which allows comparison among samples by adjusting for individual technical variability (Sinha et al. 2015).

Figure 3. Basic metagenomics data analysis steps and currently used bioinformatics tools (Martin et al. 2018).


Bioinformatics software and tools are emerging that explore the taxonomic and functional composition of diverse metagenomes by translating raw sequences into meaningful data. The general workflow and the most popular bioinformatics tools and algorithms for the analysis of human microbial metagenomics data are summarized in Table 2.

Table 2. Metagenomics data analysis steps and popular bioinformatics tools with their functions

Collection, processing, and sequencing of the samples

A standard metagenomics experiment comprises the following steps: collection of the human microbiome sample, extraction of total DNA, library preparation, and sequencing. Both the quality and accuracy of the metagenomics data can be affected by the protocols used for sample collection, preservation, and genome extraction. Therefore, these protocols must be effective for diverse microbial taxa; otherwise, sequencing results may represent only DNA derived from easy-to-lyse microbes (D’Argenio et al. 2014).

Total genomic DNA is collected and extracted from human microbiome samples with reasonable coverage of all microbial genomes present (DNA ≥ 3 μg; concentration ≥ 30 ng/μl; volume ≥ 30 μl; OD260/280 = 1.8–2.0). Frequently used DNA extraction kits for environmental samples include the Qiagen DNA Microbiome Kit, MoBIO DNA Extraction Kit, Epicenter Meta-G-Nome DNA Isolation Kit, and Epicenter Metagenomics DNA Isolation Kit for water (Bryanskaya et al. 2021; Rahman et al. 2022). The sequencing library is created by fragmenting DNA into smaller pieces, followed by ligation of adapter sequences to the 5′ and/or 3′ ends of each DNA fragment. The final steps include library clean-up, amplification, and quantification, after which the library is ready for sequencing. The most popular kits for library construction are the Bioo Scientific NEXTflex PCR Free DNA Sequencing Kit, Illumina TruSeq PCR Free Library Preparation Kit, and Kapa Hyper Prep Kit (Simper et al. 2022).
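The sample requirements quoted above can be checked with simple arithmetic, because total DNA mass in μg equals concentration (ng/μl) times volume (μl) divided by 1,000. The sketch below is illustrative only: the class, function names, and example values are invented, and the thresholds are taken directly from the text.

```python
from dataclasses import dataclass


@dataclass
class DnaSample:
    concentration_ng_per_ul: float
    volume_ul: float
    od260_280: float


def total_mass_ug(sample):
    """Total DNA mass in micrograms: ng/ul * ul = ng, divided by 1000."""
    return sample.concentration_ng_per_ul * sample.volume_ul / 1000.0


def qc_failures(sample):
    """Return a list of unmet requirements (an empty list means the sample passes)."""
    failures = []
    if total_mass_ug(sample) < 3.0:
        failures.append(f"total DNA {total_mass_ug(sample):.2f} ug is below 3 ug")
    if sample.concentration_ng_per_ul < 30.0:
        failures.append("concentration is below 30 ng/ul")
    if sample.volume_ul < 30.0:
        failures.append("volume is below 30 ul")
    if not 1.8 <= sample.od260_280 <= 2.0:
        failures.append("OD260/280 is outside 1.8-2.0")
    return failures


# Hypothetical samples (illustrative values only).
borderline = DnaSample(concentration_ng_per_ul=30, volume_ul=30, od260_280=1.9)
adequate = DnaSample(concentration_ng_per_ul=60, volume_ul=50, od260_280=1.85)

print(qc_failures(borderline))
print(qc_failures(adequate))
```

A sample at exactly the minimum concentration and minimum volume holds only 0.9 μg of DNA, so the total-mass requirement has to be checked separately from the other two.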

Metagenomics sequence data pre-processing and quality control methods (de-duplication, trimming, and decontamination)

Quality control (QC) techniques (deduplication, trimming, and decontamination) filter low-quality reads, sequencing and other technical artifacts, adapter sequences, and host-associated contaminating reads out of the raw data. Deduplication, which removes unwanted duplicate reads, is an essential QC step in metagenomics studies (Turner et al. 2011).
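Deduplication can be sketched in a few lines. The example assumes that a duplicate is a read whose sequence is exactly identical to one already seen; dedicated tools may also consider reverse complements, read pairs, or near-identical reads. The read tuples below are invented.

```python
def deduplicate(reads):
    """Keep the first read for each distinct sequence.

    `reads` is an iterable of (read_id, sequence, quality) tuples.
    """
    seen = set()
    unique = []
    for read_id, sequence, quality in reads:
        if sequence in seen:
            continue  # exact duplicate of an earlier read
        seen.add(sequence)
        unique.append((read_id, sequence, quality))
    return unique


# Hypothetical reads (illustrative only).
raw_reads = [
    ("read1", "ACGTACGT", "IIIIIIII"),
    ("read2", "ACGTACGT", "IIIIHHHH"),  # same sequence as read1
    ("read3", "TTGACCAA", "IIIIIIII"),
]

kept = deduplicate(raw_reads)
print([read_id for read_id, _, _ in kept])
```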

Primers, adapters, and low-quality bases also affect read quality but can be identified and removed by trimming software. For shotgun metagenomics data, it is recommended to select trimming software (cutadapt or fastqMcf) that can remove low-quality bases from both termini of each sequence. For 16S rRNA gene sequence data, it is recommended to trim sequences along the entire length, beginning from the 5’ end, with a high quality threshold (Karlsson et al. 2014). FastQC is the most popular software for data quality checks, and KneadData is widely used for quality control and removal of host DNA from metagenome sequences. Other tools such as DeconSeq, Trimmomatic, sickle, and BBTools are also broadly applicable for quality trimming and contamination exclusion, relying on aligners such as Bowtie and BWA to identify contaminating reads. Generally, read length, base quality (Phred score), ambiguous base calls, homopolymers, sequence complexity, and GC content are the basic parameters for measuring the quality of metagenomics data (Patel and Jain 2012).
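Trimming low-quality bases from both termini and computing GC content can be illustrated as follows. The sketch assumes Phred+33 encoded quality strings and a quality threshold of 20; both are common conventions chosen here for illustration, not settings taken from the cited tools, and the read is invented.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def trim_both_ends(sequence, quality, min_quality=20):
    """Remove bases below min_quality from the 5' and 3' termini only."""
    scores = phred_scores(quality)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_quality:
        start += 1
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], quality[start:end]


def gc_content(sequence):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence.upper()) / len(sequence)


# Hypothetical read: '#' encodes Phred 2 and 'I' encodes Phred 40.
read_seq = "NNACGTGGCCAT"
read_qual = "##IIIIIIII##"

trimmed_seq, trimmed_qual = trim_both_ends(read_seq, read_qual)
print(trimmed_seq, len(trimmed_seq), gc_content(trimmed_seq))
```

Read length after trimming, the number of ambiguous bases (N), and GC content computed this way correspond to several of the quality parameters listed above.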

Taxonomic binning and profiling (clustering) of bacterial sequences

Taxonomic binning is another analysis step, which clusters sequence reads into groups based on their sequence similarity (97%, using BLAST, MEGAN, or MG-RAST) and/or composition and assigns them to specific taxa or operational taxonomic units (phylotypes). There are two main approaches to OTU clustering: de novo clustering (taxonomy independent), which compares all sequences with each other without using taxonomic information from a reference database, and reference-based clustering (taxonomy dependent), which searches sequences against a 16S rRNA reference database (usually Greengenes, RDP, or SILVA) and discards sequences that fail to match it (Martin et al. 2018). The OTUs are then compared against a reference database (NCBI RefSeq, KEGG) or specific microbial databases, such as the gene catalogs from MetaHIT and MEDUSA, to assign taxonomic classification and to assess diversity within and between samples (Qian et al. 2020).
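The reference-based approach, in which reads that fail to match the database are discarded, can be sketched as follows. This is a toy illustration under stated assumptions: the reference "database" holds two invented 100-bp sequences with hypothetical taxon labels, and identity is computed position by position on equal-length sequences instead of by alignment.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference_assign(reads, reference, threshold=0.97):
    """Assign each read to its best-matching reference taxon.

    Reads whose best hit falls below the threshold are discarded.
    Returns ({taxon: read count}, number of discarded reads).
    """
    counts = {}
    discarded = 0
    for read in reads:
        best_taxon, best_identity = None, 0.0
        for taxon, ref_seq in reference.items():
            score = identity(read, ref_seq)
            if score > best_identity:
                best_taxon, best_identity = taxon, score
        if best_taxon is not None and best_identity >= threshold:
            counts[best_taxon] = counts.get(best_taxon, 0) + 1
        else:
            discarded += 1
    return counts, discarded


# Hypothetical reference database and reads (illustrative only).
REFERENCE = {"Taxon_A": "ACGT" * 25, "Taxon_B": "GGCC" * 25}
query_reads = [
    "TT" + REFERENCE["Taxon_A"][2:],  # 98% identical to Taxon_A
    REFERENCE["Taxon_B"],             # exact match to Taxon_B
    "AT" * 50,                        # matches nothing well, so it is discarded
]

print(closed_reference_assign(query_reads, REFERENCE))
```

The discarded count makes the main trade-off visible: reference-based clustering is fast and gives comparable labels across studies, but organisms absent from the database are lost, whereas de novo clustering keeps them.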

MetaPhlAn2 and HUMAnN2 are popular bioinformatics tools for taxonomic classification and profiling of metabolic pathways in the human microbiome. Currently, at least 20 bioinformatics software tools (Table 2) are used for taxonomic classification of metagenomic sequence data; among them, the most recommended is the ultra-fast classifier Kraken 2, which provides reliable, accurate, and fast results at the species level (de Sá et al. 2018).

Sequence assembly/data validation and building consensus sequence

Sequence assembly is a technique for aligning and merging sequence reads to reconstruct the original genomic sequence, which allows new species and novel functional gene sequences to be discovered in microbiome genomes. Metagenomics datasets are composed of a mixture of reads belonging to multiple organisms with various levels of taxonomic relatedness. Assembly can be either de novo (combining reads into contiguous sequences without using a reference genome) or reference-based (using known sequences from the organism itself or from the phylogenetically closest organism). Several efficient software tools, such as MEGAHIT, metaSPAdes, RayMeta, MetaVelvet, IDBA-UD, SOAPdenovo2, and Omega, have been developed to assemble clean metagenomic reads into contigs, and genes are then predicted from the contigs using MetaProdigal, MetaGeneAnnotator, MetaGeneMark, Glimmer-MG, MetaGUN, FragGeneScan, Orphelia, and Prokka (Martin 2011; Qian et al. 2020).
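The principle of de novo assembly, merging reads through their overlaps without a reference, can be shown with a toy greedy overlap merger. This is a teaching sketch only: the assemblers named above rely on more sophisticated graph-based methods and error handling, and the reads below are invented, error-free fragments of a single short sequence.

```python
def overlap(a, b, min_length):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_length` bases exists.
    """
    for k in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Hypothetical error-free reads drawn from the toy sequence ATGGCGTACGTTAGC.
toy_reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(toy_reads))
```

With real metagenomic data, sequencing errors, repeats, and closely related strains make purely greedy merging unreliable, which is one reason specialized metagenome assemblers exist.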

Functional (metagenome) annotation

Functional annotation is the in silico process of identifying functional elements along a genome sequence and describing the function of each predicted gene product using bioinformatics tools and various reference databases. It operates at three levels: nucleotide-level annotation (identifying the physical location of DNA sequences); protein-level annotation (determining the possible functions of genes by comparison with reference protein sequence databases, such as UniProt, NCBI RefSeq, and SMART, using homology-based tools); and process-level annotation (identifying the pathways and processes in which different genes interact) (Qian et al. 2020).

In genome annotation, the sequence reads are mapped to a reference using mapping algorithms, including the BLAST suite or faster alternatives (SMALT, Bowtie2, BWA, DIAMOND, and GEM Mapper), to query gene and protein databases. To perform functional annotation, the numbers of reads mapped to genes and proteins are transformed into tables that represent coverage and abundance. These tables are then integrated with information from publicly available database resources that cover various functional properties: SEED, KEGG, MetaCyc, and HUMAnN to assess the presence/absence and abundance of metabolic pathways; ARDB, CARD, and Resfams to determine antibiotic resistance genes; and protein family annotations (PFAM), gene ontology (GO), and clusters of orthologous groups (COG) for microbial diversity (Truong et al. 2016). The most popular bioinformatics software programs and tools for metagenomics data analysis are listed in Table 2.
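The step from mapped read counts to abundance tables and pathway summaries can be sketched as follows. The example assumes a simple normalization (reads per kilobase of gene length, rescaled to proportions) and a plain gene-to-pathway lookup; the gene names, lengths, counts, and pathway labels are invented, and real pipelines such as HUMAnN apply more elaborate models.

```python
from collections import defaultdict


def gene_abundance(read_counts, gene_lengths_bp):
    """Length-normalize mapped read counts and rescale them to sum to 1.

    Dividing by gene length (in kb) prevents long genes from appearing
    more abundant merely because they attract more reads.
    """
    per_kb = {gene: read_counts[gene] / (gene_lengths_bp[gene] / 1000.0)
              for gene in read_counts}
    total = sum(per_kb.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {gene: value / total for gene, value in per_kb.items()}


def pathway_abundance(abundance, gene_to_pathways):
    """Sum gene abundances over the pathways each gene belongs to."""
    totals = defaultdict(float)
    for gene, value in abundance.items():
        for pathway in gene_to_pathways.get(gene, []):
            totals[pathway] += value
    return dict(totals)


# Hypothetical inputs (illustrative only).
mapped_reads = {"geneA": 100, "geneB": 100, "geneC": 50}
gene_lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
annotation = {"geneA": ["pathway_1"], "geneB": ["pathway_1"], "geneC": ["pathway_2"]}

abundances = gene_abundance(mapped_reads, gene_lengths)
print(abundances)
print(pathway_abundance(abundances, annotation))
```

Presence or absence of a pathway can then be read from whether its summed abundance is above zero, and the same table structure applies when the lookup points to resistance gene families instead of pathways.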

Furthermore, many useful tools for microbiome research have been developed for various applications, such as ggtree for phylogenetic visualization (Xu et al. 2022), ImageGP for data visualization (Chen et al. 2022), and ggClusterNet for network visualization and calculation (Wen et al. 2022).

Conclusion

The metagenomics approach holds enormous potential for exploring and identifying microbiome diversity and abundance, revealing novel antibiotic resistance genes and microbial pathways, and uncovering functional dysbiosis. With advances in metagenomics, it is now possible to assess the human microbiome as a whole ecosystem and to investigate the relationship between the microbiota and the host and the role of this interaction in host health. Realizing this promise requires cohesive microbial DNA extraction and purification techniques, improved computational algorithms, and comprehensive reference databases. Compared with bacteria, eukaryotes and viruses have been addressed by few metagenomics studies; future studies of the human microbiome will depend on metagenomics, and more effort in these areas is urgently needed.

Author contributions

BM and DY contributed to the study concept and design. BM and DY collected and sorted out the literature, wrote the first draft and edited. All authors contributed to the article and approved the submitted version.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

Data sharing does not apply to this article as no new data were created or analyzed in this study.

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.


References

  • Abellan-Schneyder I, Matchado MS, Reitmeier S, Sommer A, Sewald Z, Baumbach J, List M, Neuhaus K. 2021. Primer, pipelines, parameters: issues in 16S rRNA gene sequencing. Msphere. 6(1):e01202–e01220. doi:10.1128/mSphere.01202-20.
  • Allaband C, McDonald D, Vázquez-Baeza Y, Minich JJ, Tripathi A, Brenner DA, Loomba R, Smarr L, Sandborn WJ, Schnabl B. 2019. Microbiome 101: studying, analyzing, and interpreting gut microbiome data for clinicians. Clin Gastroenterol Hepatol. 17(2):218–230. doi:10.1016/j.cgh.2018.09.017.
  • Althani AA, Marei HE, Hamdi WS, Nasrallah GK, El Zowalaty ME, Al Khodor S, Al-Asmakh M, Abdel-Aziz H, Cenciarelli C. 2016. Human microbiome and its association with health and diseases. J Cell Physiol. 231(8):1688–1694. doi:10.1002/jcp.25284.
  • Anderson MJ, Walsh DC. 2013. PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: what null hypothesis are you testing? Ecol Monogr. 83(4):557–574. doi:10.1890/12-2010.1.
  • Bakhtiar SM, LeBlanc JG, Salvucci E, Ali A, Martin R, Langella P, Chatel J-M, Miyoshi A, Bermúdez-Humarán LG, Azevedo V. 2013. Implications of the human microbiome in inflammatory bowel diseases. FEMS Microbiol Lett. 342(1):10–17. doi:10.1111/1574-6968.12111.
  • Bansal V, Boucher C. 2019. Sequencing technologies and analyses: where have we been and where are we going? Vol. 18. Elsevier; p. 37–41.
  • Berglund F, Österlund T, Boulund F, Marathe NP, Larsson D, Kristiansson E. 2019. Identification and reconstruction of novel antibiotic resistance genes from metagenomes. Microbiome. 7(1):1–14. doi:10.1186/s40168-019-0670-1.
  • Bharti R, Grimm DG. 2021. Current challenges and best-practice protocols for microbiome analysis. Brief Bioinform. 22(1):178–193. doi:10.1093/bib/bbz155.
  • Bono H, Kasukawa T, Furuno M, Hayashizaki Y, Okazaki Y. 2002. FANTOM DB: database of functional annotation of RIKEN mouse cDNA clones. Nucleic Acids Res. 30(1):116–118. doi:10.1093/nar/30.1.116.
  • Bryanskaya AV, Shipova AA, Rozanov AS, Volkova OA, Lazareva EV, Uvarova YE, Goryachkovskaya TN, Peltek SE. 2021. Metagenomics dataset used to characterize microbiome in water and sediments of the lake Solenoe (Novosibirsk region, Russia). Data Brief. 34:106709. doi:10.1016/j.dib.2020.106709.
  • Buza TJ, McCarthy FM, Burgess SC. 2007. Experimental-confirmation and functional-annotation of predicted proteins in the chicken genome. BMC Gen. 8:1–10. doi:10.1186/1471-2164-8-1.
  • Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Peña AG, Goodrich JK, Gordon JI. 2010. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 7(5):335–336. doi:10.1038/nmeth.f.303.
  • Chen T, Liu YX, Huang L. 2022. ImageGP: an easy-to-use data visualization web server for scientific researchers. Imeta. 1(1):e5. doi:10.1002/imt2.5.
  • Chiu CY, Miller SA. 2019. Clinical metagenomics. Nat Rev Genet. 20(6):341–355. doi:10.1038/s41576-019-0113-7.
  • Dai D, Rhoads WJ, Edwards MA, Pruden A. 2018. Shotgun metagenomics reveals taxonomic and functional shifts in hot water microbiome due to temperature setting and stagnation. Front Microbiol. 9:2695. doi:10.3389/fmicb.2018.02695.
  • D’Argenio V, Casaburi G, Precone V, Salvatore F. 2014. Comparative metagenomic analysis of human gut microbiome composition using two different bioinformatic pipelines. BioMed Res Int. 2014.
  • de Sá PH, Guimarães LC, das Graças DA, de Oliveira Veras AA, Barh D, Azevedo V, da Silva ALdC, Ramos RT. 2018. Next-generation sequencing and data analysis: strategies, tools, pipelines and protocols. In Omics Technologies and Bio-Engineering (pp. 191–207). Elsevier.
  • Douglas GM, Beiko RG, Langille MG. 2018. Predicting the functional potential of the microbiome from marker genes using PICRUSt. Microbiome Anal: Methods Prot. 169–177. doi:10.1007/978-1-4939-8728-3_11.
  • Eaton WD, Hamilton DA. 2023. Enhanced carbon, nitrogen and associated bacterial community compositional complexity, stability, evenness, and differences within the tree-soils of Inga punctata along an age gradient of planted trees in reforestation plots. Plant Soil. 484(1-2):327–346. doi:10.1007/s11104-022-05793-8.
  • Franzosa EA, McIver LJ, Rahnavard G, Thompson LR, Schirmer M, Weingart G, Lipson KS, Knight R, Caporaso JG, Segata N. 2018. Species-level functional profiling of metagenomes and metatranscriptomes. Nat Methods. 15(11):962–968. doi:10.1038/s41592-018-0176-y.
  • Gao B, Chi L, Zhu Y, Shi X, Tu P, Li B, Yin J, Gao N, Shen W, Schnabl B. 2021. An introduction to next generation sequencing bioinformatic analysis in gut microbiome studies. Biomolecules. 11(4):530. doi:10.3390/biom11040530.
  • Gevers D, Pop M, Schloss PD, Huttenhower C. 2012. Bioinformatics for the human microbiome project. PLoS Comput Biol. 8(11):e1002779. doi:10.1371/journal.pcbi.1002779.
  • Gordon JI, Turnbaugh P, Ley R, Hamady M, Fraser-Liggett C, Knight R. 2007. The human microbiome project. Nature. 449(7164):804–810.
  • Impullitti A, Malvick D. 2013. Fungal endophyte diversity in soybean. J Appl Microbiol. 114(5):1500–1506. doi:10.1111/jam.12164.
  • Ji B, Nielsen J. 2015. New insight into the gut microbiome through metagenomics. Adv Genomics Genet. 5:77–91.
  • Joseph TA, Pe’er I. 2021. An introduction to whole-metagenome shotgun sequencing studies. Methods Mol Biol. 2243:107–122. doi:10.1007/978-1-0716-1103-6_6.
  • Jovel J, Dieleman LA, Kao D, Mason AL, Wine E. 2018. The human gut microbiome in health and disease. Metagenomics. 197–213.
  • Kameoka S, Motooka D, Watanabe S, Kubo R, Jung N, Midorikawa Y, Shinozaki NO, Sawai Y, Takeda AK, Nakamura S. 2021. Benchmark of 16S rRNA gene amplicon sequencing using Japanese gut microbiome data from the V1–V2 and V3–V4 primer sets. BMC Genomics. 22(1):1–10. doi:10.1186/s12864-021-07746-4.
  • Kang DD, Li F, Kirton E, Thomas A, Egan R, An H, Wang Z. 2019. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. 7:e7359. doi:10.7717/peerj.7359.
  • Karlsson FH, Nookaew I, Nielsen J. 2014. Metagenomic data utilization and analysis (MEDUSA) and construction of a global gut microbial gene catalogue. PLoS Comput Biol. 10(7):e1003706. doi:10.1371/journal.pcbi.1003706.
  • Krishna SB, Dubey A, Malla MA, Kothari R, Upadhyay CP, Adam JK, Kumar A. 2019. Integrating microbiome network: establishing linkages between plants, microbes and human health. Open J Med Microbiol. 13(1).
  • Lin H, Peddada SD. 2020. Analysis of microbial compositions: a review of normalization and differential abundance analysis. NPJ Biofilms Microbiomes. 6(1):60. doi:10.1038/s41522-020-00160-w.
  • Liu YX, Chen T, Li D, Fu J, Liu SJ. 2022. iMeta: integrated meta-omics for biology and environments. Imeta. 1:e15.
  • Malla MA, Dubey A, Kumar A, Yadav S, Hashem A, Abd_Allah EF. 2019. Exploring the human microbiome: the potential future role of next-generation sequencing in disease diagnosis and treatment. Front Immunol. 9:2868. doi:10.3389/fimmu.2018.02868.
  • Mancabelli L, Milani C, Lugli GA, Fontana F, Turroni F, van Sinderen D, Ventura M. 2020. The impact of primer design on amplicon-based metagenomic profiling accuracy: detailed insights into bifidobacterial community structure. Microorganisms. 8(1):131. doi:10.3390/microorganisms8010131.
  • Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.J. 17(1):10–12. doi:10.14806/ej.17.1.200.
  • Martin TC, Visconti A, Spector TD, Falchi M. 2018. Conducting metagenomic studies in microbiology and clinical research. Appl Microbiol Biotechnol. 102:8629–8646. doi:10.1007/s00253-018-9209-9.
  • Maruyama H, Masago A, Nambu T, Mashimo C, Okinaga T. 2020. Amplicon sequence variant-based oral microbiome analysis using QIIME 2. J Osaka Dent Univ. 54(2):273–281.
  • Matsuo Y, Komiya S, Yasumizu Y, Yasuoka Y, Mizushima K, Takagi T, Kryukov K, Fukuda A, Morimoto Y, Naito Y. 2021. Full-length 16S rRNA gene amplicon analysis of human gut microbiota using MinION™ nanopore sequencing confers species-level resolution. BMC Microbiol. 21:1–13. doi:10.1186/s12866-021-02094-5.
  • Mitra S, Rupek P, Richter DC, Urich T, Gilbert JA, Meyer F, Wilke A, Huson DH. 2011. Functional analysis of metagenomes and metatranscriptomes using SEED and KEGG. BMC Bioinformatics. 12(Suppl 1):S21. doi:10.1186/1471-2105-12-S1-S21.
  • Morgan XC, Segata N, Huttenhower C. 2013. Biodiversity and functional genomics in the human microbiome. Trends Genet. 29(1):51–58. doi:10.1016/j.tig.2012.09.005.
  • Muegge BD, Kuczynski J, Knights D, Clemente JC, González A, Fontana L, Henrissat B, Knight R, Gordon JI. 2011. Diet drives convergence in gut microbiome functions across mammalian phylogeny and within humans. Science. 332(6032):970–974. doi:10.1126/science.1198719.
  • Nagpal S, Singh R, Yadav D, Mande SS. 2020. MetagenoNets: comprehensive inference and meta-insights for microbial correlation networks. Nucleic Acids Res. 48(W1):W572–W579. doi:10.1093/nar/gkaa254.
  • Namiki T, Hachiya T, Tanaka H, Sakakibara Y. 2011. MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads. Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine.
  • Nayfach S, Bradley PH, Wyman SK, Laurent TJ, Williams A, Eisen JA, Pollard KS, Sharpton TJ. 2015. Automated and accurate estimation of gene family abundance from shotgun metagenomes. PLoS Comput Biol. 11(11):e1004573. doi:10.1371/journal.pcbi.1004573.
  • Nogueira T, Botelho A. 2021. Metagenomics and other omics approaches to bacterial communities and antimicrobial resistance assessment in aquacultures. Antibiotics. 10(7):787. doi:10.3390/antibiotics10070787.
  • Osman MA, Neoh HM, Ab Mutalib NS, Chin SF, Jamal R. 2018. 16S rRNA gene sequencing for deciphering the colorectal cancer gut microbiome: current protocols and workflows. Front Microbiol. 9:767. doi:10.3389/fmicb.2018.00767.
  • Patel RK, Jain M. 2012. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One. 7(2):e30619. doi:10.1371/journal.pone.0030619.
  • Qian X-B, Chen T, Xu Y-P, Chen L, Sun F-X, Lu M-P, Liu Y-X. 2020. A guide to human microbiome research: study design, sample collection, and bioinformatics analysis. Chin Med J. 133(15):1844–1855. doi:10.1097/CM9.0000000000000871.
  • Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. 2017. Erratum: Corrigendum: Shotgun metagenomics, from sampling to analysis. Nat Biotechnol. 35(12):1211–1211. doi:10.1038/nbt1217-1211b.
  • Rahman MA, Rajput A, Prakash A, Chariar VM. 2022. Metagenomics—an approach for selection of oil degrading microbes and its application in remediation of oil pollution. In: Advances in Oil-Water Separation. Elsevier; p. 319–335.
  • Rooney CM, Mankia K, Emery P. 2020. The role of the microbiome in driving RA-related autoimmunity. Front Cell Dev Biol. 8:538130. doi:10.3389/fcell.2020.538130.
  • Ruiz-Perez CA, Conrad RE, Konstantinidis KT. 2021. MicrobeAnnotator: a user-friendly, comprehensive functional annotation pipeline for microbial genomes. BMC Bioinformatics. 22:11.
  • Shahi SK, Zarei K, Guseva NV, Mangalam AK. 2019. Microbiota analysis using two-step PCR and next-generation 16S rRNA gene sequencing. J Vis Exp. 152:e59980.
  • Simper M, Della Coletta L, Gaddis S, Lin K, Mikulec C, Takata Y, Tomida M, Zhang D, Tang D, Estecio M. 2022. Commercial ChIP-Seq library preparation kits performed differently for different classes of protein targets.
  • Sinha R, Abnet CC, White O, Knight R, Huttenhower C. 2015. The microbiome quality control project: baseline study design and future directions. Genome Biol. 16:1–6. doi:10.1186/s13059-015-0841-8.
  • Tierney BT, Tan Y, Kostic AD, Patel CJ. 2021. Gene-level metagenomic architectures across diseases yield high-resolution microbiome diagnostic indicators. Nat Commun. 12(1):2907. doi:10.1038/s41467-021-23029-8.
  • Truong DT, Franzosa EA, Tickle TL, Scholz M, Weingart G, Pasolli E, Tett A, Huttenhower C, Segata N. 2016. Erratum: MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat Methods. 13(1):101–101. doi:10.1038/nmeth0116-101b.
  • Turner S, Loren L, Yuki B, Christopher S, Dana C, Andrew T, Mariza de A, Kimberly FD, Jonathan LH, Geoffrey H, Gail J, Lan J, Iftikhar JK, Rongling L, Hua L, Teri AM, Martha M, Catherine AM, Andrew NM, Daniel BM, Justin EP, Elizabeth WP, Luke VR, Russell AW, Rebecca LZ, Marylyn DR. 2011. Quality control procedures for genome-wide association studies. Curr Prot Hum Genet. 1–19.
  • Wang J, Xiong K, Zhao S, Zhang C, Zhang J, Xu L, Ma A. 2020. Long-term effects of multi-drug-resistant tuberculosis treatment on gut microbiota and its health consequences. Front Microbiol. 11:53. doi:10.3389/fmicb.2020.00053.
  • Wang W-L, Xu S-Y, Ren Z-G, Tao L, Jiang J-W, Zheng S-S. 2015. Application of metagenomics in the human gut microbiome. World J Gastroenterol. 21(3):803. doi:10.3748/wjg.v21.i3.803.
  • Wen T, Xie P, Yang S, Niu G, Liu X, Ding Z, Xue C, Liu YX, Shen Q, Yuan J. 2022. ggClusterNet: An R package for microbiome network analysis and modularity-based multiple network layouts. Imeta. 1(3):e32. doi:10.1002/imt2.32.
  • Xu S, Li L, Luo X, Chen M, Tang W, Zhan L, Dai Z, Lam TT, Guan Y, Yu G. 2022. Ggtree: a serialized data object for visualization of a phylogenetic tree and annotation data. Imeta. 1(4):e56. doi:10.1002/imt2.56.
  • Yoon S-H, Ha S-M, Kwon S, Lim J, Kim Y, Seo H, Chun J. 2017. Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. Int J Syst Evol Microbiol. 67(5):1613–1617. doi:10.1099/ijsem.0.001755.
  • Zhang X, Li L, Butcher J, Stintzi A, Figeys D. 2019. Advancing functional and translational microbiome research using meta-omics approaches. Microbiome. 7(1):154.