Book Reviews

Sparse Graphical Modeling for High Dimensional Data: A Paradigm of Conditional Independence Tests

Faming Liang and Bochao Jia, New York: Chapman and Hall/CRC, 2023, 150 pp., $120.00, ISBN 9780429061189


Technological advancements have transformed numerous fields, including science. With easier access to computing technology, scientists can now collect, store, and analyze data on a previously unimaginable scale. This has enabled the collection of high-dimensional data in areas such as health, the environment, society, and a wider range of sciences. Notably, such high-dimensional data often have a small sample size relative to the number of variables, which calls for new statistical methods. This book is a timely answer to that need. In this setting, sparse graphical modeling is a key approach for breaking down the complexity of high-dimensional data into smaller components that are amenable to statistical inference, and the developments gathered in a single source make this book rich in approaches to it. The book consists of nine chapters that review various aspects of high-dimensional graphical modeling, providing an in-depth view of current research and practice in this field.

Chapter 1 “Sparse Graphical Modeling” begins with a brief description of high-dimensional graphical models and introduces the unified paradigm of conditional independence tests for high-dimensional statistical inference. It then introduces the concept of sparse graphical modeling, which breaks down a high-dimensional system into simpler components for further statistical inference. The two main data types considered, Gaussian and non-Gaussian data, are then explained. This is followed by a discussion of a fundamental issue in graphical models, namely the use of conditional independence tests in high-dimensional scenarios.

Chapter 2 “Gaussian Graphical Models” describes in detail the construction of graphical models for high-dimensional Gaussian data using the conditional independence test approach, in which sparse graphical models are built by testing conditional independence between pairs of variables. The ψ-learning method is described alongside related regularization approaches such as nodewise regression, graphical Lasso, Bayesian regularization, and structural equation modeling. The multivariate Gaussian distribution, simulation studies, and an application of the ψ-learning method to modeling relationships between genes and recovering sparse gene regulatory networks are presented. Details on the use of auxiliary data, thickness, parallel implementation, and other remarks on the ψ-learning method in the Gaussian graphical modeling setting are given.
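To make the conditional independence test paradigm concrete, here is a minimal sketch of the simplest such test for Gaussian data: a sample partial correlation checked with Fisher's z-transform. It is not the authors' ψ-learning algorithm, and the helper names `partial_correlation` and `fisher_z_pvalue` are hypothetical. The sketch assumes multivariate Gaussian data, a conditioning set chosen by hand, and more samples than conditioning variables (n > k + 3).

```python
import math
from statistics import NormalDist

import numpy as np


def partial_correlation(data: np.ndarray, i: int, j: int, cond: list[int]) -> float:
    """Sample partial correlation of columns i and j given the columns in `cond`.

    Inverts the sample correlation matrix of the columns [i, j] + cond and uses
    rho = -P[0, 1] / sqrt(P[0, 0] * P[1, 1]), where P is that inverse.
    """
    cols = [i, j] + list(cond)
    corr = np.corrcoef(data[:, cols], rowvar=False)
    prec = np.linalg.inv(corr)
    return float(-prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1]))


def fisher_z_pvalue(r: float, n: int, k: int) -> float:
    """Two-sided p-value for H0: partial correlation = 0, given k conditioning variables."""
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Usage: a chain X0 -> X1 -> X2, so X0 and X2 are correlated marginally
# but conditionally independent given X1.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
data = np.column_stack([x0, x1, x2])

r_marg = partial_correlation(data, 0, 2, [])   # population value 1/sqrt(3)
r_cond = partial_correlation(data, 0, 2, [1])  # population value 0
print(f"marginal r = {r_marg:.3f}, p = {fisher_z_pvalue(r_marg, n, 0):.3g}")
print(f"partial r given X1 = {r_cond:.3f}, p = {fisher_z_pvalue(r_cond, n, 1):.3g}")
```

In this chain the marginal correlation between X0 and X2 is clearly nonzero, while the partial correlation given X1 is close to zero, which is the kind of edge-or-no-edge decision a conditional-independence-based method makes for each pair of variables. In high dimensions the difficult part is choosing a small conditioning set, which this sketch leaves to the user.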

Chapter 3 “Gaussian Graphical Modeling with Missing Data” discusses the use of the ψ-learning method for handling missing data in Gaussian graphical modeling. Two algorithms for the missing-data case are described. First, the MissGLasso algorithm, an extension of the graphical Lasso algorithm, is used to model multivariate Gaussian random variables, characterized by a mean vector and a covariance matrix, when some observations are missing. Second, the ψ-learning method is used within the imputation-regularization optimization (IRO) algorithm to model the relationships between variables in high-dimensional Gaussian data containing missing values. The IRO method is applied to the analysis of gene expression in the yeast Saccharomyces cerevisiae in response to environmental changes. This application helps researchers understand how genes respond to environmental changes and identify complex relationships between genes that conventional methods may not detect.

The previous chapters discussed Gaussian graphical modeling under the assumption that the data are homogeneous, that is, all samples come from the same multivariate Gaussian distribution, although missing data are allowed. Chapter 4 “Gaussian Graphical Modeling for Heterogeneous Data” relaxes this assumption and introduces a ψ-learning-based approach to learning Gaussian graphical models from heterogeneous data, which provides a means of integrating structural information from groups hidden in the data. The approach, which models heterogeneous data with mixture Gaussian graphical models, is discussed in depth. An application to real datasets downloaded from The Cancer Genome Atlas (TCGA), used to study gene regulatory networks associated with the survival time of breast cancer patients, is presented. It offers valuable insights into the biological mechanisms underlying breast cancer patient survival and their potential for the development of more targeted therapies.

Chapter 5 “Poisson Graphical Models” describes the use of Poisson graphical models to study gene regulatory networks with next-generation sequencing (NGS) data. NGS has revolutionized transcriptome studies, for example, through RNA sequencing (RNA-seq). The method proceeds in three steps: a random-effect model-based transformation that makes the NGS data continuous, a semiparametric transformation that makes the continuous data Gaussian, and the application of the ψ-learning method to recover the gene regulatory network. An application to RNA-seq data shows how the method identifies patterns of association between genes, providing valuable insights into gene regulatory networks in molecular biology.
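The second step of this pipeline, turning continuous but non-Gaussian values into Gaussian ones, can be sketched with a rank-based normal-score transformation, one common semiparametric choice (used, for example, in the nonparanormal approach). This is an assumption: the review does not say which semiparametric transformation the book uses, and neither the random-effect model step nor the ψ-learning step is shown. The function name `normal_score_transform` is hypothetical.

```python
from statistics import NormalDist

import numpy as np


def normal_score_transform(x: np.ndarray) -> np.ndarray:
    """Map each column of x to approximate standard-normal scores via its ranks.

    Each value is replaced by Phi^{-1}(rank / (n + 1)); dividing by n + 1 keeps
    the argument strictly inside (0, 1). Ties are broken by order of appearance.
    """
    n, p = x.shape
    inv_cdf = NormalDist().inv_cdf
    out = np.empty((n, p), dtype=float)
    for col in range(p):
        # Double argsort gives 0-based ranks; add 1 so ranks run from 1 to n.
        ranks = np.argsort(np.argsort(x[:, col], kind="stable"), kind="stable") + 1
        out[:, col] = [inv_cdf(r / (n + 1)) for r in ranks]
    return out


# Usage: right-skewed continuous data (a stand-in for continuized NGS values).
rng = np.random.default_rng(1)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 3))
z = normal_score_transform(skewed)
print(z.mean(axis=0).round(3), z.std(axis=0).round(3))
```

Because the transform depends only on ranks, it preserves the ordering within each variable while making each marginal approximately standard normal, after which Gaussian-based conditional independence tests can be applied.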

Chapter 6 “Mixed Graphical Models” discusses the p-learning method for learning mixed graphical models, focusing on Gaussian and multinomial data. The method handles models containing both continuous and discrete random variables and is based on Bayesian network theory and the concept of Markov blankets. It helps identify patterns of relationships between variables, understand network structure, and reveal important information in mixed data. One application of the p-learning method concerns genetic relationships in breast cancer, using a dataset from The Cancer Genome Atlas (TCGA) that consists of microarray mRNA gene expression data, mutation data, and DNA methylation data. The consistency of the p-learning method under appropriate assumptions is also discussed.

Chapter 7 “Joint Estimation of Multiple Graphical Models” introduces the fast hybrid Bayesian integrative analysis (FHBIA) method for the joint estimation of multiple Gaussian graphical models. By combining information across models, the method is designed to produce more accurate estimates. Applied to real datasets collected by The Environmental Determinants of Diabetes in the Young (TEDDY) group, the FHBIA method identifies the central genes for the disease and detects changes in the structure of the gene regulatory network as the disease progresses. The estimation approach, simulation studies of its performance, its consistency, and its computational complexity in various data analysis applications are described.

Chapter 8 “Nonlinear and Non-Gaussian Graphical Models” introduces the double regression method for learning nonlinear and non-Gaussian graphical models. The method is very general and, in theory, can be applied to various types of data, provided the necessary variable selection procedures and conditional independence tests are available. Simulation studies with low- and high-dimensional data, the consistency of the method, and its computational complexity in various data analysis applications are described.

Chapter 9 “High-Dimensional Inference with the Aid of Sparse Graphical Modeling” discusses the development and importance of Markov neighborhood regression (MNR) for high-dimensional statistical inference. Related work on inference for high-dimensional linear regression models, such as multi-sample-splitting, ridge projection, post-selection inference, and residual-type bootstrapping, is also reviewed. Illustrative examples explain the concept and application of the MNR method. Simulation studies of confidence interval construction and of variable selection, causal structure discovery, and the computational complexity of MNR in high-dimensional inference aided by sparse graphical modeling are described in detail.

Each chapter contains a problem section to test the reader’s understanding of the material presented. By solving these problems, readers can check their understanding and practice applying the concepts they have learned in different contexts. This helps readers deepen their understanding of the material and improve their data analysis skills in high-dimensional statistics.

This book is highly recommended for statistical researchers working in high-dimensional graphical modeling, data scientists, and graduate students and graduates in statistics, biostatistics, biology, computing, and other disciplines. It provides readers with an in-depth understanding of various methods and techniques in modern data analysis, especially for mixed data, high-dimensional data, and graphical models. Its insights on overcoming challenges in complex data analysis, its practical guidance on applying various statistical methods, and its modeling applications in various fields make this book a valuable reference. By reading this book, readers can expand their knowledge of high-dimensional graphical models under the framework of multiple conditional independence tests and apply innovative approaches in their research and data analysis.

Vira Ananda
Statistics Research Group, Master Program in Mathematics, Institut Teknologi Bandung, Indonesia
Visi Komala Sari
Master Program in Physics Teaching, Institut Teknologi Bandung, Indonesia
Anisah
Master Program in Physics Teaching, Institut Teknologi Bandung, Indonesia
Utriweni Mukhaiyar
Statistics Research Group, Master Program in Mathematics, Institut Teknologi Bandung, Indonesia

Additional information

Funding

The authors would like to express their sincere gratitude and appreciation to Lembaga Pengelola Dana Pendidikan - LPDP (Indonesia Endowment Fund for Education) under the auspices of the Ministry of Finance of the Republic of Indonesia for providing financial support to the authors. The valuable support and assistance greatly facilitated this publication.
