Abstract
This study aimed to explore a noninvasive method for diagnosing SEA-thalassemia and to investigate whether regional factors affect the accuracy of this method. The method used a public database and bioinformatics software to construct parental haplotypes for the proband and predicted fetal genotypes using relative haplotype dosage. We screened and downloaded sequencing data of couples who were both SEA-thalassemia carriers from the China National GeneBank public data platform, and matched the sequencing data format to that of the reference panel using Ubuntu system tools. We then used Beagle software to construct parental haplotypes and predicted fetal haplotypes by relative haplotype dosage. Finally, we used a hidden Markov model and the Viterbi algorithm to determine fetal pathogenic haplotypes. All noninvasive fetal genotype diagnosis results were compared with the gold standard gap-PCR electrophoresis results. Our method successfully diagnosed 13 families of SEA-thalassemia carriers. The best diagnostic results were obtained when Southern Chinese Han was used as the reference panel: 10 families showed full agreement between our noninvasive diagnostic results and the gap-PCR electrophoresis results. The accuracy of our method was higher when the Chinese Han reference panel used for haplotype construction came from the Southern Chinese Han region rather than from the Beijing Chinese region. The combined use of public databases and relative haplotype dosage is a feasible approach for diagnosing SEA-thalassemia. Our method produces the best noninvasive diagnostic results when the test samples and the population reference panel are closely matched in both ethnicity and geography. When constructing parental haplotypes with our method, it is important to consider the effect of region in addition to population background.
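As an illustration of the final decoding step only, the following is a minimal Python sketch of Viterbi decoding over a two-state hidden Markov model, where the hidden state at each informative SNP is the parental haplotype (0 or 1) inherited by the fetus. It is not the run script used in this study. The function names, the switch probability, and the symmetric binomial emission model (the inherited haplotype's allele is assumed to be over-represented in plasma by half the fetal fraction) are all illustrative assumptions.

```python
import math


def rhdo_log_emissions(counts, fetal_fraction):
    """Toy emission model (an assumption, not the study's statistics).

    counts: list of (reads_supporting_hap0, reads_supporting_hap1) per SNP.
    fetal_fraction: assumed fetal DNA fraction, strictly between 0 and 1.
    Returns per-SNP (loglik_if_hap0_inherited, loglik_if_hap1_inherited).
    """
    p0 = 0.5 + fetal_fraction / 2.0  # expected hap0 read fraction if hap0 inherited
    p1 = 0.5 - fetal_fraction / 2.0  # expected hap0 read fraction if hap1 inherited
    emissions = []
    for k, m in counts:
        # Binomial log-likelihoods; the binomial coefficient is the same
        # for both states, so it is dropped.
        ll0 = k * math.log(p0) + m * math.log(1.0 - p0)
        ll1 = k * math.log(p1) + m * math.log(1.0 - p1)
        emissions.append((ll0, ll1))
    return emissions


def viterbi_two_state(log_emissions, switch_prob=0.01):
    """Most likely sequence of inherited haplotypes (0 or 1) along the SNPs.

    switch_prob is a hypothetical per-site probability of changing
    haplotype (a stand-in for recombination), not a value from the study.
    """
    if not log_emissions:
        return []
    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob)
    # Uniform prior over the two haplotypes at the first SNP.
    score = [math.log(0.5) + log_emissions[0][0],
             math.log(0.5) + log_emissions[0][1]]
    backpointers = []
    for site in log_emissions[1:]:
        new_score, pointers = [], []
        for state in (0, 1):
            stay = score[state] + log_stay
            switch = score[1 - state] + log_switch
            if stay >= switch:
                new_score.append(stay + site[state])
                pointers.append(state)
            else:
                new_score.append(switch + site[state])
                pointers.append(1 - state)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Made-up read counts for demonstration only.
    demo_counts = [(70, 30), (68, 32), (31, 69), (29, 71)]
    print(viterbi_two_state(rhdo_log_emissions(demo_counts, 0.2)))
```

The transition penalty means that a single SNP with weakly contradictory read counts does not flip the call, whereas a sustained run of contradictory SNPs does, which is the reason for decoding haplotype blocks jointly rather than calling each SNP independently.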
Authors’ contributions
DL: study design, data curation, and manuscript writing; XN: experiments, data curation, and manuscript writing; FL: experiments and data analysis; CN: data collection and analysis; TW: conceptualization, funding acquisition, and study supervision; YT: conceptualization, funding acquisition, and study supervision. All authors agreed to the publication of the final version of the manuscript.
Disclosure statement
The authors report no conflicts of interest. The authors alone are responsible for the content and writing of this article.
Data availability statement
The data used in this study were obtained from public databases. The reference panel data for constructing parental reference haplotypes were obtained from 1000 Genomes Phase 3 haplotypes (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502). The sequencing data for the 13 families of SEA-thalassemia carriers were obtained from the CNGB Nucleotide Sequence Archive (CNSA) (https://ftp.cngb.org/pub/CNSA/data2/CNP0000644/Variation). The run scripts for the HMM and Viterbi algorithms used to predict fetal genotypes were derived from https://github.com/liserjrqlxue/NIPT-Thalassemia [Citation35].
The following open software was used: Beagle 4.1 (https://faculty.washington.edu/browning/beagle/b4_html) [Citation33].