Research Article

An application of logistic regression model to the study of constitutional imbalances in human chromosomal fragile sites

Article: 2331110 | Received 23 Mar 2023, Accepted 08 Mar 2024, Published online: 29 Mar 2024

Abstract

Chromosomal fragile sites (CFSs) are loci or regions susceptible to the spontaneous or induced occurrence of breaks and rearrangements. They are classified into two main categories, rare and common, depending on their frequency in the population. To identify which CFSs are influential or significant in the occurrence of deletions and duplications (chromosomal constitutional imbalances), we propose a logistic regression analysis of the CFS data set, since the response variable is categorical, specifically binary (deletion or duplication). The results presented here are an informative preliminary contribution to understanding how these CFSs increase or decrease the odds of deletion. This study has implications for our understanding of human pathogenesis.

1 Introduction

Chromosomal fragile sites (CFSs) are loci or regions susceptible to the spontaneous or induced occurrence of breaks and rearrangements, see Durkin and Glover (2007). They are classified into two main categories, rare and common, depending on their frequency in the population, see Speicher (2010). Common fragile sites are present on all chromosomes of all individuals, while rare fragile sites are observed in only a small proportion of the population and are inherited in a Mendelian manner, see Speicher (2010). Although a further subdivision was initially made based on the type of inducing chemical, aphidicolin has recently been shown to induce not only all types of common CFSs but also rare fragile sites, see Mrasek et al. (2010). In spite of their completely different DNA sequences, common and rare fragile sites share several characteristics, namely the formation of stable secondary DNA structures, the presence of highly flexible DNA sequences, and unfavorable nucleosome assembly, see Lukusa and Fryns (2008).

Rare fragile sites, namely FRAXA, FRAXE, FRA12A, and FRA11B, have been linked to mental retardation, see Lubs, Stevenson, and Schwartz (2012), Winnepenninckx et al. (2007), and Debacker and Kooy (2007). The mutational mechanism involved in these rare fragile sites has been identified as the expansion of CGG or CCG nucleotide repeats, see Verkerk et al. (1991), Gu et al. (1996), Winnepenninckx et al. (2007), and Jones et al. (2000).

Common fragile sites have been extensively studied because of the frequency of their expression and their particular sensitivity to problems encountered during DNA replication, and the consequences thereof, see Franchitto (2013). They are frequently rearranged in tumor cells, and a causative role for fragile sites in the generation of cancer-specific chromosomal rearrangements has been reported, see Mangelsdorf et al. (2000), Mimori et al. (1999), and Burrow et al. (2009). In vitro and in vivo evidence shows that rare fragile sites are associated with chromosomal rearrangements (for a review, see Lukusa and Fryns (2008)). This is further supported by reports of clinical cases of fragile X syndrome due to FMR1 deletions, see Gedeon et al. (1992) and Wohrle et al. (1992).

The involvement of chromosomal instability associated with common fragile sites in germline rearrangements leading to human disease was documented by the finding of deletions and duplications in FRA6E in patients with autosomal recessive juvenile parkinsonism, as well as deletions in FRAXC in patients with Duchenne and Becker muscular dystrophy, see Mitsui and Tsuji (2011).

The data in this study involve 4535 patients and were collected from DECIPHER (Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources, http://decipher.sanger.ac.uk/). We have previously found a predominance of constitutional chromosomal rearrangements in human chromosomal fragile sites, see Sequeira et al. (2013). More recently, we found that FRA15C, FRA16A, and FRAXB are the CFSs most often involved in rearrangements occurring in the human genome, see Prata Gomes et al. (2016).

However, the frequencies of deletions and duplications could differ, because duplications result from crossovers in trans, while deletions can result from crossovers both in cis and in trans, see Liu et al. (2012). Thus, our main goal is to identify which CFSs are influential or significant in the occurrence of deletions and duplications (chromosomal constitutional imbalances), since this can contribute to understanding the mechanisms leading to human disease. To achieve this goal, we employ logistic regression models, since the response variable is categorical (i.e., binary: deletion or duplication), see Hosmer et al. (2013) and Agresti (2002). Logistic regression is one of the most widely used statistical analyses for binary outcomes, especially in medical research, see Hosmer et al. (1991) and Fungtammasan et al. (2012). However, other statistical methods have been proposed for related problems in human chromosome analysis, see Habbema (1979), Ledley et al. (1980), and Hou et al. (2001). The first two used the probability of classification in a discriminant analysis as a measure of confidence in correct identification, whereas Hou et al. (2001) tested the randomness of breakpoints under either the proportional probability model or the equiprobability model. Yoo et al. (2012) compared four methods (logistic regression, logic regression, classification trees, and random forests) for identifying important genes and gene–gene or gene–environment interactions.

2 Data and methods

2.1 Data

The NCBI Map Viewer (Build 37.2) was used to determine the genomic positions of fragile and non-fragile sites. DECIPHER provided us with the consented patient data, namely the positions of the duplicated and deleted regions (hg19). For the 4535 patients, we recorded the constitutional imbalances occurring at human chromosomal fragile sites, specifically the number of deletions and the number of duplications at each of the 237 CFSs.

2.2 Descriptive analysis of data

For the CFS data set, we have the number of deletions and the number of duplications for each of the 237 CFSs located on the 23 chromosomes. Table 1 lists the CFSs by chromosome, including those with zero deletions or zero duplications, together with the observed mean deletion proportions, computed without the CFSs with zero deletions or zero duplications. Chromosome 11 has the highest mean deletion proportion (73%), whereas chromosome X has the lowest (29%). Lower-numbered chromosomes (e.g., chromosomes 1 and 3) have higher mean deletion proportions than higher-numbered chromosomes, including X. The former also have more CFSs than the latter, but they also show a greater presence of CFSs with zero deletions or zero duplications, with the exception of chromosome 11, which has the highest presence of zero deletions.

Table 1: Chromosomal fragile sites (CFSs) information.
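As a minimal sketch of how the per-chromosome summaries in Table 1 could be computed, the Python snippet below averages per-CFS deletion proportions, deletions / (deletions + duplications), within each chromosome after dropping CFSs with zero deletions or zero duplications. It assumes that the "mean proportion" is an unweighted mean over CFSs, which the text does not state explicitly, and the records (including the placeholder labels FRA_A to FRA_E) are hypothetical, not the DECIPHER counts.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (chromosome, CFS label, deletions, duplications).
# These numbers are invented for illustration; they are NOT the DECIPHER counts.
RECORDS = [
    ("1", "FRA_A", 30, 10),
    ("1", "FRA_B", 0, 4),    # zero deletions: excluded from the mean
    ("1", "FRA_C", 12, 12),
    ("X", "FRA_D", 5, 15),
    ("X", "FRA_E", 8, 0),    # zero duplications: excluded from the mean
]


def mean_deletion_proportion(records):
    """Unweighted mean of per-CFS deletion proportions by chromosome,
    skipping CFSs with zero deletions or zero duplications."""
    proportions = defaultdict(list)
    for chromosome, _cfs, deletions, duplications in records:
        if deletions == 0 or duplications == 0:
            continue
        proportions[chromosome].append(deletions / (deletions + duplications))
    return {chrom: mean(values) for chrom, values in proportions.items()}


print(mean_deletion_proportion(RECORDS))  # {'1': 0.625, 'X': 0.25}
```

A count-weighted alternative (total deletions over total imbalances per chromosome) would give different numbers whenever CFSs differ in size, so the choice should match whatever was used for Table 1.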

Figure 1 displays the frequencies of deletions and duplications (left plots) and the observed deletion proportion by CFS (right plot) for all 237 CFSs. There are 45 CFSs with zero deletions or zero duplications. For the remaining 192 CFSs, the numbers of deletions and duplications range from 1 to 140 and from 1 to 115, respectively. In addition, the number of deletions has a higher mean and median than the number of duplications: the means are 14.34 and 12.48, and the medians are 9 and 7, respectively. In fact, the highest frequencies of deletions and duplications occur at their lowest values, while large values are infrequent for both variables (see the left plots in Figure 1). Plotting the observed proportion of deletions against CFS, we informally noted a non-linear CFS effect on this proportion, which suggests that a linear model would not be a good choice for the CFS data set (see the right plot in Figure 1). Moreover, the deletion proportion tends to decrease along the CFS ordering, i.e., from the lower-numbered chromosomes to the higher-numbered ones, including chromosome X.

Figure 1: Frequency of deletions (top left) and duplications (bottom left) and observed deletion proportion by CFS (right).
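The filtering and summary statistics described for Figure 1 can be sketched as follows, assuming one (deletions, duplications) pair per CFS. The labels and counts are hypothetical placeholders; applied to the real data, `n_zero` should be 45 and the summaries should refer to the remaining 192 CFSs.

```python
from statistics import mean, median

# Hypothetical per-CFS counts: label -> (deletions, duplications).
# Labels and values are placeholders, not the article's data.
COUNTS = {
    "FRA_A": (9, 7),
    "FRA_B": (0, 3),
    "FRA_C": (14, 5),
    "FRA_D": (2, 0),
    "FRA_E": (25, 20),
}


def describe(counts):
    """Drop CFSs with a zero count, then summarize deletions and duplications."""
    kept = {cfs: c for cfs, c in counts.items() if c[0] > 0 and c[1] > 0}
    deletions = [d for d, _ in kept.values()]
    duplications = [u for _, u in kept.values()]
    return {
        "n_zero": len(counts) - len(kept),
        "deletion_range": (min(deletions), max(deletions)),
        "duplication_range": (min(duplications), max(duplications)),
        "deletion_mean": mean(deletions),
        "duplication_mean": mean(duplications),
        "deletion_median": median(deletions),
        "duplication_median": median(duplications),
    }


summary = describe(COUNTS)
print(summary["n_zero"], summary["deletion_range"], summary["deletion_median"])  # 2 (9, 25) 14
```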

2.3 Logistic regression model

To model the CFS data, we assume a logistic regression model for the probability of deletion ($\pi$) among the chromosomal constitutional imbalances (deletions or duplications) observed at each human CFS. This model is a particular case of the generalized linear models (GLMs), see Faraway (2016), which accommodate different types of response variable ($Y$), e.g., categorical responses, for which a discrete probability distribution is assumed, and different link functions relating the mean response to the linear predictor $\eta$, formed from a set of explanatory variables $\mathbf{x} = (x_1, \ldots, x_p)^\top$.

Formally, let $Y$ be a random variable representing the number of successes (deletions) out of the total number of occurrences $n$ (deletions and duplications). Assume that $Y$ has a binomial distribution with parameters $n$ and $\pi$, where $\pi$ is a function of $\eta = \mathbf{x}^\top\boldsymbol{\beta}$ and $\boldsymbol{\beta}$ is the vector of regression parameters associated with $\mathbf{x}$, which represents the CFSs with deletions or duplications. The logistic regression model is then defined as

$$\pi(\mathbf{x}) = \frac{\exp(\mathbf{x}^\top\boldsymbol{\beta})}{1 + \exp(\mathbf{x}^\top\boldsymbol{\beta})}. \tag{1}$$
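A minimal sketch of model (1) and the binomial log-likelihood it induces for grouped counts, with $y_i$ deletions out of $n_i$ imbalances per CFS. The function names `expit` and `binomial_loglik` are illustrative, not taken from the article. The binomial coefficient does not depend on $\boldsymbol{\beta}$, so it does not affect the estimates, but it is kept here because it shifts likelihood-based criteria such as AIC.

```python
import math


def expit(eta):
    """Inverse logit of model (1): exp(eta) / (1 + exp(eta)), written to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def binomial_loglik(beta, X, y, n):
    """Log-likelihood of y_i ~ Binomial(n_i, pi_i) with pi_i = expit(x_i' beta).

    X is a list of covariate rows, y the deletion counts and n the totals
    (deletions + duplications). Assumes 0 < pi_i < 1 so that the logs are finite.
    """
    total = 0.0
    for x_i, y_i, n_i in zip(X, y, n):
        eta = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        p = expit(eta)
        total += (math.log(math.comb(n_i, y_i))
                  + y_i * math.log(p)
                  + (n_i - y_i) * math.log(1.0 - p))
    return total


# Two hypothetical CFSs coded as indicator columns (one parameter per CFS).
X = [[1, 0], [0, 1]]
print(binomial_loglik([0.0, 0.0], X, y=[3, 1], n=[4, 2]))
```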

Note that, under the logistic regression model (1), the linear predictor $\eta = \mathbf{x}^\top\boldsymbol{\beta}$, which can take any real value, never leads to expected proportions $E(Y/n \mid \mathbf{x}) = \pi(\mathbf{x})$ outside the interval $(0, 1)$. This would no longer be guaranteed had we opted for some other ways of linking the expected value, here a probability, to the linear predictor, e.g., the identity link.
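To illustrate the point about the link function, the short sketch below evaluates the mean implied by the same linear predictor values under an identity link and under the logit link of model (1). The $\eta$ values are arbitrary; only the logit-link values stay inside $(0, 1)$.

```python
import math


def identity_link_mean(eta):
    """Identity link: the mean is eta itself, so nothing keeps it inside (0, 1)."""
    return eta


def logit_link_mean(eta):
    """Logit link, model (1): exp(eta) / (1 + exp(eta)).
    Mathematically in (0, 1); in floating point it rounds to 0 or 1 only for very large |eta|."""
    return 1.0 / (1.0 + math.exp(-eta))


for eta in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(f"eta = {eta:5.1f}   identity: {identity_link_mean(eta):5.2f}   "
          f"logit: {logit_link_mean(eta):.4f}")
```

With $\eta = 3$ the identity link returns a "probability" of 3, whereas the logit link returns about 0.95.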

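The logistic regression model, or logit model, is often conveniently defined in terms of odds or log-odds, i.e., (1) is equivalent to

$$\log(\text{odds}) \equiv \log\left(\frac{\pi(\mathbf{x})}{1 - \pi(\mathbf{x})}\right) = \eta = \mathbf{x}^\top\boldsymbol{\beta}. \tag{2}$$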

Odds are sometimes a better scale than probability for representing chance. For instance, odds equal to 3 here mean that, for every three deletions, one duplication is expected. One mathematical advantage of odds is that they are unbounded above, which makes them more convenient for some modeling purposes, see Faraway (2016). Although logistic models have been widely used for binary data analysis, mainly because their parameters are simple to interpret, other models exist (e.g., with probit or complementary log-log links); for further details, see Hosmer et al. (2013).
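The odds interpretation above, and the equivalence between (1) and (2), can be checked numerically with a few helper functions. The helper names are illustrative.

```python
import math


def odds_from_prob(p):
    """odds = pi / (1 - pi), for 0 < pi < 1; unbounded above as pi -> 1."""
    return p / (1.0 - p)


def prob_from_odds(odds):
    """pi = odds / (1 + odds), the inverse of odds_from_prob."""
    return odds / (1.0 + odds)


def log_odds(p):
    """Left-hand side of (2): log(pi / (1 - pi))."""
    return math.log(odds_from_prob(p))


# Odds of 3: three deletions expected for every duplication.
p = prob_from_odds(3.0)
print(p)            # 0.75
print(log_odds(p))  # log(3), about 1.0986
```

So odds of 3 correspond to a deletion probability of 0.75, and the log-odds $\log 3$ is exactly the value of $\eta$ that (1) maps back to 0.75.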

Statistical inference on the regression coefficient vector $\boldsymbol{\beta}$ can be based on the likelihood function alone (frequentist approach) or can also incorporate prior information (Bayesian approach). For simplicity, we carried out a frequentist analysis, which is implemented in most statistical software, e.g., R, see R Core Team (2017).
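Under the cell-means parameterization described in Section 3, each CFS has its own parameter, so the likelihood factorizes over CFSs and each $\beta_j$ can be estimated separately. The sketch below assumes that setting and uses a one-dimensional Newton-Raphson iteration (for the canonical logit link this coincides with Fisher scoring, the basis of the iteratively reweighted least squares used by R's `glm`). It is an illustration, not the authors' code, and the counts are hypothetical. At convergence $\hat\beta_j = \log\{y_j/(n_j - y_j)\}$, with standard error $\sqrt{1/y_j + 1/(n_j - y_j)}$.

```python
import math


def fit_cell_mean_logit(y, n, tol=1e-10, max_iter=100):
    """ML estimate of beta_j = log(odds_j) for one CFS with y deletions out of
    n imbalances, by Newton-Raphson on the binomial log-likelihood.
    Returns (beta_hat, standard_error)."""
    if not 0 < y < n:
        raise ValueError("the MLE is infinite when y == 0 or y == n")
    beta = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-beta))
        score = y - n * p            # first derivative of the log-likelihood
        info = n * p * (1.0 - p)     # minus the second derivative (Fisher information)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    p = 1.0 / (1.0 + math.exp(-beta))
    se = 1.0 / math.sqrt(n * p * (1.0 - p))  # inverse square root of the information at the MLE
    return beta, se


beta_hat, se = fit_cell_mean_logit(30, 40)  # hypothetical: 30 deletions, 10 duplications
print(round(beta_hat, 4), round(se, 4))     # 1.0986 0.3651
```

CFSs with zero deletions or zero duplications have no finite estimate, which is consistent with excluding them before fitting.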

3 Results and discussion

3.1 Statistical analysis

Taking into account that the main goal of the study is to identify which CFSs are influential in the occurrence of deletions and duplications, we fitted the logistic regression model (2) to the CFS data set, excluding the CFSs with zero deletions or zero duplications. We initially considered the saturated model (MS), in which the number of regression parameters equals the number of CFSs (192 cases), and the reduced model (MR), obtained from MS by pooling its non-significant CFSs into a reference category. Since there are no replicates at the CFS level, MS reproduces the observed proportions exactly; we are therefore aware that our saturated model is limited, namely for prediction, but it is sufficient to obtain first results regarding our main objective.
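One possible reading of how MR is built from MS is sketched below under the cell-means parameterization: CFSs kept as significant retain their own log-odds, while all other CFSs are pooled into a single reference category. Under this assumption the maximum-likelihood estimates have closed forms, $\log(d_j/u_j)$ for each retained CFS (deletions $d_j$, duplications $u_j$) and the log of the pooled deletion-to-duplication ratio for the reference category. The function, labels, and counts are hypothetical.

```python
import math


def reduced_model_estimates(counts, keep):
    """Closed-form MLEs under a cell-means logistic model in which every CFS in
    `keep` has its own log-odds and all remaining CFSs share one reference log-odds.

    counts: {cfs_label: (deletions, duplications)}, all counts assumed positive.
    keep:   set of CFS labels retained as separate parameters (hypothetical input).
    """
    estimates = {}
    ref_deletions = ref_duplications = 0
    for cfs, (deletions, duplications) in counts.items():
        if cfs in keep:
            estimates[cfs] = math.log(deletions / duplications)
        else:
            ref_deletions += deletions
            ref_duplications += duplications
    if ref_deletions > 0 and ref_duplications > 0:
        # Pooled log-odds of the reference group: log(total deletions / total duplications).
        estimates["reference"] = math.log(ref_deletions / ref_duplications)
    return estimates


toy = {"FRA_A": (30, 10), "FRA_B": (5, 5), "FRA_C": (7, 9)}
est = reduced_model_estimates(toy, keep={"FRA_A"})
print(len(est))  # 2 parameters instead of 3
```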

For simplicity, Tables 2 and 3 display the inferential results only for the significant parameters of the logistic regression model MR, i.e., the regression parameters whose Wald test has a p-value < 0.05. Model MR has an Akaike Information Criterion (AIC) of 916.7, lower than the value for model MS (1023.6), see Akaike (1973). The Bayesian Information Criterion (BIC) leads to the same conclusion: 1108.9 for MR against 1649.0 for MS. As a goodness-of-fit measure, the deviance test (p-value = 0.061 > 0.05) does not lead us to reject model MR. We also obtained McFadden's pseudo-$R^2 = 0.9903$ (close to 1), suggesting a very good fit, see McFadden (1974). In terms of predictive power, choosing a cutoff of 51.84% (the median) and classifying every predicted value above it as a predicted deletion (positive case), the sensitivity is 88.53% and the specificity is 32.61%. Model MR therefore performs reasonably at this cutoff for detecting deletions, missing just over 10% (11.47%) of the true positive cases, although only about one third of the duplications are correctly classified. Note that we used a cell-means parameterization, so that $\beta_j$, the regression parameter of the $j$-th CFS, is interpreted as

$$\beta_j = \log(\text{odds}_j) \quad \text{or, equivalently,} \quad \exp(\beta_j) = \text{odds}_j, \tag{3}$$

Table 2: Statistical inference for logistic regression model MR (Wald test p-value <0.05)–Part one.

Table 3: Statistical inference for logistic regression model MR (Wald test p-value <0.05)–Part two.

for $j = 1, \ldots, 192$. A positive log-odds ($\beta_j > 0$), or equivalently $\text{odds}_j > 1$ ($\exp(\beta_j) > 1$), indicates increased odds of deletion at CFS $j$, i.e., the probability of deletion is larger than the probability of duplication. For instance, the estimate $\hat\beta_1 = 1.02$ (FRA1A), with a Wald test p-value < 0.0001 for $H_0\colon \beta_1 = 0$ (deletion and duplication equally likely), provides evidence that CFS FRA1A significantly influences the odds of deletion. Moreover, $\exp(\hat\beta_1) = 2.79$ means that at FRA1A the probability of deletion is almost three times the probability of duplication.
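The model-comparison and classification summaries reported above can be sketched as follows; the helper functions are illustrative. For the BIC we take $n$ to be the number of CFSs, which appears consistent with the reported MS values: with $k = n = 192$, BIC − AIC $= 192(\ln 192 - 2) \approx 625.4$, matching $1649.0 - 1023.6$. For the grouped data we assume that every deletion or duplication at CFS $j$ inherits the classification of its fitted probability $\hat\pi_j$ against the cutoff.

```python
import math


def aic(loglik, k):
    """Akaike Information Criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian Information Criterion: -2 log L + k log(n).
    Here n is assumed to be the number of CFSs (grouped observations)."""
    return -2.0 * loglik + k * math.log(n)


def mcfadden_r2(loglik_model, loglik_null):
    """McFadden's pseudo-R^2: 1 - log L(model) / log L(intercept-only model)."""
    return 1.0 - loglik_model / loglik_null


def sensitivity_specificity(pi_hat, y, n, cutoff):
    """Grouped-data classification: all n_j imbalances at CFS j are predicted as
    deletions when pi_hat_j > cutoff. y_j = deletions, n_j - y_j = duplications."""
    true_pos = sum(y_j for p_j, y_j in zip(pi_hat, y) if p_j > cutoff)
    true_neg = sum(n_j - y_j for p_j, y_j, n_j in zip(pi_hat, y, n) if p_j <= cutoff)
    total_pos = sum(y)
    total_neg = sum(n_j - y_j for y_j, n_j in zip(y, n))
    return true_pos / total_pos, true_neg / total_neg


# BIC - AIC = k (log n - 2) does not depend on log L, so log L = 0 suffices for the check.
gap = bic(0.0, 192, 192) - aic(0.0, 192)
print(round(gap, 1))  # 625.4
```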

In contrast to Tables 2 and 3, Figure 2 presents the estimated odds with their 95% confidence intervals for all 192 CFSs under model MS. Several CFSs contribute significantly to increasing the odds of deletion, e.g., FRA1R (odds = 13.9), FRA2K (12.1), FRA1Q (11.6), FRA18C (9.6), and FRA3K (9.3), whereas FRA12D (0.04), FRA12E (0.07), FRA8I (0.08), FRA7D (0.11), FRA4 (0.11), and FRA3J (0.11) contribute to decreasing the odds of deletion (i.e., to increasing the odds of duplication). The most highly significant CFS effects on the odds of deletion are located at FRA1A, FRA3J, FRA4D, FRA9K, FRA16F, FRA19A, FRAXB, and FRAXEF. Again, CFSs on the first chromosomes listed/plotted tend to increase the probability of deletion, while those on the last ones tend to increase the probability of duplication. For fragile sites FRA1R, FRA2K, FRA3K, and FRA12D we obtained large standard errors, leading to very wide confidence intervals, i.e., low precision when the empirical percentage of duplications (or deletions) among the aberrations is low. Finally, a residual analysis revealed no problems with the fit of the logistic model: the deviance residuals ranged from −2.31 to 3.21, with a median of zero.

Figure 2: Chromosomal fragile site effects in the log-odds of deletion: estimates and 95% confidence intervals (CI).
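A minimal sketch of the Wald quantities used in this section: the $z$ statistic and two-sided p-value for $H_0\colon \beta_j = 0$, the odds $\exp(\hat\beta_j)$ with a 95% confidence interval obtained by exponentiating $\hat\beta_j \pm 1.96\,\mathrm{SE}$, and the binomial deviance residual. The function names are illustrative and no input values are taken from the article's tables.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96


def wald_summary(beta_hat, se):
    """Wald z, two-sided p-value for H0: beta_j = 0, odds exp(beta_hat),
    and its 95% CI from exponentiating beta_hat +/- 1.96 * se."""
    z = beta_hat / se
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    odds = math.exp(beta_hat)
    ci = (math.exp(beta_hat - Z_975 * se), math.exp(beta_hat + Z_975 * se))
    return z, p_value, odds, ci


def deviance_residual(y, n, p):
    """Signed deviance residual for y deletions out of n with fitted probability p."""
    term_del = y * math.log(y / (n * p)) if y > 0 else 0.0
    term_dup = (n - y) * math.log((n - y) / (n * (1.0 - p))) if y < n else 0.0
    return math.copysign(math.sqrt(max(2.0 * (term_del + term_dup), 0.0)), y - n * p)


z, p_value, odds, ci = wald_summary(1.0, 0.5)  # hypothetical estimate and standard error
print(f"z = {z:.2f}, p = {p_value:.4f}, odds = {odds:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the interval is computed on the log-odds scale and then exponentiated, a large standard error produces a very wide, asymmetric interval on the odds scale, as observed for FRA1R, FRA2K, FRA3K, and FRA12D.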

3.2 Discussion

Several fragile sites increase the chance of deletion over duplication. These include FRA1Q, FRA1R, FRA2K, FRA3K, FRA5F, FRA5N, and FRA18C. FRA1Q is located at 1q32, and the 1q32.2-q32.3 deletion was described in patients with dysmorphic features and facial clefts due to deletion of the IRF6 gene, which is involved in Van der Woude syndrome, see Nevado et al. (2014). FRA1R is located at 1q41, and a 1q41 microdeletion gives rise to seizures and developmental delay without facial dysmorphism or organ defects, see Jun et al. (2013). Severe intellectual disability, omphalocele, hypospadia, and high blood pressure can be due to a deletion at 2q22.1q22.3, which is co-localized with the fragile site FRA2K, see Mulatinho et al. (2012). The long arm of chromosome 3, which includes FRA3K at 3q12, can be partially deleted in patients with severe psychomotor retardation and multiple dysmorphic features, see Okada et al. (1987). FRA5F is a common fragile site of the aphidicolin type, located at 5q21. A deletion of 5q confined to the region 5q21-q22 was found in association with familial adenomatous polyposis, dysmorphic features, and mild mental retardation, see Raedle et al. (2001). Developmental delay and dysmorphic features were described in association with a deletion of the whole of band 5q23, the location of FRA5N, see Rivera et al. (1990). FRA18C was identified in the father of a patient with an 18q22-qter deletion and Beckwith-Wiedemann syndrome, see Debacker et al. (2007). This fragile site is rich in flexibility islands and AT-rich sequences, see Debacker et al. (2007).

Among others, the fragile sites FRA4, FRA7D, FRA8I, FRA12D, FRA12E, and FRA3J are significant, with a higher chance of duplication than of deletion. FRA4 is located at 4q32, and a duplication involving this band was described with minimal clinical effect, see Maltby et al. (1999). FRA7D is a common fragile site of the aphidicolin type, located at 7p13. A duplication of 7p13-p22.1 was described in association with psychomotor retardation, hypotonia, and dysmorphic features, see Papadopoulou et al. (2006). FRA8I is located at 8p11-q11, and constitutional trisomy 8p11.21-q11.21 can contribute to leukaemogenesis, see Ripperger et al. (2011). FRA12D is a rare fragile site of the folic acid type, located at 12q24.13 (https://www.ncbi.nlm.nih.gov/gene/2451). Genomic duplication of the PTPN11 locus at 12q24.13 is a possible, though uncommon, cause of Noonan syndrome, a genetically heterogeneous and relatively common disorder caused most frequently by activating mutations in PTPN11, see Graham et al. (2009). FRA12E is a common fragile site of the aphidicolin type, located at 12q24. A duplication at 12q24.1 gives rise to Holt-Oram syndrome, see Kimura et al. (2015), which is characterized by skeletal abnormalities of the hands and arms and by heart problems (https://ghr.nlm.nih.gov/condition/holt-oram-syndrome).

This work could contribute to the study of the mechanisms involved in the chromosomal constitutional imbalances that lead to human disease. The logistic regression strategy used, and the results presented, show that it can discern fragile sites that may promote chromosomal deletions/rearrangements, with potentially very useful implications for clinical genetics and pathology as a whole. Besides the fact that most clinical outcomes are defined in binary form, logistic regression also requires fewer assumptions than multiple linear regression or analysis of covariance (ANCOVA), but it must be used with caution. Namely, we assessed the significance of the fragile sites through simultaneous hypothesis tests without correcting for multiplicity. Cai et al. (2023) recently developed a unified statistical inference framework for high-dimensional binary generalized linear models with general link functions. They provided important insights into the adaptivity of optimal confidence intervals with respect to the sparsity of the regression parameters, framed in a genetic scenario. Unfortunately, we could not take this approach into account due to the difficulty of its computational implementation. Even the usual Bonferroni method is of little practical use with such a large number of simultaneous tests, since it becomes overly conservative. In addition, a simple rule of thumb for sample size estimation in logistic regression, particularly for observational studies, was discussed by Bujang et al. (2018). Finally, although in our setting the number of observations is close to the number of regression parameters, we plan to further study interactions between significant fragile sites and the introduction of other covariates into the model.
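For reference, the Bonferroni adjustment mentioned above, and the Benjamini-Hochberg false discovery rate adjustment (a different technique, not used in this article), can be sketched in a few lines. With 192 simultaneous tests, Bonferroni at the 5% level requires an unadjusted p-value below 0.05/192 ≈ 0.00026. The p-values below are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: min(1, m * p)."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (false discovery rate), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value downwards
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical unadjusted Wald p-values for four CFSs.
raw = [0.01, 0.04, 0.03, 0.20]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Benjamini-Hochberg controls the expected proportion of false discoveries rather than the chance of any false discovery, so it is typically far less conservative than Bonferroni when many CFSs are tested.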

Acknowledgments

This study makes use of data generated by the DECIPHER community. A full list of centers who contributed to the generation of the data is available from http://decipher.sanger.ac.uk and via email from [email protected]. We also thank two anonymous referees and the editor for their helpful comments and suggestions that have substantially improved the presentation of this article.

Data availability statement

The data that support the findings of this study are available from the corresponding author, [Prata Gomes], upon reasonable request.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work is funded by national funds through the FCT - Fundação para a Ciência e a Tecnologia, I.P., under the scope of the projects UIDB/00297/2020 (https://doi.org/10.54499/UIDB/00297/2020) and UIDP/00297/2020 (https://doi.org/10.54499/UIDP/00297/2020) (Center for Mathematics and Applications), UIDB/00006/2020 (https://doi.org/10.54499/UIDB/00006/2020) and UIDP/00006/2020 (https://doi.org/10.54499/UIDP/00006/2020) (CEAUL - Centre of Statistics and its Applications) and UID/BIM/00009/2016 (Centre for Toxicogenomics and Human Health (TOXOMICS)).

References

  • Agresti A. 2002. Categorical data analysis. 2nd ed. New York: Wiley.
  • Akaike H. 1973. Information theory and an extension of the maximum likelihood principle. In Petrov BN, Csaki F, editors. Second International Symposium on Information Theory. Budapest: Academiai Kiado. p. 267–281.
  • Bujang MA, Sa’at N, Sidik TMITAB, Joo LC. 2018. Sample size guidelines for logistic regression from observational studies with large population: emphasis on the accuracy between statistics and parameters based on real life clinical data. Malaysian J Med Sci. 25(4):122–130.
  • Burrow AA, Williams LE, Pierce LC, Wang YH. 2009. Over half of breakpoints in gene pairs involved in cancer-specific recurrent translocations are mapped to human chromosomal fragile sites. BMC Genom. 10:59.
  • Cai TT, Guo Z, Ma R. 2023. Statistical inference for high-dimensional generalized linear models with binary outcomes. J Amer Stat Assoc. 118(542):1319–1332.
  • Debacker K, Kooy RF. 2007. Fragile sites and human disease. Hum Mol Genet. 16(Spec No.2):R150–158.
  • Debacker K, Winnepenninckx B, Ben-Porat N, FitzPatrick D, Van Luijk R, Scheers S, Kerem B, Frank Kooy R. 2007. FRA18C: a new aphidicolin-inducible fragile site on chromosome 18q22, possibly associated with in vivo chromosome breakage. J Med Genet. 44(5):347–352.
  • Durkin SG, Glover TW. 2007. Chromosome fragile sites. Annu Rev Genet. 41:169–192.
  • Faraway JJ. 2016. Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models. 2nd ed. Boca Raton: Chapman and Hall/CRC.
  • Franchitto A. 2013. Genome instability at common fragile sites: searching for the cause of their instability. Biomed Res Int. 730714.
  • Fungtammasan A, Walsh E, Chiaromonte F, Eckert KA, Makova KD. 2012. A genome-wide analysis of common fragile sites: what features determine chromosomal instability in the human genome? Genome Res. 22:993–1005.
  • Gedeon AK, Baker E, Robinson H, Partington MW, Gross B, Manca A, Korn B, Poustka A, Yu S, Sutherland GR, Mulley JC. 1992. Fragile X syndrome without CCG amplification has an FMR1 deletion. Nat Genet. 1:341–344.
  • Graham JM, Kramer N, Bejjani BA, Thiel CT, Carta C, Neri G, Tartaglia M, Zenker M. 2009. Genomic duplication of PTPN11 is an uncommon cause of Noonan syndrome. Amer J Med Genet. 149A(10):2122–2128.
  • Gu Y, Shen Y, Gibbs RA, Nelson DL. 1996. Identification of FMR2, a novel gene associated with the FRAXE CCG repeat and CpG island. Nat Genet. 13(1):109–113.
  • Habbema JDF. 1979. Statistical methods for classification of human chromosomes. Biometrics 35(1):103–118.
  • Hosmer DW Jr, Taber S, Lemeshow S. 1991. The importance of assessing the fit of logistic regression models: a case study. Amer J Public Health. 81(12):1630–1635.
  • Hosmer DW Jr, Lemeshow S, Sturdivant RX. 2013. Applied logistic regression. 3rd ed. New York: Wiley.
  • Hou CD, Chiang J, Tai JJ. 2001. Identifying chromosomal fragile sites from a hierarchical-clustering point of view. Biometrics 57(2):435–440.
  • Jones C, Müllenbach R, Grossfeld P, Auer R, Favier R, Chien K, James M, Tunnacliffe A, Cotter F. 2000. Co-localisation of CCG repeats and chromosome deletion breakpoints in Jacobsen syndrome: evidence for a common mechanism of chromosome breakage. Hum Mol Genet. 9(8):1201–1208.
  • Jun KR, Hur YJ, Lee JN, Kim HR, Shin JH, Oh SH, Lee JY, Seo EJ. 2013. Clinical characterization of DISP1 haploinsufficiency: a case report. Eur J Med Genet. 56(6):309–313.
  • Kimura M, Kikuchi A, Ichinoi N, Kure S. 2015. Novel TBX5 duplication in a Japanese family with Holt-Oram syndrome. Pediatr Cardiol. 36(1):244–247.
  • Ledley RS, Ing PS, Lubs HA. 1980. Human chromosome classification using discriminant analysis and Bayesian probability. Comput Biol Med. 10(4):209–218.
  • Liu P, Carvalho CM, Hastings PJ, Lupski JR. 2012. Mechanisms for recurrent and complex human genomic rearrangements. Curr Opin Genet Dev. 22:211–220.
  • Lubs HA, Stevenson RE, Schwartz CE. 2012. Fragile X and X-linked intellectual disability: four decades of discovery. Amer J Hum Genet. 90(4):579–590.
  • Lukusa T, Fryns JP. 2008. Human chromosome fragility. Biochim Biophys Acta 1779:3–16.
  • Maltby EL, Barnes IC, Bennett CP. 1999. Duplication involving band 4q32 with minimal clinical effect. Amer J Med Genet. 83(5):431.
  • Mangelsdorf M, Ried K, Woollatt E, Dayan S, Eyre H, Finnis M, Hobson L, Nancarrow J, Venter D, Baker E, Richards RI. 2000. Chromosomal fragile site FRA16D and DNA instability in cancer. Cancer Res. 60(6):1683–1689.
  • McFadden D. 1974. Conditional logit analysis of qualitative choice behavior. In Zarembka P, editor. Frontiers in econometrics. New York: Academic Press. p. 105–142.
  • Mimori K, Druck T, Inoue H, Alder H, Berk L, Mori M, Huebner K, Croce CM. 1999. Cancer-specific chromosome alterations in the constitutive fragile region FRA3B. Proc Natl Acad Sci USA. 96(13):7456–7461.
  • Mitsui J, Tsuji S. 2011. Common chromosomal fragile sites: breakages and rearrangements in somatic and germline cells. Atlas Genet Cytogenet Oncol Haematol. http://atlasgeneticsoncology.org/Deep/ChromFragSitesID20098.html
  • Mrasek K, Schoder C, Teichmann AC, Behr K, Franze B, Wilhelm K, Blaurock N, Claussen U, Liehr T, Weise A. 2010. Global screening and extended nomenclature for 230 aphidicolin-inducible fragile sites, including 61 yet unreported ones. Int J Oncol. 36:929–940.
  • Mulatinho MV, de Carvalho Serao CL, Scalco F, Hardekopf D, Pekova S, Mrasek K, Liehr T, Weise A, Rao N, Llerena JC. 2012. Severe intellectual disability, omphalocele, hypospadia and high blood pressure associated to a deletion at 2q22.1q22.3: case report. Mol Cytogenet. 5(1):30.
  • Nevado J, Mergener R, Palomares-Bralo M, Souza KR, Vallespín E, Mena R, Martínez-Glez V, Mori MÁ, Santos F, García-Miñaur S, García-Santiago F, Mansilla E, Fernández L, de Torres ML, Riegel M, Lapunzina P. 2014. New microdeletion and microduplication syndromes: a comprehensive review. Genet Mol Biol. 37(1 Suppl):210–219.
  • Okada N, Hasegawa T, Osawa M, Fukuyama Y. 1987. A case of de novo interstitial deletion 3q. J Med Genet. 24(5):305–308.
  • Papadopoulou E, Sifakis S, Sarri C, Gyftodimou J, Liehr T, Mrasek K, Kalmanti M, Petersen MB. 2006. A report of pure 7p duplication syndrome and review of the literature. Amer J Med Genet A. 140(24):2802–2806.
  • Prata Gomes D, Sequeira IJ, Figueiredo C, Rueff J, Brás A. 2016. The human chromosomal fragile sites more often involved in constitutional deletions and duplications - a genetic and statistical assessment. AIP Conf Proc 1790:080003.
  • Raedle J, Friedl W, Engels H, Koenig R, Trojan J, Zeuzem S. 2001. A de novo deletion of chromosome 5q causing familial adenomatous polyposis, dysmorphic features, and mild mental retardation. Amer J Gastroenterol. 96(10):3016–3020.
  • Ripperger T, Tauscher M, Praulich I, Pabst B, Teigler-Schlegel A, Yeoh A, Göhring G, Schlegelberger B, Flotho C, Niemeyer CM, Steinemann D. 2011. Constitutional trisomy 8p11.21-q11.21 mosaicism: a germline alteration predisposing to myeloid leukaemia. Br J Haematol. 155(2):209–217.
  • Rivera H, Simi P, Rossi S, Pardelli L, Di Paolo MC. 1990. A constitutional 5q23 deletion. J Med Genet. 27(4):267–268.
  • R Core Team. 2017. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org
  • Sequeira I, Mexia J, Santiago J, Mamede R, Silva E, Santos J, Faria D, Rueff J, Brás A. 2013. Predominance of constitutional chromosomal rearrangements in human chromosomal fragile sites. Open J Genet. 3:8–13.
  • Speicher MR. 2010. Chromosomes. In: Speicher MR, Antonarakis SE, Motulsky AG, editors. Vogel and Motulsky’s human genetics problems and approaches. Berlin, Heidelberg: Springer Verlag. p. 55–138.
  • Verkerk AJ, Pieretti M, Sutcliffe JS, Fu YH, Kuhl DP, Pizzuti A, Reiner O, Richards S, Victoria MF, Zhang FP, et al. 1991. Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell 65:905–914.
  • Wohrle D, Kotzot D, Hirst MC, Manca A, Korn B, Schmidt A, Barbi G, Rott HD, Poustka A, Davies KE, Steinbach P. 1992. A microdeletion of less than 250 kb, including the proximal part of the FMR-I gene and the fragile-X site, in a male with the clinical phenotype of fragile X syndrome. Amer J Hum Genet. 51:299–306.
  • Winnepenninckx B, Debacker K, Ramsay J, Smeets D, Smits A, FitzPatrick DR, Kooy RF. 2007. CGG-repeat expansion in the DIP2B gene is associated with the fragile site FRA12A on chromosome 12q13.1. Amer J Hum Genet. 80(2):221–231.
  • Yoo W, Ference BA, Cote ML, Schwartz A. 2012. A comparison of logistic regression, logic regression, classification tree, and random forests to identify effective gene-gene and gene-environmental interactions. Int J Appl Sci Technol. 2(7):268.