Reviews

Mining large heterogeneous data sets in drug discovery

Pages 995-1004 | Published online: 28 Aug 2009

Bibliography

  • Milne GM. Pharmaceutical productivity – the imperative for new paradigms. Annu Rep Med Chem 2003;38:383-96
  • Slater T, Bouton C, Huang ES. Beyond data integration. Drug Discov Today 2008;13:584-9
  • Williams AJ. A perspective of publicly accessible/open-access chemistry databases. Drug Discov Today 2008;13:495-501
  • Williams AJ. Public chemical compound databases. Curr Opin Drug Discov Devel 2008;11:393-404
  • Bolton EE, Wang Y, Thiessen PA, Bryant SH. PubChem: integrated platform of small molecules and biological activities. Annu Rep Comput Chem 2008;4:217-41
  • PubChem substance data source information. Available from: http://pubchem.ncbi.nlm.nih.gov/sources/sources.cgi. [Last accessed 16 June 2009]
  • Irwin JJ, Shoichet BK. ZINC – a free database of commercially available compounds for virtual screening. J Chem Inf Model 2005;45:177-82
  • ChemSpider. Available from: http://www.chemspider.com. [Last accessed 16 June 2009]
  • eMolecules. Available from: http://www.emolecules.com. [Last accessed 16 June 2009]
  • Apweiler R, Bairoch A, Wu CH. Protein sequence databases. Curr Opin Chem Biol 2004;8:76-80
  • Uniprot Consortium. The universal protein resource. Nucleic Acids Res 2007;35:193-7
  • Uniprot. Available from: http://www.uniprot.org. [Last accessed 22 June 2009]
  • Bernstein FC, Koetzle TF, Williams GJB, et al. The protein data bank: a computer-based archival file for macromolecular structures. J Mol Biol 1977;112:535-42
  • Research Collaboratory for Structural Bioinformatics. Available from: http://home.rcsb.org. [Last accessed 22 June 2009]
  • The protein data bank. Available from: http://www.pdb.org. [Last accessed 22 June 2009]
  • Schaefer CF. Pathway databases. Ann NY Acad Sci 2004;1020:77-91
  • Kanehisa M, Araki M, Goto S, et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res 2008;36:480-4
  • Kanehisa M, Goto S, Hattori M, et al. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 2006;34:354-7
  • Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000;28:27-30
  • Karp PD, Ouzounis CA, Moore-Kochlacs C, et al. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res 2005;33(19):6083-9
  • Cochrane G, Akhtar R, Bonfield J, et al. Petabyte-scale innovations at the European Nucleotide Archive. Nucleic Acids Res 2009;37(Database issue):D19-25
  • Benson DA, Karsch-Mizrachi I, Lipman DJ, et al. GenBank. Nucleic Acids Res 2008;36:25-30
  • NCBI nucleotide databases. Available from: http://www.ncbi.nlm.nih.gov/About/tools/restable_nuc.html. [Last accessed 29 June 2009]
  • Dalkilic M, Costello J, Clark WT, Radivojac P. From protein-disease associations to disease informatics. Front Biosci 2008;13:3391-407
  • Radivojac P, Peng K, Clark WT, et al. An integrated approach to inferring gene-disease associations in humans. Proteins Struct Funct Bioinform 2008;72:1030-7
  • Disease gene database. Available from: http://www.proteinlounge.com/disease_proteins.asp. [Last accessed 26 June 2009]
  • Thorisson GA, Muilu J, Brookes AJ. Genotype–phenotype databases: challenges and solutions for the post-genomic era. Nat Rev Genet 2009;10:9-18
  • Kahraman A, Avramov A, Nashev LG, et al. PhenomicDB: a multi-species genotype/phenotype database for comparative phenomics. Bioinformatics 2005;21:418-20
  • Wild DJ, Beckman R. The future of chemical information searching. In: Banville D, editor, Chemical information mining: facilitating literature-based discovery. CRC Press; 2008
  • PubMed. Available from: http://www.ncbi.nlm.nih.gov/pubmed. [Last accessed 22 June 2009]
  • PubMed central. Available from: http://pubmedcentral.nih.gov. [Last accessed 22 June 2009]
  • NIH public access policy. Available from: http://publicaccess.nih.gov. [Last accessed 22 June 2009]
  • Han J, Kamber M. Data mining: concepts and techniques. 1st edition. Morgan Kaufmann; 2000
  • Wang H, Klinginsmith J, Dong X, et al. Chemical data mining of the NCI human tumor cell line database. J Chem Inf Model 2007;47(6):2063-76
  • Brown N. Chemoinformatics – an introduction for computer scientists. ACM Comput Surv 2009;41:2
  • Altschul SF, Madden TL, Schaffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389-402
  • MacQueen JB. Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability 1967;281-97
  • Kaufman L, Rousseeuw PJ. Finding groups in data: an introduction to cluster analysis. John Wiley; 1990
  • Ng RT, Han J. Efficient and effective clustering methods for spatial data mining. 1994 International Conference Very Large Data Bases (VLDB'94), Santiago, Chile 1994:144-55
  • Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. 1996 ACM-SIGMOD International Conference Management of Data (SIGMOD'96), Montreal, Canada. 1996:103-14
  • Downs GM, Barnard JM. Clustering methods and their uses in computational chemistry. Rev Comput Chem 2002;18:1-40
  • Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases. 1998 ACM-SIGMOD International Conference Management of Data (SIGMOD'98), Seattle, WA. 1998:73-84
  • Karypis G, Han EH, Kumar V. CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. Computer 1999;68-75
  • Ester M, Kriegel HP, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases. 1996 International Conference of Knowledge Discovery and Data Mining (KDD'96), Portland, OR. 1996:226-31
  • Ankerst M, Breunig MM, Kriegel HP, Sander J. OPTICS: ordering points to identify the clustering structure. 1999 ACM-SIGMOD International Conference Management of Data (SIGMOD'99) Philadelphia, PA. 1999:49-60
  • Hoschka P, Klösgen W. A support system for interpreting statistical data. In: Knowledge discovery in databases. AAAI/MIT Press; 1991:325-46
  • Wang W, Yang J, Muntz RR. STING: a statistical information grid approach to spatial data mining; 1997 International Conference of Very Large Databases (VLDB'97), Athens, Greece. 1997:186-95
  • Sheikholeslami G, Chatterjee S, Zhang A. WaveCluster: a multi-resolution clustering approach for very large spatial databases; 1998 International Conference of Very Large Data Bases (VLDB'98), New York. 1998:428-39
  • Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. 1998 ACM-SIGMOD International Conference Management of Data (SIGMOD'98), Seattle, WA. 1998:94-105
  • Wild DJ, Blankley CJ. Comparison of 2D fingerprint types and hierarchy level selection methods for structural grouping using Ward's clustering. J Chem Inf Comput Sci 2000;40:155-62
  • Wild DJ, Blankley CJ. VisualiSAR: a web-based application for clustering, structure browsing and SAR study. J Mol Graph Model 1999;17:85-9
  • Brown RD, Martin YC. Use of structure-activity data to compare structure-based clustering methods and descriptors for use in compound selection. J Chem Inf Comput Sci 1996;36:572-84
  • Sturn A, Quackenbush J, Trajanoski Z. Genesis: cluster analysis of microarray data. Bioinformatics 2002;18:207-8
  • Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. PNAS 1998;95(25):14863-8
  • Witten IH, Frank E. Data mining: practical machine learning tools and techniques. 2nd edition. Morgan Kaufmann; 2006
  • Carlin BP, Louis TA. Bayesian methods for data analysis. 3rd edition. Chapman & Hall/CRC; 2008
  • Liaw A, Wiener M. Classification and regression by randomForest. R News 2002;2(3):18-22
  • Rusinko III A, Farmen MW, Lambert CG, et al. Analysis of a large structure/biological activity data set using recursive partitioning. J Chem Inf Comput Sci 1999;39(6):1017-26
  • Ihaka R, Gentleman R. R: a language for data analysis and graphics. J Comput Graphical Stat 1996;5:299-314
  • SPSS. Available from: http://www.spss.com. [Last accessed 6 July 2009]
  • Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. ACM-SIGMOD International Conference Management of Data (SIGMOD'93). 1993:207-16
  • Kersey P, Apweiler R. Linking publication, gene and protein data. Nat Cell Biol 2006;8:11
  • Durinck S, Moreau Y, Kasprzyk A, et al. BioMart and bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 2005;21:3439-40
  • Xtractor: data mining simplified. Available from: http://www.xtractor.in. [Last accessed 22 June 2009]
  • Belleau F, Nolin M, Tourigny N, et al. Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J Biomed Inform 2008;41(5):706-16
  • Royal society of chemistry project prospect. Available from: http://www.projectprospect.org. [Last accessed 22 June 2009]
  • Wishart DS, Knox C, Guo AC, et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 2008;36 (Database issue):D901-6
  • Schreyer A, Blundell T. CREDO: a protein-ligand interaction database for drug discovery. Chem Biol Drug Des 2009;73(2):157-67
  • Günther S, Kuhn M, Dunkel M, et al. SuperTarget and matador: resources for exploring drug-target relationships. Nucleic Acids Res 2008;36 (Database issue):D919-22
  • Berners-Lee T, Hendler J, Lassila O. The semantic web. Sci Am 2001;284(5):34-43
  • Hendler J, Berners-Lee T, Miller E. Integrating applications on the semantic web. J Inst Electrical Eng Japan 2002;122(10):676-80
  • XML. Available from: http://www.w3.org/XML. [Last accessed 4 August 2009]
  • Murray-Rust P, Rzepa HS. Chemical markup language and XML part I. Basic principles. J Chem Inf Comput Sci 1999;39(6):928-42
  • Hucka M, Finney A, Sauro HM, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003;19(4):524-31
  • XML subsets for the life sciences. Available from: http://www.visualgenomics.ca/gordonp/xml/. [Last accessed 4 August 2009]
  • OWL. Available from: http://www.w3.org/TR/OWL-guide. [Last accessed 4 August 2009]
  • RDF. Available from: http://www.w3.org/RDF. [Last accessed 4 August 2009]
  • Gardner SP. Ontologies and semantic data integration. Drug Discov Today 2005;10(14):1001-7
  • RSS guide. Available from: http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html. [Last accessed 4 August 2009]
  • Murray-Rust P, Rzepa HS. Towards the chemical semantic web. An introduction to RSS. Internet J Chem 2003;6:Article 4
  • WSDL. Available from: http://www.w3.org/TR/WSDL. [Last accessed 4 August 2009]
  • SOAP. Available from: http://www.w3.org/TR/SOAP. [Last accessed 4 August 2009]
  • REST wiki. Available from: http://rest.blueoxen.net/cgi-bin/wiki.pl. [Last accessed 4 August 2009]
  • UDDI. Available from: http://uddi.xml.org. [Last accessed 4 August 2009]
  • Hendler J. Is there an intelligent agent in your future? Nat Web Matters 1999
  • Wooldridge M, Jennings N. Intelligent agents: theory and practice. Knowledge Eng Rev 1995;10(2)
  • An inference engine for RDF. Available from: http://www.agfa.com/w3c/2002/02/thesis/An_inference_engine_for_RDF.html. [Last accessed 4 August 2009]
  • Dong X, Gilbert KE, Guha R, et al. Web service infrastructure for chemoinformatics. J Chem Inf Model 2007;47:1303-7
  • Hur J, Wild DJ. PubChemSR: a search and retrieval tool for PubChem. Chem Cent J 2008;2:11
  • Willighagen E, O'Boyle NM, Gopalakrishnan H, et al. Userscripts for the life sciences. BMC Bioinformatics 2007;8:487
  • Torrey Path. Available from: http://www.torreypath.com. [Last accessed 26 June 2009]
  • Wild DJ. Strategies for using information effectively in early-stage drug discovery. In: Ekins S, editor, Computer applications in pharmaceutical research and development. Wiley-Interscience, Hoboken; 2006
  • Hassan M, Brown RD, Varma-O'Brien S, Rogers D. Cheminformatics analysis and learning in a data pipelining environment. Mol Divers 2006;10:283-99
  • Berthold MR, Cebron N, Dill F, et al. KNIME: The Konstanz Information Miner. In: Preisach, et al, editor, Data analysis, machine learning and applications. Springer; 2008
  • Hull D, Wolstencroft K, Stevens R, et al. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics 2004;20:3045-54
  • Inforsense. Available from: http://www.inforsense.com. [Last accessed 6 July 2009]
  • Dong X, Wild DJ. An automatic drug discovery workflow generation tool using semantic web technologies. Proceedings of the 4th IEEE conference on eScience. 2008:652-7
  • Wild DJ. Grand challenges for cheminformatics. J Cheminformatics 2009;1:1
