Expert Opinion on Drug Discovery Volume 4, 2009 - Issue 10

Reviews

Mining large heterogeneous data sets in drug discovery

David J Wild Director of Cheminformatics Program, Assistant Professor of Informatics, Indiana University, School of Informatics and Computing, 901 E. 10th St., Bloomington, IN 47408, USA +1 812 856 1848; +1 608 541 5402; [email protected]

Pages 995-1004 | Published online: 28 Aug 2009

https://doi.org/10.1517/17460440903233738

Bibliography

Milne GM. Pharmaceutical productivity – the imperative for new paradigms. Annu Rep Med Chem 2003;38:383-96
Slater T, Bouton C, Huang ES. Beyond data integration. Drug Discov Today 2008;13:584-9
Williams AJ. A perspective of publicly accessible/open-access chemistry databases. Drug Discov Today 2008;13:495-501
Williams AJ. Public chemical compound databases. Curr Opin Drug Discov Devel 2008;11:393-404
Bolton EE, Wang Y, Thiessen PA, Bryant SH. PubChem: integrated platform of small molecules and biological activities. Annu Rep Comput Chem 2008;4:217-41
PubChem substance data source information. Available from: http://pubchem.ncbi.nlm.nih.gov/sources/sources.cgi. [Last accessed 16 June 2009]
Irwin JJ, Shoichet BK. ZINC – a free database of commercially available compounds for virtual screening. J Chem Inf Model 2005;45:177-82
ChemSpider. Available from: http://www.chemspider.com. [Last accessed 16 June 2009]
eMolecules. Available from: http://www.emolecules.com. [Last accessed 16 June 2009]
Apweiler R, Bairoch A, Wu CH. Protein sequence databases. Curr Opin Chem Biol 2004;8:76-80
Uniprot Consortium. The universal protein resource. Nucleic Acids Res 2007;35:193-7
Uniprot. Available from: http://www.uniprot.org. [Last accessed 22 June 2009]
Bernstein FC, Koetzle TF, Williams GJB, et al. The protein data bank: a computer-based archival file for macromolecular structures. J Mol Biol 1977;112:535-42
Research Collaboratory for Structural Bioinformatics. Available from: http://home.rcsb.org. [Last accessed 22 June 2009]
The protein data bank. Available from: http://www.pdb.org. [Last accessed 22 June 2009]
Schaefer CF. Pathway databases. Ann NY Acad Sci 2004;1020:77-91
Kanehisa M, Araki M, Goto S, et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res 2008;36:480-4
Kanehisa M, Goto S, Hattori M, et al. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 2006;34:354-7
Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000;28:27-30
Karp PD, Ouzounis CA, Moore-Kochlacs C, et al. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res 2005;19:6083-9
Cochrane G, Akhtar R, Bonfield J, et al. Petabyte-scale innovations at the European Nucleotide Archive. Nucleic Acids Res 2009;37(Database issue):D19-25
Benson DA, Karsch-Mizrachi I, Lipman DJ, et al. GenBank. Nucleic Acids Res 2008;36:25-30
NCBI nucleotide databases. Available from: http://www.ncbi.nlm.nih.gov/About/tools/restable_nuc.html. [Last accessed 29 June 2009]
Dalkilic M, Costello J, Clark WT, Radivojac P. From protein-disease associations to disease informatics. Front Biosci 2008;13:3391-407
Radivojac P, Peng K, Clark WT, et al. An integrated approach to inferring gene-disease associations in humans. Proteins Struct Funct Bioinform 2008;72:1030-7
Disease gene database. Available from: http://www.proteinlounge.com/disease_proteins.asp. [Last accessed 26 June 2009]
Thorisson GA, Muilu J, Brookes AJ. Genotype–phenotype databases: challenges and solutions for the post-genomic era. Nat Rev Genet 2009;10:9-18
Kahraman A, Avramov A, Nashev LG, et al. PhenomicDB: a multi-species genotype/phenotype database for comparative phenomics. Bioinformatics 2005;21:418-20
Wild DJ, Beckman R. The future of chemical information searching. In: Banville D, editor, Chemical information mining: facilitating literature-based discovery. CRC Press; 2008
PubMed. Available from: http://www.ncbi.nlm.nih.gov/pubmed. [Last accessed 22 June 2009]
PubMed central. Available from: http://pubmedcentral.nih.gov. [Last accessed 22 June 2009]
NIH public access policy. Available from: http://publicaccess.nih.gov. [Last accessed 22 June 2009]
Han J, Kamber M. Data mining: concepts and techniques. 1st edition. Morgan Kaufmann; 2000
Wang H, Klinginsmith J, Dong X, et al. Chemical data mining of the NCI human tumor cell line database. J Chem Inf Model 2007;47(6):2063-76
Brown N. Chemoinformatics – an introduction for computer scientists. ACM Comput Surv 2009;41:2
Altschul SF, Madden TL, Schaffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389-402
MacQueen JB. Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability 1967;281-97
Kaufman L, Rousseeuw PJ. Finding groups in data: an introduction to cluster analysis. John Wiley; 1990
Ng RT, Han J. Efficient and effective clustering methods for spatial data mining. 1994 International Conference Very Large Data Bases (VLDB'94), Santiago, Chile 1994:144-55
Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. 1996 ACM-SIGMOD International Conference Management of Data (SIGMOD'96), Montreal, Canada. 1996:103-14
Downs GM, Barnard JM. Clustering methods and their uses in computational chemistry. Rev Comput Chem 2002;18:1-40
Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases. 1998 ACM-SIGMOD International Conference Management of Data (SIGMOD'98), Seattle, WA. 1998:73-84
Karypis G, Han EH, Kumar V. CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. Computer 1999;68-75
Ester M, Kriegel HP, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases. 1996 International Conference of Knowledge Discovery and Data Mining (KDD'96), Portland OR. 1996:226-31
Ankerst M, Breunig MM, Kriegel HP, Sander J. OPTICS: ordering points to identify the clustering structure. 1999 ACM-SIGMOD International Conference Management of Data (SIGMOD'99) Philadelphia, PA. 1999:49-60
Hoschka P, Klösgen W. A support system for interpreting statistical data. In: Knowledge discovery in databases. AAAI/MIT Press; 1991:325-46
Wang W, Yang J, Muntz RR. STING: a statistical information grid approach to spatial data mining. 1997 International Conference of Very Large Databases (VLDB'97), Athens, Greece. 1997:186-95
Sheikholeslami G, Chatterjee S, Zhang A. WaveCluster: a multi-resolution clustering approach for very large spatial databases. 1998 International Conference of Very Large Data Bases (VLDB'98), New York. 1998:428-39
Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. 1998 ACM-SIGMOD International Conference Management of Data (SIGMOD'98), Seattle, WA. 1998:94-105
Wild DJ, Blankley CJ. Comparison of 2D fingerprint types and hierarchy level selection methods for structural grouping using Ward's clustering. J Chem Inf Comput Sci 2000;40:155-62
Wild DJ, Blankley CJ. VisualiSAR: a web-based application for clustering, structure browsing and SAR study. J Mol Graph Model 1999;17:85-9
Brown RD, Martin YC. Use of structure-activity data to compare structure-based clustering methods and descriptors for use in compound selection. J Chem Inf Comput Sci 1996;36:572-84
Sturn A, Quackenbush J, Trajanoski Z. Genesis: cluster analysis of microarray data. Bioinformatics 2002;18:207-8
Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. PNAS 1998;25:14863-8
Witten IH, Frank E. Data mining: practical machine learning tools and techniques. 2nd edition. Morgan Kaufmann; 2006
Carlin BP, Louis TA. Bayesian methods for data analysis. third edition. Chapman & Hall/CRC; 2008
Liaw A, Wiener M. Classification and regression by randomForest. R News 2002;2(3):18-22
Rusinko III A, Farmen MW, Lambert CG, et al. Analysis of a large structure/biological activity data set using recursive partitioning. J Chem Inf Comput Sci 1999;39(6):1017-26
Ihaka R, Gentleman R. R: a language for data analysis and graphics. J Comput Graphical Stat 1996;5:299-314
SPSS. Available from: http://www.spss.com. [Last accessed 6 July 2009]
Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. ACM-SIGMOD International Conference Management of Data (SIGMOD'93). 1993:207-16
Kersey P, Apweiler R. Linking publication, gene and protein data. Nat Cell Biol 2006;8:11
Durinck S, Moreau Y, Kasprzyk A, et al. BioMart and bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 2005;21:3439-40
Xtractor: data mining simplified. Available from: http://www.xtractor.in. [Last accessed 22 June 2009]
Belleau F, Nolin M, Tourigny N, et al. Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J Biomed Inform 2008;03:706-16
Royal society of chemistry project prospect. Available from: http://www.projectprospect.org. [Last accessed 22 June 2009]
Wishart DS, Knox C, Guo AC, et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 2008;36 (Database issue):D901-6
Schreyer A, Blundell T. CREDO: a protein-ligand interaction database for drug discovery. Chem Biol Drug Des 2009;73(2):157-67
Günther S, Kuhn M, Dunkel M, et al. SuperTarget and matador: resources for exploring drug-target relationships. Nucleic Acids Res 2008;36 (Database issue):D919-22
Berners-Lee T, Hendler J, Lassila O. The semantic web. Sci Am 2001;284(5):34-43
Hendler J, Berners-Lee T, Miller E. Integrating applications on the semantic web. J Inst Electrical Eng Japan 2002;122(10):676-80
XML. Available from: http://www.w3.org/XML. [Last accessed 4 August 2009]
Murray-Rust P, Rzepa HS. Chemical markup language and XML part I. Basic principles. J Chem Inf Comput Sci 1999;39(6):928-42
Hucka M, Finney A, Sauro HM, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003;19(4):524-31
XML subsets for the life sciences. Available from: http://www.visualgenomics.ca/gordonp/xml/. [Last accessed 4 August 2009]
OWL. Available from: http://www.w3.org/TR/OWL-guide. [Last accessed 4 August 2009]
RDF. Available from: http://www.w3.org/RDF. [Last accessed 4 August 2009]
Gardner SP. Ontologies and semantic data integration. Drug Discov Today 2005;10(14):1001-7
RSS guide. Available from: http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html. [Last accessed 4 August 2009]
Murray-Rust P, Rzepa HS. Towards the chemical semantic web. An introduction to RSS. Internet J Chem 2003;6:Article 4
WSDL. Available from: http://www.w3.org/TR/WSDL. [Last accessed 4 August 2009]
SOAP. Available from: http://www.w3.org/TR/SOAP. [Last accessed 4 August 2009]
REST wiki. Available from: http://rest.blueoxen.net/cgi-bin/wiki.pl. [Last accessed 4 August 2009]
UDDI. Available from: http://uddi.xml.org. [Last accessed 4 August 2009]
Hendler J. Is there an intelligent agent in your future? Nat Web Matters 1999
Wooldridge M, Jennings N. Intelligent agents: theory and practice. Knowledge Eng Rev 1995;10(2)
An inference engine for RDF. Available from: http://www.agfa.com/w3c/2002/02/thesis/An_inference_engine_for_RDF.html. [Last accessed 4 August 2009]
Dong X, Gilbert KE, Guha R, et al. Web service infrastructure for chemoinformatics. J Chem Inf Model 2007;47:1303-7
Hur J, Wild DJ. PubChemSR: a search and retrieval tool for PubChem. Chem Cent J 2008;2:11
Willighagen E, O'Boyle NM, Gopalakrishnan H, et al. Userscripts for the life sciences. BMC Bioinformatics 2007;8:487
Torrey Path. Available from: http://www.torreypath.com. [Last accessed 26 June 2009]
Wild DJ. Strategies for using information effectively in early-stage drug discovery. In: Ekins S, editor, Computer applications in pharmaceutical research and development. Wiley-Interscience, Hoboken; 2006
Hassan M, Brown RD, Varma-O'Brien S, Rogers D. Cheminformatics analysis and learning in a data pipelining environment. Mol Divers 2006;10:283-99
Berthold MR, Cebron N, Dill F, et al. KNIME: The Konstanz Information Miner. In: Preisach, et al, editor, Data analysis, machine learning and applications. Springer; 2008
Hull D, Wolstencroft K, Stevens R, et al. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics 2004;20:3045-54
Inforsense. Available from: http://www.inforsense.com. [Last accessed 6 July 2009]
Dong X, Wild DJ. An automatic drug discovery workflow generation tool using semantic web technologies. Proceedings of the 4th IEEE conference on eScience. 2008:652-7
Wild DJ. Grand challenges for cheminformatics. J Cheminformatics 2009;1:1
