Appendix II - Open source tools and public data sources

doi:10.1017/CBO9780511989421.012

As technology evolves, drug discovery is quickly becoming a data science rather than purely a biological science or chemistry in the traditional sense. Scientists are generating experimental and clinical data at an ever-faster pace, and much of it becomes publicly available. How best to use these data and realize their full potential is a recurring question throughout the drug discovery process.

In this section, rather than provide an exhaustive list of the resources available to the scientist, we focus on a few widely used (free or commercial) data sources and analysis tools.

Literature and textual resources

1. eUtils

The Entrez Programming Utilities (E-utilities, or eUtils) are a set of nine server-side programs that provide a stable interface into the Entrez query and database system at the National Center for Biotechnology Information (NCBI). The E-utilities use a fixed URL syntax that translates a standard set of input parameters into the values necessary for various NCBI software components to search for and retrieve the requested data. The E-utilities are therefore the structured interface to the Entrez system, which currently includes 38 databases covering a variety of biomedical data, including nucleotide and protein sequences, gene records, three-dimensional molecular structures, and the biomedical literature.
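The fixed URL syntax described above can be illustrated with a minimal sketch that assembles an ESearch request for PubMed. The base URL and the `db`, `term`, `retmax`, and `retmode` parameters belong to the public E-utilities service; the helper name `build_esearch_url` and the example query are illustrative assumptions, and the sketch only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Base address of the public E-utilities service at NCBI.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for an Entrez database and query term.

    Hypothetical helper for illustration; it does not contact NCBI.
    """
    params = {
        "db": db,            # Entrez database to search, e.g. "pubmed"
        "term": term,        # Entrez query string
        "retmax": retmax,    # maximum number of record IDs to return
        "retmode": "json",   # ask for a JSON response
    }
    # urlencode percent-escapes the query so it is safe to place in a URL.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query (an assumption, not taken from the text above).
url = build_esearch_url("pubmed", "drug discovery[Title]", retmax=5)
print(url)
```

Keeping the parameters in a dictionary makes it easy to switch `db` to another Entrez database while leaving the rest of the request unchanged.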

2. Embase

With extensive international journal and conference coverage, Embase is a key resource for generating systematic reviews, supporting evidence-based medicine, and tracking drugs and medical devices. It supports clinical decision-making and can help shorten time to market while still meeting drug safety and pharmacovigilance requirements.

3. MeSH

MeSH is the National Library of Medicine's controlled vocabulary thesaurus. It consists of sets of terms naming descriptors in a hierarchical structure that permits searching at various levels of specificity.
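Searching at various levels of specificity can be pictured with MeSH tree numbers, the dot-separated codes that place each descriptor in the hierarchy: every prefix of a tree number names a broader ancestor. The following is a minimal sketch, assuming tree numbers are handled as plain strings. The tree numbers and labels in `records` are illustrative placeholders rather than entries retrieved from MeSH, and `ancestors`, `is_narrower_or_equal`, and `search` are hypothetical helpers.

```python
def ancestors(tree_number):
    """Return the tree number and all broader prefixes, broadest first."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def is_narrower_or_equal(tree_number, broader):
    """True if tree_number equals broader or sits beneath it in the tree."""
    # Appending "." prevents "A01.1000" from matching the branch "A01.100".
    return tree_number == broader or tree_number.startswith(broader + ".")


def search(records, broader):
    """Return labels of all records at or below the given tree number."""
    return [
        label
        for number, label in sorted(records.items())
        if is_narrower_or_equal(number, broader)
    ]


# Placeholder hierarchy (assumed values, not real MeSH records).
records = {
    "A01": "broad term",
    "A01.100": "narrower term",
    "A01.100.200": "narrowest term",
    "A02": "unrelated term",
}

print(ancestors("A01.100.200"))
print(search(records, "A01.100"))
```

Querying with a short tree number returns the whole branch beneath it, while a longer one restricts the results to a more specific branch.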

Genomics resources

1. OMIM

OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily. OMIM is authored and edited at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, under the direction of Dr. Ada Hamosh. Its official home is omim.org.

2. dbVar

dbVar is NCBI's database of genomic structural variation, including insertions, deletions, duplications, inversions, deletion-insertions, mobile element insertions, translocations, and complex rearrangements.

3. GEO

GEO is a public functional genomics data repository supporting MIAME-compliant data submissions. Array- and sequence-based data are accepted. Tools are provided to help users query and download experiments and curated gene expression profiles.
