Book contents
- Frontmatter
- Contents
- List of contributors
- Foreword: The future of drug discovery and healthcare
- Acknowledgments
- 1 The art and science of the drug discovery pipeline
- 2 Computational approaches to drug target identification
- 3 Understanding human disease knowledge through text mining
- 4 Integrating translational biomarkers into drug development
- 5 Computational phenotypic assessment of small molecules in drug discovery
- 6 Data visualization and the DDP process
- 7 Information visualization – important IT considerations
- 8 Example of computational biology at the new drug application (NDA) and regulatory approval stages
- 9 Clinical trial failures and drug repositioning
- Appendix I Additional knowledge-based analysis approaches
- Appendix II Open source tools and public data sources
- Index
- Plate section
- References
3 - Understanding human disease knowledge through text mining
Published online by Cambridge University Press: 05 February 2016
Summary
The aim of text mining in biomedicine is to extract valuable information from large amounts of biomedical text. For this purpose it borrows techniques from fields such as natural language processing (NLP), information retrieval (IR), information extraction (IE), and artificial intelligence (AI). However, many of these techniques need to be adapted to the particularities of biomedical text, which spans an unusually diverse range of vocabularies and writing styles, as can be seen in clinical narratives, regulatory reports, and scientific articles. For example, an NLP algorithm designed to detect sentence boundaries in newspaper text would need to be adjusted for biomedical text, because periods that do not mark the end of a sentence occur more frequently in biomedical text than in newspapers and would otherwise mislead the algorithm (Tomanek et al., 2007). The particular information needs of biomedicine have also led to the development of specialized text-mining techniques for extracting domain-specific knowledge, such as molecular events, perturbations, and interactions.
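The sentence-boundary problem mentioned above can be illustrated with a small sketch. This is not the method of Tomanek et al. (2007); it is a simple rule-based illustration, and the abbreviation list, the function names, and the sample text are all assumptions made up for this example. It compares a naive splitter that breaks after every period with one that merges fragments ending in a known abbreviation or a single-letter initial such as a genus name.

```python
import re

# Hypothetical abbreviation list, for illustration only; a real system would
# rely on a much larger curated or learned inventory.
ABBREVIATIONS = {"e.g.", "i.e.", "et al.", "fig.", "approx.", "vs."}


def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def ends_with_non_terminal_period(fragment):
    """Return True if the final period likely does not end a sentence."""
    lowered = fragment.lower()
    for abbr in ABBREVIATIONS:
        if lowered.endswith(abbr):
            start = len(lowered) - len(abbr)
            # Require a word boundary before the abbreviation.
            if start == 0 or not lowered[start - 1].isalnum():
                return True
    # Single capital letter plus period, as in the genus initial of "E. coli".
    last_token = fragment.split()[-1]
    return re.fullmatch(r"[A-Z]\.", last_token) is not None


def biomedical_split(text):
    """Merge naive fragments whose trailing period is not a sentence end."""
    sentences = []
    buffer = ""
    for fragment in naive_split(text):
        buffer = f"{buffer} {fragment}" if buffer else fragment
        if not ends_with_non_terminal_period(buffer):
            sentences.append(buffer)
            buffer = ""
    if buffer:  # Keep any trailing text that ended on an abbreviation.
        sentences.append(buffer)
    return sentences


if __name__ == "__main__":
    sample = (
        "The strain E. coli K-12 was grown at approx. 37 degrees "
        "(Smith et al. 2007). Expression of p53 increased."
    )
    print(naive_split(sample))
    print(biomedical_split(sample))
```

On the made-up sample, the naive splitter breaks the first sentence at the genus initial, at "approx." and at "et al.", while the adjusted splitter keeps it whole. A fixed rule list like this is brittle (a sentence that genuinely ends in an abbreviation would be merged with the next one), which is one reason trained sentence-boundary detectors are generally preferred in practice.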
Pharmaceutical companies are data-intensive organizations whose success depends on their ability to efficiently process large quantities of data from internal and external sources. Much valuable knowledge is locked within textual sources such as patents, clinical records, conference abstracts, and full-text articles. These sources are growing so quickly that even subject-matter experts cannot keep up with the content appearing in their niche. For example, in 2013 alone PubMed listed more than 27,000 articles mentioning diabetes. Text mining enables the processing of such documents within practical time frames and can thereby impact every stage of the drug discovery pipeline.
Before the late 1990s, IR was the main research field that dealt with biomedical documents. Its main focus was on improving access to literature records from biomedical databases such as Medline, a comprehensive database of scientific abstracts managed by the US National Library of Medicine (NLM). Then, in 1996, the launch of PubMed made the majority of Medline content available online (Canese, 2006). This event was followed by an increase in research on biomedical documents with a scope broader than IR. Such research came to be called "text mining", reflecting the emergence of data mining and text mining during the same period (Rodriguez-Esteban, 2008). The first publication dealing with biomedical text that used the name "text mining" came from the National Institutes of Health (NIH) in 1999 (Tanabe et al., 1999).
- Type: Chapter
- Publisher: Cambridge University Press
- Print publication year: 2016