Cambridge Catalogue  
The Text Mining Handbook

Details

  • Page extent: 422 pages
  • Size: 253 x 177 mm
  • Weight: 0.898 kg

Library of Congress

  • Dewey number: 005.74
  • Dewey version: 22
  • LC Classification: QA76.9.D343 F45 2006
  • LC Subject headings:
    • Data mining--Handbooks, manuals, etc

Hardback (ISBN-13: 9780521836579)

Contents

Preface page x
I.   Introduction to Text Mining 1
     I.1  Defining Text Mining 1
     I.2  General Architecture of Text Mining Systems 13
II.   Core Text Mining Operations 19
     II.1  Core Text Mining Operations 19
     II.2  Using Background Knowledge for Text Mining 42
     II.3  Text Mining Query Languages 51
III.   Text Mining Preprocessing Techniques 57
     III.1  Task-Oriented Approaches 58
     III.2  Further Reading 62
IV.   Categorization 64
     IV.1  Applications of Text Categorization 65
     IV.2  Definition of the Problem 66
     IV.3  Document Representation 68
     IV.4  Knowledge Engineering Approach to TC 70
     IV.5  Machine Learning Approach to TC 70
     IV.6  Using Unlabeled Data to Improve Classification 78
     IV.7  Evaluation of Text Classifiers 79
     IV.8  Citations and Notes 80
V.   Clustering 82
     V.1  Clustering Tasks in Text Analysis 82
     V.2  The General Clustering Problem 84
     V.3  Clustering Algorithms 85
     V.4  Clustering of Textual Data 88
     V.5  Citations and Notes 92
VI.   Information Extraction 94
     VI.1  Introduction to Information Extraction 94
     VI.2  Historical Evolution of IE: The Message Understanding Conferences and Tipster 96
     VI.3  IE Examples 101
     VI.4  Architecture of IE Systems 104
     VI.5  Anaphora Resolution 109
     VI.6  Inductive Algorithms for IE 119
     VI.7  Structural IE 122
     VI.8  Further Reading 129
VII.   Probabilistic Models for Information Extraction 131
     VII.1  Hidden Markov Models 131
     VII.2  Stochastic Context-Free Grammars 137
     VII.3  Maximal Entropy Modeling 138
     VII.4  Maximal Entropy Markov Models 140
     VII.5  Conditional Random Fields 142
     VII.6  Further Reading 145
VIII.   Preprocessing Applications Using Probabilistic and Hybrid Approaches 146
     VIII.1  Applications of HMM to Textual Analysis 146
     VIII.2  Using MEMM for Information Extraction 152
     VIII.3  Applications of CRFs to Textual Analysis 153
     VIII.4  TEG: Using SCFG Rules for Hybrid Statistical–Knowledge-Based IE 155
     VIII.5  Bootstrapping 166
     VIII.6  Further Reading 175
IX.   Presentation-Layer Considerations for Browsing and Query Refinement 177
     IX.1  Browsing 177
     IX.2  Accessing Constraints and Simple Specification Filters at the Presentation Layer 185
     IX.3  Accessing the Underlying Query Language 186
     IX.4  Citations and Notes 187
X.   Visualization Approaches 189
     X.1  Introduction 189
     X.2  Architectural Considerations 192
     X.3  Common Visualization Approaches for Text Mining 194
     X.4  Visualization Techniques in Link Analysis 226
     X.5  Real-World Example: The Document Explorer System 237
XI.   Link Analysis 244
     XI.1  Preliminaries 244
     XI.2  Automatic Layout of Networks 246
     XI.3  Paths and Cycles in Graphs 250
     XI.4  Centrality 251
     XI.5  Partitioning of Networks 259
     XI.6  Pattern Matching in Networks 272
     XI.7  Software Packages for Link Analysis 273
     XI.8  Citations and Notes 274
XII.   Text Mining Applications 275
     XII.1  General Considerations 276
     XII.2  Corporate Finance: Mining Industry Literature for Business Intelligence 281
     XII.3  A “Horizontal” Text Mining Application: Patent Analysis Solution Leveraging a Commercial Text Analytics Platform 297
     XII.4  Life Sciences Research: Mining Biological Pathway Information with GeneWays 309
Appendix A: DIAL: A Dedicated Information Extraction Language for Text Mining 317
     A.1  What Is the DIAL Language? 317
     A.2  Information Extraction in the DIAL Environment 318
     A.3  Text Tokenization 320
     A.4  Concept and Rule Structure 320
     A.5  Pattern Matching 322
     A.6  Pattern Elements 323
     A.7  Rule Constraints 327
     A.8  Concept Guards 328
     A.9  Complete DIAL Examples 329
Bibliography 337
Index 391
