Book contents
- Frontmatter
- Content
- Acknowledgements
- 1 Introduction
- 2 What is a thesaurus?
- 3 Tools for subject access and retrieval
- 4 What a thesaurus is used for
- 5 Why use a thesaurus?
- 6 Types of thesaurus
- 7 The format of a thesaurus
- 8 Building a thesaurus 1: vocabulary collection
- 9 Vocabulary control 1: selection of terms
- 10 Vocabulary control 2: form of entry
- 11 Building a thesaurus 2: term extraction from document titles
- 12 Building a thesaurus 3: vocabulary analysis
- 13 The thesaural relationships
- 14 Building a thesaurus 4: introducing internal structure
- 15 Building a thesaurus 5: imposing hierarchy
- 16 Building a thesaurus 6: compound subjects and citation order
- 17 Building a thesaurus 7: conversion of the taxonomy to alphabetical format
- 18 Building a thesaurus 8: creating the thesaurus records
- 19 Managing and maintaining the thesaurus: thesaurus software
- 20 Conclusion
- Glossary
- Bibliography
- Appendix 1 Sample titles for thesaurus vocabulary
- Appendix 2 Sample terms for the thesaurus
- Appendix 3 Facets at stage 1 of analysis
- Appendix 4 Facets at stage 2 of analysis
- Appendix 5 Completed systematic display
- Appendix 6 Thesaurus entries for sample page
- Index
5 - Why use a thesaurus?
Published online by Cambridge University Press: 09 June 2018
Summary
In the early development of the thesaurus, when it was essentially still a keyword list, the keywords or terms were usually drawn from the documents to be indexed. Later on, this casual approach to term selection and use was abandoned in favour of much more control over the vocabulary, because there are considerable advantages in using a controlled indexing language rather than ‘uncontrolled’ or natural language.
Natural language indexing means the selection and assignment of indexing terms, usually taken from the titles or text of the documents themselves, without any reference to a standard list of terms. The indexer chooses whatever seems appropriate for the material in hand (or, in automatic indexing systems, text analysis software identifies important terms based on their frequency and position, with terms in the title or near the beginning of the document scoring more highly than others). Natural language can be seen to have several advantages: it is easy to use; no one has to be trained to use it; no one has to spend time compiling or maintaining an indexing vocabulary; the indexing terms match closely the vocabulary of the subject; new terms will be adopted naturally as they occur in the literature, and out-dated terms will equally naturally fall out of use; and there is reason to believe that the terms will match those chosen by searchers, particularly in technical or research literature.
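The frequency-and-position scoring mentioned above might be sketched roughly as follows. This is a minimal illustration, not a description of any particular indexing system: the stop list, the weights and the function names (`tokenize`, `score_terms`, `index_terms`) are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop list and weights, chosen only for illustration.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "is", "on"}
TITLE_WEIGHT = 3.0  # assumed score for each occurrence in the title
LEAD_WEIGHT = 2.0   # assumed score for occurrences near the start of the text
LEAD_WORDS = 10     # how many leading non-stop words count as "near the start"


def tokenize(text):
    """Lower-case the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def score_terms(title, body):
    """Score each term by frequency, weighting title and early positions."""
    scores = Counter()
    for word in tokenize(title):
        scores[word] += TITLE_WEIGHT
    for position, word in enumerate(tokenize(body)):
        scores[word] += LEAD_WEIGHT if position < LEAD_WORDS else 1.0
    return scores


def index_terms(title, body, limit=5):
    """Return the highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(score_terms(title, body).items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:limit]]


title = "Thesaurus construction"
body = "A thesaurus controls vocabulary. Indexers use the vocabulary to index documents."
print(index_terms(title, body, limit=3))
```

Note that the terms selected are simply whatever words the document happens to use; nothing checks them against a standard list.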
Nevertheless, these advantages can be more than offset by the disadvantages of natural language use, and the effect it has on the efficiency of retrieval. For most people, the most familiar sort of searching in an uncontrolled environment is internet searching. Although retrieving some information is not difficult, performing an effective search can be tiresome and time-consuming, if not impossible. Usually search engines can only match search terms against the textual content, and all the possible forms and variations of a search term, including variant spellings and synonyms, must be entered individually if every relevant item is to be retrieved.
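The matching problem can be shown with a small sketch. The three document titles and the function names (`literal_search`, `search_all_variants`) are invented for the example, and the matching is deliberately naive: a term is found only if it appears in the text in exactly that form.

```python
# Hypothetical document titles, invented for this illustration.
DOCUMENTS = {
    1: "Aeroplane design in the 1930s",
    2: "Airplane maintenance schedules",
    3: "Aircraft noise near cities",
}


def literal_search(term, documents):
    """Return ids of documents whose text contains the term as a whole word."""
    wanted = term.lower()
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if wanted in text.lower().split()
    )


def search_all_variants(variants, documents):
    """Run one literal search per variant and merge the results."""
    found = set()
    for variant in variants:
        found.update(literal_search(variant, documents))
    return sorted(found)


# A single form of the term retrieves only the documents that use that form.
print(literal_search("aeroplane", DOCUMENTS))

# Every spelling variant and synonym has to be supplied by the searcher.
print(search_all_variants(["aeroplane", "airplane", "aircraft"], DOCUMENTS))
```

The burden of thinking of every variant falls entirely on the searcher, and any form that is overlooked means relevant items are missed.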
Added to this problem at the search stage is a comparable problem at the indexing stage, namely a lack of general agreement as to what individual documents might be about. Determining the content of documents (sometimes referred to as the aboutness of an item) is a very subjective process.
- Type: Chapter
- Information: Essential Thesaurus Construction, pp. 38-39
- Publisher: Facet
- Print publication year: 2006