Compound noun segmentation based on lexical data extracted from corpus

JUNTAE YOON

doi:10.1017/S1351324901002637


Published online by Cambridge University Press: 25 July 2001


JUNTAE YOON: Affiliation:
IRCS, University of Pennsylvania, 3401 Walnut St., Suite 400A, Philadelphia, PA 19104-6228, USA; e-mail: jtyoon@linc.cis.upenn.edu


Abstract

Compound noun segmentation is a crucial problem in Korean language processing because a sequence of nouns may be written without spaces in real text, which makes it difficult to identify the morphological constituents of the compound. This paper presents a method of Korean compound noun segmentation based on lexical data extracted from a corpus. The segmentation consists of two tasks. First, a Hand-Built Segmentation Dictionary (HBSD) is used to segment compound nouns that occur frequently or require exceptional processing. Second, a segmentation algorithm using corpus data is applied, in which simple nouns and their frequencies are stored in a Simple Noun Dictionary (SND). The analysis is based on modified tabular parsing using a min-max operation. In our experiments, the method achieved an accuracy of about 97.29%.
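The two-stage procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the scoring, so the reading of "min-max" used here is an assumption: each candidate segmentation is scored by the frequency of its rarest constituent (min), and the candidate with the highest such score is kept (max). The sketch also swaps the paper's modified tabular parsing for a simple left-to-right dynamic program, and the dictionary entries, frequencies, and the function name `segment` are hypothetical toy values rather than data from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-ins for the two dictionaries named in the abstract.
# All entries and frequencies are invented for illustration.
HBSD: Dict[str, List[str]] = {
    "대학생선교회": ["대학생", "선교회"],  # hand-built entry for an ambiguous compound
}
SND: Dict[str, int] = {
    "정보": 120, "검색": 80, "시스템": 95, "시스": 2, "템": 1,
}


def segment(compound: str,
            hbsd: Dict[str, List[str]] = HBSD,
            snd: Dict[str, int] = SND) -> Optional[List[str]]:
    """Return a list of simple nouns, or None if no segmentation is found."""
    # Stage 1: hand-built dictionary lookup takes priority.
    if compound in hbsd:
        return list(hbsd[compound])

    # Stage 2 (assumed min-max reading): best[i] holds the best
    # (bottleneck frequency, segments) pair covering compound[:i].
    n = len(compound)
    best: List[Optional[Tuple[float, List[str]]]] = [None] * (n + 1)
    best[0] = (float("inf"), [])
    for end in range(1, n + 1):
        for start in range(end):
            prev = best[start]
            if prev is None:
                continue  # prefix compound[:start] cannot be segmented
            piece = compound[start:end]
            freq = snd.get(piece)
            if freq is None:
                continue  # piece is not a known simple noun
            score = min(prev[0], freq)  # min: rarest constituent so far
            current = best[end]
            if current is None or score > current[0]:  # max over candidates
                best[end] = (score, prev[1] + [piece])
    final = best[n]
    return final[1] if final is not None else None


if __name__ == "__main__":
    print(segment("정보검색시스템"))
    print(segment("대학생선교회"))
```

With these toy frequencies, the candidate ending in the rare pieces "시스" and "템" gets a low bottleneck score and loses to the candidate ending in "시스템". A compound with no full cover by dictionary nouns returns `None`, which a real system would need to handle separately (for example, by treating it as an unknown word).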

Type: Research Article
Information: Natural Language Engineering, Volume 7, Issue 2, June 2001, pp. 167-185

DOI: https://doi.org/10.1017/S1351324901002637
