PISIT'S THAI NATURAL LANGUAGE PROCESSING LABORATORY
This lab was formed on August 26, 1998.
e-mail: [email protected]
For C7 members, please check this C7 address list.

KEYWORDS
Thai Natural Language Processing Lab., word segmentation, dictionaries, algorithms, Thai text-to-speech.
PERFORMANCE COMPARISON OF THAI WORD SEPARATION ALGORITHMS

Pisit Promchan ([email protected])
Telecom Asia Corp. Public Co. Ltd.
2nd flr., 4th bldg., TOT
Changwatana Rd., BKK, Thailand

Yunyong Teng-amnuay ([email protected])
Department Of Computer Engineering, Chulalongkorn University
Bangkok 10330, Thailand

ABSTRACT
This paper presents a performance comparison of word-separation algorithms for the Thai language. The research surveyed existing algorithms, synthesized a set of performance indicators, and developed a measurement methodology. A body of Thai reference data was collected to validate the accuracy of Thai word separation. Experimental results show that the longest-word pattern-matching algorithm produces the most correct output words, while the backtracking algorithm produces the fewest erroneous words. The word-usage-frequency algorithm gives the highest ratio of valid words to the number of words in its dictionary. Using an ambiguity dictionary gives the best resolution of ambiguous cases, whereas the shortest-word pattern-matching algorithm produces the highest number of output words.
The full paper is available in PDF format [NCSEC'98].
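
For readers unfamiliar with dictionary-based word separation, the following is a minimal Python sketch of greedy longest-word and shortest-word pattern matching. It is an illustration only and is not taken from the paper: the function name `segment`, the `prefer_longest` flag, the four-word toy dictionary, and the fallback of emitting a single character when no dictionary word matches are all assumptions made for this example. The sketch does no backtracking and uses no frequency or ambiguity information.

```python
def segment(text, dictionary, prefer_longest=True):
    """Greedy dictionary segmentation (hypothetical helper, not the paper's code).

    At each position, try candidate lengths in order (longest first or
    shortest first) and accept the first substring found in the dictionary.
    """
    max_len = max(len(word) for word in dictionary)
    words = []
    i = 0
    while i < len(text):
        # Candidate word lengths that fit in the remaining text.
        lengths = range(1, min(max_len, len(text) - i) + 1)
        if prefer_longest:
            lengths = reversed(lengths)
        for n in lengths:
            candidate = text[i:i + n]
            if candidate in dictionary:
                words.append(candidate)
                i += n
                break
        else:
            # Assumption: with no dictionary match, emit one character
            # as an unknown token and move on.
            words.append(text[i])
            i += 1
    return words


# Toy dictionary chosen to expose an ambiguous string (an assumption for
# illustration, not the paper's reference data).
toy_dictionary = {"ตา", "กลม", "ตาก", "ลม"}

longest = segment("ตากลม", toy_dictionary, prefer_longest=True)
shortest = segment("ตากลม", toy_dictionary, prefer_longest=False)
print(longest)   # ['ตาก', 'ลม']
print(shortest)  # ['ตา', 'กลม']
```

The same five-character string splits two different ways depending on the matching rule, which is the kind of ambiguous case the abstract refers to.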


