Chinese Word Segmentation And its Effects on Chinese Information Retrieval
- Last Modified
- February 28, 2019
- Affiliation: School of Information and Library Science
- This experiment tests the effectiveness of Chinese information retrieval using a segmenter built on the dictionary-based Maximum Forward Matching algorithm. IRTOOLS, an IR system developed at UNC Chapel Hill, serves as the platform. The study finds that less accurate segmentation does not necessarily yield worse retrieval results. In fact, restricting the dictionary to two-character words produced the best retrieval results in terms of precision and recall. Allowing longer words in the dictionary causes some index words to be missed, which is the problem of over-specification. However, long-word indexing can produce better results when the long word is also used in queries.
- Date of publication
- April 2003
- Resource type
- Rights statement
- In Copyright
- Newby, Gregory B.
- Degree granting institution
- University of North Carolina at Chapel Hill
- 46 p.
- Open access
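
The dictionary-based Maximum Forward Matching segmentation named in the abstract can be illustrated with a minimal sketch. This is not the segmenter used in the study or any part of IRTOOLS: the function name `forward_max_match`, the `max_len` parameter, and the toy dictionary are all assumptions made for illustration. The sketch scans left to right and, at each position, takes the longest dictionary word that matches, falling back to a single character when nothing matches.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation (hypothetical helper, for illustration).

    At each position, take the longest substring found in `dictionary`,
    falling back to a single character when no dictionary word matches.
    """
    if max_len is None:
        # Default to the longest word in the dictionary.
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary, assumed for this example; not the dictionary from the study.
toy_dictionary = {"信息", "检索", "信息检索", "系统"}
text = "信息检索系统"

long_words = forward_max_match(text, toy_dictionary)
two_char_only = forward_max_match(text, toy_dictionary, max_len=2)
```

With long words allowed, the four-character word is indexed as a single term, so a query for 检索 alone would find no matching index term. Capping matches at two characters indexes 信息 and 检索 separately. This is one plausible reading of the over-specification effect the abstract describes.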
|Chinese Word Segmentation And its Effects on Chinese Information Retrieval||2019-05-14||Public||