Chinese Word Segmentation And its Effects on Chinese Information Retrieval

Last Modified
  • February 28, 2019
Creator
  • Wen, Li
    • Affiliation: School of Information and Library Science
Abstract
  • This experiment tests the effectiveness of Chinese information retrieval using a segmenter built on a dictionary-based Maximum Forward Matching algorithm. IRTOOLS, an IR system developed at UNC Chapel Hill, serves as the platform. The study finds that less accurate segmentation does not necessarily yield worse retrieval results. In fact, restricting the dictionary to two-character words produced the best retrieval results in terms of precision and recall. Allowing longer words in the dictionary causes index words to be missed, the problem of over-specification. However, long-word indexing can produce better results when the same long word is also used in queries.
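
The matching strategy named in the abstract can be illustrated with a minimal sketch. This is not the author's segmenter and does not use IRTOOLS; the function name `forward_max_match`, the `max_len` parameter, and the four-entry toy dictionary are all assumptions made for illustration. The sketch scans left to right and, at each position, takes the longest dictionary word that fits, falling back to a single character when nothing matches.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation (illustrative sketch only).

    At each position, take the longest dictionary word of at most
    max_len characters; emit a single character if nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        longest = min(max_len, len(text) - i)
        # Try the longest candidate first and shrink down to two characters.
        for size in range(longest, 1, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
        else:
            # No multi-character dictionary word starts here.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy dictionary, not taken from the study.
toy_dictionary = {"信息", "检索", "信息检索", "系统"}
text = "信息检索系统"  # "information retrieval system"

long_words = forward_max_match(text, toy_dictionary, max_len=4)
two_char_only = forward_max_match(text, toy_dictionary, max_len=2)

print(long_words)     # ['信息检索', '系统']
print(two_char_only)  # ['信息', '检索', '系统']
```

With long words allowed, the document is indexed under 信息检索 only, so a query for 检索 alone would find no matching index term. That is one way to read the over-specification problem the abstract describes. Capping matches at two characters keeps both 信息 and 检索 as index terms.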
Date of publication
Subject
Resource type
Rights statement
  • In Copyright
Advisor
  • Newby, Gregory B.
Degree granting institution
  • University of North Carolina at Chapel Hill
Language
Extent
  • 46 p.
Access
  • Open access