Volume 4 - Issue 3
Research on restricted-domain Chinese information extraction
Abstract
With the rapid growth of information on the web, extracting and analyzing latent information from semi-structured pages to improve search services has become critically important and has driven an increase in Information Extraction (IE) research. In this paper, we review existing IE techniques and present a novel restricted-domain Chinese IE method based on text block classification, combining statistical and rule-based natural language processing (NLP) methods. First, we turn a text into a set of text blocks and segment them into appropriate units by lexical analysis and named entity recognition. We then classify the blocks as positive or negative according to whether they contain factual information, extract information from the positive blocks with a context rule parser, and finally fill the extracted values into the items of a predefined template. A system developed with this method achieved good results in the experiment.
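The pipeline described in the abstract (block splitting, block classification, rule-based extraction, template filling) can be illustrated with a minimal Python sketch. It is not the authors' system: the job-posting domain, the template fields, the cue words, and the regular-expression rules are all hypothetical stand-ins. In particular, a keyword test replaces the statistical block classifier, and the lexical analysis and named entity recognition step is omitted, so the rules operate on raw characters.

```python
import re

# Hypothetical template for a toy "job posting" domain (an assumption, not from the paper).
TEMPLATE_FIELDS = ("company", "position", "location")

# Stand-in for the statistical block classifier: cue words that suggest
# a block contains factual information (assumed for illustration).
CUE_WORDS = ("招聘", "职位", "地点")

# Stand-in context rules: a context cue next to the value to capture.
# The character class accepts both the full-width and the ASCII colon.
CONTEXT_RULES = {
    "company": re.compile(r"(\w+公司)招聘"),
    "position": re.compile(r"职位[::]\s*(\w+)"),
    "location": re.compile(r"地点[::]\s*(\w+)"),
}


def split_blocks(text):
    """Split text into blocks on sentence-ending punctuation and newlines."""
    return [b.strip() for b in re.split(r"[。!?\n]+", text) if b.strip()]


def is_positive(block):
    """Label a block positive if it contains any cue word."""
    return any(cue in block for cue in CUE_WORDS)


def extract(text):
    """Fill the template from positive blocks; the first match per field wins."""
    template = dict.fromkeys(TEMPLATE_FIELDS)
    for block in split_blocks(text):
        if not is_positive(block):
            continue  # negative blocks are skipped entirely
        for field, rule in CONTEXT_RULES.items():
            match = rule.search(block)
            if match and template[field] is None:
                template[field] = match.group(1)
    return template


if __name__ == "__main__":
    page = "欢迎访问本站。华新公司招聘工程师。职位:软件工程师。地点:北京。版权所有。"
    print(extract(page))
```

Filtering out negative blocks before applying the rules keeps the rule parser away from text such as navigation or copyright lines, where a loose pattern could otherwise produce spurious matches.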
Paper Details
PaperID: 48549100543
Authors: Lu, X., Huang, H., Lin, C., Shi, S.
Volume: Volume 4
Issues: Issue 3
Keywords: Information extraction, Restricted-domain, Template, Text block classification
Year: 2008
Month: June
Pages: 779-786