Research on restricted-domain Chinese information extraction – Journal of Computational Information Systems

Volume 4 - Issue 3

Abstract

With the rapid growth of information on the web, extracting and analyzing latent information from semi-structured pages to improve search services has become critically important and has driven an increase in Information Extraction (IE) research. In this paper, we review existing IE techniques and present a novel restricted-domain Chinese IE method based on text block classification, combining statistical and rule-based natural language processing (NLP) methods. First, we split a text into a set of text blocks and segment them into appropriate units by lexical analysis and named entity recognition. We then classify the blocks as positive or negative according to whether they contain factual information, extract information from the positive blocks with a context rule parser, and finally fill the extracted items into the slots of a predefined template. A system built on this method achieved good results in our experiments.
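The pipeline described in the abstract (split into blocks, classify, extract by rules, fill a template) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the job-posting domain, the slot names, the cue-word list standing in for the statistical block classifier, and the regular expressions standing in for the context rule parser are all hypothetical. The lexical analysis and named entity recognition steps are omitted.

```python
import re

# Hypothetical template for an assumed job-posting domain;
# the paper's actual domain and slots are not stated in the abstract.
TEMPLATE_SLOTS = ("company", "position", "salary")

# Stand-in for the statistical block classifier: a block counts as
# positive if it contains any cue word ("recruiting", "monthly salary").
CUE_WORDS = ("招聘", "月薪")

# Stand-in for the context rule parser: one illustrative regex per slot.
RULES = {
    "company": re.compile(r"(\w+公司)招聘"),    # "<X company> is recruiting"
    "position": re.compile(r"招聘(\w+?)[，。]"),  # "recruiting <position>"
    "salary": re.compile(r"月薪(\d+)元"),        # "monthly salary <N> yuan"
}


def split_blocks(text):
    """Turn a text into a list of non-empty text blocks (here: lines)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_positive(block):
    """Classify a block as positive if it likely holds factual information."""
    return any(cue in block for cue in CUE_WORDS)


def extract(block):
    """Apply every rule to one block and return the slots that matched."""
    found = {}
    for slot, rule in RULES.items():
        match = rule.search(block)
        if match:
            found[slot] = match.group(1)
    return found


def fill_template(text):
    """Run the whole pipeline and return the filled template."""
    template = dict.fromkeys(TEMPLATE_SLOTS)  # all slots start as None
    for block in split_blocks(text):
        if not is_positive(block):
            continue  # negative blocks are never parsed
        for slot, value in extract(block).items():
            if template[slot] is None:  # keep the first value found
                template[slot] = value
    return template


# Invented sample page: a greeting, one factual block, a copyright notice.
page = "欢迎访问本站。\n北京某某科技公司招聘软件工程师，月薪8000元。\n版权所有，请勿转载。"
result = fill_template(page)
print(result)
```

Only the middle block of the sample page contains a cue word, so the greeting and the copyright notice are discarded before any rule is applied. Filtering blocks first is what lets the extraction rules stay simple, since they only ever see text that is likely to contain facts.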

Paper Details

PaperID: 48549100543

Authors: Lu, X., Huang, H., Lin, C., Shi, S.

Volume: 4

Issue: 3

Keywords: Information extraction, Restricted-domain, Template, Text block classification

Year: 2008

Month: June

Pages: 779-786