Volume 4 - Issue 3
Statistics-rule based hierarchical web page classification
Abstract
Statistics-based classification methods are commonly used in hierarchical web classification. However, the classification precision of statistics-based methods often drops when categories are very similar to each other, because their features overlap. By the nature of hierarchical web classification, categories that share the same parent (e.g., leaf categories in the hierarchy) are often very similar to each other. Poor precision is therefore often observed on leaf categories when statistics-based classification methods are used with a top-down, level-based approach. To improve classification precision, we propose using rule-based classification methods on top of statistics-based methods in hierarchical web classification. Experiments showed that our method performed well on our education web collections.
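The abstract does not specify how the two stages are combined, so the following is only a minimal sketch of one plausible reading: a statistical scorer picks the parent category top-down, then hand-written rules try to separate the similar sibling leaves, with the statistical score as a fallback. The hierarchy, term weights, rules, and function names are all hypothetical illustrations, not the paper's actual model or data.

```python
from collections import Counter

# Hypothetical two-level hierarchy: parent category -> sibling leaf categories.
HIERARCHY = {
    "courses": ["undergraduate_courses", "graduate_courses"],
    "admissions": ["undergraduate_admissions", "graduate_admissions"],
}

# Hypothetical term weights standing in for a trained statistical model.
# Sibling leaves deliberately share most of their features.
TERM_WEIGHTS = {
    "courses": {"course": 2.0, "syllabus": 1.5, "lecture": 1.0},
    "admissions": {"apply": 2.0, "admission": 2.0, "deadline": 1.0},
    "undergraduate_courses": {"course": 1.0, "bachelor": 0.5},
    "graduate_courses": {"course": 1.0, "master": 0.5},
    "undergraduate_admissions": {"admission": 1.0, "bachelor": 0.5},
    "graduate_admissions": {"admission": 1.0, "master": 0.5},
}

# Hypothetical hand-written rules per parent: (trigger term, leaf category).
RULES = {
    "courses": [("phd", "graduate_courses"), ("freshman", "undergraduate_courses")],
    "admissions": [("gre", "graduate_admissions"), ("sat", "undergraduate_admissions")],
}


def score(tokens, category):
    """Weighted term-frequency score of a token list for one category."""
    counts = Counter(tokens)
    weights = TERM_WEIGHTS.get(category, {})
    return sum(weight * counts[term] for term, weight in weights.items())


def classify(text):
    """Return (parent, leaf) for a page, top-down through the hierarchy."""
    tokens = text.lower().split()

    # Level 1: statistical choice among the parent categories.
    parent = max(HIERARCHY, key=lambda c: score(tokens, c))

    # Level 2: rules first (assumed ordering), since sibling leaves overlap.
    for term, leaf in RULES.get(parent, []):
        if term in tokens:
            return parent, leaf

    # Fallback: statistical choice among the sibling leaves.
    leaf = max(HIERARCHY[parent], key=lambda c: score(tokens, c))
    return parent, leaf
```

In this sketch, a page such as "phd course syllabus lecture" is routed to the courses branch by the statistical scores, and the rule on "phd" then settles the leaf; a page that matches no rule falls back to the leaf-level scores.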
Paper Details
PaperID: 48549091870
Author's Name: Tan, J.
Volume: Volume 4
Issues: Issue 3
Keywords: Hierarchical web classification, Rule-based classification, Statistics-based classification, Statistics-rule based classification
Year: 2008
Month: June
Pages: 771 - 778