Efficient crawling strategy for topical web information
An efficient topical crawling strategy is essential for topic-specific search engines. Most existing crawling strategies focus on either the precision of the collected pages or the crawling speed, but not both. This paper introduces an efficient crawling strategy for topical web information that uses the link structure of pages as well as their semantic similarity. The novelty of the method is that it effectively extracts pages highly relevant to a specific topic by incorporating word similarity and an ontology, and furthermore achieves respectable coverage at a rapid rate. Evaluation shows that the approach yields promising results.
Author's Name: Lin, K.
Volume: Volume 3
Issues: Issue 3
Keywords: Crawling strategy, Ontology, Word similarity matrix, WordNet
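
The abstract does not give the scoring details, so the following is only a minimal sketch of the general idea it describes: a best-first crawl whose frontier priority blends link evidence with semantic similarity. Everything named here is an assumption for illustration: the in-memory `TOY_WEB` graph, the `alpha` weight, and the Jaccard word overlap, which stands in for the paper's word-similarity matrix and WordNet-based ontology measure.

```python
import heapq

# Hypothetical toy web (not from the paper): URL -> (page text, [(anchor text, target URL)]).
TOY_WEB = {
    "seed": ("web crawler ontology search",
             [("ontology similarity", "b"), ("pasta recipes", "c")]),
    "b": ("ontology wordnet similarity crawler", [("topical crawler", "d")]),
    "c": ("cooking recipes pasta", [("pasta sauce", "e")]),
    "d": ("topical crawler search engine", []),
    "e": ("pasta sauce recipes", []),
}

TOPIC = {"crawler", "ontology", "similarity", "search"}


def semantic_score(text, topic_terms):
    """Jaccard overlap between the words of `text` and the topic terms.

    This is a simple stand-in for a word-similarity/ontology measure.
    """
    words = set(text.lower().split())
    if not words or not topic_terms:
        return 0.0
    return len(words & topic_terms) / len(words | topic_terms)


def crawl(seed, topic_terms, alpha=0.5, max_pages=10, web=TOY_WEB):
    """Visit pages in best-first order and return [(url, relevance), ...].

    An unvisited link's priority is
        alpha * relevance(parent page) + (1 - alpha) * similarity(anchor text),
    i.e. link-structure evidence blended with semantic evidence.
    """
    frontier = [(-1.0, seed)]  # max-heap via negated priority
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = web[url]  # "fetching" is a dictionary lookup here
        relevance = semantic_score(text, topic_terms)
        visited.append((url, relevance))
        for anchor, target in links:
            if target in seen or target not in web:
                continue
            seen.add(target)
            priority = alpha * relevance + (1 - alpha) * semantic_score(anchor, topic_terms)
            heapq.heappush(frontier, (-priority, target))
    return visited


if __name__ == "__main__":
    for url, relevance in crawl("seed", TOPIC):
        print(f"{url}: relevance={relevance:.2f}")
```

On this toy graph the on-topic pages `b` and `d` are visited before the off-topic pages `c` and `e`, so a small page budget (`max_pages=3`) is spent entirely on relevant pages. A real crawler would replace the dictionary lookup with HTTP fetching and the Jaccard overlap with a richer similarity measure.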