Volume 10 - Issue 10
Improvements of HITS algorithm based on content analysis
Abstract
The HITS algorithm, developed by Jon Kleinberg, uses the link structure of the web to discover and rank pages relevant to a particular topic. However, it considers only the hyperlink structure and ignores the content of web pages entirely. It also ignores the fact that links may differ in importance. As a result, the algorithm is prone to topic drift. In this paper, we propose an improved HITS algorithm based on the full text, the anchor text, and the close textual context of each hyperlink. The method first computes the relevance between any two pages from their page topic similarity and meta-information similarity. The relevance values are then used to construct a new adjacency matrix, from which authority and hub scores are calculated iteratively. Preliminary experiments show that the new algorithm improves query efficiency and quality and reduces topic drift.
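The iterative step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the relevance formula, so the sketch assumes the relevance between linked pages has already been computed and stored in a hypothetical matrix `weights`, where `weights[i][j]` is the relevance of the link from page i to page j (0 where no link exists). The function name `weighted_hits` and its parameters are likewise assumptions made for illustration.

```python
import math


def _normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def weighted_hits(weights, iterations=50, tol=1e-9):
    """Run HITS on a relevance-weighted adjacency matrix.

    weights[i][j] is an assumed, precomputed relevance score for the
    link from page i to page j; 0 means there is no link.
    Returns (authorities, hubs) as lists of floats.
    """
    n = len(weights)
    hubs = [1.0] * n
    auths = [1.0] * n
    for _ in range(iterations):
        # Authority of j: weighted sum of hub scores of pages linking to j.
        new_auths = _normalize(
            [sum(weights[i][j] * hubs[i] for i in range(n)) for j in range(n)]
        )
        # Hub of i: weighted sum of authority scores of pages i links to.
        new_hubs = _normalize(
            [sum(weights[i][j] * new_auths[j] for j in range(n)) for i in range(n)]
        )
        # Largest change in any score since the previous iteration.
        delta = max(
            max(abs(a - b) for a, b in zip(new_auths, auths)),
            max(abs(a - b) for a, b in zip(new_hubs, hubs)),
        )
        auths, hubs = new_auths, new_hubs
        if delta < tol:
            break
    return auths, hubs


if __name__ == "__main__":
    # Page 0 links to page 1 (highly relevant) and page 2 (weakly relevant).
    example = [
        [0.0, 0.9, 0.1],
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0],
    ]
    authorities, hub_scores = weighted_hits(example)
    print("authorities:", authorities)
    print("hubs:", hub_scores)
```

With a 0/1 matrix this reduces to standard HITS. Replacing the entries with relevance scores lets a weakly relevant link contribute less authority than a strongly relevant one, which is the mechanism the abstract relies on to limit topic drift.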
Paper Details
PaperID: 84901828104
Authors: Tian, X., Du, Y., Song, W., Liu, W., Meng, Q.
Volume: Volume 10
Issues: Issue 10
Keywords: Anchor text, Authority, HITS algorithm, Hub, Similarity
Year: 2014
Month: May
Pages: 4049 - 4058