Skip Navigation

IEICE Transactions on Information and Systems 2008 E91-D(2):200-208; doi:10.1093/ietisy/e91-d.2.200
This Article
Right arrow Full Text (PDF)
Right arrow References
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Request Permissions
Google Scholar
Right arrow Articles by ASANO, Y.
Right arrow Articles by NISHIZEKI, T.
Right arrow Search for Related Content
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Copyright © 2008 The Institute of Electronics, Information and Communication Engineers

Special Section on Foundations of Computer Science -- Papers -- Scoring Algorithms

Improvements of HITS Algorithms for Spam Links

Yasuhito ASANO1, Yu TEZUKA2 and Takao NISHIZEKI3

1 The author is with the Department of Information Sciences, Faculty of Science and Engineering, Tokyo Denki University, Saitama-ken, 350–0394 Japan. E-mail: asano{at}j.dendai.ac.jp, 2 The author is with Mobile Entertainment Category, Victor Company of Japan, Limited, Maebashi-shi, 371–8543 Japan., 3 The author is with the Graduate School of Information Sciences, Tohoku University, Sendai-shi, 980–8579 Japan.


   Abstract

The HITS algorithm proposed by Kleinberg is one of the representative methods of scoring Web pages by using hyperlinks. In the days when the algorithm was proposed, most of the pages given high score by the algorithm were really related to a given topic, and hence the algorithm could be used to find related pages. However, the algorithm and the variants including Bharat's improved HITS, abbreviated to BHITS, proposed by Bharat and Henzinger cannot be used to find related pages any more on today's Web, due to an increase of spam links. In this paper, we first propose three methods to find "linkfarms," that is, sets of spam links forming a densely connected subgraph of a Web graph. We then present an algorithm, called a trust-score algorithm, to give high scores to pages which are not spam pages with a high probability. Combining the three methods and the trust-score algorithm with BHITS, we obtain several variants of the HITS algorithm. We ascertain by experiments that one of them, named TaN+BHITS using the trust-score algorithm and the method of finding linkfarms by employing name servers, is most suitable for finding related pages on today's Web. Our algorithms take time and memory no more than those required by the original HITS algorithm, and can be executed on a PC with a small amount of main memory.

Key Words: scoring algorithm, Web pages, HITS, BHITS, PageRank, search engine, Web graph, spam page, spam links


Manuscript received January 25, 2007. Manuscript revised June 3, 2007.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?




Disclaimer:
Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.