Copyright © 2008 The Institute of Electronics, Information and Communication Engineers
Regular Section -- Papers -- Contents Technology and Web Information Systems
Accelerating Web Content Filtering by the Early Decision Algorithm
1 The authors are with High-speed Lab, Department of Computer Science, National Chiao Tung University, Taiwan. E-mail: pclin{at}cis.nctu.edu.tw; mdliu{at}cis.nctu.edu.tw
2 The author is with Department of Computer Science, National Chiao Tung University, Taiwan. E-mail: ydlin{at}cis.nctu.edu.tw
3 The author is with Department of Information Management, National Taiwan University of Science and Technology, Taiwan. E-mail: laiyc{at}cs.ntust.edu.tw
Abstract
Real-time content analysis is typically a bottleneck in Web filtering. To accelerate the filtering process, this work presents a simple but effective early decision algorithm that analyzes only part of the Web content. The algorithm makes the filtering decision, either to block or to pass the content, as soon as it is confident, with high probability, that the content belongs to a banned or an allowed category. Experiments show that the algorithm needs to examine only around one-fourth of the Web content on average, while accuracy remains fairly good: 89% for banned content and 93% for allowed content. The algorithm can complement other Web filtering approaches, such as URL blocking, to filter Web content with high accuracy and efficiency. Text classification algorithms in other applications can follow the same early decision principle to speed up their processing.
Key Words: Web filtering, text classification, World Wide Web, early decision
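The early decision principle described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each token carries a log-likelihood-ratio weight (positive for the banned category, negative for the allowed one) and that two fixed thresholds express "confident with high probability". The abstract does not specify the classifier, the weights, or the thresholds, so the function name `early_decision`, its parameters, and the sample weights below are all hypothetical.

```python
def early_decision(tokens, log_ratio, block_threshold=4.0, pass_threshold=-4.0):
    """Scan tokens in order and stop once the evidence crosses a threshold.

    tokens          -- iterable of tokens from the Web content, in reading order
    log_ratio       -- hypothetical per-token weights: > 0 favors "banned",
                       < 0 favors "allowed"; unknown tokens count as 0
    block_threshold -- assumed score at or above which the content is blocked
    pass_threshold  -- assumed score at or below which the content is passed

    Returns (decision, tokens_examined).
    """
    score = 0.0
    examined = 0
    for examined, token in enumerate(tokens, start=1):
        score += log_ratio.get(token, 0.0)
        if score >= block_threshold:
            return "block", examined   # confident enough: stop reading early
        if score <= pass_threshold:
            return "pass", examined    # confident enough: stop reading early
    # No threshold was crossed: fall back to the sign of the full score.
    return ("block" if score > 0 else "pass"), examined


# Hypothetical weights, not taken from the paper.
weights = {"casino": 2.0, "bet": 1.5, "poker": 1.5,
           "weather": -1.5, "news": -1.5, "school": -1.5}

banned_page = ["casino", "bet", "poker", "weather", "news", "school"]
allowed_page = ["weather", "news", "school", "casino", "bet", "poker"]

print(early_decision(banned_page, weights))    # ('block', 3)
print(early_decision(allowed_page, weights))   # ('pass', 3)
```

In both sample pages the decision is reached after three of six tokens, which mirrors the idea of examining only part of the content. Wider thresholds would trade speed for accuracy, and narrower ones would do the opposite.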
Manuscript received February 16, 2006. Manuscript revised August 25, 2007.