Copyright © 2006 The Institute of Electronics, Information and Communication Engineers
Special Section on Parallel/Distributed Computing and Networking -- Papers -- Grid Computing
DRIC: Dependable Grid Computing Framework
The authors are with the Cluster and Grid Computing Lab., Huazhong University of Science and Technology, Wuhan, China. E-mail: hjin{at}hust.edu.cn
Grid computing is a new trend in distributed and Internet computing that coordinates large-scale resource sharing and problem solving in dynamic, multi-institutional virtual organizations. Because grid environments are subject to diverse failures and error conditions, developing, deploying, and executing applications over the grid is challenging, which makes dependability a key factor for grid computing. This paper presents a dependable grid computing framework, called DRIC, that provides an adaptive failure detection service and a policy-based failure handling mechanism. The failure detection service in DRIC adapts to users' QoS requirements and system conditions, and a policy engine optimizes the failure handling mechanism using a decision-making method. The performance evaluation results show that the framework is scalable and highly efficient, with low overhead.
Key Words: dependable computing, grid computing, failure detection, fault tolerance
Manuscript received April 1, 2005. Manuscript revised August 16, 2005.