Copyright © 2008 The Institute of Electronics, Information and Communication Engineers
Special Section on Knowledge-Based Software Engineering -- Papers -- Software Engineering
Prediction of Fault-Prone Software Modules Using a Generic Text Discriminator
The authors are with the Graduate School of Information Science and Technology, Osaka University, Suita-shi, 565–0871 Japan. E-mail: o-mizuno{at}ist.osaka-u.ac.jp
This paper describes a novel approach for detecting fault-prone modules using a spam filtering technique. Detecting fault-prone modules in source code is important for assuring software quality. Most previous approaches to fault-prone module detection have been based on software metrics. Such approaches, however, face difficulties in collecting the metrics and in constructing mathematical models from them. Driven by the growing need for spam e-mail detection, spam filtering has matured into a convenient and effective text-mining technique. In our approach, source code modules are treated as text files and fed directly to a spam filter, which classifies them as fault-prone or not. To show the applicability of our approach, we conducted experiments using the source code repositories of Java-based open source projects. The experimental results show that our approach correctly predicts 78% of the actually fault-prone modules as fault-prone.
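To make "applied to the spam filter directly" concrete, here is a minimal sketch that treats each module's source text as a bag of tokens and trains a multinomial naive Bayes classifier, a common spam-filtering technique, on modules labeled fault-prone (`fp`) or not fault-prone (`nfp`). This is an assumed stand-in, not the filter used in the paper: the tokenizer regex, the class name `NaiveBayesFaultFilter`, the labels, and the Laplace smoothing constant are all hypothetical. The `recall` helper computes the kind of figure quoted above, namely the fraction of actually fault-prone modules that are predicted as fault-prone.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the paper's): identifiers,
# integer literals, and any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def tokenize(source: str) -> list[str]:
    return TOKEN_RE.findall(source)


class NaiveBayesFaultFilter:
    """Multinomial naive Bayes over source tokens with labels 'fp' / 'nfp'."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.token_counts = {"fp": Counter(), "nfp": Counter()}
        self.doc_counts = Counter()

    def train(self, source: str, label: str) -> None:
        # Each module is one "message"; its tokens update the class counts.
        self.token_counts[label].update(tokenize(source))
        self.doc_counts[label] += 1

    def _log_score(self, tokens: list[str], label: str, vocab_size: int) -> float:
        counts = self.token_counts[label]
        total = sum(counts.values())
        n_docs = sum(self.doc_counts.values())
        # Smoothed class prior, so an untrained class never yields log(0).
        score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
        for tok in tokens:
            score += math.log((counts[tok] + self.alpha) /
                              (total + self.alpha * vocab_size))
        return score

    def classify(self, source: str) -> str:
        tokens = tokenize(source)
        vocab = set(self.token_counts["fp"]) | set(self.token_counts["nfp"])
        v = len(vocab) + 1  # +1 reserves probability mass for unseen tokens
        fp = self._log_score(tokens, "fp", v)
        nfp = self._log_score(tokens, "nfp", v)
        return "fp" if fp > nfp else "nfp"


def recall(predicted: list[str], actual: list[str]) -> float:
    """Share of actually fault-prone modules that were predicted fault-prone."""
    hits = sum(1 for p, a in zip(predicted, actual) if a == "fp" and p == "fp")
    positives = sum(1 for a in actual if a == "fp")
    return hits / positives if positives else 0.0


if __name__ == "__main__":
    f = NaiveBayesFaultFilter()
    f.train("if (x == null) { x.close(); } catch (Exception e) {}", "fp")
    f.train("return a + b;", "nfp")
    print(f.classify("catch (Exception e) { x.close(); }"))  # prints "fp"
```

In a realistic setting the training modules and their labels would come from a project's version history and bug records; the two toy modules here only exercise the code path.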
Key Words: fault-prone module, prediction, spam filter
Manuscript received July 2, 2007. Manuscript revised October 15, 2007.
Authors: O. Mizuno and T. Kikuno