Copyright © 2008 The Institute of Electronics, Information and Communication Engineers
Regular Section -- Papers -- Pattern Recognition
Improving Automatic Text Classification by Integrated Feature Analysis
The authors are with the Graduate School of Engineering, Mie University, Tsu-shi, 514–8507, Japan. E-mail: busagala{at}hi.info.mie-u.ac.jp; ohyama{at}hi.info.mie-u.ac.jp; waka{at}hi.info.mie-u.ac.jp; kimura{at}hi.info.mie-u.ac.jp
Feature transformation in automatic text classification (ATC) can lead to better classification performance. Dimensionality reduction is also important in ATC. Hence, feature transformation and dimensionality reduction are performed to lower computational costs while improving classification performance. However, feature transformation and dimensionality reduction techniques have conventionally been considered in isolation, and in such cases classification performance can be lower than when the two are integrated. Therefore, we propose an integrated feature analysis approach that improves classification performance at lower dimensionality. Moreover, we propose a multiple feature integration technique that also improves classification effectiveness.
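The abstract does not spell out the pipeline, so the following is only a minimal sketch of what an integrated "transform, then reduce" analysis could look like, under stated assumptions. An element-wise power transformation of term counts (square root by default, an assumed choice) stands in for feature transformation. It is followed by principal component analysis (PCA) and canonical discriminant analysis (CDA), the two dimension-reduction techniques listed among the key words, with both fitted on the transformed features. The function names, `alpha`, `n_pca`, the ridge term, the nearest-class-mean classifier and the synthetic Poisson data are all hypothetical illustrations, not the authors' exact method.

```python
import numpy as np


def power_transform(X, alpha=0.5):
    """Element-wise power transform of non-negative term counts (alpha is an assumed value)."""
    return np.power(np.asarray(X, dtype=float), alpha)


def pca(X, k):
    """Return the training mean and the top-k principal axes (as columns) of X."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]        # keep the k largest
    return mean, vecs[:, order]


def cda(Z, y, k, ridge=1e-6):
    """Canonical discriminant analysis: solve Sb w = lambda Sw w and keep the top-k w."""
    d = Z.shape[1]
    overall_mean = Z.mean(axis=0)
    Sw = np.zeros((d, d))                     # within-class scatter
    Sb = np.zeros((d, d))                     # between-class scatter
    for c in np.unique(y):
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Zc) * (diff @ diff.T)
    Sw += ridge * np.eye(d)                   # assumed ridge keeps Sw positive definite
    L = np.linalg.cholesky(Sw)                # Sw = L L^T
    Linv = np.linalg.inv(L)
    vals, V = np.linalg.eigh(Linv @ Sb @ Linv.T)  # equivalent symmetric eigenproblem
    order = np.argsort(vals)[::-1][:k]
    return Linv.T @ V[:, order]               # map back: w = L^{-T} v


def fit_integrated(X_train, y_train, alpha=0.5, n_pca=20):
    """Fit transform -> PCA -> CDA on training data and return a projection function."""
    Xt = power_transform(X_train, alpha)
    mean, P = pca(Xt, n_pca)
    Z = (Xt - mean) @ P
    W = cda(Z, y_train, k=len(np.unique(y_train)) - 1)

    def project(X):
        return ((power_transform(X, alpha) - mean) @ P) @ W

    return project


def nearest_mean_predict(F_train, y_train, F_test):
    """Assign each test vector to the class whose training mean is closest."""
    classes = np.unique(y_train)
    means = np.stack([F_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((F_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]


# Hypothetical synthetic "documents": Poisson term counts over 50 terms,
# where each of 3 classes boosts the rate of its own block of 5 terms.
rng = np.random.default_rng(0)
rates = rng.uniform(0.1, 1.0, size=(3, 50))
for c in range(3):
    rates[c, 5 * c:5 * c + 5] += 3.0
y_train = np.repeat(np.arange(3), 30)
y_test = np.repeat(np.arange(3), 10)
X_train = rng.poisson(rates[y_train])
X_test = rng.poisson(rates[y_test])

project = fit_integrated(X_train, y_train)
F_train, F_test = project(X_train), project(X_test)
pred = nearest_mean_predict(F_train, y_train, F_test)
accuracy = float((pred == y_test).mean())
print(F_train.shape, F_test.shape, accuracy)
```

The point of the sketch is that PCA and CDA are fitted on the already-transformed features, so the reduced space reflects the transformation rather than the raw counts; with three classes, CDA yields at most two discriminant dimensions. The paper may choose the transformation, the intermediate dimensionality and the classifier differently.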
Key Words: text classification/categorization, feature transformation, dimension reduction, principal component analysis, canonical discriminant analysis, integrated feature analysis, multiple feature integration
Manuscript received June 19, 2007. Manuscript revised November 7, 2007.
Authors include L. S. P. Busagala and F. Kimura.