IEICE Transactions on Information and Systems 2008 E91-D(4):1101-1109; doi:10.1093/ietisy/e91-d.4.1101

Copyright © 2008 The Institute of Electronics, Information and Communication Engineers

Regular Section -- Papers -- Pattern Recognition

Improving Automatic Text Classification by Integrated Feature Analysis

Lazaro S.P. BUSAGALA¹, Wataru OHYAMA¹, Tetsushi WAKABAYASHI¹ and Fumitaka KIMURA¹

¹ The authors are with the Graduate School of Engineering, Mie University, Tsu-shi, 514–8507, Japan. E-mail: busagala{at}hi.info.mie-u.ac.jp; ohyama{at}hi.info.mie-u.ac.jp; waka{at}hi.info.mie-u.ac.jp; kimura{at}hi.info.mie-u.ac.jp

Feature transformation in automatic text classification (ATC) can improve classification performance, and dimensionality reduction is also important in ATC. The two are therefore applied to obtain lower computational cost together with improved classification performance. Conventionally, however, feature transformation and dimensionality reduction techniques have been considered in isolation, and classification performance can then be lower than when they are integrated. We therefore propose an integrated feature analysis approach that improves classification performance at lower dimensionality. We also propose a multiple feature integration technique that likewise improves classification effectiveness.

Key Words: text classification/categorization, feature transformation, dimension reduction, principal component analysis, canonical discriminant analysis, integrated feature analysis, multiple feature integration
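
As a rough illustration of what chaining a feature transformation with dimension reduction can look like, the sketch below applies a transformation to toy term-count vectors and then projects them onto a single principal axis. It is not the authors' method: the square-root power transformation, the L2 normalization, the single-component PCA computed by power iteration, the toy data, and every function name are assumptions made for illustration only.

```python
import math

def power_transform(vec, exponent=0.5):
    # Assumed feature transformation: damp large term counts with a power < 1.
    return [x ** exponent for x in vec]

def l2_normalize(vec):
    # Scale a vector to unit length; return a copy unchanged if it is all zeros.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def center(rows):
    # Subtract the per-feature mean so the principal axis describes variance.
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mean[j] for j in range(d)] for r in rows]

def leading_axis(rows, iterations=200):
    # Power iteration on X^T X (proportional to the covariance of centered rows).
    d = len(rows[0])
    v = l2_normalize([float(j + 1) for j in range(d)])  # deterministic start
    for _ in range(iterations):
        proj = [sum(r[j] * v[j] for j in range(d)) for r in rows]
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(d)]
        if not any(w):
            break
        v = l2_normalize(w)
    return v

# Hypothetical term counts: two documents per topic, four terms.
counts = [
    [9, 4, 0, 1],
    [16, 1, 1, 0],
    [0, 1, 9, 4],
    [1, 0, 16, 9],
]

# Step 1: feature transformation. Step 2: dimension reduction from 4 to 1.
transformed = [l2_normalize(power_transform(doc)) for doc in counts]
centered = center(transformed)
axis = leading_axis(centered)
scores = [sum(x * a for x, a in zip(doc, axis)) for doc in centered]

print(scores)
```

On this toy data the first two documents and the last two should fall on opposite sides of zero along the retained axis, so a one-dimensional representation is enough to tell the two groups apart. A supervised reduction such as canonical discriminant analysis, which the key words also mention, would use the class labels instead of the overall variance; it is left out here to keep the sketch short.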


Manuscript received June 19, 2007. Manuscript revised November 7, 2007.


