
IEICE Transactions on Information and Systems 2007 E90-D(3):627-636; doi:10.1093/ietisy/e90-d.3.627

Copyright © 2007 The Institute of Electronics, Information and Communication Engineers

Regular Section -- Papers -- VLSI Systems

Power Estimation of Partitioned Register Files in a Clustered Architecture with Performance Evaluation

Yukinori SATO1,2, Ken-ichi SUZUKI1 and Tadao NAKAMURA1

1 The authors are with the Graduate School of Information Sciences, Tohoku University, Sendai-shi, 980–8579 Japan. E-mail: yukinori{at}archi.is.tohoku.ac.jp

2 The author is with Sendai Software Development Center, FineArch Inc., Sendai-shi, 980–6108 Japan.

The high power consumption and slow access of enlarged, multiported register files make it difficult to design high-performance superscalar processors. The clustered architecture, in which the conventional monolithic register file is partitioned into several smaller register files, is expected to overcome these register file issues. In a clustered architecture, the more finely a monolithic register file is partitioned, the lower the power and the faster the access of the resulting register files. However, partitioning causes a loss of IPC (instructions per clock cycle) due to communication among register files. Therefore, the degree of partitioning has a strong impact on the trade-off between power consumption and performance. In addition, the organization of the partitioned register files also affects this trade-off. In this paper, we investigate appropriate degrees of partitioning and organizations of partitioned register files in a clustered architecture in order to assess the trade-off. From the results of execution-driven simulation, we find that the organization of the register files and the degree of partitioning have a strong impact on IPC, and that the configuration with non-consistent register files can make use of the partitioned resources more effectively. From the results of register file access time and energy modeling, we find that configurations with a highly partitioned non-consistent register file organization can benefit from the partitioning in terms of the operating frequency and access energy of the register files. Further, we examine the relationship between IPS (instructions per second) and the product of IPC and the operating frequency of the register files. The results suggest that highly partitioned non-consistent configurations tend to gain more advantage in performance and power.
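
The trade-off described above can be illustrated with a small sketch that treats throughput as the product of IPC and register file operating frequency. Everything in it is an assumption made for illustration: the `Config` class, the `best_by_throughput` helper, and all numeric values are hypothetical placeholders, not figures or methods from the paper. It only shows how a lower IPC can be outweighed by a higher operating frequency as the degree of partitioning grows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One register file configuration (all values are hypothetical)."""
    name: str
    ipc: float               # instructions per clock cycle
    freq_ghz: float          # register file operating frequency in GHz
    access_energy_nj: float  # energy per register file access in nJ

    @property
    def gips(self) -> float:
        # Throughput in billions of instructions per second: IPC x frequency.
        return self.ipc * self.freq_ghz


def best_by_throughput(configs):
    """Return the configuration with the highest IPC x frequency product."""
    return max(configs, key=lambda c: c.gips)


# Placeholder numbers chosen only to show the shape of the trade-off:
# more partitioning lowers IPC but raises frequency and lowers access energy.
CONFIGS = [
    Config("monolithic", ipc=2.0, freq_ghz=1.0, access_energy_nj=1.0),
    Config("2 clusters", ipc=1.8, freq_ghz=1.5, access_energy_nj=0.5),
    Config("4 clusters", ipc=1.5, freq_ghz=2.0, access_energy_nj=0.25),
]

if __name__ == "__main__":
    for c in CONFIGS:
        print(f"{c.name:12s} IPC={c.ipc:.2f} f={c.freq_ghz:.2f} GHz "
              f"-> {c.gips:.2f} GIPS, {c.access_energy_nj:.2f} nJ/access")
    print("highest throughput:", best_by_throughput(CONFIGS).name)
```

With these placeholder values the most partitioned configuration has the lowest IPC but the highest IPC-frequency product. Whether that holds for a real design depends on the measured IPC loss and the modeled access time.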

Key Words: clustered architecture, partitioned register files, non-consistent register files, instruction level parallelism


Manuscript received December 5, 2005. Manuscript revised July 22, 2006.


