Copyright © 2007 The Institute of Electronics, Information and Communication Engineers
Regular Section -- Papers -- VLSI Systems
Power Estimation of Partitioned Register Files in a Clustered Architecture with Performance Evaluation
1 The authors are with the Graduate School of Information Sciences, Tohoku University, Sendai-shi, 980-8579 Japan. E-mail: yukinori{at}archi.is.tohoku.ac.jp
2 The author is with Sendai Software Development Center, FineArch Inc., Sendai-shi, 980-6108 Japan.
The high power consumption and slow access of large, multiported register files make it difficult to design high-performance superscalar processors. The clustered architecture, in which the conventional monolithic register file is partitioned into several smaller register files, is expected to overcome these register file issues. In a clustered architecture, the more finely a monolithic register file is partitioned, the lower the power and the faster the access of the resulting register files. However, partitioning causes a loss of IPC (instructions per clock cycle) due to communication among register files. The degree of partitioning therefore has a strong impact on the trade-off between power consumption and performance. In addition, the organization of the partitioned register files also affects this trade-off. In this paper, we investigate appropriate degrees of partitioning and organizations of partitioned register files in a clustered architecture in order to assess the trade-off. From the results of execution-driven simulation, we find that the organization of the register files and the degree of partitioning have a strong impact on IPC, and that the configuration with non-consistent register files can make more effective use of the partitioned resources. From the results of register file access time and energy modeling, we find that configurations with a highly partitioned non-consistent register file organization can benefit from partitioning in terms of the operating frequency and access energy of the register files. Further, we examine the relationship between IPS (instructions per second) and the product of IPC and the operating frequency of the register files. The results suggest that highly partitioned non-consistent configurations tend to gain a greater advantage in both performance and power.
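The performance metric mentioned at the end of the abstract can be written out as a short formula. This is only a sketch of the standard relation, not the authors' stated model: the symbol f_RF (the operating frequency permitted by the register file access time t_RF) is notation introduced here, and the approximation assumes that the register file access bounds the processor clock.

```latex
% Assumed notation: f_RF = register file operating frequency,
% t_RF = register file access time.
\mathrm{IPS} \;\approx\; \mathrm{IPC} \times f_{\mathrm{RF}},
\qquad f_{\mathrm{RF}} = \frac{1}{t_{\mathrm{RF}}}
```

Under this reading, finer partitioning raises f_RF while lowering IPC, so IPS improves only when the frequency gain outweighs the IPC loss.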
Key Words: clustered architecture, partitioned register files, non-consistent register files, instruction level parallelism
Manuscript received December 5, 2005. Manuscript revised July 22, 2006.