Copyright © 2007 The Institute of Electronics, Information and Communication Engineers
Regular Section / Papers / VLSI Systems
Dynamic Reconfiguration of Cache Indexing in Embedded Processors
1 The authors are with the School of Computer Science and Engineering, Seoul National University, Seoul, 151-742 Korea. E-mail: jihong{at}davinci.snu.ac.kr
2 The author is with the School of Computer Science, Kookmin University, Seoul, 151-742 Korea.
Cache performance optimization is an important design consideration in building high-performance embedded processors. Unlike general-purpose microprocessors, embedded processors can take advantage of application-specific information to optimize cache performance. One such example is to use modified cache index bits (instead of the conventional index bits), chosen from the memory access traces of key target embedded applications, so that the number of conflict misses is reduced. In this paper, we present a novel fine-grained cache reconfiguration technique that allows intra-program reconfiguration of cache index bits, thus better reflecting the changing characteristics of a program's execution. The proposed technique, called dynamic reconfiguration of index bits (DRIB), dynamically changes the cache index bits at the function level. This compiler-directed, fine-grained approach allows each function to execute with its own optimal index bits and requires no additional hardware support. To avoid the performance degradation that frequent cache invalidations caused by reconfiguring the index bits could bring, we describe an efficient algorithm for selecting the target functions whose cache index bits are reconfigured. Our algorithm ensures that the number of cache misses eliminated by DRIB exceeds the number of additional cache misses caused by cache invalidations. We also propose a new cache architecture, the Two-Level Indexing (TLI) cache, which further reduces the number of conflict misses by dividing the indexing step into two stages. Our experimental results show that the DRIB approach combined with the TLI cache reduces the number of cache misses by 35% compared with the conventional cache indexing technique.
Key Words: cache indexing, cache organization, dynamic reconfiguration, embedded processor, microprocessor architecture
Manuscript received December 28, 2005. Manuscript revised July 2, 2006.
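The abstract describes two ideas: choosing cache index bits from a memory access trace instead of using the conventional low-order bits, and reconfiguring them for a function only when the misses saved outweigh the misses caused by the accompanying cache invalidation. A minimal sketch of both ideas follows. It makes several assumptions that are not taken from the paper. It assumes a direct-mapped cache and a trace of block addresses with the block offset already removed. It uses an exhaustive search over candidate bit positions as a simple stand-in for the paper's selection algorithm. It also expects the caller to supply the invalidation cost. All function names are hypothetical, and the sketch does not model the TLI cache.

```python
"""Hypothetical sketch of trace-driven cache index-bit selection.

Assumptions (not from the paper): direct-mapped cache, trace of block
addresses (offset bits already stripped), exhaustive search over bit
choices, and an invalidation cost supplied by the caller.
"""
from collections.abc import Iterable, Sequence
from itertools import combinations


def make_index(addr: int, index_bits: Sequence[int]) -> int:
    """Gather the chosen address bits into a set index.

    Bit i of the result is taken from address bit index_bits[i].
    """
    idx = 0
    for i, pos in enumerate(index_bits):
        idx |= ((addr >> pos) & 1) << i
    return idx


def count_misses(trace: Iterable[int], index_bits: Sequence[int]) -> int:
    """Simulate a direct-mapped cache with 2**len(index_bits) lines.

    The full block address is stored as the tag, so any choice of index
    bits (not just the low-order ones) is modeled correctly.
    """
    lines: dict[int, int] = {}
    misses = 0
    for addr in trace:
        idx = make_index(addr, index_bits)
        if lines.get(idx) != addr:
            misses += 1
            lines[idx] = addr
    return misses


def best_index_bits(trace: Iterable[int], candidate_bits: Iterable[int],
                    k: int) -> tuple[int, ...]:
    """Exhaustively pick the k bit positions that minimize misses.

    Ties go to the earliest combination, i.e. the lowest bit positions.
    """
    trace = list(trace)  # the trace is replayed once per candidate set
    return min(combinations(candidate_bits, k),
               key=lambda bits: count_misses(trace, bits))


def worth_reconfiguring(misses_default: int, misses_custom: int,
                        invalidation_misses: int) -> bool:
    """Reconfigure only if the misses saved exceed the misses added by
    invalidating the cache (invalidation_misses is a caller estimate)."""
    return misses_default - misses_custom > invalidation_misses


if __name__ == "__main__":
    # Four blocks that differ only in address bits 4 and 5, so the
    # conventional index bits (0, 1) map them all to the same line.
    trace = [0, 16, 32, 48] * 4
    conventional = (0, 1)
    custom = best_index_bits(trace, range(6), 2)
    m_conv = count_misses(trace, conventional)
    m_custom = count_misses(trace, custom)
    print(custom, m_conv, m_custom)  # (4, 5) 16 4
    print(worth_reconfiguring(m_conv, m_custom, invalidation_misses=4))  # True
```

With the conventional bits, every access in this toy trace is a conflict miss. With bits 4 and 5, only the four compulsory misses remain. Whether a per-function switch pays off then depends on how the saved misses compare with the estimated invalidation cost.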