
6 - The Cache Hierarchy

Published online by Cambridge University Press:  05 June 2012

Jean-Loup Baer
Affiliation:
University of Washington

Summary

We reviewed the basics of caches in Chapter 2. In subsequent chapters, when we looked at instruction fetch in the front-end and data load–store operations in the back-end, we mostly assumed hits in the respective first-level instruction and data caches. It is now time to look at the memory hierarchy in a more realistic fashion. In this chapter, our focus is principally on the cache hierarchy.

The challenge for an effective memory hierarchy can be summarized by two technological constraints:

  • With processors running at a few gigahertz, main memory latencies are now of the order of several hundred cycles.

  • In order to access first-level caches in 1 or 2 cycles, their size and associativity must be severely limited.
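One concrete way to see the size limit in the second constraint: a fast L1 is commonly virtually indexed and physically tagged (VIPT) so that cache indexing can overlap the TLB lookup; this works only if the index and line-offset bits fit inside the page offset, which caps the cache at one page per way. The sketch below illustrates this bound; the page and line sizes (4 KiB, 64 B) are illustrative assumptions, not values from the text.

```python
# Illustrative sketch: size bound for a VIPT L1 cache, assuming
# 4 KiB pages. A VIPT cache can hold at most one page per way if
# indexing is to proceed in parallel with address translation.

PAGE_SIZE = 4096  # bytes (assumed)

def max_vipt_size(associativity):
    """Largest VIPT cache (in bytes) whose index + offset bits
    fit inside the page offset: one page worth of sets per way."""
    return PAGE_SIZE * associativity

for ways in (1, 2, 4):
    print(f"{ways}-way: {max_vipt_size(ways) // 1024} KiB")
# 1-way: 4 KiB, 2-way: 8 KiB, 4-way: 16 KiB
```

This is one reason small L1 caches tend to pair small size with modest associativity: growing capacity without growing associativity pushes index bits past the page offset and forces the TLB lookup to complete first, lengthening the access time.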

These two facts point to a hierarchy of caches: relatively small, low-associativity first-level instruction and data caches (L1 caches); a large second-level on-chip cache, generally unified (i.e., holding both instructions and data), with an access time an order of magnitude slower than L1's (L2 cache); often, in high-performance servers, a third-level off-chip cache (L3) with latencies approaching 100 cycles; and then main memory, with latencies of a few hundred cycles. The goal of the design of a cache hierarchy is to keep a latency of one or two cycles for L1 caches and to hide as much as possible the latencies of the higher cache levels and of main memory.
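The effect of such a hierarchy can be summarized by the average memory access time (AMAT): each access pays the L1 latency, misses pay the L2 latency in addition, and so on down to main memory. The sketch below uses latencies of the order of magnitude given above; the hit rates are illustrative assumptions, not measurements.

```python
# Illustrative sketch: average memory access time (AMAT) through a
# cache hierarchy. AMAT = t1 + m1*(t2 + m2*(t3 + m3*t_mem)), i.e.,
# each level's latency weighted by the probability of reaching it.

def amat(levels, mem_latency):
    """levels: list of (latency_in_cycles, local_hit_rate) from L1
    down; mem_latency: main-memory latency in cycles. Returns the
    expected number of cycles per memory access."""
    total = 0.0
    reach = 1.0  # probability that an access reaches this level
    for latency, hit_rate in levels:
        total += reach * latency      # every access reaching this level pays its latency
        reach *= (1.0 - hit_rate)     # only misses continue downward
    total += reach * mem_latency      # survivors go to main memory
    return total

# Assumed example: L1 at 2 cycles/95% hits, L2 at 20 cycles/80%,
# L3 at 100 cycles/60%, memory at 300 cycles.
levels = [(2, 0.95), (20, 0.80), (100, 0.60)]
print(f"AMAT = {amat(levels, mem_latency=300):.1f} cycles")  # AMAT = 5.2 cycles
```

With these (assumed) numbers, the hierarchy hides most of the several-hundred-cycle memory latency: the average access costs only a few cycles, even though a miss all the way to memory costs over 400.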

Type: Chapter
Book: Microprocessor Architecture: From Simple Pipelines to Chip Multiprocessors, pp. 208–259
Publisher: Cambridge University Press
Print publication year: 2009



Chapter DOI: https://doi.org/10.1017/CBO9780511811258.007