source-based_snoop_filters [Snoop Filtering]

- Speculative Selective Requests
- Software-Assisted Filters
- Coarse-Grain Filters

The unnecessary tag lookups that are filtered out by destination filters still require bandwidth for snoop requests and responses. If the requesting cache can predict in advance either that no other caches hold copies of the requested block, or that only certain caches might hold copies, then it can avoid sending a snoop broadcast altogether, or send a multicast instead of a full broadcast. This is the approach used by source-based snoop filters. The reduced network bandwidth both saves energy and provides a mechanism that allows snoop protocols to scale to more cores.

Speculative Selective Requests

Speculative selective requests (SSR) use a single saturating counter and a supplier ID register at each core to predict when a remote cache might supply requested data in response to a snoop broadcast. When the same remote cache replies to multiple successive requests from the same core, that core's counter saturates and future requests are sent directly to that supplier cache. When the counter is unsaturated, or when a misprediction occurs, a full snoop broadcast is performed. For workloads that exhibit a large degree of supplier locality, such as those in SPLASH-2, data is often streamed from one cache to another, making SSR very effective. However, workloads with less predictable behaviour see little benefit from SSR because the counters seldom saturate, and they may even suffer performance losses if mispredictions are frequent.

E. Atoofian, A. Baniasadi, and K. Aasaraai, “Speculative Supplier Identification for Reducing Power of Interconnects in Snoopy Cache Coherence Protocols”, In the ACM International Conference on Computing Frontiers, May 7-9, 2007.
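
The SSR prediction logic described above can be sketched in a few lines of Python. This is an illustrative model only, not the hardware design from the paper: the class name `SSRPredictor`, the saturation threshold of 3, and the choice to restart the counter at 1 when a new supplier is seen are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SSRPredictor:
    """Per-core predictor: one saturating counter plus a supplier ID register."""
    threshold: int = 3               # assumed saturation value
    counter: int = 0
    supplier: Optional[int] = None   # ID of the last cache that supplied data

    def predict(self) -> Optional[int]:
        """Return the cache to request from directly, or None for a full broadcast."""
        if self.supplier is not None and self.counter >= self.threshold:
            return self.supplier
        return None

    def update(self, actual_supplier: Optional[int]) -> None:
        """Train on the cache that actually supplied the data (None = no cache did)."""
        if actual_supplier is not None and actual_supplier == self.supplier:
            # Same supplier again: count up, but never past saturation.
            self.counter = min(self.counter + 1, self.threshold)
        else:
            # New supplier or misprediction: start over (assumed reset policy).
            self.supplier = actual_supplier
            self.counter = 1 if actual_supplier is not None else 0
```

With these assumed parameters, three successive replies from the same cache saturate the counter and the next request goes straight to that cache; a reply from any other source drops the predictor back to broadcasting.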

Software-Assisted Filters

Ballapuram et al. also proposed a number of source-based snoop filters that avoid broadcasting snoops for certain accesses where those snoops are known to fail. The first filter avoids all snoop broadcasts for accesses to data in the stack of each thread. Such data is known to be private, and most accesses to the stack can be identified during decode by examining the source and destination operands of the instruction. This relies on software following the usual conventions for accessing the stack.

Another proposed filter mechanism adds a Snoop-Me-Not bit to all instructions that access memory. A number of approaches are suggested for how programmers, compilers, libraries and operating systems can work together to support such a scheme, which would identify all memory accesses to data that is known to be private to a thread so that processors only need to perform snoops when this bit is not set. Both of these filters require cache flushes on context switches, but the infrequency of context switches minimizes this overhead. Finally, Ballapuram et al. propose a snoop filter targeted at instruction cache snoops generated to support self-modifying code (SMC). In some processors, SMC is supported by snooping the L1 instruction cache on each store to the data cache. If the compiler and system libraries can identify non-SMC code, they can put the processor in a non-SMC mode that avoids all of these unnecessary snoops.

All three of these filters proposed by Ballapuram et al. require minimal hardware support and offer the potential for eliminating large numbers of unnecessary snoop requests. However, they all involve some level of software support. Even identifying stack accesses relies on using common conventions to access stack locations, and never using those same methods to access non-stack data. The Snoop-Me-Not bit is much more difficult to support effectively in software, requiring support from compilers, libraries, and potentially even programmer direction. The other filters described in this section are more conservative in their approaches, but rely solely on hardware to identify unnecessary snoops and are transparent to any software running on the system.

Chinnakrishnan S. Ballapuram, Ahmad Sharif, and Hsien-Hsin S. Lee. “Exploiting Access Semantics and Program Behavior to Reduce Snoop Power in Chip Multiprocessors.” In Proceedings of the 13th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp.60-69, Seattle, WA, March, 2008.
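
As a rough illustration of how the three software-assisted filters decide whether a snoop is needed, the following Python sketch models the decision for a single decoded memory access. Everything named here is hypothetical: the `MemAccess` record, the register names in `STACK_REGISTERS`, and the two helper functions are invented for the example and do not come from the paper.

```python
from dataclasses import dataclass

# Assumed names for the registers that conventionally address the stack.
STACK_REGISTERS = {"sp", "bp"}


@dataclass(frozen=True)
class MemAccess:
    """Hypothetical decoded memory instruction."""
    base_register: str          # register used to form the address
    snoop_me_not: bool = False  # bit set by software for thread-private data
    is_store: bool = False


def needs_data_snoop(access: MemAccess) -> bool:
    """Return True if a miss for this access must broadcast a snoop."""
    if access.base_register in STACK_REGISTERS:
        return False  # stack filter: stack data is assumed private
    if access.snoop_me_not:
        return False  # Snoop-Me-Not filter: software marked the data private
    return True


def needs_icache_snoop(access: MemAccess, non_smc_mode: bool) -> bool:
    """Return True if this access must snoop the L1 instruction cache for SMC."""
    return access.is_store and not non_smc_mode
```

The sketch makes the software dependence visible: the stack filter is only safe if non-stack data is never addressed through the registers in `STACK_REGISTERS`, and the other two filters are only as correct as the bit and mode that software sets.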

Coarse-Grain Filters

A number of different works have proposed techniques that track coarse-grain sharing information and use it to avoid unnecessary snoop broadcasts. Although these works propose different specific implementations, they all follow the same general approach. Each cache has some mechanism for identifying whether a region is present in the cache, and for attaching some state to the region. On a snoop broadcast, the normal snoop response is augmented with information indicating which caches contain any blocks from the region containing the block addressed by the snoop. When no other caches contain any blocks in a region, the processor core that issued the snoop broadcast records that the region is non-shared, and any future requests by that core to that region avoid sending snoop broadcasts and directly access memory.
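
The general approach can be sketched as a small per-core region table. This is a minimal model under stated assumptions rather than any one of the proposed designs: the class name `RegionFilter`, the 1 KB region size, and the `observe_remote_snoop` method (which assumes a region stops being non-shared once another core requests a block from it) are all hypothetical.

```python
REGION_BYTES = 1024  # assumed region size; real proposals vary


class RegionFilter:
    """Per-core record of regions known to be non-shared."""

    def __init__(self, region_bytes: int = REGION_BYTES) -> None:
        self.region_bytes = region_bytes
        self.non_shared = set()  # region numbers recorded as non-shared

    def region_of(self, addr: int) -> int:
        """Map a block address to the number of the region containing it."""
        return addr // self.region_bytes

    def must_broadcast(self, addr: int) -> bool:
        """Return False if the request can skip the snoop and go to memory."""
        return self.region_of(addr) not in self.non_shared

    def record_response(self, addr: int, sharers: set) -> None:
        """Update state from the region information in a snoop response.

        `sharers` holds the IDs of remote caches that reported holding any
        block from the region containing `addr`.
        """
        region = self.region_of(addr)
        if sharers:
            self.non_shared.discard(region)
        else:
            self.non_shared.add(region)

    def observe_remote_snoop(self, addr: int) -> None:
        """Assumed invalidation rule: a remote request may make the region shared."""
        self.non_shared.discard(self.region_of(addr))
```

For example, after `record_response(0x1000, set())` a later request to `0x1040` falls in the same 1 KB region and skips the broadcast, while a request to `0x2000` still broadcasts.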