inclusive_snoop_filters [Snoop Filtering]

- Subset Filters
  - Strauss et al.
- Superset Filters
  - Stream Registers
  - Counting Bloom Filter-Based Snoop Filters

Inclusive filters track all blocks in the cache that could possibly return positive responses to a snoop request. The key challenge for such an inclusive filter is providing an area- and energy-efficient structure that can represent this set of blocks.

Subset Filters

Strauss et al.

Strauss et al. meet this challenge by filtering snoops only for read requests and tracking only those lines that are in a supplier state and might provide data in response to a snoop. A subset filter tracks the subset of blocks in the cache that are in one of the supplier states. Blocks in shared state will never return a positive reply to a read snoop, so they are not tracked by the filter. A set-associative array contains the tags of all supplier blocks in the cache. To guarantee that all supplier blocks are tracked, a block is downgraded to shared state whenever the filter has no space left to track it. This can result in additional write-backs to memory. Besides this potential increase in memory bandwidth, the subset filter is also limited to filtering snoops for read requests; it cannot filter out invalidate messages.
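The tag array and its downgrade-on-overflow rule can be sketched as below. This is only an illustration, not the design from the paper: the class name `SubsetFilter`, its method names, the use of block numbers as tags, and the choice of the oldest entry as the downgrade victim are all assumptions made for the example.

```python
from collections import OrderedDict


class SubsetFilter:
    """Hypothetical set-associative array holding tags of supplier blocks."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered dict per set; insertion order stands in for entry age.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def track_supplier(self, block):
        """Record that `block` entered a supplier state.

        Returns the block the cache must downgrade to shared state
        (possibly writing it back), or None if there was room.
        """
        entries = self.sets[block % self.num_sets]
        if block in entries:
            return None
        victim = None
        if len(entries) >= self.ways:
            # Assumption: evict the oldest tracked block from this set.
            victim, _ = entries.popitem(last=False)
        entries[block] = True
        return victim

    def untrack(self, block):
        """Block left the supplier states (evicted or downgraded)."""
        self.sets[block % self.num_sets].pop(block, None)

    def read_snoop_may_hit(self, block):
        # The filter is exact for supplier blocks, so a miss here means
        # the cache cannot supply data and the read snoop can be dropped.
        return block in self.sets[block % self.num_sets]


f = SubsetFilter(num_sets=1, ways=2)
f.track_supplier(1)
f.track_supplier(2)
downgraded = f.track_supplier(3)  # set is full, so block 1 is pushed out
```

In this sketch the caller is responsible for acting on the returned victim, which is where the extra write-backs mentioned above would come from.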

Strauss, K., Shen, X., and Torrellas, J. 2006. Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors. SIGARCH Comput. Archit. News 34, 2 (May. 2006), 327-338. DOI= http://doi.acm.org/10.1145/1150019.1136514

Superset Filters

An alternative to this type of subset filter is a superset filter, which tracks a superset of the blocks in the cache. Such a filter sacrifices some precision by potentially tracking more blocks than the cache contains, but benefits from a more space-efficient representation of those blocks. Two good examples of superset filters are stream registers and counting Bloom filters.

Stream Registers

A stream register provides a very compact representation of a set of memory blocks. It comprises a base address and a mask. The mask indicates which bits are common to the addresses of all of the blocks represented by the stream register, and the base address gives the values of these common bits. To test whether a block might be a member of the set represented by the stream register, the address of the block is XOR'd with the base address and the result is ANDed with the mask. If the result of this test is zero, then the block might be a member of the set. If the result is non-zero, then the address is definitely not in the set.
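The membership test reduces to one XOR and one AND. A minimal sketch, assuming 16-bit addresses and a hypothetical `StreamRegister` class (neither comes from the original description):

```python
class StreamRegister:
    """A base/mask pair summarizing a set of block addresses."""

    def __init__(self, base, mask):
        self.base = base
        self.mask = mask  # 1 bits mark positions shared by every tracked address

    def may_contain(self, addr):
        # XOR exposes the bits where addr differs from the base; AND keeps
        # only the positions that all tracked addresses have in common.
        return ((addr ^ self.base) & self.mask) == 0


# All tracked addresses share the upper byte 0x12; the lower byte is unknown.
reg = StreamRegister(base=0x1200, mask=0xFF00)
hit = reg.may_contain(0x12AB)   # True: might be in the set
miss = reg.may_contain(0x1300)  # False: definitely not in the set
```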

One or more stream registers can be used to track the blocks allocated to a cache and act as a snoop filter. The tag array only needs to perform lookups for snoops that match the base address and mask of a stream register. Unfortunately, stream registers cannot be updated to reflect which blocks have been evicted from the cache, and over time they become saturated, with most or all of the bits in the mask set to zero. Using multiple stream registers helps alleviate this problem. Each new block can be added to the register with the longest string of matching high-order bits, slowing down the rate at which the mask bits are zeroed. This makes the stream registers more precise, but they still become stale, because blocks that are evicted from the cache remain represented by the stream registers long after they leave.
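Adding a block clears every mask bit where the new address disagrees with the base, which is why the mask can only lose bits over time. The sketch below illustrates this together with a simple register-selection rule. The 16-bit address width, the names `ADDR_BITS`, `matching_prefix_len`, and `add_block`, and the choice to fill empty registers first are assumptions for the example, not details taken from a specific design.

```python
ADDR_BITS = 16  # assumed address width for illustration
ALL_ONES = (1 << ADDR_BITS) - 1


class StreamRegister:
    def __init__(self):
        self.valid = False
        self.base = 0
        self.mask = ALL_ONES

    def add(self, addr):
        if not self.valid:
            # First block: every bit is "common", so the register is exact.
            self.valid, self.base, self.mask = True, addr, ALL_ONES
        else:
            # Clear the mask bits where addr disagrees with the base.
            self.mask &= ~(addr ^ self.base) & ALL_ONES

    def may_contain(self, addr):
        return self.valid and ((addr ^ self.base) & self.mask) == 0


def matching_prefix_len(a, b):
    """Number of identical high-order bits in two addresses."""
    diff = (a ^ b) & ALL_ONES
    return ADDR_BITS - diff.bit_length()


def add_block(registers, addr):
    """Add addr to an empty register if any, else to the closest match."""
    for reg in registers:
        if not reg.valid:
            reg.add(addr)
            return reg
    best = max(registers, key=lambda r: matching_prefix_len(r.base, addr))
    best.add(addr)
    return best


regs = [StreamRegister(), StreamRegister()]
add_block(regs, 0x1200)
add_block(regs, 0x8000)
add_block(regs, 0x1234)  # joins regs[0], whose base shares 10 high-order bits
```

After the third insertion, `regs[0]` also matches 0x1210, an address that was never added. This is the loss of precision the text describes, and nothing in this structure can undo it when a block is later evicted.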

To address this problem, two sets of stream registers can be used: an active set and a history set. The history set can be thought of as representing the original cache contents, and new blocks are added only to the active set. As more blocks are allocated to the cache, eventually all of the original blocks tracked by the history set will have been evicted from the cache. At this point, the history set can be discarded, and the active set can become the new history set. This provides a mechanism for periodically clearing stale information out of the stream registers and maintaining their precision. While it might seem complicated to detect the wrap-around condition in which all blocks in the cache have been evicted, certain designs make this task easier. In particular, if a cache uses a round-robin, or FIFO, replacement policy, then an array of counters can be used to track when cache wrap-around has occurred. This is the approach used in the Blue Gene/P, which incorporates stream registers in its snoop filter.
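The rotation between the two sets can be sketched as follows. This is a simplified model under stated assumptions, not the Blue Gene/P implementation: each set of stream registers is reduced to a single base/mask pair (`RegisterSet`), addresses are block numbers, the cache uses strict FIFO replacement so that `ways` fills into a cache set replace everything it held, and one counter per cache set detects wrap-around. All names are hypothetical.

```python
ADDR_BITS = 16  # assumed address width for illustration
ALL_ONES = (1 << ADDR_BITS) - 1


class RegisterSet:
    """Stand-in for a group of stream registers (here a single base/mask)."""

    def __init__(self):
        self.valid = False
        self.base = 0
        self.mask = ALL_ONES

    def add(self, addr):
        if not self.valid:
            self.valid, self.base, self.mask = True, addr, ALL_ONES
        else:
            self.mask &= ~(addr ^ self.base) & ALL_ONES

    def may_contain(self, addr):
        return self.valid and ((addr ^ self.base) & self.mask) == 0


class ActiveHistoryFilter:
    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.active = RegisterSet()
        self.history = RegisterSet()
        self.fills = [0] * num_sets  # fills per cache set since last rotation

    def on_fill(self, addr):
        self.active.add(addr)  # new blocks go to the active set only
        idx = addr % self.num_sets
        self.fills[idx] = min(self.fills[idx] + 1, self.ways)
        if all(count >= self.ways for count in self.fills):
            # Wrap-around: every line present at the last rotation has been
            # replaced, so the old history is stale and can be dropped.
            self.history = self.active
            self.active = RegisterSet()
            self.fills = [0] * self.num_sets

    def snoop_needs_lookup(self, addr):
        return self.active.may_contain(addr) or self.history.may_contain(addr)


f = ActiveHistoryFilter(num_sets=2, ways=1)
for block in (0x10, 0x21, 0x40, 0x41):
    f.on_fill(block)
stale = f.snoop_needs_lookup(0x10)  # False: 0x10 was evicted and forgotten
```

A snoop must consult both sets, because between rotations the current cache contents are split across them.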

Salapura, V., Blumrich, M. A., and Gara, A. 2008. Design and implementation of the Blue Gene/P snoop filter. HPCA 2008.

Counting Bloom Filter-Based Snoop Filters

Moshovos et al. proposed Jetty, which uses a counting Bloom filter to track all the blocks in the cache. This work focused on SMP systems where each processor has a relatively large L2 cache that is tracked by a counting Bloom filter. Jetty also includes a small table that caches recent snoop requests that missed in the local cache.
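A counting Bloom filter keeps small counters rather than single bits, so entries can be removed when a block is evicted. The sketch below is a generic illustration, not Jetty's exact organization: the class name, the number and size of the counter tables, and the use of consecutive address bit fields as indexes are assumptions.

```python
class CountingBloomFilter:
    """Hypothetical counting Bloom filter indexed by address bit fields."""

    def __init__(self, index_bits=4, num_tables=3):
        self.index_bits = index_bits
        self.tables = [[0] * (1 << index_bits) for _ in range(num_tables)]

    def _indexes(self, block):
        # Table i is indexed by the i-th group of index_bits address bits.
        field = (1 << self.index_bits) - 1
        return [(block >> (i * self.index_bits)) & field
                for i in range(len(self.tables))]

    def insert(self, block):
        """Called when a block is allocated in the cache."""
        for table, i in zip(self.tables, self._indexes(block)):
            table[i] += 1

    def remove(self, block):
        """Called when a previously inserted block is evicted."""
        for table, i in zip(self.tables, self._indexes(block)):
            table[i] -= 1

    def may_contain(self, block):
        # Any zero counter proves the block is absent, so the snoop can be
        # filtered. All non-zero counters mean the cache must be checked.
        return all(table[i] > 0
                   for table, i in zip(self.tables, self._indexes(block)))


bf = CountingBloomFilter()
bf.insert(0x123)
bf.insert(0x456)
false_positive = bf.may_contain(0x126)  # True, although 0x126 was never added
bf.remove(0x456)
after_evict = bf.may_contain(0x126)     # False once 0x456 is evicted
```

The example shows both properties of a superset filter: it can report a block that is not cached (0x126 borrows counters from the two inserted blocks), but it never misses a block that is cached.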

Moshovos, A., Memik, G., Falsafi, B., and Choudhary, A. 2001. JETTY: Snoop Filtering for Reduced Energy Consumption in SMP Servers. In Proc. of HPCA-7 (Symposium on High Performance Computer Architecture), Jan. 2001.

Ballapuram et al. proposed using counting Bloom filters to represent the contents of L1 caches in chip multiprocessors. To support self-modifying code, some processors snoop the local L1 instruction cache on each store. Ballapuram et al. propose using a counting Bloom filter similar to Jetty to filter these snoops. In addition, they propose using counting Bloom filters to track the blocks in L1 data caches. Unlike the Jetty approach, where all cache blocks are tracked in a single Bloom filter, Ballapuram et al. use multiple filters and track only non-stack data blocks. Since all stack data is private to each thread, other cores should never send snoops for a core's private stack data. One filter tracks only blocks in the Modified or Exclusive state, while the other tracks only blocks in the Shared state. Since more snoops access blocks in the Shared state, tracking them in a separate filter reduces the rate of false positives.

Ballapuram, C. S., Sharif, A., and Lee, H. S. 2008. Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors. In Proceedings of the 13th international Conference on Architectural Support For Programming Languages and Operating Systems (Seattle, WA, USA, March 01 - 05, 2008). ASPLOS XIII. ACM, New York, NY, 60-69. DOI= http://doi.acm.org/10.1145/1346281.1346290