An extreme example that requires very little area and energy overhead is the filter introduced by Atoofian and Baniasadi. This filter consists of a table of saturating counters at each cache, with one counter for each processor core in the system. On a snoop request to read data, the cache performs a tag lookup only if the counter corresponding to the core that sent the snoop request is saturated; otherwise, it speculatively returns a negative reply without checking its tags. If a cache incorrectly returns a negative reply, either another cache will hold a copy of the same block and return a positive reply, or the requester will receive no positive replies, in which case it re-broadcasts its request and forces all caches to perform lookups. A saturating counter is incremented when the cache supplies data in response to a snoop request from the corresponding core, and it is reset to zero when a snoop request from the corresponding core fails to find the data in the cache. This snoop filter requires very little on-chip area and consumes very little energy, and it works very well for workloads that exhibit high levels of supplier locality, such as the SPLASH-2 benchmarks. However, for workloads that do not exhibit supplier locality, the need to re-broadcast snoops whenever a snoop is filtered incorrectly can result in significant performance losses.
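
The counter behaviour described above can be illustrated with a minimal Python sketch. It is not taken from Atoofian and Baniasadi's paper: the class name `SnoopFilter`, the method `handle_snoop`, the `forced` flag used to model a re-broadcast request, and the saturation value `counter_max` are all assumptions made for illustration, and the cache contents are modelled as a plain set of block addresses.

```python
class SnoopFilter:
    """Per-cache table of saturating counters, one per requesting core.

    Hypothetical model: names and the saturation value are illustrative.
    """

    def __init__(self, num_cores, counter_max=3):
        # counter_max is an assumed saturation value (e.g. a 2-bit counter).
        self.counter_max = counter_max
        self.counters = [0] * num_cores

    def handle_snoop(self, requester, block_addr, cached_blocks, forced=False):
        """Return True for a positive reply (data supplied), else False.

        forced=True models a re-broadcast request that every cache must
        look up regardless of its counter state.
        """
        saturated = self.counters[requester] == self.counter_max
        if not (saturated or forced):
            # Filtered: speculative negative reply, no tag lookup performed.
            return False

        if block_addr in cached_blocks:  # stands in for the tag lookup
            # Supplied data to this requester: increment, saturating at max.
            self.counters[requester] = min(
                self.counters[requester] + 1, self.counter_max
            )
            return True

        # Lookup missed: reset the counter for this requester.
        self.counters[requester] = 0
        return False


# Example: core 1 snoops a cache that holds block 0x40.
snoop_filter = SnoopFilter(num_cores=4, counter_max=2)
cached = {0x40}
first_reply = snoop_filter.handle_snoop(1, 0x40, cached)  # filtered
retry_reply = snoop_filter.handle_snoop(1, 0x40, cached, forced=True)
```

In this sketch the first request is filtered even though the block is present, which is exactly the incorrect negative reply that forces a re-broadcast; only after enough forced lookups hit does the counter saturate and ordinary snoops from that core start triggering tag lookups.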

E. Atoofian and A. Baniasadi, “Using Supplier Locality in Power-Aware Interconnects and Caches in Chip Multiprocessors”, Journal of Systems Architecture, vol. 54, no. 5, pp. 507-518, October 2007.