The Seventeenth International Conference on Parallel Architectures and Compilation Techniques (PACT 2008)
Sunday, October 26

17:30 – 19:00 (rescheduled)
Reception

Monday, October 27

8:00 - 8:30
Breakfast
8:30 - 9:30
Keynote 1: Norm Rubin, AMD - GPU Evolution: Will Graphics Morph Into Compute?
9:30 - 10:00
Break
10:00 - 12:00
Session 1: Compilation
Outer-Loop Vectorization - Revisited for Short SIMD Architectures, Dorit Nuzman and Ayal Zaks
Redundancy Elimination Revisited, Keith Cooper, Jason Eckhardt and Ken Kennedy
Exploiting Loop-Dependent Stream Reuse for Stream Processors, Xuejun Yang, Ying Zhang, Jingling Xue, Ian Rogers, Gen Li and Guibin Wang
Feature Selection and Policy Optimization for 3D Instruction Placement using Reinforcement Learning, Katherine Coons, Behnam Robatmili, Matthew Taylor, Doug Burger and Kathryn McKinley
12:00 – 1:30
Lunch (provided)
1:30 – 3:00
Session 2: CMP Architecture Design
Core Cannibalization Architecture: Improving Lifetime Chip Performance for Multicore Processors in the Presence of Hard Faults, Bogdan Romanescu and Daniel Sorin
Pangaea: A Tightly-Coupled IA32 Heterogeneous Chip Multiprocessor, Henry Wong, Anne Bracy, Ethan Schuchman, Tor M. Aamodt, Jamison D. Collins, Perry H. Wang, Gautham Chinya, Ankur Khandelwal Groen, Hong Jiang, and Hong Wang
Skewed Redundancy, Gordon Bell and Mikko Lipasti
3:00 – 3:30
Break
3:30 – 5:00
Session 3A: Analyzing Applications
The PARSEC Benchmark Suite: Characterization and Architectural Implications, Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh and Kai Li
Visualizing Potential Parallelism in Sequential Programs, Graham Price, John Giacomoni and Manish Vachharajani
Characterizing and Modeling the Behavior of Context Switch Misses, Fang Liu, Fei Guo and Yan Solihin
Session 3B: I/O Optimizations
MCAMP: Communication Optimization on Massively Parallel Machines with Hierarchical Scratch-pad Memory, Hiroshige Hayashizaki, Yutaka Sugawara, Mary Inaba and Kei Hiraki
Profiler and Compiler Assisted Adaptive I/O Prefetching for Shared Storage Caches, Seung Woo Son, Sai Prashanth Muralidhara, Ozcan Ozturk, Mahmut Kandemir, Ibrahim Kolcu and Mustafa Karakoy
Optimizing One-Sided Communication of Multiple Disjoint Memory Regions, Costin Iancu

Tuesday, October 28

8:00 - 8:30
Breakfast
8:30 - 9:30
Keynote 2: Saman Amarasinghe, MIT - (How) Can Programmers Conquer the Multicore Menace?
9:30 - 10:00
Break
10:00 - 11:30
Session 4: Multicore Memory Hierarchy Design (Part 1)
Distributed Cooperative Caching, Enric Herrero Abellanas, José González González and
Scalable and Reliable Communication for Hardware Transactional Memory, Seth Pugsley, Manu Awasthi, Niti Madan, Naveen Muralimanohar and Rajeev Balasubramonian
Improving Support for Locality and Fine-Grain Sharing in Chip Multiprocessors, Hemayet Hossain, Sandhya Dwarkadas and Michael Huang
11:30 – 1:00
Lunch (provided)
1:00 – 2:30
Session 5: Reconfigurable Architecture Optimization
Edge-centric Modulo Scheduling for Coarse-Grained Reconfigurable Architectures, Hyunchul Park, Kevin Fan, Scott Mahlke, Taewook Oh, Heeseok Kim and Hong-seok Kim
Chip Multi-processor Global Power Management with Multi-optimization Power-saving Strategies, Ke Meng and Russ Joseph
Multitasking Workload Scheduling on Flexible-Core Chip Multiprocessors, Divya P. Gulati, Changkyu Kim, Simha Sethumadhavan, Stephen W. Keckler and Doug Burger
2:30 – 3:00
Break
3:00 – 4:30
Session 6: Multicore Memory Hierarchy Design (Part 2)
Leveraging On-Chip Networks for Cache Migration in Chip Multiprocessors, Noel Eisley, Li-Shiuan Peh and Li Shang
Adaptive Insertion Policies for Managing Shared Caches on CMPs, Aamer Jaleel, William Hasenplaugh, Moinuddin Qureshi, Julien Sebot, Simon Steely and Joel Emer
Analysis and Approximation of Optimal Co-Scheduling on Chip Multiprocessors, Yunlian Jiang, Xipeng Shen, Jie Chen and Rahul Tripathi
4:30 – 5:00
Break
5:00 – 6:00
Session 7: Multithreading Improvements
An Adaptive Resource Partitioning Algorithm for SMT Processors
Meeting Points: Using Thread Criticality to Adapt Multicore Hardware to Parallel Regions, Qiong Cai, Jose Gonzalez, Ryan Rakvic, Grigorios Magklis, Pedro Chaparro and Antonio Gonzalez
6:30 – 9:30
Special Event: Visit to the CN Tower, with a reception at the Horizons Café and dinner at the 360 Restaurant

Wednesday, October 29

8:00 - 8:30
Coffee Break
8:30 – 10:00
Session 8: Middleware and Runtime Systems
Prediction Models for Multi-dimensional Power-Performance Optimization on Many Cores, Matthew Curtis-Maury, Ankur Shah, Filip Blagojevic, Dimitrios S. Nikolopoulos, Bronis R. de Supinski and Martin Schulz
Mars: A MapReduce Framework on Graphics Processors, Bingsheng He, Wenbin Fang, Qiong Luo, Naga Govindaraju and Tuyong Wang
Multi-mode Energy Management for Multi-tier Server Clusters, Tibor Horvath and Kevin Skadron
10:00 – 10:30
Break
10:30 – 12:00
Session 9: Programming the Memory Hierarchy
A Tuning Framework for Software-Managed Memory Hierarchies, Manman Ren, Ji Young Park, Mike Houston, Alex Aiken and William Dally
Hybrid Access-Specific Software Cache Techniques for the Cell BE Architecture, Marc Gonzalez, Nikola Vujic, Alexandre E. Eichenberger, Tong Chen, Xavier Martorell, Eduard Ayguada, Zehra Sura, Tao Zhang, Kevin O'Brien, and Kathryn O'Brien
COMIC: A Coherent Shared Memory Interface for Cell BE, Jaejin Lee, Sangmin Seo, Chihun Kim, Junghyun Kim, Posung Chun, Zehra Sura, Jungwon Kim and SangYong Han

Keynote #1: Norm Rubin, AMD - GPU Evolution: Will Graphics Morph Into Compute?
In the last several years, GPU devices have started to evolve into supercomputers. New non-graphics features are rapidly appearing, along with new, more general programming languages. One reason for the quick pace of change is that games and hardware evolve together: hardware vendors review the most popular games, looking for places to add hardware, while game developers review new hardware, looking for places to add more realism. Today, we see both GPU devices and games moving from a model of "looks real" to one of "acts real." One consequence of "acts real" is that evaluating physics, simulations, and artificial intelligence on a GPU is becoming an element of future game programs.
We will review the difference between a CPU and a GPU. Then we will describe hardware changes added to the current generation of AMD graphics processors, including the introduction of traditional compute operations such as double precision, scatter/gather, and local memory. Along with new features, we have added new metrics like performance/watt and performance/dollar. The current AMD GPU processor delivers 9 gigaflops/watt and 5 gigaflops/dollar. For the last two generations, each AMD GPU has provided double the performance/watt of the prior machine. We believe the software community needs to become more aware of these metrics and learn to appreciate them.
Because this has been a kind of co-evolution and not a process of radical change, current GPU devices have retained a number of odd-sounding transitional features, including fixed-function units such as memory systems that can do filtering, depth buffers, a rasterizer, and the like. Today, each of these remains because it is important for graphics performance.
Software on GPU devices also shows transitional features. As AI, physics, and virtual reality start to become important, development frameworks have started to shift: graphics APIs have added compute shaders.
Finally, there has been a set of transitional programs implemented by graphics programmers whose only real connection with graphics is that the result is rendered. One early example is Toy Shop, which contains a weak physical simulation of rain on a window (it looks great, but the random number generator would not pass any kind of test). A more recent and better-acting program is March of the Froblins, an AI program related to robotic path calculations. This program both simulates large crowds of independent creatures and shows how massively parallel compute can benefit character-centric entertainment.
Bio: Dr. Rubin is a Fellow at AMD, where he is the main architect for the AMD/ATI graphics compiler. He has built commercial compilers for processors ranging from embedded (ARM) and desktop (HP, Alpha) to supercomputer (KSR). He has published numerous papers on compiler design. Norm holds a PhD from the Courant Institute of NYU. Besides his work in compilers, he is well known for his work in compiler-related parts of the tool chain: binary translators and dynamic optimizers.
Keynote #2: Saman Amarasinghe, MIT - (How) Can Programmers Conquer the Multicore Menace?
The era of exponential improvement of processor performance, a byproduct of Moore's Law, is drawing to a close, and the burden of delivering further performance gains is shifting onto programmers, who must now write parallel software.
One promising approach to
parallel programming is the use of novel programming language techniques --
ones that reduce the burden on the programmers, while simultaneously
increasing the compiler's ability to get good parallel performance. In the second part of the talk, I will
introduce StreamIt: a language and compiler specifically designed to expose
and exploit inherent parallelism in "streaming applications" such
as audio, video, and network processing.
StreamIt provides novel high-level representations to improve
programmer productivity within the streaming domain. By exposing the communication patterns of
the program, StreamIt allows the compiler to perform aggressive
transformations and effectively utilize parallel resources. StreamIt is ideally suited for multicore
architectures; recent experiments on a 16-core machine demonstrate an 11x
speedup over a single core.
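To make the streaming model concrete, here is a minimal, illustrative Java sketch. It is not actual StreamIt code, and the Filter, Source, Scale, Printer, and PipelineDemo names are invented for this example; it only shows the idea the abstract describes: each filter declares how many items it consumes and produces per firing, and filters are composed into a pipeline over explicit FIFO channels, so the communication pattern is visible to a compiler or scheduler.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;

    // Illustrative only: a StreamIt-style pipeline expressed in plain Java.
    // Real StreamIt is its own language; these class names are invented here.
    abstract class Filter {
        final int popRate;   // items consumed from the input channel per firing
        final int pushRate;  // items produced onto the output channel per firing

        Filter(int popRate, int pushRate) {
            this.popRate = popRate;
            this.pushRate = pushRate;
        }

        // One firing of the filter's work function.
        abstract void work(Deque<Integer> in, Deque<Integer> out);
    }

    class Source extends Filter {
        private int next = 0;
        Source() { super(0, 1); }
        void work(Deque<Integer> in, Deque<Integer> out) { out.add(next++); }
    }

    class Scale extends Filter {
        Scale() { super(1, 1); }
        void work(Deque<Integer> in, Deque<Integer> out) { out.add(in.remove() * 2); }
    }

    class Printer extends Filter {
        Printer() { super(1, 0); }
        void work(Deque<Integer> in, Deque<Integer> out) { System.out.println(in.remove()); }
    }

    public class PipelineDemo {
        public static void main(String[] args) {
            List<Filter> pipeline = List.of(new Source(), new Scale(), new Printer());

            // One FIFO channel before and after each filter in the pipeline.
            List<Deque<Integer>> channels = List.of(
                    new ArrayDeque<>(), new ArrayDeque<>(),
                    new ArrayDeque<>(), new ArrayDeque<>());

            // Fire the filters in pipeline order. Because every filter's pop/push
            // rates are declared, a compiler could instead build a static schedule
            // and place filters on different cores, which is the StreamIt approach.
            for (int step = 0; step < 5; step++) {
                for (int i = 0; i < pipeline.size(); i++) {
                    pipeline.get(i).work(channels.get(i), channels.get(i + 1));
                }
            }
        }
    }

In StreamIt itself, declared rates like these let the compiler compute a steady-state schedule and map filters onto separate cores, the kind of transformation behind the multicore speedups mentioned above.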
Bio: Saman
P. Amarasinghe is an Associate Professor in the Department of Electrical
Engineering and Computer Science at Massachusetts Institute of Technology and
a member of the Computer Science and Artificial Intelligence Laboratory
(CSAIL). Currently he leads the Commit compiler group and was the co-leader
of the MIT Raw project. Under Saman's guidance, the Commit group developed the StreamIt language and compiler for the streaming domain, Superword Level Parallelism for multimedia extensions, the DynamoRIO dynamic instrumentation system, Program Shepherding to protect programs against external attacks, and Convergent Scheduling and Meta Optimization, which use machine learning techniques to simplify the design and improve the quality of compiler optimizations. His research interests are in discovering novel approaches to improve the performance of modern computer systems and make them more secure without unduly increasing the complexity faced by end users, application developers, compiler writers, or computer architects. Saman was also the
founder of Determina Corporation, which productized Program Shepherding.
Saman received his BS in Electrical Engineering and Computer Science from