The Seventeenth International Conference on Parallel Architectures and Compilation Techniques (PACT 2008)
SATURDAY, OCTOBER 25

Morning

TUTORIAL: SimFlex and ProtoFlex: Fast, Accurate, and Flexible Simulation of Multicore Systems. Eric Chung (echung@ece.cmu.edu), Mike Ferdman, Nikos Hardavellas

TUTORIAL: Introducing Microthreading and its Programming Model. Thomas Bernard (tbernard@science.uva.nl), Mike Lankamp (mlankamp@science.uva.nl), Chris Jesshope (jesshope@science.uva.nl), Universiteit van Amsterdam. CANCELLED

WORKSHOP: Workshop on Parallel Architectures and Bioinspired Algorithms. J. Ignacio Hidalgo (hidalgo@fis.ucm.es), Universidad Complutense de Madrid

Afternoon

TUTORIAL: Programming Models and Compiler Optimizations for GPUs and Multi-Core Processors

TUTORIAL: Productive Parallel Programming in PGAS. IBM
SUNDAY, OCTOBER 26

Morning

WORKSHOP: MEDEA: Workshop on MEmory performance: DEaling with Applications, systems and architecture. Sandro Bartolini (bartolini@dii.unisi.it), Università degli Studi di Siena, and Pierfrancesco Foglia (foglia@iet.unipi.it), Università di Pisa

TUTORIAL: Transactional Memory. Presenters: Yang Ni, Adam Welc, Tatiana Shpeisman (Intel)

Afternoon

WORKSHOP: WoSPS: Workshop on Soft Processor Systems
http://www.ece.cmu.edu/~simflex
Computer architects have long relied on software simulation to evaluate the functionality and performance of architectural innovations. Unfortunately, modern cycle-accurate simulators are several orders of magnitude slower than real hardware, and growing levels of hardware integration increase simulation complexity even further. In addition, conventional simulators are optimized for speed at the expense of code flexibility and maintainability. In this tutorial, we present the SimFlex and ProtoFlex family of simulation tools for fast, accurate, and flexible simulation of uniprocessor, multi-core, and distributed shared-memory systems. SimFlex achieves fast simulation turnaround while ensuring representative results by leveraging the SMARTS simulation sampling framework. At the same time, its component-based design allows easy composition of complex multi-core and multiprocessor systems. ProtoFlex is an FPGA-accelerated simulation technology that complements SimFlex by enabling full-system functional simulation of multiprocessor and multi-core systems at speeds one to two orders of magnitude faster than software tools.

In this tutorial, we first introduce attendees to the SMARTS simulation sampling approach. We present relevant background from statistics and compare and contrast statistical sampling with other sampling proposals. Second, we present the design, implementation, and use of the Flexus simulator suite. Flexus is a family of component-based C++ architecture simulators that implement timing-accurate models of multi-core and multiprocessor systems. We give attendees hands-on experience with SimFlex/TraceFlex, a Flexus model for fast functional and memory-system simulation; SimFlex/OoO, a Flexus model for cycle-accurate simulation; and Flexus' statistical managers and sampling tools. Finally, we present a hands-on technology preview of ProtoFlex. We give attendees the opportunity to compile, execute, and profile multithreaded applications on a real operating system running on a BEE2 FPGA platform.
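To make the idea behind SMARTS-style statistical sampling concrete, here is a minimal sketch (not Flexus code; the window count, synthetic CPI distribution, and 95% z-value are illustrative assumptions): detailed simulation is applied only to short periodic windows, and a confidence interval around the sample mean indicates how trustworthy the estimate is.

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define WINDOWS 1000   /* number of sampled measurement windows (assumed) */

/* Stand-in for detailed simulation of one short window; returns its CPI.
 * In a real sampler this is the expensive cycle-accurate phase. */
static double simulate_window(void) {
    return 1.5 + (rand() / (double)RAND_MAX - 0.5) * 0.4;  /* synthetic CPI */
}

int main(void) {
    double sum = 0.0, sumsq = 0.0;
    for (int i = 0; i < WINDOWS; i++) {
        double cpi = simulate_window();
        sum   += cpi;
        sumsq += cpi * cpi;
    }
    double mean = sum / WINDOWS;
    double var  = (sumsq - WINDOWS * mean * mean) / (WINDOWS - 1);
    /* 95% confidence interval for the mean: +/- 1.96 * s / sqrt(n) */
    double half = 1.96 * sqrt(var / WINDOWS);
    printf("CPI = %.3f +/- %.3f (95%% CI over %d windows)\n",
           mean, half, WINDOWS);
    return 0;
}
```

The same interval can be used the other way around: keep sampling windows until the half-width drops below a target error, which is how sampling trades simulation time for a quantified accuracy guarantee.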
http://domino.research.ibm.com/comm/research_projects.nsf/pages/xlupc.confs.html
Partitioned Global Address Space (PGAS) languages offer an attractive, high-productivity programming model for parallel programming. PGAS languages, such as Unified Parallel C (UPC), combine the simplicity of shared-memory programming with the efficiency of the message-passing paradigm. The efficiency is obtained through a combination of factors: programmers declare how data is partitioned and distributed among threads and use the SPMD programming model to define work, and compilers can use the data annotations to optimize accesses and communication. We have demonstrated that UPC applications can outperform MPI applications on large-scale machines such as BlueGene/L.
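For illustration, a minimal UPC sketch of these two ingredients (the array size, names, and cyclic layout are assumptions, not taken from the tutorial): the shared-array declaration states how data is partitioned across threads, and upc_forall expresses owner-computes SPMD work.

```c
#include <upc.h>    /* UPC: shared, upc_forall, upc_barrier, MYTHREAD, THREADS */
#include <stdio.h>

#define N 256
/* Data distribution is declared in the type: the default (cyclic) layout
 * places element i on thread i % THREADS. */
shared double a[N * THREADS], b[N * THREADS], c[N * THREADS];

int main(void) {
    /* SPMD: every thread executes main(). The affinity expression &a[i]
     * makes each thread run only the iterations whose data it owns. */
    upc_forall (int i = 0; i < N * THREADS; i++; &a[i]) {
        a[i] = i;
        b[i] = 2.0 * i;
    }
    upc_barrier;                        /* all threads finish initialization */
    upc_forall (int i = 0; i < N * THREADS; i++; &c[i])
        c[i] = a[i] + b[i];             /* owner-computes: no remote writes  */
    upc_barrier;
    if (MYTHREAD == 0)
        printf("c[10] = %g on %d threads\n", (double)c[10], THREADS);
    return 0;
}
```

Because the distribution is visible in the declaration, a PGAS compiler can tell which accesses are local and which require communication, which is exactly the kind of optimization opportunity the tutorial discusses.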
In this tutorial we present our work on IBM's XLUPC compiler. We discuss language issues, compiler optimizations for PGAS languages, runtime trade-offs for scalability, and performance results obtained on a number of benchmarks and applications. Attendees should not only gain a better understanding of parallel programming but also learn about compiler and system limitations. The expected outcome is that programmers will be able to code their applications such that performance optimization opportunities are exposed and exploited.
Transactions have recently emerged as a promising alternative to lock-based synchronization. The tutorial will cover a range of topics related to transactional memory, spanning from the description of high-level language constructs and their semantics to the low-level details of specific algorithms used to support efficient execution of these constructs. We take a programming-systems view of transactional memory and walk the audience through each layer of the system, starting from the top-level programmer's view of transactional memory and working down to the implementation level. We show how transactional memory can avoid the problems of lock-based synchronization, such as deadlock and poor scalability, when lock-based software modules are composed. We discuss how transactional constructs can be added to languages such as C/C++ or Java as an alternative to current synchronization constructs. We present software strategies for implementing transactional memory and show how to leverage compiler optimizations to reduce its overheads. We also describe our experience writing transactional applications and present experimental results comparing their performance with that of lock-based versions. Finally, we discuss advanced topics related to the semantics of transactional language constructs, including isolation levels and integration with language memory models.
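The composition problem the tutorial refers to can be made concrete with a classic sketch (account_t and transfer are hypothetical names, not from the tutorial): two lock-protected modules composed naively can acquire the same locks in opposite orders and deadlock, whereas a transactional construct lets the runtime detect the conflict and retry.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical example of composing two lock-protected objects. */
typedef struct { pthread_mutex_t lock; long balance; } account_t;

/* Hazard: thread A calls transfer(&x, &y, ...) while thread B calls
 * transfer(&y, &x, ...). The two threads acquire the same two locks in
 * opposite orders and can deadlock. */
void transfer(account_t *from, account_t *to, long amount) {
    pthread_mutex_lock(&from->lock);
    pthread_mutex_lock(&to->lock);
    from->balance -= amount;
    to->balance   += amount;
    pthread_mutex_unlock(&to->lock);
    pthread_mutex_unlock(&from->lock);
}

/* With a transactional construct (e.g. an atomic block in the style of the
 * TM language extensions the tutorial covers), the body would simply read
 *     atomic { from->balance -= amount; to->balance += amount; }
 * and conflicting transactions would be aborted and retried by the runtime
 * instead of deadlocking. */

int main(void) {
    account_t x = { PTHREAD_MUTEX_INITIALIZER, 100 };
    account_t y = { PTHREAD_MUTEX_INITIALIZER,   0 };
    transfer(&x, &y, 25);    /* single-threaded call: safe to run */
    printf("x=%ld y=%ld\n", x.balance, y.balance);
    return 0;
}
```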
Yang Ni is a Research Scientist in Intel's Programming Systems Lab. He has been working on programming languages for platforms ranging from mobile devices to chip multiprocessors. His current research focuses on transactional memory. He is a major contributor to the Intel C/C++ TM compiler and holds a Ph.D. in Computer Science.
Adam Welc is a Research Scientist in Intel's Programming Systems Lab. His work is in the area of programming language design and implementation, with specific interests in concurrency control, compiler and run-time system optimizations, and transactional processing, as well as architectural support for programming languages and applications. Adam received his Master of Science in Computer Science from Poznan University of Technology, Poland, in July 1999.
Tatiana Shpeisman is a Research Scientist in Intel's Programming Systems Lab. Her general research interest lies in finding ways to simplify software development while improving program efficiency. Her current research focuses on the semantics of transactional memory. In the past, she worked on dynamic compilation for managed runtime environments, IPF code generation, and compiler support for sparse matrix computations. She holds a Ph.D. in Computer Science.
http://atcadmin.dacya.ucm.es/bioinspired/
Parallel computer architecture and bioinspired algorithms have been converging in recent years. On one hand, applying bioinspired algorithms to difficult problems has shown that they demand substantial computational power and communication technology, and parallel architectures and distributed systems offer an attractive alternative to their sequential counterparts. On the other hand, and perhaps more interestingly for the computer architecture community, bioinspired algorithms comprise a family of heuristics that can help optimize a wide range of tasks required for parallel and distributed architectures to work efficiently. Genetic Algorithms (GAs), Genetic Programming (GP), Ant Colony Algorithms (ACAs), and Simulated Annealing (SA) are today helping computer designers advance computer architecture, while improvements in parallel architectures make it possible to run compute-intensive bioinspired algorithms on other difficult problems. The literature contains several evolutionary solutions to design problems such as partitioning and place-and-route that enable technology improvements, and researchers have also used these meta-heuristics to optimize computer architectures, balance computational load, tune instruction code, and tackle related problems. Nevertheless, any effort to strengthen the relationship between the two fields would be very welcome to the community. This workshop will gather scientists, engineers, and practitioners to share and exchange their experiences, discuss challenges, and report state-of-the-art and in-progress research on all aspects of two questions: What can bioinspired algorithms do for parallel computer architectures? And what can parallel computer architectures do for bioinspired algorithms?
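For a flavor of how such heuristics apply to the architecture tasks mentioned above (e.g. placement), here is a toy simulated-annealing sketch; the netlist, cost function, and cooling schedule are illustrative assumptions, not taken from the workshop.

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Toy instance: place N modules on a line so that the wires in net[][2]
 * are short; cost = sum of distances between connected modules. */
#define N 8
#define M 7
static const int net[M][2] = {{0,1},{1,2},{2,3},{3,4},{4,5},{5,6},{6,7}};

static int cost(const int pos[N]) {
    int c = 0;
    for (int e = 0; e < M; e++)
        c += abs(pos[net[e][0]] - pos[net[e][1]]);
    return c;
}

int main(void) {
    int pos[N];
    for (int i = 0; i < N; i++) pos[i] = i;
    for (int i = N - 1; i > 0; i--) {              /* random initial placement */
        int j = rand() % (i + 1), t = pos[i]; pos[i] = pos[j]; pos[j] = t;
    }
    int cur = cost(pos);
    for (double T = 10.0; T > 0.01; T *= 0.995) {  /* geometric cooling      */
        int a = rand() % N, b = rand() % N;
        int t = pos[a]; pos[a] = pos[b]; pos[b] = t;   /* propose a swap     */
        int nxt = cost(pos);
        /* accept improvements always, uphill moves with prob exp(-delta/T)  */
        if (nxt <= cur || exp((cur - nxt) / T) > rand() / (double)RAND_MAX)
            cur = nxt;
        else { t = pos[a]; pos[a] = pos[b]; pos[b] = t; }  /* undo the swap  */
    }
    printf("final wire length: %d\n", cur);
    return 0;
}
```

Accepting occasional uphill moves at high temperature is what lets the search escape local minima; the same skeleton generalizes to partitioning, routing, and load-balancing formulations.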
http://www.eecg.toronto.edu/wosps08/
Processors implemented in programmable logic, called soft processors, are becoming increasingly important in both industry and academia. FPGA-based processors provide an easy way for software programmers to target FPGAs without having to write a hardware description language; hence, designers of FPGA-based embedded systems are increasingly including soft processors in their designs. Soft processors will also likely play an important role in FPGA-based co-processors for high-performance computing. Furthermore, academics are embracing FPGA-based processors as the foundation of systems for faster architectural simulation. In all cases, we need to develop a deeper understanding of processor and multiprocessor architecture for this new medium.
This workshop will serve as a forum for academia and industry
to discuss and present challenges, ideas, and recent developments in soft
processors, soft multiprocessors, application-specific soft processors, and
soft-processor-based accelerators and architectural simulation platforms.
http://garga.iet.unipi.it/medea08/
MEDEA aims to continue the high level of interest generated by its previous editions, held in conjunction with the PACT conference since 2000.

Due to the ever-increasing gap between CPU and memory speed, there is great interest in evaluating and proposing processor, multiprocessor, CMP, multi-core, and system architectures that deal with the "memory wall" and wire-delay problems. At the same time, modular high-level design is becoming more and more attractive as a way to reduce design costs.
In this scenario, design solutions and their corresponding performance are shaped by the combined pressure of (a) technological opportunities and limitations, (b) the features and organization of the system architecture, and (c) the critical requirements of specific application domains. Evaluating and controlling the effects of any architectural proposal on the memory subsystem (e.g. caches, interconnection, buses, memory, coherence) is extremely important from both the performance (e.g. bandwidth, latency, predictability) and power (e.g. static, dynamic, manageability) points of view.
In particular, the emerging trend of single-chip multi-core solutions will push towards new design principles for memory hierarchies and interconnection networks, especially when the design targets systems with a high number of cores (many-core rather than multi-core systems) that aim to scale performance and power efficiency across a variety of application domains.

From a slightly different point of view, the mutual interaction between an application's behavior and the system on which it executes determines the figures of merit of the memory subsystem and therefore pushes towards specific solutions. It can also suggest compile- or link-time tunings for adapting the application to the features of the target architecture.
In the overall picture, power consumption requirements are
increasingly important cross-cutting issues and raise specific challenges.
Typical architectural choices of interest include, but are not limited to, single processors, chip and board multiprocessors, SoCs, traditional and tiled/clustered architectures, multithreaded or VLIW architectures with an emphasis on single-chip design, massively parallel designs, heterogeneous architectures, and architectures equipped with application-domain accelerators or reconfigurable modules. Application domains encompass embedded (e.g. multimedia, mobile, automotive, automation, medical), commercial (e.g. Web, DB, multimedia), scientific, and networking applications, security, etc. The emerging network-on-chip infrastructure and transactional memory may suggest new solutions and issues.
The MEDEA workshop aims to remain a forum for academic and industrial researchers to meet, discuss, and exchange ideas, experience, and solutions in the design and evaluation of architectures for embedded, commercial, and general/special-purpose systems, taking memory issues into account both directly and indirectly.

Proceedings of the workshop will be published under an ACM ISBN and will also appear in the ACM Digital Library. As in previous years, a selection of papers will be considered for publication in Transactions on HiPEAC (http://www.hipeac.net/journal).
The format of the workshop includes the presentation of
selected papers and discussion after each presentation.