The long term goal of the Toronto Synthesis Group is to produce a silicon compiler at the electronic system level (ESL). This research is driven by the needs of embedded systems-on-chip (SOCs), the heart of many performance-demanding electronic products and applications, including, but not limited to, handheld, mobile and internet phones, audio/video entertainment, gateways, wireless access points and internet TVs.
The design of embedded SOCs from product specification to silicon is fundamentally challenged by the sheer complexity involved: the functional complexity, driven by the exponential growth of the targeted application functionality, and the silicon complexity, driven by the electrical complications of deep-submicron manufacturing processes. Although many concrete point problems need to be attacked, we are particularly intrigued by three haunting, cross-cutting intellectual challenges.
In answering these questions, we attempt to carry the two conceptual frameworks forward and build an operational design methodology consisting of tangible tools and languages. To do better than merely offering design guidelines, we are particularly interested in a highly automated design methodology in the same spirit as silicon compilation. We now summarize our perspectives in a series of simple equations. Note that these equations do not pretend to be a formal model; they serve only as articulation points. Also, since they are meant to be a vision, we dare to be rather idealistic.
[1] ESL = ESL platform + ESL application
Our first equation breaks the complete SOC design information at ESL into the platform part, or the generic, reused portion, and the signoff part, or the application-specific, value-added portion. The importance of this breakdown is that it allows us to decompose chip design into two entities that can be separated in space, time, or vendor, and that can practice drastically different design methodologies. A perfect example is the field programmable gate array (FPGA) business model, where FPGA vendors supply platforms designed with an expensive custom design methodology, and system vendors personalize those platforms by supplying applications as programming bits produced with an inexpensive FPGA design methodology. To keep non-recurring engineering (NRE) cost manageable, it is desirable to have ``thick platforms and thin applications''.
[2] ESL platform = behavioral platform + architectural platform + physical platform
The second equation states that a complete platform at ESL needs to provide reuse abstractions at the behavioral level, the architectural level and the physical level. Examples of behavioral platforms are C, Matlab, or various extensions of C/C++. Examples of architectural platforms include processor-centric architectures, heterogeneous multiprocessor architectures, and massively parallel architectures. Examples of physical platforms include placement, routing, power, and communication grid abstractions for ASIC, structured ASIC, and FPGA technologies; such abstraction is particularly relevant for the latter. This of course does not say much more than the platform stack model, except that we emphasize that the three abstractions must be present upfront and simultaneously. Having the abstractions defined upfront makes automation-based synthesis possible. Having the three abstractions present simultaneously makes it possible to incorporate physical effects during synthesis. Also note that one behavioral platform can work with different types of architectural platforms. For example, the C language can work with either a single-processor or a multiprocessor architectural platform. Likewise, an architectural platform can work with different physical platforms. For example, a multiprocessor architecture can work with FPGA, structured ASIC, or ASIC.
[3] behavioral platform | architectural platform | physical platform = configuration language + programming language + verification tool + synthesis tool
The third equation materializes platforms into concrete forms. It first states that each platform exposes to the platform user a configuration language, so as to add flexibility, and a programming language, so as to personalize the platform with the desired function. It then states that each platform encapsulates all the information necessary for final implementation within two tools, one for verification, the other for synthesis. It is precisely this information that should be abstracted away and made implicit in the application development process. Through such abstraction, reuse becomes more powerful than an Intellectual Property (IP) assembly based methodology, since IPs are not visible even though they may be used under the hood.
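
To make the split concrete, consider the simplified sketch below; every name in it is hypothetical. The configuration language is reduced to a handful of C macros that tune the platform, the programming language is plain C, and everything the verification and synthesis tools need stays hidden behind those knobs.

    /* platform_sketch.c -- hypothetical split between a platform's
       configuration language (tunable knobs) and its programming language
       (plain C); all names are invented for illustration. */
    #include <stdio.h>

    /* Configuration: the user tunes knobs without ever seeing the IP blocks,
       verification models, or synthesis scripts behind them. */
    #define NUM_PROCESSORS  2       /* processor cores to instantiate */
    #define NOC_FLAVOR      "mesh"  /* on-chip network flavor         */
    #define SCRATCHPAD_KB   64      /* on-chip scratchpad size        */

    /* Program: the application personalizes the configured platform. */
    int main(void)
    {
        printf("Application built for %d core(s), %s network, %d KB scratchpad\n",
               NUM_PROCESSORS, NOC_FLAVOR, SCRATCHPAD_KB);
        return 0;
    }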
[4] ESL application = behavioral signoff + architectural signoff + physical signoff
The fourth equation states that the application consists of a series of signoffs (sometimes also called handoffs) that are used to program the behavioral, architectural and physical platforms. The word signoff implies both completeness and verifiability. For example, a C program is complete in the semantic domain of a C-based behavioral platform and is verifiable by using a compiler tool chain on the desktop. A binary executable is complete in the semantic domain of an instruction-set based architectural platform and is verifiable by using an instruction set simulator. The programming bits of an FPGA are complete in that they completely define the FPGA function, and are verifiable by downloading them into the corresponding FPGA device.
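
For instance, the toy C program below (a made-up four-tap filter, used purely for illustration) is already a behavioral signoff in this sense: it is complete in the C semantic domain and verifiable with any desktop compiler tool chain, long before an architecture or silicon target is chosen.

    /* fir.c -- a toy behavioral signoff: complete in the C semantic domain
       and verifiable with a desktop compiler (e.g., cc fir.c && ./a.out). */
    #include <stdio.h>

    #define TAPS 4

    /* Four-tap moving-sum filter over one input window. */
    static int fir(const int x[TAPS], const int h[TAPS])
    {
        int acc = 0;
        int i;
        for (i = 0; i < TAPS; i++)
            acc += x[i] * h[i];
        return acc;
    }

    int main(void)
    {
        const int x[TAPS] = { 1, 2, 3, 4 };
        const int h[TAPS] = { 1, 1, 1, 1 };
        printf("fir = %d (expected 10)\n", fir(x, h));
        return 0;
    }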
[5] behavioral signoff | architectural signoff | physical signoff = hardware + software
The fifth equation states that the application signoffs carry both the hardware and the software. Note that this view differs significantly from the layered abstraction model of traditional computer systems, where hardware is abstracted away by the architecture and the operating system, and the application is purely software. Constraining the programmability of platforms to software programmability alone would seriously limit their applicability. This equation hence lifts the value-added hardware, typically application-specific accelerators, into the user space, bypassing the slow operating system/IO interface.
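
One common way such an accelerator can surface in the user space, sketched below under the assumption of a memory-mapped device (the device path and register offsets are hypothetical), is to map its registers directly into the application's address space, so that invoking the hardware becomes an ordinary load/store rather than a trip through the operating system.

    /* accel_user.c -- sketch of user-space access to a memory-mapped
       accelerator; /dev/accel0 and the register map are hypothetical. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define REG_ARG    0    /* word offset of the argument register */
    #define REG_START  1    /* word offset of the start register    */
    #define REG_RESULT 2    /* word offset of the result register   */

    int main(void)
    {
        volatile uint32_t *regs;
        int fd = open("/dev/accel0", O_RDWR | O_SYNC);
        if (fd < 0) { perror("open"); return 1; }

        /* Map one page of accelerator registers into the user address space. */
        regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        regs[REG_ARG]   = 42u;          /* pass an argument      */
        regs[REG_START] = 1u;           /* kick off the hardware */
        while (regs[REG_START] != 0u)   /* poll for completion   */
            ;
        printf("result = %u\n", (unsigned)regs[REG_RESULT]);

        munmap((void *)regs, 4096);
        close(fd);
        return 0;
    }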
[6] hardware = software = program
The last equation breaks the dichotomy of application hardware and software and boldly states that they should make no difference from the design methodology point of view: they should be programmed in the same language, not only in the behavioral signoff but also in the architectural signoff. This view can only be made possible by a capable behavioral synthesis tool.
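
As a small illustration of this equation (the kernel is ours, not drawn from any of our tools), the same C function below can serve as a program in both senses: compiled with an ordinary compiler it becomes application software; handed to a behavioral synthesis tool it can become a custom hardware core with the same observable function.

    /* popcount.c -- one program, two fates: compiled, it is application
       software; behaviorally synthesized, it is a custom hardware core. */
    #include <stdio.h>
    #include <stdint.h>

    /* Bit-count kernel: a typical candidate for either a software routine
       or a small hardware accelerator. */
    static unsigned popcount32(uint32_t x)
    {
        unsigned n = 0;
        while (x) {
            n += x & 1u;
            x >>= 1;
        }
        return n;
    }

    int main(void)
    {
        printf("popcount32(0xF0F0) = %u (expected 8)\n", popcount32(0xF0F0u));
        return 0;
    }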
We now return to the engineering mode and describe the efforts taking place in the Toronto Synthesis Group in pursuing the vision captured in the ring chart.
We build an architectural platform called Metabacus. Metabacus is heterogeneous in nature and contains an extensible processor (OpenRISC, Nios, etc.), generic IO cores, an on-chip network, and custom cores. By specializing the on-chip network into different flavors, we obtain several families of the platform. Although they target mid-end, high-end and massively parallel applications respectively, a common characteristic of Metabacus I, II, and III is direct hardware support for the context flow programming model.
We intend to build several physical platforms to abstract the silicon structures of FPGA, structured ASIC and ASIC technologies. The idea is to extract the placement, routing, power and clock grid information necessary for high-level design exploration. Although nothing has been carved in stone at this point, it is perhaps best to start with the VPR architecture specifications used in FPGA research and generalize from there.
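
The kind of information such a physical platform might expose is sketched below as a C-level abstraction; the structure and field names are ours for illustration and are not drawn from VPR or from any vendor format.

    /* phys_platform.h -- hypothetical abstraction of a physical platform;
       the structure and field names are illustrative only. */

    /* Placement abstraction: a regular grid of sites. */
    struct placement_grid {
        unsigned rows, cols;          /* grid dimensions            */
        double   site_width_um;       /* physical size of one site  */
        double   site_height_um;
    };

    /* Routing abstraction: channel capacities and a wire delay model. */
    struct routing_grid {
        unsigned chan_width_x;        /* tracks per horizontal channel */
        unsigned chan_width_y;        /* tracks per vertical channel   */
        double   wire_delay_ps_per_mm;
    };

    /* Power and clock grid abstraction for early feasibility checks. */
    struct power_clock_grid {
        double vdd_volts;
        double ir_drop_budget_mv;
        double clock_skew_budget_ps;
    };

    /* A physical platform bundles the three views for one target
       technology (FPGA, structured ASIC, or ASIC). */
    struct phys_platform {
        const char              *technology;
        struct placement_grid    place;
        struct routing_grid      route;
        struct power_clock_grid  grid;
    };

    /* Example: a made-up FPGA target described through the abstraction. */
    static const struct phys_platform example_fpga = {
        "generic-fpga",
        { 64, 64, 20.0, 20.0 },
        { 100, 100, 150.0 },
        { 1.2, 50.0, 100.0 }
    };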
Goal
Challenge
Abstraction Wall
Despite constant debates in the community, the exact meaning of ESL and how it can be abstracted have not been agreed upon. To make things worse, the layered abstraction model no longer reconciles with quality of results: low-level timing and power issues may invalidate decisions made at the high level, leading to design convergence problems.
Heterogeneity Wall
The heterogeneity of an SOC architecture, a dichotomy of hardware (processor cores, generic and custom hardware cores) and software (firmware, operating systems, embedded application software), dictates the heterogeneity of design tools, methodologies and teams, which often leads to a reduced explored design space and a prolonged design schedule, and therefore decreased design performance and increased design cost.
Scalability Wall
Conventional algorithms may not keep up with the exponential growth of the information they process. It is not uncommon even today to see tasks that need to solve billions of problem instances, yet our options for improving algorithm capacity seem to have been exhausted.
Wisdom
Clearly, overcoming the first wall must precede any algorithmic solution. For that, we need to consult the prophets.
Y Chart
We first draw wisdom from the Gajski Y-chart (1983). The Y-chart taught us that a design can be orthogonally decomposed into the behavioral domain, which captures the chip function, the structural domain, which captures the chip architecture as a network of computational resources, and the physical domain, which adds the geometric configuration to the architecture. A natural question to ask is: if we are to push the envelope above the register transfer level (RTL), then, other than calling it the electronic system level (ESL), how should the behavioral, structural and physical domains be abstracted?
Platform Stack
We next draw wisdom from Sangiovanni-Vincentelli's platform stack model (2002), which states that chip design is carried out through a stack of application platform, architecture platform and silicon platform, where a platform is an abstraction layer. It is easy to relate the platform stack to the Y-chart by equating the application, architecture and silicon platforms to the behavioral, structural and physical domains, but what the platform stack model taught us is that chip design does not have to follow a top-down approach, as we took for granted from silicon compilation. In other words, the architecture does not have to be derived from the application, and the silicon does not have to be derived from the architecture. Instead, parts of the architecture or silicon can be designed separately, or reused. What is learned from the mistake of intellectual property (IP) based design is that reuse should not just be components stored in a library, which does not reduce integration and verification complexity, but should somehow be abstracted. What remains unanswered, again, is how exactly we can abstract reuse.
Vision
[1] ESL = ESL platform + ESL application
[2] ESL platform = behavioral platform + architectural platform + physical platform
[3] behavioral platform | architectural platform | physical platform = configuration language + programming language + verification tool + synthesis tool
[4] ESL application = behavioral signoff + architectural signoff + physical signoff
[5] behavioral signoff | architectural signoff | physical signoff = hardware + software
[6] hardware = software = program
Ring Chart
We can visualize the above ideas using the new ring chart, which recasts the platform stack model onto the Y-chart framework and enforces several programming abstractions that eliminate the differences between hardware and software.
Action
Platforms
Given the existing investment in education and the amount of legacy code, we envision that the behavioral platform should be C/C++, perhaps enhanced with composability and concurrency. Before the dust of yet another language war settles, we concentrate only on the minimal technical core: we build a language platform based on, and only on, the complete ANSI C language, equipped with a simple component-based programming model, called context flow, which formalizes component interaction in, and only in, the form of interface invocations.
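
A minimal sketch of what a context flow style component might look like in plain C is given below; the interface and component names are invented for illustration, and the real programming model may differ in detail. The point is that components interact only by invoking each other's interfaces.

    /* contextflow.c -- sketch of components that interact only through
       interface invocations; names are illustrative, not the real model. */
    #include <stdio.h>

    /* Interface exposed by a consuming component. */
    struct sink_if {
        void *self;                          /* component instance state */
        void (*put)(void *self, int token);  /* the only way to reach it */
    };

    /* A consumer component: keeps a running sum of the tokens it receives. */
    struct adder { int sum; };

    static void adder_put(void *self, int token)
    {
        struct adder *a = self;
        a->sum += token;
    }

    /* A producer component: it knows nothing about the adder except its
       interface, so all interaction is an interface invocation. */
    static void counter_run(struct sink_if *out, int n)
    {
        int i;
        for (i = 1; i <= n; i++)
            out->put(out->self, i);
    }

    int main(void)
    {
        struct adder a = { 0 };
        struct sink_if a_if = { &a, adder_put };

        counter_run(&a_if, 4);               /* sends 1, 2, 3, 4 */
        printf("sum = %d (expected 10)\n", a.sum);
        return 0;
    }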
Diamond: Scalable Behavioral Synthesis
We deliver a new type of behavioral synthesis tool in the Diamond project, which synthesizes a C program, or the behavioral signoff, into the binary code for application software and hardware, or the architectural signoff. Optionally, Verilog code for the hardware can be generated. In addition to putting behavioral synthesis technology within a complete ESL design setting, Diamond strives to scale the processor architecture by automatic instruction extension, to scale the custom hardware architecture by supporting memory and procedure abstractions, and to scale the scope of applications by employing the scalable program analysis techniques separately pursued in the Quanton project.
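
To give a flavor of automatic instruction extension (the kernel and the fused operation below are hypothetical), the idea is to find a recurring dataflow pattern in a hot loop and collapse it into a single custom instruction of the extensible processor; here the candidate pattern is emulated in C as a fused multiply-accumulate-saturate helper.

    /* instr_ext.c -- illustration of instruction extension: a recurring
       dataflow pattern is collapsed into one custom operation, emulated
       here in plain C. */
    #include <stdio.h>

    /* The fused pattern a synthesis tool might promote to a custom
       instruction: multiply, accumulate, then saturate to 16 bits. */
    static int mac_sat(int acc, int a, int b)
    {
        long t = (long)acc + (long)a * b;
        if (t >  32767) t =  32767;
        if (t < -32768) t = -32768;
        return (int)t;
    }

    int main(void)
    {
        int a[4] = { 100, 200, 300, 400 };
        int b[4] = { 400, 300, 200, 100 };
        int acc = 0;
        int i;

        /* Hot loop: each iteration becomes one extended instruction
           instead of several base-ISA instructions. */
        for (i = 0; i < 4; i++)
            acc = mac_sat(acc, a[i], b[i]);

        printf("acc = %d (expected 32767 after saturation)\n", acc);
        return 0;
    }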
Matrix: System-On-Chip Programming and Verification
We envision that the architectural signoff should be a binary program based on instruction sets. While instruction sets for processors are readily available, the challenge is to extend the same concept to custom hardware. We pursue this effort in the Matrix project, which strives to define a binary abstraction, called hardware carrying code (HCC), used as the architectural signoff. In addition, we produce a verification environment in which both the application software and the hardware can be simulated together with the implicit resources of the platform, such as generic IO devices. This makes full-system simulation possible, in which even the operating system kernel is simulated. To scale simulation performance, we employ trace-based dynamic compilation technology.
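
Although the actual HCC format is defined within the Matrix project, the gist of a ``binary program for custom hardware'' can be sketched as below; every field is hypothetical and purely illustrative. The container carries, side by side, the software image for the processor and an instruction-set-like stream for the custom cores, so that both can be loaded into one full-system simulation.

    /* hcc_sketch.h -- hypothetical container for an architectural signoff;
       none of these fields are drawn from the real HCC definition. */
    #include <stdint.h>
    #include <stddef.h>

    /* A section holds either a processor binary or a hardware description
       expressed as an instruction-set-like stream for a custom core. */
    enum hcc_section_kind {
        HCC_SOFTWARE_IMAGE,    /* e.g., an executable image for the processor */
        HCC_HARDWARE_STREAM    /* e.g., an operation stream for a custom core */
    };

    struct hcc_section {
        enum hcc_section_kind kind;
        uint32_t              core_id;   /* which platform core it targets */
        size_t                size;      /* payload size in bytes          */
        const uint8_t        *payload;
    };

    /* The signoff bundles all sections so that the verification environment
       (full-system simulation) and the downstream synthesis tool can consume
       the same artifact. */
    struct hcc_signoff {
        uint32_t            version;
        size_t              nsections;
        struct hcc_section *sections;
    };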
Grid: Scalable Architecture and RTL Synthesis
We deliver a synthesis tool in the Grid project, which bridges the gap between the architectural signoff and the physical signoff. The tool performs several tasks. First, it generates RTL code for custom cores from HCC. Second, it retrieves the reused IP components and generates the top-level RTL for the personalized platform. Third, it performs physically driven RTL synthesis. The key challenges we address in this project are, first, the scalability problem, where we initiate the effort separately in the FBDD logic synthesis project, and then the timing closure problem, where we pursue a strategy called soft synthesis, which delays scheduling, a crucial step that defines chip timing behavior, until after the completion of global physical planning.
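
The intuition behind soft synthesis can be shown with a toy scheduler; everything below is a simplification invented for illustration, not the Grid algorithm. Because scheduling runs after global physical planning, each operation's start time can honor an interconnect delay estimate taken from the plan instead of assuming zero wire delay.

    /* soft_sched.c -- toy illustration of scheduling after physical planning:
       start times honor wire-delay estimates taken from the global plan. */
    #include <stdio.h>

    #define NOPS 4

    struct op {
        const char *name;
        double logic_delay_ns;        /* delay of the operation itself      */
        int    pred[NOPS];            /* predecessor op indices             */
        int    npred;
        double wire_delay_ns[NOPS];   /* planned interconnect delay from
                                         each predecessor                   */
    };

    int main(void)
    {
        /* A tiny dataflow graph: mul0 and mul1 feed add, which feeds store. */
        struct op ops[NOPS] = {
            { "mul0",  2.0, { 0 },    0, { 0.0 } },
            { "mul1",  2.0, { 0 },    0, { 0.0 } },
            { "add",   1.0, { 0, 1 }, 2, { 0.5, 1.5 } },
            { "store", 1.0, { 2 },    1, { 0.8 } }
        };
        double start[NOPS];
        int i, j;

        for (i = 0; i < NOPS; i++) {  /* ops are listed in topological order */
            start[i] = 0.0;
            for (j = 0; j < ops[i].npred; j++) {
                int p = ops[i].pred[j];
                double ready = start[p] + ops[p].logic_delay_ns
                             + ops[i].wire_delay_ns[j];
                if (ready > start[i])
                    start[i] = ready;
            }
            printf("%-5s starts at %.1f ns\n", ops[i].name, start[i]);
        }
        return 0;
    }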
Web
http://www.eecg.toronto.edu/~jzhu/vision.html