The Microgrid is a proposed many-core chip architecture that implements the SVP primitives in hardware.
One design goal is to exploit hardware multithreading to increase pipeline utilization and tolerate communication latencies, rather than to extract instruction-level parallelism from single-threaded instruction streams, which requires expensive multiported register files, branch predictors and reordering logic. This is similar in intent to Sun's Niagara, but the proposed core architecture differs significantly. First, the core design focuses on size reduction: an in-order, single-issue RISC pipeline uses dynamic self-scheduling, with dataflow state bits on each register. Second, a dedicated hardware Thread Management Unit (TMU) takes ownership of thread management, including dynamic allocation of registers, bulk creation and bulk synchronization. The combination of variably sized register windows with bulk thread management makes it possible to replace tightly dependent computations, typically loops, with families of dependent threads that are interleaved in the pipeline and use only a few instructions and registers each.
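To make the loop-to-family idea concrete, the following sketch replaces the dependent loop `for i in range(n): acc += a[i] * b[i]` with a family of `n` threads. It is a software model under stated assumptions, not Microgrid code: Python generators stand in for hardware threads, a `yield` marks a point where the pipeline may switch to another thread, and the hypothetical `shared` list stands in for the registers through which each thread hands its result to its successor. The names `family_thread` and `create_and_sync` are illustrative only.

```python
from collections import deque


def family_thread(i, a, b, shared):
    """One thread of the family, standing in for loop iteration i.
    shared[i] is written by the predecessor, shared[i + 1] by this thread
    (a hypothetical model of registers shared between adjacent threads)."""
    local = a[i] * b[i]        # independent work: no dependency on other threads
    yield                      # let another thread use the pipeline
    while shared[i] is None:   # dependency on the predecessor's result
        yield                  # (the hardware would suspend on a state bit instead of polling)
    shared[i + 1] = shared[i] + local


def create_and_sync(n, a, b, initial=0):
    """Bulk-create n threads, interleave them round-robin, and return the
    last shared value once every thread has terminated (bulk synchronization)."""
    shared = [initial] + [None] * n
    ready = deque(family_thread(i, a, b, shared) for i in range(n))
    while ready:
        thread = ready.popleft()
        try:
            next(thread)
            ready.append(thread)
        except StopIteration:
            pass               # thread finished; its slot is released
    return shared[n]


print(create_and_sync(4, [1, 2, 3, 4], [5, 6, 7, 8]))  # 70
```

All independent multiplications are interleaved first, and only the short chain of additions is serialized, which is the kind of overlap the family structure is meant to expose.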
Core micro-architecture
Threads are provided by the scheduler to the fetch unit through a FIFO queue. Asynchronous operations write a “waiting” state to their output register and let the thread continue execution; only instructions with “waiting” operands cause suspension. Potentially suspending instructions are annotated in the instruction stream, so that the fetch unit can switch to another thread early, with no overhead. The TMU is controlled by instructions from the pipeline and by active messages from the NoC (network on chip).
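The register-level mechanism above can be illustrated with a small cycle-by-cycle simulation. It is a sketch under simplifying assumptions, not the Microgrid pipeline: the hypothetical `Core` class issues one instruction per cycle, switches threads after every instruction rather than only at annotated ones, and models just two made-up operations, an asynchronous `load` with a fixed latency and a register-to-register `add`.

```python
from collections import deque

WAITING = object()  # sentinel: the register's value has not arrived yet


class Core:
    """Toy single-issue core: one instruction per cycle, threads taken from a
    FIFO queue, and a per-register "waiting" state instead of pipeline stalls."""

    def __init__(self, nregs):
        self.regs = [0] * nregs
        self.waiters = {}      # register index -> threads suspended on it
        self.pending = []      # (completion cycle, register, value)
        self.active = deque()  # FIFO queue of runnable threads
        self.cycle = 0

    def spawn(self, program):
        self.active.append({"pc": 0, "prog": program})

    def step(self):
        self.cycle += 1
        # Complete asynchronous operations: fill the register, wake its waiters.
        done = [p for p in self.pending if p[0] <= self.cycle]
        for item in done:
            self.pending.remove(item)
            _, reg, value = item
            self.regs[reg] = value
            self.active.extend(self.waiters.pop(reg, []))
        if not self.active:
            return
        thread = self.active.popleft()
        op, dst, *args = thread["prog"][thread["pc"]]
        if op == "load":  # asynchronous, long-latency operation
            value, latency = args
            self.regs[dst] = WAITING  # mark the output register; the thread goes on
            self.pending.append((self.cycle + latency, dst, value))
        elif op == "add":
            blocked = [r for r in args if self.regs[r] is WAITING]
            if blocked:  # only a "waiting" operand suspends the thread
                self.waiters.setdefault(blocked[0], []).append(thread)
                return   # pc unchanged: the add is reissued after wake-up
            self.regs[dst] = self.regs[args[0]] + self.regs[args[1]]
        thread["pc"] += 1
        if thread["pc"] < len(thread["prog"]):
            self.active.append(thread)  # back of the FIFO: switch threads

    def run(self):
        while self.active or self.pending:
            self.step()
        return self.cycle


core = Core(nregs=8)
core.spawn([("load", 1, 40, 5),   # r1 <- 40, available 5 cycles later
            ("add", 2, 0, 0),     # independent of r1: issues immediately
            ("add", 3, 1, 2)])    # reads r1: suspends until it is filled
core.spawn([("add", 4, 0, 0), ("add", 5, 4, 4)])
print(core.run(), core.regs[3])   # 6 40
```

While the load is outstanding, the first thread keeps issuing its independent `add` and the second thread fills the remaining cycles; the first thread is suspended only when it actually reads `r1`, and it is woken when that register is written.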
Another design goal is to unify the concurrency management protocol within and across cores. The same bulk creation request can allocate resources and dispatch work on an arbitrary number of adjacent cores from a “master” core identified by its address on chip. Bulk synchronization is likewise resolved across cores with a single request to the “master” core. A special core addressing scheme based on a space-filling curve allows a program to specify clusters of cores of arbitrary size while preserving cache locality at every scale. The memory system is a cache network that preserves sequential consistency within threads but provides only bulk consistency at synchronization points across threads, enabling scalability to hundreds of cores.
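The text does not say which space-filling curve the addressing scheme uses. Assuming a Hilbert curve purely for illustration, the sketch below numbers the cores of a hypothetical 8 × 8 grid by their position along the curve and treats a cluster as a contiguous address range `[master, master + size)`. Two properties behind the locality claim can then be checked: consecutive addresses are always neighbouring cores, and a cluster whose size is a power of four, starting at a multiple of that size, occupies a compact square. The helpers `cluster` and `bounding_box` are hypothetical.

```python
def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n-by-n grid
    (n a power of two), using the classic iterative formulation."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:            # rotate/reflect the sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def cluster(master, size, n):
    """Grid positions of the cores with curve addresses [master, master + size)."""
    return [hilbert_d2xy(n, d) for d in range(master, master + size)]


def bounding_box(cells):
    """Width and height of the smallest rectangle containing all cells."""
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


n = 8                              # illustrative 8 x 8 grid of 64 cores
cells = cluster(0, n * n, n)
# Consecutive addresses are always physical neighbours (Manhattan distance 1).
assert all(abs(x1 - x2) + abs(y1 - y2) == 1
           for (x1, y1), (x2, y2) in zip(cells, cells[1:]))
print(bounding_box(cluster(16, 16, n)))  # (4, 4)
print(bounding_box(cluster(4, 4, n)))    # (2, 2)
```

The same numbering works at every scale: a 4-core, 16-core or 64-core cluster is still one contiguous range of addresses, which is why a single master address plus a size can describe it.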
32-core Microgrid tile
The linear bulk creation and synchronization network follows a space-filling curve to maximize locality, as does the ring network between L2 caches. A mesh supports cross-chip work distribution. This 32-core tile can be assembled into chips of hundreds of cores.
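A minimal model of the linear create/sync network follows. The text gives no protocol details, so everything here is an assumption: a create request enters the master core, is forwarded core by core along the curve, and each core takes a contiguous block of the family's index range; synchronization then returns to the master as one token per link. The names `split_range` and `bulk_create_and_sync` and the even-split policy are hypothetical. The sketch shows that one request and one completion cost 2(k - 1) link hops on k cores, independent of how many threads are created.

```python
def split_range(start, limit, ncores):
    """Divide [start, limit) into ncores contiguous blocks whose sizes differ
    by at most one (an assumed distribution policy)."""
    base, extra = divmod(limit - start, ncores)
    blocks, lo = [], start
    for c in range(ncores):
        hi = lo + base + (1 if c < extra else 0)
        blocks.append((lo, hi))
        lo = hi
    return blocks


def bulk_create_and_sync(ncores, start, limit, body):
    """Simulate one create request travelling down a linear chain of cores,
    followed by the synchronization token travelling back to the master."""
    blocks = split_range(start, limit, ncores)
    hops = 0
    # Create phase: core 0 is the master; each core runs its block's
    # threads (here: calls body) and forwards the request downstream.
    for core, (lo, hi) in enumerate(blocks):
        for i in range(lo, hi):
            body(core, i)
        if core + 1 < ncores:
            hops += 1
    # Sync phase: once its own threads have terminated, each core forwards a
    # single "done" token upstream; the master completes when it arrives.
    hops += ncores - 1
    return blocks, hops


ran = []
blocks, hops = bulk_create_and_sync(4, 0, 10, lambda core, i: ran.append((core, i)))
print(blocks)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
print(hops)    # 6
```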
The Microgrid is a research project at the Computer Systems Architecture group at the University of Amsterdam.
Mailing lists:
Presentations:
Technical documentation:
Academic publications (complete list on the Research page):