Microgrids

The Microgrid is a future many-core chip architecture which implements the SVP primitives in hardware.

One design goal is to exploit hardware multithreading to increase pipeline utilization and tolerate communication latencies, rather than rely on instruction-level parallelism in single-threaded instruction streams, which requires expensive multiported register files, branch predictors and reorder logic. This is similar in intent to Niagara, yet the proposed core architecture differs significantly. First, the core design focuses on size reduction: an in-order, single-issue RISC pipeline uses dynamic self-scheduling with dataflow state bits on each register. Second, a dedicated hardware Thread Management Unit (TMU) takes ownership of thread management, including dynamic allocation of registers, bulk creation and bulk synchronization. The combination of variably sized register windows with bulk thread management makes it possible to replace tightly dependent computations, typically loops, with families of dependent threads that are interleaved in the pipeline and use only a few instructions and registers each.

[Figure: Core micro-architecture]

Threads are provided by the scheduler to the fetch unit using a FIFO queue. Asynchronous operations write a “waiting” state to the output register and allow the thread to continue execution. Only instructions with “waiting” operands cause suspension. The instruction streams annotate potentially suspending instructions so that the fetch unit switches early with no overhead. The TMU is controlled by instructions from the pipeline and active messages from the NoC.
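The register-state mechanism can be illustrated with a toy software model. This is only a sketch under stated assumptions: the names `Core`, `Register`, `issue_async`, `complete_async` and `try_read` are hypothetical and do not come from the Microgrid design, and the model parks at most one suspended thread per register.

```python
from collections import deque
from enum import Enum


class State(Enum):
    FULL = "full"        # register holds a usable value
    WAITING = "waiting"  # an asynchronous operation will fill it later


class Register:
    def __init__(self, value=0):
        self.state = State.FULL
        self.value = value
        self.suspended = None  # thread parked on this register, if any


class Core:
    """Hypothetical model: one FIFO of ready threads, one register file."""

    def __init__(self, num_regs):
        self.regs = [Register() for _ in range(num_regs)]
        self.ready = deque()  # FIFO between scheduler and fetch unit
        self.trace = []       # (thread, event) pairs, for inspection only

    def issue_async(self, dest):
        # Mark the output register as waiting; the issuing thread continues.
        self.regs[dest].state = State.WAITING

    def complete_async(self, dest, value):
        # The result arrives: fill the register and wake any parked thread.
        reg = self.regs[dest]
        reg.state, reg.value = State.FULL, value
        if reg.suspended is not None:
            self.ready.append(reg.suspended)
            reg.suspended = None

    def try_read(self, thread, src):
        # Only a read of a waiting operand suspends the thread.
        reg = self.regs[src]
        if reg.state is State.WAITING:
            reg.suspended = thread
            self.trace.append((thread, "suspend"))
            return None
        self.trace.append((thread, "read"))
        return reg.value


def demo():
    core = Core(num_regs=4)
    core.ready.extend(["t0", "t1"])
    t0 = core.ready.popleft()
    core.issue_async(dest=1)       # t0 starts a long-latency load into r1
    core.try_read(t0, src=1)       # r1 is waiting, so t0 suspends
    t1 = core.ready.popleft()
    core.try_read(t1, src=0)       # r0 is full, so t1 proceeds
    core.complete_async(dest=1, value=42)
    t0 = core.ready.popleft()      # t0 is back in the FIFO
    return core.try_read(t0, src=1), core.trace
```

In `demo()`, thread `t0` is suspended only when it reads the register it is still waiting for, and `t1` runs in the meantime. This is the latency-tolerance behaviour the text describes, reduced to its simplest form.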

Another design goal is to unify the concurrency management protocol within and across cores. The same bulk creation request can allocate resources and dispatch work on an arbitrary number of adjacent cores from a “master” identified by its address on chip. Bulk synchronization is likewise resolved across cores upon a single request to the “master” core. A special core addressing scheme based on a space-filling curve allows a program to specify clusters of cores of arbitrary size while preserving cache locality at every scale. The memory system is a cache network that preserves sequential consistency within threads but provides only bulk consistency at synchronization points across threads, enabling scalability to hundreds of cores.
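As a rough illustration of bulk creation and bulk synchronization across adjacent cores, the sketch below splits a family of threads over a cluster and tracks its termination. The function and class names are hypothetical, and the even, contiguous split is an assumption made for the example; the text above does not specify the actual distribution policy.

```python
def distribute_family(master, num_cores, num_threads):
    """Split thread indices 0..num_threads-1 into contiguous blocks
    over the cores master, master+1, ..., master+num_cores-1.

    Assumed policy: blocks differ in size by at most one thread.
    """
    base, extra = divmod(num_threads, num_cores)
    plan = {}
    start = 0
    for i in range(num_cores):
        count = base + (1 if i < extra else 0)
        plan[master + i] = range(start, start + count)
        start += count
    return plan


class FamilySync:
    """Hypothetical bulk synchronizer held by the master core."""

    def __init__(self, plan):
        # Only cores that actually received threads must report back.
        self.pending = {core for core, idx in plan.items() if len(idx) > 0}

    def core_done(self, core):
        self.pending.discard(core)
        return not self.pending  # True once the whole family has terminated


plan = distribute_family(master=8, num_cores=4, num_threads=10)
sync = FamilySync(plan)
```

A single request (`distribute_family`) covers the whole cluster, and a single object on the master answers the question "has the family finished?", mirroring the one-request model described above.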

[Figure: 32-core Microgrid tile]

The linear bulk creation and synchronization network follows a space-filling curve to maximize locality, as does the ring network between L2 caches. A mesh supports cross-chip work distribution. This 32-core tile can be assembled into chips of hundreds of cores.
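To see why a space-filling curve helps locality, the sketch below maps a linear core address to a position on a square grid. The page does not name the curve or the exact tile geometry, so a Hilbert curve on a power-of-two square grid is assumed here purely for illustration; `hilbert_d2xy` and `cluster_cells` are hypothetical names.

```python
def hilbert_d2xy(side, d):
    """Map linear index d to (x, y) on a side x side grid along a
    Hilbert curve. side must be a power of two."""
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate or flip the quadrant built so far.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def cluster_cells(side, first, size):
    """Grid positions of the cores with addresses first..first+size-1."""
    return [hilbert_d2xy(side, d) for d in range(first, first + size)]


# Example: 64 cores on an 8x8 grid (an assumed geometry).
cells = cluster_cells(8, 0, 64)
```

With this mapping, cores with consecutive addresses are always direct neighbours on the grid, and an aligned group of four consecutive addresses occupies a 2x2 square. A cluster specified as a contiguous address range therefore stays physically compact, which is the property the linear network relies on.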

Contact and community

The Microgrid is a research project at the Computer Systems Architecture group at the University of Amsterdam.

More information

Academic publications (complete list on the Research page):

  • Jian Fu, Qiang Yang, Raphael Poss, Chris Jesshope, and Chunyuan Zhang. On-demand thread-level fault detection in a concurrent programming environment. In Proc. Intl. Conf. on Embedded Computer Systems: Architectures, MOdeling and Simulation (SAMOS). IEEE, Samos, Greece, July 2013.
  • Raphael Poss, Mike Lankamp, Qiang Yang, Jian Fu, Michiel W. van Tol, Irfan Uddin, and Chris Jesshope. Apple-CORE: harnessing general-purpose many-cores with hardware concurrency management. Microprocessors and Microsystems, June 2013. ISSN 0141-9331.
  • Raphael Poss, Mike Lankamp, Qiang Yang, Jian Fu, Michiel W. van Tol, and Chris Jesshope. Apple-CORE: Microgrids of SVP cores (invited paper). In Smail Niar, editor, Proc. 15th Euromicro Conference on Digital System Design (DSD 2012). IEEE Computer Society, September 2012. ISBN 978-0-7695-4798-5.
  • Chris Jesshope, Michael Hicks, Mike Lankamp, Raphael Poss, and Li Zhang. Making multi-cores mainstream – from security to scalability. In Advances in Parallel Computing, volume 18. IOS Press, 2010. ISBN 978-1-60750-529-7.
  • Michael A. Hicks, Michiel W. van Tol, and Chris R. Jesshope. Towards Scalable I/O on a Many-core Architecture. In International Conference on Embedded Computer Systems: Architectures, MOdeling and Simulation (SAMOS), pages 341–348. IEEE, July 2010. ISBN 978-1-4244-7937-5.
  • K. Bousias, L. Guang, C.R. Jesshope, and M. Lankamp. Implementation and Evaluation of a Microthread Architecture. Journal of Systems Architecture, 55(3):149–161, 2009.
  • Chris Jesshope, Mike Lankamp, and Li Zhang. Evaluating CMPs and their memory architecture. In Architecture of Computing Systems – ARCS 2009, volume 5455/2009 of Lecture Notes in Computer Science, pages 246–257. Springer Berlin / Heidelberg, 2009. ISBN 978-3-642-00453-7. ISSN 0302-9743 (Print) 1611-3349 (Online).
  • Chris Jesshope, Mike Lankamp, and Li Zhang. The Implementation of an SVP Many-core Processor and the Evaluation of its Memory Architecture. ACM SIGARCH Computer Architecture News, 37(2):38–45, 2009. ISSN 0163-5964.
  • C.R. Jesshope. A model for the design and programming of multi-cores. Advances in Parallel Computing, 16 (High Performance Computing and Grids in Action):37–55, 2008. ISBN 978-1-58603-839-7.
  • A. Bolychevsky, C.R. Jesshope, and V.B. Muchnick. Dynamic scheduling in RISC architectures. IEE Trans. E, Computers and Digital Techniques, 143:309–317, 1996.