Hybrid Programming

This page documents the hybrid programming proposal for NekCEM (with some notes taken from the DOE Exascale workshop).

Programming Model Approaches

  • hybrid/evolutionary: MPI+ __ ?
    • MPI for inter-node programming, since the number of nodes and inter-node concerns are not expected to change dramatically (see the MPI+OpenMP sketch after this list)
      • support for hybrid programming/interoperability
      • purer one-sided communication; active messages (see the one-sided sketch after this list)
      • asynchronous collectives (see the nonblocking-collective sketch after this list)
    • something else for intra-node
      1. OpenMP (shared memory, i.e. a single global address space)
        • introduction of locality-oriented concepts?
        • efforts in OpenMP 3.0?
      2. PGAS languages (Partitioned Global Address Space)
        • already support a notion of locality in a shared namespace
        • UPC (Unified Parallel C) and CAF (Co-Array Fortran) would need to relax their strictly SPMD execution model
      3. Sequoia
        • supports a strong notion of vertical locality
  • unified/holistic: __ ?
    • a single notation for inter- and intra-node programming?
    • traditional PGAS languages: UPC, CAF, Titanium
      • require extensions to handle nested parallelism and vertical locality
    • HPCS languages: Chapel, X10, Fortress(?)
      • designed with locality and post-SPMD parallelism in mind
    • other candidates: Charm++, Global Arrays, ParalleX, ...
  • others
    • mainstream multi-core/GPU languages (sufficiently promising to be funded?)
    • domain-specific language
      • do they fit your problem?
      • focus should stay on more general solutions
    • functional languages
      • never heavily adopted in mainstream or HPC
      • copy-on-write optimization and alias analysis?
    • parallel scripting languages?
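
To make the hybrid/evolutionary option concrete, below is a minimal MPI+OpenMP sketch in C: MPI handles the inter-node part while OpenMP threads split the intra-node loop. This is an illustrative sketch only, not NekCEM code; the problem size N and the dot-product kernel are placeholders.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000           /* placeholder per-rank problem size */

    int main(int argc, char **argv)
    {
        /* FUNNELED: only the main thread makes MPI calls. */
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        static double a[N], b[N];
        double local = 0.0, global = 0.0;

        /* Intra-node: OpenMP threads split the loop. */
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < N; i++) {
            a[i] = rank + 1.0;
            b[i] = 2.0;
            local += a[i] * b[i];
        }

        /* Inter-node: MPI combines the per-rank partial results. */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0) printf("global dot product = %f\n", global);
        MPI_Finalize();
        return 0;
    }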
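
The one-sided bullet can be illustrated with MPI-2 RMA (remote memory access), which exists today: one rank writes directly into a window exposed by another rank, with no matching receive on the target. A minimal sketch; the ring-neighbor pattern is just for illustration.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank exposes one double in a window. */
        double buf = -1.0;
        MPI_Win win;
        MPI_Win_create(&buf, sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        double my_val = (double)rank;
        int right = (rank + 1) % size;

        /* One-sided: write my value into my right neighbor's window.
           The target rank posts no receive. */
        MPI_Win_fence(0, win);
        MPI_Put(&my_val, 1, MPI_DOUBLE, right, 0, 1, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);

        printf("rank %d received %f from its left neighbor\n", rank, buf);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }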
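
Nonblocking collectives were still a proposal when these notes were taken; MPI 3.0 later standardized them (e.g. MPI_Iallreduce). A sketch assuming an MPI implementation that provides them:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = rank + 1.0, global = 0.0;

        /* Start the reduction but do not wait for it yet. */
        MPI_Request req;
        MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                       MPI_COMM_WORLD, &req);

        /* ... independent computation can proceed here,
           overlapping with the collective ... */

        /* Block only when the result is actually needed. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);

        if (rank == 0) printf("sum = %f\n", global);
        MPI_Finalize();
        return 0;
    }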


Expectations

  • parallelism: nested, dynamic, loosely coupled, data-driven (i.e. post-SPMD programming/execution models; see the OpenMP task sketch after this list)
    • to take advantage of architecture
    • to better support load balancing and resilience
  • locality: concepts for vertical control as well as horizontal (i.e. locality within a node rather than simply between nodes)
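
As a concrete example of nested, dynamic, post-SPMD parallelism, OpenMP 3.0 tasks create work recursively at run time instead of in fixed SPMD phases. A minimal sketch; the recursive Fibonacci is just a stand-in for irregular, data-driven work:

    #include <omp.h>
    #include <stdio.h>

    /* Each call may spawn new tasks: the parallelism is nested and
       dynamic, not a fixed SPMD structure. */
    static long fib(int n)
    {
        if (n < 2) return n;
        long x, y;
        #pragma omp task shared(x)
        x = fib(n - 1);
        #pragma omp task shared(y)
        y = fib(n - 2);
        #pragma omp taskwait      /* wait for the two child tasks */
        return x + y;
    }

    int main(void)
    {
        long result;
        #pragma omp parallel
        {
            #pragma omp single    /* one thread seeds the task tree */
            result = fib(20);
        }
        printf("fib(20) = %ld\n", result);
        return 0;
    }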

Tools: Debuggers, Performance Analysis

  • challenges
    • need aggregation to hide details
    • need to report info in user's terms
  • good area for innovation (e.g. execution visualization to understand mapping of code to hardware)

Misc Notes

  • three main types of distributed-memory programming:
    • active messages: a lower-level mechanism that can be used to implement data-parallel or message-passing models efficiently
    • data parallel (aka loop-level parallelism): emphasizes the distributed (parallelized) nature of the data, as opposed to the processing (task parallelism)
    • message passing: explicitly matched sends and receives (see the send/receive sketch at the end of this page)
  • multi core and many core processors
    • A multi-core processor is composed of two or more independent cores: an integrated circuit with two or more individual processors (called cores in this sense). Manufacturers typically integrate the cores onto a single integrated-circuit die (known as a chip multiprocessor, or CMP) or onto multiple dies in a single chip package.
    • A many-core processor is one in which the number of cores is large enough that traditional multi-processor techniques are no longer efficient — largely due to issues with congestion supplying sufficient instructions and data to the many processors. This threshold is roughly in the range of several tens of cores and probably requires a network on chip.
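
For contrast with the one-sided sketch earlier on this page, two-sided message passing requires explicitly matched send and receive calls on both ranks. A minimal MPI sketch (run with at least two ranks):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double msg;
        if (rank == 0) {
            /* Two-sided: this send must be matched by a receive. */
            msg = 3.14;
            MPI_Send(&msg, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&msg, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %f\n", msg);
        }

        MPI_Finalize();
        return 0;
    }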