Revision as of 19:52, 8 February 2011
This page documents the hybrid programming proposal for NekCEM (some notes taken from the DOE Exascale workshop).
Programming Model Approaches
- hybrid/evolutionary: MPI + __ ? (see the MPI+OpenMP and one-sided sketches after this list)
  - MPI for inter-node programming, since # nodes and inter-node concerns are not expected to change dramatically
    - support for hybrid programming/interoperability
    - purer one-sided communications; active messages
    - asynchronous collectives
  - something else for intra-node
    - OpenMP (shared memory, aka Global Address Space)
      - introduction of locality-oriented concepts?
      - efforts in OpenMP 3.0?
    - PGAS languages (Partitioned Global Address Space)
      - already support a notion of locality in a shared namespace
      - UPC (Unified Parallel C)/CAF need to relax the strictly SPMD execution model
    - Sequoia
      - supports a strong notion of vertical locality
- unified/holistic: __ ?
  - a single notation for inter- and intra-node programming?
  - traditional PGAS languages: UPC, CAF, Titanium
    - require extensions to handle nested parallelism and vertical locality
  - HPCS languages: Chapel, X10, Fortress(?)
    - designed with locality and post-SPMD parallelism in mind
  - other candidates: Charm++, Global Arrays, ParalleX, ...
- others
  - mainstream multi-core/GPU languages (sufficient promise to be funded?)
  - domain-specific languages
    - fit your problem?
    - should focus on more general solutions
  - functional languages
    - never heavily adopted in mainstream or HPC
    - copy-on-write optimization and alias analysis?
  - parallel scripting languages?
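
As a concrete illustration of the hybrid/evolutionary approach, here is a minimal MPI+OpenMP sketch in C; a generic reduction stands in for real work, and nothing here is NekCEM code. MPI_Init_thread is the interoperability hook mentioned above: the program declares up front how threads and MPI calls will mix, and MPI_THREAD_FUNNELED is the common choice for this pattern (OpenMP threads compute, only the master thread talks to MPI).

 /* Minimal MPI+OpenMP hybrid sketch (illustrative, not NekCEM code).
  * Compile e.g.: mpicc -fopenmp hybrid.c -o hybrid */
 #include <mpi.h>
 #include <omp.h>
 #include <stdio.h>
 
 int main(int argc, char **argv)
 {
     int provided, rank, nranks;
 
     /* Ask MPI for a threading level: here only the master thread
      * makes MPI calls (MPI_THREAD_FUNNELED). */
     MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     MPI_Comm_size(MPI_COMM_WORLD, &nranks);
 
     /* Intra-node, loop-level (data-parallel) work via OpenMP. */
     double local = 0.0;
     #pragma omp parallel for reduction(+:local)
     for (int i = 0; i < 1000000; i++)
         local += 1.0 / (i + 1.0);
 
     /* Inter-node combination via MPI. */
     double global = 0.0;
     MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
 
     if (rank == 0)
         printf("ranks=%d threads=%d sum=%f\n",
                nranks, omp_get_max_threads(), global);
 
     MPI_Finalize();
     return 0;
 }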
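
The "purer one-sided communications" bullet already has a starting point inside MPI itself: MPI-2 RMA lets one rank write into another rank's exposed memory with no matching receive. A minimal fence-synchronized sketch (generic values; run with at least 2 ranks):

 /* Minimal MPI one-sided (RMA) sketch: rank 0 writes directly into
  * rank 1's exposed memory window, with no matching receive.
  * Run with at least 2 ranks. */
 #include <mpi.h>
 #include <stdio.h>
 
 int main(int argc, char **argv)
 {
     int rank;
     double buf = 0.0;          /* memory exposed through the window */
     double val = 3.14;         /* origin buffer; must stay valid
                                 * until the closing fence */
     MPI_Win win;
 
     MPI_Init(&argc, &argv);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 
     MPI_Win_create(&buf, sizeof(double), sizeof(double),
                    MPI_INFO_NULL, MPI_COMM_WORLD, &win);
 
     MPI_Win_fence(0, win);     /* open an access/exposure epoch */
     if (rank == 0)
         /* One-sided put: target rank 1, displacement 0. */
         MPI_Put(&val, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
     MPI_Win_fence(0, win);     /* close the epoch; the put is complete */
 
     if (rank == 1)
         printf("rank 1 received %f without posting a receive\n", buf);
 
     MPI_Win_free(&win);
     MPI_Finalize();
     return 0;
 }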
Expectation
- parallelism: nested, dynamic, loosely coupled, data-driven (i.e., post-SPMD programming/execution models; see the OpenMP task sketch after this list)
  - to take advantage of the architecture
  - to better support load balancing and resilience
- locality: concepts for vertical control as well as horizontal (i.e., locality within a node rather than simply between nodes)
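
On the parallelism expectation: OpenMP 3.0 tasks (the "efforts in OpenMP 3.0" noted earlier) are one existing mechanism for nested, dynamically created parallelism within a node. A minimal sketch with the usual recursive toy example, not NekCEM code, and deliberately without the cutoff a real code would add:

 /* OpenMP 3.0 task sketch: dynamically created, nested parallelism
  * (post-SPMD within a node). Compile e.g.: gcc -fopenmp tasks.c */
 #include <omp.h>
 #include <stdio.h>
 
 /* Naive recursive Fibonacci; each recursive call becomes a task. */
 static long fib(int n)
 {
     long a, b;
     if (n < 2)
         return n;
     #pragma omp task shared(a)
     a = fib(n - 1);
     #pragma omp task shared(b)
     b = fib(n - 2);
     #pragma omp taskwait   /* wait for the two child tasks */
     return a + b;
 }
 
 int main(void)
 {
     long result;
     #pragma omp parallel
     {
         #pragma omp single  /* one thread seeds the task tree */
         result = fib(20);
     }
     printf("fib(20) = %ld\n", result);
     return 0;
 }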
Tools: Debuggers, perf. analysis
- challenges
  - need aggregation to hide details
  - need to report info in the user's terms
- good area for innovation (e.g., execution visualization to understand the mapping of code to hardware)
Misc Notes
- three main types of distributed-memory programming:
  - active messages. Active messages are actually a lower-level mechanism that can be used to implement data parallelism or message passing efficiently.
  - data parallel (aka loop-level parallelism). Data parallelism emphasizes the distributed (parallelized) nature of the data, as opposed to the processing (task parallelism).
  - message passing (see the sketch after these notes)
- multi-core and many-core processors
  - A multi-core processor is composed of two or more independent cores. One can describe it as an integrated circuit with two or more individual processors (called cores in this sense). Manufacturers typically integrate the cores onto a single integrated circuit die (known as a chip multiprocessor, or CMP) or onto multiple dies in a single chip package.
  - A many-core processor is one in which the number of cores is large enough that traditional multiprocessor techniques are no longer efficient, largely due to congestion in supplying sufficient instructions and data to the many processors. This threshold is roughly in the range of several tens of cores and probably requires a network on chip.
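
To make the contrast concrete: the OpenMP loop in the hybrid sketch above is the data-parallel style, where one loop's iterations are divided among workers; message passing instead moves data explicitly between separate address spaces, with both sides participating. A minimal two-rank MPI sketch (generic payload):

 /* Minimal message-passing sketch: explicit send/receive between two
  * address spaces (run with at least 2 MPI ranks). */
 #include <mpi.h>
 #include <stdio.h>
 
 int main(int argc, char **argv)
 {
     int rank;
     double payload;
 
     MPI_Init(&argc, &argv);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 
     if (rank == 0) {
         payload = 2.718;
         /* Sender names the destination rank and a message tag. */
         MPI_Send(&payload, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
     } else if (rank == 1) {
         /* Receiver must post a matching receive: two-sided. */
         MPI_Recv(&payload, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                  MPI_STATUS_IGNORE);
         printf("rank 1 got %f via explicit message passing\n", payload);
     }
 
     MPI_Finalize();
     return 0;
 }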