OmpSs, developed by the Barcelona Supercomputing Center and based on OpenMP, uses a data-flow programming model to ease the porting of applications to heterogeneous architectures. It exploits task-level parallelism and supports asynchronous execution, heterogeneity, and data movement.
To use OmpSs, parts of an application must be taskified. This is done by annotating the selected code with OpenMP-like pragmas that indicate the data read (input) and/or written (output) by each task. Additionally, the user can specify one or more hardware devices on which a given task should be executed, and whether data needs to be copied to or from those devices. Several versions of a task can exist to target different architectures.
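As an illustration of taskification, the following minimal sketch annotates two C functions as tasks. The function names (scale, accumulate, scaled_sum) are hypothetical and not taken from any DEEP application, and the clause spelling (in, out, inout, the [start;length] array sections, target device(smp) copy_deps) follows the OmpSs syntax as we understand it and should be checked against the OmpSs specification. A compiler without OmpSs support typically ignores the pragmas, so the same code then simply runs sequentially.

```c
#include <stddef.h>

/* Task 1: reads a[0..n-1] and writes b[0..n-1].
 * The target pragma names the device (here the host, "smp") and asks the
 * runtime to copy the data named in the dependency clauses (assumed syntax). */
#pragma omp target device(smp) copy_deps
#pragma omp task in(a[0;n]) out(b[0;n])
void scale(const double *a, double *b, size_t n, double factor)
{
    for (size_t i = 0; i < n; ++i)
        b[i] = factor * a[i];
}

/* Task 2: reads b[0..n-1] and updates *sum (both read and written). */
#pragma omp task in(b[0;n]) inout(*sum)
void accumulate(const double *b, double *sum, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        *sum += b[i];
}

/* Hypothetical driver: each call to an annotated function creates a task. */
double scaled_sum(const double *a, double *tmp, size_t n, double factor)
{
    double sum = 0.0;

    scale(a, tmp, n, factor);   /* task 1 produces tmp */
    accumulate(tmp, &sum, n);   /* task 2 consumes tmp, so it runs after task 1 */

    /* Wait for all tasks created so far before reading sum. */
    #pragma omp taskwait
    return sum;
}
```

Because accumulate declares tmp as input and scale declares it as output, the ordering between the two tasks follows from the annotations alone; the driver contains no explicit synchronisation apart from the final taskwait.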
OmpSs annotations are interpreted by the Mercurium source-to-source compiler, which supports Fortran, C, and C++. For each call to an annotated function, the compiler generates a call to the Nanos++ runtime system to create a new task. The transformed source is then compiled by a native compiler and linked with Nanos++.
Each time a new task is created, its input and output dependencies are matched against those of the already existing tasks. Taking these dependencies into account, the runtime decides on the execution order of the tasks and whether they may execute concurrently. All this information is used to schedule the tasks on the available devices.
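The matching step can be pictured with the simplified model below. It is only a sketch and not the Nanos++ implementation: the task_t record, the integer IDs standing in for memory regions, and the functions depends_on and match_new_task are all hypothetical. A new task is made to wait for an earlier one if it reads data the earlier task writes, or writes data the earlier task reads or writes.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 4

/* Hypothetical task record: integer IDs stand in for memory regions. */
typedef struct {
    int    in[MAX_DEPS];   /* regions the task reads  */
    size_t n_in;
    int    out[MAX_DEPS];  /* regions the task writes */
    size_t n_out;
} task_t;

static bool contains(const int *ids, size_t n, int id)
{
    for (size_t i = 0; i < n; ++i)
        if (ids[i] == id)
            return true;
    return false;
}

/* Returns true if `later` must wait for `earlier`. */
bool depends_on(const task_t *later, const task_t *earlier)
{
    /* Read after write: later reads what earlier writes. */
    for (size_t i = 0; i < later->n_in; ++i)
        if (contains(earlier->out, earlier->n_out, later->in[i]))
            return true;

    for (size_t i = 0; i < later->n_out; ++i) {
        /* Write after read: later overwrites what earlier reads. */
        if (contains(earlier->in, earlier->n_in, later->out[i]))
            return true;
        /* Write after write: both write the same region. */
        if (contains(earlier->out, earlier->n_out, later->out[i]))
            return true;
    }
    return false;
}

/* Matches tasks[idx] against all tasks created before it.
 * Stores the indices of its predecessors in pred[] and returns their count.
 * A count of zero means the task may run concurrently with the others. */
size_t match_new_task(const task_t *tasks, size_t idx, size_t *pred)
{
    size_t count = 0;
    for (size_t j = 0; j < idx; ++j)
        if (depends_on(&tasks[idx], &tasks[j]))
            pred[count++] = j;
    return count;
}
```

In this model, two tasks that only read the same region have no dependency between them and could be scheduled on different devices at the same time, while a task that consumes both of their outputs would have to wait for both.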
OmpSs in DEEP
In DEEP, the OmpSs programming model will run not only at the node level, but also as an abstraction layer on top of global MPI. Pragmas are provided to make offloading tasks from the Cluster to the Booster more user-friendly. These pragmas hide the coordination and management of two or more sets of parallel MPI processes and transfer the required data between the two sides.
To this end, OmpSs is extended with pragmas that mark the MPI functions to be offloaded. The Mercurium compiler and the Nanos++ runtime cooperate to transparently manage all data transfers between the MPI processes running on the Cluster and those running on the Booster, making use of the functions MPI_Comm_spawn and MPI_Send/MPI_Recv. With the corresponding annotations, the application developer can mark highly scalable code parts to be sent to the Booster, while still allowing MPI operations within these super-tasks.