Writing high-performance code for a wide range of hardware is challenging. Typically, software is targeted at particular hardware or optimized for a specific set of parameters. This article describes a mechanism that places a set of functions, each providing the same functionality for different hardware or input types, behind a single API and dispatches each call to the most appropriate backend.
Sourcery VSIPL++ can be configured to target a wide range of backend implementations for most of its functions. It achieves portability by hiding these backends behind common interfaces, yet strives to minimize calling overhead by doing as much as possible at compile-time.
When the user performs a particular operation (e.g., adding two vectors), the library must select an appropriate implementation. For example, if the vectors hold single-precision floating-point values, a special SIMD routine might be used to perform the addition efficiently. Or, if the vectors are distributed across processors, multi-processor communication might be required.
When determining how to implement a given operation, Sourcery VSIPL++ performs a two-step process. One step is performed at compile-time; the other at run-time. Conceptually, the process is as follows:
1. Sourcery VSIPL++ forms a list of all possible implementations of the operation.
2. At compile-time, those implementations which do not accept arguments of appropriate types, or which are otherwise inappropriate for reasons which can be determined statically, are eliminated.
3. At run-time, each implementation not yet eliminated at compile-time is queried to see whether it can perform the operation. The first implementation that is able to perform the operation is used.
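The selection process above can be sketched in a few lines of C++17. This is a minimal illustration, not the library's actual code: the backend names (SimdBackend, GenericBackend), the BackendList type list, and the members compile_time_ok, can_run and run are all hypothetical, and the "multiple of 4" length restriction is invented to give the run-time check something to reject.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical backend: only handles float, and only some lengths.
struct SimdBackend {
  template <typename T>
  static constexpr bool compile_time_ok = std::is_same_v<T, float>;

  template <typename T>
  static bool can_run(const std::vector<T>& a, const std::vector<T>& b) {
    // Invented restriction: the kernel wants lengths that are a multiple of 4.
    return a.size() == b.size() && a.size() % 4 == 0;
  }

  template <typename T>
  static std::string run(const std::vector<T>& a, const std::vector<T>& b,
                         std::vector<T>& out) {
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];  // stand-in for a SIMD kernel
    return "simd";
  }
};

// Hypothetical fallback backend: accepts any element type.
struct GenericBackend {
  template <typename T>
  static constexpr bool compile_time_ok = true;

  template <typename T>
  static bool can_run(const std::vector<T>& a, const std::vector<T>& b) {
    return a.size() == b.size();
  }

  template <typename T>
  static std::string run(const std::vector<T>& a, const std::vector<T>& b,
                         std::vector<T>& out) {
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
    return "generic";
  }
};

// Step 1: the list of all candidate implementations, in priority order.
template <typename... Backends>
struct BackendList {};

// List exhausted: no backend could perform the operation.
template <typename T>
std::string dispatch(BackendList<>, const std::vector<T>&,
                     const std::vector<T>&, std::vector<T>&) {
  return "none";
}

template <typename T, typename First, typename... Rest>
std::string dispatch(BackendList<First, Rest...>, const std::vector<T>& a,
                     const std::vector<T>& b, std::vector<T>& out) {
  // Step 2: statically unsuitable backends are discarded by the compiler.
  if constexpr (First::template compile_time_ok<T>) {
    // Step 3: the first remaining backend that accepts the arguments wins.
    if (First::can_run(a, b)) return First::run(a, b, out);
  }
  return dispatch(BackendList<Rest...>{}, a, b, out);
}

// Returns the name of the backend that was used, for demonstration only.
template <typename T>
std::string add(const std::vector<T>& a, const std::vector<T>& b,
                std::vector<T>& out) {
  return dispatch(BackendList<SimdBackend, GenericBackend>{}, a, b, out);
}
```

In this sketch, calling add on two std::vector<double> arguments never instantiates the SIMD backend's can_run or run, because the if constexpr branch is discarded at compile-time. For float vectors, both backends survive compilation and the choice between them is made at run-time.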
Each implementation is provided as a (possibly partial) specialization
of the Evaluator class template. The library checks the
ct_valid static data member to determine compile-time
suitability and calls the rt_valid() static member
function at run-time to determine run-time suitability.
The actual implementation of the operation is performed by the
exec() static member function.
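As a rough illustration of this interface, the C++17 sketch below defines an Evaluator class template with one partial and one full specialization. Only the names Evaluator, ct_valid, rt_valid() and exec() come from the text above. The template parameter list (operation tag, backend tag, element type), the tags AddOp, SimdTag and GenericTag, and the helper try_backend are assumptions made for the example and may not match the library's real declarations.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tags naming one operation and two backends.
struct AddOp {};
struct SimdTag {};
struct GenericTag {};

// Primary template: by default, a backend does not implement an operation.
// The parameter list is an assumption made for this sketch.
template <typename Op, typename Backend, typename T>
struct Evaluator {
  static constexpr bool ct_valid = false;
};

// Partial specialization: a plain loop that works for any element type T.
template <typename T>
struct Evaluator<AddOp, GenericTag, T> {
  static constexpr bool ct_valid = true;

  static bool rt_valid(const std::vector<T>& a, const std::vector<T>& b) {
    return a.size() == b.size();
  }

  static void exec(const std::vector<T>& a, const std::vector<T>& b,
                   std::vector<T>& out) {
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
  }
};

// Full specialization: a float-only backend with an invented length limit.
template <>
struct Evaluator<AddOp, SimdTag, float> {
  static constexpr bool ct_valid = true;

  static bool rt_valid(const std::vector<float>& a,
                       const std::vector<float>& b) {
    return a.size() == b.size() && a.size() % 4 == 0;
  }

  static void exec(const std::vector<float>& a, const std::vector<float>& b,
                   std::vector<float>& out) {
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];  // stand-in for a SIMD kernel
  }
};

// Compile-time suitability can be inspected without running anything.
static_assert(Evaluator<AddOp, SimdTag, float>::ct_valid, "float is supported");
static_assert(!Evaluator<AddOp, SimdTag, double>::ct_valid, "double is not");

// Hypothetical helper: try a single backend, honoring both checks.
template <typename Backend, typename T>
bool try_backend(const std::vector<T>& a, const std::vector<T>& b,
                 std::vector<T>& out) {
  using E = Evaluator<AddOp, Backend, T>;
  if constexpr (E::ct_valid) {       // checked by the compiler
    if (E::rt_valid(a, b)) {         // checked when the call happens
      E::exec(a, b, out);
      return true;
    }
  }
  return false;
}
```

Because the primary template sets ct_valid to false, a backend opts in to an operation simply by providing a specialization. A combination that has no specialization is rejected at compile-time without any rt_valid() or exec() ever being required to exist for it.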