4.5. Creating custom evaluators

In the previous section we saw how to improve expression evaluation by using the return-value optimization technique to avoid certain temporaries. However, more can often be done to improve performance.

It may, for example, be possible to fuse multiple operations into one. Some platforms provide a fused "multiply-add" instruction, and some algorithms are optimized for combined evaluation, such as an FFT combined with a scalar multiplication.
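To make the benefit of fusion concrete, the following stand-alone sketch contrasts a two-pass evaluation of a*b + c with a single fused pass. It is plain C++ and does not use Sourcery VSIPL++; the function names multiply_then_add() and fused_multiply_add() are made up for this illustration, and whether std::fma maps to a hardware instruction depends on the platform and compiler.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unfused: two passes over the data and one temporary vector.
// (Hypothetical helper, for illustration only.)
std::vector<float> multiply_then_add(std::vector<float> const &a,
                                     std::vector<float> const &b,
                                     std::vector<float> const &c)
{
  assert(a.size() == b.size() && b.size() == c.size());
  std::vector<float> tmp(a.size());
  for (std::size_t i = 0; i != a.size(); ++i)
    tmp[i] = a[i] * b[i];
  std::vector<float> result(a.size());
  for (std::size_t i = 0; i != a.size(); ++i)
    result[i] = tmp[i] + c[i];
  return result;
}

// Fused: a single pass and no temporary. std::fma computes a*b + c
// with one rounding step and may compile to a single instruction.
// (Hypothetical helper, for illustration only.)
std::vector<float> fused_multiply_add(std::vector<float> const &a,
                                      std::vector<float> const &b,
                                      std::vector<float> const &c)
{
  assert(a.size() == b.size() && b.size() == c.size());
  std::vector<float> result(a.size());
  for (std::size_t i = 0; i != a.size(); ++i)
    result[i] = std::fma(a[i], b[i], c[i]);
  return result;
}
```

Both functions compute the same values for inputs that are exactly representable, but the fused version touches each element once and allocates no intermediate storage.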

To be able to take advantage of those opportunities, we need to 'see' the whole expression at once, so we can dispatch the relevant sub-expression to such 'backends'.

For common cases, the library already performs this internally. However, sometimes users have their own optimized code that needs to be hooked into expression evaluation.

In this section, we will develop an expression evaluator that matches the expression interpolate(scale(a, 2.), 32) from the last section.

Assignments are evaluated using the dispatch mechanism described in Chapter 3, “Using the Dispatch Framework”. To provide a custom evaluator for a particular expression assignment, it is thus necessary to specialize an Evaluator, using op::assign<D> as operation tag, and be::user as backend tag.

4.5.1. Specializing an evaluator for a particular expression type

To make Sourcery VSIPL++ use a custom evaluator, we need to specialize the vsip_csl::dispatcher::Evaluator template for the particular expression type we are interested in. Further, we need to model the evaluator concept.

The type of the expression can be discovered using type_name():

std::cout << type_name(interpolate(scale(a, 2.), 32)) << std::endl;

This yields (approximately):

vsip::Vector<float,
  vsip_csl::expr::Unary<example::Interpolator, 
    vsip_csl::expr::Unary<example::Scale, 
      vsip::Dense<1u, float, tuple<0u, 1u, 2u>, vsip::Local_map>,
        false> const>, false> const>

The block type is thus (with some details removed for clarity):

Unary<Interpolator, Unary<Scale, Dense<1>, false> const, false> const

This can be visualized as a nested tree, i.e., it is a Unary whose functor is an Interpolator. Its argument block, in turn, is a Unary whose functor is a Scale, and its argument block is a Dense<1>. This allows us to write a matching evaluator:

namespace vsip_csl
{
namespace dispatcher
{
template <typename ResultBlockType, typename ArgumentBlockType>
struct Evaluator<op::assign<1>, be::user,
  void(ResultBlockType &,
       expr::Unary<Interpolator, expr::Unary<Scale, ArgumentBlockType> const> const &)>
{
  typedef typename ArgumentBlockType::value_type value_type;

  typedef ResultBlockType LHS;
  typedef expr::Unary<Interpolator, expr::Unary<Scale, ArgumentBlockType> const> RHS;

  // Compile-time check: this specialization is always a candidate.
  static bool const ct_valid = true;
  // Runtime check: accept every matching expression.
  static bool rt_valid(LHS &, RHS const &) { return true; }
  // Fused evaluation; the body is developed in Section 4.5.2.
  static void exec(LHS &lhs, RHS const &rhs) {} // TBD
};
} // namespace dispatcher
} // namespace vsip_csl

This evaluator will match the desired expression. As usual, the ct_valid and rt_valid() members can be used to refine the selection process.
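How ct_valid and rt_valid() cooperate in the selection can be sketched with a small stand-alone model. The code below is not the Sourcery VSIPL++ dispatcher: the namespace toy, its simplified Dense, Unary and Evaluator types, and the helper uses_custom_evaluator() are all assumptions made for this illustration. It only mimics the idea that a partial specialization matches one expression shape at compile time and may still decline at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace toy  // hypothetical model, not the library's namespace
{
// Terminal block: owns the data.
struct Dense { std::vector<float> data; };

// Functors carrying their parameters.
struct Scale { float value; };
struct Interpolator { std::size_t new_size; };

// Unary expression node: a functor applied to an argument block.
template <typename Functor, typename Arg>
struct Unary
{
  Functor op;
  Arg const &argument;
  Functor const &operation() const { return op; }
  Arg const &arg() const { return argument; }
};

// Primary template: no custom evaluator for this expression type.
template <typename LHS, typename RHS>
struct Evaluator
{
  static bool const ct_valid = false;
  static bool rt_valid(LHS &, RHS const &) { return false; }
};

// Partial specialization matching Interpolator(Scale(block)) only.
template <typename LHS, typename Arg>
struct Evaluator<LHS, Unary<Interpolator, Unary<Scale, Arg> > >
{
  typedef Unary<Interpolator, Unary<Scale, Arg> > RHS;
  static bool const ct_valid = true;
  // Runtime refinement: decline if the source block is empty.
  static bool rt_valid(LHS &, RHS const &rhs)
  { return !rhs.arg().arg().data.empty(); }
};

// True if the custom evaluator would be chosen for this assignment.
template <typename LHS, typename RHS>
bool uses_custom_evaluator(LHS &lhs, RHS const &rhs)
{
  return Evaluator<LHS, RHS>::ct_valid &&
         Evaluator<LHS, RHS>::rt_valid(lhs, rhs);
}

// Usage: only the nested Interpolator(Scale(...)) shape is accepted.
inline void usage()
{
  Dense block{{1.f, 2.f}};
  Dense out;
  Unary<Scale, Dense> scaled{Scale{2.f}, block};
  Unary<Interpolator, Unary<Scale, Dense> > expr{Interpolator{4}, scaled};
  assert(uses_custom_evaluator(out, expr));     // shape matches
  assert(!uses_custom_evaluator(out, scaled));  // plain Scale: no match
}
} // namespace toy
```

In this model a false ct_valid removes the evaluator from consideration for that type, while rt_valid() lets a matching evaluator reject individual calls, for instance because of sizes or data layout.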

4.5.2. Accessing the expression nodes

The terminals in this expression are the block, the interpolator's target size, as well as the scale value. Once these three are available, the entire expression may be evaluated in a fused scaled_interpolate(), as shown here:

static void exec(LHS &lhs, RHS const &rhs)
{
  // rhs.arg() yields Unary<Scale, ArgumentBlockType>,
  // rhs.arg().arg() thus returns the terminal ArgumentBlockType block...
  ArgumentBlockType const &block = rhs.arg().arg();
  // ...and rhs.arg().operation() the Scale<ArgumentBlockType> functor.
  value_type scale = rhs.arg().operation().value;

  // rhs.operation() yields the Interpolator<Unary<Scale, ...> > functor.
  length_type new_size(rhs.operation().size(1, 0));

  // wrap terminal blocks in views for convenience, and evaluate.
  Vector<value_type, LHS> result(lhs);
  const_Vector<value_type, ArgumentBlockType const> argument(block);
  scaled_interpolate(result, argument, scale, new_size);
}
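The implementation of scaled_interpolate() is not shown here. As a rough idea of what such a fused kernel might look like, the following stand-alone sketch applies the scale factor inside the interpolation loop, so no scaled temporary is ever created. It is an assumption throughout: the name scaled_interpolate_sketch(), the use of std::vector instead of VSIPL++ views, and the choice of linear interpolation are illustrative and not taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a fused kernel: resamples `input` to
// `new_size` points by linear interpolation (an assumed method) and
// multiplies by `scale` in the same pass.
std::vector<float> scaled_interpolate_sketch(std::vector<float> const &input,
                                             float scale,
                                             std::size_t new_size)
{
  assert(!input.empty() && new_size > 0);
  std::vector<float> result(new_size);
  if (input.size() == 1 || new_size == 1)
  {
    // Degenerate cases: nothing to interpolate between.
    for (std::size_t i = 0; i != new_size; ++i)
      result[i] = scale * input[0];
    return result;
  }
  // Map output index i onto the input index range [0, input.size() - 1].
  float const step = static_cast<float>(input.size() - 1) /
                     static_cast<float>(new_size - 1);
  for (std::size_t i = 0; i != new_size; ++i)
  {
    float const pos = static_cast<float>(i) * step;
    std::size_t lo = static_cast<std::size_t>(pos);
    // Clamp so that lo + 1 stays a valid index at the right edge.
    if (lo >= input.size() - 1) lo = input.size() - 2;
    float const frac = pos - static_cast<float>(lo);
    // Scale and interpolate in one expression: the fused step.
    result[i] = scale * ((1.f - frac) * input[lo] + frac * input[lo + 1]);
  }
  return result;
}
```

A real backend would typically write into the result view passed to exec() rather than return a new container; the return value is used here only to keep the sketch short.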