Table of Contents
This chapter explains how to use Sourcery VSIPL++ to perform fast convolution (a common signal-processing kernel). First, you will see how to compute fast convolution using VSIPL++'s multiple FFT (Fftm) and vector-matrix multiply operations. Then, you will learn how to optimize the performance of the implementation.
Fast convolution is the technique of performing convolution in the frequency domain. In particular, the time-domain convolution f * g can be computed as F . G, where F and G are the frequency-domain representations of the signals f and g. A time-domain signal consisting of n samples can be converted to a frequency-domain signal in O(n log n) operations by using a Fast Fourier Transform (FFT). Substantially fewer operations are required to perform the frequency-domain operation F . G than are required to perform the time-domain operation f * g. Therefore, performing convolutions in the frequency domain can be substantially faster than performing the equivalent computations in the time domain, even taking into account the cost of converting from the time domain to the frequency domain.
One practical use of fast convolution is to perform the pulse compression step in radar signal processing. To increase the effective bandwidth of a system, radars will transmit a frequency modulated "chirp". By convolving the received signal with the time-inverse of the chirp (called the "replica"), the total energy returned from an object can be collapsed into a single range cell. Fast convolution is also useful in many other contexts including sonar processing and software radio.
In this section, you will construct a program that performs fast convolution on a set of time-domain signals stored in a matrix. Each row of the matrix corresponds to a single signal, or "pulse". The columns correspond to points in time. So, the entry at position (i, j) in the matrix indicates the amplitude and phase of the signal received at time j for the ith pulse.
The first step is to declare the data matrix, the vector that will contain the replica signal, and a temporary matrix that will hold the results of the computation:
// Parameters. length_type npulse = 64; // number of pulses length_type nrange = 256; // number of range cells // Views. typedef complex<float> value_type; Vector<value_type> replica(nrange); Matrix<value_type> data(npulse, nrange); Matrix<value_type> tmp (npulse, nrange);
For now, it is most convenient to initialize the input data to zero. (In Section 7.3, “Performing I/O with User-Specified Storage”, you will learn how to perform I/O operations so that you can populate the matrix with real data.)
In C++, you can use the constructor syntax T() to
perform "default initialization" of a type
T(). The default value for any numeric type
(including complex numbers) is zero. Therefore, the expression
value_type() indicates the complex number with zero as
both its real and imaginary components. In the VSIPL++ API, when
you assign a scalar value to a view (a vector, matrix, or tensor),
all elements of the view are assigned the scalar value. So, the
code below sets the contents of both the data matrix and replica
vector to zero:
data = value_type(); replica = value_type();
The next step is to define the FFTs that will be performed. Typically (as in this example) an application performs multiple FFTs on inputs with the same size. Since performing an FFT requires that some set-up be performed before performing the actual FFT computation, it is more efficient to set up the FFT just once. Therefore, in the VSIPL++ API, FFTs are objects, rather than operators. Constructing the FFT performs the necessary set-up operations.
Because VSIPL++ supports a variety of different kinds of FFT, FFTs are themselves template classes. The parameters to the template allow you to indicate whether to perform a forward (time-domain to frequency-domain) or inverse (frequency-domain to time-domain) FFT, the type of the input and output data (i.e., whether complex or real data is in use), and so forth. Then, when constructing the FFT objects, you indicate the size of the FFT. In this case, you will need both an ordinary FFT (to convert the replica data from the time domain to the frequency domain) and a "multiple FFT" to perform the FFTs on the rows of the matrix. (A multiple FFT performs the same FFT on each row or column of a matrix.) So, the FFTs required are:
// A forward Fft for computing the frequency-domain version of // the replica. typedef Fft<const_Vector, value_type, value_type, fft_fwd, by_reference> for_fft_type; for_fft_type for_fft (Domain<1>(nrange), 1.0); // A forward Fftm for converting the time-domain data matrix to the // frequency domain. typedef Fftm<value_type, value_type, row, fft_fwd, by_reference> for_fftm_type; for_fftm_type for_fftm(Domain<2>(npulse, nrange), 1.0); // An inverse Fftm for converting the frequency-domain data back to // the time-domain. typedef Fftm<value_type, value_type, row, fft_inv, by_reference> inv_fftm_type; inv_fftm_type inv_fftm(Domain<2>(npulse, nrange), 1.0/(nrange));
Before performing the actual convolution, you must convert the replica to the frequency domain using the FFT created above. Because the replica data is a property of the chirp, we only need to do this once; even if the radar system runs for a long time, the converted replica will always be the same. VSIPL++ FFT objects behave like functions, so you can just "call" the FFT object:
for_fft(replica);
Now, you are ready to perform the actual fast convolution
operation! You will use the forward and inverse multiple-FFT
objects you've already created to go into and out of the frequency
domain. While in the frequency domain, you will use the
vmmul operator to perform a
vector-matrix multiply. This operator multiplies each row
(dimension zero) of the frequency-domain matrix by the replica.
The vmmul operator is a template taking a
single parameter which indicates whether the multiplication should
be performed on rows or on columns. So, the heart of the fast
convolution algorithm is just:
// Convert to the frequency domain. for_fftm(data, tmp); // Perform element-wise multiply for each pulse. tmp = vmmul<0>(replica, tmp); // Convert back to the time domain. inv_fftm(tmp, data);
A complete program listing is show below. You can copy this program directly into your editor and build it. (You may notice that there are a few things in the complete listing not discussed above, including in particular, initialization of the library.)
/***********************************************************************
Included Files
***********************************************************************/
#include <vsip/initfin.hpp>
#include <vsip/support.hpp>
#include <vsip/signal.hpp>
#include <vsip/math.hpp>
using namespace vsip;
/***********************************************************************
Main Program
***********************************************************************/
int
main(int argc, char** argv)
{
// Initialize the library.
vsipl vpp(argc, argv);
typedef complex<float> value_type;
// Parameters.
length_type npulse = 64; // number of pulses
length_type nrange = 256; // number of range cells
// Views.
Vector<value_type> replica(nrange);
Matrix<value_type> data(npulse, nrange);
Matrix<value_type> tmp(npulse, nrange);
// A forward Fft for computing the frequency-domain version of
// the replica.
typedef Fft<const_Vector, value_type, value_type, fft_fwd, by_reference>
for_fft_type;
for_fft_type for_fft (Domain<1>(nrange), 1.0);
// A forward Fftm for converting the time-domain data matrix to the
// frequency domain.
typedef Fftm<value_type, value_type, row, fft_fwd, by_reference>
for_fftm_type;
for_fftm_type for_fftm(Domain<2>(npulse, nrange), 1.0);
// An inverse Fftm for converting the frequency-domain data back to
// the time-domain.
typedef Fftm<value_type, value_type, row, fft_inv, by_reference>
inv_fftm_type;
inv_fftm_type inv_fftm(Domain<2>(npulse, nrange), 1.0/(nrange));
// Initialize data to zero.
data = value_type();
replica = value_type();
// Before fast convolution, convert the replica to the the
// frequency domain
for_fft(replica);
// Perform fast convolution.
// Convert to the frequency domain.
for_fftm(data, tmp);
// Perform element-wise multiply for each pulse.
tmp = vmmul<0>(replica, tmp);
// Convert back to the time domain.
inv_fftm(tmp, data);
}
The following figure shows the performance in MFLOP/s of fast convolution on a 3.06 GHz Pentium Xeon processor as the number of range cells varies from 16 to 65536.
