B.6. CUDA Benchmark Usage

This appendix reproduces the usage messages displayed by the CUDA benchmarks.

B.6.1. copy

CUDA copy -- fixed rows
   -1 -- host to device copy
   -2 -- device to host copy
   -3 -- host->device->host copy (A = B)
   -4 -- device to shared copy
   -5 -- device to device copy
   -6 -- device fill with zeroes
CUDA copy -- fixed columns
  -11 -- host to device copy
  -12 -- device to host copy
  -13 -- host->device->host copy (A = B)
  -14 -- device to shared copy
  -15 -- device to device copy
  -16 -- device fill with zeroes

Parameters:
  -p:rows ROWS (default 64)
  -p:size SIZE (default 2048)
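
The case numbers and -p: parameters listed above are combined on the command line. The following is a minimal sketch of how such an argument list might be assembled, assuming one case flag followed by optional name/value parameter pairs. The helper name build_args is hypothetical, and the benchmark executable name and path are not given in this appendix.

```python
def build_args(case, **params):
    """Build an argument list such as ['-1', '-p:rows', '128'].

    Hypothetical helper: 'case' is a benchmark case number from the
    usage message (e.g. 1 or 11); 'params' maps parameter names such
    as rows or size to their values.
    """
    if case <= 0:
        raise ValueError("case numbers are positive, e.g. 1 or 11")
    args = ["-%d" % case]
    for name, value in params.items():
        # Each parameter is passed as the pair: -p:NAME VALUE
        args += ["-p:%s" % name, str(value)]
    return args


# Case 1 (host to device copy, fixed rows) with 128 rows.
print(build_args(1, rows=128))
```

The resulting list can be appended to the benchmark executable's name when launching it, for example through subprocess.run.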

B.6.2. fastconv

fastconv -- fast convolution benchmark
 Sweeping pulse size:
   -1 -- Out-of-place, phased
   -2 -- On-device, phased
   -3 -- On-device, interleaved

 Parameters (for sweeping convolution size, cases 1 through 10)
  -p:rows ROWS -- set number of pulses (default 64)

 Sweeping number of pulses:
  -11 -- Out-of-place, phased
  -12 -- On-device, phased
  -13 -- On-device, interleaved

 Parameters (for sweeping number of convolutions, cases 11 through 20)
  -p:size SIZE -- size of pulse (default 2048)
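
The terms "phased" and "interleaved" are not defined in the usage message. The sketch below illustrates one common reading, stated here as an assumption: a phased fast convolution runs each stage (forward transform, multiply, inverse transform) over all pulses before starting the next stage, while an interleaved one runs all three stages on one pulse before moving to the next. A naive DFT stands in for the FFT, the weights are assumed to be already in the frequency domain, and all function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, standing in for an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        # The inverse transform carries the 1/n normalization.
        out.append(acc / n if inverse else acc)
    return out


def fastconv_phased(pulses, weights):
    """Run each stage over every pulse before starting the next stage."""
    spectra = [dft(p) for p in pulses]
    scaled = [[s * w for s, w in zip(row, weights)] for row in spectra]
    return [dft(row, inverse=True) for row in scaled]


def fastconv_interleaved(pulses, weights):
    """Run all three stages on one pulse, then move to the next pulse."""
    out = []
    for p in pulses:
        spectrum = dft(p)
        scaled = [s * w for s, w in zip(spectrum, weights)]
        out.append(dft(scaled, inverse=True))
    return out
```

Both orderings compute the same result; they differ only in how the work is scheduled, which is what the benchmark cases would compare under this reading.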

B.6.3. fftm

fftm -- FFT/FFTM benchmark using CUDA
 Fixed rows, sweeping FFT size:
   -1 -- op  : out-of-place CC fwd fft
   -2 -- ip  : In-place CC fwd fft
   -3 -- dev : On-device CC fwd fft

 Parameters (for sweeping FFT size, cases 1 through 6)
  -p:rows ROWS -- set number of pulses (default 64)

 Fixed FFT size, sweeping number of FFTs:
  -11 -- op  : out-of-place CC fwd fft
  -12 -- ip  : In-place CC fwd fft
  -13 -- dev : On-device CC fwd fft

 Parameters (for sweeping number of FFTs, cases 11 through 16)
  -p:size SIZE -- size of pulse (default 2048)

B.6.4. transpose

CUDA transpose (direct - memory moves not timed)
 Sweeping column size:
   -1 -- Out-of-place, complex
 Sweeping row size:
  -11 -- Out-of-place, complex

CUDA transpose (normal - memory moves are timed)
 Sweeping column size:
  -21 -- Out-of-place, complex
 Sweeping row size:
  -31 -- Out-of-place, complex

 Parameters (for sweeping number of columns, cases 1, 21)
  -p:rows ROWS -- set number of rows (default 64)

 Parameters (for sweeping number of rows, cases 11, 31)
  -p:cols COLS -- set number of columns (default 2048)

B.6.5. vmmul

CUDA vmmul -- vector-matrix multiply
 Sweeping column size, vmmul<row>:
   -1 -- Out-of-place, complex
 Sweeping row size, vmmul<row>:
  -11 -- Out-of-place, complex
 Sweeping column size, vmmul<col>:
  -21 -- Out-of-place, complex
 Sweeping row size, vmmul<col>:
  -31 -- Out-of-place, complex

 Parameters (for sweeping number of columns, cases 1, 21)
  -p:rows ROWS -- set number of rows (default 64)

 Parameters (for sweeping number of rows, cases 11, 31)
  -p:cols COLS -- set number of columns (default 2048)
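
The usage message distinguishes vmmul<row> from vmmul<col> without defining them. The sketch below assumes the usual convention: vmmul<row> multiplies every row of the matrix elementwise by the vector (one vector element per column), and vmmul<col> multiplies every column elementwise by the vector (one vector element per row). The benchmark source is not shown in this appendix, so both the convention and the function names are assumptions.

```python
def vmmul_row(v, m):
    """Scale every row of m elementwise by v; len(v) equals the column count."""
    return [[v[j] * m[i][j] for j in range(len(v))] for i in range(len(m))]


def vmmul_col(v, m):
    """Scale every column of m elementwise by v; len(v) equals the row count."""
    return [[v[i] * m[i][j] for j in range(len(m[i]))] for i in range(len(m))]


m = [[1, 2, 3],
     [4, 5, 6]]
print(vmmul_row([1, 10, 100], m))  # [[1, 20, 300], [4, 50, 600]]
print(vmmul_col([2, 3], m))        # [[2, 4, 6], [12, 15, 18]]
```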