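This appendix presents the option displays of the CUDA benchmarks. Each display lists the benchmark's numbered test cases and its parameters with their defaults.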

CUDA copy -- fixed rows
   -1 -- host to device copy
   -2 -- device to host copy
   -3 -- host->device->host copy (A = B)
   -4 -- device to shared copy
   -5 -- device to device copy
   -6 -- device fill with zeroes
CUDA copy -- fixed columns
  -11 -- host to device copy
  -12 -- device to host copy
  -13 -- host->device->host copy (A = B)
  -14 -- device to shared copy
  -15 -- device to device copy
  -16 -- device fill with zeroes
Parameters:
  -p:rows ROWS (default 64)
  -p:size SIZE (default 2048)
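
The displays in this appendix share one option style: a numbered case flag such as -12 and named parameters such as -p:rows ROWS. The following is a minimal sketch of how such a command line could be read, not the benchmarks' actual parser. The names BenchOptions, parse_options, and example_usage are hypothetical, the default case of 1 is an assumption (the display does not state one), and only the defaults 64 and 2048 are taken from the display.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical container for the values named in the display.
struct BenchOptions {
  int  test_case = 1;   // assumed default; the display does not state one
  long rows = 64;       // default shown in the display
  long size = 2048;     // default shown in the display
};

// Parse arguments of the form "-N", "-p:rows ROWS", and "-p:size SIZE".
// Anything else is ignored in this sketch.
BenchOptions parse_options(const std::vector<std::string>& args) {
  BenchOptions opt;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& a = args[i];
    if ((a == "-p:rows" || a == "-p:size") && i + 1 < args.size()) {
      // Named parameter: the value is the next argument.
      long value = std::strtol(args[i + 1].c_str(), nullptr, 10);
      if (a == "-p:rows") opt.rows = value;
      else                opt.size = value;
      ++i;  // skip the value just consumed
    } else if (a.size() > 1 && a[0] == '-' &&
               std::isdigit(static_cast<unsigned char>(a[1]))) {
      // Case flag: "-12" selects test case 12.
      opt.test_case = std::atoi(a.c_str() + 1);
    }
  }
  return opt;
}

// Usage example: select case 12 with 128 rows; size keeps its default.
inline void example_usage() {
  BenchOptions o = parse_options({"-12", "-p:rows", "128"});
  assert(o.test_case == 12 && o.rows == 128 && o.size == 2048);
}
```

Under these assumptions, a fixed-columns device-to-host copy with 128 rows would be requested as -12 -p:rows 128, and any parameter that is not given keeps the default shown in the display.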

fastconv -- fast convolution benchmark
 Sweeping pulse size:
    -1 -- Out-of-place, phased
    -2 -- On-device, phased
    -3 -- On-device, interleaved
 Parameters (for sweeping convolution size, cases 1 through 10)
  -p:rows ROWS -- set number of pulses (default 64)
 Sweeping number of pulses:
   -11 -- Out-of-place, phased
   -12 -- On-device, phased
   -13 -- On-device, interleaved
 Parameters (for sweeping number of convolutions, cases 11 through 20)
  -p:size SIZE -- size of pulse (default 2048)

fftm -- FFT/FFTM benchmark using CUDA
 Fixed rows, sweeping FFT size:
    -1 -- op  : out-of-place CC fwd fft
    -2 -- ip  : In-place CC fwd fft
    -3 -- dev : On-device CC fwd fft
 Parameters (for sweeping FFT size, cases 1 through 6)
  -p:rows ROWS -- set number of pulses (default 64)
 Fixed FFT size, sweeping number of FFTs:
   -11 -- op  : out-of-place CC fwd fft
   -12 -- ip  : In-place CC fwd fft
   -13 -- dev : On-device CC fwd fft
 Parameters (for sweeping number of FFTs, cases 11 through 16)
  -p:size SIZE -- size of pulse (default 2048)

CUDA transpose (direct - memory moves not timed)
 Sweeping column size:
    -1 -- Out-of-place, complex
 Sweeping row size:
   -11 -- Out-of-place, complex
CUDA transpose (normal - memory moves are timed)
 Sweeping column size:
   -21 -- Out-of-place, complex
 Sweeping row size:
   -31 -- Out-of-place, complex
 Parameters (for sweeping number of columns, cases 1, 21)
  -p:rows ROWS -- set number of rows (default 64)
 Parameters (for sweeping number of columns, cases 11, 31)
  -p:cols COLS -- set number of columns (default 2048)

CUDA vmmul -- vector-matrix multiply
 Sweeping column size, vmmul<row>:
    -1 -- Out-of-place, complex
 Sweeping row size, vmmul<row>:
   -11 -- Out-of-place, complex
 Sweeping column size, vmmul<col>:
   -21 -- Out-of-place, complex
 Sweeping row size, vmmul<col>:
   -31 -- Out-of-place, complex
 Parameters (for sweeping number of columns, cases 1, 21)
  -p:rows ROWS -- set number of rows (default 64)
 Parameters (for sweeping number of columns, cases 11, 31)
  -p:cols COLS -- set number of columns (default 2048)
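
The vmmul display distinguishes vmmul<row> from vmmul<col> without defining them. The host-side sketch below illustrates one common reading, which is an assumption here and not confirmed by the display: vmmul<row> multiplies every row of the matrix element-wise by the vector (out(i,j) = v(j) * m(i,j)), and vmmul<col> does the same for every column (out(i,j) = v(i) * m(i,j)). The names Along and vmmul_ref are hypothetical, and the code is a plain C++ reference loop, not the CUDA kernel that the benchmark times.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical selector standing in for the <row> / <col> template argument.
enum class Along { Row, Col };

// Reference vector-matrix multiply on a row-major rows x cols matrix.
// Assumed semantics:
//   Along::Row: out(i,j) = v(j) * m(i,j), so v must have `cols` elements
//   Along::Col: out(i,j) = v(i) * m(i,j), so v must have `rows` elements
std::vector<cfloat> vmmul_ref(Along dim,
                              const std::vector<cfloat>& v,
                              const std::vector<cfloat>& m,
                              std::size_t rows, std::size_t cols) {
  assert(m.size() == rows * cols);
  assert(v.size() == (dim == Along::Row ? cols : rows));

  std::vector<cfloat> out(m.size());
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      const cfloat scale = (dim == Along::Row) ? v[j] : v[i];
      out[i * cols + j] = scale * m[i * cols + j];
    }
  }
  return out;
}
```

With the defaults from the display (64 rows, 2048 columns), this reading would give the vmmul<row> cases a 2048-element vector and the vmmul<col> cases a 64-element vector.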