Highly Efficient FFT for Exascale: HeFFTe v2.3
Classes
struct heffte::direct_packer< tag::gpu >
    Simple packer that copies sub-boxes without transposing the order of the indexes.
struct heffte::transpose_packer< tag::gpu >
    GPU version of the transpose packer.
struct heffte::pack_plan_3d< index >
    Holds the plan for a pack/unpack operation.
struct heffte::packer_backend< backend >
    Tells the packer whether the data will be on the CPU or on the GPU device.
struct heffte::direct_packer< mode >
    Defines the direct packer without implementation; use the specializations to get the CPU or GPU implementation.
struct heffte::direct_packer< tag::cpu >
    Simple packer that copies sub-boxes without transposing the order of the indexes.
struct heffte::transpose_packer< mode >
    Defines the transpose packer without implementation; use the specializations to get the CPU or GPU implementation.
struct heffte::transpose_packer< tag::cpu >
    Transpose packer that packs sub-boxes without transposing, but applies a transpose operation when unpacking.
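The listing above describes a tag-dispatch design: an unimplemented primary template, specializations per memory space, and a trait that maps a backend to a mode. The sketch below illustrates that pattern under stated assumptions. Every name lives in a hypothetical sketch namespace, the backend tags are invented, and pack takes a plain element count instead of a pack plan, so this is not heffte's declaration.

```cpp
#include <cstddef>

// Hypothetical names that only mimic the pattern described in the listing;
// these are not heffte's declarations.
namespace sketch {

// Tags selecting the memory space where the data lives.
namespace tag { struct cpu {}; struct gpu {}; }

// Hypothetical FFT backend tags (assumption, for illustration only).
namespace backend { struct cpu_fft {}; struct gpu_fft {}; }

// Maps a backend to the memory space its packer must use; CPU by default.
template<typename backend_tag> struct packer_backend { using mode = tag::cpu; };
template<> struct packer_backend<backend::gpu_fft> { using mode = tag::gpu; };

// Primary template is declared but not defined, so only modes that have a
// specialization can be instantiated.
template<typename mode> struct direct_packer;

// CPU specialization: a plain element-by-element copy. A GPU specialization
// would launch a device kernel instead and is omitted here.
template<> struct direct_packer<tag::cpu> {
    template<typename scalar_type>
    void pack(std::size_t num_entries, scalar_type const *source, scalar_type *destination) const {
        for (std::size_t i = 0; i < num_entries; i++)
            destination[i] = source[i];
    }
};

} // namespace sketch
```

With this layout, a caller selects the packer as sketch::direct_packer<sketch::packer_backend<B>::mode>. Requesting a mode without a specialization then fails at compile time instead of silently running the wrong code.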
Functions
template<typename index>
std::ostream & heffte::operator<< (std::ostream &os, pack_plan_3d< index > const &plan)
    Writes a plan to the stream; useful for debugging.
template<typename scalar_type, typename index>
void heffte::data_scaling::apply (void *, index num_entries, scalar_type *data, double scale_factor)
    Multiplies the first num_entries entries of data by scale_factor.
template<typename precision_type, typename index>
void heffte::data_scaling::apply (void *stream, index num_entries, std::complex< precision_type > *data, double scale_factor)
    Complex by real scaling.
template<typename scalar_type, typename index>
void heffte::data_scaling::apply (index num_entries, scalar_type *data, double scale_factor)
    Helper method that omits the stream for the CPU case.
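The relationship between the stream-taking overload and the CPU helper that omits the stream can be sketched as follows. This is a minimal illustration assuming the CPU path simply ignores the stream. The sketch_scaling namespace is hypothetical, and the code is not heffte's implementation.

```cpp
// Hypothetical namespace mirroring the overload set listed above.
namespace sketch_scaling {

// Stream-aware overload: on the CPU the stream argument is unused, and each
// entry is multiplied by the real scale factor.
template<typename scalar_type, typename index>
void apply(void *, index num_entries, scalar_type *data, double scale_factor) {
    for (index i = 0; i < num_entries; i++)
        data[i] *= scale_factor;
}

// Convenience overload for CPU code: forwards to the stream-aware version
// with a null stream so callers do not have to pass one.
template<typename scalar_type, typename index>
void apply(index num_entries, scalar_type *data, double scale_factor) {
    apply(nullptr, num_entries, data, scale_factor);
}

} // namespace sketch_scaling
```

Keeping a single stream-aware implementation and forwarding to it means GPU and CPU callers share one signature shape. CPU-only code can still drop the unused argument.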
MPI communications assume that the data is located in contiguous arrays. However, the blocks that need to be transmitted in an FFT algorithm correspond to sub-boxes of a three-dimensional array, and these sub-boxes are in general not contiguous in memory. Thus, packing and unpacking operations are needed to copy each sub-box into, and back out of, a contiguous buffer. Furthermore, some backends (e.g., fftw3) are much faster when the 1D transforms operate on contiguous data, so it is beneficial to transpose the data between backend calls. Combining the unpack and transpose operations into a single pass reduces data movement.
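The pack step and the fused unpack-plus-transpose step described above can be sketched as two loops. This is a simplified illustration under stated assumptions. The sub_box_plan struct, pack_sub_box and unpack_transpose are hypothetical names rather than heffte's pack_plan_3d or packer API. The full array is assumed to store index i fastest, and the transposed output is a dense block instead of a sub-box of a larger array.

```cpp
// Hypothetical description of a sub-box inside a larger 3D array whose first
// index runs fastest; a simplified stand-in, not heffte::pack_plan_3d.
struct sub_box_plan {
    int size[3];      // extents of the sub-box in dimensions 0, 1, 2
    int line_stride;  // distance between consecutive j-indexes in the full array
    int plane_stride; // distance between consecutive k-indexes in the full array
};

// Packing: gathers the strided sub-box that starts at `data` into a contiguous
// buffer, keeping the (i, j, k) order with i fastest.
template<typename scalar_type>
void pack_sub_box(sub_box_plan const &plan, scalar_type const *data, scalar_type *buffer) {
    for (int k = 0; k < plan.size[2]; k++)
        for (int j = 0; j < plan.size[1]; j++)
            for (int i = 0; i < plan.size[0]; i++)
                *buffer++ = data[k * plan.plane_stride + j * plan.line_stride + i];
}

// Unpacking fused with a transpose: writes the contiguous buffer into a dense
// output block in which k runs fastest, so a following 1D transform along
// dimension 2 reads contiguous data. One pass replaces "unpack, then transpose".
template<typename scalar_type>
void unpack_transpose(sub_box_plan const &plan, scalar_type const *buffer, scalar_type *out) {
    int const n0 = plan.size[0], n1 = plan.size[1], n2 = plan.size[2];
    for (int k = 0; k < n2; k++)
        for (int j = 0; j < n1; j++)
            for (int i = 0; i < n0; i++)
                out[k + n2 * (i + n0 * j)] = buffer[i + n0 * (j + n1 * k)];
}
```

For example, in a 4×3×2 array the 2×2×2 sub-box at offset (1, 1, 0) has line_stride 4 and plane_stride 12, and it starts at data + 5. The index arithmetic that the transpose needs is folded into the unpack loop, so every element is read and written exactly once.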
template<typename precision_type, typename index>
void heffte::data_scaling::apply (void *stream, index num_entries, std::complex< precision_type > *data, double scale_factor)
Complex by real scaling.
Depending on the compiler and the type of operation, arithmetic on C++ complex numbers can perform poorly compared to plain float and double operations. Since the scaling factor is always real, scaling can be performed with real arithmetic by multiplying the real and imaginary parts independently, which is easier for the compiler to vectorize.
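One way to apply the real-arithmetic idea is to view the complex array as an interleaved array of 2 * num_entries real values. The C++ standard (since C++11) guarantees this layout for std::complex of float, double and long double. The function name scale_complex_as_real is hypothetical, and this is a sketch of the technique rather than heffte's code.

```cpp
#include <complex>

// Scales num_entries complex numbers by a real factor through a real view of
// the same memory: [re0, im0, re1, im1, ...]. The standard's array-oriented
// access rule for std::complex<T> makes this reinterpret_cast well defined.
template<typename precision_type, typename index>
void scale_complex_as_real(index num_entries, std::complex<precision_type> *data,
                           double scale_factor) {
    precision_type *real_data = reinterpret_cast<precision_type *>(data);
    // Convert once so that the loop body uses only precision_type arithmetic.
    precision_type const factor = static_cast<precision_type>(scale_factor);
    for (index i = 0; i < 2 * num_entries; i++)
        real_data[i] *= factor;
}
```

The loop is a single stride-one multiply over real values, which compilers vectorize readily. Converting the factor to precision_type up front also avoids mixing float and double inside the loop in single precision.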