Highly Efficient FFT for Exascale: HeFFTe v2.3
Packing/Unpacking operations

Classes

struct  heffte::direct_packer< tag::gpu >
 Simple packer that copies sub-boxes without transposing the order of the indexes.
 
struct  heffte::transpose_packer< tag::gpu >
 GPU version of the transpose packer.
 
struct  heffte::pack_plan_3d< index >
 Holds the plan for a pack/unpack operation.
 
struct  heffte::packer_backend< backend >
 The packer needs to know whether the data will be on the CPU or GPU devices.
 
struct  heffte::direct_packer< mode >
 Declares the direct packer without an implementation; use the specializations to get the CPU or GPU implementation.
 
struct  heffte::direct_packer< tag::cpu >
 Simple packer that copies sub-boxes without transposing the order of the indexes.
 
struct  heffte::transpose_packer< mode >
 Declares the transpose packer without an implementation; use the specializations to get the CPU or GPU implementation.
 
struct  heffte::transpose_packer< tag::cpu >
 Transpose packer that packs sub-boxes without transposing, but unpacks applying a transpose operation.
 

Functions

template<typename index >
std::ostream & heffte::operator<< (std::ostream &os, pack_plan_3d< index > const &plan)
 Writes a plan to the stream, useful for debugging.
 
template<typename scalar_type , typename index >
void heffte::data_scaling::apply (void *, index num_entries, scalar_type *data, double scale_factor)
 Multiplies each of the num_entries entries in data by scale_factor.
 
template<typename precision_type , typename index >
void heffte::data_scaling::apply (void *stream, index num_entries, std::complex< precision_type > *data, double scale_factor)
 Complex by real scaling.
 
template<typename scalar_type , typename index >
void heffte::data_scaling::apply (index num_entries, scalar_type *data, double scale_factor)
 Helper method that omits the stream for the CPU case.
 

Detailed Description

MPI communication assumes that the data is stored in contiguous arrays; however, the blocks that need to be transmitted in an FFT algorithm correspond to sub-boxes of a three-dimensional array, which are generally not contiguous in memory. Packing and unpacking operations are therefore needed to copy each sub-box into and out of a contiguous buffer. Furthermore, some backends (e.g., fftw3) are much faster when the transform runs over contiguous data, so it is beneficial to transpose the data between backend calls. Combining the unpack and transpose operations into a single step reduces data movement.
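The packing idea described above can be illustrated with a minimal CPU-only sketch. This is not the HeFFTe implementation: the `box3d` type and the `pack_sub_box` and `unpack_sub_box` functions are hypothetical names introduced only for this example, and the sketch assumes that dimension 0 is the fastest-varying index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a HeFFTe type): inclusive index range
// [low, high] in each of the three dimensions.
struct box3d {
    int low[3];
    int high[3];
    int size(int d) const { return high[d] - low[d] + 1; }
};

// Linear offset of global index (i, j, k) inside an array that covers `full`,
// assuming dimension 0 is the fastest-varying index.
inline std::size_t offset(box3d const &full, int i, int j, int k) {
    std::size_t const line  = static_cast<std::size_t>(full.size(0));
    std::size_t const plane = line * static_cast<std::size_t>(full.size(1));
    return static_cast<std::size_t>(k - full.low[2]) * plane
         + static_cast<std::size_t>(j - full.low[1]) * line
         + static_cast<std::size_t>(i - full.low[0]);
}

// Pack: copy the strided sub-box into a contiguous buffer (e.g., an MPI send buffer).
template<typename scalar_type>
std::vector<scalar_type> pack_sub_box(std::vector<scalar_type> const &data,
                                      box3d const &full, box3d const &sub) {
    for (int d = 0; d < 3; d++) // the sub-box must lie inside the full box
        assert(full.low[d] <= sub.low[d] && sub.high[d] <= full.high[d]);
    std::vector<scalar_type> buffer;
    buffer.reserve(static_cast<std::size_t>(sub.size(0)) * sub.size(1) * sub.size(2));
    for (int k = sub.low[2]; k <= sub.high[2]; k++)
        for (int j = sub.low[1]; j <= sub.high[1]; j++)
            for (int i = sub.low[0]; i <= sub.high[0]; i++)
                buffer.push_back(data[offset(full, i, j, k)]);
    return buffer;
}

// Unpack: copy a contiguous buffer back into the sub-box, same index order.
template<typename scalar_type>
void unpack_sub_box(std::vector<scalar_type> const &buffer, box3d const &full,
                    box3d const &sub, std::vector<scalar_type> &data) {
    std::size_t n = 0;
    for (int k = sub.low[2]; k <= sub.high[2]; k++)
        for (int j = sub.low[1]; j <= sub.high[1]; j++)
            for (int i = sub.low[0]; i <= sub.high[0]; i++)
                data[offset(full, i, j, k)] = buffer[n++];
}
```

Both functions keep the index order unchanged, which corresponds to the behavior described for the direct packer. A transposing variant would presumably read the buffer in the same order but write to the destination with the indexes permuted, so that the copy and the transpose happen in one pass over the data.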

Function Documentation

◆ apply()

template<typename precision_type , typename index >
void heffte::data_scaling::apply ( void *  stream,
index  num_entries,
std::complex< precision_type > *  data,
double  scale_factor 
)

Complex by real scaling.

Depending on the compiler and the type of operation, C++ complex numbers can perform poorly compared to float and double operations. Since the scaling factor is always real, the scaling can be performed with real arithmetic, which is easier to vectorize.
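One way to realize this, shown here as a sketch rather than the library's actual code, is to treat an array of `num_entries` complex numbers as `2 * num_entries` real numbers and scale those. The names `scale_real` and `scale_complex` are hypothetical and the stream argument is omitted; the cast relies on the C++ standard's guarantee that an array of `std::complex<T>` can be accessed as an array of `T` with interleaved real and imaginary parts.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical helper: scale an array of real numbers in place.
// A plain loop over float/double is straightforward for compilers to vectorize.
template<typename scalar_type>
void scale_real(std::size_t num_entries, scalar_type *data, double scale_factor) {
    scalar_type const factor = static_cast<scalar_type>(scale_factor);
    for (std::size_t i = 0; i < num_entries; i++)
        data[i] *= factor;
}

// Hypothetical helper: scale complex data by a real factor using only real
// arithmetic. Multiplying (a + bi) by a real s gives (s*a) + (s*b)i, so the
// real and imaginary parts can be scaled independently.
template<typename precision_type>
void scale_complex(std::size_t num_entries, std::complex<precision_type> *data,
                   double scale_factor) {
    assert(data != nullptr || num_entries == 0);
    // std::complex<T>[N] may be accessed as T[2 * N] (array-oriented access).
    scale_real(2 * num_entries, reinterpret_cast<precision_type*>(data), scale_factor);
}
```

For example, calling `scale_complex(2, z, 0.5)` on `z = {1 - 2i, 3 + 4i}` leaves `z` holding `0.5 - 1i` and `1.5 + 2i`, without performing any complex multiplication.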