Packing/Unpacking operations

Classes
struct	heffte::direct_packer< tag::gpu >
	Simple packer that copies sub-boxes without transposing the order of the indexes.

struct	heffte::transpose_packer< tag::gpu >
	GPU version of the transpose packer.

struct	heffte::pack_plan_3d< index >
	Holds the plan for a pack/unpack operation.

struct	heffte::packer_backend< backend >
	The packer needs to know whether the data will be on the CPU or GPU devices.

struct	heffte::direct_packer< mode >
	Defines the direct packer without implementation; use the specializations to get the CPU or GPU implementation.

struct	heffte::direct_packer< tag::cpu >
	Simple packer that copies sub-boxes without transposing the order of the indexes.

struct	heffte::transpose_packer< mode >
	Defines the transpose packer without implementation; use the specializations to get the CPU or GPU implementation.

struct	heffte::transpose_packer< tag::cpu >
	Transpose packer that packs sub-boxes without transposing, but unpacks applying a transpose operation.

Functions
template<typename index >
std::ostream &	heffte::operator<< (std::ostream &os, pack_plan_3d< index > const &plan)
	Writes a plan to the stream, useful for debugging.

template<typename scalar_type , typename index >
void	heffte::data_scaling::apply (void *, index num_entries, scalar_type *data, double scale_factor)
	Multiplies the first num_entries entries of data by the scale_factor.

template<typename precision_type , typename index >
void	heffte::data_scaling::apply (void *stream, index num_entries, std::complex< precision_type > *data, double scale_factor)
	Complex by real scaling.

template<typename scalar_type , typename index >
void	heffte::data_scaling::apply (index num_entries, scalar_type *data, double scale_factor)
	Helper method that omits the stream for the CPU case.

Detailed Description

MPI communications assume that the data is located in contiguous arrays; however, the blocks that need to be transmitted in an FFT algorithm correspond to sub-boxes of a three-dimensional array, which are generally not contiguous in memory. Thus, packing and unpacking operations are needed to copy each sub-box to and from a contiguous array. Furthermore, some backends (e.g., fftw3) work much faster when the transformed data is contiguous, so it is beneficial to transpose the data between backend calls. Combining the unpack and transpose operations reduces data movement.
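The idea of packing a sub-box and then unpacking it with a transpose can be illustrated with a minimal CPU-only sketch. This is not heFFTe's implementation: the box3 type and the functions pack_subbox and unpack_transposed are hypothetical names introduced for illustration, and the sketch assumes inclusive index ranges with dimension 0 stored fastest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a heFFTe type): inclusive index range
// [low, high] in each dimension, with dimension 0 stored fastest.
struct box3 {
    int low[3], high[3];
    int size(int d) const { return high[d] - low[d] + 1; }
    std::size_t count() const {
        return static_cast<std::size_t>(size(0)) * size(1) * size(2);
    }
};

// Pack: copy the entries of sub-box `sub` out of an array that covers
// `full` into a contiguous buffer, keeping the original index order.
// Assumes `sub` lies entirely inside `full`.
template<typename scalar_type>
std::vector<scalar_type> pack_subbox(std::vector<scalar_type> const &data,
                                     box3 const &full, box3 const &sub) {
    assert(data.size() == full.count());
    std::size_t const line  = full.size(0);         // stride of index j
    std::size_t const plane = line * full.size(1);  // stride of index k
    std::vector<scalar_type> buffer;
    buffer.reserve(sub.count());
    for (int k = sub.low[2]; k <= sub.high[2]; k++)
        for (int j = sub.low[1]; j <= sub.high[1]; j++)
            for (int i = sub.low[0]; i <= sub.high[0]; i++)
                buffer.push_back(data[(k - full.low[2]) * plane
                                      + (j - full.low[1]) * line
                                      + (i - full.low[0])]);
    return buffer;
}

// Unpack with transpose: read the packed buffer (i fastest) and write it
// into an array of the sub-box shape with dimensions 0 and 1 swapped
// (j fastest), so the copy and the transpose happen in a single pass.
template<typename scalar_type>
std::vector<scalar_type> unpack_transposed(std::vector<scalar_type> const &buffer,
                                           box3 const &sub) {
    assert(buffer.size() == sub.count());
    std::size_t const n0 = sub.size(0), n1 = sub.size(1), n2 = sub.size(2);
    std::vector<scalar_type> result(buffer.size());
    for (std::size_t k = 0; k < n2; k++)
        for (std::size_t j = 0; j < n1; j++)
            for (std::size_t i = 0; i < n0; i++)
                result[k * n0 * n1 + i * n1 + j] = buffer[k * n0 * n1 + j * n0 + i];
    return result;
}
```

In this sketch the transposed unpack touches every entry once, whereas an unpack followed by a separate transpose would read and write the data twice.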

Function Documentation

◆ apply()

template<typename precision_type , typename index >
void heffte::data_scaling::apply (void *stream, index num_entries, std::complex< precision_type > *data, double scale_factor)

Complex by real scaling.

Depending on the compiler and the type of operation, C++ complex numbers can perform poorly compared to float and double operations. Since the scaling factor is always real, scaling can be performed with real arithmetic, which is easier to vectorize.
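A minimal CPU sketch of complex-by-real scaling with real arithmetic is given below. It is an illustration under stated assumptions, not heFFTe's implementation: the name scale_complex_as_real is hypothetical and the stream argument is omitted. The sketch relies on the C++ standard guarantee that an array of std::complex< T > can be accessed as an array of T with interleaved real and imaginary parts.

```cpp
#include <complex>

// Hypothetical helper (not the heFFTe function): scales num_entries complex
// numbers by a real factor using only real multiplications.
template<typename precision_type, typename index>
void scale_complex_as_real(index num_entries,
                           std::complex<precision_type> *data,
                           double scale_factor) {
    // The standard guarantees that reinterpret_cast<T*>(data)[2*i] is the
    // real part and [2*i + 1] the imaginary part of data[i].
    precision_type *real_data = reinterpret_cast<precision_type*>(data);
    precision_type const factor = static_cast<precision_type>(scale_factor);
    // A plain loop over 2 * num_entries real values, which compilers can
    // usually vectorize more easily than complex multiplication.
    for (index i = 0; i < 2 * num_entries; i++)
        real_data[i] *= factor;
}
```

For example, calling scale_complex_as_real(2, v, 0.5) on an array v of two std::complex< double > values halves both the real and imaginary part of each entry.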
