Defines a set of tweaks and options to use in the plan generation.

#include <heffte_plan_logic.h>

Public Member Functions
template<typename backend_tag >
	plan_options (backend_tag const)
	Constructor, initializes all options with the default values for the given backend tag.

	plan_options (bool reorder, reshape_algorithm alg, bool pencils)
	Constructor, initializes each variable, primarily for internal use.

void	use_num_subranks (int num_subranks)
	Defines the number of ranks to use for the internal reshapes, set to -1 to use all ranks.

void	use_subcomm (MPI_Comm comm)
	Set sub-communicator to use in the intermediate reshape operations.

int	get_subranks () const
	Return the set number of sub-ranks.

Public Attributes
bool	use_reorder
	Defines whether to transpose the data on reshape or to use strided 1-D ffts.

reshape_algorithm	algorithm
	Defines the communication algorithm.

bool	use_pencils
	Defines whether to use pencil or slab data distribution in the reshape steps.

bool	use_gpu_aware
	Defines whether to use MPI calls directly from the GPU or to move to the CPU first.

Detailed Description

Defines a set of tweaks and options to use in the plan generation.

Example usage:

heffte::plan_options options = heffte::default_options<heffte::backend::fftw>();
options.algorithm = heffte::reshape_algorithm::p2p; // forces the use of point-to-point communication
heffte::fft3d<heffte::backend::fftw> fft3d(inbox, outbox, comm, options);

Option use_reorder: Controls whether the backends should be called with strided or contiguous data. If the option is enabled, then during the reshape operations heFFTe will reorder the data so that the backend is called on a contiguous batch of 1D FFTs. Otherwise, the strided call will be performed. Depending on the size and the specific backend (or version of the backend), one or the other may improve performance. The reorder is applied during the unpacking stage of an MPI communication and will be applied even if no MPI communication is used. Note that some backends don't currently support strided transforms, e.g., the Sine and Cosine transforms, in which case this option will have no effect.
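The difference between the two access patterns can be illustrated with a minimal sketch. This is not heFFTe's implementation; the helper reorder_lines is hypothetical and only shows what it means to make strided 1D lines contiguous in a row-major 2D array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of heFFTe: copies a row-major (ny x nx) array
// so that each line along y, which has stride nx in the input, becomes contiguous.
std::vector<double> reorder_lines(std::vector<double> const &in,
                                  std::size_t nx, std::size_t ny) {
    assert(in.size() == nx * ny);
    std::vector<double> out(in.size());
    for (std::size_t y = 0; y < ny; ++y)
        for (std::size_t x = 0; x < nx; ++x)
            out[x * ny + y] = in[y * nx + x]; // line x now occupies [x * ny, (x + 1) * ny)
    return out;
}
```

With the reorder, a batch of nx transforms of length ny can be applied to consecutive chunks of the output. Without it, the backend would be handed the original array and asked to step through each line with stride nx.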

Option algorithm: Specifies the combination of MPI calls to use in the communication. See heffte::reshape_algorithm for details.

Option use_pencils: Indicates whether the intermediate steps of the computation should be done in pencil or slab format. Slabs work better for problems with fewer MPI ranks, while pencils work better as the number of ranks increases. The specific cutoff depends on the hardware and MPI implementation. Note that if the input or output shape of the data is in slab format, then this option will be ignored.
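To make the two layouts concrete, the following sketch computes the index range owned by one rank in each format. The extent type and the split, slab_box and pencil_box helpers are hypothetical names introduced for illustration; heFFTe describes boxes with its own types and may divide the indexes differently.

```cpp
#include <array>
#include <cassert>

// Hypothetical types and helpers for illustration only.
struct extent { int begin, end; }; // half-open index range [begin, end)

// Split n indexes into `parts` nearly equal chunks and return chunk i.
extent split(int n, int parts, int i) {
    assert(parts > 0 && 0 <= i && i < parts);
    return { int((long long)n * i / parts), int((long long)n * (i + 1) / parts) };
}

// Slab: each rank holds complete 2-D planes; only dimension 2 is divided.
std::array<extent, 3> slab_box(std::array<int, 3> n, int ranks, int rank) {
    return {{ {0, n[0]}, {0, n[1]}, split(n[2], ranks, rank) }};
}

// Pencil: each rank holds complete 1-D lines along dimension 0;
// dimensions 1 and 2 are divided over a p1 x p2 grid of ranks.
std::array<extent, 3> pencil_box(std::array<int, 3> n, int p1, int p2, int rank) {
    assert(0 <= rank && rank < p1 * p2);
    return {{ {0, n[0]}, split(n[1], p1, rank % p1), split(n[2], p2, rank / p1) }};
}
```

In this sketch a slab layout with more ranks than indexes in the divided dimension leaves some ranks with empty boxes, which is one reason pencils become attractive at larger rank counts.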

Option use_gpu_aware: Applies only when using one of the GPU backends. Indicates whether MPI communication should be initiated from the GPU device or whether the data has to be moved to the CPU first. MPI calls from the GPU have higher throughput but also higher latency, so initiating the calls from the CPU (i.e., setting use_gpu_aware to false) can be faster when the problem is small relative to the number of MPI ranks.

Option use_subcomm or use_num_subranks: Restricts the intermediate reshape and FFT operations to a subset of the ranks specified by the communicator given in the construction of heffte::fft3d and heffte::fft3d_r2c. By default, heFFTe will use all of the available MPI ranks, but this is not always optimal (see the two examples below). The other options are defined as member variables, but the subcomm option is specified with member functions that accept either an integer or an MPI communicator. Using an integer will select ranks 0 to num_subranks - 1, while using a communicator can define an arbitrary subset. MPI ranks that don't belong to the subcomm should pass MPI_COMM_NULL. The plan_options class will hold a non-owning reference to the MPI subcomm, but heffte::fft3d and heffte::fft3d_r2c will use the subcomm only in the constructors, i.e., the subcomm can be safely discarded/freed after the fft3d classes are constructed.

For example, if the input and output shapes of the data do not form pencils in any direction (i.e., using a brick decomposition), then heFFTe has to perform 4 reshape operations (3 if using slabs). If the problem size is small relative to the number of ranks, this results in 4 sets of small messages, which increases the latency and reduces performance. However, if the problem can fit on a single node (e.g., a single GPU), then gathering all the data to a single rank, performing a single 3D FFT, and scattering the data back will result in only 2 communications involving the small messages. Thus, so long as the two operations are less expensive than the 4, using a subcomm will result in an overall performance boost.
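The trade-off in this example can be explored with a toy latency-bandwidth model. The sketch below is an assumption-laden illustration, not a heFFTe facility: link_model, reshape_time, gather_time and single_rank_is_cheaper are hypothetical names, every reshape is treated as a full exchange among all ranks, and the cost of the FFTs themselves is ignored.

```cpp
#include <cassert>

// Toy model; both parameters are hypothetical inputs supplied by the user.
struct link_model {
    double latency;       // seconds per message
    double inv_bandwidth; // seconds per byte
};

// One reshape over p ranks: each rank exchanges p - 1 messages
// of total_bytes / (p * p) bytes each.
double reshape_time(link_model m, int p, double total_bytes) {
    assert(p >= 1);
    double msg = total_bytes / (static_cast<double>(p) * p);
    return (p - 1) * (m.latency + m.inv_bandwidth * msg);
}

// Gather to (or scatter from) one rank: p - 1 messages of total_bytes / p bytes each.
double gather_time(link_model m, int p, double total_bytes) {
    assert(p >= 1);
    double msg = total_bytes / p;
    return (p - 1) * (m.latency + m.inv_bandwidth * msg);
}

// Compares gather + scatter against 4 reshapes, communication only.
bool single_rank_is_cheaper(link_model m, int p, double total_bytes) {
    return 2.0 * gather_time(m, p, total_bytes) < 4.0 * reshape_time(m, p, total_bytes);
}
```

Under this model the single-rank approach wins only while the latency term dominates, i.e., for small problems spread over many ranks. Real measurements on the target machine should decide the setting.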

Similar to the previous example, if we are using a CPU backend with multiple MPI ranks per node, then reducing the MPI ranks to one per node can effectively coalesce smaller messages from multiple ranks into larger messages and thus reduce latency. If the CPU backend supports multi-threading, then all CPU cores can still be used by calls from the single rank without a reduction in performance.
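The coalescing effect can be quantified with a short counting sketch. The helpers exchange_messages and bytes_per_message are hypothetical and assume the worst case in which every participating rank exchanges data with every other rank; actual reshapes may involve fewer partners.

```cpp
#include <cassert>

// Hypothetical helpers for illustration, not part of heFFTe.
// Messages in one exchange where each of p ranks sends to every other rank.
long long exchange_messages(long long p) {
    assert(p >= 1);
    return p * (p - 1);
}

// Bytes per message when total_bytes are spread evenly over all p * p rank pairs.
double bytes_per_message(double total_bytes, long long p) {
    assert(p >= 1);
    return total_bytes / (static_cast<double>(p) * static_cast<double>(p));
}
```

For 4 nodes with 16 ranks each, exchange_messages(64) gives 4032 messages, while one rank per node gives exchange_messages(4), which is 12, and each message carries 256 times more data.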

Member Function Documentation

◆ use_subcomm()

void heffte::plan_options::use_subcomm ( MPI_Comm comm )

inline

Set sub-communicator to use in the intermediate reshape operations.

The ranks defined by comm must be a subset of the communicator that will be used in the future call to heffte::fft3d or heffte::fft3d_r2c. The ranks that are not associated with the comm should pass in MPI_COMM_NULL. The plan_options object will take a non-owning reference to comm but the reference will not be passed into heffte::fft3d or heffte::fft3d_r2c.

This method takes precedence over use_num_subranks() if both methods are called. Avoid calling both methods.

The documentation for this struct was generated from the following file:

include/heffte_plan_logic.h
