Highly Efficient FFT for Exascale: HeFFTe v2.3
Classes | |
class | heffte::reshape3d_base< index > |
Base reshape interface. More... | |
class | heffte::reshape3d_alltoall< location_tag, packer, index > |
Reshape algorithm based on the MPI_Alltoall() method. More... | |
class | heffte::reshape3d_alltoallv< location_tag, packer, index > |
Reshape algorithm based on the MPI_Alltoallv() method. More... | |
class | heffte::reshape3d_pointtopoint< location_tag, packer, index > |
Reshape algorithm based on the MPI_Send() and MPI_Irecv() methods. More... | |
class | heffte::reshape3d_transpose< location_tag, index > |
Special case of the reshape that does not involve MPI communication but applies a transpose instead. More... | |
Functions | |
template<typename index > | |
void | heffte::compute_overlap_map_transpose_pack (int me, int nprocs, box3d< index > const destination, std::vector< box3d< index >> const &boxes, std::vector< int > &proc, std::vector< int > &offset, std::vector< int > &sizes, std::vector< pack_plan_3d< index >> &plans) |
Generates an unpack plan where the boxes and the destination do not have the same order. More... | |
template<typename index > | |
size_t | heffte::get_workspace_size (std::array< std::unique_ptr< reshape3d_base< index >>, 4 > const &shapers) |
Returns the maximum workspace size used by the shapers. | |
template<typename location_tag , template< typename device > class packer = direct_packer, typename index > | |
std::unique_ptr< reshape3d_alltoall< location_tag, packer, index > > | heffte::make_reshape3d_alltoall (typename backend::device_instance< location_tag >::stream_type q, std::vector< box3d< index >> const &input_boxes, std::vector< box3d< index >> const &output_boxes, bool uses_gpu_aware, MPI_Comm const comm) |
Factory method that performs all the necessary work to establish the communication patterns. More... | |
template<typename location_tag , template< typename device > class packer = direct_packer, typename index > | |
std::unique_ptr< reshape3d_alltoallv< location_tag, packer, index > > | heffte::make_reshape3d_alltoallv (typename backend::device_instance< location_tag >::stream_type q, std::vector< box3d< index >> const &input_boxes, std::vector< box3d< index >> const &output_boxes, bool use_gpu_aware, MPI_Comm const comm) |
Factory method that performs all the necessary work to establish the communication patterns. More... | |
template<typename location_tag , template< typename device > class packer = direct_packer, typename index > | |
std::unique_ptr< reshape3d_pointtopoint< location_tag, packer, index > > | heffte::make_reshape3d_pointtopoint (typename backend::device_instance< location_tag >::stream_type q, std::vector< box3d< index >> const &input_boxes, std::vector< box3d< index >> const &output_boxes, reshape_algorithm algorithm, bool use_gpu_aware, MPI_Comm const comm) |
Factory method that performs all the necessary work to establish the communication patterns. More... | |
template<typename backend_tag , typename index > | |
std::unique_ptr< reshape3d_base< index > > | heffte::make_reshape3d (typename backend::device_instance< typename backend::buffer_traits< backend_tag >::location >::stream_type stream, std::vector< box3d< index >> const &input_boxes, std::vector< box3d< index >> const &output_boxes, MPI_Comm const comm, plan_options const options) |
Factory method to create a reshape3d instance. More... | |
A reshape operation is one that modifies the distribution of the indexes across an MPI communicator. In a special case, the reshape can correspond to a simple in-node data transpose (i.e., no communication).
The reshape operations inherit from a common heffte::reshape3d_base class that defines the apply method for different data types and reports the sizes of the input, output, and scratch workspace. Reshape objects are usually wrapped in std::unique_ptr containers, which handle the polymorphic calls at runtime; an empty container indicates the special case where no reshape is needed.
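The empty-container convention and the workspace query can be illustrated with a minimal, self-contained sketch. The types `reshape_sketch_base` and `fixed_reshape_sketch` and the function `max_workspace_sketch` are hypothetical stand-ins written for this example. They are not part of heFFTe, and the sketch only assumes that a reshape object can report its scratch size through a virtual method.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a reshape base class; only the workspace query is modeled.
struct reshape_sketch_base {
    virtual ~reshape_sketch_base() = default;
    virtual std::size_t size_workspace() const = 0;
};

// Hypothetical concrete reshape that reports a fixed scratch size.
struct fixed_reshape_sketch : reshape_sketch_base {
    explicit fixed_reshape_sketch(std::size_t n) : workspace(n) {}
    std::size_t size_workspace() const override { return workspace; }
    std::size_t const workspace;
};

// Largest workspace requested by any shaper.
// Empty pointers stand for "no reshape needed" and are skipped.
inline std::size_t max_workspace_sketch(
    std::array<std::unique_ptr<reshape_sketch_base>, 4> const &shapers) {
    std::size_t result = 0;
    for (auto const &s : shapers)
        if (s) result = std::max(result, s->size_workspace());
    return result;
}
```

Because each entry is tested before it is dereferenced, a pipeline in which some stages need no reshape can leave those entries empty without special handling elsewhere.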
void heffte::compute_overlap_map_transpose_pack(
    int me,
    int nprocs,
    box3d< index > const destination,
    std::vector< box3d< index >> const & boxes,
    std::vector< int > & proc,
    std::vector< int > & offset,
    std::vector< int > & sizes,
    std::vector< pack_plan_3d< index >> & plans
)
Generates an unpack plan where the boxes and the destination do not have the same order.
This method does not make any MPI calls. It uses the set of boxes that define the current distribution of the indexes to compute the overlap and the proc, offset, and sizes vectors for the receive stage of an all-to-all-v communication pattern. In addition, it creates a set of unpack plans in which the order of the boxes and the order of the destination differ, so that unpacking transposes the data. The plans have to be used in conjunction with the transpose packer.
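The overlap computation described above can be sketched without MPI. The following is a simplified illustration, not the heFFTe implementation: `box_sketch`, `intersect`, and `overlap_map_sketch` are hypothetical names, the boxes carry no ordering information, ranks are visited in increasing order, and the transposing unpack plans are not modeled.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Hypothetical simplified box: inclusive index ranges, no ordering information.
struct box_sketch {
    std::array<int, 3> low, high;
    bool empty() const {
        return high[0] < low[0] || high[1] < low[1] || high[2] < low[2];
    }
    int count() const {
        return empty() ? 0
                       : (high[0] - low[0] + 1) * (high[1] - low[1] + 1) * (high[2] - low[2] + 1);
    }
};

// Intersection of two boxes; the result may be empty.
inline box_sketch intersect(box_sketch const &a, box_sketch const &b) {
    box_sketch r;
    for (int d = 0; d < 3; d++) {
        r.low[d] = std::max(a.low[d], b.low[d]);
        r.high[d] = std::min(a.high[d], b.high[d]);
    }
    return r;
}

// For each box that overlaps the destination, record the owning rank,
// the offset into a contiguous receive buffer, and the number of entries.
inline void overlap_map_sketch(box_sketch const &destination,
                               std::vector<box_sketch> const &boxes,
                               std::vector<int> &proc,
                               std::vector<int> &offset,
                               std::vector<int> &sizes) {
    int running = 0; // next free position in the receive buffer
    for (int r = 0; r < static_cast<int>(boxes.size()); r++) {
        box_sketch const overlap = intersect(destination, boxes[r]);
        if (overlap.empty()) continue; // nothing to receive from this rank
        proc.push_back(r);
        offset.push_back(running);
        sizes.push_back(overlap.count());
        running += overlap.count();
    }
}
```

Ranks whose boxes do not intersect the destination contribute no entries, so the three output vectors always have equal length and describe only the messages that would actually be received.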
std::unique_ptr<reshape3d_alltoall<location_tag, packer, index> > heffte::make_reshape3d_alltoall(
    typename backend::device_instance< location_tag >::stream_type q,
    std::vector< box3d< index >> const & input_boxes,
    std::vector< box3d< index >> const & output_boxes,
    bool uses_gpu_aware,
    MPI_Comm const comm
)
Factory method that performs all the necessary work to establish the communication patterns.
The purpose of the factory method is to isolate the initialization code and ensure that the internal state of the class is minimal and const-correct, i.e., objects do not hold onto data that will not be used in a reshape apply and the data is labeled const to prevent accidental corruption.
location_tag | the location for the input/output buffers for the reshape operation, tag::cpu or tag::gpu |
packer | the packer used to pack parts of boxes into the global send/recv buffer |
q | device stream |
input_boxes | list of all input boxes across all ranks in the comm |
output_boxes | list of all output boxes across all ranks in the comm |
uses_gpu_aware | use MPI calls directly from the GPU (GPU backends only) |
comm | the communicator associated with all the boxes |
Note: the input and output boxes associated with this rank are located at position mpi::comm_rank() in the respective lists.
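The factory-method rationale given above (setup code kept outside the class, minimal and const state inside it) can be shown in a small generic sketch. `plan_sketch` and `make_plan_sketch` are hypothetical names invented for this illustration and do not correspond to heFFTe classes. The "setup work" here is just filtering out empty overlaps.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical reshape-like class: all state is const and set once by the factory.
class plan_sketch {
public:
    int num_messages() const { return static_cast<int>(send_sizes.size()); }
    int total_send() const {
        int total = 0;
        for (int s : send_sizes) total += s;
        return total;
    }
    friend std::unique_ptr<plan_sketch> make_plan_sketch(std::vector<int> const &overlaps);

private:
    // Private constructor: objects can only be created through the factory.
    explicit plan_sketch(std::vector<int> &&sizes) : send_sizes(std::move(sizes)) {}
    std::vector<int> const send_sizes;
};

// Factory: performs the setup work (here, dropping empty overlaps)
// and passes on only the data that later calls will use.
std::unique_ptr<plan_sketch> make_plan_sketch(std::vector<int> const &overlaps) {
    std::vector<int> sizes;
    for (int n : overlaps)
        if (n > 0) sizes.push_back(n);
    return std::unique_ptr<plan_sketch>(new plan_sketch(std::move(sizes)));
}
```

The temporary data used during setup never becomes a member, and the const member cannot be modified after construction, which is the property the description above refers to as const-correct internal state.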
std::unique_ptr<reshape3d_alltoallv<location_tag, packer, index> > heffte::make_reshape3d_alltoallv(
    typename backend::device_instance< location_tag >::stream_type q,
    std::vector< box3d< index >> const & input_boxes,
    std::vector< box3d< index >> const & output_boxes,
    bool use_gpu_aware,
    MPI_Comm const comm
)
Factory method that performs all the necessary work to establish the communication patterns.
The purpose of the factory method is to isolate the initialization code and ensure that the internal state of the class is minimal and const-correct, i.e., objects do not hold onto data that will not be used in a reshape apply and the data is labeled const to prevent accidental corruption.
location_tag | the location of the input/output buffers, tag::cpu or tag::gpu |
packer | the packer used to pack parts of boxes into the global send/recv buffer |
q | device stream |
input_boxes | list of all input boxes across all ranks in the comm |
output_boxes | list of all output boxes across all ranks in the comm |
use_gpu_aware | use MPI calls directly from the GPU (GPU backends only) |
comm | the communicator associated with all the boxes |
Note: the input and output boxes associated with this rank are located at position mpi::comm_rank() in the respective lists.
std::unique_ptr<reshape3d_pointtopoint<location_tag, packer, index> > heffte::make_reshape3d_pointtopoint(
    typename backend::device_instance< location_tag >::stream_type q,
    std::vector< box3d< index >> const & input_boxes,
    std::vector< box3d< index >> const & output_boxes,
    reshape_algorithm algorithm,
    bool use_gpu_aware,
    MPI_Comm const comm
)
Factory method that performs all the necessary work to establish the communication patterns.
The purpose of the factory method is to isolate the initialization code and ensure that the internal state of the class is minimal and const-correct, i.e., objects do not hold onto data that will not be used in a reshape apply and the data is labeled const to prevent accidental corruption.
location_tag | the tag for the input/output buffers, tag::cpu or tag::gpu |
packer | the packer used to pack parts of boxes into the global send/recv buffer |
q | device stream |
input_boxes | list of all input boxes across all ranks in the comm |
output_boxes | list of all output boxes across all ranks in the comm |
algorithm | must be either reshape_algorithm::p2p or reshape_algorithm::p2p_plined |
use_gpu_aware | use MPI calls directly from the GPU (GPU backends only) |
comm | the communicator associated with all the boxes |
Note: the input and output boxes associated with this rank are located at position mpi::comm_rank() in the respective lists.
std::unique_ptr<reshape3d_base<index> > heffte::make_reshape3d(
    typename backend::device_instance< typename backend::buffer_traits< backend_tag >::location >::stream_type stream,
    std::vector< box3d< index >> const & input_boxes,
    std::vector< box3d< index >> const & output_boxes,
    MPI_Comm const comm,
    plan_options const options
)
Factory method to create a reshape3d instance.
Creates a reshape operation from the geometry defined by the input boxes to the geometry defined by the output boxes. The boxes are spread across the given MPI communicator, and the boxes associated with the current MPI rank are located at input_boxes[mpi::comm_rank(comm)] and output_boxes[mpi::comm_rank(comm)].
Assumes that the order of the input and output geometries is consistent, i.e., input_boxes[i].order == input_boxes[j].order for all i, j.
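A caller that wants to verify the stated order assumption before creating a reshape could use a check along the following lines. This is a sketch under assumptions: `ordered_box_sketch` is a hypothetical reduced box that stores only a three-entry `order` array, and `same_order_sketch` is an illustrative helper, not a heFFTe function.

```cpp
#include <array>
#include <vector>

// Hypothetical reduced box that carries only the dimension order.
struct ordered_box_sketch {
    std::array<int, 3> order;
};

// True when every box in the list shares the order of the first box.
// An empty list is treated as trivially consistent.
inline bool same_order_sketch(std::vector<ordered_box_sketch> const &boxes) {
    for (auto const &b : boxes)
        if (b.order != boxes.front().order) return false;
    return true;
}
```

The check mirrors the condition input_boxes[i].order == input_boxes[j].order by comparing every box against the first one, which is equivalent to comparing all pairs.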