As mentioned earlier, the 3D FFT is a key component of Quantum Espresso, and is implemented as a set of modules that are shared between several of the executables. In particular, PW.x and PH.x both make use of the same FFT code, so optimisations made here will benefit both.
At a high level, the FFT grids are distributed as slices (planes) in real space, and as columns (rays) in Fourier space. The reason for the column distribution in Fourier space (a layout often used in a full 2D decomposition) is that not every column will necessarily have the same number (or any) of non-zero Fourier coefficients. Because the grid is divided into columns, the number of columns will typically be much larger than the number of MPI processes, and hence the columns can be distributed among the processes to balance the number of non-zero Fourier coefficients per process. A full description of this technique is presented by Giannozzi et al., 2004. Quantum Espresso maintains two grids of differing resolution, with the finer grid typically containing around twice as many grid points as the coarse grid. However, many more coarse grid FFTs are performed each timestep.
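The load-balancing idea described above can be illustrated with a short sketch. This is not the algorithm used in Quantum Espresso (see Giannozzi et al., 2004 for that); it is a generic greedy heuristic, and the function name `balance_columns` and its arguments are hypothetical. It assumes only that the number of non-zero coefficients in each column is known in advance.

```python
import heapq

def balance_columns(nonzeros_per_column, nproc):
    """Assign columns to processes so non-zero counts are roughly even.

    Hypothetical illustration: greedy "largest first" assignment, where
    each column goes to the currently least-loaded process.
    """
    # Min-heap of (current load, rank); the least-loaded rank is popped first.
    heap = [(0, rank) for rank in range(nproc)]
    heapq.heapify(heap)

    owner = [None] * len(nonzeros_per_column)
    # Visit columns from most to fewest non-zero coefficients.
    order = sorted(range(len(nonzeros_per_column)),
                   key=lambda col: -nonzeros_per_column[col])
    for col in order:
        load, rank = heapq.heappop(heap)
        owner[col] = rank
        heapq.heappush(heap, (load + nonzeros_per_column[col], rank))

    # Total non-zero coefficients held by each process.
    loads = [0] * nproc
    for col, rank in enumerate(owner):
        loads[rank] += nonzeros_per_column[col]
    return owner, loads

if __name__ == "__main__":
    # Eight columns, two of which hold no non-zero coefficients at all.
    counts = [9, 7, 6, 5, 4, 3, 0, 0]
    owner, loads = balance_columns(counts, nproc=2)
    print("owner of each column:", owner)
    print("load per process:", loads)
```

Because there are many more columns than processes, even a simple heuristic of this kind has enough freedom to even out the per-process totals, which would not be possible if whole planes had to be assigned instead.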
As a result of this decomposition, each 3D FFT consists of a 2D FFT performed on local data in planes, a global transpose using MPI_Alltoallv across all processes, followed by a 1D FFT on the newly localised data in columns (or vice versa for the inverse transform).
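The three stages above can be checked with a small serial sketch. It uses NumPy rather than the Quantum Espresso FFT modules, and it emulates the processes with array blocks: the in-memory reshuffle stands in for the MPI_Alltoallv transpose, and the columns are split into equal blocks rather than load balanced. The function name `fft3d_slab`, the `grid[z, y, x]` layout and the `nproc` argument are assumptions made for illustration.

```python
import numpy as np

def fft3d_slab(grid, nproc):
    """Forward 3D FFT of grid[z, y, x], staged as 2D FFT, transpose, 1D FFT.

    Serial emulation only: 'nproc' blocks play the role of MPI processes
    (assumes nproc <= number of z-planes).
    """
    nz, ny, nx = grid.shape

    # Real space: each emulated process owns a block of z-planes.
    slabs = np.array_split(grid, nproc, axis=0)

    # Stage 1: 2D FFT over (y, x) on the locally held planes.
    slabs = [np.fft.fft2(s, axes=(1, 2)) for s in slabs]

    # Stage 2: global transpose (stand-in for MPI_Alltoallv).
    # Afterwards each row is one complete z-column, and the columns
    # are divided among the emulated processes.
    planes = np.concatenate(slabs, axis=0)
    columns = planes.reshape(nz, ny * nx).T      # shape (ny*nx, nz)
    col_blocks = np.array_split(columns, nproc, axis=0)

    # Stage 3: 1D FFT along z on the locally held columns.
    col_blocks = [np.fft.fft(c, axis=1) for c in col_blocks]

    # Gather back to grid[z, y, x] layout so the result can be compared.
    columns = np.concatenate(col_blocks, axis=0)
    return columns.T.reshape(nz, ny, nx)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = rng.standard_normal((6, 5, 4)) + 1j * rng.standard_normal((6, 5, 4))
    staged = fft3d_slab(grid, nproc=3)
    # The staged transform should agree with a direct 3D FFT.
    print(np.allclose(staged, np.fft.fftn(grid)))
```

The sketch works because the 3D transform is separable: transforming each plane and then each column gives the same result as a single 3D FFT, so the only communication needed is the one transpose between the two stages.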