next up previous contents
Next: Parallel Renderer Communication Up: Image Compositing Previous: Image Compositing   Contents


Compositor Performance

Figure [*] shows the performance of various compositing operations. The top row shows timings for one particular rank for an image of size $512^2$ pixels and $1024^2$ pixels. The rank chosen is the middle rank for each number of render processes (any arbitrary rank could be used). This gives a snapshot of what one particular render process is doing during image compositing. All times are totals for the entire image compositing operation. The Image exchange time is the total time the middle rank process spends exchanging screen-space images with other processes during the compositing operation. The Image blend time is the total time spent blending image pixels (using depth testing). The Image final gather time is time spent sending the blended sub-image to the single process (usually the first pstmpunode render process) that gathers the sub-images from all other render processes. Note that the gather operation writes the sub-images directly in to the final image buffer. The Total time is the sum of the previous three operations. Even the slowest compositing times correspond to a frame rate of 90fps for a $512^2$ window and 30fps for a $1024^2$ window.

Figure: Image compositing operations for two window sizes (top row). Final composited image send / recv through the MPI proxy (bottom row).
\resizebox{60mm}{!}{\includegraphics{Stats/comp/comp-512-plot.ps}} \resizebox{60mm}{!}{\includegraphics{Stats/comp/comp-1024-plot.ps}}
\resizebox{60mm}{!}{\includegraphics{Stats/comp/comp-finalsr-512-plot.ps}} \resizebox{60mm}{!}{\includegraphics{Stats/comp/comp-finalsr-1024-plot.ps}}

The second row shows the time required for the express process to receive the image through the MPI proxy, and the time for the single render process to send the image to the proxy. The sending time is small because this is occurring within the Cray MPI layer. The receiving time is significant and will limit the overall frame rate of the renderer. This image transfer occurs over the tcp/ip socket through which express and the MPI proxy process communicate.

We have added an OpenMP wrapper around the Paracomp library's core image compositing routine. This allows the blending time to be reduced in the compositing operation for large images. A trade-off is needed between the number of pstmpunode processes running on a node (usually two or four) against the number of OpenMP threads dedicated to image blending. However, given that the image transfer from the MPI proxy to express is the limiting factor, it is recommended that only one OpenMP thread be used. This allows the number of pstmpunode processes on a node (mppnppn) to be increased.


next up previous contents
Next: Parallel Renderer Communication Up: Image Compositing Previous: Image Compositing   Contents
George Leaver 2010-07-29