HECToR

Tensor Manipulation and Storage

Tensor Network Theory (TNT) provides efficient and highly accurate algorithms for simulating strongly correlated quantum systems. The corresponding numerical algorithms give approximate descriptions of many-body states, and of the linear operators acting on them, whose size does not grow exponentially with system size, in contrast to exact descriptions. While TNT algorithms are efficient, they are numerically demanding and require optimised, parallelised high-performance implementations. A TNT library is currently being developed at the University of Oxford to give users access to these complex algorithms from their own high-level codes.
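The difference between exponential and polynomial growth can be made concrete by counting stored numbers. The sketch below assumes a matrix product state (MPS), a common tensor network for one-dimensional chains, with local dimension `d` and maximum bond dimension `chi`; these symbols and function names are illustrative assumptions, not part of the TNT library.

```python
def exact_state_size(n_sites: int, d: int) -> int:
    """Number of amplitudes in an exact state of n_sites, each with local dimension d."""
    return d ** n_sites


def mps_size(n_sites: int, d: int, chi: int) -> int:
    """Upper bound on MPS parameters: each site tensor holds at most d * chi * chi entries."""
    return n_sites * d * chi * chi


if __name__ == "__main__":
    print(exact_state_size(100, 2))  # 1267650600228229401496703205376 (2**100)
    print(mps_size(100, 2, 64))      # 819200
```

For 100 spin-1/2 sites the exact description grows as 2^N, while the MPS bound grows only linearly in N for a fixed bond dimension.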

This project is concerned with optimising and parallelising those parts of the TNT library that involve heavy computations, to enable important quantum effects in many-body systems to be studied. The initial objectives were:

Improving the storage of matrices through more efficient memory usage, and introducing parallelisation using OpenMP.
Developing more efficient and scalable implementations of the core functions of the TNT algorithms, which are the most computationally demanding parts; this mainly concerns the contraction and SVD operations.
Incorporating symmetry information so that tensors can be decomposed into sub-blocks, which can then be assigned to individual MPI processes.
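The symmetry-based block decomposition can be illustrated with a conserved U(1) charge (for example, particle number). An operator that conserves the charge is block diagonal in the charge sectors, so a matrix product splits into independent per-sector products that could each be handed to a different MPI process. The serial sketch below assumes that setting; the dictionary-of-blocks layout and the round-robin `assign_blocks` helper are hypothetical and do not reflect the TNT library's own data structures or MPI code.

```python
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of a (m x k) and b (k x n)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def block_matmul(a: Dict[int, Matrix], b: Dict[int, Matrix]) -> Dict[int, Matrix]:
    """Multiply two block-diagonal matrices stored as {charge: block}.
    Each sector is independent; sectors missing from either operand are zero and omitted."""
    return {q: matmul(a[q], b[q]) for q in a if q in b}


def assign_blocks(blocks: Dict[int, Matrix], n_ranks: int) -> Dict[int, List[int]]:
    """Hypothetical round-robin assignment of charge sectors to MPI ranks."""
    owners: Dict[int, List[int]] = {r: [] for r in range(n_ranks)}
    for i, q in enumerate(sorted(blocks)):
        owners[i % n_ranks].append(q)
    return owners


if __name__ == "__main__":
    a = {0: [[1.0]], 1: [[1.0, 2.0], [3.0, 4.0]]}
    b = {0: [[2.0]], 1: [[1.0, 0.0], [0.0, 1.0]]}
    print(block_matmul(a, b))  # {0: [[2.0]], 1: [[1.0, 2.0], [3.0, 4.0]]}
    print(assign_blocks(a, 2))  # {0: [0], 1: [1]}
```

Round-robin is the simplest possible assignment; with sectors of very different sizes, a cost-aware scheme would balance the work better.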

On completion of this project the main achievements may be summarised as follows:

The efficiency of the core functions of the TNT library was improved by implementing better storage and reuse methods for the reshape operation and for the matrix-matrix calculations in the contract operation. OpenMP parallelism was developed for the reshape operation, and good scalability can now be achieved with 8 threads.
Relative to the performance of the original code, a 10x speedup can now be achieved for the reshape operation and a 20x speedup for the contract operation.
The performance of the SVD operation was improved by introducing a tolerance below which values may be treated as zero. This enables a blockwise approach in which several smaller SVDs and matrix multiplications are performed instead of a single large one, and now gives a representative 20x speedup.
For optimum tensor storage, a distributed-memory approach to the network was implemented. It uses non-blocking MPI with dynamic load balancing of the network partitioning, which enables up to half the sites in a network to be processed concurrently.
MPI data decompositions for the SVD and matrix-matrix multiply operations have also been introduced.
Distributing the network with MPI and the blocks with OpenMP maximises parallel efficiency and gives better scope for future scaling to larger problems.
The new code was benchmarked on the Theoretical Physics and Atomic and Laser Physics' cluster and Arcus cluster, both at the University of Oxford, and also on HECToR.
All developments have been incorporated into the CCPForge repository of the TNT library.
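A common way to implement a tensor contraction, consistent with the reshape and matrix-matrix steps mentioned above, is to permute each operand so that the contracted index is last in one and first in the other, and then perform a single matrix-matrix multiplication. The pure-Python sketch below illustrates that pattern and caches the permutation index map so that repeated reshapes with the same shape and permutation reuse it. The function names are hypothetical and this is not the TNT library's implementation; a production code would call an optimised BLAS routine rather than the explicit loops used here.

```python
from functools import lru_cache
from itertools import product
from typing import List, Sequence, Tuple


@lru_cache(maxsize=None)
def permutation_map(shape: Tuple[int, ...], perm: Tuple[int, ...]) -> Tuple[int, ...]:
    """Source offset of each element of the permuted row-major tensor.
    Cached, so reshapes with the same shape and permutation reuse the map."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    new_shape = [shape[p] for p in perm]
    # New axis k is old axis perm[k], so its index contributes idx[k] * strides[perm[k]].
    return tuple(sum(idx[k] * strides[perm[k]] for k in range(len(perm)))
                 for idx in product(*(range(n) for n in new_shape)))


def permute(data: Sequence[float], shape: Tuple[int, ...], perm: Tuple[int, ...]) -> List[float]:
    return [data[i] for i in permutation_map(shape, perm)]


def contract(a: Sequence[float], a_shape: Tuple[int, ...], ax_a: int,
             b: Sequence[float], b_shape: Tuple[int, ...], ax_b: int):
    """Contract axis ax_a of tensor a with axis ax_b of tensor b (hypothetical helper)."""
    if a_shape[ax_a] != b_shape[ax_b]:
        raise ValueError("contracted dimensions must match")
    perm_a = tuple(i for i in range(len(a_shape)) if i != ax_a) + (ax_a,)
    perm_b = (ax_b,) + tuple(i for i in range(len(b_shape)) if i != ax_b)
    a_p, b_p = permute(a, a_shape, perm_a), permute(b, b_shape, perm_b)
    k = a_shape[ax_a]
    m, n = len(a) // k, len(b) // k
    # Row-major permuted data is already an (m x k) and a (k x n) matrix: no copy needed.
    out = [0.0] * (m * n)
    for i in range(m):
        for p in range(k):
            aip = a_p[i * k + p]
            for j in range(n):
                out[i * n + j] += aip * b_p[p * n + j]
    out_shape = tuple(a_shape[i] for i in perm_a[:-1]) + tuple(b_shape[i] for i in perm_b[1:])
    return out, out_shape


if __name__ == "__main__":
    print(contract([1, 2, 3, 4], (2, 2), 1, [5, 6, 7, 8], (2, 2), 0))
    # ([19.0, 22.0, 43.0, 50.0], (2, 2))
```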
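The tolerance-based blockwise SVD can be motivated as follows: once entries at or below the tolerance are treated as zero, the rows and columns of a matrix may fall into groups that share no significant entries. Each group is an independent block, and the singular values of the whole matrix are the union of the singular values of the blocks, so several small SVDs can replace one large one. The sketch below finds those blocks with a union-find over rows and columns; it is one possible way to realise the idea, and `independent_blocks` is a hypothetical name rather than part of the TNT library.

```python
from typing import Dict, List, Tuple


def independent_blocks(m: List[List[float]], tol: float) -> List[Tuple[List[int], List[int]]]:
    """Group rows and columns of m into independent blocks, treating |m[i][j]| <= tol as zero.
    A row and a column are linked when a significant entry joins them (bipartite union-find)."""
    n_rows, n_cols = len(m), len(m[0])
    parent = list(range(n_rows + n_cols))  # nodes 0..n_rows-1 are rows, the rest are columns

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n_rows):
        for j in range(n_cols):
            if abs(m[i][j]) > tol:
                parent[find(i)] = find(n_rows + j)

    groups: Dict[int, Tuple[List[int], List[int]]] = {}
    for node in range(n_rows + n_cols):
        rows, cols = groups.setdefault(find(node), ([], []))
        if node < n_rows:
            rows.append(node)
        else:
            cols.append(node - n_rows)
    # An all-zero row or column contributes only zero singular values, so drop it.
    return [(r, c) for r, c in groups.values() if r and c]


if __name__ == "__main__":
    m = [[1.0, 1e-14, 0.0],
         [0.0, 0.0, 2.0],
         [3.0, 0.0, 0.0]]
    print(independent_blocks(m, 1e-12))  # [([0, 2], [0]), ([1], [2])]
```

Each returned (rows, columns) pair selects a submatrix that could be passed to a dense SVD routine on its own, or to a different thread or process.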
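The summary does not describe how the dynamic load balancing is implemented. The sketch below shows one generic form of it, a greedy list-scheduling simulation in which each unit of work (for example, an update on one part of the network) is handed to whichever process becomes idle first, as a coordinating process would do when workers report completion. The task costs and the `dynamic_schedule` function are hypothetical and do not reproduce the TNT library's non-blocking MPI implementation.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def dynamic_schedule(costs: Sequence[float], n_workers: int) -> Tuple[Dict[int, List[int]], float]:
    """Simulate dynamic load balancing: tasks are dispatched in order, each to the worker
    that becomes free earliest. Returns the task assignment and the overall finish time."""
    idle = [(0.0, w) for w in range(n_workers)]  # (time the worker becomes free, worker id)
    heapq.heapify(idle)
    assignment: Dict[int, List[int]] = {w: [] for w in range(n_workers)}
    for task, cost in enumerate(costs):
        free_at, w = heapq.heappop(idle)
        assignment[w].append(task)
        heapq.heappush(idle, (free_at + cost, w))
    makespan = max(t for t, _ in idle)
    return assignment, makespan


if __name__ == "__main__":
    print(dynamic_schedule([4.0, 1.0, 1.0, 1.0, 1.0], 2))
    # ({0: [0], 1: [1, 2, 3, 4]}, 4.0)
```

With the same costs, a static round-robin split would give worker 0 tasks 0, 2 and 4 (total cost 6), so the dynamic scheme finishes earlier when task costs are uneven.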

Please see the PDF or HTML version of the report that summarises this project.
