The W216 test case was used to benchmark these changes. The table below gives the runtimes on a range of processor counts on HECToR and the resulting speedup: