For the parallel benchmark, we chose a larger molecular system, buckminsterfullerene (C60). We ran the calculations on HECToR (phase 2a) for 5 iterations of the Davidson solver, using 16 cores as the baseline; the results are shown in Figure 1 below. The runs used the prototype shared-memory extension written by Chris Armstrong (NAG) under a core CSE call. With 4-way SMP, a parallel efficiency of approximately 80% was achieved on 256 processing elements. In initial tests on HECToR phase 2b, calculation times were comparable only when using 4 cores per node (one per die), again with 4-way SMP.
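For readers who want to reproduce the efficiency figure from their own timings, the following is a minimal sketch of how parallel efficiency relative to a non-serial baseline is commonly computed. It assumes efficiency is defined as measured speedup divided by ideal speedup with respect to the 16-core run; the function name and the wall-clock times are hypothetical, chosen only so that the result matches the roughly 80% quoted above, and are not measurements from this work.

```python
def parallel_efficiency(t_base, n_base, t_n, n):
    """Parallel efficiency of a run on n cores relative to a baseline on n_base cores.

    t_base, t_n: wall-clock times of the baseline run and the n-core run.
    """
    speedup = t_base / t_n   # measured speedup over the baseline
    ideal = n / n_base       # speedup expected under perfect scaling
    return speedup / ideal


# Hypothetical timings in seconds (illustrative only, not measured data)
t_16 = 1000.0    # baseline run on 16 cores
t_256 = 78.125   # run on 256 cores

eff = parallel_efficiency(t_16, 16, t_256, 256)
print(f"Parallel efficiency on 256 cores: {eff:.2f}")
```

With a 16-core baseline, perfect scaling to 256 cores corresponds to a 16-fold speedup, so an efficiency of 80% means the 256-core run is about 12.8 times faster than the baseline.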
We plan to merge our TDDFT code with the main CASTEP branch in the near future. Minor modifications will be required to support band parallelism. Improvements to the parallel scaling can be expected to be in line with those for the ground-state calculation.