
Architecture

The scalar XT4 component comprises 1416 compute blades, each with 4 processor sockets. Each socket holds a single dual-core processor and is referred to as a node, giving 5664 nodes and a total of 11,328 cores that can be accessed independently. The processor is a 2.8 GHz AMD Opteron, and the two cores of each node share 6 GB of memory. At 5.6 Gflops per core (derived below), the theoretical peak performance of the system is 11,328 × 5.6 Gflops ≈ 63.4 Tflops.
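The system totals follow from simple multiplication of the per-blade and per-node figures. The minimal Python sketch below recomputes them; the variable names are illustrative. The per-core memory figure is only an average, because the 6 GB is shared by both cores of a node rather than partitioned between them.

```python
# Per-blade and per-node figures from the text
BLADES = 1416
SOCKETS_PER_BLADE = 4      # each socket holds one dual-core processor (one node)
CORES_PER_NODE = 2
MEMORY_PER_NODE_GB = 6     # shared by both cores of the node

nodes = BLADES * SOCKETS_PER_BLADE
cores = nodes * CORES_PER_NODE
total_memory_gb = nodes * MEMORY_PER_NODE_GB
# Average only: the node's memory is shared, not split between the cores
avg_memory_per_core_gb = MEMORY_PER_NODE_GB / CORES_PER_NODE

print(f"nodes:                 {nodes}")                       # 5664
print(f"cores:                 {cores}")                       # 11328
print(f"total memory:          {total_memory_gb} GB")          # 33984 GB
print(f"average memory / core: {avg_memory_per_core_gb} GB")   # 3.0 GB
```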

Each AMD Opteron core has a floating point addition unit and a floating point multiplication unit. These units are independent, so an addition and a multiplication can execute simultaneously, and each unit can complete one double precision floating point operation per cycle. At a clock speed of 2.8 GHz this gives a theoretical peak performance of 2 × 2.8 = 5.6 Gflops per core, or 11.2 Gflops per dual-core node, for double precision floating point operations.
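Because the peak relies on both units being busy every cycle, code that performs mostly additions (or mostly multiplications) can reach at most half of it. The Python sketch below turns this into a compute-bound upper limit for a given mix of additions and multiplications. It is a simplified model under stated assumptions: one addition and one multiplication complete per cycle on separate units, while memory traffic, instruction dependencies and all other instructions are ignored. The function name `attainable_gflops` is hypothetical.

```python
CLOCK_GHZ = 2.8          # Opteron clock speed from the text
CORES_PER_NODE = 2       # one dual-core processor per node

def attainable_gflops(adds, muls, clock_ghz=CLOCK_GHZ):
    """Compute-bound upper limit (Gflops per core) for a loop body containing
    the given numbers of double precision additions and multiplications.

    Assumes one addition and one multiplication can complete per cycle on
    separate units; ignores memory traffic, dependencies and other instructions.
    """
    if adds < 0 or muls < 0 or adds + muls == 0:
        raise ValueError("need a non-negative, non-empty operation mix")
    cycles = max(adds, muls)          # the busier unit determines the cycle count
    return (adds + muls) / cycles * clock_ghz

peak_per_core = attainable_gflops(1, 1)          # balanced mix: both units busy
peak_per_node = peak_per_core * CORES_PER_NODE

print(f"peak per core:         {peak_per_core:.1f} Gflops")            # 5.6
print(f"peak per node:         {peak_per_node:.1f} Gflops")            # 11.2
print(f"additions only:        {attainable_gflops(1, 0):.1f} Gflops")  # 2.8
print(f"x + y*z + w (2a + 1m): {attainable_gflops(2, 1):.1f} Gflops")  # 4.2
```

Real kernels usually fall well short of these figures, since memory bandwidth and dependency chains are left out of the model.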

The caches on each core are private; unlike many systems, HECToR has no shared caches. Each core has separate 64 kB level 1 data and instruction caches, each 2-way set associative. The level 2 cache is a unified (combined data and instruction) 16-way set associative cache of 1 MB. Both the level 1 and level 2 caches use 64-byte cache lines, equivalent to eight double precision words. The level 2 cache acts as a victim cache for the level 1 cache: lines evicted from the level 1 cache are placed into the level 2 cache.
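The cache geometry above determines how addresses map to sets. A 64 kB, 2-way cache with 64-byte lines has 64 × 1024 / (2 × 64) = 512 sets, so addresses 32 kB apart share a level 1 set. The 1 MB, 16-way level 2 cache has 1024 sets. The Python sketch below models this together with the victim behaviour. It rests on simplifying assumptions that are not stated in this document: LRU replacement within each set, only the level 1 data cache is modelled, addresses are used directly as physical addresses, and the level 2 cache is treated as exclusive (a line lives in level 1 or level 2, not both, as is commonly described for this Opteron generation). Lines evicted from level 2 are simply dropped. The class names are hypothetical.

```python
from collections import OrderedDict

LINE_BYTES = 64      # cache line size for both levels
DOUBLE_BYTES = 8     # so one line holds 64 // 8 = 8 doubles

class SetAssocCache:
    """Set-associative cache of line numbers with LRU replacement (assumed)."""
    def __init__(self, size_bytes, ways, line_bytes=LINE_BYTES):
        self.ways = ways
        self.num_sets = size_bytes // (ways * line_bytes)
        # One OrderedDict per set; iteration order runs from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def _set(self, line):
        return self.sets[line % self.num_sets]

    def contains(self, line):
        return line in self._set(line)

    def touch(self, line):
        self._set(line).move_to_end(line)       # mark as most recently used

    def remove(self, line):
        self._set(line).pop(line, None)

    def insert(self, line):
        """Insert a line; return the evicted line number, or None."""
        s = self._set(line)
        if line in s:
            s.move_to_end(line)
            return None
        victim = None
        if len(s) >= self.ways:
            victim, _ = s.popitem(last=False)   # evict the least recently used line
        s[line] = True
        return victim

class VictimHierarchy:
    """L1 data cache backed by an exclusive L2 victim cache (simplified model)."""
    def __init__(self):
        self.l1 = SetAssocCache(64 * 1024, ways=2)
        self.l2 = SetAssocCache(1024 * 1024, ways=16)

    def access(self, addr):
        """Return where the line holding byte address addr was found."""
        line = addr // LINE_BYTES
        if self.l1.contains(line):
            self.l1.touch(line)
            return "L1"
        if self.l2.contains(line):
            self.l2.remove(line)     # exclusive: the line moves back up to L1
            where = "L2"
        else:
            where = "memory"
        victim = self.l1.insert(line)
        if victim is not None:
            self.l2.insert(victim)   # L1 victims are placed into L2
        return where

h = VictimHierarchy()
stride = h.l1.num_sets * LINE_BYTES   # 32 kB: these addresses share one L1 set
print(h.l1.num_sets, h.l2.num_sets)   # 512 1024
print([h.access(a) for a in (0, stride, 2 * stride)])  # ['memory', 'memory', 'memory']
print(h.access(0))                    # L2
```

With only two ways, a third address mapping to the same set evicts the first line. In this model the next access to that line hits in level 2 instead of going to memory, which is the benefit of the victim arrangement.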
