Welcome to HECToRNews 5, August 2009
- New Training Courses
- Quad Core HECToR
- Programming Environment issues
- Distributed Support
This is the fifth newsletter for HECToR users from the Computational Science and Engineering (CSE) support team of NAG Ltd. The HECToR newsletter aims to keep users updated with useful information on the national supercomputing service. For previous issues please see here.
In this issue we have information on HECToR-related training courses, the hardware upgrade to phase 2a, general points regarding the HECToR programming environment and information on the distributed CSE support service.
New Training Courses
We are pleased to announce several new HECToR courses running in the next few months:
Quad Core Training - With the recent upgrade of the HECToR service to a system based upon AMD's Barcelona quad core chips, users have new opportunities for exploiting the system but also face new challenges in getting the best performance. This two-day course describes the new architecture in detail and examines its impact on performance using the profiling tools available on HECToR. We shall also examine mixed-mode OpenMP/MPI as a programming paradigm and briefly introduce System V shared memory segments.
Parallel I/O - Input and Output (I/O) is often an under-considered part of a code but can severely limit its scalability. This course will present MPI-IO which allows a program to read and write to a single file from multiple processes. We will also take a look at the NetCDF and HDF5 libraries.
Best Practice in HPC Software Development - The course is designed for those with parallel programming experience who are embarking on a major software development project. It is a five-day course and covers hardware, compilers and optimization, tools for the programmer including debugging and profiling, parallel I/O, testing and benchmarking code, and portability and maintainability issues. All aspects of the course will be backed up with hands-on exercises.
Core Algorithms for High Performance Scientific Computing - This course addresses the fact that many scientific calculations involve computational linear algebra and optimization. We will develop a solid grounding in the mathematics of these algorithms, discuss their efficient implementation in a range of standard libraries and demonstrate their effective use.
The Current Timetable
The current schedule is below:
- August 26, 2009 NAG Oxford - Introduction to HECToR
- September 7-9, 2009 University of Exeter - Fortran 95
- September 14-16, 2009 University of Exeter - Parallel Programming with MPI
- September 21, 2009 University of Exeter - Introduction to HECToR
- September 22-23, 2009 University of Exeter - OpenMP and Mixed-mode Programming
- September 28 - October 2, 2009 University of Warwick - Core Algorithms for High Performance Scientific Computing
- October 12-16, 2009 NAG Oxford - Best Practice in HPC Software Development
- October 19-23, 2009 NAG Manchester - Best Practice in HPC Software Development
- November 9-10, 2009 NAG Manchester - Quad Core Training
- November 11, 2009 NAG Manchester - DL_POLY
- December, 2009 (tba) Southern venue - Quad Core Training
These training courses run by NAG Ltd. are provided free of charge to HECToR users and UK academics whose work is covered by the remit of one of the participating research councils (EPSRC, NERC and BBSRC).
For more information on HECToR training, including the most up to date schedule, please see here, or contact [Email address deleted]
Quad Core HECToR
The main part of HECToR, namely the Cray XT4, has now entered phase 2a of its operational life. In June and July all of the original 5664 dual core AMD Opteron processors of the XT4 were replaced by quad core AMD Opteron processors.
The upgrade has increased HECToR's theoretical peak performance from around 60TF to over 200TF. The maximum job size running on the compute nodes is now 4096 nodes or 16,384 cores.
There are software performance implications arising from the upgrade. The main one is that the amount of memory available per core has decreased from under 3GB to under 2GB. Users' codes will have less memory available on a per core basis even though the total amount of memory available per node has increased from 6GB to 8GB. For more information on code performance considerations for quad core please see here.
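The per core figures follow from dividing each node's memory by its core count. A minimal sketch of that arithmetic in shell, assuming the nominal 6GB and 8GB node sizes quoted above. The function name mem_per_core_mb is illustrative and is not a HECToR tool, and the memory a code can actually use is somewhat lower than the nominal value (hence "under 3GB" and "under 2GB"):

```shell
#!/bin/sh
# Nominal memory per core in MB: node memory (GB) * 1024 / cores per node.
# mem_per_core_mb is an illustrative helper, not a HECToR tool.
mem_per_core_mb() {
    node_gb=$1
    cores=$2
    echo $(( node_gb * 1024 / cores ))
}

echo "dual core (6GB node): $(mem_per_core_mb 6 2) MB per core"
echo "quad core (8GB node): $(mem_per_core_mb 8 4) MB per core"
```

A code that needed close to 3GB per process before the upgrade will therefore no longer fit with one process on every core.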
Please also note that the Seastar2 interconnect, Lustre filesystem and X2 Vector machine remain unaltered by the upgrade. For more details on the accounting for phase 2a onwards please see here.
Updated User Documentation
Several good practice guides have recently been updated for the quad core upgrade. They range from getting started on HECToR to improving code performance and scaling. For further information on HECToR user documentation please see here.
Some users may find that their codes only scale up to the corresponding dual core node count. In such cases, where users are not fully utilising all cores per node, they will require more compute time and AUs. This is because the AU charging mechanism is still based upon a per node allocation. If a user finds that their code is at a disadvantage due to the quad core upgrade, then they are encouraged to seek help by contacting the HECToR helpdesk. The NAG CSE team will give advice on how the code might be able to get full benefit from the hardware upgrade. For further information on how to prepare your codes for the upgrade please see here.
Resumption of Accounting
In early July HECToR successfully completed a series of acceptance tests involving benchmark cases. The rest of July was also part of the acceptance testing, during which users were able to test and validate their codes' performance on quad core. In July utilisation of HECToR reached 90% or higher, far greater than the 60-70% average seen during normal user accounting.
However, please note that the acceptance testing trials are now successfully complete and job accounting resumed at 8:00am on Monday 3rd August.
Programming Environment Issues
Standard Output, Standard Error and Lost Files
Standard Output and Standard Error from jobs running on the compute nodes appear in files placed in the job's working directory at the end of the run.
Writing a lot of information to stdout/stderr (e.g. printing within a loop) can overload the filesystem used to hold this output before it is written to disk. Recommended practice is to redirect stdout/stderr in your job script as follows:
aprun -n $NPROC -N $NTASK ./myexe >& stdouterr
This way messages can be monitored as the job is progressing, rather than having to wait for the stdout/stderr files to appear at the end of the run. Please see the File management section of the HECToR user guide.
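The redirection can be tried without a compute node. A small sketch in which a hypothetical shell function, fake_job, stands in for the aprun command. It uses "> stdouterr 2>&1", which is the portable sh spelling of the ">&" shorthand used above:

```shell
#!/bin/sh
# fake_job is a hypothetical stand-in for "aprun ... ./myexe":
# it writes one line to stdout and one line to stderr.
fake_job() {
    echo "step 1 complete"
    echo "warning: example message" >&2
}

# Send both streams to one file in the working directory.
# "> stdouterr 2>&1" is the portable equivalent of ">& stdouterr".
fake_job > stdouterr 2>&1

# While a real job is running, the file can be followed as it grows:
#   tail -f stdouterr
cat stdouterr
```

Both messages end up in the single file stdouterr, which is why the job's own stderr file should then be empty.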
Following this practice, stdout/stderr files will still be generated, but the stdout file will contain only job information and the stderr file should be empty. In some situations stdout/stderr files cannot be delivered into the working directory because of a lack of quota or another problem with your environment.
In this case, lost files may be retrieved from the following locations:
/work/pbs-spool/login1/spool
/work/pbs-spool/login1/undelivered
/work/pbs-spool/login2/spool
/work/pbs-spool/login2/undelivered
Files in these locations count towards your quota, so please check them if you think you have less disk space than you should. Note that files left there will be purged after one month, on the assumption that you no longer require them.
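One way to check all four locations at once is a short loop. A sketch, assuming that lost files are owned by your own user name (this is an assumption about how the spool files are stored). The function name list_lost_files is illustrative:

```shell
#!/bin/sh
# List regular files owned by the current user under each directory given.
# list_lost_files is an illustrative helper; ownership by your own user
# name is an assumption about how the spool files are stored.
list_lost_files() {
    for dir in "$@"; do
        # Skip locations that do not exist on this machine.
        [ -d "$dir" ] || continue
        find "$dir" -type f -user "$(id -un)" 2>/dev/null
    done
}

list_lost_files \
    /work/pbs-spool/login1/spool \
    /work/pbs-spool/login1/undelivered \
    /work/pbs-spool/login2/spool \
    /work/pbs-spool/login2/undelivered
```

Any files listed can then be copied back to your working directory before the one month purge.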
If you need any assistance with setting up your output files, please contact the helpdesk.
Sharing of User Accounts
All PIs and users are reminded that they must only use their own account to access HECToR. Account sharing between users is NOT permitted.
Users should not share their password, nor use anyone else's password to access the system.
We are aware that some projects which maintain their own codes may need to share 'package' type accounts, for example where multiple members of the project need access to edit code.
Shared package accounts can be set up for team use and differ from a basic user account. These accounts can be accessed by 'su' only. Users cannot log in directly to these accounts. This facilitates an audit of exactly which registered user is using the account at any one time.
Any PIs who have a requirement to use such accounts should let the helpdesk know. The package account can then be configured so that the project adheres to the security policy.
Using all 4 cores per node
Please remember that if you wish to utilise all cores of a node you must set the PBS option mppnppn to 4. If you leave the value as 2, then you will only run on two cores per node, but you will be charged for the maximum available for the entire node which is now 4. These instructions along with the full explanation of what the quad core upgrade entails are available here.
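The cost of leaving mppnppn at 2 can be estimated with a little arithmetic. A sketch, assuming that charging is proportional to the number of nodes allocated, as described above. The function names and the 1024-task job are illustrative:

```shell
#!/bin/sh
CORES_PER_NODE=4   # quad core nodes after the phase 2a upgrade

# Nodes needed for a given task count and tasks per node (rounded up).
nodes_needed() {
    tasks=$1
    per_node=$2
    echo $(( (tasks + per_node - 1) / per_node ))
}

# Cores charged for: every allocated node counts as a full node.
cores_charged() {
    echo $(( $(nodes_needed "$1" "$2") * CORES_PER_NODE ))
}

echo "1024 tasks, mppnppn=2: $(nodes_needed 1024 2) nodes, $(cores_charged 1024 2) cores charged"
echo "1024 tasks, mppnppn=4: $(nodes_needed 1024 4) nodes, $(cores_charged 1024 4) cores charged"
```

In this sketch the same 1024-task job occupies twice as many nodes at two tasks per node, so it is charged for twice as many cores.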
NAG Fortran Compiler freely available for HECToR Users
The NAG Fortran Compiler is freely available to all HECToR users for checking that their Fortran code adheres to Fortran standards. Developed by NAG experts, the Fortran Compiler is robust, highly tested, and valued by developers all over the globe for its checking capabilities and detailed error reporting.
From a default login shell access the compiler on HECToR with:
module swap PrgEnv-pgi PrgEnv-gnu
module load Nag-f95
For the full documentation please see here. Additionally,
module load Nag-ftools
nag_tools
will invoke the NAG Fortran Tools. This application consists of a useful GUI for transforming and analysing Fortran 77 and Fortran 90/95 code.
Alternatively, you can download the NAG Fortran Compiler here. If you choose this route to use the compiler you'll need to have a NAG licence key. To request a key as a HECToR user please email firstname.lastname@example.org quoting NAG/HECToR Compiler download.
Harness the extensive numerical functionality of the NAG Library
NAG's extensive and highly regarded numerical libraries are also available to HECToR users. These include the NAG Fortran Library; the NAG Parallel Library, specifically developed for MPI programming; and the NAG SMP Library, which includes routines highly tuned for high performance computing systems.
You can access the NAG Fortran, Parallel, and SMP Libraries on HECToR with
module load xt-libnagfl, module load xt-libnagfd, module load xt-libnagfs respectively.
It has been discovered that certain types of NWChem runs can cause severe problems on HECToR. The following note applies only to users who compile their own version of the code. These users should set the following environment variables in their job submission script just before the aprun line.
export SHMEM_SWAP_BACKOFF=150
export CRAY_PORTALS_USE_BLOCKING_POLL=yes
aprun ....
These environment variables have already been added to the centrally installed version of NWChem on HECToR, which is accessed via module load nwchem.
Cray Compiler Environment
The Cray x86 compiler environment has been installed on HECToR on a trial basis and may become permanent if there is sufficient interest. It comes with full support for Unified Parallel C (UPC) and Co-Array Fortran (CAF) as well as Fortran and C compilers.
Use from a default login with module swap PrgEnv-pgi PrgEnv-cray
For more information on the Cray Fortran compiler please see here.
Users should be advised that SiCortex, the company behind Pathscale, recently ceased operating. The existing versions of Pathscale will continue to be available on the current HECToR architecture, but no bug fixes or upgrades will be provided. Users will be advised of any future changes. If there are any immediate concerns, please contact the helpdesk.
Casino 2.4 Now Available
Casino 2.4 is now available on HECToR. Version 2.3 is still the default, but users can now access the newer code with
module load casino/2.4
Instructions for use are documented on the user wiki.
CP2K Now Available on HECToR
CP2K is a freely available program to perform atomistic and molecular simulations of solid state, liquid, molecular and biological systems. It provides a general framework for different methods such as density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW), and classical pair and many-body potentials.
Access to the binaries and source is available for all users via the module environment i.e.
module load cp2k
Instructions for use are documented on the user wiki.
New Interactive Job Monitoring Web Page
A new interactive web page is available allowing you to easily monitor your jobs and visualise them running on HECToR (without logging on to the machine).
Access to the page does require you to enter your Service Administration (SAFE) credentials for security reasons.
There is a link to the above page from the main HECToR website (click on "Current jobs" in the left-hand menu).
Distributed Support
Distributed CSE support is also referred to as dCSE support. Funding is available to provide extended help with improving the performance of existing HECToR codes and developing high-performance algorithmic improvements for them. Support is also available to port new codes from other systems to HECToR. Awards to support proposed projects are assessed via an independent panel review.
Several projects are currently underway. Projects to improve the performance of the NEMO ocean modelling code and the density functional code CP2K have recently completed. Further information on the dCSE support service can be found here. The next application deadline is 28th September 2009. Priority will be given to projects that propose specialist support to address any computational effects of the recent quad core transition, and to those which can justify a reasonable AU saving for the wider HECToR community.
Applicants are advised to contact [Email address deleted] with a brief description of their proposed work beforehand. All applicants for the current round will be informed of the outcome of their proposals in early November. NAG staff are available to visit institutions to talk about this service. If you are interested in a visit please contact us at [Email address deleted]