The serial build of netCDF 4.0 is relatively straightforward. The following commands allow a serial version of netCDF 4.0 to be compiled:
    make distclean
    export FC='ftn -O2'
    export F90='ftn -O2'
    export F95='ftn -O2'
    export CC='cc -O2'
    export CXX='CC -O2'
    export NM=nm
    export CPPFLAGS=-DpgiFortran
    export LOGFILE=build_netcdf4.0_noparallel.txt
    export CHECKFILE=check_netcdf4.0_noparallel.txt
    ./configure --enable-netcdf-4 \
      --with-hdf5=/work/n01/n01/fionanem/local/noparallel \
      --with-zlib=/work/n01/n01/fionanem/local \
      --with-szlib=/work/n01/n01/fionanem/local \
      --disable-cxx \
      --disable-parallel-tests \
      --prefix=/work/n01/n01/fionanem/local/noparallel &> $LOGFILE
The CPPFLAGS setting defines the pgiFortran preprocessor macro, which the PGI compiler suite requires; without it the build fails. The --disable-cxx option prevents the C++ API from being built; otherwise the build fails when attempting to link the shared libgcc_s library. The --disable-parallel-tests option ensures that the parallel components of the library and the testers are not built.
The make, make check and make install steps can all be executed on the login nodes, as no parallel elements are involved. The tester codes all pass without error.
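As a minimal sketch of this sequence, the three steps can be chained so that each runs only if the previous one succeeded. The helper names run_step and build_serial are hypothetical. Sending the make and make install output to LOGFILE and the make check output to CHECKFILE is an assumption based on the variable names exported above, not something the build requires.

```shell
#!/bin/bash
# Run one build step, appending its stdout and stderr to a log file.
# The function returns the exit status of the step.
run_step() {
    local log="$1"
    shift
    echo "== $* ==" >> "$log"
    "$@" >> "$log" 2>&1
}

# Hypothetical wrapper: make, make check, make install in order,
# stopping at the first failure. Requires LOGFILE and CHECKFILE to be set.
build_serial() {
    run_step "${LOGFILE:?}" make &&
    run_step "${CHECKFILE:?}" make check &&
    run_step "${LOGFILE:?}" make install
}
```

Calling build_serial from the top of the configured source tree would then run the whole serial build on a login node, leaving the output of each step in the two log files.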
Building the parallel version of netCDF 4.0 proved to be more problematic due to various cross-compilation issues. At several stages of the build process, executables are generated and then run. Unlike HDF5, netCDF 4.0 does not contain any environment settings for cross-compilation (e.g. the RUNSERIAL and RUNPARALLEL mentioned above). Parallel executables are invoked either via mpiexec, which is not valid on HECToR, or directly on the command line via ./exename. This means that any steps of the build process which run parallel executables must be extracted and run separately via the batch system.
The first such problem arises during the make, where ncgen is executed to generate the ctest.c and ctest64.c files. As ncgen is a parallel executable, it cannot run on the login nodes. The error message reported is:
    [unset]: _pmi_init: _pmi_preinit encountered an internal error
    Assertion failed in file /tmp/ulib/mpt/nightly/3.0/042108/xt/trunk/mpich2/.. .. src/mpid/cray/src/adi/mpid_init.c at line 119: 0
    aborting job:
The solution is to execute the two runs of ncgen on the backend via a batchscript and then to continue the make on the login node once the batch job has completed.
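A batch script for this step might be generated along the following lines. This is only a sketch: the two aprun commands are the ncgen invocations reported in this section, but the function name write_ncgen_job, the script name run_ncgen.pbs, the job name, and the PBS resource values (mppwidth, walltime) are assumptions that would need adapting to the local batch configuration. A budget code directive may also be required.

```shell
#!/bin/bash
# Sketch: write a PBS job script that runs the two ncgen commands on the
# compute nodes. Directive values and file names are assumptions.
write_ncgen_job() {
    local script="$1" nproc="${2:-1}"
    cat > "$script" <<EOF
#!/bin/bash
#PBS -N ncgen_ctest
#PBS -l mppwidth=${nproc}
#PBS -l walltime=00:05:00

# Run from the directory the job was submitted from,
# i.e. the directory in which make stopped.
cd \$PBS_O_WORKDIR

aprun -n ${nproc} ../ncgen/ncgen -c -o ctest0.nc ./../ncgen/c0.cdl > ./ctest.c
aprun -n ${nproc} ../ncgen/ncgen -v2 -c -o ctest0_64.nc ./../ncgen/c0.cdl > ./ctest64.c
EOF
}

# Write the job script for a single process; it is not submitted here.
write_ncgen_job run_ncgen.pbs 1
```

The generated script would be submitted with qsub from the directory in which the make stopped, and the make resumed on the login node once ctest.c and ctest64.c have been written.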
A similar problem occurs during the make check where 18 testers fail for the same reasons. The error messages are of the form:
     assertion: st == sizeof ident at file mptalps.c line 93, pid 25085
    FAIL: tst_dims
     assertion: st == sizeof ident at file mptalps.c line 93, pid 25090
    FAIL: tst_files
    ...
    Testing parallel I/O with HDF5... SUCCESS!!!
    PASS: run_par_tests.sh
    =========================================
    18 of 36 tests failed
    Please report to email@example.com
    =========================================
Again, the error occurs because the tester codes are parallel (i.e. contain MPI calls) and cannot run on the login nodes of HECToR. As before, the solution is to run these eighteen testers on the backend via a batchscript.
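The list of testers to rerun can be taken from the make check output rather than typed by hand. The sketch below assumes the log contains lines of the form "FAIL: name" starting in the first column, as in the output quoted above. The function names failed_tests and aprun_lines are hypothetical, and the default of one process per tester is an assumption. Each aprun line must be run from the directory that holds the corresponding tester, which the log excerpt does not identify.

```shell
#!/bin/bash
# Print the names of the testers reported as failed in a make check log.
failed_tests() {
    sed -n 's/^FAIL: *//p' "$1"
}

# Print one aprun command per failed tester, for pasting into a batch script.
# The second argument is the number of processes (default 1, an assumption).
aprun_lines() {
    local log="$1" nproc="${2:-1}"
    failed_tests "$log" | while read -r tester; do
        echo "aprun -n ${nproc} ./${tester}"
    done
}

# Demonstration on a short excerpt of the log shown above.
sample=$(mktemp)
printf '%s\n' 'FAIL: tst_dims' 'FAIL: tst_files' 'PASS: run_par_tests.sh' > "$sample"
aprun_lines "$sample"
# prints:
#   aprun -n 1 ./tst_dims
#   aprun -n 1 ./tst_files
rm -f "$sample"
```

Generating the commands from the log keeps the batch script in step with whatever make check actually reported, which matters when the set of failing testers changes between configurations.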
The flags used to compile the parallel version of netCDF are summarised below:
    make distclean # Ensure we start with a clean install
    export FC='ftn -O2'
    export F90='ftn -O2'
    export F95='ftn -O2'
    export CC='cc -O2'
    export CXX='CC -O2'
    export NM=nm
    export CPPFLAGS=-DpgiFortran
    export LOGFILE=build_netcdf4.0_parallel.txt
    export CHECKFILE=check_netcdf4.0_parallel.txt
    ./configure --enable-netcdf-4 \
      --with-hdf5=/work/n01/n01/fionanem/local/parallel \
      --with-zlib=/work/n01/n01/fionanem/local \
      --with-szlib=/work/n01/n01/fionanem/local \
      --disable-cxx \
      --enable-parallel-tests \
      --prefix=/work/n01/n01/fionanem/local/parallel &> $LOGFILE
The CPPFLAGS setting and the --disable-cxx option are as described for the serial installation. The --enable-netcdf-4 option ensures that the netCDF 4.0 features are enabled. The --enable-parallel-tests option ensures that the parallel tests are executed.
After configuration completes the procedure for compiling and testing the parallel version of netCDF 4.0 is as follows:
    aprun -n $NPROC ../ncgen/ncgen -c -o ctest0.nc ./../ncgen/c0.cdl > ./ctest.c
    aprun -n $NPROC ../ncgen/ncgen -v2 -c -o ctest0_64.nc ./../ncgen/c0.cdl > ./ctest64.c
The serial and parallel tester codes all run successfully, confirming that both the serial and parallel versions of netCDF 4.0 have been installed correctly.