Migration of UM codes to the new NCI Fujitsu system (Raijin) is presently being tested. The following notes describe the adjustments needed on the new system.

Please read NCI's Raijin User Guide for details on how to set up your account and the general differences between Vayu and Raijin.

Tested jobs are available under the UMUI folder uakn. You can compare your own jobs against these, or copy and edit them to suit the configurations you're interested in.

A number of modules have been installed specifically for ACCESS users, providing programs used in running UM jobs and analysing their results. To access these modules for your own use, run the command:
module use ~access/modules

The modules we have available include:
  • fcm build system
  • rose configuration system
  • cylc run suite scheduler
  • iris climate data analysis library
  • cdat-lite climate data analysis library (uv-cdat is also available as a NCI module)
To see the full list run 'module avail'. Let the CMS team know if you'd like other modules installed.

ACCESS 1.3 / UM7.3


An example UM7.3 job for Raijin is available in the UMUI as sabqb

Utility Libraries

Module versions loaded by the UMUI for 7.3 jobs are:
intel-cc/12.1.9.293
intel-fc/12.1.9.293
openmpi/1.6.3
netcdf/4.2.1.1
gcom/4.4
oasis3/3 # included for completeness, AMIP doesn't call oasis but there are references to it in the code
fcm
um

Branches

The branch settings to build the UM are found in the UMUI panel 'FCM Configuration->FCM Options for Atmosphere'.
You will need to tell the build system what compiler settings to use on Raijin. Do this as follows:
  • Set UM_SVN_BIND to fcm:um_dev/Share/VN7.3_local_changes/src/configs/bindings
  • Set UM_CONTAINER to $UM_SVN_BIND/container.cfg@HEAD
Once this is done, set up the branches according to whether you are using an ACCESS model or the plain UM. Make sure you've set up a prebuild (see below); prebuilds significantly speed up the UM's build process.

UM7.3 (excluding ACCESS)


In order for the code changes needed on Raijin to be picked up by your jobs the following branch structure must be used:
  • Base Branch: fcm:um_tr, using default revision (vn7.3)
  • Branches:
    • fcm:um_dev/Share/VN7.3_local_changes, revision HEAD
    • Any user branches

ACCESS


ACCESS-based models can't use this structure. The ACCESS code was reformatted at some point in its history, and because of the way FCM merges work this produces merge conflicts when combined with local_changes and the trunk. Instead, these models should use the following structure:
  • Base Branch: fcm:um_br/pkg/Rel/ACCESS1.3, using revision 4896
  • Branches:
    • fcm:um_dev/Share/VN7.3/access1.3_local_changes/src, revision HEAD
    • Any ACCESS user branches

Other UMUI modifications

You will of course have to set the Target Machine to 'raijin'. You should also set the username and project of the job to the exact strings '$USER' and '$PROJECT'. This means the correct values get automatically picked up from the environment so that if someone else wants to use your job they don't have to change these settings again. Both of these settings are found in the 'User Information and Target Machine' UMUI section.

You should also check the hand edit and override files. Some of these introduce platform-specific settings which will cause errors when you try to run on Raijin. Things to watch out for are explicit paths (e.g. /data/projects/access doesn't exist on Raijin) and library versions that may not have been installed on the new machine. If you get build errors, check these files. You can also ask for help by emailing the helpdesk at climate_help@nf.nci.org.au.

A major difference between Vayu and Raijin is where ancillary data files are stored. Previously they were on a separate filesystem from the ~access directory; the two filesystems have now been combined. Ancillary and data files now reside in /projects/access/data (instead of /data/projects/access). You can easily convert a job to the new filesystem by adding the hand edit '~access/raijin/data-paths.sh', though if possible you should change the paths in the UMUI, since this makes it easier to find the files you're using.
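To find which hand edits or override files still refer to the old Vayu location, a search along the following lines may help. This is only a sketch: the function names and the directory you pass in are hypothetical, and the sed rewrite merely illustrates the path change described above. It is not the contents of '~access/raijin/data-paths.sh'.

```shell
# List files under a directory that still mention the old Vayu data path.
# Usage: find_vayu_paths DIR   (DIR is wherever you keep your hand edits)
find_vayu_paths() {
    grep -rl '/data/projects/access' "$1"
}

# Print a file with the old path rewritten to the Raijin location.
# The file itself is left untouched.
to_raijin_paths() {
    sed 's|/data/projects/access|/projects/access/data|g' "$1"
}
```

Running find_vayu_paths on your hand edit directory lists the files to review; to_raijin_paths shows what a file would look like after the change, so you can check it before editing anything.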

Other

  • CABLE has a hardcoded path to one of its ancils in the ACCESS1.3 source - this is fixed by the 'local_changes' branch
  • The ACCESS run depends on Oasis being available, which requires some changes to the build configuration. Add the file '~access/raijin/raijin-file.ovr' to the User Overrides in the Compilation section of the UMUI.
  • The 'match_solar_compile.sh' hand edit was originally designed so that results on Solar and Vayu matched exactly. It works by changing the modules that get loaded. Raijin is sufficiently different that this no longer works; symptoms include the ~/um_output directory not being created and errors from modules not being found. This hand edit should not be used on Raijin.
  • For ACCESS jobs the hand edit 'natforce.ed' depends on Vayu's directory structure. A version for Raijin is available as 'natforce-raijin.ed' in the same directory.

UM7.5


No major changes are required to run 7.5 jobs on Raijin. Make sure that the target machine is 'raijin' and that any FCM override files are disabled. If you are using ancillary files from /data/projects/access, add the '~access/raijin/data-paths.sh' hand edit.

UM8.2


UM 8.2 has been set up by CAWCR & should be ready to use

UM8.4/8.5


An example UM8.4 job for Raijin is available in the UMUI as sabqc

At this stage any 8.4 jobs are likely to have been imported from the Met Office. To run on Raijin:

  • In User Information -> Job Submission select the 'PBS Pro' method, use 'linux' as the target machine, don't change the programming environment and use 'linux-ifort-nci' as the machine config. Maximum walltime can be set by pressing the PBSPro button
(Image: 8.4-submission.png)

  • In FCM Configurations -> Options for Atmosphere change UM_SVN_BIND to 'fcm:um_dev/vn8.4/local_changes/src/configs/bindings' and UM_CONTAINER to '$UM_SVN_BIND/container.cfg@HEAD'. Enable central script modifications, using the branch 'fcm:um_dev/vn8.4/local_changes/src' at version HEAD
(Image: 8.4-fcm.png)

  • In FCM Configurations -> Options for Jules set UM_SVN_URL to fcm:jules/trunk and the revision to vn8.5

You'll also need to make sure that the various output directories are set correctly:

  • In Input/Output Control -> Time Convention and Environment set
DATAW=/short/$PROJECT/$USER/UM_ROUTDIR/$USER/$RUNID
DATAM=/short/$PROJECT/$USER/UM_ROUTDIR/$USER/$RUNID

  • In FCM Configurations -> FCM Extract directories set
UM_OUTDIR=$HOME/UM_OUTDIR
UM_ROUTDIR=/short/$PROJECT/$USER/UM_ROUTDIR

These settings mean that all the model output on the supercomputer will stay alongside the model source.
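To see where these settings point for a particular job, the variables can be expanded by hand. The sketch below assumes PROJECT, USER and RUNID are available in the environment or passed as arguments; the function name show_um_dirs and the example values are hypothetical.

```shell
# Print the directories that the settings above resolve to.
# Arguments (all optional): project, user, run id.
# Anything not given is taken from $PROJECT, $USER and $RUNID.
show_um_dirs() {
    um_project="${1:-$PROJECT}"
    um_user="${2:-$USER}"
    um_runid="${3:-$RUNID}"
    um_routdir="/short/$um_project/$um_user/UM_ROUTDIR"
    echo "UM_OUTDIR=$HOME/UM_OUTDIR"
    echo "UM_ROUTDIR=$um_routdir"
    echo "DATAW=$um_routdir/$um_user/$um_runid"
    echo "DATAM=$um_routdir/$um_user/$um_runid"
}
```

For example, `show_um_dirs w35 abc123 sabqc` (a made-up project and user) prints DATAW and DATAM as /short/w35/abc123/UM_ROUTDIR/abc123/sabqc.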

Standard Tests


A standard test includes a model release (e.g. UM8.2) and a configuration release (e.g. GA4.0, ACCESS 1.3). The resolution of the model run is indicated by a value Ny (e.g. N96), which describes a grid of 2y x 3y/2 points (longitude x latitude).
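As a quick worked example of the Ny rule, the arithmetic can be done in the shell. The function name n_resolution is hypothetical, and the calculation follows the 2y x 3y/2 rule exactly as stated here; the model's own grid may count points slightly differently.

```shell
# Grid size implied by an "N" number, using the 2y x 3y/2 rule.
n_resolution() {
    echo "$((2 * $1)) x $((3 * $1 / 2))"
}

n_resolution 96    # N96 gives 192 x 144
```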

The following standard tests are/will be available on Raijin:

ACCESS 1.3 AMIP N96


The ACCESS 1.3 configuration submitted to AMIP. Based on an early HadGEM3-A version, but uses CABLE's land surface instead of MOSES. Available at UM7.3

HadGEM3-A GA4.0 N96


(Under construction)

The Met Office HadGEM3 atmospheric climate model. This model is under active development (see the development site; email the CMS team for an account). The GA4.0 release reflects the state of the model as of 2012. Available at UM8.2.

HadGEM3-A GA6.0 N96


(Will be released by the Met Office in the second half of 2013)

The Met Office HadGEM3 atmospheric climate model. This model is under active development (see the development site). The GA6.0 release reflects the state of the model as of 2013 and includes the new ENDGame dynamical core for increased model stability. Available at UM8.5.


Prebuilds

Prebuilds allow UM build jobs to use the results of a previous build, meaning that only files that have been changed or are affected by science section changes need to be rebuilt. For all jobs on Raijin you should use the settings (in FCM Configuration->FCM Options for Atmosphere):
  • UM_PREBUILD: ~access/prebuilds
  • UM_REM_PREBUILD: $UMDIR/prebuilds

In many cases you can just use a generic prebuild. These are named after the model and the build optimisation level, e.g. vn7.3_safe. The optimisation level in the prebuild name should match that of your own job. The following prebuilds are currently available on Raijin:
  • vn7.3_safe
  • vn7.3_access1.3_safe
Please let the help team know if other configurations would be useful.

Reproducibility

Check the final absolute norm values by running the following on the job's output file:
grep -i 'final abs' <output file> | head
If results are identical across Vayu and Raijin these numbers will match. Differences indicate subtle changes in the calculations, which can be caused by different CPUs, compilers and/or libraries.
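To compare two runs directly rather than by eye, the same grep can be wrapped in a small helper. This is a sketch only: the function names are hypothetical, and the two arguments are whatever output files you have from each machine.

```shell
# Extract the first few 'Final Absolute Norm' lines from a UM output file.
final_norms() {
    grep -i 'final abs' "$1" | head
}

# Compare the norms from two output files.
# Returns 0 if they are identical, 1 otherwise.
compare_norms() {
    norms_a=$(final_norms "$1")
    norms_b=$(final_norms "$2")
    if [ "$norms_a" = "$norms_b" ]; then
        echo "norms match"
    else
        echo "norms differ"
        return 1
    fi
}
```

Call it as `compare_norms raijin.out vayu.out` (example file names). The comparison is an exact string match, so any difference in the printed digits is reported.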

ACCESS 1.3 AMIP N96

Raijin

Final Absolute Norm : 7.276954701768109E-003
Final Absolute Norm : 4.715066763113245E-003
Final Absolute Norm : 9.434324436544766E-003
Final Absolute Norm : 8.119724668540855E-003
Final Absolute Norm : 8.754816579023663E-003
Final Absolute Norm : 9.709622368623301E-003
Final Absolute Norm : 8.431032766056844E-003
Final Absolute Norm : 8.862476886181806E-003
Final Absolute Norm : 9.746810549474134E-003
Final Absolute Norm : 9.506179859159086E-003

Vayu

Final Absolute Norm : 7.276956451914318E-003
Final Absolute Norm : 4.715657804189433E-003
Final Absolute Norm : 9.433036729385244E-003
Final Absolute Norm : 8.127311324946143E-003
Final Absolute Norm : 8.737187347119101E-003
Final Absolute Norm : 9.758734323322731E-003
Final Absolute Norm : 8.330291277356124E-003
Final Absolute Norm : 8.961952297931315E-003
Final Absolute Norm : 9.903317054337587E-003
Final Absolute Norm : 9.932131133228725E-003

Benchmarks

These values are for order of magnitude only. Build times are for when you're not using a prebuild - with a prebuild on Raijin builds take under 2 minutes.

ACCESS 1.3 AMIP N96

Raijin Build

  • 4 cpu
    • 00:13 walltime
    • 00:19 cputime
    • 600 MB memory
    • 0.88 SU

Raijin Run (3 month NRUN)


  • 128 cpu, job uakoa (8ew x 16 ns)
    • 0:43 walltime
    • 90:30 cputime
    • 23 GB memory
    • 91.48 SU

  • 128 cpu (16ew x 8 ns)
    • Very similar results

  • 64 cpu, job uakob (8ew x 8 ns)
    • 1:04 walltime
    • 68:22 cputime
    • 14 GB memory
    • 68.8 SU

  • 192 cpu, job uakoc (24 ew x 8 ns)
    • 0:44 walltime
    • 145:05 cputime
    • 37 GB memory
    • 143.15 SU

Vayu Build


  • 4 cpu
    • 0:21 walltime
    • 0:39 cputime
    • 1400 MB memory
    • 1.45 SU

Vayu Run

  • 128 cpu, job saaqb (8ew x 16 ns)
    • 1:10 walltime
    • 147:08 cputime
    • 27 GB memory (60 GB virtual)
    • 149.16 SU
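The 128 cpu figures for the two machines can be compared with a little arithmetic. The sketch below assumes the walltimes are in h:mm and that the Vayu and Raijin runs cover the same period; the helper names to_minutes and ratio are hypothetical.

```shell
# Convert an h:mm walltime to minutes.
to_minutes() {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# Ratio of two numbers, to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# 128 cpu (8ew x 16 ns): Vayu job saaqb against Raijin job uakoa
ratio "$(to_minutes 1:10)" "$(to_minutes 0:43)"   # walltime ratio: 1.63
ratio 149.16 91.48                                # SU ratio: 1.63
```

On these numbers the Vayu run took roughly 1.6 times the walltime and SU of the Raijin run, which is in keeping with treating the benchmarks as order of magnitude only.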

HadGEM3-A GA4.0 N96