The following charts show the performance of the UM running GA7 at various processor decompositions, to help you decide on the best tradeoff between walltime and SU cost.

While increasing the number of CPUs in a run will speed it up, the model becomes less efficient as you add more CPUs, because more time is spent by the CPUs communicating instead of computing.

This data was calculated from a number of 10-model-day runs. Real run times will also be affected by writing output, and time spent in the queue can considerably increase the total time you wait for the model to complete. Changing the restart length or walltime to different values may reduce the queue time.

Changing the science settings can also improve model performance. For longer runs, for example, you might consider using the `EasyAerosol` scheme instead of the default aerosols.

Recommended settings


For general GA7 runs on Raijin's Broadwell nodes, we recommend the following settings:

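| Description     | Config Setting | Value        | Comment       |
|-----------------|----------------|--------------|---------------|
| Restart length  | RESUB          | P4Y          | 4 model years |
| Walltime        | CLOCK          | '24:00:00'   | 24 hours      |
| X Decomposition | PE_ATM_NPROCX  | 34           |               |
| Y Decomposition | PE_ATM_NPROCY  | 28           |               |
| OpenMP Threads  | OMPTHR_ATM     | 1            |               |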

This configuration will run 4 model years in approximately 18 wall hours and cost 21 kSU.

To conserve SU, a decomposition of 18 x 28 will run 4 model years in 24 hours and cost 15 kSU.
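The two cost figures above are consistent with a simple estimate of cost as CPUs x wall hours x charge rate. The following is a minimal sketch of that arithmetic, not an official NCI calculation. The function name `su_cost` is hypothetical, and the charge rate of 1.25 SU per CPU hour for Broadwell nodes is an assumption that is not stated on this page, so check it against the current NCI rates.

```python
def su_cost(nprocx, nprocy, omp_threads, wall_hours, su_per_cpu_hour=1.25):
    """Estimate the SU cost of a run.

    su_per_cpu_hour=1.25 is an ASSUMED Broadwell charge rate,
    not a value taken from this page.
    """
    cpus = nprocx * nprocy * omp_threads
    return cpus * wall_hours * su_per_cpu_hour


# Recommended settings: 34 x 28, 1 thread, about 18 wall hours
print(round(su_cost(34, 28, 1, 18) / 1000), "kSU")  # 21 kSU

# SU-conserving settings: 18 x 28, 1 thread, 24 wall hours
print(round(su_cost(18, 28, 1, 24) / 1000), "kSU")  # 15 kSU
```

The same function can be used to get a rough cost for other decompositions once you have read a walltime off the charts.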

To speed up the model (at the expense of SU cost), it is important to enable OpenMP threads by setting OMPTHR_ATM to 2. Otherwise the model becomes very inefficient at CPU counts above 1000. The total CPU count is PE_ATM_NPROCX * PE_ATM_NPROCY * OMPTHR_ATM.
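As a quick check before submitting, the total CPU count and the OpenMP advice can be written as a small helper. This is only a sketch of the rule of thumb described here: the function names are hypothetical, the threshold of 1000 CPUs is taken from the text, and the 40 x 28 decomposition in the example is an illustrative value rather than a tested configuration.

```python
OPENMP_CPU_THRESHOLD = 1000  # from the guidance above


def total_cpus(nprocx, nprocy, omp_threads=1):
    """Total CPU count: X decomposition * Y decomposition * OpenMP threads."""
    return nprocx * nprocy * omp_threads


def should_enable_openmp(nprocx, nprocy, omp_threads=1):
    """True when a run exceeds the threshold without OpenMP threading."""
    over_threshold = total_cpus(nprocx, nprocy, omp_threads) > OPENMP_CPU_THRESHOLD
    return over_threshold and omp_threads < 2


print(total_cpus(34, 28, 1))            # 952
print(should_enable_openmp(34, 28, 1))  # False: below 1000 CPUs
print(total_cpus(34, 28, 2))            # 1904
print(should_enable_openmp(34, 28, 2))  # False: OpenMP already enabled
print(should_enable_openmp(40, 28, 1))  # True: 1120 CPUs with 1 thread
```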

Performance by CPU count


NCI recommend choosing an initial decomposition by minimising the product (walltime x SU cost), which can be seen below for UM versions 10.6 and 11.0.
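The selection rule can be sketched in a few lines of Python. The helper names are hypothetical, and the only data used are the two approximate figures quoted under the recommended settings, so the result is illustrative only. The charts cover many more decompositions and are the better guide.

```python
def tradeoff_score(wall_hours, ksu):
    """Product of walltime and SU cost; lower is better."""
    return wall_hours * ksu


def best_decomposition(options):
    """Return the option with the smallest walltime x SU cost product."""
    return min(options, key=lambda o: tradeoff_score(o["wall_hours"], o["ksu"]))


# Approximate figures for 4 model years, taken from the recommended settings
options = [
    {"decomposition": (34, 28), "wall_hours": 18, "ksu": 21},
    {"decomposition": (18, 28), "wall_hours": 24, "ksu": 15},
]

for o in options:
    print(o["decomposition"], tradeoff_score(o["wall_hours"], o["ksu"]))
# (34, 28) 378
# (18, 28) 360

print(best_decomposition(options)["decomposition"])  # (18, 28)
```

With these rounded figures the two options score within about 5% of each other, so the metric does not separate them strongly.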