During your work at the Centre, you are likely to produce, use and share data on different systems. You will probably have access to two different systems: your University system and NCI.

Storage at NCI


NCI provides two types of storage: tape and disk. Tape is for long-term storage, while disk is better suited to data you need to access often.

Tape at NCI


The tape system at NCI is called /massdata. Please read the Users' Guide to learn how to use it. Here are a few important points to keep in mind:
  • Tape is mostly appropriate for archiving data.
  • Only store large files on tape. If you want to migrate many small files, first bundle them into a single archive. To learn how to do that, have a look at the Users' Guide and email your questions to us.
  • /massdata is only accessible from the login nodes (interactively) or from a script submitted to the copyq queue. Using the copyq queue is generally recommended, as jobs there can run for much longer.
  • Tape access (writing and reading) is slow.
  • Because data on /massdata are likely to go unused for a long time, it is essential to document them. For example, adding a detailed README file to your data folder helps a lot.
  • Because of the size of /massdata, quota usage on it is only updated once a day, overnight. Act quickly to clean up, ideally before you exceed the quota.
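As a rough illustration of the copyq route, the sketch below writes a job script that bundles one data folder (with its README) into a single tar file and copies it to /massdata. It is a minimal sketch under assumptions not stated on this page: the "#PBS -l other=mdss" directive, the walltime, the project code "a00", the folder and archive names, and the exact "mdss" command syntax are placeholders to check against the /massdata Users' Guide before submitting.

```shell
#!/bin/bash
# Minimal sketch: write a copyq job script that bundles one data folder
# (which should already contain its README) into a single tar file and
# copies that file to /massdata.
# ASSUMPTIONS (not from this page): "#PBS -l other=mdss", the walltime,
# the default project code and the "mdss -P <project> put" syntax.

# Hypothetical values: replace with your own project, folder and archive.
PROJECT_ID="${PROJECT:-a00}"
DATA_DIR="run01"
ARCHIVE="run01.tar"
JOB_FILE="archive_run01.pbs"

# Unquoted heredoc: the variables above are expanded now, while
# \$PBS_O_WORKDIR is escaped so it is only expanded when the job runs.
cat > "$JOB_FILE" <<EOF
#!/bin/bash
#PBS -q copyq
#PBS -l walltime=04:00:00
#PBS -l other=mdss
#PBS -P ${PROJECT_ID}

set -e
cd "\$PBS_O_WORKDIR"
# Bundle many small files into one archive before sending it to tape.
tar -cf ${ARCHIVE} ${DATA_DIR}
# Copy the archive to the project's /massdata area.
mdss -P ${PROJECT_ID} put ${ARCHIVE} ${ARCHIVE}
EOF

echo "Wrote $JOB_FILE; submit it with: qsub $JOB_FILE"
```

Writing the job script to a file first lets you check it before calling "qsub" on it.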

Additional storage

It may be possible to increase your project's quota on /massdata. To request this, email us with how much additional space you need and which NCI project it is for. Because this is tape storage, the request may take several days to process, so please plan ahead.

Disk at NCI


There are three disk filesystems at NCI, each with a slightly different purpose; NCI's Users' Guide describes them in more detail. All of them are accessible from both the login nodes and the compute nodes, so you can read from or write to one filesystem while working in another. Data on any of them can also be moved to /massdata, either interactively from the login nodes or through a script submitted to the copyq queue.

/home

  • This is your home directory.
  • This space is strictly limited to 2 GB per user, but it is backed up.
  • It is most suitable for storing source code rather than model outputs or observation datasets.
  • You can monitor your usage of /home with the "lquota" command [1].

/short

  • All projects have some storage on /short.
  • The amount of space varies from one project to another.
  • Managing this space is the responsibility of each project's members, although automatic emails are sent to them when usage approaches the quota.
  • When a project fills its quota on /short, its members can no longer use the computing queues, except for the copyq queue, which remains available to help with moving data around.
  • To monitor the overall usage, use the "nci_account" command [1]. To monitor the usage per user, use "short_files_report -P $PROJECT" [1].
  • Increasing a project's quota on /short may be possible, but the decision rests with NCI staff. If you would like to request an extension, please email us.
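If you also want a quick, generic view of which directories take the most space, standard "du" and "sort" can rank the sub-directories of a project area. This is a minimal sketch, not an NCI tool: the function name "usage_by_subdir" is hypothetical, and the /short/$PROJECT/<user> layout in the example call is an assumption.

```shell
#!/bin/bash
# Minimal sketch: rank the sub-directories of a directory by disk usage,
# largest first. Generic complement to "short_files_report", not a
# replacement for it. The function name is hypothetical.

usage_by_subdir() {
  local top="$1"
  # du -sk prints "<size in KiB><TAB><path>" for each sub-directory;
  # sort -rn puts the largest first. Errors (e.g. unreadable
  # directories) are discarded.
  du -sk "$top"/*/ 2>/dev/null | sort -rn
}

# Example call (hypothetical layout):
#   usage_by_subdir "/short/$PROJECT"
```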

/g/data

  • Most projects now have some storage on /g/data1 or /g/data3.
  • As with /short, the quota on /g/data is per project, and managing usage is the sole responsibility of the project's members.
  • /g/data can be less stable than /short, so we recommend requesting the special PBS resource #PBS -lother=gdata in your jobs. Your job will then only start when the /g/data filesystem is accessible.
  • Compute nodes have both read and write access to this filesystem.
  • Your project's quota on /g/data cannot be extended with a simple email. Some quotas are reviewed every year, at which point some projects may be granted an increase. If you need additional storage before then, consider deleting old or incorrect data, archiving old data to /massdata, or using temporary storage or your University system. If you still want an increase to be considered at review time, make sure to discuss it with the Lead CI of your project, who will take part in the review.
  • To monitor usage, use "nci_account" [1]. To see a per-user summary, run "gdata1_files_report -P $PROJECT".
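To show where the /g/data resource goes, the sketch below writes a job script whose header includes the "#PBS -lother=gdata" line from the list above. Only that line comes from this page; the queue, walltime, CPU and memory requests, the "my_analysis" program and the /g/data1/a00 paths are hypothetical placeholders.

```shell
#!/bin/bash
# Minimal sketch: write a job script that waits for /g/data to be
# accessible before starting. Only "#PBS -lother=gdata" comes from this
# page; every other directive, program and path below is a placeholder.

# Quoted heredoc ('EOF'): nothing is expanded while writing the file.
cat > gdata_job.pbs <<'EOF'
#!/bin/bash
#PBS -q normal
#PBS -l walltime=02:00:00
#PBS -l ncpus=1
#PBS -l mem=2GB
#PBS -lother=gdata

# The job only starts once /g/data is accessible, so reading inputs
# from and writing outputs to /g/data is safe from here on.
cd "$PBS_O_WORKDIR"
./my_analysis /g/data1/a00/inputs /g/data1/a00/outputs
EOF

echo "Wrote gdata_job.pbs; submit it with: qsub gdata_job.pbs"
```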

Temporary storage


The CMS also manages two NCI projects that can be used for temporary data storage. Both are mounted on /g/data and have the same characteristics as the other /g/data storage described above.

Before using any space on these projects, you need to:
  • request to join the project if you are not yet a member. You can check which projects you belong to with the "groups" command.
  • fill in a storage request at http://dmp.climate-cms.org:3000/. Please email us if you have any questions about this tool. The form mainly lets us monitor the space used, requested and available, and lets us prepare a folder for you with the appropriate permissions. The form is short and quick to fill in, and the storage is usually ready for use within a few hours. See this page for more detailed instructions on filling in the form.
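The membership check from the first step can be scripted. This is a minimal sketch that relies only on the "groups" command mentioned above; the function name "is_member" is hypothetical, and it assumes "groups" run without arguments prints your group names separated by spaces.

```shell
#!/bin/bash
# Minimal sketch: check whether you already belong to the temporary
# storage projects. Assumes "groups" (no arguments) prints your group
# names separated by spaces. The function name is hypothetical.

is_member() {
  # Put one group per line and look for an exact, whole-line match.
  groups | tr ' ' '\n' | grep -qx "$1"
}

for proj in hh5 ua8; do
  if is_member "$proj"; then
    echo "$proj: member"
  else
    echo "$proj: not a member yet, request to join"
  fi
done
```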
The temporary storage projects are:
  • /g/data3/hh5: this project is for short-term temporary use (~3 months). For example, you could write your raw model outputs there, then save a subset or a reformatted version to your project's space and move the raw outputs to /massdata for safekeeping.
  • /g/data1/ua8: the main purpose of this project is to store published datasets created by the Centre's staff. For example, some journals now ask researchers to publish their data alongside their papers. It is also used for small downloaded datasets that are shared across the CoE and do not have their own "project". However, its free space can be used as temporary storage, preferably for data that may eventually need to be published.


Storage at Universities


at ANU
at Monash
at UMelb
at UNSW
at UTas


File compression and archiving


For efficient use of storage, there are a few rules to keep in mind:
  • It is better to store a few large files than many small ones. "Large" and "small" are hard to define, but files of several tens of gigabytes are perfectly acceptable. It is also best practice to compress your data when possible.
  • NetCDF files are now easily compressible; see this article for detailed explanations of the tools available at NCI.
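To judge whether a directory holds "lots of smaller files" worth bundling first, you can count the files below a size threshold. This is a minimal sketch with standard "find"; the function name "count_small_files" and the 1 MiB default threshold are arbitrary choices for illustration, not an NCI rule.

```shell
#!/bin/bash
# Minimal sketch: count files smaller than a threshold in a directory
# tree. The function name and the 1 MiB default are hypothetical.

count_small_files() {
  local dir="$1"
  local max_bytes="${2:-1048576}"   # default threshold: 1 MiB
  # The "c" suffix makes find compare sizes in bytes, so "-size -Nc"
  # matches files strictly smaller than N bytes (with "M" the size is
  # rounded up to whole MiB first, which would only match empty files).
  find "$dir" -type f -size -"${max_bytes}"c | wc -l | tr -d ' '
}

dir="${1:-.}"
total=$(find "$dir" -type f | wc -l | tr -d ' ')
echo "$(count_small_files "$dir") of $total files in $dir are smaller than 1 MiB"
```

If most files fall below the threshold, bundling them with "tar" before archiving is usually worthwhile.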

To store the small setup files that define your experiments, consider using the "tar" command. This is a shell command whose manual you can read with:
man tar
This command saves many files together in a single archive. It can be used on a directory tree and restores the directory structure when you extract the files from the archive. So if you need to save the setup of several experiments, the best approach may be to gather the setup files of all the experiments in one directory tree, then create a single archive of it. Archive files can also easily be compressed or uncompressed with the gzip utility, either when the archive is created or afterwards.
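The workflow above can be sketched end to end in a scratch directory. The directory and file names are hypothetical examples; the "tar" options used (-c create, -z gzip, -t list, -x extract, -f file, -C target directory) are standard.

```shell
#!/bin/bash
# Minimal sketch: archive the setup files of several experiments in one
# compressed tar file, then check that extracting it restores the
# directory tree. All names below are hypothetical examples.

work=$(mktemp -d)
cd "$work" || exit 1

# A small directory tree holding the setup of two experiments.
mkdir -p setups/exp1 setups/exp2
echo "namelist for exp1" > setups/exp1/namelist
echo "namelist for exp2" > setups/exp2/namelist

# Compress at creation time: -c create, -z gzip, -f archive file name.
tar -czf setups.tar.gz setups

# List the archive contents without extracting them.
tar -tzf setups.tar.gz

# Alternative: create a plain archive, then compress it afterwards
# (gzip replaces setups_later.tar with setups_later.tar.gz).
tar -cf setups_later.tar setups
gzip setups_later.tar

# Extract into another directory: the original tree is restored.
mkdir restored
tar -xzf setups.tar.gz -C restored
cat restored/setups/exp2/namelist
```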

[1]: see here for usage of these commands.