ARC CoE for Climate System Science CMS Wiki
Computational Modelling Systems Wiki
This page presents links to documents and other resources relating to the CMIP5 dataset.
Before you go ahead with any analysis of the CMIP5 data, I would like to stress that there is an
, detailing known issues with the model output.
This is most probably not a complete list, but it is better than nothing, and it is worth checking regularly and before publishing any results based on this dataset.
HadGEM2-ES piControl and rcp85 have some overlapping years. I have been updating the most commonly used variables, but there are still files on raijin that need updating.
24 May - most of the nodes are back online, though some of them (i.e. the Chinese nodes and NCI) are still offline and some have not yet republished all their data.
CMIP5 data access
To access the CMIP5 data collection you need to have an account with one of the ESGF (Earth System Grid Federation) nodes. This applies for licensing reasons even if you are using the data replicated on raijin.
For documentation and tutorials on how to navigate the ESGF web interface to get and manage your account, search the data collection online, and find other information, see:
CMIP5 data access: getting started
Other ESGF related videos:
UV-CDAT-1.2.0 ESGF Simple Demo
: simple demo to browse ESGF database and bring up files into UV-CDAT.
Downloading ESGF Data with Globus Online
: this video shows how to create an account with Globus Online to download ESGF/CSSEF Data.
Known issues with OpenID use
To access the data on raijin you need to become a member of ua6; request membership using your NCI account at
My NCI portal
NCI has made available the
CMIP community space
to CMIP users to provide a space where anyone belonging to the ua6 group can find information, have questions about CMIP data answered, check progress on preparation for CMIP6, notify data managers of issues with the data, and provide useful feedback on the access and analysis of this dataset.
A Summary of the CMIP5 Experiment Design
CMIP5 Data Reference Syntax (DRS) and Controlled Vocabularies
(this file explains the official directory structure and filename encoding)
CMIP5 Model Output Requirements: File Contents and Format, Data Structure and Metadata
The List of Requested Model Output
(PDF format). This file contains tables describing all the variables and their attributes, and lists the priority for each variable. It is regularly updated.
To download in Excel format
A useful paper:
An Overview of CMIP5 and the Experiment Design
A website with metadata records for each experiment:
Published CIM metadata
CMIP5 online errata sheet
Here you can find information for model output providers:
A Guide for Modeling Groups Participating in CMIP5
CMOR - Climate Model Output Rewriter Software
A list of all CMIP5 related publications:
How to cite CMIP5 publications
We entreat you who are authors or co-authors of publications based on CMIP5 model output to add citation information to the CMIP5 List of Publications at
. Instructions on how to do this can be found at:
. This should only take you a few minutes. Nearly 800 papers have already been registered there, and we hope you will add your contributions.
Registering your paper(s)
is vitally important because by documenting CMIP’s scientific impact, the modeling groups and those developing CMIP software infrastructure can continue to secure funding. We expect that those of you who have taken advantage of the CMIP5 archive will consider this an obligation.
If you and your co-authors have only
the CMIP5 project, but not analysed any of the CMIP5 output, no action is required.
Upon registering your publications, please be sure to provide the additional critical information requested, including which models, experiments, and variables you have used in your study. The information you provide will help us justify archiving these variables in future phases of CMIP.
Since anyone can enter citation information on your behalf, please check (by searching the alphabetical listing at
) whether or not any of your already registered papers include complete and correct information (e.g., please provide a DOI if it is missing).
If you have questions or concerns regarding the list of publications please email:
CMIP5 data replica on raijin
NB ACCESS and CSIRO MK models are published by NCI and so anything published for them is available on raijin under
A subset of the CMIP5 data archive has been replicated on the NCI facilities.
The disk with the data is mounted on raijin.nci.org.au as /g/data1/ua6/unofficial-ESG-replica/ .
On raijin itself a file listing the actual paths of all the CMIP5 replica files is available at:
/g/data1/ua6/unofficial-ESG-replica/tmp/tree/esg-tree-paths_latest.txt - lists all the CMIP5 files under the directory /g/data1/ua6/unofficial-ESG-replica/tmp/tree (updated every Monday)
CAWCR also has a summary of what's available on their wiki:
: this is a table that lists the number of downloaded files by experiment and CMIP_table in unofficial-ESG-replica. By clicking on the number of files you can get a more detailed table for the relevant experiment and CMIP_table. These are updated every Monday.
/g/data1/ua6/drstree/ offers links to the CMIP5 data in a simplified directory structure
The <files> in this directory are symbolic links to the latest version available on raijin. They link to the sub-directory "files", which in turn contains links to the replica in /tmp/tree/ for all the available versions. The link names there have the form <filename>.<version>.
This is the directory used by the CWSlab workflow tool and is maintained by the CWSlab developers. There could be a time lag before newly downloaded or updated data appears here.
There have also been issues in the past with some links, whose number of characters was greater than the allowed length of symbolic links. Just keep this in mind if you're using this directory to access the data.
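Since the files here are links to links, it can help to see each hop when something does not resolve. The following is a minimal sketch, not part of the CWSlab tooling: the function name `link_chain` and the `max_hops` limit are illustrative choices, and it uses only the standard `os` module.

```python
import os


def link_chain(path, max_hops=20):
    """Return [path, first target, ..., final target] by following symlinks hop by hop."""
    chain = [path]
    current = path
    for _ in range(max_hops):
        if not os.path.islink(current):
            break
        target = os.readlink(current)
        # A relative link target is relative to the directory that holds the link.
        if not os.path.isabs(target):
            target = os.path.join(os.path.dirname(current), target)
        current = os.path.normpath(target)
        chain.append(current)
    return chain
```

If `os.path.exists(chain[-1])` is false for the returned list, the chain ends in a broken link. Note that `os.path.normpath` simplifies `..` textually, so a chain that passes through symlinked directories may need `os.path.realpath` instead.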
This is also where you'll find the entire CMIP3 dataset collection:
CMIP5 database and ARCCSSive module
You can download a sqlite database that lists all the CMIP5 data available on raijin.nci.org.au: (last update 31 August 2016).
The database has four tables: instances, versions, files and warnings. The available fields include tracking_id and checksums at file level and, wherever possible, versions.
You can find more information on the database including the schema in the
of NCI confluence.
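The table and column names can also be read straight from a downloaded copy of the database. The sketch below uses Python's standard `sqlite3` module; the function name `describe_database` is illustrative, and no column names are assumed because they are read from the file itself.

```python
import sqlite3


def describe_database(db_path):
    """Return {table name: [column names]} for every table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.cursor()
        cur.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
        tables = [row[0] for row in cur.fetchall()]
        schema = {}
        for table in tables:
            # Table names cannot be bound as parameters, so quote the identifier.
            quoted = '"{}"'.format(table.replace('"', '""'))
            # PRAGMA table_info returns one row per column; index 1 is the column name.
            cur.execute("PRAGMA table_info({})".format(quoted))
            schema[table] = [row[1] for row in cur.fetchall()]
        return schema
    finally:
        conn.close()
```

Calling `describe_database` on the downloaded file should list the four tables named above together with their fields.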
You can browse this file using the free software
DB Browser for SQLite
, which is easy to install and use and works on Windows, Mac and Linux.
There is also a sqlite3 module you can load in a Python script. If you want to use the database on raijin, the same file is available in
The easiest way to access the database is using the
ARCCSSive Python module
. This interface is easy to use since it does not require knowledge of SQL. It also embeds the
that provides a Python interface to the ESGF web interface. In the "examples" directory there are a few scripts that show how to include this module in a script, and that can also be used to search for data and to compare what's available on raijin with what's available online. If you find value in this approach, please give us some feedback and examples of the ways you access this dataset.
Requesting data and/or help
If you need help to find out what is available on raijin or you want to request more fields, send an e-mail to firstname.lastname@example.org. Please specify experiment, model, frequency and a list of variables in your e-mail. If you need an extensive amount of data to be downloaded, please take some time to work out which models/experiments and/or fields are a priority. Downloading can take time, so put in your requests as early as possible.
Before requesting data have a look at what's already available (more information on how to find out is in the "CMIP5 data replica on raijin" section) and check what has been published on one of the ESG nodes (
You can also use the compare_ESGF.py script, available on the ARCCSSive github repository, to check what is currently available on raijin and on the ESGF nodes and to directly submit a request for new files to be downloaded. If you are interested, e-mail me at email@example.com.
If you want to notify issues with the data contact:
if there are issues with the replicated files on raijin
if there are issues with one of the ESGF portals or you have found errors in the file metadata and/or data; they will refer your e-mail to the relevant modelling group.
A new way to ask for information and notify data managers is to use the NCI
CMIP community space
Analysis tool: VisTrails
CAWCR has developed an analysis tool, called VisTrails, to streamline analysis of CMIP5 datasets. You can find information and links to the CAWCR tool on
We recommend using this tool for at least the first part of your analysis, since it can search and manage the very complicated CMIP5 directory structure for you*. As an example, if you want to apply the same analysis to any available ensemble/model combination of monthly surface temperature, you can do that by specifying
frequency=mon, variable=tas. The pipeline itself will find any available data matching your criteria, concatenate the files into virtual "ensemble" files, and then perform the various steps of your analysis by submitting queue jobs to the system.
There are several tasks you can perform which are included already and you can add your own tasks.
To use VisTrails you need to install and use VDI, a remote desktop. NCI provides an online "
Introduction to VDI
" training course. You can analyse data interactively on the remote desktop without using the PBS queueing system; for more information see the VDI page.
*If you're wondering why, check the data FAQ section.
NB: these scripts are no longer maintained; use ARCCSSive instead.
To query the file list and get a subset of the data you can use the python script
. You can download this from the CMS github repository
This script uses the esg-tree-paths_latest.txt file and returns, for all the files matching a given set of constraints, the variable, mip_table, model, experiment, ensemble, version (if available) and the ensemble path on dcc.
To download the file:
To see how to use it just type:
python search_CMIP5_replica.py -h (or --help)
The two Amon_only and other_fields tables are an example of the output.
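To illustrate the idea behind this kind of search, here is a minimal sketch that filters a path listing by the fields encoded in each file name. It is not the search_CMIP5_replica.py script. It assumes the listing holds one path per line and that file names follow the DRS pattern `<variable>_<mip_table>_<model>_<experiment>_<ensemble>[_<time range>].nc`; the function names are illustrative.

```python
import os

# Order of the leading fields in a CMIP5 file name (assumed to follow the DRS).
FIELDS = ("variable", "mip_table", "model", "experiment", "ensemble")


def parse_filename(path):
    """Split a CMIP5 file name into its DRS fields; return None if it does not fit."""
    path = path.strip()
    name = os.path.basename(path)
    if not name.endswith(".nc"):
        return None
    parts = name[:-3].split("_")
    if len(parts) < len(FIELDS):
        return None
    record = dict(zip(FIELDS, parts))
    record["path"] = path
    return record


def search_listing(lines, **constraints):
    """Yield the parsed records whose fields equal every given constraint."""
    for line in lines:
        record = parse_filename(line)
        if record is None:
            continue
        if all(record.get(key) == value for key, value in constraints.items()):
            yield record
```

For example, opening the listing file and iterating over `search_listing(handle, variable="tas", mip_table="Amon", experiment="rcp85")` would yield one record per matching file, with the full path under the `"path"` key.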
Added two new scripts to replace fetch_CMIP5.py,
fetch_step1.py and fetch_step2.py:
Given a variable/model/experiment input, the search returns a list of everything that has been published and whether or not it is on raijin.
To do that, it first checks whether the file exists on raijin and, if it does, uses the checksum to verify that it is the same as the latest available version.
You can use it without specifying models, but it needs at least one variable, expressed as variable_mip-table, and one experiment.
It has an option that produces a CSV table summarising the results.
We split the process into two steps to avoid misusing the system when a search returns lots of files. Now you can run the online search with the first step, quickly check how many files are returned, and then decide on that basis whether to run the second step interactively or submit a job to the queue. In the second case you can choose to use more than one processor to speed up the job. This should only be necessary if the search returned thousands of files. If in doubt, submit to the queue.
To see how to use it just type:
python fetch_step1.py -h (or --help)
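The "exists locally, then compare checksums" check described above can be sketched as follows. This is not the code of fetch_step1.py or fetch_step2.py. The function names are illustrative, the published checksum is assumed to have been obtained separately from the ESGF search, and the default algorithm (`sha256`) is an assumption: pass `algorithm="md5"` if that is what the node publishes.

```python
import hashlib
import os


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def local_status(path, published_checksum, algorithm="sha256"):
    """Classify a local copy as 'missing', 'up to date' or 'outdated'."""
    if not os.path.isfile(path):
        return "missing"
    if file_checksum(path, algorithm) == published_checksum.lower():
        return "up to date"
    return "outdated"
```

Reading large netCDF files to compute checksums is the expensive part, which is one reason a search that returns thousands of files is better run as a queue job.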
Model group webpages
CNRM contribution to CMIP5
A very useful presentation from Will Hobbs about using CMIP5 data from a researcher perspective:
An introduction to dealing with multi-model ensembles (or how I learned to stop worrying and love CMIP5)
Will Hobbs also has started to collect some of the output of his CMIP5 ocean data processing in the ARCCSS ua8 project on raijin: /g/data1/ua8/CMIP5_ocean_processing/ .
Anyone from the ARCCSS or collaborators can request access to this project.
Accessing ESG data through netCDF library
It is possible to access Earth System Grid (ESG) datasets from ESG servers through the netCDF API. This requires building the netCDF library with DAP2 protocol support by passing the "--enable-dap" flag to the configure program.
For the complete instructions follow the
You can use RSS feeds to keep up with CMIP5 updates. The complete documentation is on the
You can build an RSS feed URL by taking the URL of any of the ESGF nodes as the root, followed by /esg-search/feed/ plus any constraints you want, using the same vocabulary used to build searches for data. As an example, if you're interested in receiving updates on air temperature you can use the following URL
Contributions to http://climate-cms.unsw.wikispaces.net/ are licensed under a
Creative Commons Attribution Non-Commercial 3.0 License