This page presents links to documents and other resources relating to the CMIP5 dataset.


Before you start any analysis of the CMIP5 data, please note that there is an errata page detailing known issues with the model output.
The list is most probably incomplete, but it is worth checking regularly, and always before publishing any results based on this dataset.
HadGEM2-ES piControl and rcp85 have some overlapping years. I have been updating the most commonly used variables, but there are still files on raijin that need updating.

24 May - most of the nodes are back online, though some (i.e. the Chinese nodes and NCI) are still offline and some have not yet republished all their data.

CMIP5 induction


We created a CMIP5 induction on the NCI online training website. We recommend that anyone starting with this dataset go through it. It can also be used as a reference, and it includes detailed training on how to use the ARCCSSive module.

NCI has made the CMIP community space available to CMIP users. It is a space where anyone belonging to the ua6 group can find information, have questions about CMIP data answered, check progress on preparation for CMIP6, notify data managers of issues with the data and provide feedback on the access and analysis of this dataset.

CMIP5 data access


To access the CMIP5 data collection you need an account with one of the ESGF (Earth System Grid Federation) nodes. For licensing reasons, this applies even if you are using the data replicated on raijin.
For documentation and tutorials on how to navigate the ESGF web interface to get and manage your account, search the data collection online, and other information, see:
CMIP5 data access: getting started
ESGF tutorials

Other ESGF related videos:
- UV-CDAT-1.2.0 ESGF Simple Demo : simple demo to browse ESGF database and bring up files into UV-CDAT.
- Downloading ESGF Data with Globus Online : this video shows how to create an account with Globus Online to download ESGF/CSSEF Data.

Known issues with OpenID use

To access the data on raijin you need to become a member of ua6. Request membership using your NCI account at
My NCI portal

CMIP5 documentation


A Summary of the CMIP5 Experiment Design
CMIP5 Data Reference Syntax (DRS) and Controlled Vocabularies (this file explains the official directory structure and filename encoding)
CMIP5 Model Output Requirements: File Contents and Format, Data Structure and Metadata
The List of Requested Model Output (pdf format). This file contains tables describing all the variables and their attributes, and lists the priority for each variable. It is regularly updated.
To download it in Excel format: standard_output.xls
A useful paper: An Overview of CMIP5 and the Experiment Design
A website with metadata records for each experiment: Published CIM metadata
CMIP5 online errata sheet

Here you can find information for model output providers:
A Guide for Modeling Groups Participating in CMIP5
CMOR - Climate Model Output Rewriter Software

A list of all CMIP5 related publications:
CMIP5 publications

How to cite CMIP5 publications


We entreat you who are authors or co-authors of publications based on CMIP5 model output to add citation information to the CMIP5 List of Publications at http://cmip.llnl.gov/cmip5/publications/allpublications . Instructions on how to do this can be found at: http://cmip-pcmdi.llnl.gov/cmip5/publications.html . This should only take you a few minutes. Nearly 800 papers have already been registered there, and we hope you will add your contributions.
Registering your paper(s) is vitally important because by documenting CMIP’s scientific impact, the modeling groups and those developing CMIP software infrastructure can continue to secure funding. We expect that those of you who have taken advantage of the CMIP5 archive will consider this an obligation.
If you and your co-authors have only cited the CMIP5 project, but not analysed any of the CMIP5 output, no action is required.
Upon registering your publications, please be sure to provide the additional critical information requested, including which models, experiments, and variables you have used in your study. The information you provide will help us justify archiving these variables in future phases of CMIP.
Since anyone can enter citation information on your behalf, please check (by searching the alphabetical listing at http://cmip.llnl.gov/cmip5/publications/allpublications ) whether or not any of your already registered papers include complete and correct information (e.g., please provide a DOI if it is missing).
If you have questions or concerns regarding the list of publications please email: taylor13@llnl.gov .

CMIP5 data replica on raijin


NB: the ACCESS and CSIRO MK models are published by NCI, so everything published for them is available on raijin under
/g/data/rr3/authoritative/IPCC/CMIP5. The previous authoritative folder under ua6 is still available but will be removed in October 2017.

A subset of the CMIP5 data archive has been replicated on the NCI facilities.
The disk with the data is mounted on raijin.nci.org.au as /g/data1/ua6/unofficial-ESG-replica/tmp/tree .
We recommend using our sqlite database, cmip5_raijin_latest.db, in conjunction with the Python module ARCCSSive to search the dataset.
module use /g/data3/hh5/public/modules
module load conda/analysis27   (for Python 2.7)
module load conda/analysis3    (for Python 3.6)

DRSv2

DRSv2 is the latest version of the drstree symbolic link structure.
/g/data1/ua6/DRSv2/ offers links to the CMIP5 data in a simplified directory structure:
CMIP5/<model>/<experiment>/<frequency>/<realm>/<ensemble>/<variable>/latest/<files>

The <files> in this directory are symbolic links to the latest available (on raijin) version.
This is the directory used by the CWSlab workflow tool and is maintained by the CWSlab developers. There could be a time lag before newly downloaded or updated data appears here.
There have also been issues in the past with some links whose paths exceeded the maximum length allowed for symbolic links. Keep this in mind if you use this directory to access the data.
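The DRSv2 layout above lends itself to simple wildcard searches. The following is a minimal sketch, not an official tool: the function names drsv2_pattern and find_files are hypothetical, and the example values (historical, mon, tas) are only illustrative. It builds a glob pattern in which any component left unspecified matches everything.

```python
import glob
import posixpath

# Root of the DRSv2 link tree, as given in the text above.
DRSV2_ROOT = "/g/data1/ua6/DRSv2"


def drsv2_pattern(model="*", experiment="*", frequency="*", realm="*",
                  ensemble="*", variable="*", root=DRSV2_ROOT):
    """Return a glob pattern for the files under the DRSv2 'latest' directories.

    Any component left as "*" matches every value of that component.
    (Hypothetical helper, written for illustration only.)
    """
    return posixpath.join(root, "CMIP5", model, experiment, frequency,
                          realm, ensemble, variable, "latest", "*")


def find_files(**components):
    """Return the sorted list of files (symbolic links) matching the pattern."""
    return sorted(glob.glob(drsv2_pattern(**components)))


if __name__ == "__main__":
    # Monthly surface air temperature, historical experiment, all models.
    print(drsv2_pattern(experiment="historical", frequency="mon",
                        variable="tas"))
```

Because of the possible time lag and the link-length issue mentioned above, an empty result from such a search does not necessarily mean the data is missing from raijin.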

The entire CMIP3 dataset is only available on the older drstree:

/g/data1/ua6/drstree/CMIP3/GCM/<institute>/<model>/<experiment>/<frequency>/<realm>/<variable>/<ensemble>/<files>

CMIP5 database and ARCCSSive module

We have built a sqlite database that lists all the CMIP5 data available on raijin.nci.org.au. It is updated every time new data is added to the collection.
The database has four tables: instances, versions, files and warnings. The available fields include tracking_id and checksums at file level and, wherever possible, version information.
You can find more information on the database including the schema in the CMIP5 section of NCI confluence.
You can browse this file using DB Browser for SQLite, which is free, easy to install and use, and works on Windows, Mac and Linux.
There is also a sqlite3 module that you can load in a Python script. If you want to use the database on raijin, the file is available at
/g/data1/ua6/unofficial-ESG-replica/tmp/tree/cmip5_raijin_latest.db .
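As a starting point for the sqlite3 route, here is a minimal sketch that inspects the database without assuming anything about its columns. It uses only Python's standard sqlite3 module; the helper names list_tables and list_columns are hypothetical. For the actual schema, refer to the NCI confluence page mentioned above.

```python
import os
import sqlite3

# Location of the database on raijin, as given in the text above.
DB_PATH = "/g/data1/ua6/unofficial-ESG-replica/tmp/tree/cmip5_raijin_latest.db"


def list_tables(db_path=DB_PATH):
    """Return the names of the tables in the SQLite file, sorted by name."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def list_columns(table, db_path=DB_PATH):
    """Return the column names of one table, in schema order."""
    # PRAGMA statements cannot take bound parameters, so the table name
    # is checked against the real table list before being interpolated.
    if table not in list_tables(db_path):
        raise ValueError("unknown table: %s" % table)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute('PRAGMA table_info("%s")' % table).fetchall()
    finally:
        conn.close()
    # Each row describes one column; the second element is its name.
    return [row[1] for row in rows]


if __name__ == "__main__" and os.path.exists(DB_PATH):
    for table in list_tables():
        print(table, list_columns(table))
```

Listing the tables and columns first is a cautious way to write queries against the current version of the file rather than against a remembered schema.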

The easiest way to access the database is through the ARCCSSive Python module. This interface is easy to use since it does not require any knowledge of SQL. It also embeds the pyesgf module, which provides a Python interface to the ESGF web interface. The "examples" directory contains a few scripts that show how to include this module in a script; they can also be used to search for data and to compare what is available on raijin with what is available online. If you find value in this approach, please give us some feedback and examples of the ways you access this dataset.
It is very easy to use for simple searches, and you can perform more complex searches once you are more familiar with it. The best way to get started is the dedicated ARCCSSive training on the NCI online training website.

Requesting data and/or help


If you need help finding out what is available on raijin, or you want to request more fields, send an e-mail to climate_help@nf.nci.org.au. Please specify the experiment, model, frequency and a list of variables in your e-mail. If you need an extensive amount of data to be downloaded, please take some time to work out which models/experiments and/or fields are a priority. Downloading can take time, so put in your requests as early as possible.
Before requesting data, have a look at what is already available (more information on how to find out is in the "CMIP5 data replica on raijin" section) and check what has been published on one of the ESG nodes (the PCMDI ESGF node is recommended).
You can also use the compare_ESGF.py script available in the ARCCSSive github repository to check what is currently available on raijin and on the ESGF nodes, and to directly submit a request for new files to be downloaded. For instructions, again refer to the available training.

To report issues with the data, contact:
  • climate_help@nf.nci.org.au if there are issues with the replicated files on raijin
  • esgf-user@lists.llnl.gov if there are issues with one of the ESGF portals or you have found errors in the file metadata and/or data; they will refer your e-mail to the relevant modelling group
  • a new way to ask for information and notify data managers is to use the NCI CMIP community space

Analysis tool: VisTrails


CAWCR has developed an analysis tool, called VisTrails, to streamline analysis of CMIP5 datasets. You can find information and links to the CAWCR tool on this page .
We recommend using this tool for at least the first part of your analysis, since it can search and manage the very complicated CMIP5 directory structure for you*. For example, if you want to apply the same analysis to every available ensemble/model combination of monthly surface temperature, you can do so by specifying
frequency=mon, variable=tas. The pipeline will then find all available data matching your criteria, concatenate the files into "ensemble" virtual files, and perform the various steps of your analysis by submitting queue jobs to the system.
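To make the selection step concrete, the sketch below shows one way to pick out files matching frequency=mon and variable=tas and group them by model, experiment and ensemble. It is an illustration only, not the VisTrails implementation: it assumes paths that follow the DRSv2 layout described earlier on this page, the function name is hypothetical, and the example file names are made up.

```python
from collections import defaultdict


def group_matching_files(paths, frequency, variable):
    """Group DRSv2-style paths by (model, experiment, ensemble).

    Only paths whose frequency and variable components match are kept.
    Assumed layout after the "CMIP5" component:
    model/experiment/frequency/realm/ensemble/variable/latest/file
    """
    groups = defaultdict(list)
    for path in paths:
        parts = path.strip("/").split("/")
        if "CMIP5" not in parts:
            continue
        fields = parts[parts.index("CMIP5") + 1:]
        if len(fields) != 8:
            continue  # not a file under a "latest" directory
        model, experiment, freq, _realm, ensemble, var, _latest, _name = fields
        if freq == frequency and var == variable:
            groups[(model, experiment, ensemble)].append(path)
    # Sort each group so the files are in a stable order.
    return {key: sorted(files) for key, files in groups.items()}


if __name__ == "__main__":
    # Made-up example paths, for illustration only.
    base = "/g/data1/ua6/DRSv2/CMIP5/ACCESS1-0/historical/mon/atmos/r1i1p1"
    example = [
        base + "/tas/latest/tas_part2.nc",
        base + "/tas/latest/tas_part1.nc",
        base + "/pr/latest/pr_part1.nc",
    ]
    for key, files in group_matching_files(example, "mon", "tas").items():
        print(key, files)
```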
Several tasks are already included, and you can add your own.
In order to use VisTrails you need to install and use VDI, a remote desktop. NCI provides an online "Introduction to VDI" training course. You can analyse data interactively on the remote desktop without using the PBS queueing system; for more information see the VDI page.
*If you're wondering why, check the data FAQ section.

Model group webpages


ACCESS wiki
CNRM contribution to CMIP5

Other


A very useful presentation from Will Hobbs about using CMIP5 data from a researcher's perspective: An introduction to dealing with multi-model ensembles (or how I learned to stop worrying and love CMIP5)
Will Hobbs has also started to collect some of the output of his CMIP5 ocean data processing in the ARCCSS ua8 project on raijin: /g/data1/ua8/CMIP5_ocean_processing/ .
Anyone from the ARCCSS or collaborators can request access to this project.

Accessing ESG data through netCDF library
It is possible to access Earth System Grid (ESG) datasets from ESG servers through the netCDF API. This requires building the netCDF library with DAP2 protocol support by passing the "--enable-dap" flag to the configure program.
For the complete instructions, follow the link.

RSS feeds
You can use RSS feeds to keep up with CMIP5 updates. The complete documentation is on the ESGF wiki.
You can build an RSS feed URL by taking the root URL of any ESGF node, followed by /esg-search/feed/ and any constraints you want, using the same vocabulary used to build data searches. For example, if you are interested in receiving updates on air temperature, you can use the following URL:
http://esgf-node.llnl.gov/esg-search/feed/cf_standard_name/air_temperature.rss
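The URL pattern above can be assembled programmatically. The sketch below covers only the single-constraint form shown in the example (one search facet and one value); the function name esgf_feed_url is hypothetical, and how several constraints are combined is not covered here, so check the ESGF wiki for that.

```python
from urllib.parse import quote


def esgf_feed_url(node_root, facet, value):
    """Build an RSS feed URL of the form <node>/esg-search/feed/<facet>/<value>.rss.

    node_root is the root URL of an ESGF node; facet and value use the
    same vocabulary as the ESGF data search.
    (Hypothetical helper covering a single constraint only.)
    """
    root = node_root.rstrip("/")
    return "{0}/esg-search/feed/{1}/{2}.rss".format(
        root, quote(facet, safe=""), quote(value, safe="")
    )


if __name__ == "__main__":
    print(esgf_feed_url("http://esgf-node.llnl.gov",
                        "cf_standard_name", "air_temperature"))
```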