Monday, April 20, 2020

How to find CMIP6 Data Citations (machine-access)?

The last post, on April 2, 2020, explained how a human user can find CMIP6 data citations. If the data citation information was not stored during the ESGF data download and many datasets have been analyzed, script-based access to the data citations is needed.

Several APIs are available at DKRZ, documented at https://www.wdc-climate.de/ui/cmip-api-docs/, and one is provided by DataCite:

1. Citation Search API

In addition to the Citation Search GUI for human users, the Citation Search API provides flexible machine access to selected CMIP6 data citations in JSON format. The response contains all components of a data reference as well as data use information:

http://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/cmip6search

Available filter options:

  • filter by DRS:
    • mipEra
    • activityId
    • institutionId
    • sourceId
    • experimentId
  • filter by granularity:
    • granularity=[exp|model]
  • filter by date (in ISO 8601 format):
    • gePublicationDate=YYYY-MM-DD: DOI published at or after a given date
    • lePublicationDate=YYYY-MM-DD: DOI published at or before a given date


Sample Calls:

  1. Data references at experiment (fine) granularity for a given source_id and activity_id:
    http://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/cmip6search?mipEra=CMIP6&activityId=CMIP&sourceId=HadGEM3-GC31-MM&granularity=exp
  2. Update the data references of request 1 with those published at or after a given date:
    http://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/cmip6search?mipEra=CMIP6&activityId=CMIP&sourceId=HadGEM3-GC31-MM&granularity=exp&gePublicationDate=2020-01-01
  3. Data references at model/MIP (coarse) granularity contributing to an activity_id, as available at a given snapshot date:
    http://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/cmip6search?mipEra=CMIP6&activityId=ScenarioMIP&granularity=model&lePublicationDate=2020-03-31
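The filters map directly onto URL query parameters, so such calls are easy to generate in a script. The following minimal Python sketch (standard library only) assembles a request from the filter names listed above and reproduces sample call 2. The helper names `build_search_url` and `fetch_citations` are hypothetical, and since the layout of the JSON response is not described here, the result is simply returned as decoded JSON.

```python
import json
from datetime import date
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/cmip6search"

# Filter names as listed above; anything else is rejected early.
ALLOWED_FILTERS = {
    "mipEra", "activityId", "institutionId", "sourceId", "experimentId",
    "granularity", "gePublicationDate", "lePublicationDate",
}


def build_search_url(**filters) -> str:
    """Return a cmip6search URL for the given filters (None values are dropped)."""
    params = {k: v for k, v in filters.items() if v is not None}
    unknown = set(params) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    if params.get("granularity", "exp") not in ("exp", "model"):
        raise ValueError("granularity must be 'exp' or 'model'")
    for key in ("gePublicationDate", "lePublicationDate"):
        if key in params:
            date.fromisoformat(params[key])  # raises ValueError unless YYYY-MM-DD
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_citations(url: str, timeout: float = 60.0):
    """Download the response and decode it as JSON (its layout is not assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Reproduces sample call 2 above.
    print(build_search_url(
        mipEra="CMIP6", activityId="CMIP", sourceId="HadGEM3-GC31-MM",
        granularity="exp", gePublicationDate="2020-01-01",
    ))
```

Re-running the same call with a later `gePublicationDate` is one way to pick up only newly published references.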


2. Direct access using DRS_id

The content of the CMIP6 DOI landing pages is provided in two additional machine-readable formats: JSON and XML. The underlying metadata standard is that of DataCite 4 (see documentation: https://doi.org/10.14454/7xq3-zf69; schema definition: http://schema.datacite.org/meta/kernel-4/metadata.xsd):

http://cera-www.dkrz.de/WDCC/meta/CMIP6/<mip_era>.<activity_drs>.<institution_id>.<source_id>[.<experiment_id>].[json|xml]


For possible values of the DRS (Data Reference Syntax) components, please check the CMIP6 Controlled Vocabulary at:
https://github.com/WCRP-CMIP/CMIP6_CVs


Example calls for the JSON format:

a. Model/MIP granularity: 

http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.json

b. Experiment granularity:  

http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.json
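Because these URLs are just the DRS components joined by dots, they can be generated from a list of analysed datasets. A minimal Python sketch, assuming the URL pattern given above; the function names `citation_metadata_url` and `fetch_citation_json` are hypothetical:

```python
import json
from typing import Optional
from urllib.request import urlopen

META_BASE = "http://cera-www.dkrz.de/WDCC/meta/CMIP6/"


def citation_metadata_url(mip_era: str, activity_drs: str, institution_id: str,
                          source_id: str, experiment_id: Optional[str] = None,
                          fmt: str = "json") -> str:
    """Model/MIP granularity without experiment_id, experiment granularity with it."""
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    parts = [mip_era, activity_drs, institution_id, source_id]
    if experiment_id:
        parts.append(experiment_id)
    return META_BASE + ".".join(parts) + "." + fmt


def fetch_citation_json(url: str, timeout: float = 60.0):
    """Download one JSON record (DataCite 4 based) and decode it."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `citation_metadata_url("CMIP6", "CMIP", "CNRM-CERFACS", "CNRM-ESM2-1", "1pctCO2")` yields the experiment-granularity URL shown in example b.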


The ESGF Search API can be used to collect these JSON URLs at 'experiment granularity' from the ESGF index, because a 'citation_url' is part of every dataset record. More information on the ESGF Search API is available at: https://esgf.github.io/esg-search/ESGF_Search_RESTful_API.html
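A sketch of this approach in Python, assuming DKRZ's ESGF index node at https://esgf-data.dkrz.de/esg-search/search and the Search API parameters `type`, `project`, `fields`, `format=application/solr+json` and `limit`. All datasets of one experiment (e.g. different variables) point to the same citation, so the URLs are de-duplicated. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: DKRZ's ESGF index node; other ESGF index nodes offer the same API.
ESGF_SEARCH = "https://esgf-data.dkrz.de/esg-search/search"


def esgf_query_url(limit: int = 500, **facets) -> str:
    """Dataset-level query returning only the citation_url field as Solr JSON."""
    params = {
        "type": "Dataset",
        "project": "CMIP6",
        "fields": "citation_url",
        "format": "application/solr+json",
        "limit": limit,
    }
    params.update(facets)  # e.g. source_id=..., experiment_id=...
    return f"{ESGF_SEARCH}?{urlencode(params)}"


def extract_citation_urls(solr_response: dict) -> list:
    """Unique citation URLs from a Solr JSON response, in first-seen order."""
    seen = {}
    for doc in solr_response.get("response", {}).get("docs", []):
        value = doc.get("citation_url", [])
        # Solr returns multi-valued fields as lists; accept a plain string too.
        for url in [value] if isinstance(value, str) else value:
            seen.setdefault(url, None)
    return list(seen)


def fetch_citation_urls(limit: int = 500, **facets) -> list:
    with urlopen(esgf_query_url(limit, **facets), timeout=120) as response:
        return extract_citation_urls(json.load(response))
```

For example, `fetch_citation_urls(source_id="CNRM-ESM2-1", experiment_id="1pctCO2")` should return the experiment-level citation URL(s) for that model and experiment. For more hits than `limit`, the API's `offset` parameter can be used to page through the results.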

3. API to list data citations based on DRS components

A list of available CMIP6 data citations in a simple JSON response can be requested via an API:
https://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/cmip6Citations

The available attributes, which are combined with a logical AND, are: institutionId, sourceId, complete (true|false), drsId.
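A minimal Python sketch for building such a request, assuming the attribute names above are passed as URL query parameters and that `complete` takes the strings `true` or `false`. The helper names are hypothetical, and the response is returned as decoded JSON without assuming its structure.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CITATIONS_API = "https://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/cmip6Citations"
ALLOWED_ATTRIBUTES = {"institutionId", "sourceId", "complete", "drsId"}


def citations_url(**attributes) -> str:
    """All given attributes go into one query, which the API combines with AND."""
    unknown = set(attributes) - ALLOWED_ATTRIBUTES
    if unknown:
        raise ValueError(f"unsupported attribute(s): {sorted(unknown)}")
    params = dict(attributes)
    if "complete" in params:
        value = params["complete"]
        if isinstance(value, bool):
            params["complete"] = "true" if value else "false"
        elif value not in ("true", "false"):
            raise ValueError("complete must be True/False or 'true'/'false'")
    return f"{CITATIONS_API}?{urlencode(params)}" if params else CITATIONS_API


def fetch_citation_list(**attributes):
    with urlopen(citations_url(**attributes), timeout=60) as response:
        return json.load(response)
```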

4. DataCite RestAPI to list data citations based on DRS components

DataCite also provides a REST API for accessing CMIP6 citation information for registered DOIs. It is documented at: https://support.datacite.org/docs/api . Examples of using the DataCite REST API:
    1. Access of all CMIP6 DOIs:
      https://api.datacite.org/dois?query=publisher:Earth%20System%20Grid%20Federation
    2. Search through the entries in the JSON response and identify them by their DRS, stored under 'attributes/subjects/subject' with subjectScheme='DRS'.
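The two steps can be combined in a short script. The Python sketch below assumes the JSON:API layout of DataCite responses (`data[].attributes.doi`, `data[].attributes.subjects[]`, `links.next`) and reuses the query from step 1; the function names are hypothetical. DRS prefixes are matched on whole components, so a prefix ending in `CNRM-CM6-1` does not also match `CNRM-CM6-1-HR`.

```python
import json
from typing import Dict, Iterator, List, Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen

DATACITE_DOIS = "https://api.datacite.org/dois"


def first_page_url(page_size: int = 100) -> str:
    """Step 1: all DOIs whose publisher is the Earth System Grid Federation."""
    params = {"query": "publisher:Earth System Grid Federation", "page[size]": page_size}
    # quote (not quote_plus) keeps spaces as %20, as in the example URL.
    return f"{DATACITE_DOIS}?{urlencode(params, safe=':', quote_via=quote)}"


def iter_pages(url: Optional[str]) -> Iterator[dict]:
    """Yield decoded JSON pages, following links.next until it is absent."""
    while url:
        with urlopen(url, timeout=120) as response:
            page = json.load(response)
        yield page
        url = page.get("links", {}).get("next")


def drs_subjects(record: dict) -> List[str]:
    """Step 2: subjects with subjectScheme 'DRS' from attributes/subjects."""
    subjects = record.get("attributes", {}).get("subjects") or []
    return [s["subject"] for s in subjects
            if s.get("subjectScheme") == "DRS" and "subject" in s]


def doi_by_drs(page: dict, drs_prefix: Optional[str] = None) -> Dict[str, str]:
    """Map DRS id -> DOI for one page, optionally restricted to a DRS prefix."""
    result = {}
    for record in page.get("data", []):
        doi = record.get("attributes", {}).get("doi") or record.get("id")
        for drs in drs_subjects(record):
            # Compare whole DRS components, not raw string prefixes.
            if drs_prefix is None or drs == drs_prefix or drs.startswith(drs_prefix + "."):
                result[drs] = doi
    return result
```

For example, calling `doi_by_drs(page, "CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1")` on each page from `iter_pages(first_page_url())` collects the DOIs of that model and its experiments. For very large result sets, the DataCite API documentation describes cursor-based pagination (`page[cursor]`).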


References and Links:
CMIP6 Citation Service: https://cmip6cite.wdc-climate.de
CMIP6: https://pcmdi.llnl.gov/CMIP6/
CMIP6 Registration/CV: https://github.com/WCRP-CMIP/CMIP6_CVs
DKRZ API documentation: https://www.wdc-climate.de/ui/cmip-api-docs/
DataCite: https://datacite.org
DataCite API documentation: https://support.datacite.org/docs/api

Thursday, April 2, 2020

How to find CMIP6 Data Citations?

The IPCC AR6 Part 1 is in its Second Order Draft review, and the publications to be included in AR6 are being published. As a result, the question of how to find a CMIP6 data reference comes up more and more frequently. The data citation recommendation is part of the DOI landing page of each data collection. But how do you find that page?

The different options are discussed in this post. A second blog post discusses machine-accessible options.


1. ESGF CoG

The best and least time-consuming option is to check the CMIP6 data citation at the time of data download in the ESGF CoG portal, e.g. https://esgf-data.dkrz.de/search/cmip6-dkrz/.
Use 'Show Citation' and follow the provided link to the landing page.

2. CMIP6 Citation Service search interface

If the CMIP6 data citations were not stored at the time of data download, the CMIP6 Citation Service offers a dedicated search interface at: http://bit.ly/CMIP6_Citation_Search. Detailed documentation with use cases is available here.

The result list includes data references for both offered granularities. For filtering, a simple search (magnifying glass icon) and an advanced search ('Actions' > 'Filter') are available. Please choose the appropriate granularity for your use case. Each result gives the complete data citation recommendation. Results can be exported in CSV, HTML and PDF formats. To reduce the amount of downloaded information, individual columns can be hidden, e.g. via 'Actions' > 'Select Columns'. See 'Actions' > 'Help' for information on further functionality.

If a machine-readable version of the complete metadata is required, a link to the JSON-formatted metadata can be made visible via 'Actions' > 'Select Columns'.


3. DataCite Search interface

Another source with a complete record of all CMIP6 data citations is DataCite's search interface at https://search.datacite.org/repositories/dkrz.esgf.
The search syntax is not very intuitive. An example search for MPI-M's CMIP6 data is: https://search.datacite.org/repositories/dkrz.esgf?query=MPI-M.

Documentation of DataCite's Search is available at: https://support.datacite.org/docs/datacite-search-user-documentation.


4. Google Dataset Search

CMIP6 data citations appear in Google Dataset Search after an unknown delay. Auto-completion supports DRS_ids.

5. FurtherInfoUrl link

Based on the 'further_info_url' global attribute provided in each NetCDF file header, the CMIP6 data citation information can be accessed via a page hosted by ES-DOC, e.g. http://furtherinfo.es-doc.org/CMIP6.DKRZ.MPI-ESM1-2-HR.ssp126.none.r1i1p1f1.
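In CMIP6 files the global attribute is named `further_info_url`. Reading it is usually enough, but it can also be reassembled from other global attributes. The Python sketch below assumes the ES-DOC identifier pattern `<mip_era>.<institution_id>.<source_id>.<experiment_id>.<sub_experiment_id>.<variant_label>`, which matches the example above, and the `https://furtherinfo.es-doc.org/` prefix used in the CMIP6 global attribute convention. The function name is hypothetical; the attributes are passed in as a plain dictionary.

```python
from typing import Mapping

FURTHER_INFO_BASE = "https://furtherinfo.es-doc.org/"
# Order of the components in the ES-DOC further-info identifier.
KEYS = ("mip_era", "institution_id", "source_id",
        "experiment_id", "sub_experiment_id", "variant_label")


def further_info_url(global_attrs: Mapping[str, str]) -> str:
    """Prefer the file's own attribute; otherwise assemble it from its parts."""
    if global_attrs.get("further_info_url"):
        return global_attrs["further_info_url"]
    missing = [k for k in KEYS if not global_attrs.get(k)]
    if missing:
        raise KeyError(f"missing global attribute(s): {missing}")
    return FURTHER_INFO_BASE + ".".join(global_attrs[k] for k in KEYS)
```

With the netCDF4 package, such a dictionary can be read from an open `netCDF4.Dataset` `ds` via `{name: ds.getncattr(name) for name in ds.ncattrs()}`.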

6. OpenAIRE's Explore portal

OpenAIRE's Explore portal at https://explore.openaire.eu offers an alternative to DataCite's Search. Its search functionality is similar to DataCite's but easier to use. However, expect a delay before new entries appear, because OpenAIRE harvests the CMIP6 citation information from DKRZ's OAI server.

To search through CMIP6 data citation information, please use this link as an entry point.

References and Links:
CMIP6 Citation Service: https://cmip6cite.wdc-climate.de
CMIP6: https://pcmdi.llnl.gov/CMIP6/
DataCite: https://datacite.org
ES-DOC: https://es-doc.org
Google Dataset Search: https://datasetsearch.research.google.com/
OpenAIRE Explore: https://explore.openaire.eu