Tuesday, March 22, 2022

Data citation is more than credit and more than DOIs

 Different stakeholders tend to discuss their specific interests in their separate communities:

* Scientific publishers focus on data references as part of the provenance information for a paper and basic reproducibility of its results.

* Researchers or data/paper authors are interested in getting credit for their scientific results and in the integration of data into common research impact metrics.

* Infrastructure providers want to connect scholarly information via PIDs such as DataCite and Crossref DOIs, ORCIDs, and ROR IDs.

* Long-term archives and data publishers contribute long-term data preservation and the underpinning data services that support the interests of the stakeholders above. They are essential for turning FAIR-enabling into FAIR-preserving activities; in other words, they are essential for sustainable data services.

The FAIR Guidelines introduced in the preparation of the Sixth Assessment Report (AR6) of the Intergovernmental Panel on Climate Change (IPCC) were jointly developed and implemented by all stakeholders: the researchers and authors of the AR6, the scientific publisher IPCC, and the IPCC Data Distribution Centre (DDC) as infrastructure provider and long-term archive facility. 

The aim of the FAIR Guidelines, enhancing the transparency of the IPCC's output, was approached in three aspects:

1. Traceability of key statements of the reports, centered on the figure creation process and relating the reports to data;

2. Providing credit for input data and receiving credit for created final data underpinning figures; and

3. Long-term preservation of scripts as well as input, intermediate, and final data.

The IPCC FAIR Guidelines approach can serve as an example for a joint implementation of FAIR and TRUST principles including the interests and expertise of the different stakeholders.

Monday, April 20, 2020

How to find CMIP6 Data Citations (machine-access)?

The last post on April 2, 2020, explained how a human user can find CMIP6 data citations. For use cases where the data citation information was not stored during ESGF data download and many datasets have been analyzed, script-based access to the data citations is required.

There are different options available:

1. Direct access using DRS_id

The content of the CMIP6 DOI landing pages is provided in two additional machine-readable formats: JSON and XML. The underlying metadata standard is that of DataCite 4 (see documentation: https://doi.org/10.14454/7xq3-zf69; schema definition: http://schema.datacite.org/meta/kernel-4/metadata.xsd):


For possible values of the DRS (Data Reference Syntax) components, please check the CMIP6 Controlled Vocabulary at: https://github.com/WCRP-CMIP/CMIP6_CVs

Example calls for json format:

a. Model/MIP granularity: 


b. Experiment granularity:  


It is possible to use the ESGF Search API to collect these JSON URLs for the 'experiment granularity' from the ESGF index. A 'citation_url' is part of the information stored for every dataset. More information on the ESGF Search API is available at: https://earthsystemcog.org/projects/cog/esgf_search_restful_api
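
The collection of 'citation_url' values via the ESGF Search API can be sketched as follows. This is a minimal sketch, not the Citation Service's own tooling: the index node host, the `solr+json` response layout (`response` > `docs`), and the helper names `build_esgf_query` and `collect_citation_urls` are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumption: any ESGF index node that exposes the Search API can be used;
# the DKRZ host is only an example.
ESGF_SEARCH_URL = "https://esgf-data.dkrz.de/esg-search/search"


def build_esgf_query(base_url=ESGF_SEARCH_URL, limit=50, **facets):
    """Build a Search API URL that returns only the id and citation_url fields."""
    params = {
        "project": "CMIP6",
        "type": "Dataset",
        "format": "application/solr+json",
        "fields": "id,citation_url",
        "limit": limit,
    }
    # Additional facets (e.g. institution_id, experiment_id) narrow the search.
    params.update(facets)
    return f"{base_url}?{urlencode(params)}"


def collect_citation_urls(solr_response):
    """Return the sorted, de-duplicated citation URLs of a parsed JSON response."""
    urls = set()
    for doc in solr_response.get("response", {}).get("docs", []):
        value = doc.get("citation_url", [])
        # Assumption: the field may arrive as a single string or as a list.
        if isinstance(value, str):
            value = [value]
        urls.update(value)
    return sorted(urls)
```

The URL returned by `build_esgf_query(institution_id="MPI-M")` can be fetched with any HTTP client; the parsed JSON body is then passed to `collect_citation_urls`. Many datasets share one experiment-level citation, which is why the result is de-duplicated.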

2. API to list data citations based on DRS components

A list of available CMIP6 data citations in a simple JSON response can be requested via an API:

Available attributes are combined as logical AND: institutionId, sourceId, complete (true|false), drsId.
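
A request that combines several of these attributes could be assembled as in the sketch below. The API address is not reproduced here, so `api_url` must be supplied by the caller; passing the attributes as URL query parameters and the helper name `build_citation_list_url` are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Attribute names as listed above; the API combines them as logical AND.
ALLOWED_ATTRIBUTES = ("institutionId", "sourceId", "complete", "drsId")


def build_citation_list_url(api_url, **attributes):
    """Append the given attributes to api_url as query parameters (assumed form)."""
    unknown = set(attributes) - set(ALLOWED_ATTRIBUTES)
    if unknown:
        raise ValueError(f"unsupported attributes: {sorted(unknown)}")
    params = {}
    for name in ALLOWED_ATTRIBUTES:
        if name in attributes:
            value = attributes[name]
            # 'complete' takes true|false, so booleans are written in lower case.
            if isinstance(value, bool):
                value = "true" if value else "false"
            params[name] = value
    return f"{api_url}?{urlencode(params)}" if params else api_url
```

For example, `build_citation_list_url(api_url, institutionId="MPI-M", complete=True)` would request only complete citation entries of one institution, since both conditions have to hold.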

3. DataCite REST API to list data citations based on DRS components

DataCite also provides a REST API for accessing the citation information of CMIP6 data with a registered DOI. It is documented at: https://support.datacite.org/docs/api. Examples of DataCite REST API requests are:
    1. Access of all CMIP6 DOIs:
    2. Search through the entries in the JSON response to identify them by their DRS under 'attributes/subjects/subject' with subjectScheme='DRS', e.g.
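
The two steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the `/dois` endpoint with the `client-id` and `page[size]` parameters, the repository id `dkrz.esgf`, the JSON layout (`data` > `attributes` > `subjects`), and the helper names are illustrative and should be checked against the DataCite documentation.

```python
from urllib.parse import urlencode

# Assumption: DataCite REST API endpoint for DOI records.
DATACITE_API = "https://api.datacite.org/dois"


def build_datacite_query(client_id="dkrz.esgf", page_size=100):
    """Build a request URL for the DOIs registered by one repository."""
    params = {"client-id": client_id, "page[size]": page_size}
    return f"{DATACITE_API}?{urlencode(params)}"


def drs_subjects(record):
    """Return the subjects of one record whose subjectScheme is 'DRS'."""
    subjects = record.get("attributes", {}).get("subjects", [])
    return [s.get("subject") for s in subjects if s.get("subjectScheme") == "DRS"]


def find_dois_by_drs(response, drs_id):
    """Return the DOIs of all records in a parsed response tagged with drs_id."""
    return [
        record.get("attributes", {}).get("doi")
        for record in response.get("data", [])
        if drs_id in drs_subjects(record)
    ]
```

The response is paginated, so a complete scan would have to repeat the request for each page before calling `find_dois_by_drs` on the parsed JSON.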


References and Links:
CMIP6 Citation Service: https://cmip6cite.wdc-climate.de
CMIP6: https://pcmdi.llnl.gov/CMIP6/
CMIP6 Registration/CV: https://github.com/WCRP-CMIP/CMIP6_CVs
DataCite: https://datacite.org

Thursday, April 2, 2020

How to find CMIP6 Data Citations?

The IPCC AR6 Part 1 is in its Second Order Draft review, and publications to be included in the AR6 are being published. As a result, the question of how to find a CMIP6 data reference comes up more frequently. The data citation recommendation is part of the DOI landing page for each data collection. But how can that page be found?

The different options are discussed in this post. A second blog post discusses machine-accessible options.


1. ESGF CoG portal

The best and least time-consuming option is to check the CMIP6 data citation at the time of data download in the ESGF CoG portal, e.g. https://esgf-data.dkrz.de/search/cmip6-dkrz/.
Use 'Show Citation' and follow the provided link to the landing page.

2. CMIP6 Citation Service search interface

In case the CMIP6 data citations were not stored at the time of data download, the CMIP6 Citation Service offers a dedicated search interface at: http://bit.ly/CMIP6_Citation_Search. Detailed documentation with use cases is available here.

The result list includes data references for both granularities offered. For filtering, a simple search using the magnifying glass and an advanced search under 'Actions' > 'Filter' are offered. Please choose the appropriate granularity for your use case. Each result gives the complete data citation recommendation. Results can be exported in CSV, HTML, and PDF formats. To reduce the downloaded information, it is possible to hide individual columns via 'Actions' > 'Select Columns'. Please read 'Actions' > 'Help' for information on further functionality.

In case a machine-readable version of the complete metadata is required, a link to the JSON-formatted metadata can be made visible via 'Actions' > 'Select Columns'.

3. DataCite Search interface

Another source with a complete record of all CMIP6 data citations is DataCite's search interface at https://search.datacite.org/repositories/dkrz.esgf.
The search syntax is not very intuitive. An example search for MPI-M's CMIP6 data is: https://search.datacite.org/repositories/dkrz.esgf?query=MPI-M.

Documentation of DataCite's Search is available at: https://support.datacite.org/docs/datacite-search-user-documentation.

4. Google Dataset Search

CMIP6 Data Citations appear in Google Dataset Search with an unknown delay. Auto-completion supports DRS_ids.

5. FurtherInfoUrl link

Based on the 'furtherInfoUrl' global attribute provided in each NetCDF file header, the CMIP6 Data Citation information can be accessed via a page hosted by ES-DOC, e.g. http://furtherinfo.es-doc.org/CMIP6.DKRZ.MPI-ESM1-2-HR.ssp126.none.r1i1p1f1.
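
The identifier at the end of such a link can be split into its parts, for example to look up the matching citation in one of the other services. The sketch below assumes that the link ends in six dot-separated components; the component names follow the CMIP6 global attribute conventions and are an assumption here, as is the helper name `parse_further_info_url`.

```python
from urllib.parse import urlparse

# Assumed component order of the identifier at the end of the link.
FIELDS = (
    "mip_era",
    "institution_id",
    "source_id",
    "experiment_id",
    "sub_experiment_id",
    "variant_label",
)


def parse_further_info_url(url):
    """Split the identifier of a furtherinfo link into named components."""
    identifier = urlparse(url).path.strip("/")
    parts = identifier.split(".")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} components, got {len(parts)}")
    return dict(zip(FIELDS, parts))


info = parse_further_info_url(
    "http://furtherinfo.es-doc.org/CMIP6.DKRZ.MPI-ESM1-2-HR.ssp126.none.r1i1p1f1"
)
print(info["source_id"], info["experiment_id"])
```

The attribute value itself can be read from the file header with any NetCDF library before it is passed to the function.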

6. OpenAIRE's Explore portal

OpenAIRE's Explore portal (https://explore.openaire.eu) offers an alternative to DataCite's Search. Its search functionality is similar to that of DataCite but easier to use. However, a delay has to be taken into account when using this portal, as the CMIP6 citation information is harvested by OpenAIRE from DKRZ's OAI server.

To search through CMIP6 data citation information, please use this link as entry point.

References and Links:
CMIP6 Citation Service: https://cmip6cite.wdc-climate.de
CMIP6: https://pcmdi.llnl.gov/CMIP6/
DataCite: https://datacite.org
ES-DOC: https://es-doc.org
Google Dataset Search: https://datasetsearch.research.google.com/
OpenAIRE Explore: https://explore.openaire.eu