Sunday, December 4, 2022

DOI Registration Statistics and Data Usage Metrics

This post describes where to find and access information on the evolution of DOI registrations for CMIP6 and input4MIPs data, and on the papers that reference these data.

1. DOI Registration Statistics:

This information covers the evolution of the number of registered DOIs per project (CMIP6, input4MIPs) and the number of metadata updates sent to DataCite. DataCite Commons provides additional statistics for the ESGF repository.

a. Statistics provided by the Citation Service: http://bit.ly/CMIP6_DOI_Statistic

b. API provided by the Citation Service: http://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/statistics/cmip6_doi_registration

c. Statistics provided by DataCite Commons: https://commons.datacite.org/repositories/8orcv25
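
For scripted access, the registration statistics API in item b can be read with the Python standard library. The following is a minimal sketch. It assumes only that the endpoint returns JSON and makes no assumption about the structure of the payload. The helper name `fetch_json` is hypothetical.

```python
import json
import urllib.request

# Citation Service endpoint for the CMIP6 DOI registration statistics (item b above).
REGISTRATION_STATS_URL = (
    "http://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/statistics/"
    "cmip6_doi_registration"
)


def fetch_json(url: str, timeout: float = 30.0):
    """Download `url` and decode the body as JSON.

    Hypothetical helper: it assumes the endpoint answers with a JSON body and
    does not interpret the structure of the returned object.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```

For example, `stats = fetch_json(REGISTRATION_STATS_URL)` returns the decoded statistics. Inspect the object before relying on particular keys.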

2. Data Usage Metrics:

This information covers the evolution of the number of papers and data products citing CMIP6 and input4MIPs data. It is based on the Scholix interface of OpenAIRE (http://www.scholix.org, http://scholexplorer.openaire.eu). Please note that the information provided is incomplete because of the ongoing culture change in data citation. First, not every author formally cites the data by including the data DOIs in the reference list, although many publishers have added this requirement to their author guidelines. Second, not all publishers submit these data references to Crossref as part of their article metadata.

a. Data Usage Metrics provided by the Citation Service: http://cera-www.dkrz.de/WDCC/ui/cerasearch/statistics?type=cmip6_data_usage


b. API for Data Usage Metrics provided by the Citation Service: http://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/statistics/cmip6_data_usage
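
Because the usage metrics change as new papers are indexed, it can be useful to keep dated local copies of the API response. The following is a minimal sketch that uses only the Python standard library. The file naming scheme (`cmip6_data_usage_YYYY-MM-DD.json`) and the function names are hypothetical choices. The payload is stored as returned, with no assumptions about its structure.

```python
import datetime
import json
import pathlib
import urllib.request

# Citation Service endpoint for the CMIP6 data usage metrics (item b above).
DATA_USAGE_URL = (
    "http://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/statistics/cmip6_data_usage"
)


def snapshot_path(directory: pathlib.Path, day: datetime.date) -> pathlib.Path:
    """Return the (hypothetical) file name for the snapshot taken on `day`."""
    return directory / f"cmip6_data_usage_{day.isoformat()}.json"


def save_snapshot(payload, directory: pathlib.Path, day: datetime.date) -> pathlib.Path:
    """Write an already decoded JSON payload to a dated file and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = snapshot_path(directory, day)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def download_snapshot(directory: pathlib.Path, timeout: float = 30.0) -> pathlib.Path:
    """Fetch the current usage metrics and store them under today's date."""
    with urllib.request.urlopen(DATA_USAGE_URL, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return save_snapshot(payload, directory, datetime.date.today())
```

Calling `download_snapshot(pathlib.Path("usage"))` at regular intervals builds up a local history of the metrics that can be compared later.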


Tuesday, March 22, 2022

Data citation is more than credit and more than DOIs

Different stakeholders tend to discuss their specific interests within their separate communities:

  • Scientific publishers focus on data references as part of the provenance information for a paper and the basic reproducibility of its results.
  • Researchers or data/paper authors are interested in receiving credit for their scientific results and in having data integrated into common research impact metrics.
  • Infrastructure providers want to connect scholarly information via PIDs such as DataCite and Crossref DOIs, ORCIDs, and ROR IDs.
  • Long-term archives and data publishers contribute long-term data preservation and the underlying data services that support the interests of the stakeholders above. They are essential for turning FAIR-enabling into FAIR-preserving activities; in other words, they are essential for sustainable data services.

The FAIR Guidelines (Pirani et al., 2022), introduced during the preparation of the Sixth Assessment Report (AR6) of the Intergovernmental Panel on Climate Change (IPCC), were jointly developed and implemented by all stakeholders: the researchers and authors of the AR6, the IPCC as scientific publisher, and the IPCC Data Distribution Centre (DDC) as infrastructure provider and long-term archive facility.

The aim of the FAIR Guidelines, enhancing the transparency of the IPCC's output, was approached through three aspects:

1. Traceability of the reports' key statements, centered on the figure creation process and linking the report to the data;

2. Providing credit for input data and receiving credit for created final data underpinning figures; and

3. Long-term preservation of scripts as well as input, intermediate, and final data.

The IPCC FAIR Guidelines approach can serve as an example of a joint implementation of the FAIR and TRUST principles that incorporates the interests and expertise of the different stakeholders.

Reference:
Anna Pirani, Andrés Alegria, Alaa Al Khourdajie, Wawan Gunawan, José Manuel Gutiérrez, Kirstin Holsman, David Huard, Martin Juckes, Michio Kawamiya, Nana Klutse, Volker Krey, Robin Matthews, Adam Milward, Charlotte Pascoe, Gerard van der Shrier, Alessandro Spinuso, Martina Stockhause, & Xiaoshi Xing. (2022). The implementation of FAIR data principles in the IPCC AR6 assessment process. Zenodo. https://doi.org/10.5281/zenodo.6504468.

Monday, April 20, 2020

How to find CMIP6 Data Citations (machine-access)?

The previous post, on April 2, 2020, explained how a human user can find CMIP6 data citations. When the data citation information was not stored during the ESGF data download and many datasets have been analyzed, script-based access to the data citations is required.

Several APIs are available at DKRZ, documented at https://www.wdc-climate.de/ui/cmip-api-docs/, and one is provided by DataCite:

1. Citation Search API

In addition to the Citation Search GUI for human users, the Citation Search API provides flexible machine access to selected CMIP6 data citations in JSON format. The response contains all components of a data reference as well as data use information:

http://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/cmip6search

Available filter options:

  • filter by DRS:
    • mipEra
    • activityId
    • institutionId
    • sourceId
    • experimentId
  • filter by granularity:
    • granularity=[exp|model]
  • filter by date (in ISO 8601 format):
    • gePublicationDate=YYYY-MM-DD: DOI published at or after a given date
    • lePublicationDate=YYYY-MM-DD: DOI published at or before a given date


Sample Calls:

  1. Data references on experiment (fine) granularity for a given source_id and activity_id:
    http://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/cmip6search?mipEra=CMIP6&activityId=CMIP&sourceId=HadGEM3-GC31-MM&granularity=exp
  2. Update the data references of request 1 with data references published at or after a given date:
    http://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/cmip6search?mipEra=CMIP6&activityId=CMIP&sourceId=HadGEM3-GC31-MM&granularity=exp&gePublicationDate=2020-01-01
  3. Data references on model/MIP (coarse) granularity contributing to an activity_id, as available at a given snapshot date:
    http://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/cmip6search?mipEra=CMIP6&activityId=ScenarioMIP&granularity=model&lePublicationDate=2020-03-31
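
The filter options and sample calls above translate directly into query strings. The sketch below assembles such request URLs in Python. The function name `cmip6search_url` is hypothetical, and the validation rules only mirror the options listed in this post. They do not reproduce any documented server-side checks.

```python
import datetime
from urllib.parse import urlencode

CMIP6SEARCH_URL = "http://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/cmip6search"

# Filter names taken from the list of available filter options above.
DRS_FILTERS = ("mipEra", "activityId", "institutionId", "sourceId", "experimentId")
DATE_FILTERS = ("gePublicationDate", "lePublicationDate")


def cmip6search_url(**filters: str) -> str:
    """Build a Citation Search API request URL from keyword filters.

    Raises ValueError for unknown filter names, an unsupported granularity,
    or dates that are not in ISO 8601 (YYYY-MM-DD) format.
    """
    allowed = set(DRS_FILTERS) | set(DATE_FILTERS) | {"granularity"}
    unknown = set(filters) - allowed
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    if filters.get("granularity", "exp") not in ("exp", "model"):
        raise ValueError("granularity must be 'exp' or 'model'")
    for name in DATE_FILTERS:
        if name in filters:
            datetime.date.fromisoformat(filters[name])  # raises ValueError if malformed
    # urlencode keeps the keyword order, so the URL mirrors the call.
    return f"{CMIP6SEARCH_URL}?{urlencode(filters)}"
```

Request 2 differs from request 1 only by `gePublicationDate`. Rerunning it periodically with the date of the last harvest is therefore one way to keep a local list of data references up to date.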

2. Direct access using DRS_id

The content of the CMIP6 DOI landing pages is provided in two additional machine-readable formats: JSON and XML. The underlying metadata standard is that of DataCite 4 (see documentation: https://doi.org/10.14454/7xq3-zf69; schema definition: http://schema.datacite.org/meta/kernel-4/metadata.xsd):

http://cera-www.dkrz.de/WDCC/meta/CMIP6/<mip_era>.<activity_drs>.<institution_id>.<source_id>[.<experiment_id>].[json|xml]


For possible values of the DRS (Data Reference Syntax) components, please check the CMIP6 Controlled Vocabulary at:
https://github.com/WCRP-CMIP/CMIP6_CVs
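
The URL pattern can be filled in programmatically. The minimal Python sketch below assumes valid DRS values taken from the CMIP6 Controlled Vocabulary. The function name `landing_page_url` is hypothetical. The example calls listed below can be reproduced with it.

```python
# Base URL of the machine-readable CMIP6 landing pages (see the pattern above).
META_BASE_URL = "http://cera-www.dkrz.de/WDCC/meta/CMIP6/"


def landing_page_url(mip_era, activity_drs, institution_id, source_id,
                     experiment_id=None, fmt="json"):
    """Return the JSON or XML landing-page URL for a DRS identifier.

    Without `experiment_id` the URL addresses the model/MIP granularity;
    with it, the URL addresses the experiment granularity.
    """
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    parts = [mip_era, activity_drs, institution_id, source_id]
    if experiment_id is not None:
        parts.append(experiment_id)
    return f"{META_BASE_URL}{'.'.join(parts)}.{fmt}"
```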


Example calls for the JSON format:

a. Model/MIP granularity: 

http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.json

b. Experiment granularity:  

http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.json
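
Because the XML variant follows the DataCite 4 metadata standard, it can be read with a standard XML parser. The following is a minimal sketch using Python's `xml.etree.ElementTree`. It assumes that the records use the kernel-4 namespace `http://datacite.org/schema/kernel-4`, and the function name `summarize_datacite_xml` is hypothetical. Only a few elements are extracted here. Others can be added in the same way.

```python
import xml.etree.ElementTree as ET

# Namespace of the DataCite Metadata Schema, version 4 (assumed for these records).
NS = {"dc": "http://datacite.org/schema/kernel-4"}


def summarize_datacite_xml(xml_text: str) -> dict:
    """Extract DOI, title, creators, publisher and year from a DataCite 4 record."""
    root = ET.fromstring(xml_text)

    def text(path):
        # Return the stripped text of the first matching element, or None.
        element = root.find(path, NS)
        return element.text.strip() if element is not None and element.text else None

    return {
        "doi": text("dc:identifier[@identifierType='DOI']"),
        "title": text("dc:titles/dc:title"),
        "creators": [
            c.text.strip()
            for c in root.findall("dc:creators/dc:creator/dc:creatorName", NS)
            if c.text
        ],
        "publisher": text("dc:publisher"),
        "year": text("dc:publicationYear"),
    }
```

The XML text itself can be downloaded from the `.xml` variant of any landing-page URL above.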


The ESGF Search API can be used to collect these JSON URLs for the experiment granularity from the ESGF index: a 'citation_url' is part of every dataset's information. More information on the ESGF Search API is available at: https://esgf.github.io/esg-search/ESGF_Search_RESTful_API.html
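
A minimal Python sketch of that approach follows. It assumes a Solr-style JSON response (`format=application/solr+json`) in which each dataset document carries a `citation_url` field. The index node address is a placeholder that must be replaced with an actual ESGF index node, and the function names are hypothetical. Results are paged, so larger result sets need repeated requests with an increasing `offset`.

```python
from urllib.parse import urlencode


def esgf_citation_query(search_endpoint: str, **facets: str) -> str:
    """Build an ESGF Search API query that returns only 'citation_url' fields.

    `search_endpoint` is the .../esg-search/search URL of an ESGF index node;
    `facets` are ESGF facets such as source_id or experiment_id.
    """
    params = {
        "type": "Dataset",
        "project": "CMIP6",
        "latest": "true",          # only the latest dataset versions
        "fields": "citation_url",  # restrict the returned fields
        "format": "application/solr+json",
        "limit": "100",
        **facets,
    }
    return f"{search_endpoint}?{urlencode(params)}"


def citation_urls(solr_response: dict) -> set:
    """Collect the distinct citation URLs from a Solr-style JSON response."""
    urls = set()
    for doc in solr_response.get("response", {}).get("docs", []):
        value = doc.get("citation_url", [])
        # Multi-valued fields are usually returned as lists; accept plain strings too.
        urls.update([value] if isinstance(value, str) else value)
    return urls
```

Each collected URL points to a JSON landing page of the kind shown in the example calls above and can be fetched directly.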

3. API to list data citations based on DRS components

A list of available CMIP6 data citations in a simple JSON response can be requested via an API:
https://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/cmip6Citations

The available attributes, which are combined with a logical AND, are: institutionId, sourceId, complete (true|false), and drsId.
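
A minimal Python sketch for building such a request follows. The function name is hypothetical. It assumes, based on the notation above, that `complete` is passed as the literal string `true` or `false`. The AND combination is applied by the API itself, so the sketch only sets the query parameters.

```python
from urllib.parse import urlencode

CITATIONS_URL = "https://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/cmip6Citations"
ALLOWED = ("institutionId", "sourceId", "complete", "drsId")


def cmip6_citations_url(**attributes) -> str:
    """Build a cmip6Citations request; the API combines all given attributes with AND."""
    unknown = set(attributes) - set(ALLOWED)
    if unknown:
        raise ValueError(f"unknown attribute(s): {sorted(unknown)}")
    params = {}
    for name, value in attributes.items():
        # The 'complete' flag is sent as lower-case true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[name] = value
    return f"{CITATIONS_URL}?{urlencode(params)}" if params else CITATIONS_URL
```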

4. DataCite REST API to list data citations based on DRS components

DataCite also provides a REST API for accessing the citation information of CMIP6 data with a registered DOI. It is documented at https://support.datacite.org/docs/api. Examples of DataCite REST API requests are:
    1. Access of all CMIP6 DOIs:
      https://api.datacite.org/dois?query=publisher:Earth%20System%20Grid%20Federation
    2. Search through the entries in the JSON response and identify them by their DRS, given under 'attributes/subjects/subject' with subjectScheme='DRS'.
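
The two steps above can be sketched in Python as follows. The sketch assumes the JSON:API layout described in the DataCite documentation: records under `data`, the DOI and subjects under `attributes`, and the URL of the next result page under `links/next`. The function names are hypothetical.

```python
import json
import urllib.request

# Request from step 1 above: all DOIs published by the Earth System Grid Federation.
ESGF_DOIS_URL = (
    "https://api.datacite.org/dois?query=publisher:Earth%20System%20Grid%20Federation"
)


def drs_index(page: dict) -> dict:
    """Map DRS identifiers to DOIs for one page of a DataCite /dois response."""
    index = {}
    for record in page.get("data", []):
        attributes = record.get("attributes", {})
        for subject in attributes.get("subjects") or []:
            if subject.get("subjectScheme") == "DRS":
                index[subject.get("subject")] = attributes.get("doi")
    return index


def iter_pages(url: str = ESGF_DOIS_URL, timeout: float = 60.0):
    """Yield decoded JSON pages, following 'links/next' until it is absent."""
    while url:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            page = json.loads(response.read().decode("utf-8"))
        yield page
        url = page.get("links", {}).get("next")
```

Merging `drs_index(page)` over all pages from `iter_pages()` yields a lookup table from DRS identifier to DOI. Because this can take many requests, the page-size option described in the DataCite API documentation may help reduce their number.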

References and Links:
CMIP6 Citation Service: https://cmip6cite.wdc-climate.de
CMIP6: https://pcmdi.llnl.gov/CMIP6/
CMIP6 Registration/CV: https://github.com/WCRP-CMIP/CMIP6_CVs
DKRZ API documentation: https://www.wdc-climate.de/ui/cmip-api-docs/
DataCite: https://datacite.org
DataCite API documentation: https://support.datacite.org/docs/api