Anette Ganske et al.

Even though the importance of good research data management has been widely discussed in the Earth System Sciences (ESS), the easy discoverability of quality-checked data has not yet been addressed in detail. This is the aim of the Earth System Data Branding (EASYDAB). EASYDAB is a brand that highlights FAIR and open ESS data published with DataCite DOIs. The EASYDAB guideline defines principles for achieving high metadata quality for ESS datasets by requiring specific metadata information. The EASYDAB logo is protected and may only be used by repositories that agree to follow the EASYDAB terms. The logo indicates that published data have an open licence, open file formats and rich metadata. Quality controls by the responsible repository ensure that these conditions are met. For this control, the repository can choose between different approved quality guidelines, such as the ATMODAT standard, ISO 19115 or the OGC GeoPackage Encoding Standard. Ideally, a quality guideline provides detailed mandatory and recommended specifications for rich metadata in the data files, the DataCite DOI and the landing page. One example of such a quality guideline is the ATMODAT standard, which has been developed within the AtMoDat project specifically for atmospheric model data. In addition to the metadata specifications, it also requires controlled vocabularies, structured landing pages and a specific file format (netCDF). The ATMODAT standard includes checklists for data producers and data curators so that compliance with the requirements can easily be verified by both sides. To facilitate an automated compliance check of the netCDF files' metadata, a Python tool has also been developed and published. Automated checking against the quality principles simplifies the repository's control of the data. Nevertheless, repositories can also use checklists for the curation of the data.
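The kind of automated metadata compliance check described above can be sketched as follows. This is an illustrative example only, not the published checker tool: the attribute names and the split into mandatory and recommended sets are placeholders in the style of CF/ATMODAT-like conventions, not the official ATMODAT requirements.

```python
# Illustrative sketch of an automated metadata compliance check, assuming
# the file's global attributes have already been read into a dict.
# The attribute names below are examples, NOT the official ATMODAT set.

MANDATORY = {"title", "institution", "source", "Conventions", "licence"}
RECOMMENDED = {"creator_name", "references", "comment"}


def check_global_attributes(attrs):
    """Return (errors, warnings): missing mandatory / recommended attributes."""
    present = set(attrs)
    errors = sorted(MANDATORY - present)      # mandatory -> failure
    warnings = sorted(RECOMMENDED - present)  # recommended -> warning only
    return errors, warnings


attrs = {
    "title": "Example model output",
    "institution": "Example institute",
    "source": "model v1.0",
    "Conventions": "CF-1.8",
    "creator_name": "J. Doe",
}
errors, warnings = check_global_attributes(attrs)
print("missing mandatory:", errors)      # missing mandatory: ['licence']
print("missing recommended:", warnings)  # missing recommended: ['comment', 'references']
```

In a real checker, a report of errors and warnings like this would be produced per file, letting both data producers and curators see at a glance which requirements are unmet.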
The overall aim of the curation of EASYDAB datasets shall always be to enhance the reuse of reviewed, high-quality data. EASYDAB thus shows scientists the way to open and FAIR data while enabling repositories to demonstrate their efforts in publishing data with high maturity.

Karsten Peters et al.

The full-featured, CoreTrustSeal-certified long-term archiving service LTA WDCC (World Data Centre for Climate) at DKRZ (German Climate Computing Center, Hamburg) offers long-term preservation for datasets relevant to climate and Earth System research. The WDCC collects, stores and disseminates Earth System data with a focus on climate simulation data and climate-related data products. It has established itself as a staple infrastructure for the global climate modelling research community. Data preservation in the LTA WDCC is preceded by thorough technical quality control and is accompanied by intensive data curation for storage periods longer than 10 years. During the preservation period, long-term findability, searchability and reusability of the data are ensured. Long-term findability of the curated data is enabled by the possibility of assigning DataCite DOIs to archived datasets. The data undergo additional quality checks before being eligible for DOI assignment; this process is performed in close collaboration with the data providers. These quality checks aim to ensure the unambiguous (inter-)disciplinary reusability of the preserved datasets and include checking for proper documentation, adherence to domain-specific (meta)data standards, uncertainty analysis and cross-referencing. Only then can a high level of reusability of the data be achieved, justifying the effort involved. The perceived need for research data repositories to comply with the FAIR Guiding Principles, published in 2016, has motivated us to perform an even-handed and systematic self-assessment of LTA WDCC FAIRness. Due to the lack of a standardised evaluation framework, this assessment reflects our specific, albeit objective, interpretation of the principles.
Our assessment, published on the DKRZ webpages, shows that the native philosophy behind DKRZ’s LTA WDCC service – especially the focus on reusability – reflects the FAIR Guiding Principles by design and even goes beyond them by ensuring very long-term (>10 years) preservation and therefore reusability of archived data.
From a research data repository's perspective, offering data management services in line with the FAIR principles is becoming more and more of a selling point. To make that claim credible, the services offered must be evaluated following transparent procedures. Several FAIRness evaluation methods are openly available for application to archived (meta)data; however, no standardised and globally accepted FAIRness testing procedure exists to date. Here, we apply an ensemble of five FAIRness evaluation approaches to selected datasets archived in the WDCC. The selection represents the majority of WDCC-archived datasets (by volume) and reflects the entire spectrum of data curation levels. Two tests are purely automatic, two are purely manual and one applies a hybrid method (manual and automatic combined). Our evaluation yields a mean FAIR score of 0.67 out of 1. Manual approaches show higher scores than automatic ones, and the hybrid approach shows the highest score. Computed statistics show agreement between the tests at the data collection level. None of the five evaluation approaches is fully fit for purpose to evaluate (discipline-specific) FAIRness, but all have their merits. Manual testing captures domain- and repository-specific aspects of FAIR, but machine-actionability of archived (meta)data is judged by the evaluator. Automatic approaches evaluate the machine-actionable features of archived (meta)data, which have to be accessible to an automated agent and comply with globally established standards; an evaluation of contextual metadata (essential for reusability) is not possible. Correspondingly, the hybrid method combines the advantages and eliminates the deficiencies of manual and automatic evaluation. We recommend that future operational FAIRness evaluation be based on a mature hybrid approach.
The automatic part of the evaluation would retrieve and evaluate as much machine-actionable, discipline-specific (meta)data content as possible and would then be complemented by a manual evaluation focusing on the contextual aspects of FAIR. The design and adoption of the discipline-specific aspects will have to be conducted as a concerted community effort. We illustrate a possible structure of this process with an example from climate research.
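A hybrid evaluation of this kind could be structured as below. This is a hypothetical sketch under stated assumptions, not any published evaluation framework: the check names, the manual criteria and all scores are invented placeholders, used only to show how machine-testable checks and manually assigned contextual scores might be combined into one FAIR score.

```python
# Hypothetical sketch of a hybrid FAIRness evaluation: automatic checks on
# machine-actionable metadata combined with manually assigned scores for
# contextual criteria. All metric names, weights and values below are
# illustrative placeholders, not results from any real evaluation.

def automatic_checks(metadata):
    """Score machine-testable criteria; each check contributes 0 or 1."""
    checks = {
        "has_pid": "doi" in metadata,                       # persistent identifier
        "has_licence": bool(metadata.get("licence")),        # licence stated
        "open_format": metadata.get("format") in {"netCDF", "CSV"},
    }
    return sum(checks.values()) / len(checks)


def hybrid_score(metadata, manual_scores):
    """Average the automatic score with manual contextual scores (each 0..1)."""
    auto = automatic_checks(metadata)
    manual = sum(manual_scores.values()) / len(manual_scores)
    return (auto + manual) / 2


meta = {"doi": "10.1234/example", "licence": "CC-BY-4.0", "format": "netCDF"}
manual = {"documentation_quality": 0.8, "provenance_context": 0.6}
print(round(hybrid_score(meta, manual), 2))  # 0.85
```

The design point the sketch makes is that the automatic part only ever sees what an automated agent can retrieve, while contextual criteria such as documentation quality remain human judgements; the hybrid score simply merges the two.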