Data SPHINX (DATA Storage and Preservation of High resolution climate experiments) is an EUDAT Data Pilot project which will allow long-term storage of high-resolution climate model output data and its sharing among a wide scientific user community. It aims to build a repository serving the climate change impact modelling community, providing selected variables at high temporal and spatial resolution, with a focus on climate extremes and the hydrological cycle in areas with complex orography. Potential users include researchers studying the impacts of climate on ecosystems, floods, landslides and fires. The archive will contain high-resolution data from the PRACE project Climate SPHINX and will later be extended with simulations from the projects PRIMAVERA, CRESCENDO and HighResMIP.
The Scientific and Technical Challenge
An open issue under active investigation is the sensitivity of climate simulations to model resolution, and whether very high resolution is useful for a realistic representation of the main features of climate variability. The advantage of sub-grid parameterizations capable of capturing small-scale variability, such as stochastic parameterizations, also remains to be determined. To this end, extremely high-resolution climate integrations are necessary; they are being performed or planned in the framework of several initiatives (Climate SPHINX, HighResMIP, CRESCENDO, PRIMAVERA).
In a first stage, the EC-Earth Earth-System model is being used to explore the impact of stochastic physics in long climate integrations as a function of model resolution (from 80 km to 16 km for the atmosphere). This research will, for the first time, investigate extensively and systematically the impact of resolution and stochastic parameterisations on climate simulations. As a result, we estimate data storage needs of around 50-300 TB in this first stage.
In a second stage, the archive will be further expanded with high-resolution coupled simulations performed mainly with the EC-Earth model in the framework of the CMIP6 HighResMIP initiative and of the PRIMAVERA and CRESCENDO H2020 projects. For this second phase we estimate storage needs around 300-700 TB.
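The figures quoted for the two stages can be combined in a short back-of-the-envelope calculation. The sketch below is illustrative only: it assumes that the second-stage estimate is additional to the first-stage one, and that the number of horizontal grid points scales with the inverse square of the grid spacing. The function and constant names are hypothetical and not part of any project tool.

```python
def grid_point_ratio(coarse_km: float, fine_km: float) -> float:
    """Factor by which the number of horizontal grid points grows
    when the grid spacing shrinks from coarse_km to fine_km.

    Assumes points scale with the inverse square of the spacing.
    """
    return (coarse_km / fine_km) ** 2


def combined_range(*stages: tuple) -> tuple:
    """Sum the (low, high) storage estimates of several stages.

    Assumes the stage estimates are additive, which the text does
    not state explicitly.
    """
    low = sum(stage[0] for stage in stages)
    high = sum(stage[1] for stage in stages)
    return low, high


# Storage estimates quoted in the text, in terabytes.
STAGE_1_TB = (50, 300)
STAGE_2_TB = (300, 700)

# Atmospheric resolution range quoted in the text: 80 km down to 16 km.
ratio = grid_point_ratio(80, 16)
total_tb = combined_range(STAGE_1_TB, STAGE_2_TB)

print(f"Horizontal grid points increase by a factor of {ratio:.0f}")
print(f"Combined storage estimate: {total_tb[0]}-{total_tb[1]} TB")
```

Under these assumptions, going from 80 km to 16 km multiplies the number of horizontal grid points by 25, and the two stages together would require roughly 350-1000 TB.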
Technical issues to be solved include the implementation of appropriate tools for distributing and searching the data, for postprocessing and data extraction, and for comparing the data with available observations from other archives. To this end, the integration of standard tools from the climate research community (such as ESGF nodes) will be explored.
This pilot will be used to demonstrate the integration of existing solutions, still under development, with relevant EUDAT services. The size of the potential user base can be estimated as hundreds of scientists in the climate change and climate impact fields.
EUDAT and expected outcomes
The pilot will expose stored data using an ESGF (Earth System Grid Federation) node and a THREDDS Data Server, deployed using the EUDAT “Service Hosting Framework”. It will explore how to expose the ESGF instance through B2FIND to improve data discoverability. We will evaluate the possibility of registering the data sets with either a DOI or a PID. The use of B2SHARE as a catalogue holding metadata records only will also be evaluated. Specific EUDAT services involved include data repository, (long tail) data sharing and data staging for analysis and processing.
The data repository, data sharing and staging services offered by the pilot will be crucial in giving a wide user base access to a set of climate variables at high temporal resolution and extremely high spatial resolution, not commonly available at this time. These services will be an important source of very high-resolution simulation data in preparation for subsequent international efforts (such as HighResMIP and current and future H2020 projects), allowing users to perform preliminary studies following the work programme of these projects and to develop further data analysis, diagnostic and visualization tools.
The pilot will provide a platform for medium-term storage and will facilitate access to and discovery of state-of-the-art high-resolution climate simulations. The EUDAT services will be used to allow easy and fast access to, and efficient sharing and analysis of, selected variables from extremely high-resolution datasets (which are particularly storage intensive), with a particular focus on climate extremes and the hydrological cycle. This will facilitate scientific collaboration and foster research by easing data analysis and postprocessing. The services offered by this pilot will be made available to members of different climate research communities and to participants in national and international research projects.