Describe current or emerging science or engineering research challenge(s), providing context in terms of recent research activities and standing questions in the field.

1.1 Paleogeosciences: Major Scientific Questions and Research Challenges

\label{paleogeosciences-major-scientific-questions-and-research-challenges}
The grand challenge in the paleogeosciences is to enable a fully resolved understanding of the past dynamics of the Earth-Life System and its interacting subsystems, across the entire history of Earth, at temporal scales of \(10^9\) to \(10^1\) years, by organizing and mobilizing the many millions of individual geoscientific observations that make up the long tail of paleogeoscience data (Transitions Report, 2012; EarthCube Paleogeosciences Domain Workshop 2012; NRC 2013, 2011a,b). The paleogeosciences branch of Earth System Science encompasses paleoclimatology, paleobiology, paleoecology, geochronology, sedimentary geology, geochemistry, glaciology, and other disciplines. In our era of global change, with projected rates of change and states of the climate system that have no analog in recorded human history, the paleogeosciences are vital to studying how the Earth-Life System responds to and recovers from large perturbations to the global carbon cycle, global biodiversity, regional and global climates, the cryosphere, and the hydrosphere.
Four overarching scientific challenges in Earth System Science were identified in the National Research Council’s Transitions Report (2012):
See also National Research Council reports (2013, 2011a,b) and EarthCube Domain Working Group Reports (Noren et al. 2013, Aufdenkampe et al. 2013, Chan and Budd 2013, and Singer et al. 2013, all refs: http://bit.ly/2nUOUQc).
We can answer these questions through the study of Earth’s history and its rich record of past abrupt change, evolutionary innovations, and complex dynamics driven by interactions among multiple components of the Earth system, across multiple temporal and spatial scales. Earth’s history provides multiple model systems for 21st-century changes (Williams et al. 2013). Areas of active research include:

1.2 Paleogeoscientific Data: Key Features and Challenges

\label{paleogeoscientific-data-key-features-and-challenges}
Here we summarize key features of paleogeoscientific data, practice, and practitioners. These characteristics have been the starting point for current cyberinfrastructure-building efforts (Sect.2.1) and inform our recommendations for the next generation of cyberinfrastructure advances (Sect.2.2-3.5).
1. Paleogeoscientific observations are long-tail data collected by scientists from many disciplines and institutions, with many data types and forms of measurement. Individual records span long intervals of time but are spatially point-level, collected at one or more outcrops, drill sites, or other discrete sites. Hence, site-level paleogeoscientific data must be assembled into global-scale data networks in order to understand the Earth System, its external forcings, and its internal feedbacks (e.g. PAGES 2k, 2013). Assembling such data is labor-intensive. Few widely accepted data standards and identifiers exist (McKay and Emile-Geay, 2016), although several are emerging through EarthCube-supported Research Coordination Networks (Cyber4Paleo) and Integrative Activities (ePANNDA, Earth-Life Consortium, Open Core Data).
2. Paleogeoscientific data share common underlying structure. Despite the heterogeneity described above, paleogeoscientific data have several features in common: they typically involve a measurement of a proxy in a geological archive, often structured by depth, from which we must estimate time. This structural homogeneity facilitates the development of common data models in the paleogeosciences.
3. Paleogeoscientific data have a long shelf life. Paleogeoscientific data derive primarily from physical samples of geological materials collected in the field and from laboratory measurements of these samples. As new techniques are developed, we often seek to reanalyze previously collected samples, cf. the recent wave of ancient DNA analyses from museum fossils. We must curate physical samples and maintain an unbroken chain of provenance from sample to all data generated from the sample (Sect.2.3).
4. Time is an unknown variable that must be estimated in the paleogeosciences (Singer et al. 2013). We must infer age through discrete age estimates (called age controls) and age models that provide age estimates between dated samples. Age models must be regularly updated as more precise and accurate dates become available and as more sophisticated age-depth modeling software is developed. Published geochronological frameworks become obsolete with every new date and every refinement to dating methods, decay constants, and other parameters. Data repositories exist for some geochronological data (GeoChron/IEDA), but they are not systematically linked to one another or to other affiliated databases.
5. Dark Data. Data are often not fully published. For example, papers presenting microfossil data often show only summary diagrams for selected taxa and may not include the underlying counts as supplementary data. Published metadata are often incomplete, e.g. geochronological labs usually do not publish all instrumental parameter settings. Some disciplines have adopted minimal metadata standards and established a common data repository; others have not. A great deal of data is still digitally “dark”, even when the publications themselves are available electronically. Data mobilization efforts are essential (Sect.3.3).
6. Paleodata are increasingly assimilated with Earth System Models. Our field uses Earth system models to simulate the processes governing the past and present evolution of the Earth-Life System. These same models are also the basis for climate scenarios over the coming decades, and paleodata offer an important constraint on modeled estimates (e.g. sensitivity of global temperatures to atmospheric CO\(_2\), Hargreaves et al. 2012). Increasingly, data assimilation methods are being employed to make joint inferences from paleodata and Earth system models (Crucifix, 2012). For example, atmospheric general circulation models now include stable isotopic tracers (e.g. \(\delta^{18}\)O), enabling direct assimilation of paleodata into Earth system models. Data assimilation creates new needs for well-structured datasets with rigorous estimates of temporal and proxy uncertainty and for high-capacity computing.
7. Paleogeoscientific Expertise is Widely Distributed, with individual paleogeoscientists specializing in particular proxy types, archives, time periods, regions, and questions. This dispersion of expertise places a premium on developing decentralized but interlinked governance and data management systems for our data (Sect.2.1, 3.1-3.2).
8. Uneven Workforce Training and Interest in Informatics. The paleogeosciences emphasize high-quality field and laboratory measurements. Informatics has not traditionally been part of the core geoscientific curriculum, except for courses in statistics and calculus. Most geoscientists have not sought to keep pace with recent rapid advances in informatics. Disciplinary and cultural norms vary with respect to data sharing. Training programs at all levels are needed (Sect.3.4).
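
To make items 2 and 4 above concrete, the following is a minimal sketch, not a description of any existing community data model or age-modeling package. It assumes a deliberately simplified structure: proxy measurements indexed by depth, a set of age controls, and an age model that linearly interpolates age between bracketing controls. All class, field, and function names (`AgeControl`, `ProxyMeasurement`, `linear_age_model`, `assign_ages`) and the units are hypothetical, and linear interpolation is chosen only for brevity; it carries no uncertainty estimate, which real age-depth models must provide.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AgeControl:
    """A discrete age estimate at a known depth (hypothetical structure)."""
    depth_cm: float
    age_yr_bp: float


@dataclass(frozen=True)
class ProxyMeasurement:
    """One proxy value measured at a depth in an archive (hypothetical structure)."""
    depth_cm: float
    proxy: str
    value: float


def linear_age_model(controls: List[AgeControl], depth_cm: float) -> float:
    """Estimate age at a depth by linear interpolation between age controls.

    Raises ValueError instead of extrapolating beyond the dated interval.
    """
    ordered = sorted(controls, key=lambda c: c.depth_cm)
    depths = [c.depth_cm for c in ordered]
    if len(ordered) < 2:
        raise ValueError("need at least two age controls")
    if len(set(depths)) != len(depths):
        raise ValueError("age controls must have distinct depths")
    if not depths[0] <= depth_cm <= depths[-1]:
        raise ValueError("depth outside the dated interval")
    # Index of the control just below the query depth, clamped so that a
    # query exactly at the deepest control uses the last interval.
    hi = min(bisect_right(depths, depth_cm), len(ordered) - 1)
    lo = hi - 1
    frac = (depth_cm - depths[lo]) / (depths[hi] - depths[lo])
    return ordered[lo].age_yr_bp + frac * (ordered[hi].age_yr_bp - ordered[lo].age_yr_bp)


def assign_ages(
    measurements: List[ProxyMeasurement], controls: List[AgeControl]
) -> List[Tuple[ProxyMeasurement, float]]:
    """Pair each measurement with an age estimated from the current age controls."""
    return [(m, linear_age_model(controls, m.depth_cm)) for m in measurements]
```

Because the ages are computed from the age controls rather than stored with the measurements, adding or revising a date only requires calling `assign_ages` again with the updated controls; the depth-indexed measurements themselves remain unchanged.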