Abstract

\label{abstract}
In an era of global change, we use paleogeoscientific data to study how the Earth-Life system responds to and recovers from large perturbations to the global carbon cycle, biodiversity, climates, cryosphere, and hydrosphere. The grand informatics challenge is to organize and mobilize billions of observations distributed across space, time, disciplines, and institutions, so that all relevant data can be brought to bear on any time, place, or process. The emerging cyberinfrastructure model is a distributed, federated network of resources comprising community-curated data repositories (CCDRs), physical sample repositories, individual geoscientists, the scientific literature, and networking/coordination efforts. In our field, the greatest scientific return from NSF cyberinfrastructure investments will come from distributed, meso-scale investments: 1) long-term investments in the human capital necessary to develop and sustain CCDRs; 2) data mobilization campaigns targeted to high-priority research questions; 3) scientific workforce training at all career stages; 4) reduced data friction via integrated data handling systems that span field collection, measurement, paper publication, and data publication; 5) automated data-mining systems for extracting information from unstructured sources; and 6) a National Center for Paleodata Synthesis to accelerate and coordinate the above global-scale science and informatics activities.