What is ‘dynamic’ data?
Put simply, dynamic data are data that are subject to change. Dynamic data take many forms, including cases where:
- new data are regularly and systematically appended to an existing data set over time, for example outputs from satellite sensors such as Landsat or MODIS.
- pre-existing data in a large data set are modified or updated. This occurs where errors are found in pre-existing data, or where changes to analytical and/or processing techniques affect some attributes of the existing data set.
Citing dynamic data
The ability to precisely reference datasets is a key goal of data citation. This is relatively straightforward where data are static or fixed. Challenges arise, however, where the data to be referenced are ‘dynamic’, and more so if they are also large scale.
Identifying and providing persistent access to dynamic data can be challenging. Issues may arise where a researcher wishes to identify and reference precisely:
- the subset of dynamic data used in a study
- the data as they existed at any point in time.
A number of international organisations, including the Research Data Alliance Data Citation Working Group, are actively developing approaches to the citation of large-scale dynamic data. Approaches include:
- storing a subset of the data as it was retrieved at a given point in time;
- capturing provenance information about a referenced subset;
- creating snapshots and versions of the data at specified times or trigger events.
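The approaches listed above can be combined. The following is a minimal sketch, not the method of any particular organisation: an append-only store keeps every change with a timestamp, so the data set can be rebuilt as it existed at a given time, and a small provenance record captures what was requested, when, and a checksum of the result. The names `VersionedStore`, `as_of` and `cite_subset` are hypothetical and chosen for illustration only.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class VersionedStore:
    """Toy append-only store (hypothetical): every change is kept with its timestamp."""

    # Each entry is a tuple of (record_id, value, valid_from).
    history: list = field(default_factory=list)

    def upsert(self, record_id: str, value: float, at: datetime) -> None:
        # Never overwrite: append the new state so earlier states stay retrievable.
        self.history.append((record_id, value, at))

    def as_of(self, when: datetime) -> dict:
        """Rebuild the data set as it existed at `when`."""
        state = {}
        for record_id, value, valid_from in sorted(self.history, key=lambda e: e[2]):
            if valid_from <= when:
                state[record_id] = value  # later changes replace earlier ones
        return state


def cite_subset(store: VersionedStore, record_ids: list, when: datetime) -> dict:
    """Capture provenance for a referenced subset: the request, the time, a checksum."""
    snapshot = store.as_of(when)
    subset = {rid: snapshot[rid] for rid in sorted(record_ids) if rid in snapshot}
    checksum = hashlib.sha256(json.dumps(subset, sort_keys=True).encode()).hexdigest()
    return {
        "record_ids": sorted(record_ids),
        "as_of": when.isoformat(),
        "sha256": checksum,
        "data": subset,
    }


store = VersionedStore()
store.upsert("site-1", 10.0, datetime(2010, 1, 1))
store.upsert("site-2", 20.0, datetime(2010, 6, 1))
store.upsert("site-1", 11.5, datetime(2011, 3, 1))  # a later correction

early = cite_subset(store, ["site-1", "site-2"], datetime(2010, 12, 31))
late = cite_subset(store, ["site-1", "site-2"], datetime(2011, 12, 31))
```

Here `early` still returns the uncorrected value for `site-1`, while `late` reflects the correction, so the two records carry different checksums. A production system would typically store the provenance record alongside a persistent identifier rather than the data themselves.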
Example 1: Citing a version of an evolving dataset
- Doe, J. (2009-2011): Dynamic Data Set Title. Version: 1.2
Responsible Data Archive [evolving dataset] doi.10.1001/1234@version=1.2
Example 2: Citing a subset of a growing dataset
- Doe, J. (2009-2011): Dynamic Data Set Title. Subset: 2010-01-01 - 2010-12-13
Responsible Data Archive [growing dataset] doi.10.1001/1234@range=2010-01-01-2010-12-13
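Both examples follow the same pattern: creator, year range, title, a version or subset qualifier, the archive, a bracketed dataset type, and an identifier with a qualifying suffix. As a rough sketch, the pattern could be assembled as shown here. The function name and parameters are hypothetical, the `@version=` and `@range=` suffixes simply mirror the examples rather than any DOI standard, and the identifier is passed in by the caller with whatever prefix the archive uses.

```python
from typing import Optional, Tuple


def format_dynamic_citation(
    creator: str,
    years: str,
    title: str,
    publisher: str,
    identifier: str,
    version: Optional[str] = None,
    date_range: Optional[Tuple[str, str]] = None,
) -> str:
    """Build a citation for one version, or one date-bounded subset, of a dynamic data set."""
    # Exactly one qualifier is expected: a version label or a (start, end) date pair.
    if (version is None) == (date_range is None):
        raise ValueError("give exactly one of version or date_range")
    if version is not None:
        qualifier = f"Version: {version}"
        kind = "evolving dataset"
        suffix = f"@version={version}"
    else:
        start, end = date_range
        qualifier = f"Subset: {start} - {end}"
        kind = "growing dataset"
        suffix = f"@range={start}-{end}"
    return f"{creator} ({years}): {title}. {qualifier} {publisher} [{kind}] {identifier}{suffix}"


versioned = format_dynamic_citation(
    "Doe, J.", "2009-2011", "Dynamic Data Set Title",
    "Responsible Data Archive", "doi:10.1001/1234", version="1.2",
)
subset = format_dynamic_citation(
    "Doe, J.", "2009-2011", "Dynamic Data Set Title",
    "Responsible Data Archive", "doi:10.1001/1234",
    date_range=("2010-01-01", "2010-12-13"),
)
```

Keeping the qualifier in both the human-readable text and the identifier suffix means the citation stays unambiguous even if one of the two is lost in copying.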
It is widely agreed that no one-size-fits-all approach currently exists to support all dynamic data citation use cases. There may never be a single solution.
Dynamic Data Citation Interest Group (DDCIG)
ANDS is working with a small number of organisations in Australia to capture and share information about current practice in, and proposed approaches to, dynamic data citation. The Australian-based DDCIG has been formed to:
- contribute an Australian perspective to international initiatives around dynamic data citation;
- help connect Australian and international organisations that share similar use cases and approaches to dynamic data citation;
- make resources that describe the current and emerging status of dynamic data citation broadly accessible, so that Australian research organisations can adopt, adapt or contribute to the approaches shared.
Membership of the DDCIG currently includes representatives from:
- Integrated Marine Observing System (IMOS)
- Geoscience Australia (GA)
- Terrestrial Ecosystem Research Network (TERN)
- National Computational Infrastructure (NCI)
- CSIRO.
To date the group has:
- Documented their use cases and approaches to dynamic data citation. Included is information about the dynamic nature of the data managed by the organisation as well as their approach to assignment of DOIs or other identifiers, version management and citation statement format. The information is a snapshot of ‘current status’ as of May 2016.
- Described a series of models that visually represent a spectrum of dynamic data and associated approaches to citation that incorporate variables such as data size, complexity and frequency of change.
- Contributed to a web page that describes different approaches to data versioning, including citation of versioned data.
- Presented at the RD-A Data Citation Working Group breakout session at RD-A Plenary 8, September 2016.
- Presented at the SciDataCon conference held in Denver, September 2016.
- Created a list of resources including papers, presentations and reports that have contributed to the Group’s thinking around dynamic data citation. We welcome suggestions for additional resources to be added to the list below, noting that the focus is on practical approaches to the citation of large-scale dynamic data.
The Group welcomes new members. Email us at contact@ands.org.au for more information.
Resources
Papers and guidelines
- American Meteorological Society Guidelines for Data Archiving and Citation
- Buneman P., Davidson S., Frew J. (2016) Why Data Citation Is a Computational Problem. Communications of the ACM. 59(9), 50-57, http://doi.org/10.1145/2893181
- Cook, R., Vannan, S., McMurry et al. (2016) Implementation of persistent identifiers at the ORNL DAAC. Ecological Informatics, vol. May 2016, pages 10–16. doi:10.1016/j.ecoinf.2016.03.003
- Herterich P, Dallmeier-Tiessen S. Data Citation Services in the High-Energy Physics Community. D-Lib Magazine. 2016; 22(1/2) doi:10.1045/january2016-herterich
- Huber, R., Asmi, A., Buck, J., de Luca, J.M., Diepenbroek, D., Michelini, A. and participants of the joint COOPEUS/ENVRI/EUDAT PID workshop. Data citation and digital identifiers for time series data/environmental research infrastructures. Part 1: Use cases and scenarios for data citation and digital identifiers for time series data. Technical Paper.
- Klump, J., R. Huber, and M. Diepenbroek (2016), DOI for geoscience data - how early practices shape present perceptions, Earth Sci. Inform., 9(1), 123–136, http://doi.org/10.1007/s12145-015-0231-5
- Bulletin of IEEE Technical Committee on Digital Libraries, 12(1), May 2016 (special issue on data citation)
- Stockhause, M. & Lautenschlager, M., (2017). CMIP6 Data Citation of Evolving Data. Data Science Journal. 16, p.30. DOI: http://doi.org/10.5334/dsj-2017-030
Articles
- Andreas Rauber, Ari Asmi, Dieter van Uytvanck and Stefan Pröll, Identification of Reproducible Subsets for Data Citation, Sharing and Re-Use
- Dario De Nart, Dante Degl'Innocenti and Marco Peressotti, Well-Stratified Linked Data for Well-Behaved Data Citation
- Megan Force, Nigel Robinson, Mark Matthews, Daniel Auld and Mariana Boletta, Research Data in Journals and Repositories in the Web of Science: Developments and Recommendations
Research Data Alliance
- Recommendations and guidelines from the Data Citation Working Group
- Submission by Lesley Wyborn to RD-A Data Citation WG recommendations
- Slides from the RD-A Data Citation WG report at RD-A Plenary 7, March 2016
- Slides from the RD-A Data Citation WG breakout session at RD-A Plenary 8, September 2016
Presentations
- Data Citation Guidelines presented by Ruth Duerr from the Federation of Earth Science Information Providers (ESIP)
- Scalable approaches for identifiers of dynamic data and linked data in an evolving world presented by Jens Klump, Robert Huber and Lesley Wyborn, eResearch Australasia 2015
- How do you assign persistent identifiers to extracts from large, complex, dynamic data sets that underpin scholarly publications? Presented by Lesley Wyborn, Nick Car, Ben Evans and Jens Klump, EGU General Assembly, 2016