What is Open Data and how will the research community benefit from it on our open research platform?

Previous blogs in this series have examined the content and tools that the platform will feature. Here we will look at one of the platform’s key policies – Open Data – and how it can enhance the reproducibility, transparency, and rigour of publications.

Very simply, Open Data means that any supporting data associated with a study and its conclusions are made available for anyone to access, share freely and re-use. This can include images, sequence data, software, code, datasets, etc.

What are the benefits of Open Data?

Open Data is increasingly at the forefront of conversations about improving research practices and scientific communications. It comes with far-reaching benefits for everyone from individual authors to wider society.

Authors can improve the discoverability, reuse and citability of their work by making the data open and easy to access, and by clearly referring to the deposited data in their article [1–4]. By being fully transparent and providing access to supporting data, authors can establish that they were the first to undertake the study, as well as demonstrating the robustness of their research.

For the research community, Open Data allows for the discovery and re-use of other researchers’ data to validate the outcomes and to build upon and advance the research. Open Data can facilitate future collaborations and can improve the speed at which important research can be conducted and disseminated. The increased transparency can promote public confidence in and support for scientific research.

The benefits of Open Data extend to wider society too. Real-world applications of research can be more effectively implemented by using studies that are easier to validate and advance. A good example of this can be seen in the environmental and healthcare fields where the more effective dissemination of and increased engagement with recent discoveries can inform efficient real-time responses and policy decisions [5, 6].

FAIR principles

Access Microbiology’s Open Data policy will support the FAIR principles [7]. The FAIR principles aim to enhance the Findability, Accessibility, Interoperability, and Reuse of scholarly data, particularly through the actions of machines, with little to no human guidance.

© Sangya Pundir 2016 (https://creativecommons.org/licenses/by-sa/4.0/deed.en)
FAIR guiding principles for data resources.

To ensure supporting data are easy to find, they should contain detailed metadata relating to the study and ideally be associated with a unique identifier (for example, a digital object identifier, or DOI, or a persistent URL that is unlikely to become invalid over time). Data should also be deposited in a searchable repository, where it is clear to users exactly what is needed to access them.

To ensure interoperability, human and machine users should be able to exchange and interpret the data they have found without the need for specialised programs or guidelines. If documents are needed for interpretation, they should be discoverable and accessible alongside the data.

The main goal of the FAIR principles is to improve the reusability of datasets. Data should be well described and labelled, and it should be easy for anyone to use it to replicate, validate and improve on the original study.
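
As a rough illustration of what "well described and labelled" can mean in practice, the sketch below checks a metadata record for a handful of descriptive fields and a DOI-shaped identifier. The field names, the example values and the `check_record` helper are assumptions made up for this example; they are not part of the FAIR principles or of any repository's actual schema, and the DOI pattern is only a loose check on the common form.

```python
import re

# Hypothetical field names chosen for illustration; real repositories
# define their own metadata schemas.
REQUIRED_FIELDS = (
    "title",
    "description",
    "creators",
    "identifier",
    "licence",
    "related_article",
)

# Loose pattern for the common "10.<registrant>/<suffix>" DOI shape.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def check_record(record):
    """Return a list of human-readable problems found in a metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Treat absent and empty values alike: both leave the reader guessing.
        if not record.get(field):
            problems.append(f"missing field: {field}")
    identifier = record.get("identifier", "")
    if identifier and not DOI_PATTERN.match(identifier):
        problems.append("identifier does not look like a DOI")
    return problems


example = {
    "title": "Example dataset title",
    "description": "What the files contain and how they were generated.",
    "creators": ["A. Researcher"],
    "identifier": "10.1234/example.5678",  # placeholder, not a real DOI
    "licence": "CC BY 4.0",
    "related_article": "",  # left empty to show a reported problem
}

print(check_record(example))
```

Running a check like this before deposition is one simple way to catch gaps that would make a dataset harder to find or reuse later.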

How can authors make their data open?

Preparation is key

The first step in ensuring adherence to an Open Data policy is to prepare data for deposition by labelling and describing the datasets according to the FAIR principles above. This is so that any users accessing the data know precisely what data they have accessed, which study it is associated with and subsequently how to use it. It is important to confirm with your institution that all the data can be made public, and that appropriate anonymization has been applied, where necessary. We do understand that in some cases, for example for legal or ethical reasons, it may not be possible to make data freely available. The platform will therefore adopt the principle that data should be ‘as open as possible, as closed as necessary’, and authors should always discuss possible limitations with the office before submitting.

Choosing the right repository

Next, authors should deposit their data in a publicly accessible, community-recognised repository, appropriate for the study discipline and the type of data. Authors who have new sequencing data associated with their study should deposit the sequences in an approved sequencing database, for example GenBank or EMBL-EBI for DNA and RNA sequencing and UniProt or PRIDE for protein sequence data.

Figshare and Zenodo can be used to deposit almost all types of research, including datasets, figures, code and software, and each deposit receives a citable DOI. Authors submitting manuscripts with associated code should consider Zenodo and Code Ocean, whilst GitHub is a suitable repository for deposits of both software and code.

Authors can choose to submit their data to the Microbiology Society’s Figshare portal. Uploading takes place during the submission process and is very quick, and the data will be viewable on the Figshare website. For published Versions of Record, the data will be available directly on the website via a widget.

Data Summary and referencing your data

To ensure discoverability and ease of access for readers, authors should always include a Data Summary section in the article which describes and links directly to their data. This should be clear and concise, and should direct readers to the descriptions, identifiers, and locations of any associated data. If the data have been deposited in a repository, they are an independent, citable piece of work, so authors should also include them as a reference in the article, giving the work its appropriate recognition.
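
For authors who assemble their Data Summary from a list of deposits, a small helper along the following lines could keep the links consistent. This is only a sketch: the function names, the sentence template and the placeholder DOI are assumptions for illustration, not a format required by the journal. The one fixed point it relies on is that a DOI can be resolved by appending it to https://doi.org/.

```python
def doi_url(doi):
    """Turn a DOI written in any common form into a resolvable link."""
    doi = doi.strip()
    # Strip a leading resolver URL or "doi:" label if the author included one.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


def data_summary_entry(description, repository, doi):
    """Build one Data Summary sentence (hypothetical wording)."""
    return f"{description} are available from {repository}: {doi_url(doi)}"


# Placeholder values, not a real deposit.
print(data_summary_entry("Raw sequencing reads", "Zenodo", "doi:10.1234/example.5678"))
```

Whatever wording is used, the aim is the same: each entry names the data, says where they are held, and gives a persistent link a reader can follow directly.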

How will we support our authors in making their data Open?

During the submission process, we will be able to recommend appropriate repositories, including the use of our own Figshare portal if authors wish, and guide authors on any requirements to bring their submission in line with the FAIR principles. We understand that many authors will be new to this policy, so we are happy to support and offer advice throughout the process.

We believe that Open Data has wide-reaching benefits and comes with the potential to streamline and enhance the way the research community collaborates. Our gold open access journal, Microbial Genomics, has already had huge success in implementing this policy, and it makes sense that we continue our open science journey by extending Open Data to Access Microbiology when launching this innovative platform.

If you have comments on anything you have read here, we would love to hear from you. Please contact Alex Howat at [email protected].

References

1. Colavizza G, Hrynaszkiewicz I, Staden I, Whitaker K, McGillivray B. The citation advantage of linking publications to research data. PLOS ONE. 2020;15(4):e0230416. https://doi.org/10.1371/journal.pone.0230416
2. Zhang L, Ma L. Does open data boost journal impact: evidence from Chinese economics. Scientometrics. 2021;126:3393–3419. https://doi.org/10.1007/s11192-021-03897-z
3. Piwowar HA, Vision TJ. Data reuse and the open data citation advantage. PeerJ. 2013;1:e175. https://doi.org/10.7717/peerj.175
4. Sielemann K, Hafner A, Pucker B. The reuse of public datasets in the life sciences: potential risks and rewards. PeerJ. 2020;8:e9954. https://doi.org/10.7717/peerj.9954
5. Huston P, Edge VL, Bernier E. Reaping the benefits of Open Data in public health. Can Commun Dis Rep. 2019;45:252–256. https://doi.org/10.14745/ccdr.v45i10a01
6. Maeda EE, Torres JA. Open Environmental Data in Developing Countries: Who Benefits? Ambio. 2012;41:410–412. https://doi.org/10.1007/s13280-012-0283-4
7. Wilkinson M, Dumontier M, Aalbersberg I, Appleton G, Axton M, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3:160018. https://doi.org/10.1038/sdata.2016.18
