Quick guide to publishing data through GBIF.org

Learn about tools, processes and best practices for publishing datasets through the GBIF network

Sunflower by Stefan Gara licensed under CC BY-NC-ND 2.0

GBIF.org supports the publication of four classes of datasets using widely accepted biodiversity data standards.

At present, the GBIF network only publishes datasets directly from organizations. Individuals who wish to publish relevant datasets should work through their affiliated organizations (see ‘Request endorsement’ below) or consider submitting a data paper to one of a growing number of journals.

Citizen scientists can contribute occurrence records indirectly by participating in the growing number of projects worldwide that publish their datasets through the GBIF network.

Secure institutional agreements

Once you decide to share data through the GBIF network, alert your institution's administrators to your plans to publish on its behalf. Sharing open data can increase an institution's visibility and impact: it builds on traditional methods like academic publications and specimen loans, reveals new opportunities for collaboration and, through the use of DOI-based citations, links datasets directly to research uses (example).

Request endorsement

To become a data publisher, your organization must request endorsement from the GBIF community. Once you have reviewed the data publisher agreement and agree in principle to share data, we encourage you to request endorsement for your organization as soon as possible to avoid delays in publishing data.

Select publishing tools and partners

Most of the data now shared with GBIF resides on one of the dozens of installations of the GBIF Integrated Publishing Toolkit (IPT). Alternatives exist, including seeking in-country hosting support from GBIF participants. Highly skilled publishers can also use an API to register datasets programmatically (contact the GBIF Helpdesk for more details).
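To give a feel for what programmatic registration involves, here is a minimal Python sketch that only assembles and checks a registration request body; it does not contact any server. The function name, the field names (publishingOrganizationKey, installationKey, title, type) and the list of dataset types are assumptions made for illustration, so confirm the actual request format with the GBIF Helpdesk before relying on it.

```python
import json
import uuid

# Assumed dataset types; confirm the accepted values with the GBIF Helpdesk.
DATASET_TYPES = {"OCCURRENCE", "CHECKLIST", "SAMPLING_EVENT", "METADATA"}


def build_registration_payload(org_key, installation_key, title, dataset_type):
    """Return a JSON string describing a dataset to register (hypothetical shape)."""
    if dataset_type not in DATASET_TYPES:
        raise ValueError(f"unknown dataset type: {dataset_type}")
    if not title.strip():
        raise ValueError("title must not be empty")
    payload = {
        # uuid.UUID raises ValueError if a key is not a well-formed UUID.
        "publishingOrganizationKey": str(uuid.UUID(org_key)),
        "installationKey": str(uuid.UUID(installation_key)),
        "title": title.strip(),
        "type": dataset_type,
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    # Both keys below are made-up placeholders, not real GBIF identifiers.
    print(build_registration_payload(
        "11111111-2222-3333-4444-555555555555",
        "66666666-7777-8888-9999-000000000000",
        "Sunflower observations",
        "OCCURRENCE",
    ))
```

Validating the keys and the dataset type locally catches simple mistakes before any request is sent.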

We also maintain a knowledgebase of tools and other documentation.

Prepare data for publication

Publishers who choose to share their data using Darwin Core Archives (see data standards) can familiarize themselves with the format using spreadsheet templates created for occurrence datasets, checklists and sampling-event datasets.
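For publishers who prefer to generate the occurrence table from a script rather than fill in a spreadsheet by hand, the sketch below writes records to CSV text with Darwin Core term names as column headers. The column selection is an illustrative subset chosen for this example, not the contents of the official templates, and the function name and sample record are hypothetical.

```python
import csv
import io

# Illustrative subset of Darwin Core terms; the official templates may differ.
COLUMNS = [
    "occurrenceID", "basisOfRecord", "scientificName", "eventDate",
    "countryCode", "decimalLatitude", "decimalLongitude",
]


def occurrences_to_csv(records):
    """Serialize a list of dicts to CSV text with one column per term."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Terms missing from a record are written as empty cells;
        # keys outside COLUMNS are ignored.
        writer.writerow({col: record.get(col, "") for col in COLUMNS})
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{
        "occurrenceID": "example:occ:1",  # made-up identifier
        "basisOfRecord": "HumanObservation",
        "scientificName": "Helianthus annuus",
        "eventDate": "2020-07-15",
        "countryCode": "AT",
    }]
    print(occurrences_to_csv(sample))
```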

Using the updated GBIF Data Validator, you can check datasets prior to publication and receive specific recommendations on improving and cleaning them. The report will help, for instance, by flagging duplicate records, incomplete fields and recognized inconsistencies in formatting.
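A quick local pre-check can catch the same kinds of problems before you upload anything. The sketch below is not the GBIF Data Validator and does not reproduce its rules; it is a simplified illustration that looks for duplicate identifiers, empty fields and dates that are not in year, year-month or year-month-day form. The list of required fields and the function name are assumptions for this example.

```python
import re
from collections import Counter

# Assumed set of fields that should never be empty in an occurrence record.
REQUIRED = ("occurrenceID", "basisOfRecord", "scientificName", "eventDate")

# Simplification: accepts only YYYY, YYYY-MM or YYYY-MM-DD
# (no times, no date ranges, no calendar validity check).
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def precheck(records):
    """Return a list of (row_index, message) tuples describing problems."""
    issues = []
    counts = Counter(r.get("occurrenceID", "") for r in records)
    for index, record in enumerate(records):
        for field in REQUIRED:
            if not str(record.get(field, "")).strip():
                issues.append((index, f"missing {field}"))
        occ_id = record.get("occurrenceID", "")
        if occ_id and counts[occ_id] > 1:
            issues.append((index, f"duplicate occurrenceID {occ_id}"))
        date = record.get("eventDate", "")
        if date and not ISO_DATE.match(date):
            issues.append((index, f"eventDate not ISO 8601: {date}"))
    return issues
```

Running a check like this first keeps the validator report focused on issues that need domain knowledge to resolve.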

You can also prepare datasets to comply with GBIF’s data quality requirements.

Choose a Creative Commons license

In keeping with a 2014 decision by the GBIF governing board, data publishers must assign one of three Creative Commons licences to any occurrence dataset:

  • CC0, for data made available for any use without any restrictions
  • CC BY, for data made available for any use with appropriate attribution
  • CC BY-NC, for data made available for any non-commercial use with appropriate attribution

Note that CC BY-NC licences can significantly limit the reusability of data. GBIF encourages data publishers to choose the most open option they can.
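If you manage dataset metadata in scripts, a small guard can confirm that a licence label is one of the three accepted options before publication. This is a minimal sketch: the function name is hypothetical, and the matching rule (ignore case, spaces and punctuation) is a convenience chosen for this example.

```python
# The three licences accepted for occurrence datasets, as listed above.
ALLOWED = ("CC0", "CC BY", "CC BY-NC")


def normalize_license(label):
    """Return the canonical licence name, or raise ValueError if not accepted."""
    def squash(text):
        # Compare case-insensitively, ignoring spaces, hyphens and underscores.
        return "".join(ch for ch in text.upper() if ch.isalnum())

    wanted = squash(label)
    for licence in ALLOWED:
        if squash(licence) == wanted:
            return licence
    raise ValueError(f"licence not accepted for occurrence datasets: {label!r}")
```

With this rule, a label such as "cc-by-nc" resolves to "CC BY-NC", while a more restrictive label such as "CC BY-NC-ND" is rejected.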

Publish datasets

If you’re using an IPT, simply click the button to ‘register’ your dataset with GBIF. Your dataset and data publisher pages will appear on GBIF.org, and our real-time infrastructure will quickly begin crawling the individual occurrences. Soon, the indexed summary of the published data will start to display users’ activity and download statistics (example).