Data hosting

A quick guide for making good decisions about how to host data shared with GBIF

Wikimedia Foundation servers. Photo 2012 Victor Grigas, Wikimedia Foundation, licensed under CC BY-SA 3.0.

GBIF.org is an index of biodiversity data published through a globally distributed network of national, thematic and project infrastructures. Within this interconnected system, it is essential for data publishers to ensure that the data they share has a persistent, stable point of access. This requirement is challenging for many institutions, especially those that are new to GBIF and may not have the facilities to host and maintain data on servers that always remain online.

One way to start addressing the challenge is to distinguish between data publishing and data hosting. While these activities are connected, there is no formal or technical requirement that the same institution must perform both tasks (even if that is generally the case).

Data publishing is the act of organizing and sharing data standardized for use through the GBIF network. An institution becomes a GBIF data publisher by completing an online registration form and receiving endorsement, either through one of the national and organizational Participants in the GBIF network or the Nodes Steering Group.

Data hosting is the act of storing the data on a stable and accessible web platform. While there is no standard arrangement for providing this service, data hosting does represent a significant commitment: it requires dedicated, long-term capacity to maintain a persistent and highly reliable web-connected platform.

Regardless of who hosts the datasets, GBIF works to credit both the data publishing institution and its country of registration. What follows is a quick guide to making informed decisions about how to host data shared with GBIF.


Hosting steps

Once your data have been organized into a supported data format, proceed as follows:

  1. Become a GBIF data publisher by completing the publisher registration form
  2. Find a publishing platform, IPT or other, in order of preference:
    a. Hosted in your institution
    b. Hosted at a national node (if your country is a GBIF participant)
    c. Hosted by another GBIF participant or data publisher (e.g. data hosting centre)
    d. Self-hosting: set up your own publishing platform with an IPT or other installation at your institution (this requires stable, continuous Internet access)
    e. If none of the above applies, send a message to the GBIF help desk, explaining your requirements. We will find an IPT for you!
  3. Get access to the IPT and basic training
  4. Start publishing your datasets
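The order of preference in step 2 can be read as a simple decision chain. The following is a minimal sketch of that reading only; the `HostingContext` fields and the `choose_hosting_option` function are hypothetical names invented for illustration and are not part of the IPT or any GBIF API.

```python
from dataclasses import dataclass


@dataclass
class HostingContext:
    """Hypothetical yes/no answers a publisher might collect (illustrative only)."""
    institution_has_platform: bool   # option a: a platform is already hosted in your institution
    country_is_participant: bool     # option b precondition: your country is a GBIF participant
    national_node_offers_hosting: bool
    other_host_available: bool       # option c: another participant, publisher or hosting centre
    can_self_host: bool              # option d: stable, continuous Internet access and capacity


def choose_hosting_option(ctx: HostingContext) -> str:
    """Walk the options in the guide's order of preference and return the first that applies."""
    if ctx.institution_has_platform:
        return "a: hosted in your institution"
    if ctx.country_is_participant and ctx.national_node_offers_hosting:
        return "b: hosted at a national node"
    if ctx.other_host_available:
        return "c: hosted by another GBIF participant or data publisher"
    if ctx.can_self_host:
        return "d: self-hosting with your own installation"
    # None of the options apply: the guide says to contact the GBIF help desk.
    return "e: contact the GBIF help desk"


if __name__ == "__main__":
    example = HostingContext(
        institution_has_platform=False,
        country_is_participant=True,
        national_node_offers_hosting=True,
        other_host_available=True,
        can_self_host=False,
    )
    print(choose_hosting_option(example))
```

Because the checks run in order, a national node is chosen over another host whenever both are available, which matches the stated preference.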

Data hosting options

Data hosting by the data publisher institution

Data publishers with the capacity to host their own data can install their own Integrated Publishing Toolkit (IPT) or other data publishing platform (see more below).

Data hosting outside the data publisher institution

Data publishers that have limited technical capacity, or that do not wish to run their own data publishing platforms, can opt to have their data hosted externally. This saves you the time and money needed to set up and maintain your own IPT instance, and you may be able to receive help desk support in your own language if the data host offers it. Although many possible hosting arrangements exist, organizations normally choose to work with a data host that shares an institutional, national, regional or thematic focus.

If your country is already a GBIF Participant, the first option to consider is whether the national GBIF node offers a data hosting solution. Having your data hosted by your national node makes it easier to collect data of national interest, enables you to connect with a local publisher network and provides access to help desk support from your node.

For cases where data hosting is not available from a national node, GBIF has a list of trusted data hosting centres. These data hosts meet a set of strict criteria that includes:

  • Consistently maintaining and administering an online IPT
  • Demonstrating a successful track record of hosting data
  • Responding with prompt and knowledgeable help desk support

GBIF strongly recommends using a trusted hosting centre that can establish an account for you on their IPT, which allows you to manage and publish your own datasets through GBIF.org.

A final option is for GBIF itself to host data using a cloud-hosted publishing platform. The GBIF Secretariat maintains cloud-hosted IPT installations (for example, for the BID programme) that provide publishers with data hosting using shared hardware, software and storage services. Users of the service receive a robust, no-cost data hosting solution that is easy to migrate to a self-hosted installation in the future. However, national nodes and data hosting centres are likely to provide more hands-on service and assistance with data publishing and quality control. As a result, data publishers should normally use the GBIF cloud-hosted IPT only if they are unable to find a satisfactory solution among the other hosting options.


Intro to the IPT: Integrated Publishing Toolkit

The IPT is free open-source software developed and supported by the GBIF Secretariat that organizations around the world use to publish and share biodiversity datasets through the GBIF network. The IPT can also function as a repository for data referenced in an article, as in this example of an IPT installation hosted by the Canadensys network.

Learn more about the technical requirements for hosting an IPT

Test Mode

The IPT can be installed in Test mode, which is intended for evaluating the software or conducting training. Test-mode registrations go into a test registry, and the hosted resources are never indexed or made publicly accessible through searches on GBIF.org. If you decide to install your own IPT, GBIF recommends that you try Test mode first in order to understand the registration process.

Once you are sure that the IPT is working the way that you expect, you will have to reinstall the software in Production mode to make the data actually discoverable through GBIF. Production mode registers datasets and publishes them so they are indexed and publicly accessible through GBIF.org.
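The difference between the two modes can be summarized in a small lookup. This is only an illustrative sketch of the behaviour described above; the `IptMode` enum, the `ModeBehaviour` record and the registry labels are hypothetical names chosen for the example and do not correspond to actual IPT settings or configuration keys.

```python
from dataclasses import dataclass
from enum import Enum


class IptMode(Enum):
    TEST = "test"
    PRODUCTION = "production"


@dataclass(frozen=True)
class ModeBehaviour:
    registry: str          # where registrations are recorded (descriptive label, not a URL)
    indexed_by_gbif: bool  # whether resources become discoverable through GBIF.org


# Behaviour of each mode as described in this guide (labels are assumptions).
MODE_BEHAVIOUR = {
    IptMode.TEST: ModeBehaviour(registry="test registry", indexed_by_gbif=False),
    IptMode.PRODUCTION: ModeBehaviour(registry="GBIF registry", indexed_by_gbif=True),
}


def next_step(mode: IptMode, ready_to_publish: bool) -> str:
    """Suggest what to do next, following the guide's Test-then-Production advice."""
    if mode is IptMode.TEST and ready_to_publish:
        # Switching modes requires a fresh installation according to the guide.
        return "reinstall the IPT in Production mode"
    if mode is IptMode.TEST:
        return "keep evaluating in Test mode"
    return "register and publish datasets"


if __name__ == "__main__":
    print(MODE_BEHAVIOUR[IptMode.TEST])
    print(next_step(IptMode.TEST, ready_to_publish=True))
```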

Both the IPT instance and its associated organization must be registered with GBIF. If your organization isn’t registered yet, you will be asked to complete this step and provide basic information through a short form in the IPT. Learn more about how this works in the IPT User Manual.


Terms of Use

The use of an external data host by a data publisher should be negotiated between the respective parties, ideally with a service-level agreement that outlines the terms and obligations for both the data publisher and the data host. The use of GBIF’s cloud-hosted IPT will be governed by the GBIF Data Publisher Agreement.