Ingestion and Curation

fF curators are responsible for metadata mapping: translating the information provided by data contributors into the controlled vocabulary and metadata schema designed by the Repository's administrators.

Ingestion

Once a dataset is accepted for deposit, the ingestion process begins. The dataset is assigned an internal accession number (alternateIdentifier) for tracking, and a curator contacts the depositor to resolve any confusing or conflicting information supplied on the deposit form. Any tags included on the deposit form are either matched to a controlled vocabulary of subjects or added as new subjects, which makes datasets easier to find in searches. The data dictionary is checked for completeness and is also assigned an internal accession number.
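The data dictionary completeness check can be sketched in Python. This is a minimal illustration, not fF's actual tooling: it assumes the data dictionary is a list of entries with hypothetical "column" and "definition" keys, and it reports dataset columns that have no entry, entries with an empty definition, and entries describing columns absent from the dataset.

```python
def check_dictionary(columns, dictionary):
    """Compare dataset column names against data dictionary entries.

    The entry layout ({"column": ..., "definition": ...}) is a hypothetical
    assumption for this sketch, not a documented fF format.
    """
    # Map each documented column to its (possibly empty) definition.
    defined = {e["column"]: (e.get("definition") or "").strip() for e in dictionary}
    return {
        # Columns present in the dataset but never documented.
        "missing": [c for c in columns if c not in defined],
        # Columns documented without a usable definition.
        "empty_definition": [c for c in columns if c in defined and not defined[c]],
        # Dictionary entries that do not correspond to any dataset column.
        "not_in_dataset": [c for c in defined if c not in columns],
    }


report = check_dictionary(
    ["site", "count"],
    [{"column": "site", "definition": "Site code"}, {"column": "date", "definition": ""}],
)
```

Here "count" is reported as missing and "date" as not present in the dataset; any non-empty list is something the curator would raise with the depositor.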

Curation

Datasets undergo a rigorous cleaning process to increase usability. An fF curator performs the following actions:

  • Cleans the dataset to the standards listed below.

  • Matches column headers to available metadata elements.

  • Normalizes tags.

  • Obtains a digital object identifier (DOI).

Data Cleaning

General

  • Remove leading zeros and collapse consecutive whitespace.

  • Correct all misspellings.

  • Normalize capitalization within columns.

  • Normalize and define abbreviations.

  • Escape special characters, including / @ # $ %.

  • Encode missing values and blank cells.

    • Missing value codes match data type:

      • String: null

      • Numeric: -9999
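Several of the general rules are mechanical enough to script. The Python sketch below, offered only as an illustration, collapses whitespace, strips leading zeros, and encodes blank cells with the missing-value codes listed above. Applying zero-stripping only to unsigned numeric strings is an assumption made here so that codes such as "A007" are left intact; spelling, capitalization, and abbreviation fixes still need a curator's judgment.

```python
import re

MISSING_STRING = "null"  # fF missing-value code for string columns
MISSING_NUMERIC = -9999  # fF missing-value code for numeric columns


def collapse_whitespace(value):
    """Trim the value and replace runs of whitespace with a single space."""
    return re.sub(r"\s+", " ", value).strip()


def strip_leading_zeros(value):
    """Drop leading zeros from an unsigned numeric string ("007" -> "7", "00.5" -> "0.5")."""
    if not re.fullmatch(r"\d+(\.\d+)?", value):
        return value  # non-numeric text such as "A007" is left untouched
    integer, dot, fraction = value.partition(".")
    return (integer.lstrip("0") or "0") + dot + fraction


def encode_missing(value, numeric):
    """Replace an empty or blank cell with the missing-value code for its data type."""
    if value is None or (isinstance(value, str) and not value.strip()):
        return MISSING_NUMERIC if numeric else MISSING_STRING
    return value
```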

Georeference

  • Separate Latitude and Longitude columns.

  • Data type is numeric and values are decimal degrees.

  • Longitude values are correctly signed: negative west of the Prime Meridian, positive east of it.
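The georeference rules can be checked with a short Python sketch. It assumes, hypothetically, that a depositor supplied coordinates as a single "lat, lon" text field that needs splitting. The caller states whether the study area lies west of the Prime Meridian, and values already encoded as -9999 are skipped rather than flagged.

```python
MISSING_NUMERIC = -9999  # fF numeric missing-value code


def split_coordinates(pair):
    """Split a combined "lat, lon" string (hypothetical input format) into two floats."""
    lat_text, lon_text = pair.split(",")
    return float(lat_text), float(lon_text)


def check_coordinates(lat, lon, *, west):
    """Return problems with a decimal-degree pair; `west` means the site lies west of Greenwich."""
    if MISSING_NUMERIC in (lat, lon):
        return []  # encoded missing values are not validated
    problems = []
    if not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if not -180 <= lon <= 180:
        problems.append("longitude out of range")
    elif west and lon > 0:
        problems.append("longitude should be negative")
    elif not west and lon < 0:
        problems.append("longitude should be positive")
    return problems
```

A positive longitude for a site known to be in the Western Hemisphere usually means the depositor dropped the minus sign, which is exactly what the sign check catches.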

Date and Time

  • Date and Time are separate fields.

  • Both fields follow ISO 8601:

    • YYYY-MM-DD

    • hh:mm:ss (24-hour clock)
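Splitting a combined timestamp into separate ISO 8601 fields is straightforward with Python's standard datetime module. In this sketch the curator supplies the depositor's input format as a strptime pattern; the example pattern is an assumption, since deposits arrive in many formats.

```python
from datetime import datetime


def split_datetime(raw, fmt):
    """Parse a combined timestamp and return (YYYY-MM-DD, hh:mm:ss) strings."""
    stamp = datetime.strptime(raw.strip(), fmt)
    # date.isoformat() yields YYYY-MM-DD; timespec="seconds" yields 24-hour hh:mm:ss.
    return stamp.date().isoformat(), stamp.time().isoformat(timespec="seconds")


# Example: a 12-hour US-style timestamp (the input format is an assumption).
date_field, time_field = split_datetime("6/14/2021 3:05 PM", "%m/%d/%Y %I:%M %p")
print(date_field, time_field)  # 2021-06-14 15:05:00
```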

Examples of cleaned datasets

Dataset Name

Description

Data that was difficult to reconcile with fF specifications.

Data collected by park staff

Data from an app created by an ecologist

Salvage and motorist-reported roadkill data for the state of Idaho

An example of a dataset with renamed column headings is here.

Assigning Metadata Elements and Normalizing Tags

Column headers within the dataset are matched as accurately as possible to available metadata elements and are renamed accordingly. Changes are tracked and stored as a JSON file.
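Header renaming and change tracking might look like the following Python sketch. The mapping from depositor headers to metadata element names is supplied by the curator. The element names in the usage example and the layout of the JSON change log are hypothetical, not fF's actual schema.

```python
import json


def rename_columns(header, mapping):
    """Rename headers with a curator-built mapping; return the new header and a change log."""
    new_header, changes = [], []
    for name in header:
        new_name = mapping.get(name, name)  # unmatched headers keep their original name
        if new_name != name:
            changes.append({"original": name, "renamed": new_name})
        new_header.append(new_name)
    return new_header, changes


def change_log_json(changes):
    """Serialize the change log so it can be stored alongside the dataset."""
    return json.dumps(changes, indent=2)


# Hypothetical mapping; real element names come from the fF metadata schema.
header, log = rename_columns(["Lat", "Long", "notes"], {"Lat": "latitude", "Long": "longitude"})
```

Recording only the headers that actually changed keeps the log short and makes it easy to reverse a rename if the depositor disagrees.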

Depositor-supplied tags are matched with terms in a controlled vocabulary. Most changes involve pluralized terms and common or scientific taxon names. At the curator's discretion, a tag with no matching term is either added to the controlled vocabulary or the depositor is contacted with a suggested alternative term.
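Tag matching can be sketched in Python as a lookup against the controlled vocabulary, with a synonym table for common and scientific taxon names and a naive fallback for plurals. The vocabulary, the synonym table, and the lowercase normalization are assumptions for illustration; a tag that returns None is left for the curator to add or to discuss with the depositor.

```python
def normalize_tag(tag, vocabulary, synonyms):
    """Map a depositor tag onto a controlled-vocabulary term, or return None."""
    key = " ".join(tag.lower().split())  # lowercase and collapse whitespace
    if key in vocabulary:
        return key
    if key in synonyms:  # e.g. a common name mapped to the preferred taxon term
        return synonyms[key]
    # Naive plural handling: "butterflies" -> "butterfly", "bats" -> "bat".
    if key.endswith("ies") and key[:-3] + "y" in vocabulary:
        return key[:-3] + "y"
    if key.endswith("s") and key[:-1] in vocabulary:
        return key[:-1]
    return None  # no match: curator adds a term or suggests an alternative


# Hypothetical vocabulary and synonym table for illustration.
vocabulary = {"bat", "butterfly", "odocoileus hemionus"}
synonyms = {"mule deer": "odocoileus hemionus"}
```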

Obtain a DOI and Publish

After the dataset-level metadata has been checked and cleaned, it is used to mint a DOI for the dataset through DataCite. The DOI resolves to a landing page maintained by fF that displays all metadata and provides links to download the dataset, the data dictionary, and any other related files.
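DataCite's REST API registers a DOI from a JSON:API document sent to its /dois endpoint. The Python sketch below only builds that request body from dataset-level metadata. The input keys, the prefix, and the landing-page URL are hypothetical placeholders, and actually minting the DOI would require an authenticated POST with repository credentials. Check the attribute set against DataCite's current metadata schema before relying on it.

```python
def datacite_payload(meta, prefix, landing_url):
    """Build a DataCite REST API (JSON:API) body that registers and publishes a dataset DOI.

    The keys of `meta` ("creators", "title", "publisher", "year") are hypothetical.
    """
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "event": "publish",  # register the DOI as findable rather than draft
                "prefix": prefix,    # a full "doi" attribute could be supplied instead
                "creators": [{"name": name} for name in meta["creators"]],
                "titles": [{"title": meta["title"]}],
                "publisher": meta["publisher"],
                "publicationYear": meta["year"],
                "types": {"resourceTypeGeneral": "Dataset"},
                "url": landing_url,  # the fF landing page the DOI resolves to
            },
        }
    }


# Hypothetical values for illustration only.
payload = datacite_payload(
    {"creators": ["Doe, Jane"], "title": "Example roadkill data", "publisher": "Example Repository", "year": 2021},
    "10.1234",
    "https://example.org/dataset/1",
)
```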
