Ingestion and Curation

fF curators are responsible for metadata mapping: translating the information provided by data contributors to the controlled vocabulary and metadata schema designed by the Repository's administrative


Last updated 6 years ago


Ingestion

Once a dataset is accepted for deposit, the ingestion process begins. The dataset is assigned an internal accession number (alternateIdentifier) for tracking, and a curator communicates with the depositor about any confusing or conflicting information supplied on the deposit form. Any tags included on the deposit form are matched to a controlled vocabulary of subjects or are added as new subjects, which makes datasets easier to find in searches. The data dictionary is checked for completeness and is also assigned an internal accession number.

Curation

Datasets undergo a rigorous cleaning process to increase usability. An fF curator performs the following actions:

  • Cleans the dataset to the standards listed below.

  • Matches column headers to available metadata elements.

  • Normalizes tags.

  • Obtains a digital object identifier (DOI).

Data Cleaning

General

  • Remove leading zeros and consecutive white spaces.

  • Correct all misspellings.

  • Normalize capitalization within columns.

  • Normalize and define abbreviations.

  • Escape special characters including /@#\$%

  • Encode missing values and blank cells.

    • Missing value codes match data type:

      • String: null

      • Numeric: -9999
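A few of the general rules above can be sketched in Python. This is a minimal illustration, not the curators' actual tooling: the function names are hypothetical, and "remove leading zeros" is assumed here to apply to numeric cells only. The missing-value codes are the ones listed above.

```python
import re

MISSING_STRING = "null"   # missing-value code for string columns (from the list above)
MISSING_NUMERIC = -9999   # missing-value code for numeric columns (from the list above)


def clean_text(value):
    """Collapse consecutive whitespace, trim the ends, and encode blanks as 'null'."""
    if value is None:
        return MISSING_STRING
    cleaned = re.sub(r"\s+", " ", str(value)).strip()
    return cleaned if cleaned else MISSING_STRING


def clean_number(value):
    """Parse a numeric cell (which drops leading zeros) and encode blanks as -9999."""
    if value is None or str(value).strip() == "":
        return MISSING_NUMERIC
    number = float(str(value).strip())  # float("007") is 7.0, so leading zeros vanish
    return int(number) if number.is_integer() else number
```

For example, `clean_text("  white-tailed   deer ")` yields `"white-tailed deer"`, and `clean_number("007")` yields `7`.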

Georeference

  • Separate Latitude and Longitude columns.

  • Data type is numeric and values are decimal degrees.

  • Longitude value is appropriately signed negative or positive.
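The georeference rules can be checked with a short sketch like the one below. The function names are hypothetical. It assumes the usual decimal-degree convention, in which southern latitudes and western longitudes are negative.

```python
def apply_hemisphere(degrees, hemisphere):
    """Sign an unsigned decimal-degree value from its hemisphere letter (N, S, E, W)."""
    letter = hemisphere.strip().upper()
    if letter not in {"N", "S", "E", "W"}:
        raise ValueError(f"Unknown hemisphere: {hemisphere!r}")
    magnitude = abs(float(degrees))
    return -magnitude if letter in {"S", "W"} else magnitude


def validate_coordinates(latitude, longitude):
    """Return (latitude, longitude) as floats, or raise ValueError if out of range."""
    lat = float(latitude)
    lon = float(longitude)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"Latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"Longitude out of range: {lon}")
    return lat, lon
```

A longitude recorded as `72.58` with hemisphere `W` becomes `-72.58`, and a latitude of `95` is rejected.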

Date and Time

  • Date and Time are separate fields.

    • Date: YYYY-MM-DD

    • Time: hh:mm:ss (24-hour clock)
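Splitting a combined timestamp into the two fields might look like the following sketch. The function name and the default input format are assumptions for illustration. Depositor data can arrive in many formats, so the format string would need to match each source.

```python
from datetime import datetime


def split_timestamp(raw, input_format="%m/%d/%Y %I:%M %p"):
    """Split one timestamp string into (YYYY-MM-DD, hh:mm:ss) strings."""
    parsed = datetime.strptime(raw.strip(), input_format)
    date_part = parsed.date().isoformat()                    # YYYY-MM-DD
    time_part = parsed.time().isoformat(timespec="seconds")  # hh:mm:ss, 24-hour clock
    return date_part, time_part
```

With the default format, `split_timestamp("07/04/2018 3:45 PM")` gives `("2018-07-04", "15:45:00")`.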

Examples of cleaned datasets

Dataset Name: Description

  • Vermont Roadkill: Data that was difficult to reconcile to fF specifications.

  • Glacier National Park, Canada, Roadkill: Data collected by park staff

  • Garneau Roadkill: Data from an app created by an ecologist

  • Idaho Roadkill: Salvage and motorist reported roadkill data for the state of Idaho

  • Derived from PDF

  • Derived from mortality data

  • Nevada Crash Data: Regularly updated from police reports.

  • Adventure Scientists: Data from around the world collected by athletes and commuters who must apply for permission to submit data.

  • Carolina Herp Atlas: Reptile data submitted by citizens. HerpMapper can subset roadkill data.

Assigning Metadata Elements and Normalizing Tags

Column headers within the dataset are matched as accurately as possible to available metadata elements and are renamed accordingly. Changes are tracked and stored as a JSON file.
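The renaming and change tracking described above could be sketched as follows. This is only an illustration: the function name, the mapping, and the layout of the JSON change log are assumptions, since the repository's actual log format is not shown here.

```python
import json


def rename_headers(headers, mapping):
    """Rename column headers using a mapping and return (new_headers, json_change_log)."""
    renamed = [mapping.get(header, header) for header in headers]
    changes = [
        {"original": old, "renamed": new}
        for old, new in zip(headers, renamed)
        if old != new
    ]
    return renamed, json.dumps(changes, indent=2)


# Hypothetical mapping from depositor headers to metadata element names.
example_mapping = {"Lat": "latitude", "Long": "longitude"}
new_headers, change_log = rename_headers(["Lat", "Long", "Notes"], example_mapping)
```

Headers without an entry in the mapping are left unchanged and do not appear in the log, so the JSON file records only what the curator actually altered.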

Depositor-supplied tags are matched with terms in a controlled vocabulary. Most changes involve pluralized terms and common or scientific taxon names. At the curator's discretion, tags with no matching term are either added to the controlled vocabulary or returned to the depositor with a suggested alternative term.
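A rough sketch of the plural-handling part of tag matching is shown below. The function name and the sample vocabulary are hypothetical, and the singularization is deliberately naive. Matching common names to scientific names would need a synonym table, which is not covered here.

```python
def normalize_tag(tag, vocabulary):
    """Return the controlled term matching a tag, or None when nothing matches."""
    lookup = {term.lower(): term for term in vocabulary}
    candidate = " ".join(tag.lower().split())  # lowercase and collapse whitespace

    # Try the tag as given, then a few naive singular forms.
    forms = [candidate]
    if candidate.endswith("ies"):
        forms.append(candidate[:-3] + "y")
    if candidate.endswith("es"):
        forms.append(candidate[:-2])
    if candidate.endswith("s"):
        forms.append(candidate[:-1])

    for form in forms:
        if form in lookup:
            return lookup[form]
    return None  # left to the curator: add a new term or contact the depositor


sample_vocabulary = ["black bear", "fox", "roadkill", "Odocoileus virginianus"]
```

Here `normalize_tag("Black  Bears", sample_vocabulary)` returns `"black bear"`, while a tag such as `"moose"` returns `None` and falls to the curator's discretion.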

Obtain a DOI and Publish

Both the Date and Time fields follow ISO 8601.

An example of a dataset with renamed column headings is available here.

After dataset-level metadata is checked and cleaned, it is used to mint a DOI from DataCite for the dataset. This DOI resolves to a webpage maintained by fF that displays all metadata and provides links to download the dataset, data dictionary, and any other related files.
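As a rough illustration, dataset-level metadata might be assembled into a DOI registration request like the one below. The function name and all values are placeholders. The field names follow the DataCite REST API's JSON:API layout as commonly documented, which is an assumption to verify against the current DataCite documentation. The sketch only builds the request body and makes no network call.

```python
import json


def build_doi_request(prefix, title, creators, publisher, year, landing_url, accession_number):
    """Assemble a DataCite-style request body for registering a dataset DOI."""
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "prefix": prefix,              # DOI prefix assigned to the repository
                "event": "publish",            # assumed: request a findable DOI
                "titles": [{"title": title}],
                "creators": [{"name": name} for name in creators],
                "publisher": publisher,
                "publicationYear": year,
                "types": {"resourceTypeGeneral": "Dataset"},
                "url": landing_url,            # landing page the DOI resolves to
                "alternateIdentifiers": [
                    {
                        "alternateIdentifier": accession_number,
                        "alternateIdentifierType": "Internal accession number",
                    }
                ],
            },
        }
    }


# Placeholder values only; none of these identify a real record.
payload = build_doi_request(
    prefix="10.XXXX",
    title="Example Roadkill Dataset",
    creators=["Example Depositor"],
    publisher="flattenedFauna",
    year=2019,
    landing_url="https://example.org/datasets/example",
    accession_number="fF-0001",
)
request_body = json.dumps(payload, indent=2)
```

Carrying the internal accession number as an alternateIdentifier keeps the repository's own tracking number attached to the public record.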

metadata elements
ISO 8601
here
DataCite
Vermont Roadkill
Glacier National Park, Canada, Roadkill
Garneau Roadkill
Idaho Roadkill
Florida Black Bear Roadkill
Colorado Cougar Roadkill
Florida Panther Roadkill
Nevada Crash Data
Adventure Scientists
Carolina Herp Atlas