Ingestion and Curation
fF curators are responsible for metadata mapping: translating the information provided by data contributors to the controlled vocabulary and metadata schema designed by the Repository's administrative
Ingestion
Once a dataset is accepted for deposit, the ingestion process begins. The dataset is assigned an internal accession number (alternateIdentifier) for tracking, and a curator communicates with the depositor about any confusing or conflicting information supplied on the deposit form. Any tags included on the deposit form are matched to a controlled vocabulary of subjects or are added as new subjects, which makes datasets easier to find in searches. The data dictionary is checked for completeness and is also assigned an internal accession number.
Curation
Datasets undergo a rigorous cleaning process to increase usability. An fF curator performs the following actions:
Cleans the dataset to the standards listed below.
Matches column headers to available metadata elements.
Normalizes tags.
Obtains a digital object identifier (DOI).
Data Cleaning
General
Remove leading zeros and consecutive white spaces.
Correct all misspellings.
Normalize capitalization within columns.
Normalize and define abbreviations.
Escape special characters including /@#\$%
Encode missing values and blank cells.
Missing value codes match data type:
String: null
Numeric: -9999
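As a rough illustration, a few of the general rules above (whitespace, leading zeros, missing value codes) can be sketched in Python. The function names `clean_text` and `clean_number` are hypothetical and are not part of any fF tooling; spelling correction, capitalization, and abbreviation handling need curator judgment and are not covered here.

```python
import re

# Missing value codes taken from the list above.
MISSING_STRING = "null"
MISSING_NUMERIC = -9999


def clean_text(value):
    """Collapse consecutive whitespace and trim; blank cells become 'null'."""
    if value is None:
        return MISSING_STRING
    cleaned = re.sub(r"\s+", " ", str(value)).strip()
    return cleaned if cleaned else MISSING_STRING


def clean_number(value):
    """Parse a numeric cell, which drops leading zeros; blanks become -9999.

    Raises ValueError if the cell is not blank and not a number.
    """
    if value is None or str(value).strip() == "":
        return MISSING_NUMERIC
    number = float(str(value).strip())
    # Keep whole numbers as integers so 007 becomes 7 rather than 7.0.
    return int(number) if number.is_integer() else number
```

For example, `clean_text("  Red   fox ")` gives `"Red fox"` and `clean_number("007")` gives `7`.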
Georeference
Separate Latitude and Longitude columns.
Data type is numeric and values are decimal degrees.
Longitude value is appropriately signed negative or positive.
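A minimal sketch of how the georeference rules could be checked, assuming coordinates arrive as strings or numbers in separate columns. The function name `validate_coordinates` is hypothetical. A range check like this cannot detect a missing minus sign, so the sign of each longitude still has to be compared against the known study area.

```python
def validate_coordinates(latitude, longitude):
    """Return (lat, lon) as floats in decimal degrees, or raise ValueError."""
    lat = float(latitude)   # raises ValueError for non-numeric input
    lon = float(longitude)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon
```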
Date and Time
Date and Time are separate fields.
Both fields follow ISO 8601: YYYY-MM-DD and hh:mm:ss (24-hour clock system).
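One possible way to split a combined timestamp into the two ISO 8601 fields is sketched below using Python's standard `datetime` module. The function name `split_timestamp` is hypothetical, and the input format string is an assumption that has to be set per dataset.

```python
from datetime import datetime


def split_timestamp(raw, input_format):
    """Split one timestamp string into (YYYY-MM-DD, hh:mm:ss) strings."""
    parsed = datetime.strptime(raw, input_format)
    return parsed.strftime("%Y-%m-%d"), parsed.strftime("%H:%M:%S")
```

For example, `split_timestamp("14/03/2019 16:05", "%d/%m/%Y %H:%M")` returns `("2019-03-14", "16:05:00")`.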
Examples of cleaned datasets
Dataset Name
Description
Regularly updated from police reports.
Data from around the world collected by athletes and commuters who must apply for permission to submit data.
Reptile data submitted by citizens. HerpMapper can subset roadkill data.
An example of a dataset with renamed column headings is here.
Assigning Metadata Elements and Normalizing Tags
Column headers within the dataset are matched as accurately as possible to available metadata elements and are renamed accordingly. Changes are tracked and stored as a JSON file.
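The renaming and change tracking described above might look something like the following sketch. The function name `rename_headers`, the mapping contents, and the shape of the JSON log are all assumptions for illustration; the actual fF metadata elements and log format are not specified here.

```python
import json


def rename_headers(headers, mapping):
    """Rename headers using mapping; return (new_headers, json_change_log)."""
    renamed = [mapping.get(header, header) for header in headers]
    # Record only the headers that actually changed.
    changes = [
        {"original": header, "renamed": mapping[header]}
        for header in headers
        if header in mapping and mapping[header] != header
    ]
    return renamed, json.dumps(changes, indent=2)


# Hypothetical mapping from depositor headers to metadata elements.
example_mapping = {"lat": "latitude", "long": "longitude"}
new_headers, change_log = rename_headers(["lat", "long", "species"], example_mapping)
```

Here `new_headers` is `["latitude", "longitude", "species"]`, and `change_log` is a JSON string listing the two renamed columns, which could then be written to a file alongside the dataset.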
Depositor-supplied tags are matched with terms in a controlled vocabulary. Most changes involve pluralized terms and common or scientific taxon names. At the curator's discretion, tags with no matching term are either added to the controlled vocabulary or the depositor is contacted with a suggested alternative term.
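As a rough sketch of the plural-matching case, the snippet below looks up a tag in a vocabulary after trying a few naive singular forms. The function name `normalize_tag` and the vocabulary terms are hypothetical, and the suffix rules are a simplification; matching common names to scientific names would need a separate lookup and is not shown.

```python
def normalize_tag(tag, vocabulary):
    """Return the matching vocabulary term, or None if there is no match."""
    lookup = {term.lower(): term for term in vocabulary}
    key = " ".join(tag.lower().split())
    # Try the tag as given, then a few naive singular forms.
    candidates = [key]
    if key.endswith("ies"):
        candidates.append(key[:-3] + "y")
    if key.endswith("es"):
        candidates.append(key[:-2])
    if key.endswith("s"):
        candidates.append(key[:-1])
    for candidate in candidates:
        if candidate in lookup:
            return lookup[candidate]
    return None  # left to the curator: add a new term or contact the depositor
```

With a vocabulary of `["turtle", "fox", "butterfly"]`, the tags `"Turtles"`, `"Foxes"`, and `"butterflies"` resolve to `"turtle"`, `"fox"`, and `"butterfly"`, while `"deer crossing"` returns `None`.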
Obtain a DOI and Publish
After dataset-level metadata is checked and cleaned, it is used to mint a DOI from DataCite for the dataset. This DOI resolves to a webpage maintained by fF that displays all metadata and provides links to download the dataset, data dictionary, and any other related files.