ZivaHub Data Sharing and Publishing: Organizing Your Data

This guide covers the information you need to use ZivaHub, UCT's institutional repository.

Data Organization

Data Types:

Different kinds of research projects generate and collect different kinds of data. Research data generally falls into one of these four categories:

  • Observational
    • Usually captured in real time and not in the laboratory
    • Often irreplaceable (i.e. a one-time event) and unlikely to be reproducible
    • E.g. astronomical observations, sensor readings, sensory observations etc.
  • Experimental
    • Captured in the laboratory under controlled conditions
    • Likely reproducible, but reproduction can be expensive in both time and cost
    • E.g. gene sequences, microscopy, chromatograms etc.
  • Computational/Simulation
    • Generated by computer from test models
    • Likely reproducible if the computer inputs are preserved, but reproduction can be expensive in both time and cost
    • E.g. economic models, climate models etc.
  • Derived
    • Produced from existing datasets
    • Likely reproducible, but reproduction can be expensive in both time and cost
    • E.g. text and data mining, compiled databases etc.

Directory Structure & Folder Naming Conventions

The name of the top-level folder or directory should include the following descriptors, and all folder names should be kept under 32 characters:

  • Project title
  • Unique identifier
  • Date (yyyy or yyyymmdd)
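
As an illustration of the three descriptors above, here is a minimal Python sketch that joins a project title, a unique identifier and a date into one folder name and rejects names that are too long. The function name `top_level_folder_name`, the underscore separator, the sample project and the reading of "under 32 characters" as strictly fewer than 32 are all assumptions made for this example, not ZivaHub requirements.

```python
import re
from datetime import date

# Assumption: "under 32 characters" is read as strictly fewer than 32.
MAX_LEN = 32


def top_level_folder_name(title: str, identifier: str, when: date,
                          year_only: bool = False) -> str:
    """Join project title, unique identifier and date into one folder name."""
    # Keep only letters and digits from the title and capitalize each word,
    # so the name contains no spaces or punctuation.
    words = re.findall(r"[A-Za-z0-9]+", title)
    short_title = "".join(word.capitalize() for word in words)
    # Date as yyyy or yyyymmdd, as suggested in the guide.
    stamp = when.strftime("%Y" if year_only else "%Y%m%d")
    name = f"{short_title}_{identifier}_{stamp}"
    if len(name) >= MAX_LEN:
        raise ValueError(
            f"{name!r} has {len(name)} characters; keep it under {MAX_LEN}"
        )
    return name


# Hypothetical project, identifier and date, for illustration only.
print(top_level_folder_name("coastal erosion", "CE042", date(2024, 3, 15)))
# prints CoastalErosion_CE042_20240315
```

Raising an error instead of silently truncating keeps the identifier and date intact; a long title is better shortened by hand.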

Folder Hierarchy Example: [Project]/[Experiment]/[Instrument Used]

Folder Substructure: The folders/directories within the substructure should be split according to a particular theme; e.g. each folder may contain a run of an experiment or a different version of a particular dataset.
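
The hierarchy and substructure described above could be created with a short script. The sketch below uses Python's standard `pathlib` and builds [Project]/[Experiment]/[Instrument Used] with one subfolder per experimental run. The helper `make_run_folders`, the `runNN` naming pattern and the sample folder names are hypothetical choices for this example.

```python
import tempfile
from pathlib import Path


def make_run_folders(root: Path, project: str, experiment: str,
                     instrument: str, runs: int) -> list[Path]:
    """Create [Project]/[Experiment]/[Instrument] with one subfolder per run."""
    base = root / project / experiment / instrument
    created = []
    for n in range(1, runs + 1):
        # Zero-padded run numbers keep the folders in order when sorted by name.
        run_dir = base / f"run{n:02d}"
        run_dir.mkdir(parents=True, exist_ok=True)
        created.append(run_dir)
    return created


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    folders = make_run_folders(
        Path(tmp), "CoastalErosion_CE042_2024", "WaveTank", "LidarScanner", 3
    )
    for path in folders:
        print(path.relative_to(tmp))
```

The same pattern applies if the theme is a dataset version rather than a run; only the subfolder naming changes.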

Data Versioning

Keep track of the versions of your research data whenever you save a new copy of a file. Applying a consistent versioning policy to your dataset will save time when you need to retrieve a specific version of a file in the future.

Here are some suggestions:

  • Include a version number, e.g. "v1", "v2" or "v2.1". For example: DataFileName_1.0 = original document; DataFileName_1.1 = original document with minor revisions; DataFileName_2.0 = document with substantial revisions
  • Include information about what changes were made, e.g. "cropped" or "normalized"
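
The two suggestions above can be combined in a small helper. The following Python sketch bumps the minor number for minor revisions, moves to the next major number for substantial revisions, and optionally appends a one-word change note. It assumes names of the form `DataFileName_1.0` without a file extension, as in the examples above; the function `next_version` and the `_note` suffix are hypothetical conventions for this example.

```python
import re

# Assumed pattern: <stem>_<major>.<minor>, optionally followed by _<note>.
VERSION_RE = re.compile(
    r"^(?P<stem>.+)_(?P<major>\d+)\.(?P<minor>\d+)(?:_(?P<note>[A-Za-z]+))?$"
)


def next_version(name: str, substantial: bool, note: str = "") -> str:
    """Return the next versioned name, optionally tagged with a change note."""
    match = VERSION_RE.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not look like <stem>_<major>.<minor>")
    major, minor = int(match["major"]), int(match["minor"])
    if substantial:
        # Substantial revisions start a new major version, e.g. 1.1 -> 2.0.
        major, minor = major + 1, 0
    else:
        # Minor revisions only increase the second number, e.g. 1.0 -> 1.1.
        minor += 1
    new_name = f"{match['stem']}_{major}.{minor}"
    # Any note on the previous name is dropped; the new note describes this change.
    return f"{new_name}_{note}" if note else new_name


print(next_version("DataFileName_1.0", substantial=False))
# prints DataFileName_1.1
print(next_version("DataFileName_1.1", substantial=True))
# prints DataFileName_2.0
print(next_version("DataFileName_2.0", substantial=False, note="normalized"))
# prints DataFileName_2.1_normalized
```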

Linked Resources