Research Data Management (RDM): Data Organisation

UCT Libraries Research Data Services provide guidance and support for all aspects of the data lifecycle, from planning your data management strategy during the proposal phase through preserving your data at the conclusion of your project.

Do you often forget the names of your files or misplace them?

Putting your files into folders and naming them effectively can prevent many problems when you need to access your data later. Investing time in deciding how you will organize your data and what you will call it can make you a more efficient researcher and your research more reproducible.

Will a future you be able to find the file you save today?

Data Organization

Data Types:

Different kinds of research projects generate and collect different kinds of data. Data can generally be grouped into four categories:

  • Observational
    • Usually captured in real time and not in the laboratory
    • Often irreplaceable (e.g. a one-time event) and unlikely to be reproducible
    • E.g. astronomical observations, sensor readings, sensory observations etc.
  • Experimental
    • Captured in the laboratory under controlled conditions
    • Likely reproducible but can be expensive both in time and costs
    • E.g. gene sequences, microscopy, chromatograms etc.
  • Computational/Simulation
    • Computer generated from test models
    • Likely reproducible if the computer inputs are preserved, but can be expensive in both time and cost
    • E.g. economic models, climate models etc.
  • Derived
    • Produced from existing datasets
    • Likely reproducible but can be expensive both in time and costs
    • E.g. text and data mining, compiled databases etc.

Directory Structure & Folder Naming Conventions

Keep every folder name under 32 characters. The name of the top-level folder or directory should include the following descriptors:

  • Project title
  • Unique identifier
  • Date (yyyy or yyyymmdd)

Folder Hierarchy Example: [Project]/[Experiment]/[Instrument Used]

FOLDER SUBSTRUCTURE - The folders/directories within the substructure should be split according to a particular theme; e.g. each folder may contain a run of an experiment or a different version of a particular dataset.
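
The naming descriptors and the folder hierarchy example can be sketched in Python. This is only an illustration under stated assumptions: the function names, the underscore and hyphen separators, and the sample project, experiment and instrument names are hypothetical and are not prescribed by this guide.

```python
import re
from datetime import date
from pathlib import PurePosixPath

# "Under 32 characters" is read here as at most 31 characters.
MAX_FOLDER_NAME_LENGTH = 32


def top_level_folder_name(project_title, unique_id, when):
    """Join project title, unique identifier and date (yyyymmdd) into one name."""
    # Assumption: runs of characters other than letters and digits become a hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", project_title).strip("-")
    name = f"{slug}_{unique_id}_{when:%Y%m%d}"
    if len(name) >= MAX_FOLDER_NAME_LENGTH:
        raise ValueError(
            f"{name!r} has {len(name)} characters; "
            f"keep folder names under {MAX_FOLDER_NAME_LENGTH}"
        )
    return name


def data_folder(project_folder, experiment, instrument):
    """Build a [Project]/[Experiment]/[Instrument Used] path without touching disk."""
    return PurePosixPath(project_folder) / experiment / instrument


# Hypothetical project used purely for illustration.
project = top_level_folder_name("Kelp Survey", "P042", date(2024, 3, 15))
print(data_folder(project, "Run01", "CTD"))
```

Rejecting an over-long name, rather than silently truncating it, keeps the title, identifier and date readable in full; shortening the title by hand is usually the better fix.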

Data Versioning

Keep track of versions whenever you save a new copy of a file. Applying a consistent versioning policy to your dataset will save a lot of time when you need to retrieve a specific version of a file in the future.

Here are some suggestions:

  • Include a version number, e.g. "v1", "v2" or "v2.1". For example: DataFileName_1.0 = original document; DataFileName_1.1 = original document with minor revisions; DataFileName_2.0 = document with substantial revisions
  • Include information about what changes were made, e.g. "cropped" or "normalized"
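
The two suggestions above can be combined in a small helper. The following is a minimal sketch, assuming names follow the DataFileName_major.minor pattern from the example, carry no file extension, and take an optional change note after a second underscore; the function name and the note format are hypothetical.

```python
import re

# Assumed pattern: <stem>_<major>.<minor>, optionally followed by _<note>.
VERSION_PATTERN = re.compile(
    r"^(?P<stem>.+)_(?P<major>\d+)\.(?P<minor>\d+)(?:_(?P<note>[A-Za-z0-9-]+))?$"
)


def next_version(name, substantial=False, note=""):
    """Return the next versioned name; any change note on the old name is dropped."""
    match = VERSION_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not end in a version such as _1.0")
    major, minor = int(match["major"]), int(match["minor"])
    if substantial:
        # Substantial revisions start a new major version: 1.1 becomes 2.0.
        major, minor = major + 1, 0
    else:
        # Minor revisions increase only the second number: 1.0 becomes 1.1.
        minor += 1
    new_name = f"{match['stem']}_{major}.{minor}"
    return f"{new_name}_{note}" if note else new_name


print(next_version("DataFileName_1.0"))
print(next_version("DataFileName_1.1", substantial=True))
print(next_version("DataFileName_2.0", note="normalized"))
```

Save each new version under its new name instead of overwriting the previous file, so that earlier versions remain retrievable.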