Niven Library - FitzPatrick Institute of African Ornithology Library Guide: Data Management

The library supports a number of themes: African ornithology, conservation biology, and Antarctic and sub-Antarctic biology, specifically birds.

File Formats for Long-Term Archiving

Electronic media, both hardware and software, change constantly as technological innovations improve on previous versions, producing more powerful and flexible solutions with which to conduct research. Unfortunately this leads to very rapid obsolescence: digital data produced 20 years ago may well no longer be readable with today’s computers, operating systems and programmes.

There are a number of ways to overcome data obsolescence, and the format you use to store your data can assist in this endeavour. Ensuring that data formats are interoperable (i.e. can be read by a range of programmes and operating systems) is recommended for archiving digital data. Formats with the following characteristics are considered more likely to remain interoperable in the long term:

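  • Non-proprietary (proprietary formats are vendor-controlled commercial formats such as Microsoft Word .doc; Adobe PDF began as one, although it has since been published as an ISO standard)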
  • Open, documented standards (e.g. Portable Network Graphics - PNG; Hypertext Markup Language - HTML)
  • Machine standards (ASCII, Unicode)
  • Unencrypted
  • Uncompressed
  • Standards commonly used by global research 

Keep these standards in mind when saving your data into a format for archiving.  Keep a copy of the original as well, even if this is in a proprietary format. 

The Library of Congress has released Recommended Format Specifications for a range of digital media. These include textual works, graphic images, audio media, video media, software, datasets and databases.

The MIT (Massachusetts Institute of Technology) Libraries, on whose guide this one is modelled, suggest the following preferred formats:

  • PDF/A or .txt – not Word or WordPerfect
  • ASCII or .csv – not Excel
  • MPEG-4 – not QuickTime
  • TIFF or JPEG2000 – not GIF or JPG
  • XML or .rdf – not RDBMS 

Look at the UK Data Archive file formats table to get an idea of what type of file formats are considered suitable for a digital archive.

Acknowledgements: This guide is an adaptation of the one developed at the Massachusetts Institute of Technology Libraries.

Organising your Files

File Management

File management is often considered intuitive, in the same way that finding information is considered intuitive. This is true to a point, but a little guidance goes a long way towards making you more effective and efficient. The best way to organise your files is to develop (and strictly maintain) conventions, both for your directory structure and for your file naming. Suggestions are:

  • Develop a hierarchical directory structure (think Linnaeus!)
  • Top level folder = Project Title, Year and unique identifier.
  • Sub-folders = Field trip 1 + date, etc.
  • Versions of data = follow a consistent versioning convention; an example of such a convention can be found in Wikipedia.
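
The folder conventions above could be set up with a short script. The following is only a minimal sketch in Python: the function name `build_project_tree`, the underscore-separated naming pattern and the example values are assumptions made for illustration, not part of the MIT or Niven Library guidance.

```python
from pathlib import Path


def build_project_tree(root, project_title, year, identifier, field_trips):
    """Create a top-level project folder with one sub-folder per field trip.

    field_trips is a list of (label, date) pairs, e.g. ("FieldTrip1", "2015-01-12").
    Returns the path of the top-level project folder.
    """
    # Top level folder = project title, year and unique identifier
    # (joined with underscores here; the separator is an assumption).
    top = Path(root) / f"{project_title}_{year}_{identifier}"
    top.mkdir(parents=True, exist_ok=True)
    for label, date in field_trips:
        # Sub-folder = field trip label + date.
        (top / f"{label}_{date}").mkdir(exist_ok=True)
    return top
```

Calling `build_project_tree("research_data", "ABO_Counts", 2015, "P001", [("FieldTrip1", "2015-01-12")])` would create `research_data/ABO_Counts_2015_P001/FieldTrip1_2015-01-12`. Writing dates as year-month-day keeps the sub-folders in chronological order when they are sorted by name.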

File Naming Conventions 

Identify your project or your field work in the file name so that it means something:

Don't use - Count Data.xlsx

Do use - African_Black_Oystercatcher_Count_Data.xlsx or ABO_Count_Data.xlsx

For long-term data storage, associated metadata is required, including temporal and spatial information as well as the name of the project and the name of the researcher. For long-term storage, .xlsx files should also be converted to .csv files, but do keep a copy of the proprietary format.
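
One simple way to keep such metadata with the data is a plain-text file stored next to the data file. The sketch below assumes this approach; the function name `write_metadata`, the `_metadata.txt` suffix and the field labels are hypothetical choices for illustration, not a prescribed metadata standard.

```python
from pathlib import Path


def write_metadata(data_file, project, researcher, date_range, location):
    """Write a plain-text metadata file next to data_file and return its path."""
    data_path = Path(data_file)
    # e.g. ABO_Count_Data.csv -> ABO_Count_Data_metadata.txt (assumed naming)
    meta_path = data_path.with_name(data_path.stem + "_metadata.txt")
    lines = [
        f"Data file: {data_path.name}",
        f"Project: {project}",
        f"Researcher: {researcher}",
        f"Temporal coverage: {date_range}",
        f"Spatial coverage: {location}",
    ]
    # UTF-8 plain text stays readable without any special software.
    meta_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return meta_path
```

For example, `write_metadata("ABO_Count_Data.csv", "ABO counts", "Researcher name", "start date to end date", "study site")` would write `ABO_Count_Data_metadata.txt` in the same folder; the argument values shown are placeholders only.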

Renaming Files 

If you did not develop a file naming convention before you started your research, you may want to rename your files according to a convention you have subsequently developed. Doing this by hand could be a tedious waste of time, but fortunately tools are available to assist, and some of these are even free! Those recommended by MIT Libraries are:

 
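
Where a dedicated renaming tool is not at hand, a few lines of script can do a similar job. The following is a minimal sketch, not one of the recommended tools: the function name `add_project_prefix`, the `dry_run` parameter and the rule of replacing spaces with underscores are assumptions made for illustration.

```python
from pathlib import Path


def add_project_prefix(folder, prefix, dry_run=True):
    """Rename files in folder to '<prefix>_<name with spaces as underscores>'.

    Returns a list of (old_name, new_name) pairs. With dry_run=True nothing
    is renamed, so the plan can be checked before any file is touched.
    """
    planned = []
    # sorted() reads the full listing first, so renaming cannot disturb the loop.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.name.startswith(prefix + "_"):
            continue  # skip sub-folders and files that already carry the prefix
        new_name = f"{prefix}_{path.name.replace(' ', '_')}"
        target = path.with_name(new_name)
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        planned.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return planned
```

With the prefix `"ABO"`, a file called `Count Data.xlsx` would become `ABO_Count_Data.xlsx`. Running with `dry_run=True` first, and working on a copy of the folder, is a sensible precaution before renaming original data.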

Backing up Your Data

If you are not backing up your data, you are not managing your data! You should have 3 copies of your data:

  • Original data
  • Copy of your data stored externally (e.g. on an external hard drive or server)
  • Copy of your data stored externally in a remote location (e.g. on an external hard drive at home)

Types of Backup Solutions

  • External Hard Drive
  • FitzPatrick Institute NAS server (speak to Gonzalo Aguilar, gonzalo.aguilar@uct.ac.za)
  • UCT ICTS data backup servers (speak to Jenny Wood, jenny.wood@uct.ac.za)
  • Cloud storage – the following are recommended by MIT Libraries
    • Amazon S3 - Requires client software, no encryption support (http://aws.amazon.com/s3/#pricing)
    • S3-based Remote Hard Drive Services such as Elephant Drive (www.elephantdrive.com) and Jungle Disk (www.jungledisk.com)
    • Mozy (from EMC) - Free client software, 448-bit Blowfish encryption or AES key (http://mozy.com/)
    • Carbonite - Free client software, 1024-bit Blowfish encryption (www.carbonite.com)

Test your backups from time to time to make sure that your data is secure and can actually be restored.
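
One way to test a backup is to compare checksums of the original files with those of the copies. The sketch below assumes the backup is a plain folder copy that can be read from the same computer; the function names `sha256_of` and `find_backup_problems` are hypothetical, and SHA-256 is simply one commonly used checksum.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_backup_problems(original_dir, backup_dir):
    """Return relative paths that are missing from, or differ in, the backup."""
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        copy = backup_dir / relative
        # A file is a problem if the copy is absent or its contents differ.
        if not copy.is_file() or sha256_of(source) != sha256_of(copy):
            problems.append(relative.as_posix())
    return problems
```

An empty list from `find_backup_problems` means that every original file has an identical copy in the backup folder. It does not show that the backup drive itself is healthy, so occasionally restoring a few files remains worthwhile.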

Life Cycle of a Dataset

The MIT Libraries have produced an excellent slide show called The Lifecycle of a Dataset. I encourage you to have a look at it, as it contains a lot of useful information about managing and archiving data, and about why and how to create metadata.