Electronic media, both hardware and software, change constantly as technological innovations improve on previous versions, producing more powerful and flexible tools with which to conduct research. Unfortunately, this leads to very rapid obsolescence: digital data produced 20 years ago may well be unreadable with today's computers, operating systems and programs.
There are a number of ways to overcome data obsolescence, and the format you use to store your data can assist in this endeavour. Ensuring that data formats are interoperable (i.e. can be read by a range of programs and operating systems) is recommended for archiving digital data. Formats considered more likely to remain interoperable in the long term are the following:
Keep these standards in mind when saving your data into a format for archiving. Keep a copy of the original as well, even if this is in a proprietary format.
The Library of Congress has released Recommended Format Specifications for a range of digital media. These include textual works, graphic images, audio media, video media, software, datasets and databases.
The MIT (Massachusetts Institute of Technology) Libraries, on whose guide this one is modelled, suggest the following preferred formats:
Look at the UK Data Archive file formats table to get an idea of what type of file formats are considered suitable for a digital archive.
Acknowledgements: This guide is an adaptation of the one developed at the Massachusetts Institute of Technology Libraries.
File Management
File management is often considered intuitive, in the same way that finding information is considered intuitive. This is true up to a point, but a little guidance goes a long way in helping you to be more effective and efficient. The best way to organise your files is to develop (and strictly maintain) conventions, both for your directory structure and for your file naming. Suggestions are:
File Naming Conventions
Identify your project or your field work in the file name so that it means something:
Don't use: Count Data.xlsx
Do use: African_Black_Oystercatcher_Count_Data.xlsx or ABO_Count_Data.xlsx
For long-term data storage, associated metadata is required, including temporal and spatial information as well as the name of the project and the name of the researcher. For long-term storage, .xlsx files should also be converted to .csv files, but do keep a copy in the proprietary format.
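A naming convention is easier to maintain when the names are generated rather than typed by hand. The following is a minimal sketch in Python, assuming a convention of project identifier, then description, joined by underscores. The function name make_file_name and its parameters are illustrative only and are not part of any recommended tool.

```python
import re


def make_file_name(project: str, description: str, extension: str) -> str:
    """Join a project identifier and a description into one file name.

    Assumed convention: any run of characters other than letters,
    digits and hyphens (including spaces) becomes a single underscore.
    """
    def clean(part: str) -> str:
        return re.sub(r"[^A-Za-z0-9-]+", "_", part.strip()).strip("_")

    # Accept the extension with or without a leading dot.
    return f"{clean(project)}_{clean(description)}.{extension.lstrip('.')}"


print(make_file_name("African Black Oystercatcher", "Count Data", "xlsx"))
# prints African_Black_Oystercatcher_Count_Data.xlsx
print(make_file_name("ABO", "Count Data", ".csv"))
# prints ABO_Count_Data.csv
```

Whether to use the full project name or an abbreviation such as ABO is a matter of preference; what matters is applying the same choice to every file.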
Renaming Files
If you did not develop a file naming convention before you started your research, you may want to rename your files according to a convention you have developed since. Doing this by hand can be tedious, but fortunately tools are available to assist, and some of them are even free. Those recommended by MIT Libraries are:
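Independently of any dedicated renaming tool, a bulk rename can also be scripted. The sketch below is one possible approach in Python and is not one of the MIT recommendations. It assumes the convention is a project prefix followed by the existing name with spaces replaced by underscores. The function names plan_renames and apply_renames are hypothetical. By default it only prints what it would do, so the plan can be checked before any file is touched.

```python
from pathlib import Path


def plan_renames(folder: Path, prefix: str) -> list[tuple[Path, Path]]:
    """Return (old, new) path pairs for files in folder that lack the prefix."""
    plan = []
    for old in sorted(folder.iterdir()):
        # Skip sub-directories and files that already follow the convention.
        if not old.is_file() or old.name.startswith(prefix + "_"):
            continue
        # Collapse whitespace in the existing name into single underscores.
        new_name = prefix + "_" + "_".join(old.name.split())
        plan.append((old, old.with_name(new_name)))
    return plan


def apply_renames(plan: list[tuple[Path, Path]], dry_run: bool = True) -> None:
    """Print each planned rename; only change files when dry_run is False."""
    for old, new in plan:
        print(f"{old.name} -> {new.name}")
        if not dry_run:
            if new.exists():
                raise FileExistsError(f"Refusing to overwrite {new}")
            old.rename(new)
```

For example, apply_renames(plan_renames(Path("field_data"), "ABO")) would list the proposed changes for a folder called field_data (a made-up name); passing dry_run=False performs them. Run it on a copy of your data first.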
If you are not backing up your data, you are not managing your data! You should keep three copies of your data.
Types of Backup Solutions
Test your backups from time to time to confirm that your data is secure and can actually be restored.
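One simple way to test a backup is to compare checksums of the original files against the backed-up copies. The following Python sketch assumes the backup is an ordinary folder that mirrors the original directory structure; the function names sha256_of and verify_backup are illustrative. It checks only that the copies are present and byte-for-byte identical, not that the files can still be opened by your software.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_dir: Path, backup_dir: Path) -> list[str]:
    """Return a list of files that are missing from, or differ in, the backup."""
    problems = []
    for original in sorted(original_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(copy) != sha256_of(original):
            problems.append(f"differs: {relative}")
    return problems
```

An empty list means every original file has an identical copy in the backup. Files that exist only in the backup are not reported, which is a deliberate simplification.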
The MIT Libraries have produced an excellent slide show called The Lifecycle of a Dataset. I encourage you to have a look at it, as it has a lot of useful information about managing and archiving data, and about why and how to create metadata.