LibGuides: Research Data Management (RDM): Dealing with Sensitive Data

What is Sensitive Data

‘Sensitive data’ refers to data that needs to be protected from unauthorised access or unwarranted disclosure. It is generally considered to be:

Identifiable data: Data that can be used to identify an individual, endangered species, object or location. Such identification could increase the risk of harm, discrimination, or unwanted attention.
Proprietary data: Data that is internally generated and gives competitive advantage to its owner. This includes research data with commercialisation potential. Proprietary data may be protected under copyright, patent, or trade secret laws.
Restricted or confidential data with contractual (e.g. Research Collaboration Agreements, Project Agreements, Material Transfer Agreements, Non-Disclosure Agreements) or legal obligations (e.g. Official Secrets Act).

In short, sensitive data includes, but is not limited to, personal data, proprietary data, and other restricted or confidential data that should be protected from unauthorised access or unwarranted disclosure.

Best Practices For Handling Sensitive Data

It is recommended that you adopt a proportionate, risk-based approach to handling sensitive data throughout the data lifecycle:

Assess potential risks and consequences before commencing research
Outline data protection practices in the Data Management Plan (DMP)
Implement best practices within project teams/collaborators at the start and throughout research

For related information, see below:

NTU guidance on handling of digital and non-digital data at different levels of sensitivity.
Personal Data Protection Commission (PDPC) Singapore, Guide to securing Personal Data in Electronic Medium and Guide to Data Protection Impact Assessments
MANTRA, The University of Edinburgh, Protecting sensitive data, [Online course], CC-BY.

Best Practice for Sharing Sensitive Data

Research data should, wherever possible, be made available for use by others in a manner consistent with relevant legal, ethical and disciplinary frameworks and norms. For sensitive data, however, follow the step-by-step guidance below before sharing:

Consider data ownership, ethical concerns, intellectual property rights and other legal terms for the data.
Factor in data sharing when obtaining consent from research participants:
- In the information sheet, describe clearly who has access to the data during and after the project.
- In the consent form, offer clear choices to participants on whether they agree with archiving and/or reuse of the data from the project. Note that a participant can opt out of these activities, but still participate in your study.
Consider ways to protect the sensitive data by:
- Anonymisation or de-identification of identifiable data: For example, remove all direct identifiers, and remove or modify indirect identifiers until the risk of re-identification is negligible.
- Redacting research data to remove confidential information or third party intellectual property.
- Data encryption during data upload, download, and storage over secure platforms.
- Managed access to ensure that only bona fide researchers bound by professional obligations and specific agreements have access to the data.
Do not share or store the 'keys' to re-identification with de-identified datasets.
Consider whether embargoes need to be placed on the data. An embargo means that the description of the dataset is published but the embargoed data files remain restricted until a specified time.
Publish your data and metadata according to participant consent and ethics approval, and apply an appropriate licence, taking into account any limitations on re-use, redistribution, commercial use, etc.
If your data can't be anonymised, consider publishing a description (i.e. the metadata) which enables you to place conditions around access to the data.
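
The advice above on keeping re-identification 'keys' apart from de-identified datasets can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed method: the function name pseudonymise, the field names, and the sample records are all hypothetical.

```python
import secrets

def pseudonymise(records, id_field="name"):
    """Replace id_field with a random study ID.

    Returns the de-identified dataset and, separately, the linkage key
    that maps each original identifier to its study ID.
    """
    key = {}       # original identifier -> study ID (the re-identification key)
    dataset = []   # records that no longer contain the identifier
    for record in records:
        original = record[id_field]
        if original not in key:
            # Random IDs carry no information about the original value.
            key[original] = "P" + secrets.token_hex(4)
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["study_id"] = key[original]
        dataset.append(cleaned)
    return dataset, key

# Hypothetical sample records, for illustration only.
records = [
    {"name": "Alice Tan", "score": 7},
    {"name": "Bob Lim", "score": 4},
    {"name": "Alice Tan", "score": 9},
]
shareable, linkage_key = pseudonymise(records)
```

Only shareable would be deposited or shared. linkage_key would be stored in a separate, access-controlled location, or destroyed if re-identification is never required.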

See example 1 and example 2 of de-identified datasets and how they are shared in real life.

Acts of anonymisation and aggregation render sensitive data non-linkable and therefore shareable. However, this may remove valuable information from the data. In some cases, therefore, instead of making sensitive data openly available, it may be preferable to release the data, on request, to other bona fide researchers using non-disclosure data sharing agreements, in addition to applying access control.
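
As a rough illustration of aggregation and the information it can cost, the Python sketch below counts records per category and hides any count below a minimum cell size. The function name, the sample data, and the threshold of 5 are assumptions made for this example, not figures taken from this guide; an appropriate threshold depends on the dataset and its ethics approval.

```python
from collections import Counter

def aggregate_with_suppression(values, min_cell=5):
    """Count records per category; replace counts below min_cell with None."""
    counts = Counter(values)
    return {
        category: (count if count >= min_cell else None)
        for category, count in counts.items()
    }

# Hypothetical occupations reported by twelve participants.
occupations = ["nurse"] * 6 + ["teacher"] * 5 + ["astronaut"]
table = aggregate_with_suppression(occupations)
```

Here the single rare occupation is suppressed in the released table, which protects that participant but also removes a detail that another researcher might have wanted.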

References:

Australian National Data Service (2018), Publishing and sharing sensitive data, ANDS Guides.
Inter-university Consortium for Political and Social Research (n.d.), Guide to Social Science Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle, 6th Edition.
Chapman and Grafton (2008), Guide to Best Practices for Generalising Sensitive Species Occurrence Data. Copenhagen: GBIF Secretariat. https://doi.org/10.15468/doc-b02j-gt10

UCT Policy on Sensitive Data

In terms of the UCT Intellectual Property (IP) Policy, UCT is the legal owner of research data emanating from research done by its researchers.

UCT grants the Principal Investigator (PI) of a research project the right to upload UCT research data supporting a publication required by a journal publisher or a funder and all UCT project data where this is a specific funder requirement, as long as the data complies with any ethics requirements (e.g. patient confidentiality, consent, etc.). Where a data set has been created in conjunction with researchers outside of UCT, the necessary permission(s) should be sought from the relevant institutions by the PI, prior to the upload of data.

Where data may not be shared publicly (e.g. due to ethical considerations), controlled access should be applied on figshare.

Types of Sensitive Data

Personal Data: Data, whether true or not, about an individual who can be identified from that data; or from that data and other information to which the organisation has or is likely to have access.

When sharing or publishing your research data, you should be aware of the disclosure risks stemming from the release of direct identifiers or indirect identifiers in your dataset.

Direct Identifiers

Variables containing information that can explicitly identify particular individuals or units. It is recommended that you remove direct identifiers before you release your dataset.

Examples:

Name/Initials
Mailing address
Phone number
Email address
Identity card
Social Security numbers
Biometric data
Driver's license numbers
Vehicle identifiers
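
Removing direct identifiers before release can be as simple as dropping the relevant fields from each record. The Python sketch below assumes records are held as dictionaries; the field names and the sample record are hypothetical and would need to match your own dataset.

```python
# Hypothetical field names; adjust to the columns in your own dataset.
DIRECT_IDENTIFIERS = {"name", "mailing_address", "phone", "email", "identity_card"}

def strip_direct_identifiers(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of record without any direct-identifier fields."""
    return {
        field: value
        for field, value in record.items()
        if field not in identifiers
    }

# Hypothetical sample record, for illustration only.
record = {
    "name": "Alice Tan",
    "email": "alice@example.org",
    "phone": "0000 0000",
    "age": 34,
    "occupation": "nurse",
}
released = strip_direct_identifiers(record)
```

Dropping these fields is only a first step: the remaining fields, such as age and occupation, may still act as indirect identifiers.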

Indirect Identifiers

Variables that can be used together or in conjunction with other information to identify particular individuals or units.

Examples:

Gender
Race/Ethnicity
Birth year or age
Place of birth
Rare disease or treatment
Occupation
Annual income
Postal code
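
Indirect identifiers are usually generalised rather than removed. The Python sketch below coarsens age into ten-year bands and truncates the postal code, then reports the size of the smallest group of records sharing the same combination of values (a simple k-anonymity style check). The function names, the choice of bands, the two-character postal prefix, and the sample records are all assumptions for illustration.

```python
from collections import Counter

def generalise(record):
    """Coarsen indirect identifiers: 10-year age band, 2-character postal prefix."""
    decade = (record["age"] // 10) * 10
    return {
        "gender": record["gender"],
        "age_band": f"{decade}-{decade + 9}",
        "postal_area": record["postal_code"][:2],
    }

def smallest_group(records):
    """Size of the smallest set of records with identical field values."""
    counts = Counter(tuple(sorted(r.items())) for r in records)
    return min(counts.values())

# Hypothetical sample records, for illustration only.
raw = [
    {"gender": "F", "age": 34, "postal_code": "639798"},
    {"gender": "F", "age": 37, "postal_code": "637551"},
    {"gender": "M", "age": 52, "postal_code": "608550"},
    {"gender": "M", "age": 58, "postal_code": "609431"},
]
generalised = [generalise(r) for r in raw]
k = smallest_group(generalised)
```

In this sample every raw record is unique, whereas after generalisation each record shares its values with at least one other. A larger smallest group generally means a lower re-identification risk, at the cost of less detailed data.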

Proprietary Data: Data, including any and all Intellectual Property and any rights thereof (whether registered and/or unregistered), know-how, and trade secrets, whether written, oral, pictorial or in other tangible form, which gives competitive advantage to its owner. It may also include data generated or used under a restricted research funding agreement with industry partners.

Resources for Sensitive Data

Below is a compilation of resources for working with sensitive data:

NTU Research Integrity and Ethics Office (RIEO)
- NTU’s policy on Research Involving Human Subjects
NTU Institutional Review Board (IRB)
NTU Libraries
- Sensitive data LibGuide
- Anonymisation LibGuide

NTUitive
- For questions on commercialisation and IP, please contact NTUitive at this email

NTU Legal & Secretarial Office (LSO)
- LSO 101 FAQ page (e.g. copyright, ownership, IP)
- Contact page > Research Contracts (e.g. NDA)

Personal Data Protection Commission (PDPC) Singapore
- Guide to securing Personal Data in Electronic Medium
- Guide to Data Protection Impact Assessments