Research data management

Re-using and archiving existing data

Before you re-use existing data you need to be aware of the licence conditions attached to that data. Third-party data is likely to have copyright and/or other licensing conditions associated with it that stipulate whether the data can be reused, modified or re-archived. This may have a significant impact on your research if your funder requires you to make your data available at the end of your research.
Although licences vary, the three conditions commonly found in licences are attribution, copyleft and non-commerciality. These are explained below:
  • Attribution requires that you acknowledge the source of the data when it is distributed, displayed, performed or used to derive a new work. Datasets can be particularly difficult because they are prone to attribution stacking: a dataset derived from many sources, each with its own contributors, must acknowledge every one of them.
  • A copyleft requirement means that any new works derived from the licensed one must be released under the same licence, and only that licence. This can be problematic if you are trying to combine data from different datasets which have been released under different copyleft licences, because the derived dataset cannot satisfy both sets of licence terms simultaneously. However, some copyleft licences do have a small amount of flexibility in allowing derivative works to be released under a compatible licence, provided that it applies approximately the same conditions. For example, the GNU Project maintains a list of licences for code which permit distribution under the GNU General Public Licence (GPL) and whose terms the GPL can accommodate.
  • A non-commercial licence prevents the licensee from exploiting the work commercially. However, what constitutes commercial use is open to interpretation, so you are advised to consider the wider use of your research carefully. For example, a licence may preclude the data being used in support of works that are sold, such as journal articles, even though the author does not benefit financially. Often non-commercial licences are used as part of a dual-licensing regime where the alternative licence allows commercial uses but requires the licensee to pay for the privilege. This type of dual licensing is usually associated with open source software.
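
The way these three conditions interact when datasets are combined can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a substitute for reading the licence texts: the Licence and Dataset classes, the check_combination function and the licence names "Copyleft-A" and "Copyleft-B" are all hypothetical, and the sketch ignores the compatibility provisions that some copyleft licences include.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """Hypothetical, simplified model of a data licence."""
    name: str
    attribution: bool = False
    copyleft: bool = False
    non_commercial: bool = False


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record for a source dataset and its contributors."""
    title: str
    contributors: tuple
    licence: Licence


def check_combination(sources):
    """Summarise the obligations a dataset derived from `sources` inherits."""
    # Attribution stacking: every contributor to every source that
    # requires attribution has to be acknowledged in the derived work.
    credits = sorted(
        {c for s in sources if s.licence.attribution for c in s.contributors}
    )
    # Copyleft: each copyleft source demands its own licence, and only that
    # one, so two different copyleft licences cannot both be satisfied.
    # (Assumption: no compatibility clauses are modelled here.)
    copyleft = sorted({s.licence.name for s in sources if s.licence.copyleft})
    return {
        "credits": credits,
        "copyleft_conflict": len(copyleft) > 1,
        "required_licence": copyleft[0] if len(copyleft) == 1 else None,
        # Non-commercial: one restricted source restricts the derived work.
        "non_commercial": any(s.licence.non_commercial for s in sources),
    }


# Illustrative data only: two sources under different copyleft licences.
survey = Dataset("Survey", ("Ann", "Bob"),
                 Licence("Copyleft-A", attribution=True, copyleft=True))
maps = Dataset("Maps", ("Cy",),
               Licence("Copyleft-B", attribution=True, copyleft=True))

report = check_combination([survey, maps])
print(report["credits"])            # ['Ann', 'Bob', 'Cy']
print(report["copyleft_conflict"])  # True
```

In this example the derived dataset would have to credit all three contributors, and it could not be released at all without further permission, because neither copyleft licence permits release under the other.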

Additional things to consider:

  • Vanishing data – if the licence precludes you from re-archiving the data, be aware that data and datasets can vanish without notice. Data held in a repository is generally more reliable.
  • Purchased data – ensure you are fully aware of any licence restrictions and conditions imposed upon its reuse.
  • Coding data – if you are using code or software released under licences such as MIT, Apache or GPL, be aware of the different licence restrictions attached. For example, as explained above, the GNU GPL is a copyleft licence.
  • Getting permission – it may be possible to seek direct permission from the data’s copyright owner if the licence is too restrictive. You are advised to seek this permission before working on the data rather than retrospectively, as permission may not be granted.
Adapted in part from Ball, A. (2014) How to License Research Data. Published by DCC and JISC Legal, under licence CC BY.