Research Data

What is research data?

Research data can be defined as any digital object created during the course of research (which might include documents, still images, video and audio files, spreadsheets, software, computer code, databases or websites) in addition to physical objects such as sketchbooks, diaries, lab notebooks, portfolios, models, or other artefacts. It also includes the documentation of Practice-as-Research.

Why look after data?

The benefits of looking after your research data effectively include:

Improving the integrity, longevity and usefulness of your research, which includes mitigating the risk of accidental data loss or inappropriate release of sensitive data, as well as making sure associated records are complete
Enabling data sharing and re-use, increasing the visibility, impact and integrity of research
Supporting future use and discovery
Meeting funder requirements

See "Making the case for Research Data Management" (Digital Curation Centre) for more detailed information.

Research Ethics

Middlesex has a responsibility to ensure that research conducted by its employees, researchers and students, or by others in its name is carried out in conformity with the law, and in accordance with the best current practices and principles, and follows the University’s Code of Practice for Research: Principles and Procedures (intranet login required).

The University is committed to maintaining high standards of ethics in research and the Middlesex Online Research Ethics (MORE) system is designed to support researchers at all levels to undertake research according to relevant ethical, legal and professional obligations and standards, in whatever context. It is important to remember that research data collection/analysis must not be undertaken prior to approval from your Research Ethics Committee.

The MORE site is for all student and staff researchers requiring research ethics review and approval. All users must use their MDX email to log on and access the system.
Reviewers of research ethics applications can access the MORE Review site.

Further information is available on the research ethics intranet page (intranet login required).

Acknowledgements

We gratefully acknowledge the work of the University of Bath in the development of this guidance.

Archiving Data

A long-term archive of research data can have a number of benefits:

It can be used to demonstrate compliance with national information access legislation, e.g. Freedom of Information Act 2000, Data Protection Act 1998, Environmental Information Regulations 2004, etc., and other funding body and sponsor requirements.
It secures the ongoing accuracy, authenticity, reliability, integrity and completeness of research data by safeguarding it against loss, deterioration, unauthorised or inappropriate access, obsolescence and future incompatibility.
It facilitates a consistency of approach which adds value to the University's overall research profile, saving effort and resources over time and enabling future sharing of research data.
It increases the visibility of institutional research over time by providing robust evidence of past, current and ongoing University research activity, broadening, deepening and supporting its long-term impact.

For more information about the benefits of archiving data, see Why Deposit Data? from the UK Data Archive

Not all data need to be kept beyond the lifetime of a project; indeed in many cases it would be impractical to keep everything:

Archived data must be discoverable: if no-one can find them, no-one will use them
Archived data must be usable: if finding the few useful bits from a huge dataset is like looking for a needle in a haystack, they won't be used at all
Storage space is expensive: archived data must be robustly backed up, which can more than double the cost compared to typical storage.

Before archiving data, you should carefully consider what is and is not important to keep.

Ideally, you should consider this before you even start, when writing your data management plan.

More information

How to appraise and select research data for curation (Digital Curation Centre)

Keeping Research Data Safe

We all want to keep our data safe and secure, but there are several aspects to this which you should consider to have confidence that your data is as safe as it can be.

Files can be lost accidentally in many different ways. Even if they are not lost completely, they can occasionally become corrupted. If a file is severely corrupted it may be unusable, but even subtle corruption may introduce errors which go unnoticed while affecting the outcome of your research.

See storage and servers (log in to intranet) for further information including information about backing up data held on user controlled computers and file servers.

Things to think about

Regular backups: Regular (ideally automated) backups to several different locations will ensure that if one copy is lost or corrupted, you can easily recover it. When deciding how often to back up, think about the maximum number of days' work you would be prepared to lose.
Checksum tools: A checksum is a short value calculated from a file's contents, rather like a digital fingerprint. Recalculating it later and comparing the two values lets you detect unexpected changes to the file.
Non-digital data: If you have data which are not kept on a computer, you should make sure they are protected too.
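
The backup and checksum suggestions above can be combined. The following is a minimal sketch rather than a recommended tool: it assumes SHA-256 as the checksum method, and the function names `sha256_of` and `backup_and_verify` are invented for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, backup_dirs):
    """Copy `source` into each backup folder and check that each copy matches.

    Returns a dictionary mapping each copy's path to True (checksums match)
    or False (the copy differs from the original).
    """
    source = Path(source)
    expected = sha256_of(source)
    results = {}
    for folder in backup_dirs:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # copy2 also preserves timestamps
        results[str(copy)] = sha256_of(copy) == expected
    return results
```

Storing the checksum alongside the data (for example in a text file) lets you repeat the comparison months later to confirm that a file has not silently changed.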

Preventing unauthorised access

In many cases, you may wish to restrict access to your data to a specific list of individuals. This might be because it is commercially sensitive to you or an industrial partner, or includes sensitive personal information covered by the Data Protection Act.

If you are collecting or using research data about individuals, you should read the University's Data Protection guidance (log in to intranet to view), which includes information about academic research.

Things to think about

Legal requirements: You may be under legal and/or contractual obligations to protect your data. If you're not sure, you can discuss this with the university's Policy, Compliance & legal Support Officers within the Research and Knowledge Transfer Office (log in to intranet to view), who can give you advice on your collaboration or consortium agreements and laws such as the Data Protection Act.
Use of secure systems: One way to restrict access is to use a password-protected system. Commercial services such as Dropbox may be convenient, but are unlikely to provide sufficient protection against unauthorised access.
Secure passwords: Passwords are often the weak link in any secure system. Make sure you choose passwords that are long and difficult to guess. Writing them down is OK, as long as you protect your written-down password very well, just like you would with your house or car keys.
Encryption: You will sometimes need to send data to people who don't have access to your secure storage system. Encrypting a file before you send it via insecure means (e.g. email) ensures that the contents can only be read by someone who has the key.
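
As an illustration of the advice on long, hard-to-guess passwords, the sketch below generates a random password using Python's standard `secrets` module. The function name, the default length of 20 characters and the choice of character set are assumptions made for this example, not University policy.

```python
import secrets
import string

# Assumed character set: letters, digits and punctuation (94 characters).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20):
    """Return a random password of `length` characters.

    `secrets.choice` draws from the operating system's secure random
    source, unlike the general-purpose `random` module.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

A password manager will normally do this for you; the point of the sketch is only that the password is chosen at random rather than made up by a person.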

An often-overlooked aspect of data safety is ensuring that it remains usable. Students and staff arrive and leave on a regular basis, and often it can seem easier to repeat a whole set of expensive experiments rather than try to understand data left behind by researchers who have left the university.

Things to think about

Documenting data: Record information about the structure and format of your data and the process you went through to obtain it. In some cases this can be stored in the data files themselves; if not, it can be stored in a "read me" document in the same folder as the data.
Using standards: Be aware of standard file formats and standard nomenclature (such as letters used for variables) used in your field. Consider using files in open formats so that they can be read by a variety of software.
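
If you create many data folders, a small script can give each one a consistent "read me" file to fill in. This is a sketch only: the file name `readme.txt`, the section headings and the function name are assumptions chosen to mirror the points above, and should be adapted to your discipline.

```python
from pathlib import Path

# Illustrative headings based on the suggestions above; adjust as needed.
README_SECTIONS = [
    "Structure and format of the data",
    "How the data were obtained",
    "Abbreviations, variable names and units",
]


def write_readme_skeleton(folder, title, sections=README_SECTIONS):
    """Create a plain-text readme with empty sections in `folder`.

    Refuses to overwrite an existing readme so that documentation
    already written is never lost.
    """
    readme = Path(folder) / "readme.txt"
    if readme.exists():
        raise FileExistsError(f"{readme} already exists")
    lines = [title, "=" * len(title), ""]
    for section in sections:
        lines += [section, "-" * len(section), "TODO: describe.", ""]
    readme.write_text("\n".join(lines), encoding="utf-8")
    return readme
```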

Not all research data are digital. Most researchers keep hand-written laboratory notebooks, journals and other materials which are not kept on a computer at all.

These materials are at particular risk of loss, as you only have one of them, so it's worth thinking about how you can make them safer. Digitising them can be easy on a small scale, and even if you only use the non-digital version, having a digital version too can give you some valuable peace of mind.

Anything stored on paper can be scanned fairly easily:

Documents can be scanned and sent to your University e-mail only. The scan and store option allows you to scan your documents or images directly to your USB drive/memory stick.

If it's not easy to scan, you could try taking a digital photo, but check the quality of the image to make sure you can use it if you lose the original.

Audio recordings can easily be turned into digital sound files, if the sound content is important, or transcribed if only the words are needed. You can do this yourself, or employ a professional transcription service if you have a lot of recordings to digitise.

Other materials can also be digitised, with varying degrees of difficulty. For more detail on digitisation, take a look at these digitisation guides from Jisc Digital Media.

If it is not practical to digitise the data or artefact, you should make sure that they are protected some other way. A fireproof safe could be a good investment.

More information

Digitisation guides for various media (JISC Digital Media)

Your research data are valuable assets that you have probably invested considerable time, effort and money in creating. Protecting your data from loss is therefore an important aspect of data management during your project.

To protect your data you need to consider how they are stored, backed up and secured whilst you are still working on them.

When considering your data storage strategy, you should consider the following:

Is the storage reliable or is there a risk that the data may be lost?
How much storage will I need and will this vary during the project?
Can I access my data storage from the different places that I work?
Are my data secure and how do I ensure that they can only be accessed by authorised people?

Storage

The IT One Stop Shop (intranet link – log in to access) is the place to get started for further information about file storage. This includes links to IT policies (intranet link – log in to access) and information about file back-up (intranet link – log in to access).

Off-campus IT support (intranet link – log in to access) reminds you to think about how secure your data storage / access is when off campus.

More information

University Data Protection guidance
Guidance on University Ethics Framework and Research ethics (intranet link log in to access)
Guidance on data security UK Data Service
Guide to data encryption UK Data Service

File Organisation

The following suggestions will help you to organise your data:

Use folders - When organising your data, consider using folders to group related files in one location. The number of files or folders per group may vary depending on the nature of your data.

Apply meaningful folder names - Ensure that you use clear and appropriate folder names that relate to the area of work or study rather than the individual responsible. This will avoid confusion if group members leave and is easier for new researchers to use.

Structure folders hierarchically - Design a folder structure with broad topics at the highest level and specific folders within these. However, try to avoid nesting folders too deeply as this may cause problems with path lengths.

Separate current and completed work - you may find it helpful to move temporary drafts or completed work into separate folders. This will also make it easier to review what you need to keep as you go along.

Control access at the highest level - it is easier to set access permissions near the top of your folder structure rather than trying to control permissions for deeply nested folders. This is particularly important if you need to grant someone access to only a subset of your data, in which case you could move these data to a new, higher-level folder.
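
To check whether a folder structure has become too deeply nested, you can list any paths that exceed a chosen length. The sketch below is illustrative: the function name is invented, and the default of 260 characters reflects the traditional Windows path limit, which may not apply to your system.

```python
from pathlib import Path


def find_long_paths(root, max_length=260):
    """Return files and folders under `root` whose full path is too long.

    `max_length` is an assumed limit; change it to suit the systems
    on which your data will be stored and shared.
    """
    root = Path(root)
    return sorted(
        str(path)
        for path in root.rglob("*")
        if len(str(path.resolve())) > max_length
    )
```

Running this occasionally on a shared project folder gives early warning before long paths cause problems when copying or archiving data.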

Naming conventions are rules that allow electronic and physical records to be named in a consistent and logical way.

Use of consistent and meaningful names will enable you to identify and distinguish between similar records, making data retrieval easier.

If you create large numbers of data files that would be difficult to name individually, apply your naming convention at the folder level instead.

When you agree your naming convention, consider the following suggestions:

Keep names short but meaningful - if you use abbreviations, keep a record of what these are with the data, so that others can understand and use them
Include dates in YYYY-MM-DD format, according to the international ISO 8601 standard. This allows files to be sorted into chronological order and avoids confusion when national conventions vary.
Try to avoid using spaces - use punctuation such as hyphens or underscores to separate words, particularly for files that will be available online
Avoid using dots and special characters such as \ / : * ? " < > | as these may be reserved for the operating system.
Capture relevant information in file names rather than relying on basic file properties such as date of creation. This will allow processed data relating to a single experiment or study to be grouped together
If you are repeatedly capturing the same information in a file name, consider grouping the files in a folder named with that information
When personal names are used in file or folder names, use their family name followed by initials
Consider how different versions of a file will be identified
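
Several of these suggestions can be applied automatically when files are created by a script. The following sketch assumes a convention of an ISO 8601 date followed by descriptive parts joined with underscores; the function name `build_filename` is hypothetical, and the rule of keeping only unaccented letters, digits and hyphens is a simplification.

```python
import re
from datetime import date


def build_filename(when, *parts, extension):
    """Build a name such as 2012-03-07_Subject-A_Audio.mp3.

    `when` is a `datetime.date`; each descriptive part has its spaces
    replaced with hyphens and any dots or special characters removed.
    """
    cleaned = []
    for part in parts:
        part = re.sub(r"\s+", "-", part.strip())   # spaces become hyphens
        part = re.sub(r"[^A-Za-z0-9-]", "", part)  # drop dots and special characters
        if part:
            cleaned.append(part)
    stem = "_".join([when.isoformat(), *cleaned])
    return f"{stem}.{extension.lstrip('.')}"
```

For example, `build_filename(date(2012, 3, 7), "Subject A", "Transcript raw", extension="docx")` returns `2012-03-07_Subject-A_Transcript-raw.docx`.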

Examples

Files in a folder are usually shown sorted by name. You can take advantage of this to have your files appear in a consistent order.

Filenames starting with special characters such as @ will appear first, followed by numbers, then the letters A to Z

For example, you might use this to arrange your files as follows:

2012-03-07_Subject-A_Audio.mp3

2012-03-07_Subject-A_Transcript-anonymised.docx

2012-03-07_Subject-A_Transcript-raw.docx

2012-04-22_Subject-B_Audio.mp3

2012-04-22_Subject-B_Transcript-raw.docx
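
The effect of the date format on sort order can be checked with a short script. This sketch uses some of the file names above together with a hypothetical set of day-first names for comparison. Python sorts by character code, which may differ in detail from the ordering used by your file manager.

```python
# File names taken from the example above, deliberately shuffled.
iso_named = [
    "2012-04-22_Subject-B_Audio.mp3",
    "2012-03-07_Subject-A_Transcript-raw.docx",
    "2012-03-07_Subject-A_Audio.mp3",
]

# Hypothetical day-first names for comparison (not recommended).
day_first_named = [
    "22-04-2012_Subject-B_Audio.mp3",
    "07-03-2013_Subject-C_Audio.mp3",
    "15-01-2012_Subject-A_Audio.mp3",
]

# Sorting by name puts the ISO 8601 names in chronological order.
iso_sorted = sorted(iso_named)

# The day-first names sort by day of the month, so the 2013 file
# ends up ahead of both 2012 files.
day_first_sorted = sorted(day_first_named)
```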

As you work with your data it is important to distinguish between different versions or drafts of your files. Version control can help you to easily identify the current version of your data so that you avoid working on older or outdated copies. If you are working with others it can also help to link versions of the data to the time and author of the change.

There are a number of ways that different versions of data can be managed:

File naming - a simple method of version control is to create a duplicate copy and then update version information to create a unique file or folder name.

Successive versions can be numbered sequentially, with whole numbers used for major revisions and point changes indicating minor edits. e.g. 1-0, 1-1, 1-2, 2-0, 2-1.
If you are working as part of a group it may help to include the initials of the person who made the change e.g. v1-0jm, v1-1ke, v2-0gb.
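
If version labels are generated by a script, the numbering scheme above can be applied consistently. This is a minimal sketch assuming labels of the form `v1-0` with optional lower-case initials; the function name `next_version` and the exact pattern are illustrative assumptions.

```python
import re

# Matches labels such as "1-0", "v1-1" or "v2-0gb".
VERSION_PATTERN = re.compile(r"^v?(\d+)-(\d+)([a-z]*)$")


def next_version(current, major=False, initials=""):
    """Return the label that follows `current`.

    A major revision increases the first number and resets the second
    to zero; a minor edit increases the second number only.
    """
    match = VERSION_PATTERN.match(current)
    if match is None:
        raise ValueError(f"unrecognised version label: {current!r}")
    major_no, minor_no = int(match.group(1)), int(match.group(2))
    if major:
        major_no, minor_no = major_no + 1, 0
    else:
        minor_no += 1
    return f"v{major_no}-{minor_no}{initials}"
```

For example, `next_version("v1-1ke", initials="gb")` gives `v1-2gb`, and `next_version("v1-2gb", major=True, initials="jm")` gives `v2-0jm`.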

Version control tables - these are included within documents and can capture more information than using file naming conventions. Version control tables typically include the new version number, date of the change, person who made the change and the nature or purpose of the change.

Version control systems - there are many automated systems available that can store a repository of files and monitor access to them, logging who made what change and when. Version control systems are particularly useful for collaborative development of code or software. GitHub is a useful place to share, manage and review code.

Please see below a number of useful pages on our Intranet to help you further. Please note, you will need to log in to access the Intranet.

Further guidance on organising data:

Guidance on organising data (UK Data Service)
Advice on choosing a file name (Jisc Digital Media)
Version control and authenticity (UK Data Service)

We acknowledge the work of the UK Data Service, the University of Glasgow, the University of Leicester, the University of Southampton and the University of Bath in the development of this guidance.

Confidential Data

The University recognises several levels of sensitive data and information in its Information Classification Framework. If you are working with sensitive data, such as those relating to individuals or commercial companies, you need to take extra precautions to ensure they can only be viewed by those with permission to do so.

Encryption is the process of obfuscating data so that only those with the correct decryption key or password are able to read them. The strength of encryption refers to how difficult it would be for an attacker to decrypt the data without knowing the key in advance, and this depends on both the method and the key used.

The tool you use for encryption should inform you of the method it will use and may give you a choice. The Information Commissioner's Office currently recommends using the AES-128 or AES-256 encryption methods, of which the latter is stronger.

Whenever setting the key to be used by an encryption method, be sure to use a strong password.
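
To give a rough sense of what "strong" means here, the sketch below estimates how many equally likely possibilities a randomly generated password offers, expressed in bits, so that it can be compared with a 128-bit or 256-bit key. The function names are invented for this example, and the estimate applies only to passwords chosen uniformly at random; passwords made up by a person are usually far easier to guess than the figure suggests.

```python
import math


def random_password_bits(length, alphabet_size):
    """Bits of entropy in a random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


def length_needed(target_bits, alphabet_size):
    """Smallest random-password length giving at least `target_bits`."""
    return math.ceil(target_bits / math.log2(alphabet_size))
```

With the 94 printable ASCII characters, `length_needed(128, 94)` is 20, so a random password needs about 20 characters before it is as hard to guess as a 128-bit key.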

More information

Encryption – Information Commissioner's Office

Using external storage providers

While external services such as Dropbox, Google Drive and OneDrive are convenient, they do not comply fully with the University's data policies due to the following issues:

data may be stored in jurisdictions which do not provide the same level of privacy and data protection as the European Economic Area;
they do not interact well with existing University storage services;
they do not provide sufficient guarantee of continued availability;
extra precautions must be taken in order to ensure more than one person at the University has access to the data, in case of researchers leaving the University.

Such solutions should therefore be avoided for sensitive data. If you are considering using external storage providers nevertheless, perhaps because of conditions imposed by external collaborators, only consider those which will allow you to take the following security measures:

Encrypt the data in transit between your local system and the external storage, for example by using protocols such as HTTPS or SFTP.
Encrypt the data stored remotely.
Store the data only in data centres operating in jurisdictions which provide the same level of privacy and data protection as the European Economic Area (external website), or that are contractually bound by the EU Model Clauses.

Securing computer equipment

Even if the data are stored securely, there is a risk that unauthorised persons might access the data using the credentials and equipment of authorised users. There are steps that can be taken to mitigate this risk:

Encrypt the hard drives of any laptops or other portable equipment used for accessing the data.
Ensure that desktop computers are locked with a password when left unattended.
Take reasonable precautions when entering passwords so that others cannot observe what is entered.

For more information about securing computer equipment, please contact Project Leader: IT Security (intranet link log in to access).

More information

University Data Protection guidance (intranet link log in to access)

Transmission over standard HTTP or email is not secure, and may be intercepted and read by third parties. Extra precautions need to be taken when transferring sensitive data between collaborators:

Email can be made more secure by putting the sensitive data in an encrypted attachment. The encryption password should be transferred by other means.
Alternatively, the entire content of email can be made secure by encrypting it with a system such as PGP. If you wish to set this up for your University email account, please contact the University IT Security Manager. (intranet link log in to access)
Data can also be transferred on removable media, such as an external hard drive, by a secure courier. The courier to be used should be agreed on and trusted by both parties. The data should be encrypted on the drive and the password sent separately.

You should ensure that you dispose of sensitive data securely. For example, if you have collected personal data, you should ensure that your methods of disposal provide adequate protection for the identity of participants.

Furthermore, you might be required to demonstrate that you have complied with any requirements to destroy third-party data in accordance with their terms of use.

Digital data
Removal of old IT equipment should always be arranged via CCSS and the Dell Managed Service. It must never be handed onto staff for their personal use or disposed of in any other way without the express permission of CCSS Desktop Devices / Logistics Manager.

Please use the links below for more information:

Information on IT equipment for staff (intranet link, log in to access)
Confidential waste secure facilities (intranet link, log in to access)

Non-digital data
Paper-based sensitive data can be disposed of using the University's confidential waste secure facilities which are provided on all campuses for the disposal of confidential information in line with BS EN 15713:2009

More information

Data disposal – UK Data Archive

Describing Data

An important but sometimes neglected step in generating research data is writing documentation to accompany it. First and foremost this documentation will be useful to you when you come to write up your results, especially if this will be some time later, and should you wish to revisit the data in a future project. The documentation will also be vital for anyone else coming to validate your findings, evaluate your data, or build on your work.

When documenting your data, the aim is to provide enough information so that a fellow researcher who is familiar with your field, but not necessarily your work within it, should be able to understand the data, interpret them correctly, and use them in new research. You may find it helpful to consider what you would need to know in order to use someone else's data in your research. Typically this will include the method used to collect the data and how they have been recorded, structured, processed or manipulated. You may also need to provide some broader context to explain the motivation for the design decisions you have taken and the significance of what you found.

More specifically, you may need to include some of the following elements:

details of the equipment used, such as the make and model of the instrument, the settings used, information on how it was calibrated;
the text of survey instruments used, including questionnaires and interview templates;
details of who collected the data and when;
citations for any third-party data you have used;
key features of the methodology, such as the sampling technique, whether the experiment was blinded, how sample groups were subdivided;
legal and ethical agreements relating to the data, such as consent forms, data licences, approval documents or COSHH forms;
details of the file formats and standard data structures used to record data and supporting information;
a glossary of column names and abbreviations used, explaining for example which measurement resulted in the given column and what units were used;
the codebook used to analyse and encode content;
the workflow used to process and manipulate data, including steps such as applying a statistical test or removing outliers;
details of the software used to generate or process the data, including version number and platform.

You may be recording some of this information in a lab notebook or research journal. If so, you may find it convenient to record the corresponding page numbers alongside the data files until you have an opportunity to transfer the information into a documentation file.

Depending on the context there are several places where the documentation can be placed:

Within the data file: Some file formats can record information in addition to the main data content. For example, the Observations and Measurements XML standard provides a way of recording sampling strategies and observation procedures as well as measurement values.
In a separate metadata file: Some disciplines have developed special file formats or data structures for recording supporting information. For example, the Agricultural Metadata Element Set (AgMES) provides a way of describing an agricultural dataset using the subject-predicate-object structure of the Resource Description Framework (RDF).
In a readme file: Any information that cannot be recorded in a structured way (i.e. as the values of fields in a data or metadata file) can be recorded as free text within a readme file.
In a published journal article: Some of the information needed to understand data would normally be provided in a journal article reporting the research. In order to prevent duplication of effort, it is possible to refer to an article to provide more information about a dataset, but before doing so you should be sure that (a) the article provides sufficient detail, and (b) the article will be available on open access.

Readme files
A readme file is a plain text file that is named 'readme' to encourage users to read it before looking at the remainder of the content. It can contain documentation directly or instruct the reader where to look to find more information. Even though it is free text, the file should be structured into sections as an aid to the reader. The following are suggestions for what to include:

Methodology. Describe how you collected your original data. If referring to a published article, this could simply be a statement such as, 'Full details of the methods used to create the dataset are provided in,' followed by the reference. Be sure to include a direct link to an open access copy as well as the DOI of the article. Otherwise, sufficient information should be given to enable another researcher to recreate the dataset or create a comparable one. Avoid reproducing the text of a published article verbatim if you have not retained copyright.
Third-party inputs. If you used third-party data, provide a data citation or, if they are not available from a repository, describe how you accessed them
Pre-processing. If you processed raw data you collected yourself, describe how you prepared the raw data for processing (e.g. file or folder naming conventions). If you performed any pre-processing steps on third-party data (e.g. data cleaning, reformatting), give details here.
Workflow. Provide details of the steps you took to process the data. State the software, services or scripts you used, as well as where they can be found, how to install/invoke/run them, and any special settings they require.
Outputs. If your workflow generates auxiliary files as well as data files, explain which are which. Relate the outputs of your workflow to the data files you have or will submit for archiving.
File structure and conventions. Provide details of how to interpret your data files. For example, explain which measurement each column heading represents, the units of measurement used, and any abbreviations, coding or controlled vocabulary used.

Structured metadata
Social scientists often package their data and metadata together using DDI or, if the data are strongly statistical in nature, SDMX.
Many types of biological and biomedical investigation have a corresponding Minimum Information standard, setting out what information would be needed to interpret the data unambiguously and reproduce the experiment.
Geospatial data are usually packaged in a format that complies with the standard ISO 19115. There are many profiles of this standard aimed at different communities; UK researchers are encouraged to use UK GEMINI, which is in turn compliant with the European INSPIRE Directive.
Some subject-specific data archives ask for data to be submitted in a particular format. For example, the NCBI Gene Expression Omnibus specifies a metadata set to be submitted along with data, and has developed the spreadsheet-based GEOarchive format for capturing it.
Disciplinary Metadata – Digital Curation Centre (external website)
BioSharing Standards Registry (external website - now called Fairsharing), aimed at the life sciences
Community Inventory of EarthCube Resources for Geosciences Interoperability (external website), aimed at geoscience
GEOSS Standards and Interoperability Registry – IEEE (external website), aimed at Earth observation
Content Standard References – Marine Metadata Interoperability (external website), aimed at marine science
Vocabularies
The NERC Vocabulary Server provides access to many different vocabularies in use in geoscience and oceanography.
The Open Knowledge Foundation runs the Linked Open Vocabularies service, which provides access to many different vocabularies that are suitable for use in Resource Description Framework (RDF) applications.

Planning a Project

There are lots of decisions to make before you start to create your data. Making these choices early on in your project can save you time and effort later, and your decisions will affect how you can use, share and publish your data.

Many funders now expect you to show you've engaged in data planning. You can do this by writing a data management plan, which is a document describing how your data will be handled both during a research project and after the project has ended.

Data management plans

If you are applying for research funding you may be required to submit a data management plan as part of your grant application.

There are a number of templates and tools available to help you write your data management plan. You can also ask for your data management plan to be reviewed before it's finalised. A review can ensure that your plans are suitable for the type of data you'll be creating and check that you'll comply with policies and legislation relevant to your project.

Other common names for a Data Management Plan include:

Data Management and Sharing Plan
Access and Data Management Plan
Statement on Data Sharing
Data Access Plan
Technical Plan
DMP

Throughout these web pages, we use the term "Data Management Plan" to cover all of these and similar documents.

Data security questionnaires

A number of funding bodies require that data security questionnaires are completed for the projects that they fund. The IT Security Manager can advise on technical aspects of data security for these questionnaires.

Funder requirements

Many funding bodies now require data management plans to be submitted as part of grant applications, although the format and content of these plans can differ between funders.

The University of Bath has an excellent summary of funder policies which includes information on the requirements for data management plans for different funders.

The University of Bath also has excellent guidance on funder expectations for data management plans, as does the London School of Hygiene and Tropical Medicine (funder requirements for data management and sharing).

Help and Information

If you would like individual help with writing a data management plan, please contact the Research Data Management Support service at [email protected]

For more information about securing computer equipment, please contact the Project Leader: IT Security. (intranet link log in to access)

More information

How to Develop a Data Management and Sharing Plan (Digital Curation Centre) [pdf version]
Checklist for a data management plan (Digital Curation Centre)
DMPonline data management planning tool (Digital Curation Centre)
Data Management Plan FAQs (Digital Curation Centre)

Writing a data management plan typically involves answering a series of questions about how you plan to create, describe, secure, retain and share your data.

Your plan should be concise and appropriate to the nature of your research, with more detailed plans for larger projects. You should justify the decisions you make and be prepared to implement your plan. You can also update your plan once your project has started to reflect changes in your research.

Because of the diversity of research, there is no single correct answer to what a data management plan should cover. However, a good data management plan should typically address the following topics:

If you're re-using existing data, what licences or terms of use will you have to comply with?
How will new data build on and relate to existing data? Why were existing data unsuitable for re-use in your new project?
What types of new data will you create and in what format? Did you choose these formats because they are standards in your discipline, are linked to the software or equipment you will use, or are open file formats?
Can you estimate the size of the data you'll create? Will it be less than 500GB, around 1TB, or substantially more than 1TB? How many boxes might non-digital data fill?
What methods will you use to capture your data and how will these ensure that your data are high quality? Will you use standard protocols, include replicates or controls, or automate data capture?
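The size question above can be approached by measuring a pilot folder and scaling up. The following is a minimal Python sketch of that idea, not a prescribed method. The function names, the linear scaling and the 1.5TB upper bound for "around 1TB" are assumptions made for illustration.

```python
from pathlib import Path

GB = 1000 ** 3
TB = 1000 ** 4

# Assumption: "around 1TB" is read here as anything from 500GB up to 1.5TB.
AROUND_1TB_UPPER = 1.5 * TB


def total_size_bytes(folder):
    """Sum the sizes of all regular files under `folder`, including subfolders."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def projected_size(pilot_bytes, pilot_units, planned_units):
    """Scale a pilot measurement linearly (e.g. 10 pilot interviews to 1000 planned)."""
    return pilot_bytes / pilot_units * planned_units


def size_band(n_bytes):
    """Map a byte count onto the rough bands mentioned in the question."""
    if n_bytes < 500 * GB:
        return "less than 500GB"
    if n_bytes <= AROUND_1TB_UPPER:
        return "around 1TB"
    return "substantially more than 1TB"
```

For example, `size_band(projected_size(total_size_bytes("pilot_data"), 10, 1000))` would give a band for a project one hundred times the size of a hypothetical `pilot_data` folder. Linear scaling is only a first approximation, so it is worth revisiting the estimate once data collection is under way.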

What contextual information is needed for you or someone else to understand your data? Do you need to record methodologies, equipment settings or abbreviations used?
How will you capture contextual information? Will this be in a 'readme' text file to accompany the data, or will you embed metadata directly in file properties or headers?
Are there any standards that you will use? The Digital Curation Centre maintains a list of metadata standards for different disciplines.
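One way to capture contextual information consistently is to generate the 'readme' text file from a small set of fields. The sketch below is illustrative only. The function name, the field names and the layout are assumptions, and a discipline-specific metadata standard should take precedence where one exists.

```python
def build_readme(title, creator, methodology, equipment_settings, abbreviations):
    """Return plain-text readme content describing a dataset.

    `equipment_settings` and `abbreviations` are dictionaries mapping a name
    to its value or meaning. All field names here are hypothetical.
    """
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        "",
        "Methodology",
        methodology,
        "",
        "Equipment settings",
    ]
    lines += [f"- {name}: {value}" for name, value in equipment_settings.items()]
    lines += ["", "Abbreviations"]
    lines += [f"- {abbr}: {meaning}" for abbr, meaning in abbreviations.items()]
    return "\n".join(lines) + "\n"
```

The returned text can be saved alongside the data, for example as `README.txt` in the top-level folder of the dataset.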

Where will you store your data and how will you ensure that they are backed up? Will you use University-managed data storage (intranet link log in to view) or need to set up your own back-up procedures?
How will you secure your data? What methods will you use to restrict access to your sensitive data? Will you encrypt hardware when working off campus?
How will you protect your research participants? Will you obtain informed consent for data retention and sharing? How will you anonymise data to safeguard the privacy of your participants?

Which subsets of your data will you keep at the end of your project? Will you retain anonymised versions but destroy personal data and identification keys? Will you retain all of the raw data or is a processed version more suitable to preserve? Do you need to keep all intermediary files or would you only need to refer back to input files or a final version?
How will you prepare your data for long-term preservation? Are you able to convert your data to open file formats (UK Data Archive)? What contextual information do you need to retain so that your data remain understandable and usable?
Where will you archive your data to ensure that they are preserved and sustained for several years after your project ends? Will you submit your data to a specialist data repository/centre and if so, have you consulted them about your requirements?
How big will your final dataset be and will there be any costs associated with archiving it, such as data deposit charges?

Funder Expectations

Many funding bodies now require data management plans to be submitted as part of grant applications, although the format and content of these plans can differ between funders.

This summary of funder policies (from the University of Bath) includes information on the requirements for data management plans for different funders.

The Research and Knowledge Transfer Office and the Library Research Support Team can provide help with writing data management plans and can review them prior to submission with grant applications. Please let us know as early as possible. Contact us at [email protected].

More information

Overview of funders' data policies (Digital Curation Centre)
Funders' data policies: detailed overview (Digital Curation Centre)
Funder requirements for data management and sharing (London School of Hygiene and Tropical Medicine)

Policy

A Research Data Management Policy has been approved for the University. The policy sets out the University's expectations for the management and sharing of research data.

The policy is relevant to all researchers, including postgraduate students. Anyone undertaking or supporting research should ensure that they are familiar with it.

Many funding councils have policies on research data that expect data underpinning published research articles to be made as openly available as possible in a timely and responsible manner.

The policy will ensure that research data are managed in a way that considers the requirements of collaborators, funders, and research participants. The policy covers:

responsibilities
data management plans
data retention after project completion
publication of data and justifications for withholding access

The policy was approved by Academic Board in June 2015, having been previously approved by Achievement Committee in May 2015. It was developed by the Research Support Team within Library & Student Support (LSS) in collaboration with the Digital Curation Centre and involved a wide consultation process, with input from representatives of all groups on which the policy may impact.

The Middlesex University Research Data Management Policy, approved by Academic Board in June 2015, sets out expectations of researchers with regard to the management and sharing of research data.

The University's Research Data Policy is aligned with the Middlesex University Code of Practice for Research.

Many of the University's main funders also have policies on research data or data sharing, broadly aligned with the Research Councils UK Common Principles on Data Policy.

The University also has a number of other policies relevant to research data management including:

Data Protection (intranet link log in to access)
University Policy on Ethics and related Ethics Framework (intranet link log in to access)
Freedom of Information guidance
Intellectual Property Policy (intranet link log in to access), IP for researchers, IP for students
IT Security Policy
Privacy Notice for Research Participants

More Information:

For advice on compliance with University or funder policies on research data, please contact [email protected].

Sharing and Re-using Data

Sharing research results is an established academic practice, whether through publication or through more informal means with colleagues and collaborators. The increasing digitisation of research means that it has never been easier to share data on a more detailed level.

If you're setting out on a research project, it's worth checking whether there are already data available that you might be able to use. This may show up as part of a literature review, but there are a number of dedicated data archives and repositories which you should take a look at too.

There are a number of reasons why you might consider sharing your own research data:

Sharing of data supports research integrity by allowing the analysis to be easily verified
Shared data can be a source of new collaborations, as your work is more discoverable
Published articles whose underlying data are also published often receive more citations than those whose data are kept private
Published data can often be used in novel ways not expected by the original data creators, such as large-scale meta-analyses
Where shared data are reused this can be used by the originating researcher as evidence of impact, helping career progression
Many funding bodies require data from funded projects to be made publicly available where possible (e.g. RCUK policy on Access to Research Outputs).

Data Access Statement

Data access statements, also known as data availability statements, are used in publications to describe where data directly supporting the publication can be found and under what conditions they can be accessed.

Data access statements are required for all publications arising from publicly-funded research. They are a requirement of many funders' data policies and are a requirement of the RCUK Policy on Open Access (download the pdf, Section 3.3 ii). Inclusion of a data access statement is recommended for publications reporting other research.

Some funders have indicated that they now check for the inclusion of data access statements in publications that acknowledge their support. In particular, the requirement applies to all papers that acknowledge EPSRC funding with a publication date after 1 May 2015.

The aim of the data access statement is discoverability - the data referenced by the statement do not have to be openly available. There are many reasons why access to data should be restricted and if you are unsure about whether you should publish your data openly please contact [email protected] for advice.

Some journals now provide a separate section in articles for the data access statement. Where no such section exists, we suggest that you include the data access statement with the acknowledgement of funder support.

A formal data citation can also be included either with the main references or in a specified data citation section.

If you're unsure where to provide your data access statement please contact [email protected] for help.

The following are recommendations for what to include in the data access statement:

If data are openly available the name(s) of the data repositories should be provided, as well as any persistent identifiers or accession numbers for the dataset.
If there are justifiable legal or ethical reasons why your data cannot be made available, these should be included in the data access statement.
If the data themselves are not openly available, the data access statement should direct users to a permanent record that describes any access constraints or conditions that must be satisfied for access to be granted.
It is important that any links to the data are persistent. Digital Object Identifiers (DOIs) are a type of persistent identifier that many specialist data archives provide for datasets.
If you did not collect the research data yourself but instead used existing data obtained from another source, this source should be credited.

A simple direction to interested parties to contact the author would not normally be considered sufficient.

The data access statement should be included in submitted manuscripts, even if identifiers have not yet been issued. The statement should be updated to include any persistent identifiers or accession numbers as they become available, typically when the manuscript is accepted for publication.
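The recommendations above can be read as a small set of rules for assembling a statement. The Python sketch below shows one possible way to encode them. The function name, its parameters and the exact wording it produces are assumptions for illustration, not required text, and a real statement will usually need tailoring by hand.

```python
def data_access_statement(repository=None, identifier=None, restriction_reason=None):
    """Assemble a draft data access statement.

    repository:         name of the data repository, if data are openly available
    identifier:         persistent identifier or accession number, if already issued
    restriction_reason: short legal or ethical justification, if data are restricted

    Assumption: if neither a repository nor a restriction is given, the study
    is treated as having created no new data.
    """
    if restriction_reason:
        statement = f"Due to {restriction_reason}, supporting data cannot be made openly available."
        if identifier:
            # Point readers to a permanent record describing the access conditions.
            statement += (
                " Further information about the data and conditions for access"
                f" are available at {identifier}."
            )
        return statement
    if repository:
        # The identifier may be missing at submission and added on acceptance.
        where = f"{repository} at {identifier}" if identifier else repository
        return f"All data supporting this study are openly available from {where}."
    return "No new data were created during this study."
```

Calling the function without an identifier gives wording suitable for a submitted manuscript. Calling it again once the identifier has been issued gives the updated statement.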

Data access statements can also be combined with formal data citations, particularly where a publication is supported by multiple datasets archived in different locations. In this situation it may be more appropriate to cite each dataset separately, providing the persistent identifier in the citation, and direct users to the references from the data access statement. DataCite provides examples of data citations.

Please note that to prevent creating bias in metrics monitoring DOI resolutions, the URLs used in these examples are not genuine.

Depending on the nature of your data you may wish to combine information from different examples. Please contact [email protected] for help with structuring your data access statement.

Openly available data

"All data created during this research are openly available from [add in appropriate data archive, e.g. Figshare] at http://doi.org/10.15125/12345."

"All data supporting this study are provided as supplementary information accompanying this paper."

"All data are provided in full in the results section of this paper."

"Expression data are openly available from ArrayExpress (Accession E-MTAB-01234 at https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-01234/). Crystal structures are available from the Cambridge Crystallographic Data Centre (Identifier BATHRS) at http://doi.org/10.15125/010203. Microscopy images are openly available from Dryad at Home Page (doi.org)."

Citation of multiple datasets

"This publication is supported by multiple datasets, which are openly available at locations cited in the reference section."

Secondary analysis of existing data

"This study was a re-analysis of existing data that are publicly available from EMBL at http://doi.org/10.15125/12345. Further documentation about data processing is available from the University of Bath data archive at http://doi.org/10.15125/12345."

"The study brought together existing data obtained upon request and subject to licence restrictions from a number of different sources. Full details of how these data were obtained are given in the documentation available at Home Page (doi.org)."

Ethical restrictions

"Anonymised interview transcripts from participants who consented to data sharing, plus other supporting information, are available from the UK Data Service, subject to registration, at http://doi.org/10.15125/12345."

"Due to ethical concerns, supporting data cannot be made openly available. Further information about the data and conditions for access are available at http://doi.org/10.15125/12345."

"Due to the (commercially, politically, ethically) sensitive nature of the research, no interviewees consented to their data being retained or shared. Additional details relating to other aspects of the data are available from http://doi.org/10.15125/12345."

"Supporting data are available to bona fide researchers, subject to registration, from the UK Data Service at http://doi.org/10.15125/12345."

Commercial restrictions

"Supporting data will be available from http://doi.org/10.15125/12345 after a 6-month embargo from the date of publication to allow for commercialisation of research findings."

"Due to confidentiality agreements with research collaborators, supporting data can only be made available to bona fide researchers subject to a non-disclosure agreement. Details of the data and how to request access are available at: Home Page (doi.org)."

Non-digital data

"Non-digital data supporting this study are stored by the corresponding author at Middlesex University. Details of how to request access to these data are at Home Page (doi.org)."

No new data created

"No new data were created during this study."

Please see below for a number of useful links for more information to help you

Data Repositories and Archives

Digital data repositories, data archives or data centres accept, preserve and disseminate research data, often for a given community. Repositories may be organised by subject (e.g. structural chemistry data, gene sequence data, social science data) or by organisation such as a research funder.

Research data are typically submitted to the repository by the data creator or owner. The data repository then takes responsibility for preserving the data, managing any access restrictions and making information about the data (metadata) discoverable.

A growing number of data repositories and databases are available that archive research data from many subject areas. Unfortunately coverage of different disciplines varies - whilst the social sciences and biosciences are well supported, relatively few data repositories accept engineering data.

To help you find a suitable data repository a number of lists have been compiled:

re3data - Registry of Research Data Repositories
Biosharing

To archive the data created by projects they support, some funders either run data centres or provide lists of recommended data repositories:

UK Data Service - funded by the ESRC
NERC data centres
Wellcome Trust guidance on data repositories
BBSRC list of data sharing resources

A number of journals support the use of Dryad, Figshare and Zenodo for data underlying scientific and medical literature. Nature's Scientific Data journal also maintains a list of recommended data archives.

There are a number of things to consider when selecting a suitable repository to archive and publish your research data:

What type of data does the repository accept and what is its subject focus?
Does the data repository already have a good reputation in your field and is it recommended by your funder or journal?
Will the repository provide enough metadata to enable your data to be discovered and cited by other researchers?
Will the repository issue your data with a persistent identifier, such as a Digital Object Identifier (DOI) or an accession number, that you can include in your data access statement? A search for archives in re3data allows you to tick a box restricting results to those that provide persistent identifiers.
Are access restrictions or embargoes permitted? Will the archive ensure that confidential or personal data (intranet, log-in required) are secured if that is required?
Do the archive's terms and conditions fit with the University's Intellectual Property policy (intranet, log-in required)? For example, does the archive require that you assign any copyright in the data to the archive? We recommend avoiding using archives that require transfer of rights.
What licences are available and do they comply with the University's Research Data Management Policy?
Is the archive established and well funded so that you can rely on it still preserving your data in 10 years' time?

If you are considering using an external data archive and would like advice on its suitability, please contact [email protected].

Research data shared service

Jisc is funding a pilot service which will enable researchers to easily deposit data for publication, discovery, safe storage, long term archiving and preservation. This means that they are able to provide sustainable access to research data so it can be re-used.

Part of the project remit will be to make a business case for ongoing investment in the service.

Middlesex University is a pilot partner alongside a number of other UK universities.

Phase 1 in the project sees Middlesex implement and trial figshare for institutions - a data repository platform. Later phases will see the implementation of a preservation platform and integration work to enable interoperability.

For further information about the project see Jisc’s Research data shared service project pages as well as Jisc’s Research data blog.

If you have questions about our participation please get in touch via [email protected].

Or take a look at our Yammer group.

Restricting Access to Data

There are a number of reasons why you can justify withholding your research data. Withholding data means taking a decision not to openly publish them, even if there are obligations to funders or publishers to openly share the outputs of research.

If you receive a request for research data or information about your research under the Freedom of Information Act or a request for personal information under the Data Protection Act (login to the intranet to view) you should immediately refer the request to: Teresa Kelly - Data Protection Officer.

Research funders with data sharing policies typically require that research data are made as openly available as possible, recognising that there may be legal, ethical or commercial reasons why access to some data may need to be restricted. These restrictions typically apply at all stages of a project so that the research process is not damaged by inappropriate release of data.

Access to research data does not have to be completely open. If there are justified reasons why some research data cannot or should not be made openly available, it may still be possible to share subsets of your data either through use of consent and anonymisation, or by regulating and restricting access to specific users. This should always be considered in preference to a blanket restriction.

Even if data are identified as unsuitable for open access, other data management requirements will still apply. A data management plan should be used to identify and document reasons for withholding data and publications should still include a data access statement. This statement should include reasons why the data are not openly available and, if possible, conditions for access being granted.

The information below provides guidance to help you determine whether you may be justified in withholding your research data from publication. If you are unsure as to whether you could or should make your data openly available, please email the research data team before you publish your data.

Middlesex University recognises that information and the associated processes, systems and networks are valuable assets and that the management of personal data has important implications for individuals. Through its security policies, procedures and structures, the University will facilitate the secure and uninterrupted flow of information, both within the University and in external communications. The University believes that security is an integral part of the information sharing which is essential to academic and corporate endeavour and associated policies are in place to support information security measures throughout the University.

For further information on IT security get in touch with the Project Leader: IT Security (intranet link log in to access).

If you receive a request to access your research data or for information about your research under the Freedom of Information Act, please do not attempt to answer the request yourself but instead contact the Freedom of Information Officer. Please note that the examples below relate to funder policies on data sharing and do not necessarily constitute valid exemptions under the Freedom of Information Act 2000.

Access to data doesn't mean that everything has to either be openly available to the public or completely restricted. There are many types of access control that can be applied to all or part of your data, some of which will still allow you to share your data, but only with specific users under regulated conditions. This should always be considered in preference to a complete restriction on the whole dataset.

Access restrictions can apply to all types of research data and at any stage of a project. It is also possible that access restrictions can change over time. For example, it would be unusual to share any data during the active phase of a project before findings have been published, and continued access restrictions may be required to allow time for commercialisation. However, after the end of the project and once patent applications have been filed, it may be possible to release the data.

In some cases access restrictions might need to apply to an entire dataset. However, in many cases it is likely that restrictions need only apply to a subset of the data, such as raw data containing personal identifiers, allowing you to openly share the remaining data where access constraints do not apply.

In rare situations it might be necessary to restrict access to both the data and to metadata describing the data. If you think this might apply to your research, please email the research data team who will be able to advise on how best to comply with policy requirements for sharing data.

Types of access

The type of access that can be applied is not as simple as open or closed. Access to data can be restricted and regulated, allowing you more control over who is granted access to your data and when.

Publicly open access to data

Publicly open access to data would be suitable if there are no justifiable legal, ethical, contractual or commercial constraints on releasing data. This might apply only to final datasets supporting a publication, or to the completed outputs of a project, depending on your funder's requirements for data sharing.

A privileged period of exclusive access

A privileged period of exclusive access is permitted, allowing you time to analyse and publish the results of the data you have created or collected. During a project, and assuming no other constraints apply, you would only need to provide access to the subset of data supporting published findings. This would allow you time to continue to analyse and publish from your wider dataset. However, funder data policies differ on how long this period of exclusive use should last - you may be required to publish all final datasets within a defined period from either the end of project funding or the date of data collection.

Embargoes

Embargoes allow you to delay access to data, during which time access would be completely restricted. Embargoes can be used to ensure that your archived data are not published until the articles based upon them are accepted for publication or published. This might be a condition of publication for some journal publishers. Embargoes can also be used to delay access whilst patents are filed or while research is commercialised. The Policy, Compliance & Legal Support Officer in the Research and Knowledge Transfer Office (log-in to intranet to view) would be able to advise on how long this process might take and how long the embargo would need to be.
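Where an embargo is expressed as a number of months from a publication date, the release date can be worked out programmatically. The following Python sketch is one possible approach, assuming whole calendar months and clamping to the last day of shorter months. The function names are hypothetical, and the actual embargo terms set by a publisher or agreement always take precedence.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    If the target month is shorter (e.g. 31 August plus 6 months), the result
    is clamped to the last day of that month.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def is_under_embargo(publication_date, embargo_months, today):
    """True while `today` falls before the end of the embargo period."""
    return today < add_months(publication_date, embargo_months)
```

For instance, `add_months(date(2015, 5, 1), 6)` gives 1 November 2015 as the first day on which the data could be released under a 6-month embargo.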

Registered access

Registered access is provided by some data archives, which require potential users to register before they are able to access data files. Registered access allows the data archive to monitor who accesses data, enabling reminders about conditions of use to be issued.

Access upon request

Access upon request might be required for some types of confidential or sensitive data. In order to manage this type of access a named contact is required for the dataset who would be responsible for making decisions about whether access is granted.

Non-disclosure agreements

Non-disclosure agreements can be used to share confidential or sensitive data with specific individuals for specific purposes and under specific terms. Contact the Policy, Compliance & Legal Support Officer in the Research and Knowledge Transfer Office (log-in to intranet to view) if you require a non-disclosure agreement for your data.

More information

Funders' data policies
Data Protection guidance (log-in to intranet to view)
University Ethics Framework (University Ethics Committee) (log-in to intranet to view)
Research and Knowledge Transfer Office (log-in to intranet to view)

Personal data

Research data involving human subjects must be handled in accordance with the Data Protection Act 1998. The confidentiality of participants must be maintained and personal data should not be made available to any third party without the explicit, written, informed consent of the person to whom it relates. If you are unsure about how data protection might apply to your research data, there is data protection guidance (log-in to intranet to view) available. You can also contact the data protection officer: Teresa Kelly for advice and help with wording consent statements.

Anonymised data should remove both direct and indirect identifiers, so that variables cannot be combined to reveal an individual's identity. This would apply not only to personal data but also to research data about organisations and businesses.
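A simple way to check whether indirect identifiers could be combined to single out an individual is to count how many records share each combination of values. The Python sketch below illustrates this with a k-anonymity-style check. This technique is brought in here for illustration and is not prescribed by this guidance. The function names, field names and threshold are assumptions, and passing the check does not by itself show that a dataset is anonymous.

```python
from collections import Counter


def strip_direct_identifiers(records, direct_identifiers):
    """Return copies of the records without the named direct identifier fields."""
    return [
        {field: value for field, value in record.items() if field not in direct_identifiers}
        for record in records
    ]


def risky_combinations(records, indirect_identifiers, k=3):
    """Return combinations of indirect identifier values shared by fewer than k records.

    Each returned combination describes a small group whose members might be
    identifiable, so those values may need aggregating or redacting.
    """
    counts = Counter(
        tuple(record[field] for field in indirect_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}
```

If `risky_combinations` returns anything, options include widening the bands (for example, broader age ranges or larger geographic areas) and running the check again.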

Although consent for sharing only relates to research data containing personal data, it is best practice to seek written informed consent from research participants to keep and openly share anonymised data after the project ends. Some external data archives will not accept anonymised data unless informed consent for data retention and publication was obtained.

It is possible to share only a subset of the data, where participants granted consent both to participate in the study and for their data to be retained and shared. This should be made clear in the data access statement and in any accompanying documentation so that potential users understand that the data can be re-used for new analyses, but not for validation of the original findings.

Other ethical considerations

There may be other ethical reasons why data should not be made openly available. Inappropriate release of some types of data might put research participants, the public or vulnerable groups at risk. For example:

Domestic energy usage data could be used to determine occupancy patterns in participants' homes.
Disease statistics might require anonymisation to avoid them being used to identify the location of villages in a war zone.
Spatial data that would reveal the location of an endangered species can be justifiably withheld to protect the species from poachers. This would also apply to the location of rare fossil specimens.

In these or similar situations it may still be possible to share other information from the dataset, in which case it should be made clear to future users which variables have been redacted, aggregated or anonymised in the dataset and why.

Please email the research data team for advice if you think that there may be legal or ethical reasons why your data should be withheld.

More information

Data Protection guidance (log-in to intranet to access)
Research ethics (log-in to intranet to access)
UK Data Service guidance on legal and ethical issues - includes consent and anonymisation
Freedom of Information guidance

Intellectual property rights

Before you can openly share research data you need to ensure that you have the right to do so, including ownership rights and conditions of use.

The University's intellectual property policies and guidelines are intended to address both the rights and property aspects of IP generated within the institution.

See Intellectual Property (log-in to intranet to view) for further information.

Where research is funded by an external partner, or where an external partner makes a contribution to a project, the partner may be awarded Intellectual Property Rights in the results, including the research data. This usually means that the results must be kept confidential by the University and only released under a publication protocol. It is recommended that all collaboration agreements should address the basis on which research data will be stored, accessed and published.

Third party data

If you have re-used any existing datasets that you have obtained from third parties then you need to ensure that you understand and comply with any terms under which the data may be used and shared. These data might include datasets you have downloaded from online repositories or databases, or research data shared by project collaborators.

In some situations, typically for software, you may be required to share any modifications under the same licence as the original data. There are a number of licences with this requirement and common examples include the GNU General Public Licences and the Creative Commons ShareAlike licences.

If you are not permitted to re-share or distribute the third party data directly, your documentation and data access statement should, as far as possible, provide details of where the study data were obtained so that other researchers can obtain or request access to the same data. This is particularly important for publicly available data.

See Copyright (log-in to the intranet to view) for further information.

Confidentiality

If your external research partners have provided you with any research data, or if you have collaboratively created new data, conditions for how these data may be used, retained and shared should be set out in any contracts or collaboration agreements covering the research. Please contact the Research and Knowledge Transfer Office (log-in to intranet to view) for further advice.

If your existing research-related agreements include any confidentiality clauses, or if the University owes your external partners any obligations of confidentiality in respect of certain research data, you must ensure that data publication would not breach these.

The need to comply with confidentiality clauses and contractual obligations is a valid justification for withholding research data. However, it may be possible to share such data with other researchers, subject to non-disclosure agreements. The Research and Knowledge Transfer Office (log-in to intranet to view) should be consulted if you need to set up any research-related agreements.

It is recommended that future research-related agreements ensure that publicly funded research involving third parties is planned and executed in such a way that published findings can be scrutinised and, if necessary, validated by others.

More information

Research and Knowledge Transfer Office (log-in to intranet to view)
Digital Curation Centre FAQ on open source software (external website)
Digital Curation Centre guidance on How-to Licence Research Data (external website)

We acknowledge the work of the University of Southampton (and the University of Bath) in the development of this guidance.

Commercially sensitive data

Access to research data can be restricted to protect commercially sensitive information, whether newly created or provided by commercial partners. Such data might be provided by a third party under the terms of a collaboration agreement, for use only within a specific research project.

Commercially sensitive information might also include data obtained in interviews with participants employed by external organisations. Written, informed consent would be required before such commercially sensitive data could be made openly available.

In some rare situations it may not be appropriate to seek consent for sharing commercially sensitive data obtained from commercial partners or participants. However, every effort should be made to maximise the potential for data sharing. Please contact the research data team for advice if you think seeking consent would jeopardise your research.

Depending on the terms of your research agreement, instead of completely restricting access it might be possible to make commercially sensitive data available only to certain users, such as bona fide researchers in a research organisation, and for a certain purpose, such as to verify and comment on a publication. In such cases a Non-Disclosure Agreement will be required, which can be set up by the Research and Knowledge Transfer Office (log-in to intranet to view).

Commercialisation

In some situations data may not be commercially sensitive but might have commercial potential. These data might eventually be suitable for sharing, but public access can be justifiably delayed to allow time to assess and protect the commercial potential of research. This might involve the use of embargoes while patent applications are filed.

Many project funders encourage the protection of intellectual property rights arising from the research they are funding, but individual funder policies should be consulted before using this access restriction. In addition, not all publishers accept access restrictions intended to protect patent applications; PLOS's data policy is one example. Individual journal requirements should be checked to avoid manuscript rejection.

If you think your research might have commercial value, contact the Research and Knowledge Transfer Office (log-in to intranet to view) for advice on commercialisation before you publish the data.

More information

EPSRC clarification and guidance on policy expectations - see expectation VI

Training and Support

If you can't find what you need please do get in touch with us at [email protected]

We are in the process of developing a programme of training – if you have a particular training need please do get in touch.

There is also some useful material in MANTRA – an online training course developed by the University of Edinburgh (updated August 2015).

Subject specific guidance

Alternatively, some subject-specific guidance booklets have been developed by external experts:

Managing and Sharing Data - UK Data Archive guidance for social scientists
A Guide to Data Management in Ecology and Evolution - British Ecological Society
Depositing Shareable Survey Data - UK Data Service guidance on social survey data
How-to guides - Digital Curation Centre

More information

Contacts/Key People

Sarah Stewart, Research Information Manager (LSS)
Nick Balstone, Research & Knowledge Transfer Office (RKTO)
Bilal Hashmi, Infrastructure Manager (CCSS)
Teresa Kelly, University Data Protection Officer
John Gilchrist, Information Governance Officer (EXC)
Tracey Cockerton, Chair of University Ethics Committee
