Category: Data Repositories

Are data repositories the future? An eCRUSADers conversation with DataLoch

In this post, we will be discussing a very exciting project called DataLoch. DataLoch is a repository of linked health and social care administrative data sets from Edinburgh and the South East of Scotland. It was established in 2019 under the Data Driven Innovation (DDI) programme, funded by the Edinburgh and South East Scotland City Region Deal. The programme is led by Professor Nick Mills at the University of Edinburgh, but it is a collaborative partnership between NHS Lothian, Borders and Fife, Health and Social Care Partnerships, the University of Edinburgh, patients and the public. The ambition is that DataLoch will help to solve some of the pressing health and social care challenges facing Scotland.

At present, the data linkage infrastructure and governance in Scotland are set up so that bespoke linkages are created for specific research projects and then destroyed at the end of each project's lifecycle. In contrast, within a research-ready data repository like DataLoch, the permissions to access and link data persist and expand over time rather than being destroyed. Such persistent curation makes creating data assets more efficient and greatly reduces the time required to provide data for projects. As is currently the case for bespoke linkages, all applications to DataLoch will go through an approvals process to ensure that the data are processed in accordance with the required data protection principles, maintaining public trust and the privacy of individuals' data.

I was very excited when the DataLoch team said they would be happy to talk to eCRUSADers about their work because, from my own experience and from listening to the experiences of others, it appears that research-ready data repositories may address some of the challenges that we eCRUSADers face. So, if you are interested in finding out how such repositories might help, or if you are a researcher hoping to work with linked health and social care data in Scotland: read on!

Solving the current challenges around conducting research with administrative data

Q: As eCRUSADers, one of the main challenges we face is that we have limited time. This makes accessing administrative records for research tricky because in many cases this can take a long time (see some of our Researcher Experience posts for examples). How will DataLoch help solve this?

“DataLoch’s governance model is agile and built on agreements with our data controllers that follow a model of precedence. This means our approval process becomes quicker with every project, so researchers no longer have to experience long waiting times, while the process remains robust and in line with GDPR. Research will be reviewed through a Caldicott panel and the Safe Haven delegated ethics panel. Whilst there are no fixed timescales, as each project is unique, our average turnaround (from application to data delivery) is currently 3 months.”

Q: Another challenge Early Career Researchers face is the exploratory nature of their research. For example, in a PhD project, the researcher might set off with one research question in mind but, after understanding the data better, realise that there is a more useful, relevant or beneficial research question they could answer. Will DataLoch allow for flexibility around the research proposal?

“There are two ways we can enable flexibility for exploratory research. Firstly, we do allow applications that are explicit about their exploratory nature. Applicants would then need to clarify the specific nature of the research once the exploration phase is complete. Secondly, a researcher could submit an explicit proposal; if that proposal later changes to accommodate a new research interest, a second research application would need to be approved.”

Work with patients and the public

Q: Of course, any use of patient and public data in research should include their views and input throughout the process. Have you carried out any work with patients and the public to find out how they feel about storing their health and care data indefinitely?

“DataLoch have a Public Reference Group, which meets regularly and is part of our governance structure. The Public Reference Group have been extremely supportive of the DataLoch project and actually expressed some surprise that health data was not already more routinely linked. They are especially interested in ensuring that DataLoch’s work is transparent and easy to understand, and that the data is being used in areas of public priority.

We would welcome volunteers to join this group; please contact us if you’re interested. We recognise that public perception does change over time and have a learning loop built into our structure that allows us to be sensitive to public opinion.”

Looking forward and reflecting back

Q: The Scottish Government’s initiative, Research Data Scotland, is trying to set up a similar model but on a Scotland wide level. Do you envisage DataLoch extending to other parts of Scotland in the future?

“Research Data Scotland have a fantastic national ambition and we would love to work with them as this idea develops. One of the points of difference for DataLoch is that we work at a regional level, collaborating closely with local clinicians to understand local context. This would be difficult to scale at a national level, though there are interesting federation models that we would like to explore. Equally, the granularity of the data we provide would be difficult to scale at a national level. It is also worth noting that our rich regional-level data can contribute to answering some critical nationally relevant questions.”

Q: What have been the biggest challenges you have faced when setting up DataLoch?

“Our greatest challenge has become a significant point of difference for us: we have created a virtual team that includes NHS clinicians, analysts and technical support as well as University of Edinburgh expertise. Working across boundaries has been tough, but it has ultimately created something unique and agile that brings together expertise from both the NHS and academia.”

Practical questions around using DataLoch data

Q: Where will researchers access the DataLoch data? Will remote access be an option? Safe settings are few and far between!

“Our preference is that researchers access the data via the National Safe Haven which can be done remotely.”

Q: Where would researchers find the data dictionary for the data sets and variables included in DataLoch?

“The latest version is available on the DataLoch website. This dictionary corresponds to the current COVID-19 data set. Note that the website is under construction and some pages may move.”

Q: Is the plan for DataLoch to continually update the data so that it contains the most recent data? Once again, this improves the reproducibility of results and research transparency.

“Yes. DataLoch will continually update the data according to data source lag times, and we will manage version control to enable reproducibility of research as needed. We recommend that researchers save their own code and statistical analyses.”

Q: When can researchers hope to apply for and access DataLoch?

“You can do so today. Our current dataset is focused on COVID-19, but please do register for our DataLoch release newsflash via our website to receive updates.”