
Working with administrative data in Scotland: A round up of researcher experiences

We’ve hit n = 5 in terms of eCRUSADer Researcher Experience posts! It’s not quite there in terms of a sample size for claiming any statistically significant findings but I thought that it was about time we took stock of them to see if there were any common themes emerging. So, that’s what this post will briefly do.

First, the key challenges that our researchers faced are outlined. Next, you’ll see some direct quotes taken from the posts, in particular lots of positive messages about carrying out research with administrative data in Scotland. Finally (and hopefully most usefully for you), there is a list of some ‘Top-Tips’ so that your administrative data journey runs as smoothly as possible!

Key Challenges

There were some common challenges that popped up throughout the five researcher experience posts which I try to summarise below.

    • Timing Timing Timing!!!

This was (as expected) a clear theme that emerged in each of the researcher experience posts, in particular the time taken between Public Benefit and Privacy Panel (PBPP) approval and data access.

    • Administrative datasets can be messy…

They aren’t made available to you in a ‘research ready’ format (even though a huge amount of work will have gone on behind the scenes to get them ready) and they don’t come with clearly defined data dictionaries.

    • ECR short term contracts

The nature of early career researcher (ECR) work means that we are often on short-term contracts. Together with the issues around timing, this can have knock-on consequences for our career trajectories if we don’t get access to the data in time.

Key Messages

But it’s not all bad! Although there are real challenges involved in accessing and working with administrative data, each of the researchers we have heard from has agreed on the massive potential for administrative data in research that ultimately aims to improve outcomes for society. Here’s what they had to say:

“Administrative data is an extremely powerful tool that will help you to answer the largest and most difficult questions faced by society”

“There is huge potential to use routine data to improve the way we do clinical trials and ultimately to improve outcomes for patients.”

“The ability to gain new insights from previously unseen data is something that should excite any researcher.”

“It can be a difficult and frustrating area to work in, but there are big potential payoffs, including large sample sizes and long-term follow-up, sometimes across many decades.”

“Working with administrative data is like learning to tame a dragon—albeit challenging, it is also exciting and rewarding!”

Top Tips and Solutions

    • Consider the time it can take to access the data and plan for this as far as possible

This is one of the issues that we are trying to shed some light on by putting together these researcher experience posts. It is rather tricky, not least because every project is different and has differing levels of complexity. However, there are some parts of the data acquisition process that are easier to plan for in terms of the time they will take. In particular, preparation of your PBPP application will probably take around 3-6 months, and the time from submitting your application to approval is usually around 1-2 months. Knowing these timings means you can put them into funding applications etc. The harder bit is knowing how long it will take to get access to the data, and we have heard from our researchers here that this can take up to three years! To try and understand how long things will take, it is well worth talking to your eDRIS coordinator about how frequently the datasets you have requested are linked for other projects. There may be some datasets that are harder to link than others or that have never been linked before. See if you can find any researchers who have previously worked with similar linked datasets and speak to them. They might have some good advice!

    • Have a plan B (and C!)

Unexpected things can (and probably will) crop up during your administrative data journey. And the longer things take, the more likely these unexpected events are to occur. The best thing you can do is have a backup plan. Better yet, have several! This may mean using publicly available data, or settling for a subset of the datasets you have requested if there are particular hold-ups with a specific dataset.

    • Prepare as much as you can before getting access to the data

There is actually a huge amount you can do whilst you are waiting for access to the data. You will still have to do a lot of data cleaning when you get access, so one thing you can do is try and get as familiar as possible with the variables in the datasets you have requested. One idea might be to prepare a data dictionary (which includes codes) that you can ask to be transferred into the safe haven for when you begin analysis. You can also prepare some code for cleaning the data to some extent, for example code to attach variable labels and value labels. You should also make sure you have done the relevant training (see the training section of the website for some useful links).
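To give a flavour of what that preparation could look like, here is a minimal Python sketch of a data dictionary and a helper that swaps coded values for their labels. Everything in it is made up for illustration: the variable names, the codes and the function names (`apply_value_labels`, `variable_labels`) are assumptions, not taken from any real dataset or from the posts, so you would replace them with the codes documented for the datasets you have actually requested.

```python
# Hypothetical data dictionary: variable name -> description and value codes.
# The variables and codes below are invented for illustration only.
DATA_DICTIONARY = {
    "sex": {
        "label": "Sex of patient",
        "values": {1: "Male", 2: "Female", 9: "Not known"},
    },
    "admission_type": {
        "label": "Type of admission",
        "values": {1: "Elective", 2: "Emergency"},
    },
}


def apply_value_labels(rows, dictionary):
    """Return copies of the rows with coded values replaced by their labels.

    Codes that are missing from the dictionary are left unchanged so that
    they stand out and can be checked by hand.
    """
    labelled = []
    for row in rows:
        new_row = {}
        for variable, value in row.items():
            codes = dictionary.get(variable, {}).get("values", {})
            new_row[variable] = codes.get(value, value)
        labelled.append(new_row)
    return labelled


def variable_labels(dictionary):
    """Map each variable name to its human-readable description."""
    return {name: entry["label"] for name, entry in dictionary.items()}


# Made-up example records, standing in for rows of an extract.
example_rows = [
    {"sex": 1, "admission_type": 2},
    {"sex": 2, "admission_type": 7},  # 7 is not in the dictionary
]

labelled_rows = apply_value_labels(example_rows, DATA_DICTIONARY)
for row in labelled_rows:
    print(row)
```

Leaving unknown codes untouched, rather than dropping them or raising an error, is a deliberate choice here: unexpected codes are common in administrative data and you will probably want to see them before deciding what to do.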

    • Acknowledge the limitations of administrative data

It is important to remember that administrative data has not been collected with research in mind. This can often mean that it won’t contain all of the information you need to carry out the ‘perfect analysis’. What is important is that you are able to answer your research question with the administrative data, so be realistic. In some cases, it might be that your question would be better answered using survey data, for example.

    • Invest in the relationships with the key people involved in the data access pipeline

Get to know the people who are assisting you with data access and speak to people who have knowledge of the datasets you are requesting. Also, do both of these early on!

I hope this was useful in giving a summary of the researcher perspective of accessing and using administrative data in Scotland. It occurs to me that I haven’t yet contributed my own Researcher Experience post. I should say that overall my experience has been largely similar to those we have heard about. I gave a talk on my experience recently at a useMYdata event (if you haven’t heard of them then do check out the great work they are doing!). You can find my slides and the recording from the event here. The event was particularly focused around the researcher’s journey in accessing routine health data and how patients themselves are (or can be) involved throughout the process.

Are data repositories the future? An eCRUSADers conversation with DataLoch

In this post, we will be discussing a very exciting project called DataLoch. DataLoch is a repository of linked health and social care administrative data sets from Edinburgh and the South East of Scotland. It was established in 2019 under the Data Driven Innovation (DDI) programme funded by the Edinburgh and South East Scotland City Region Deal. The programme is led by Professor Nick Mills at the University of Edinburgh and is a collaborative partnership between NHS Lothian, Borders and Fife, Health and Social Care Partnerships, the University of Edinburgh, patients and the public. The ambition for the DataLoch repository is that it will help to solve some of the pressing health and social care challenges being faced in Scotland.

At present, the data linkage infrastructure and governance in Scotland is set up in such a way that bespoke linkages are created for specific research projects and are then destroyed at the end of each project’s lifecycle. In contrast, within a research ready data repository like DataLoch, the access permissions to use and link data persist and expand over time without destruction. Such persistent curation increases the efficiency in creating data assets and greatly reduces the time required to provide data for projects. As is currently the case for bespoke linkages, all applications to DataLoch will go through an approvals process to ensure that the processing of the data is done in accordance with the required data protection principles, maintaining public trust and privacy of individuals’ data.

I was very excited when the DataLoch team said they would be happy to talk to eCRUSADers about their work because from my own experience, and from listening to the experiences of others, it appears that research ready data repositories may address some of the challenges that we eCRUSADers face. So, if you are interested in finding out about how such repositories might help, or if you are a researcher hoping to work with linked health and social care data in Scotland: read on!

Q: How will DataLoch help solve the problems researchers face around timing for access to data?

Q: Will DataLoch allow for flexibility in research proposals?

Q: Have you carried out any work with patients and the public to find out how they feel about storing their health and care data indefinitely?

Q: Do you envisage DataLoch extending to other parts of Scotland in the future?

Q: What have been the biggest challenges you have faced when setting up DataLoch?

Q: Where will researchers access the DataLoch data? Will remote access be an option?

Q: Where would researchers find the data dictionary for the data sets and variables included in the DataLoch?

Q: Is the plan for DataLoch to continually update the data to contain the most recent data?

Q: When can researchers hope to apply for and access the DataLoch?

Solving the current challenges around conducting research with administrative data

Q: As eCRUSADers, one of the main challenges we face is that we have limited time. This makes accessing administrative records for research tricky because in many cases this can take a long time (see some of our Researcher Experience posts for examples). How will DataLoch help solve this?

“DataLoch’s governance model is agile and based on agreements with our data controllers based on a model of precedence. This means with every project our approval process is quicker – meaning researchers no longer have to experience long waiting times – while remaining robust and in line with GDPR. Research will be reviewed through a Caldicott panel and the Safe Haven delegated ethics panel. Whilst there are no fixed timescales as each project is unique, our average turnaround (from application to data delivery) is currently 3 months.”

Q: One of the other challenges Early Career Researchers face is the exploratory nature of their research. For example, in a PhD project, the researcher might set off with one research question in mind but after understanding the data more, they realise that there might be a more useful/relevant/beneficial research question they can answer. Will DataLoch allow for flexibility around the research proposal?

“There are two ways we can enable flexibility for exploratory research. Firstly, we do allow applications that are explicit about their exploratory nature. Applicants would then need to clarify the specific research nature when the exploration phase is complete. The second route a researcher could follow would be via an explicit proposal which, if it changes to accommodate a new research interest, would need a second research application to be approved.”

Work with patients and the public

Q: Of course, any use of patient and public data in research should include their views and input throughout the process. Have you carried out any work with patients and the public to find out how they feel about storing their health and care data indefinitely?

“DataLoch have a public reference group which meets regularly and is part of our governance structure. The Public Reference Group have been extremely supportive of the DataLoch project and actually expressed some surprise that health data was not already more routinely linked. They are especially interested in ensuring easy to understand transparency of DataLoch’s work and that the data is being used in areas of public priority.

We would welcome volunteers to join this group, please contact us if you’re interested. We recognise that public perception does change over time and have a learning loop built into our structure that allows us to be sensitive to public opinion.”

Looking forward and reflecting back

Q: The Scottish Government’s initiative, Research Data Scotland, is trying to set up a similar model but on a Scotland wide level. Do you envisage DataLoch extending to other parts of Scotland in the future?

“Research Data Scotland have a fantastic national ambition and we would love to work with them as this idea develops. One of the points of difference for DataLoch is that we work at a regional level working closely with local clinicians to understand local context. This would be difficult to scale at a national level though there are interesting federation models that we would like to explore. Equally the granularity of the data we provide would be difficult to scale at a national level. It is also worth noting that our rich regional level data can contribute to answering some critical nationally-relevant questions.”

Q: What have been the biggest challenges you have faced when setting up DataLoch?

“Our greatest challenge has become a significant point of difference for us, we have created a virtual team that includes NHS clinicians, analysts and technical support as well as University of Edinburgh expertise. Working across boundaries has been tough but ultimately has created something unique and agile gaining the benefits that come with the expertise from both NHS and academia.”

Practical questions around using DataLoch data

Q: Where will researchers access the DataLoch data? Will remote access be an option? Safe settings are few and far between!

“Our preference is that researchers access the data via the National Safe Haven which can be done remotely.”

Q: Where would researchers find the data dictionary for the data sets and variables included in the DataLoch?

“The latest version is available on the website here. This dictionary corresponds to the current COVID-19 data set. Note that the website is under construction and some pages may move”

Q: Is the plan for DataLoch to continually update the data to contain the most recent data? Once again, this improves reproducibility of results and research transparency.

“Yes. DataLoch will be continually updating the data according to data source lag times, we will be managing version control to enable reproducibility of research as needed. We recommend researchers save their own coding and statistical analysis.”

Q: When can researchers hope to apply for and access the DataLoch?

“You can do so today, our current dataset is focused on COVID-19 but please do register for our DataLoch release newsflash via our website to receive updates.”