
Working with administrative data in Scotland: A round up of researcher experiences

We’ve hit n = 5 in terms of eCRUSADer Researcher Experience posts! It’s not quite a large enough sample to claim any statistically significant findings, but I thought it was about time we took stock of them to see whether any common themes were emerging. So, that’s what this post will briefly do.

First, I outline the key challenges that our researchers faced. Next, you’ll see some direct quotes taken from the posts, in particular lots of positive messages about carrying out research with administrative data in Scotland. Finally (and hopefully most usefully for you), there is a list of ‘Top Tips’ to help your administrative data journey run as smoothly as possible!

Key Challenges

There were some common challenges that popped up throughout the five researcher experience posts which I try to summarise below.

    • Timing Timing Timing!!!

This was (as expected) a clear theme in each of the researcher experience posts, in particular the time taken between Public Benefit and Privacy Panel (PBPP) approval and data access.

    • Administrative datasets can be messy…

They aren’t made available to you in a ‘research ready’ format (even though a huge amount of work will have gone on behind the scenes to get them ready) and they don’t come with clearly defined data dictionaries.

    • ECR short term contracts

The nature of early career researcher (ECR) work means that we are often on short-term contracts. Combined with the issues around timing, this can have knock-on consequences for our career trajectories if we don’t get access to the data in time.

Key Messages

But it’s not all bad! Although there are real challenges involved in accessing and working with administrative data, every researcher we have heard from has agreed on the massive potential of administrative data for research that ultimately aims to improve outcomes for society. Here’s what they had to say:

“Administrative data is an extremely powerful tool that will help you to answer the largest and most difficult questions faced by society”

“There is huge potential to use routine data to improve the way we do clinical trials and ultimately to improve outcomes for patients.”

“The ability to gain new insights from previously unseen data is something that should excite any researcher.”

“It can be a difficult and frustrating area to work in, but there are big potential payoffs, including large sample sizes and long-term follow-up, sometimes across many decades.”

“Working with administrative data is like learning to tame a dragon—albeit challenging, it is also exciting and rewarding!”

Top Tips and Solutions

    • Consider the time it can take to access the data and plan for this as far as possible

This is one of the issues that we are trying to shed some light on by putting together these researcher experience posts. It is rather tricky, not least because every project is different and has its own level of complexity. However, some parts of the data acquisition process are easier to plan for than others. In particular, preparing your PBPP application will probably take around 3-6 months, and getting from submission to approval usually takes around 1-2 months. Knowing these timings means you can build them into funding applications and project plans. The harder part is knowing how long it will take to get access to the data, and our researchers have told us that this can take up to three years! To get a sense of how long things will take for your project, it is well worth asking your eDRIS coordinator how frequently the datasets you have requested are linked for other projects. Some datasets may be harder to link than others, or may never have been linked before. See if you can find any researchers who have previously worked with similar linked datasets and speak to them. They might have some good advice!
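To turn these rough timings into numbers for a funding application, the arithmetic can be sketched in a few lines of Python. This is only a minimal planning aid under stated assumptions: the stage names and the `months_to_approval` helper are hypothetical, the month ranges are the rough figures quoted above, and the three-year figure is assumed to be measured from approval onwards.

```python
# Hypothetical planning helper. The month ranges are the rough figures quoted
# in this post; every project is different, so treat the results as a guide only.
STAGES = {
    "prepare PBPP application": (3, 6),
    "submission to approval": (1, 2),
}

# Assumption: the "up to three years" for data access is counted from approval.
MAX_ACCESS_MONTHS = 36


def months_to_approval(stages):
    """Return (best case, worst case) months from starting the application to approval."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


best, worst = months_to_approval(STAGES)
print(f"Approval expected after {best} to {worst} months")
print(f"Cautious worst case to data access: {worst + MAX_ACCESS_MONTHS} months")
```

With these figures, approval lands somewhere between 4 and 8 months after you start writing, and a cautious worst case for data access is 44 months. That is a strong argument for the back-up plans in the next tip.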

    • Have a plan B (and C!)

Unexpected things can (and probably will) crop up during your administrative data journey, and the longer things take, the more likely it is that they will. The best thing you can do is have a backup plan. Better yet, have several! This might mean using publicly available data, or settling for a subset of the datasets you have requested if there are particular hold-ups with a specific dataset.

    • Prepare as much as you can before getting access to the data

There is actually a huge amount you can do whilst you are waiting for access to the data. You will still have to do a lot of data cleaning once you get access, so it pays to become as familiar as possible with the variables in the datasets you have requested. One idea is to prepare a data dictionary (including codes) that you can ask to be transferred into the safe haven for when you begin analysis. You can also prepare some of your data cleaning code in advance, for example code to attach variable labels and value labels. You should also make sure you have done the relevant training (see the training section of the website for some useful links).
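As an illustration of preparing labelling code before access, here is a minimal Python sketch that stores a data dictionary (variable labels plus value codes) and applies it to coded records. Everything in it is an assumption for illustration: the variable names, labels and codes are invented, and `apply_value_labels` and `variable_labels` are hypothetical helpers. In practice you would write the equivalent in whatever software you will use inside the safe haven, using the metadata for the datasets you actually requested.

```python
# Hypothetical data dictionary prepared before data access. The variable names,
# labels and codes are invented for illustration only.
DATA_DICTIONARY = {
    "sex": {
        "label": "Sex of patient",
        "values": {1: "Male", 2: "Female", 9: "Not known"},
    },
    "admission_type": {
        "label": "Type of hospital admission",
        "values": {10: "Routine", 20: "Urgent", 30: "Emergency"},
    },
}


def apply_value_labels(record, dictionary):
    """Return a copy of one record with coded values replaced by their labels.

    Variables missing from the dictionary pass through unchanged, and codes
    without a label are kept as they are so they can be spotted and queried later.
    """
    labelled = {}
    for variable, value in record.items():
        entry = dictionary.get(variable)
        if entry is None:
            labelled[variable] = value
        else:
            labelled[variable] = entry["values"].get(value, value)
    return labelled


def variable_labels(dictionary):
    """Map each variable name to its descriptive label, e.g. for table headers."""
    return {name: entry["label"] for name, entry in dictionary.items()}


record = {"study_id": 101, "sex": 2, "admission_type": 30}
print(apply_value_labels(record, DATA_DICTIONARY))
# {'study_id': 101, 'sex': 'Female', 'admission_type': 'Emergency'}
```

Keeping the dictionary separate from the code means you can ask for just the dictionary file to be transferred into the safe haven, and update it once you see the real data without touching the labelling logic.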

    • Acknowledge the limitations of administrative data

It is important to remember that administrative data has not been collected with research in mind. This often means that it won’t contain all of the information you need to carry out the ‘perfect analysis’. What matters is that you are able to answer your research question with the administrative data, so be realistic. In some cases, your question might be better answered using survey data, for example.

    • Invest in the relationships with the key people involved in the data access pipeline

Get to know the people who are assisting you with data access and speak to people who have knowledge of the datasets you are requesting. Also, do both of these early on!

I hope this was a useful summary of researchers’ perspectives on accessing and using administrative data in Scotland. It occurs to me that I haven’t yet contributed my own Researcher Experience post. I should say that overall my experience has been largely similar to those we have heard about. I gave a talk on my experience recently at a useMYdata event (if you haven’t heard of them, do check out the great work they are doing!). You can find my slides and the recording from the event here. The event focused on the researcher’s journey in accessing routine health data and on how patients themselves are (or can be) involved throughout the process.

A conversation with eDRIS: Part 1

The Electronic Data Research and Innovation Service (eDRIS) is a small team within Public Health Scotland set up to facilitate access to administrative data for research. At the beginning of 2020, I was invited to talk to eDRIS about eCRUSADers at one of their Development Days. My main hope from the talk was to introduce eDRIS to the eCRUSADers platform and see whether we could come up with any ideas for improving the journey that researchers and eDRIS go through together when applying for and using administrative records in Scotland.

Based on the Researcher Experience posts on eCRUSADers at the time (and to this day), as well as personal and published evidence, a common theme is the lengthy wait for data access. As researchers (especially ECRs, who are often on temporary research contracts), it is vital that we make the best use of the time from initial contact with eDRIS right up until data access and beyond. To do this, we need to make sure that our interactions with eDRIS are productive and efficient for both parties. My belief is that if we are to identify any areas where this journey can be improved, both parties need to understand more about one another’s work and roles in the process.

So, on the back of my presentation to eDRIS, we discussed beginning to build this understanding by putting together a couple of blog posts in conversation with eDRIS.

In this first post, I am incredibly grateful to Jules, one of eDRIS’s Research Coordinators (RCs), for describing what a typical day looks like. Jules talks through his morning and afternoon, giving us an idea of some of the daily tasks he is involved in and an insight into the emails and requests he receives throughout the day.

For me (as a researcher who has worked with a number of RCs on different projects), this insight was very useful, and as I read about Jules’s day I had lots of further questions. Jules has kindly offered to answer them, and his answers will be posted in Part 2, so stay tuned!

But first off, let’s hear from Jules on his account of a day in the life of an RC. Not quite sure what an RC’s role is? Have a quick read here.

A day in the life of a Research Coordinator

For statistical disclosure control (SDC) purposes, the names used here are fictional, but the events described are based loosely on real incidents.

Morning

Check for new emails: only 10 from last night, great, not bad for my 25 projects! Ok, first job, do we have any SDCs… Yes, two researchers on different projects have output requests, which one first? I think I’ll do Helen’s first; she usually does a good job of explaining the outputs and making sure there are no disclosure risks. On top of that, she only has health data, so only one data controller’s requirements to worry about, result! So, let’s log in to the safe haven…. now, what is my password? Oh yes, the access path has changed, I need a new password. Oh well, let’s get that password reset first; that might take up to an hour and means I can’t do the other SDC.

Ok, what’s next in the Inbox. Ahh, a ‘quick question’ from James, this should be easy. Nope, he wants to add a Census variable, so…let’s check the existing permissions… Just as well, the Health and Social Care Public Benefit and Privacy Panel (HSC PBPP) and Statistics Public Benefit and Privacy Panel (SPBPP) end date is in two weeks! So, I need to ask James to submit an amendment to add the new Census variable, as well as extend the study date, so that means an amendment to SPBPP and HSC PBPP, and maybe get him to contact the National Records of Scotland (NRS) data access team to discuss if it’s possible first? Yes, that would be best. So, I’ll just email James…

Ping!

Uh-oh, an email from HR: my own information governance (IG) training needs refreshing, perfect timing! That reminds me, does anyone on James’s application need their IG training refreshed…. yep, the training for James and two others is about to expire. Let’s see what the data controllers accept as valid IG training… So, Census accept Safe Researcher Training (SRT) as valid for five years, but HSC PBPP treat it as valid for three years… so it’s about to expire as far as HSC PBPP are concerned… I may just ask them to do the online Medical Research Council (MRC) course, as that’s quicker, and we can worry about the SRT in two years’ time… So, let’s email James.

“Dear James, thank you for your request to add a Census variable. The first thing to do would be to discuss feasibility with NRS; I have added their contact details below. Let me know if you need any help with your approach to them. I also noticed that your project permissions are due to expire, and some of your colleagues named on the form have IG training that is also about to expire, but only as far as HSC PBPP are concerned. Each of these changes needs to be recorded in the permissions, so we need to submit amendments to both SPBPP and HSC PBPP for: adding a new variable, extending the study duration and updating IG training. I think the best way to do this is to submit amendments to the PBPP panels for the end date and updated training, then, after you have got the go-ahead from NRS to add the new variable, we can process another amendment to add the variable, as this will take longer. Please let me know if that makes sense.”

Ping!

“Dear Jules, I can’t access the safe haven, please can you help? Thanks, Bob”

Now, is the safe haven down? Nope… So where is Bob’s problem? He didn’t say…

“Dear Bob, sorry you are having problems accessing the safe haven. Please can you let me know at which stage you are having the problem? If you can access the safe haven page, are you receiving the 2FA PIN? If not…”

Ping!

“Dear Jules, please ignore my last email, I wasn’t on the VPN, my mistake! I am in now. While I am here, please can you release the tables in my study area? These are quite urgent, and I need them today.
Thanks,
Bob”

Ok, delete my email draft. Now, do I have my own password yet… Nope. Ok, Bob will have to wait; next email. Now, John wants to know where we are with his data sharing agreement. Which project is that? Oh yes, here it is, so… the data sharing agreement was sent back to the Shire Commissioners for signing three weeks ago. Good question, where is it? Nothing from them…. so, let’s send an email chasing it.

“Dear Phyllis, Hope you are well. We have had a researcher chasing…”

Ping! 

“…the data sharing agreement for 1234-5678. We returned it to you for review and signature three weeks ago; please can you let me know when you will be able to get to it? Thanks, Jules”

Ok, let’s email John.

“Hi John, apologies for the delay, we sent it to the Shire Commissioners three weeks ago for signing…”

Ping!

” and I have contacted them to ask for an update, I will let you know as soon as I hear from them. Thanks, Jules”

Ok, where was I? No safe haven access, so no SDCs for now… so, let’s check the task list… Next job is an amendment to add a researcher to Siobhan’s HSC PBPP application. This is a change to section 1.5 of the form, great, under the proportionate governance rules issued by HSC PBPP I can process these myself.
Lunch!

Afternoon

Ok, let’s get back to it…

Ping!

An email from HSC PBPP to researcher:

“Dear Prof. Urquhart,
The HSC PBPP panel have reviewed your application and have some further questions for you before your application can be properly considered. Please provide responses below the listed queries, and return to us within two weeks:
1) Please provide a clear data flow diagram
2) Please provide a Data Privacy Impact Assessment or evidence that one is not needed. Your data protection officer should be able to offer advice.
3) Please provide evidence of public involvement in the research design
4) Please ensure your lay proposal is clearer to those with no experience of research
5) Please ensure anyone named in 1.1 to 1.5 of the PBPP form has valid IG training; there is a list in the ‘Guidance for Applicants’ available from the PBPP website.
…”

Ah, this is a shame, but at least it chimes with the advice I gave the Prof.: that the panel would likely pick up on these issues if we didn’t address them before submitting the application. With tight funding deadlines I can sympathise with the desire to get something submitted very quickly, but sadly this often creates more work. Now, where’s that template response… send, done.

Now, has my new safe haven password turned up? Nope. Ok, next.

“Dear Jules,
In order to avoid SDC, please can I share my safe haven screen with my collaborators? I would only need to do this using Zoom, and with a small number of colleagues, so nothing would leave the safe haven.
Thanks,
Gary”

Oh dear…

“Dear Gary,
Please do not do this!
Sharing the safe haven screen is not allowed in any circumstances, whether screen shots, screen sharing or in person. As a reminder, these terms are detailed in the user agreement you signed and are also on the statements you accept every time you log in to the safe haven. Any outputs from the safe haven must be assessed for disclosure, please complete the request form to help speed these assessments up.
Let me know if you have any questions.
Thanks,
Jules”

Ping!

“Dear Jules,
I submitted a draft PBPP to you a few weeks ago. I know the data flow is missing, but this is because I don’t yet know what data I need. I was hoping you could just submit it anyway, to get the ball rolling.
Thanks,
Rachael”

Ok… where’s that template…

“Dear XXXX,
Please note I have not submitted your incomplete PBPP; if I had, the panel would have returned to us asking where the missing sections were. It saves time if the required sections are completed, as indicated in the ‘Guidance for applicants’ available from the PBPP website. I believe I have already provided the minimum recommended changes for the PBPP to be able to consider your application.
In this case, if the panel do not know what confidential data you are asking for, they cannot assess the risks to the privacy of the individuals in the datasets, as they don’t know which individuals you are asking for data on.
Please let me know if you have any further questions.
Thanks,
Jules”

Ok, last thing, do I have my password? Yes!!! Now let’s finally look at Bob’s urgent SDC, then Helen’s.

Ping!

“Dear Safe Haven user,
We have experienced some network issues which means we need to shut down the Safe Haven for the rest of today. The Safe Haven will be unavailable from 1530 today until 1000 tomorrow morning. Please save any work and log off.
We apologise for any inconvenience caused by this unexpected outage.
Regards,
The Safe Haven.”

What time is it??? 1528….

“Dear Bob,
Unfortunately, the Safe Haven has experienced an unexpected error and I am unable to look at your SDC request today.
Please also note that, as you have Census data, we need NRS to carry out checks and clear the outputs before we can check and release. I know you asked for the outputs today, I am afraid this is not possible; however, we will aim to have the outputs checked within our three-day turnaround target.

Apologies for the delays,
Thanks,
Jules”

I’m going home…oh wait, I am home. (Please note we have flexible working, not all staff finish at 3:30 pm)

The role of an eDRIS Research Coordinator

The two main researcher-facing roles are RCs and Analysts.

The RC role is primarily project management. RCs are assigned a number of projects that they are then responsible for. The essence of the role is to enable access to administrative datasets for researchers, where that access is granted in line with confidentiality laws (e.g. GDPR, Data Protection Act). The RC is there to provide a service to researchers to enable high quality research. In practical terms, this requires the RC to make sure they are aware of current procedures (rather than knowing the jurisprudence around the common law of confidentiality!), so we can provide researchers with the best approach to meeting each data controller’s requirements within a legal framework. There are often multiple data controllers (even within a single organisation) and each data controller has their own requirements (this is why we sometimes ask researchers to provide the same information in slightly different ways). The sheer number of datasets, each with the quirks of their respective data controllers, requires a great breadth of knowledge of the administrative data landscape. As well as projects where data are provided as part of the service, there are numerous projects where the applicants need permissions only, to do all sorts of things, ranging from setting up clinical trials to changing the way health audits are carried out.

The Analyst role is distinct from the RC role and is primarily tasked with creating the extracts for the researchers, although there are often discussions with analysts at early stages to determine feasibility of the requests. The eDRIS analysts have in-depth knowledge of many of the common health data sets, so are a good source of information, for both researchers and eDRIS RCs.