Working with administrative data in Scotland: A round up of researcher experiences
We’ve hit n = 5 in terms of eCRUSADer Researcher Experience posts! It’s not quite there in terms of a sample size for claiming any statistically significant findings but I thought that it was about time we took stock of them to see if there were any common themes emerging. So, that’s what this post will briefly do.
First, the key challenges that our researchers faced are outlined. Next, you’ll see some direct quotes taken from the posts, in particular lots of positive messages about carrying out research with administrative data in Scotland. Finally (and hopefully most usefully for you), there is a list of some ‘Top-Tips’ so that your administrative data journey runs as smoothly as possible!
There were some common challenges that popped up throughout the five researcher experience posts which I try to summarise below.
Timing Timing Timing!!!
This was (as expected) a clear theme that emerged in each of the researcher experience posts, in particular the time taken between PBPP approval and data access.
Administrative datasets can be messy…
They aren’t made available to you in a ‘research ready’ format (even though a huge amount of work will have gone on behind the scenes to get them ready) and they don’t come with clearly defined data dictionaries.
ECR short term contracts
The nature of ECR work means that we are often on short-term contracts. Together with the issues around timing, this can have knock-on consequences for our career trajectories if we don’t get access to the data in time.
But, it’s not all bad! Although there are real challenges involved in accessing and working with administrative data, each of the researchers we have heard from has agreed on the massive potential for administrative data in research that ultimately aims to improve outcomes for society. Here’s what they had to say:
“Administrative data is an extremely powerful tool that will help you to answer the largest and most difficult questions faced by society”
“There is huge potential to use routine data to improve the way we do clinical trials and ultimately to improve outcomes for patients.”
“The ability to gain new insights from previously unseen data is something that should excite any researcher.”
“It can be a difficult and frustrating area to work in, but there are big potential payoffs, including large sample sizes and long-term follow-up, sometimes across many decades.”
“Working with administrative data is like learning to tame a dragon—albeit challenging, it is also exciting and rewarding!”
Top Tips and Solutions
Consider the time it can take to access the data and plan for this as far as possible
This is one of the issues that we are trying to shed some light on by putting together these researcher experience posts. It is rather tricky, not least because every project is different and has differing levels of complexity. However, there are some parts of the data acquisition process that are easier to plan for in terms of the time they will take. In particular, preparing your PBPP application will probably take around 3-6 months, and the time from submitting your application to approval is usually around 1-2 months. Knowing these timings means you can put them into funding applications etc. The harder bit is knowing how long it will take to get access to the data, and we have heard from our researchers here that this can take up to three years! To try and understand how long things will take, it is well worth talking to your eDRIS coordinator about how frequently the datasets you have requested are linked for other projects. There may be some datasets that are harder to link than others or that have never been linked before. See if you can find any researchers who have previously worked with similar linked datasets and speak to them. They might have some good advice!
Have a plan B (and C!)
Unexpected things can (and probably will) crop up during your administrative data journey. And the longer things take, the more likely these unexpected events are to occur. The best thing you can do is have a backup plan. Better yet, have several! This may mean using publicly available data, or settling for a subset of the datasets you have requested if there are particular hold-ups with a specific dataset.
Prepare as much as you can before getting access to the data
There is actually a huge amount you can do whilst you are waiting for access to the data. You will still have to do a lot of data cleaning when you get access so one thing you can do is try and get as familiar as possible with the variables in the datasets you have requested. One idea might be to prepare a data dictionary (which includes codes) that you can ask to be transferred into the safe haven for when you begin analysis. You can also prepare some code for cleaning the data to some extent. For example, code to attach labels and value labels. You should also make sure you have done the relevant training (see the training section of the website for some useful links).
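To give a flavour of what this preparation might look like, here is a minimal sketch in Python (standard library only). Everything in it is an assumption made for illustration: the variable names, labels and codes are invented, and the helper functions label_record and unlabelled_codes are hypothetical rather than part of any safe haven tooling. The idea is simply that a data dictionary written in advance can attach value labels and flag any codes you did not expect.

```python
# Hypothetical data dictionary: every variable name, label and code below is
# invented for illustration and must be replaced with the real metadata for
# the datasets you have requested.
DATA_DICTIONARY = {
    "sex": {
        "label": "Sex of patient",
        "values": {"1": "Male", "2": "Female", "9": "Not known"},
    },
    "adm_type": {
        "label": "Type of admission",
        "values": {"1": "Elective", "2": "Emergency"},
    },
}


def label_record(record, dictionary):
    """Return a copy of one record with coded values replaced by their labels.

    Codes (or whole variables) missing from the dictionary are left unchanged
    so they can be reviewed rather than silently dropped.
    """
    labelled = {}
    for variable, value in record.items():
        value_labels = dictionary.get(variable, {}).get("values", {})
        labelled[variable] = value_labels.get(str(value), value)
    return labelled


def unlabelled_codes(records, dictionary):
    """Collect codes that appear in the data but not in the dictionary."""
    missing = {}
    for record in records:
        for variable, value in record.items():
            if variable not in dictionary:
                continue
            if str(value) not in dictionary[variable]["values"]:
                missing.setdefault(variable, set()).add(str(value))
    return missing


if __name__ == "__main__":
    # Dummy records standing in for rows you would only see inside the safe haven.
    dummy_records = [{"sex": 1, "adm_type": 2}, {"sex": 3, "adm_type": 1}]
    for row in dummy_records:
        print(label_record(row, DATA_DICTIONARY))
    print(unlabelled_codes(dummy_records, DATA_DICTIONARY))
```

Writing and testing something like this against dummy records means that, once access is granted, the first pass of labelling and checking codes is already done. Whether code and dictionaries can be transferred in, and in which language, depends on your safe haven, so it is worth checking this beforehand.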
Acknowledge the limitations of administrative data
It is important to remember that administrative data has not been collected with research in mind. This can often mean that it won’t contain all of the information you need to carry out the ‘perfect analysis’. What is important is that you are able to answer your research question with the administrative data, so be realistic. In some cases, it might be that your question would be better answered using survey data, for example.
Invest in the relationships with the key people involved in the data access pipeline
Get to know the people who are assisting you with data access and speak to people who have knowledge of the datasets you are requesting. Also, do both of these early on!
I hope this was useful in giving a summary of the researcher perspective of accessing and using administrative data in Scotland. It occurs to me that I haven’t yet contributed my own Researcher Experience post. I should say that overall my experience has been largely similar to those we have heard about. I gave a talk on my experience recently at a useMYdata event (if you haven’t heard of them then do check out the great work they are doing!). You can find my slides and the recording from the event here. The event was particularly focused around the researcher’s journey in accessing routine health data and how patients themselves are (or can be) involved throughout the process.