
Working with administrative data in Scotland: A round up of researcher experiences

We’ve hit n = 5 in terms of eCRUSADer Researcher Experience posts! It’s not quite there in terms of a sample size for claiming any statistically significant findings but I thought that it was about time we took stock of them to see if there were any common themes emerging. So, that’s what this post will briefly do.

First, the key challenges that our researchers faced are outlined. Next, you’ll see some direct quotes taken from the posts, in particular lots of positive messages about carrying out research with administrative data in Scotland. Finally (and hopefully most usefully for you), a list of some ‘Top-Tips’ so that your administrative data journey runs as smoothly as possible!

Key Challenges

There were some common challenges that popped up throughout the five researcher experience posts which I try to summarise below.

    • Timing Timing Timing!!!

This was (as expected) a clear theme that emerged in each of the researcher experience posts, in particular the time taken between PBPP approval and data access.

    • Administrative datasets can be messy…

They aren’t made available to you in a ‘research ready’ format (even though a huge amount of work will have gone on behind the scenes to get them ready) and they don’t come with clearly defined data dictionaries.

    • ECR short term contracts

The nature of ECR work means that we are often on short-term contracts. Together with the issues around timing, this can have knock-on consequences for our career trajectories if we don’t get access to the data in time.

Key Messages

But, it’s not all bad! Although there are real challenges involved in accessing and working with administrative data, each of the researchers we have heard from has agreed on the massive potential for administrative data in research that ultimately aims to improve outcomes for society. Here’s what they had to say:

“Administrative data is an extremely powerful tool that will help you to answer the largest and most difficult questions faced by society”

“There is huge potential to use routine data to improve the way we do clinical trials and ultimately to improve outcomes for patients.”

“The ability to gain new insights from previously unseen data is something that should excite any researcher.”

“It can be a difficult and frustrating area to work in, but there are big potential payoffs, including large sample sizes and long-term follow-up, sometimes across many decades.”

“Working with administrative data is like learning to tame a dragon—albeit challenging, it is also exciting and rewarding!”

Top Tips and Solutions

    • Consider the time it can take to access the data and plan for this as far as possible

This is one of the issues that we are trying to shed some light on by putting together these researcher experience posts. It is rather tricky, not least because every project is different and has differing levels of complexity. However, there are some parts of the data acquisition process that are easier to plan for in terms of the time they will take. In particular, preparation of your PBPP application will probably take around 3-6 months. In terms of the time from submitting your application to the approval, this usually takes around 1-2 months. Knowing these timings means you can put them into funding applications etc. The harder bit is knowing how long it will take to get access to the data and we have heard from our researchers here that this can take up to three years! To try and understand how long things will take, it is well worth talking to your eDRIS coordinator about how frequently the datasets you have requested are linked for other projects. There may be some datasets that are harder to link than others or that have never been linked before. See if you can find any researchers who have previously worked with similar linked datasets and speak to them. They might have some good advice!

    • Have a plan B (and C!)

Unexpected things can (and probably will) crop up during your administrative data journey. And the longer things take, the more likely these unexpected events are to occur. The best thing you can do is have a back-up plan. Better yet, have several! This may be using publicly available data, or settling for a subset of the datasets you have requested if there are particular hold-ups with a specific dataset.

    • Prepare as much as you can before getting access to the data

There is actually a huge amount you can do whilst you are waiting for access to the data. You will still have to do a lot of data cleaning when you get access so one thing you can do is try and get as familiar as possible with the variables in the datasets you have requested. One idea might be to prepare a data dictionary (which includes codes) that you can ask to be transferred into the safe haven for when you begin analysis. You can also prepare some code for cleaning the data to some extent. For example, code to attach labels and value labels. You should also make sure you have done the relevant training (see the training section of the website for some useful links).
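As one way to picture this kind of preparation, here is a minimal Python sketch of a data dictionary and a helper that attaches value labels to coded records. The variable names (`sex`, `simd_quintile`), the codes and the labels are invented for illustration and are not taken from any real dataset specification; the point is only that code like this can be written and tested on dummy data before access is granted.

```python
# Hypothetical data dictionary: variable names, codes and labels are
# illustrative assumptions, not taken from any real dataset specification.
DATA_DICTIONARY = {
    "sex": {
        "label": "Sex of patient",
        "values": {"1": "Male", "2": "Female", "9": "Not known"},
    },
    "simd_quintile": {
        "label": "Deprivation quintile",
        "values": {"1": "Most deprived", "5": "Least deprived"},
    },
}


def label_record(record, dictionary=DATA_DICTIONARY):
    """Return a copy of `record` with coded values replaced by their labels.

    Codes that are missing from the dictionary are kept unchanged so that
    unexpected values stay visible during cleaning.
    """
    labelled = {}
    for variable, code in record.items():
        value_labels = dictionary.get(variable, {}).get("values", {})
        labelled[variable] = value_labels.get(str(code), code)
    return labelled


def unknown_codes(records, dictionary=DATA_DICTIONARY):
    """Collect codes that appear in the data but not in the dictionary."""
    unknown = {}
    for record in records:
        for variable, code in record.items():
            if variable not in dictionary:
                continue
            if str(code) not in dictionary[variable]["values"]:
                unknown.setdefault(variable, set()).add(str(code))
    return unknown
```

Running `unknown_codes` on the real extract is a quick first check of how well the prepared dictionary matches what actually arrives in the safe haven.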

    • Acknowledge the limitations of administrative data

It is important to remember that administrative data has not been collected with research in mind. This can often mean that it won’t contain all of the information you need to carry out the ‘perfect analysis’. What is important is that you are able to answer your research question with the administrative data, so be realistic. In some cases, it might be that your question would be better answered using survey data, for example.

    • Invest in the relationships with the key people involved in the data access pipeline

Get to know the people who are assisting you with data access and speak to people who have knowledge of the datasets you are requesting. Also, do both of these early on!

I hope this was useful in giving a summary of the researcher perspective of accessing and using administrative data in Scotland. It occurs to me that I haven’t yet contributed my own Researcher Experience post. I should say that overall my experience has been largely similar to those we have heard about. I gave a talk on my experience recently at a useMYdata event (if you haven’t heard of them then do check out the great work they are doing!). You can find my slides and the recording from the event here. The event was particularly focused around the researcher’s journey in accessing routine health data and how patients themselves are (or can be) involved throughout the process.

Researcher Experience: Dr Feifei Bu

In this first Researcher Experience post of 2021 we hear from Dr Feifei Bu, Senior Research Fellow in the Department of Behavioural Science and Health at University College London (UCL). Feifei first started working with administrative data in 2014 when she worked with the National Pupil Database linked to Understanding Society survey data (UK Household Longitudinal Study). In 2015, she joined the University of Stirling and started working on projects that were using administrative data extensively. In particular, she worked with Scottish Morbidity Record (SMR) data linked with the Social Care Survey (now Source) and Healthy Ageing in Scotland (HAGIS). From there, her interest in carrying out research using administrative data continued into her current position at UCL where she has worked with Hospital Episode Statistics (HES) linked with the English Longitudinal Study of Ageing (ELSA). She has also worked with de-identified Whole Systems Integrated Care (WSIC) data. All in all, Feifei has been carrying out research using administrative datasets for around seven years.

Overview of my research

My work using administrative data has been mainly around health service utilisation. Collaborating with colleagues from Stirling and Dundee, we looked at the cost of hospital admissions for people with cognitive spectrum disorders using SMR data. In 2019, I worked on a project on the relationships between social factors and health outcomes amongst older adults using ELSA linked with HES. We looked at how loneliness and social isolation were associated with the risk of hospitalisation related to falls, cardiovascular disease and respiratory disease respectively. More recently, I led a project looking at how patient activation (a measure of people’s knowledge, skills and confidence to manage their own health and wellbeing) was related to the usage of different health care services, including GP and non-GP primary care, elective and emergency inpatient admissions, outpatient and A&E attendances. At the moment, I am involved in an ESRC funded project looking at how indoor temperature is related to secondary care health service utilisation using ELSA linked with HES.

Summary of any challenges faced

Unlike survey data that are usually thoroughly cleaned and well documented, administrative data often require some extra work. Based on my own experience, for example, the episode order variable that comes with the SMR or HES data cannot be taken for granted. In some cases, it could be important to further sort the episodes into the correct order. Also, it may take some detective work to find out what a specific variable measures or how data were collected in practice and by whom; this could be critical for data interpretation.
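To illustrate the point about episode order, the following is a minimal Python sketch that re-sorts episodes within each patient by admission date and derives a fresh order that can be compared with the one supplied. The field names (`patient_id`, `admission_date`, `episode_order`) are hypothetical and do not correspond to real SMR or HES variable names, and sorting by admission date is only one plausible rule; the right rule depends on the dataset.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def reorder_episodes(episodes):
    """Sort episodes within each patient and add a derived order.

    Episodes are sorted by patient, then admission date, with the supplied
    order used only as a tie-breaker. Field names are illustrative
    assumptions, not real SMR or HES variable names.
    """
    ordered = sorted(
        episodes,
        key=itemgetter("patient_id", "admission_date", "episode_order"),
    )
    result = []
    for _, group in groupby(ordered, key=itemgetter("patient_id")):
        for position, episode in enumerate(group, start=1):
            result.append({**episode, "derived_order": position})
    return result


def order_mismatches(episodes):
    """Return the re-sorted episodes whose supplied order differs from the derived one."""
    return [
        episode
        for episode in reorder_episodes(episodes)
        if episode["derived_order"] != episode["episode_order"]
    ]


# Dummy records for a quick check: patient A's supplied order is reversed.
example = [
    {"patient_id": "A", "admission_date": date(2019, 3, 1), "episode_order": 1},
    {"patient_id": "A", "admission_date": date(2019, 1, 10), "episode_order": 2},
    {"patient_id": "B", "admission_date": date(2018, 5, 5), "episode_order": 1},
]
```

Listing the mismatches first, rather than silently overwriting the supplied variable, makes it easier to ask the data provider why the two orders disagree.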

A unique strength of administrative data is that they offer objective and detailed measures that are usually unavailable in surveys. However, as these data were not collected for research purposes, there is often a lack of other critical information that we would like to take into account in our research. If data linkage is not possible, this is an even tougher challenge than the one above.

For data protection purposes, administrative data often need to be analysed in a safe setting, like a data safe haven. This can usually be accessed via a remote desktop connection, but in some cases, you might need to go to a secure access point that is not necessarily local. This will slow down your progress significantly. Some administrative data are stored in data warehouses, in which case researchers need to extract data that are relevant to them using a programming language such as SQL. In other instances, researchers may not have access to the data warehouse directly and data extraction needs to be done by a data analyst. This would require a lot of planning ahead as well as communication back and forth. Finally, data access is time-limited in most cases. It may ‘expire’ before getting everything published. This is something that needs to be taken into account when applying for data access.
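For readers who have not extracted data with SQL before, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database standing in for a warehouse. The table `admissions`, its columns and the dummy rows are all invented for illustration; a real data warehouse will use a different database system, schema and access controls.

```python
import sqlite3


def build_demo_warehouse():
    """Create an in-memory database with a hypothetical `admissions` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE admissions ("
        "patient_id TEXT, admission_date TEXT, admission_type TEXT, age INTEGER)"
    )
    conn.executemany(
        "INSERT INTO admissions VALUES (?, ?, ?, ?)",
        [
            ("A", "2019-01-10", "emergency", 70),
            ("A", "2018-12-01", "elective", 69),
            ("B", "2019-02-05", "emergency", 50),
            ("C", "2019-06-30", "elective", 81),
        ],
    )
    return conn


def extract_cohort(conn, min_age, start_date, end_date):
    """Extract admissions for patients at or above `min_age` in a date window.

    Dates are stored as ISO 8601 text (YYYY-MM-DD), so string comparison in
    BETWEEN matches chronological order. Parameters are passed separately
    from the SQL text rather than pasted into the query string.
    """
    query = """
        SELECT patient_id, admission_date, admission_type
        FROM admissions
        WHERE age >= ?
          AND admission_date BETWEEN ? AND ?
        ORDER BY patient_id, admission_date
    """
    return conn.execute(query, (min_age, start_date, end_date)).fetchall()
```

Writing and testing a query like this on dummy data also gives you something concrete to hand to a data analyst when you cannot run the extraction yourself.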

Working with administrative data is like learning to tame a dragon—albeit challenging, it is also exciting and rewarding!

Thoughts for fellow and future eCRUSADers

As previous Researcher Experience posts have mentioned already, the access application can take a long time to go through. It is important to plan ahead especially if you are on a tight schedule—either for your PhD or other funded projects.

It is important to acknowledge the limitations of administrative data, in particular, the lack of critical information that needs to be ‘controlled for’ in analyses. We should not rule out the possibility that survey data may serve our research purposes better. Here is a note to myself, and to be shared with eCRUSADers: our passion for data should not outweigh a solid research design.

Public Benefit Privacy Panel Timelines

Project: Social Care Survey linked to Scottish Morbidity Record

Preparation of PBPP application: December 2015 – April 2016 (approximately 4 months)

Submission to initial PBPP approval: April 2016 – August 2016 (approximately 4 months)

PBPP approval to data access: August 2016 – April 2018 (approximately 2 years)

Publications using administrative data

Bu, F., Abell, J., Zaninotto, P., & Fancourt, D. (2020). A longitudinal analysis of loneliness, social isolation and falls amongst older people in England. Sci Rep, 10 (1), 20064. doi:10.1038/s41598-020-77104-z

Bu, F., Zaninotto, P., & Fancourt, D. (2020). Longitudinal associations between loneliness, social isolation and cardiovascular events. Heart. doi:10.1136/heartjnl-2020-316614

Bu, F., Philip, K., & Fancourt, D. (2020). Social isolation and loneliness as risk factors for hospital admissions for respiratory disease among older adults. Thorax. doi:10.1136/thoraxjnl-2019-214445

Hapca, S., Guthrie, B., Cvoro, V., Bu, F., Rutherford, A. C., Reynish, E., & Donnan, P. T. (2018). Mortality in people with dementia, delirium, and unspecified cognitive impairment in the general hospital: prospective cohort study of 6,724 patients with 2 years follow-up. Clin Epidemiol, 10, 1743-1753. doi:10.2147/CLEP.S174807

Researcher Experience: Dr Drew Altschul

With all that has been going on it has been a wee while since we heard from a researcher who is in the thick of working with administrative data in Scotland. In this Researcher Experience post, we hear from Dr Drew Altschul, Research Associate in the Department of Psychology at the University of Edinburgh, who has been navigating the administrative data landscape in Scotland for around two years. Drew works with a large linked data set of the Scottish Mental Survey 1947, 36-day sample, Scottish Longitudinal Study (census data), Prescribing Information System (PIS) and Scottish Morbidity Record for Mental Health Inpatient and Day Case (SMR04).

In Drew’s account of working with administrative data, the familiar challenges of timing, unforeseen circumstances and working in the safe setting rear their heads. However, like the other researchers we have heard from, the ‘seeing the glass half full’ attitude and optimism for the need to press on in spite of these challenges endures. In particular, Drew points out the useful discoveries he and his colleagues made whilst waiting for data access, which would ultimately improve their research output in the long run. I think this point rings true for me especially; after all, eCRUSADers wouldn’t exist if it weren’t for the wait for data.

Over to you Drew:

Overview of my research

I’ve yet to do much work with our main variables of interest, as we only recently were granted access to a few of the data sets we requested. However, while we were working on obtaining and waiting for access we followed some side avenues, in part to prepare ourselves for working with the data, and in part because we thought of research questions that were interesting in their own right. For example, we are interested in how early life socioeconomic conditions, commonly represented by the father’s occupational social class, relate to mental health later on in life. However, our data set is based on the participants of the Scottish Mental Survey 1947; these individuals were all born in 1936, and because of World War II, reports of fathers’ occupations from censuses carried out during participants’ early lives are unreliable, not representative, and often missing. In order to improve on our data set, we dug deeper into the data we were aiming to link, pulling out additional, historical occupation information, and coding these data ourselves. This in turn led to a machine learning approach to classifying historical social class data, which can be used in the future by people working with historical social class data. So it goes to show how much interesting, useful work you can wind up doing along the way!

Summary of any challenges faced

The process is long and convoluted at seemingly every turn. I was fortunate because I joined the project relatively late, although when I joined we thought we would have access to the data in a few months’ time, rather than two years later. I did what I could to help with the application processes, but ultimately this work predominantly falls on the shoulders of a single person, and most of one’s time in this area is not spent working on forms, but waiting for other people to get back to you.

A large amount of time and effort goes into processing and preparing data before linkage, but that does not mean that the data are clean and easy to work with once you get a hold of them. You are likely going to need to spend significant time cleaning and otherwise processing your data before you can analyse them.

There are advantages to having to lay out analyses in advance during the application process: essentially, this forces you to pre-register your work, which is an important step in doing reproducible science. However, a run-of-the-mill pre-registration has considerable flexibility, and this is not so much the case with the analyses we plan for our data. All output must be checked for privacy and security concerns, so if we want to tweak an analysis or run a sensitivity analysis, for instance at the request of a reviewer, every different analysis that we want to take out of the safe haven environment needs to be checked, and that process can take weeks.

Thoughts for fellow and future eCRUSADers

You ought to think very carefully about timing; in particular, you ought to expect significant delays. If possible, try to plan for multiple scenarios, and make sure you have meaningful work you can do while you wait out the access process. The processes for accessing data are supposedly being streamlined and improving, but it is worth investing in your relationships with the people along the data access pipeline, as they are best placed to help you manage your expectations.

It can be a difficult and frustrating area to work in, but there are big potential payoffs, including large sample sizes and long-term follow-up, sometimes across many decades. These are types of data that sometimes cannot be obtained in any other way, and this allows for novel, meaningful research questions to be asked and answered.

Public Benefit Privacy Panel Timelines

Preparation of PBPP application: 01/06/2018 – 21/08/2018 (about 12 weeks)

Submission to initial PBPP approval: 05/10/2018 (about 12 weeks)

PBPP approval to data access: 16/06/2020 (about 1 year and 6 months)

Researcher Experience: Dr David Henderson

It’s a new year and this week we hear from a new researcher, namely, Dr David Henderson. David is a Research Fellow at Edinburgh Napier University and Scottish Centre for Administrative Data Research (SCADR). He is no new face to the eCRUSADers scene and has built up a wealth of knowledge and expertise in the administrative data sets he has worked with over the last four years. In particular, David has worked closely with the Scottish Social Care Survey (SCS), both at local (Renfrewshire Council) and national level. His PhD work utilised the national SCS linked to Prescribing Information System data, Unscheduled Care Data Mart and the NHS Central Register. Additionally, David has worked with the Scottish Programme for Improving Clinical Effectiveness in Primary Care (SPICE – PC) data.

In this post, David describes his PhD work and provides an outstanding demonstration of the wealth of knowledge that research using administrative data can offer. He also gives us an insight into some of the unexpected externalities that can significantly impact project timescales, but which are hard to plan for. Similarly to our previous Researcher Experience posts from Dr Catherine Hanna and Matthew Iveson, David highlights timing as one of the major difficulties he has experienced throughout his research career using administrative data.

David’s positivity emanates throughout this blog post and he does an excellent job at echoing the feelings that I hear time and time again from researchers in this area. These are a genuine understanding of the need for the legal processes in place to protect patient data, coupled with frustrations with the parts of the processes which inhibit researchers’ abilities to use this data to its full potential, all together with a positive attitude that things are slowly but surely improving. As David points out, things are changing in Scotland and we look forward to hearing very soon from the Chief Statistician, Roger Halliday, on the Scottish Government’s plans for the new Research Data Scotland.


Brief overview of my research

Using the linked data set described above, the focus of my research has been investigating the association between multimorbidity (more than one long-term condition) and social care receipt. I am also analysing interactions between health and social care services, with a particular interest in unscheduled care.

Good social care data has been difficult to come by in the past – not just in Scotland, but internationally. I have been lucky to be one of the first group of researchers to get access to the Social Care Survey collected by the Scottish Government in a format that can be linked to health-based data sources.

So far, provisional results show us that increasing age and severity of multimorbidity are associated with higher social care receipt. This was anticipated, but we have never been able to show it empirically before the cross-sectoral linkage.

We have also been able to describe the receipt of social care by socioeconomic position (SEP) using the Scottish Index of Multiple Deprivation (SIMD). This is new and, to my knowledge, hasn’t been described elsewhere on such a large scale. Here we find that those with lower SEP are more likely to receive social care. (All these patterns are shown in the figure below). However, due to a lack of good measures, we can’t tell if the provision of care matches need for care.

My latest piece of work has been looking at whether receipt of social care influences unplanned admission to hospital. Using time-to-event (survival) analysis we can see that, for those over 65, people who receive social care are twice as likely to have an unplanned admission (again these results are provisional at the moment).

© David Henderson

Summary of challenges faced

The barriers I have faced are, no doubt, similar to others using linked data, the main one being time. Approvals, extraction, linkage etc. all take considerable time and as a researcher you are not in control of these timescales. A good example is shown by a sub-project for my PhD which was to use social care data from one local authority area only. The council in question were exceptionally helpful and keen to share data. They were very patient whilst I organised ethics and approvals on the academic side. However, by the time I was ready to talk data sharing agreements they had operational pressures (specifically the 2017 local elections) which tied up their legal team. After this we were all hopeful about making progress, but a certain Prime Minister went for a walk in the woods at Easter and decided to call a general election! Cue another 6-week delay until the legal team could start negotiating an agreement. We eventually got there but this illustrates that the data controllers are at the mercy of higher forces as well and it is impossible to set meaningful deadlines.

I am very fortunate to be in a position to keep working with my PhD data in my current role and keep asking questions of the large amount of data we have. However, I have moved university in order to do this. This means I now have to repeat the process of ethics, data sharing agreements, privacy impact assessments etc. This is absolutely necessary as my current employers need to make sure that all legal aspects are covered, but there is nothing more soul-destroying than recreating the (significant) amount of work that goes into the required forms (initially completed two years previously). Fortunately, work is afoot at the Scottish Government to make this process obsolete and centralise access to research data sets; however, this is still in early stages and we are currently unsure as to when this will be operational or what exactly will be available. For now, the pain must endure!

My reflections

Although there are difficulties in using administrative data for research purposes and delays can be frustrating at times, it is still (incredibly) a really rewarding process. The ability to gain new insights from previously unseen data is something that should excite any researcher. More importantly, data linkage offers the potential to improve society by answering questions that can’t be asked with traditional methods. Well worth an extra ethics form (even if I grumble about it!).

Public Benefit Privacy Panel Timelines

Preparation of PBPP application: Around 3 months

Submission to initial PBPP approval: Around 1 month

PBPP approval to (initial) data access: Around 6 months (plus 3 months for social care data)

Researcher Experience: Dr Catherine Hanna

This week we hear from Dr Catherine Hanna, Research Fellow and PhD student (Cancer Research UK Clinical Trials Fellowship) at the Institute of Cancer Sciences, University of Glasgow. For about one year, Catherine has been working with Greater Glasgow and Clyde (GG&C) Chemocare data linked to both Scottish Cancer Registry (SMR06) and Cancer Quality Performance Indicators (QPI) data. She also has approval from the Public Benefit Privacy Panel (PBPP) (approval granted in June 2018) to obtain a national linked cancer data set for her project. She is currently awaiting access to this data.

In this post, Catherine tells us a bit about her research and what she has done with the GGC data, as well as the challenges she has faced in terms of applying for and getting access to the national data. 

Brief overview of Catherine’s research

My research investigates how we can assess the impact of oncology clinical trials. It is important to be able to demonstrate that trials testing new oncology treatments are having real life impacts such as changing practice, changing health and saving money. Analysing this impact helps us to identify which trials are making real world differences, and subsequently, to design more impactful trials in the future.

I am conducting a case study to assess the impact of the Short Course Oncology Treatment (SCOT) trial (1). This study investigated if treating patients with a diagnosis of colorectal cancer with 3 months of chemotherapy following surgery was non-inferior to treating with 6 months of chemotherapy. The trial results have shown that giving a shorter duration of treatment does not make a significant difference to the percentage of patients who are disease free at 3 years. Patients in the 3-month arm of the trial also had significantly fewer side effects from the treatment, especially with regards to peripheral nerve damage.

Gaining access to GGC Chemocare data, linked to QPI and SMR06 data sets, has enabled me to assess the impact of the SCOT trial on changing clinical practice. There was a significant change in prescribing practices for patients with colorectal cancer after the results of the SCOT trial were publicised. This will translate to a cost saving for the GGC health board and will result in fewer patients in GGC experiencing debilitating peripheral nerve damage as a result of their adjuvant chemotherapy treatment. A poster with the preliminary results of this analysis was presented at National Cancer Research Institute 2018.

In the next stages of my project, I plan to investigate the impact of the SCOT trial on prescribing on a national scale by using routinely collected chemotherapy data from the three cancer networks in Scotland (South East Scotland (SCAN), West of Scotland (WOSCAN) and North of Scotland (NOSCAN)). My project is running alongside, and will be using a sub-set of, the COloRECTal Repository (CORECT-R) data at the University of Edinburgh (part of an even wider project at the University of Leeds). PBPP approval for my project was granted in June 2018; however, I do not yet have access to this data. Below, I outline some of the lessons I have learned during the application to access this national data.

Summary of challenges faced

(1) When data is on databases outwith Information Services Division (ISD), often at a local or regional level, this makes the process of data linkage more challenging and costly. Often, there is not the expertise at a local level to extract and transfer data, and working relationships between local analysts and those coordinating data linkage centrally do not exist. Specifically, there are few examples of previous linkage of chemotherapy prescribing data (held locally) on a national scale.

(2) Data linkage requires a pre-specified list of the data variables from each data set. Often these lists are not publicly available, or even defined, and it can be time consuming and difficult to generate variable lists which are required for the data linkage process.

(3) Evidence of funding to perform data linkage and make use of national linkage services is often required for PBPP approval. However, depending on the time between submission and the data linkage occurring, it can be several years before the funds are used.

(4) If a researcher is funded for a specified period, the time taken for PBPP approval and data acquisition means that the researcher may not have an opportunity to analyse the data. There is also a risk that the research question will be less relevant than at the time of submission.

My reflections

There is huge potential to use routine data to improve the way we do clinical trials and ultimately to improve outcomes for patients. The potential to pioneer the use of routine data for research purposes in Scotland is obvious; however, the practicalities of currently accessing and using this data are not straightforward.

My advice for anyone planning to work with national Scottish data, based on my experience:

  • Apply for access to data early and be aware that data acquisition may take longer than expected depending on your project.
  • Think about the costs of data linkage, especially if you want to link data sets that are not currently stored in ISD. The size and subsequent cost of a data linkage project is often based on the number of databases used (especially those outside ISD), rather than on the size of the finalised database.
  • Define which variables from the data set you will require early and be clear why you require each variable for your analysis.

Public Benefit Privacy Panel Timelines

Preparation of PBPP application:

Submission to initial PBPP approval: April 2018 – October 2018 (around 6 months)

PBPP approval to (initial) data access: October 2018 – June 2020 (around 1 year and 8 months)

  1. Iveson TJ, Kerr RS, Saunders MP, Cassidy J, Hollander NH, Tabernero J, et al. 3 versus 6 months of adjuvant oxaliplatin-fluoropyrimidine combination therapy for colorectal cancer (SCOT): an international, randomised, phase 3, non-inferiority trial. The Lancet Oncology. 2018;19(4):562-78.

Researcher Experience: Matthew Iveson

Our first Researcher Experience post is from Matthew Iveson, Senior Data Scientist at the University of Edinburgh. Matthew has been working with Scottish administrative records for about four years. Data sets he has worked with include Scottish Morbidity Records, Scottish Census, Prescribing Information System, NHS Central Register, NRS Births, Deaths and Marriages, and Scottish Stroke Care Audit. He has also worked with the Scottish Longitudinal Study, a set of pre-linked administrative data sets. We asked Matthew to tell us a bit about his research and the routine data he has worked with, what he saw as some of the key challenges in accessing and using administrative records, and to offer his thoughts to early career researchers hoping to work with this kind of data.

Brief overview of Matthew’s research

My work has mainly focused around using data linkage to reconstruct the life-courses of individuals who took part in the Scottish Mental Survey 1947, a nation-wide survey of age-11 thinking skills conducted in Scottish schools in 1947. These individuals, now aged over 80, have experienced a lifetime of changes in health and socioeconomic circumstances, and offer an extremely important opportunity for examining how early-life circumstances can have a lasting impact on health and wellbeing across the life course.

So far, I have used linked data to show that individuals with higher childhood cognitive ability, better socioeconomic circumstances and more education are less likely to die, less likely to report a long-term function-limiting illness in older age, more likely to be economically active in later life, more likely to retire later and of their own volition, and so on. I’ve also tried to establish the mechanisms by which childhood advantage affects health and wellbeing. I am currently waiting for data to examine whether factors from across the life course can be used to predict whether someone will require care in later life (including the type of care required), how well individuals can recover from a stroke, and whether someone will respond to a given antidepressant medication. 

Summary of challenges faced

One of the biggest issues I faced was timing. In some instances I have been waiting over 3 years for data. There have been several delays along the way, due to changes to the data access process (both over time and between organisations), queues for submitting forms to data controllers, changes to the legal landscape for data sharing (such as GDPR) and loss of submitted paperwork. The problem is that these delays are relatively common, and they result in a timescale that is not achievable under normal funding conditions. Since most early-career researchers find themselves on short-term contracts, they risk not getting data before their contracts expire, and since they are judged more than most on their productivity, these delays can seriously hamper a researcher’s career trajectory.

The delays also highlight the fragility of the data access process. Getting to know key people in each organisation is one of the best ways to get through the process smoothly, but if these people leave, their expertise often goes with them. One example is that, during my project, the lawyer in charge of reviewing requests for census data left. Their replacement was understandably less confident about data sharing, and decided to re-review the laws surrounding the use of census data for research. Data controllers and other involved organisations need to ensure that knowledge and expertise are distributed across their teams, and need to invest in the infrastructure and staff that can ensure a robust system for the future.

Thoughts for early-career researchers 

While organisations need to make things easier, researchers themselves need to manage their own expectations – gaining access to routinely-collected data, especially linked data, takes a very significant amount of time and effort. It’s worth planning well in advance and making sure that you can stay busy and productive while you wait for data to arrive. It’s also worth thinking about pre-linked datasets such as the Scottish Longitudinal Study if you’re short on time. Regardless of how you engage with routinely collected data and how long it takes, bear in mind that you’re learning an incredibly rare and valuable set of skills. Things are slowly getting better, faster and easier, but organisations are still fine-tuning their processes and a lot of the data is still new to the research scene. If you do have the time – and the perseverance – then administrative data is an extremely powerful tool that will help you to answer the largest and most difficult questions faced by society. 

Public Benefit Privacy Panel Timelines

  1. Project: Childhood cognitive function and later-life recovery: Linking the Scottish Mental Survey 1947 to healthcare and administrative data.
    a. Preparation of health PBPP application: 01/08/2016 – 09/04/2017 (approximately 8 months)
    b. Submission to initial health PBPP approval: 10/04/2017 – 25/05/2017 (approximately 1 month)
    c. Health PBPP approval to data access: 25/05/2017 – ongoing
    d. Note: this last delay is on StatsPBPP’s end (i.e., census team), rather than health PBPP
  2. Project: Childhood cognitive function and use of long-term care across the life course: Linking the Scottish Mental Survey 1947 to healthcare and administrative data.
    a. Preparation of health PBPP application: 01/08/2016 – 09/04/2017 (approximately 8 months)
    b. Submission to initial health PBPP approval: 12/04/2017 – 25/05/2017 (approximately 1 month)
    c. Health PBPP approval to data access: 25/05/2017 – ongoing
    d. Note: this last delay is again on StatsPBPP’s end (i.e., census team), rather than health PBPP
  3. Project: Mental health within the family and between generations – Phase 1: Linking the Scottish Mental Survey 1947 cohort to mental health outcomes
    a. Preparation of health PBPP application: 27/06/2018 – 18/09/2018 (approximately 3 months)
    b. Submission to initial health PBPP approval: 19/09/2018 – 05/10/2018 (approximately 2 weeks)
    c. Health PBPP approval to data access: 05/10/2018 – 16/06/2020 (approximately 1 year and 8 months)