Category: Researcher Experience

Researcher Experience: Dr Drew Altschul

With all that has been going on, it has been a wee while since we heard from a researcher who is in the thick of working with administrative data in Scotland. In this Researcher Experience post, we hear from Dr Drew Altschul, Research Associate in the Department of Psychology at the University of Edinburgh, who has been navigating the administrative data landscape in Scotland for around two years. Drew works with a large linked data set of the Scottish Mental Survey 1947, 36-day sample, Scottish Longitudinal Study (census data), Prescribing Information System (PIS) and Scottish Morbidity Record for Mental Health Inpatient and Day Case (SMR04).

In Drew’s account of working with administrative data, the familiar challenges of timing, unforeseen circumstances and working in the safe setting rear their heads. However, as with the other researchers we have heard from, the ‘seeing the glass half full’ attitude and optimism about the need to press on in spite of these challenges endure. In particular, Drew points out the useful discoveries he and his colleagues made whilst waiting for data access, which would ultimately improve their research output in the long run. This point rings especially true for me; after all, eCRUSADers wouldn’t exist if it weren’t for the wait for data.

Over to you Drew:

Overview of my research

I’ve yet to do much work with our main variables of interest, as we were only recently granted access to a few of the data sets we requested. However, while we were working on obtaining and waiting for access, we followed some side avenues, in part to prepare ourselves for working with the data, and in part because we thought of research questions that were interesting in their own right. For example, we are interested in how early life socioeconomic conditions, commonly represented by the father’s occupational social class, relate to mental health later on in life. However, our data set is based on the participants of the Scottish Mental Survey 1947; these individuals were all born in 1936, and because of World War II, reports of fathers’ occupations from censuses carried out during participants’ early lives are unreliable, not representative, and often missing. In order to improve on our data set, we dug deeper into the data we were aiming to link, pulling out additional, historical occupation information, and coding these data ourselves. This in turn led to a machine learning approach to classifying historical social class data, which can be used in the future by people working with historical social class data. So it goes to show how much interesting, useful work you can wind up doing along the way!

Summary of any challenges faced

The process is long and convoluted at seemingly every turn. I was fortunate because I joined the project relatively late, although when I joined we thought we would have access to the data in a few months’ time, rather than two years later. I did what I could to help with the application processes, but ultimately this work predominantly falls on the shoulders of a single person, and most of one’s time in this area is not spent working on forms, but waiting for other people to get back to you.

A large amount of time and effort goes into processing and preparing data before linkage, but that does not mean that the data are clean and easy to work with once you get a hold of them. You are likely going to need to spend significant time cleaning and otherwise processing your data before you can analyse them.

There are advantages to having to lay out analyses in advance during the application process: essentially, this forces you to pre-register your work, which is an important step in doing reproducible science. However, a run-of-the-mill pre-registration has considerable flexibility, and this is not so much the case with the analyses we plan for our data. All output must be checked for privacy and security concerns, so if we want to tweak an analysis or run a sensitivity analysis, for instance at the request of a reviewer, every different analysis that we want to take out of the safe haven environment needs to be checked, and that process can take weeks.

Thoughts for fellow and future eCRUSADers

You ought to think very carefully about timing; in particular, you ought to expect significant delays. If possible, try to plan for multiple scenarios, and make sure you have meaningful work you can do while you wait out the access process. The processes for accessing data are supposedly being streamlined and improving, but it is worth investing in your relationships with the people along the data access pipeline, as they are best placed to help you manage your expectations.

It can be a difficult and frustrating area to work in, but there are big potential payoffs, including large sample sizes and long-term follow-up, sometimes across many decades. These are types of data that sometimes cannot be obtained in any other way, and this allows for novel, meaningful research questions to be asked and answered.

Public Benefit Privacy Panel Timelines

Preparation of PBPP application: 01/06/2018 – 21/08/2018 (about 12 weeks)

Submission to initial PBPP approval: 05/10/2018 (about 12 weeks)

PBPP approval to data access: 16/06/2020 (about 1 year and 6 months)

Researcher Experience: Dr David Henderson

It’s a new year and this week we hear from a new researcher, namely, Dr David Henderson. David is a Research Fellow at Edinburgh Napier University and Scottish Centre for Administrative Data Research (SCADR). He is no new face to the eCRUSADers scene and has built up a wealth of knowledge and expertise in the administrative data sets he has worked with over the last four years. In particular, David has worked closely with the Scottish Social Care Survey (SCS), both at local (Renfrewshire Council) and national level. His PhD work utilised the national SCS linked to Prescribing Information System data, Unscheduled Care Data Mart and the NHS Central Register. Additionally, David has worked with the Scottish Programme for Improving Clinical Effectiveness in Primary Care (SPICE – PC) data.

In this post, David describes his PhD work and provides an outstanding demonstration of the wealth of knowledge that research using administrative data can offer. He also gives us an insight into some of the unexpected externalities that can significantly impact project timescales, but which are hard to plan for. Similarly to our previous Researcher Experience posts from Dr Catherine Hanna and Matthew Iveson, David highlights timing as one of the major difficulties he has experienced throughout his research career using administrative data.

David’s positivity emanates throughout this blog post and he does an excellent job of echoing the feelings that I hear time and time again from researchers in this area. These are a genuine understanding of the need for the legal processes in place to protect patient data, coupled with frustration with the parts of the processes which inhibit researchers’ abilities to use this data to its full potential, all together with a positive attitude that things are slowly but surely improving. As David points out, things are changing in Scotland and we look forward to hearing very soon from the Chief Statistician, Roger Halliday, on the Scottish Government’s plans for the new Research Data Scotland.


Brief overview of my research

Using the linked data set described above, the focus of my research has been investigating the association between multimorbidity (more than one long-term condition) and social care receipt. I am also analysing interactions between health and social care services, with a particular interest in unscheduled care.

Good social care data has been difficult to come by in the past – not just in Scotland, but internationally. I have been lucky to be one of the first group of researchers to get access to the Social Care Survey collected by the Scottish Government in a format that can be linked to health-based data sources.

So far, provisional results show us that increasing age and severity of multimorbidity are associated with higher social care receipt. This was anticipated, but we have never been able to show it empirically before the cross-sectoral linkage.

We have also been able to describe the receipt of social care by socioeconomic position (SEP) using the Scottish Index of Multiple Deprivation (SIMD). This is new and, to my knowledge, hasn’t been described elsewhere on such a large scale. Here we find that those with lower SEP are more likely to receive social care. (All these patterns are shown in the figure below.) However, due to a lack of good measures, we can’t tell if the provision of care matches need for care.

My latest piece of work has been looking at whether receipt of social care influences unplanned admission to hospital. Using time-to-event (survival) analysis we can see that, for those over 65, people who receive social care are twice as likely to have an unplanned admission (again these results are provisional at the moment).

© David Henderson

Summary of challenges faced

The barriers I have faced are, no doubt, similar to those faced by others using linked data, the main one being time. Approvals, extraction, linkage etc. all take considerable time and as a researcher you are not in control of these timescales. A good example is shown by a sub-project for my PhD which was to use social care data from one local authority area only. The council in question were exceptionally helpful and keen to share data. They were very patient whilst I organised ethics and approvals on the academic side. However, by the time I was ready to talk data sharing agreements they had operational pressures (specifically the 2017 local elections) which tied up their legal team. After this we were all hopeful about making progress, but a certain Prime Minister went for a walk in the woods at Easter and decided to call a general election! Cue another 6-week delay until the legal team could start negotiating an agreement. We eventually got there but this illustrates that the data controllers are at the mercy of higher forces as well and it is impossible to set meaningful deadlines.

I am very fortunate to be in a position to keep working with my PhD data in my current role and keep asking questions of the large amount of data we have. However, I have moved university in order to do this. This means I now have to repeat the process of ethics, data sharing agreements, privacy impact assessments etc. This is absolutely necessary as my current employers need to make sure that all legal aspects are covered, but there is nothing more soul-destroying than recreating the (significant) amount of work that goes into the required forms (initially completed two years previously). Fortunately, work is afoot at the Scottish Government to make this process obsolete and centralise access to research data sets – however this is still in early stages and we are currently unsure as to when this will be operational or what exactly will be available. For now, the pain must endure!

My reflections

Although there are difficulties in using administrative data for research purposes and delays can be frustrating at times, it is still (incredibly) a really rewarding process. The ability to gain new insights from previously unseen data is something that should excite any researcher. More importantly, data linkage offers the potential to improve society by answering questions that can’t be asked with traditional methods. Well worth an extra ethics form (even if I grumble about it!).

Researcher Experience: Dr Catherine Hanna

This week we hear from Dr Catherine Hanna, Research Fellow and PhD student (Cancer Research UK Clinical Trials Fellowship) at the Institute of Cancer Sciences, University of Glasgow. For about one year, Catherine has been working with Greater Glasgow and Clyde (GG&C) Chemocare data linked to both Scottish Cancer Registry (SMR06) and Cancer Quality Performance Indicators (QPI) data. She also has approval from the Public Benefit Privacy Panel (PBPP) (approval granted in June 2018) to obtain a national linked cancer data set for her project. She is currently awaiting access to this data.

In this post, Catherine tells us a bit about her research and what she has done with the GGC data, as well as the challenges she has faced in terms of applying for and getting access to the national data. 

Brief overview of Catherine’s research

My research investigates how we can assess the impact of oncology clinical trials. It is important to be able to demonstrate that trials testing new oncology treatments are having real life impacts such as changing practice, changing health and saving money. Analysing this impact helps us to identify which trials are making real world differences, and subsequently, to design more impactful trials in the future.

I am conducting a case study to assess the impact of the Short Course Oncology Treatment (SCOT) trial (1). This study investigated whether treating patients with a diagnosis of colorectal cancer with 3 months of chemotherapy following surgery was non-inferior to treating with 6 months of chemotherapy. The trial results have shown that giving a shorter duration of treatment does not make a significant difference to the percentage of patients who are disease free at 3 years. Patients in the 3 month arm of the trial also had significantly fewer side effects from the treatment, especially with regard to peripheral nerve damage.

Gaining access to GGC Chemocare data, linked to QPI and SMR06 data sets, has enabled me to assess the impact of the SCOT trial on changing clinical practice. There was a significant change in prescribing practices for patients with colorectal cancer after the results of the SCOT trial were publicised. This will translate to a cost saving for the GGC health board and will result in fewer patients in GGC experiencing debilitating peripheral nerve damage as a result of their adjuvant chemotherapy treatment. A poster with the preliminary results of this analysis was presented at National Cancer Research Institute 2018.

In the next stages of my project, I plan to investigate the impact of the SCOT trial on prescribing on a national scale by using routinely collected chemotherapy data from the three cancer networks in Scotland (South East Scotland (SCAN), West of Scotland (WOSCAN) and North of Scotland (NOSCAN)). My project is running alongside, and will be using a sub-set of, the COloRECTal Repository (CORECT-R) data at the University of Edinburgh (part of an even wider project at the University of Leeds). PBPP approval for my project was granted in June 2018; however, I do not yet have access to this data. Below, I outline some of the lessons I have learned during the application to access this national data.

Summary of challenges faced

(1) When data is held on databases outwith Information Services Division (ISD), often at a local or regional level, the process of data linkage becomes more challenging and costly. Often, there is not the expertise at a local level to extract and transfer data, and working relationships between local analysts and those coordinating data linkage centrally do not exist. Specifically, there are few examples of previous linkage of chemotherapy prescribing data (held locally) on a national scale.

(2) Data linkage requires a pre-specified list of the data variables from each data set. Often these lists are not publicly available, or even defined, and it can be time consuming and difficult to generate variable lists which are required for the data linkage process.

(3) Evidence of funding to perform data linkage and make use of national linkage services is often required for PBPP approval. However, depending on the time between submission and the data linkage occurring, it can be several years before the funds are used.

(4) If a researcher is funded for a specified period, the time taken for PBPP approval and data acquisition means that the researcher may not have an opportunity to analyse the data. There is also a risk that the research question will be less relevant than at the time of submission.

My reflections

There is huge potential to use routine data to improve the way we do clinical trials and ultimately to improve outcomes for patients. The potential to pioneer the use of routine data for research purposes in Scotland is obvious; however, the practicalities of currently accessing and using this data are not straightforward.

My advice for anyone planning to work with national Scottish data, based on my experience:

  • Apply for access to data early and be aware that data acquisition may take longer than expected depending on your project.
  • Think about the costs of data linkage, especially if you want to link data sets that are not currently stored in ISD. The size and subsequent cost of a data linkage project is often based on the number of databases used (especially those outside ISD), rather than on the size of the finalised database.
  • Define which variables from the data set you will require early and be clear why you require each variable for your analysis.

1. Iveson TJ, Kerr RS, Saunders MP, Cassidy J, Hollander NH, Tabernero J, et al. 3 versus 6 months of adjuvant oxaliplatin-fluoropyrimidine combination therapy for colorectal cancer (SCOT): an international, randomised, phase 3, non-inferiority trial. The Lancet Oncology. 2018;19(4):562-78.

Researcher Experience: Matthew Iveson

Our first Researcher Experience post is from Matthew Iveson, Senior Data Scientist at the University of Edinburgh. Matthew has been working with Scottish administrative records for about four years. Data sets he has worked with include Scottish Morbidity Records, Scottish Census, Prescribing Information System, NHS Central Register, NRS Births, Deaths and Marriages, and Scottish Stroke Care Audit. He has also worked with the Scottish Longitudinal Study, a set of pre-linked administrative data sets. We asked Matthew to tell us a bit about his research and the routine data he has worked with, what he saw as some of the key challenges in accessing and using administrative records, and to offer his thoughts to early career researchers hoping to work with this kind of data.

Brief overview of Matthew’s research

My work has mainly focused on using data linkage to reconstruct the life-courses of individuals who took part in the Scottish Mental Survey 1947, a nationwide survey of age-11 thinking skills conducted in Scottish schools in 1947. These individuals, now aged over 80, have experienced a lifetime of changes in health and socioeconomic circumstances, and offer an extremely important opportunity for examining how early-life circumstances can have a lasting impact on health and wellbeing across the life course.

So far, I have used linked data to show that individuals with higher childhood cognitive ability, better socioeconomic circumstances and more education are less likely to die, less likely to report a long-term function-limiting illness in older age, more likely to be economically active in later life, more likely to retire later and of their own volition, and so on. I’ve also tried to establish the mechanisms by which childhood advantage affects health and wellbeing. I am currently waiting for data to examine whether factors from across the life course can be used to predict whether someone will require care in later life (including the type of care required), how well individuals can recover from a stroke, and whether someone will respond to a given antidepressant medication. 

Summary of challenges faced

One of the biggest issues I faced was in terms of timing. In some instances I have been waiting over 3 years for data. There have been several delays along the way, due to changes to the data access process (both over time and between organisations), queues for submitting forms to data controllers, changes to the legal landscape for data sharing (such as GDPR) and loss of submitted paperwork. The problem is that these delays are relatively common, and they result in a timescale that is not achievable under normal funding conditions. Since most early-career researchers find themselves on short-term contracts, they risk not getting data before their contracts expire, and since they are judged more than most on their productivity, these delays can seriously hamper a researcher’s career trajectory. 

The delays also highlight the fragility of the data access process. Getting to know key people in each organisation is one of the best ways to get through the process smoothly, but if these people leave, their expertise often goes with them. One example is that, during my project, the lawyer in charge of reviewing requests for census data left. Their replacement was understandably less confident about data sharing, and decided to re-review the laws surrounding the use of census data for research. Data controllers and other involved organisations need to ensure that knowledge and expertise are distributed across their teams, and need to invest in the infrastructure and staff that can ensure a robust system for the future.

Thoughts for early-career researchers 

While organisations need to make things easier, researchers themselves need to manage their own expectations – gaining access to routinely-collected data, especially linked data, takes a very significant amount of time and effort. It’s worth planning well in advance and making sure that you can stay busy and productive while you wait for data to arrive. It’s also worth thinking about pre-linked datasets such as the Scottish Longitudinal Study if you’re short on time. Regardless of how you engage with routinely collected data and how long it takes, bear in mind that you’re learning an incredibly rare and valuable set of skills. Things are slowly getting better, faster and easier, but organisations are still fine-tuning their processes and a lot of the data is still new to the research scene. If you do have the time – and the perseverance – then administrative data is an extremely powerful tool that will help you to answer the largest and most difficult questions faced by society.