Background: using patient data in research
Patient data, for example hospital records and GP records, are collected as part of routine National Health Service (NHS) care. They constitute one of the largest sources of health data in existence. Over the years, researchers, policy makers and others have sought to harness their potential in carrying out evidence-based research, seeking to enhance our understanding of disease and to improve patient care and service delivery. The use of patient data for research has never been more in the spotlight than during the current COVID-19 pandemic.
As researchers, we have a duty to ensure that we recognise the individuals who sit behind that data. But even more than that, we should seek to involve patients in our research, because really, who understands what they have experienced better than them?
As an Early Career Researcher (ECR) working with patient data, and coming from a non-clinical background, appreciating the individuals behind the ‘numbers’ is not something that my training in Econometrics prepared me for. Of course, my primary motivation for pursuing my career in health research is to make a difference to individuals’ lives. Nonetheless, it is all too easy to become so buried in the methods, producing fancy charts and output displaying significance stars, that the people behind the numbers become blurred in the background.
The use of patient data by social scientists and ECRs, who are often limited in resources, contacts and time, is becoming more common. With this comes an increased need to ensure that those researchers know how to recognise and include the patient voice in their research, and how to be transparent about their uses of patient data.
What follows are some questions and answers from Grace Annan-Callcott, Communications Officer at Understanding Patient Data (UPD), who kindly agreed to talk to eCRUSADers about using patient data in research and in particular about public/patient engagement.
A conversation with Understanding Patient Data
How much and what sort of public/patient engagement work does UPD do?
Grace pointed me to a couple of recent things they have been working on. Firstly, the Fair Partnerships Report, one of UPD’s “largest pieces of engagement work to date, which looked into what the public thinks about different kinds of businesses and organisations using NHS data”.
So, how do patients feel about the use of their data?
The Fair Partnerships work was a mixed-methods public engagement programme consisting of round table discussions, citizens’ juries and an online survey (completed by just over 2,000 adults from across the UK). A key finding of the report was that “all NHS data partnerships must aim to improve health and care”. I believe this point will resonate with many ECRs, who often have difficulty answering the question “How will your research benefit the public?” Will our PhDs or first post-doc research projects actually translate into patient/public benefit? We can get ourselves all worked up when writing applications to use patient data, trying to demonstrate and perhaps exaggerate the public/patient benefit of our research. Could making false promises undermine trust further? Whilst we are entirely motivated by the hope that our early career research will translate into public/patient benefit, it is likely that it will not, at least not to begin with. But as ECRs working with administrative health records, we discover things that we did not set out to, we develop skills in analysing complex data sets, and we generate new research questions, all of which could have patient/public benefit in the future. That being said, the responsibility lies with us to be both realistic and transparent about the aims of our research and the potential public/patient benefit that it could have.
After we have carried out our research, we must be transparent and document what we have learned and how that learning will go on to contribute towards patient/public benefit at a later stage. We need today’s ECRs to be trained in analysing patient data, otherwise tomorrow’s patient/public benefit might not emerge. Have you done any public/patient engagement with Scottish patients? It is great to hear that UPD are hoping to do work with Scottish patients. I am not aware of any groups in Scotland who are carrying out similar work with public/patients across the board (do get in touch if you are!). For now, can we assume the views from the Fair Partnerships participants would also hold for the Scottish population? As Research Data Scotland (RDS) looms on the horizon, it appears Scotland has much further to go in terms of gathering views from the public on how their data is used. Should all researchers working with administrative health data do public/patient involvement? In an ideal world, we would carry out public/patient involvement in our PhDs and post-docs. However, ECRs may have limited contacts, resources and time, meaning it might not be feasible to do so. In particular, if you are working with a large national dataset, would it be realistic to capture representative views of the country on how you plan to use their data? Well maybe not, but there are other things we can do. For one, Grace pointed out that “use MY data have created a data citation to help researchers acknowledge the contribution patients make to research”. This citation is a means to show gratitude to patients for allowing researchers access to their data, as well as enhancing the visibility of that use. Another thing that crossed my mind was getting someone you know, with no knowledge about the research you are doing, to read your research proposal. Can they see the public/patient benefit in what you are proposing to do? 
The outbreak of COVID-19 has clearly pushed the use of patient data into the headlines and accelerated the use of patient data in research (see the OpenSAFELY project in England). I asked Grace whether UPD feel this presents an opportunity to demonstrate how we can safely and successfully use patient data in research, or a challenge to maintaining public trust in the use of their data.
Are there any other UPD resources that you would recommend to eCRUSADers working with Scottish administrative health data?
Thanks very much for taking the time to answer these questions, Grace. There's clearly some great work going on at UPD and there is definitely a lot that researchers who are working with patient data can learn from that work. It would be great to see more public and patient engagement work on using patient data in Scotland; if anyone reading is familiar with any then do get in touch! Look out for our next People Make Data post where we will be hearing from useMYdata. #datasaveslives #admindata
- What is the PBPP?
- What is the legislation and principles covering aspects of information governance for the use of NHS Scotland data for purposes other than direct care?
- What is the remit of PBPP?
- When do you need a PBPP application?
- How does the PBPP application process work?
- How long is your PBPP application going to take?
- How to fill in your PBPP application according to the 5 Safes
- Top Tips for filling in your PBPP application
- Group discussion and reflection on the concerns raised
- Final thoughts
- Useful definitions
Can you tell us a little about your role in ECTU?
My role involves a variety of tasks, but primarily it is the statistical reporting of trials run from within ECTU. I typically have up to eight active trials throughout the year. My role varies on these: I am Trial Statistician for approximately half of them, and the ‘reporting’ statistician for the other half. When I have my reporting statistician hat on, I’m responsible for the statistical programming and generating the analysis and results.
How many trials have you worked on that have involved using administrative data?
Since I joined ECTU in 2014, I have worked on three trials using administrative data. Two of them used solely routine healthcare data and the third one is running currently, based on a blend of routine data plus data captured within the trial.
Is the use of administrative data in trials becoming more common over time?
The use of administrative data in the trials setting is definitely becoming more common, since clinical trials are known to be expensive and time-consuming. The use of administrative healthcare data is viewed as a more efficient means of understanding the health of the population using readily available data. However, there is a trade-off in terms of the quality of the data being captured.
What was the High-STEACS trial?
High-Sensitivity Troponin in the Evaluation of patients with suspected Acute Coronary Syndrome (High-STEACS) was a stepped-wedge, cluster-randomised controlled trial. In plain English this means… It’s a relatively recent study design that’s increasingly being used to evaluate service delivery type interventions. The design involves crossover of clusters (usually hospitals or other healthcare settings) from control (standard care) to an alternative intervention until all the clusters are exposed to the intervention. This differs from traditional parallel studies, where only half of the clusters will receive the intervention and the other half will receive the control. This diagram helps to demonstrate the difference in designs:
The population of interest was patients presenting in hospital with heart attack symptoms. The trial sought to test a new high-sensitivity cardiac troponin assay against the standard care contemporary assay. Specifically, it tested whether the new assay could detect heart attacks earlier and give a more accurate diagnosis.
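To make the crossover pattern of a stepped-wedge design concrete, here is a minimal sketch in Python. It is a generic illustration only: the number of steps, the random seed and the function name `stepped_wedge_schedule` are assumptions made for this example, not details of how High-STEACS was actually scheduled.

```python
import random


def stepped_wedge_schedule(n_clusters, n_steps, seed=0):
    """Return a cluster-by-period matrix: 0 = control, 1 = intervention.

    Period 0 is an all-control baseline. At each later step another group
    of clusters crosses over, so every cluster is exposed by the last period.
    """
    if n_clusters < 1 or not 1 <= n_steps <= n_clusters:
        raise ValueError("need 1 <= n_steps <= n_clusters")
    n_periods = n_steps + 1

    # Randomise the order in which clusters cross over (hypothetical seed).
    order = list(range(n_clusters))
    random.Random(seed).shuffle(order)

    schedule = [None] * n_clusters
    for position, cluster in enumerate(order):
        # Spread clusters as evenly as possible across the crossover steps.
        crossover_period = 1 + position * n_steps // n_clusters
        schedule[cluster] = [
            1 if period >= crossover_period else 0
            for period in range(n_periods)
        ]
    return schedule


if __name__ == "__main__":
    # Ten clusters crossing over in five steps (illustrative numbers only).
    for row in stepped_wedge_schedule(n_clusters=10, n_steps=5):
        print(row)
```

Each printed row is one cluster: once a row switches from 0 to 1 it never switches back, which is the feature that distinguishes this design from a parallel trial.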
How were patients enrolled into the trial and how does this differ from a standard trial?
Stepped-wedge trials usually randomise at a cluster (hospital) level, rather than randomising patients individually, so this was the main difference from a standard trial. So patients were enrolled rather than randomised into the trial. Standard trials require patient consent before randomisation, but in this context, individual patient consent was not needed because the randomisation was performed at hospital level. Appropriate approvals for consent were sought through the hospitals. If patients presenting with heart attack symptoms at any of the hospitals were eligible for the trial (based on our pre-specified inclusion/exclusion criteria), then we had permission (at hospital level) to include them in the study and use their securely anonymised data.
How many patients were enrolled into the trial?
Approximately 48,000 patients were enrolled from 10 hospital sites in NHS Lothian (3 sites) and NHS Greater Glasgow and Clyde (7 sites), over a period of just under three years.
Which administrative data sets were used?
We used a total of 12 distinct data sources, which were a combination of general administrative datasets and datasets more specific to our area of research from locally held electronic health care records. Prescribing data were obtained from the Prescribing Information System, and we also used ECG data and general patient demographics. Trial-specific outcome data were obtained from the Scottish Morbidity Record (SMR01) and also from the register of deaths (National Records of Scotland). All data were captured separately for each Health Board; there is currently no amalgamated data source which holds all data. Health Boards are the owners of their own data. The main linking mechanism for these 12 data sources was the patient CHI (Community Health Index) number. To ensure patient anonymity, CHI numbers were securely encrypted prior to use.
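The interview does not say how the CHI numbers were encrypted. As a rough illustration of the general idea, the sketch below replaces an identifier with a keyed hash (HMAC-SHA256), which is one common way to produce a consistent pseudonym that cannot be reversed without the key. The function name, the key and the identifiers are all made up for this example, and keyed hashing is a stand-in here rather than the method the trial used.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of an identifier as a hex string.

    The same identifier and key always give the same pseudonym, so records
    can still be linked, but the original value is not stored.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"hypothetical-key-held-by-the-data-custodian"
    # Made-up identifiers, not real CHI numbers.
    for raw in ["0000000001", "0000000002", "0000000001"]:
        print(raw, "->", pseudonymise(raw, key)[:16], "...")
```

Because the first and third identifiers are the same, they map to the same pseudonym, which is what allows linkage across data sources without exposing the identifier itself.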
How did you get approval for these data sets? How long did this approvals process take?
Approvals were required at a number of levels. We required ethics approval, approval to use patient data without consent and Health and Social Care approval (through the Privacy Approvals Committee, predecessor to the Public Benefit Privacy Panel). There were also health board specific approvals required for local data to be released. In addition, we required data supplier approval. Finally, approval was needed for the data to be hosted on the Safe Haven platform. This process was long! It was ongoing throughout the duration of the trial. Although the data was being captured automatically via routine records, the final dataset wasn’t confirmed until relatively late on in the process due to the complexities of mapping locally held healthcare records. One of the advantages of the national datasets is that they are the same across all health boards.
Where were the data sets stored?
Datasets from NHS Lothian and NHS GG&C were supplied separately in their own Safe Havens. The combined dataset was hosted on the NHS Lothian Safe Haven space on the National Safe Haven analysis platform.
How did the linkage of the data sets happen?
The data sources from both health boards were combined and hosted on the National Safe Haven analysis platform. This wasn’t a straightforward process. Although we’d anticipated capturing exactly the same patient data across both health boards, the reality was quite different. Data were captured in different formats with different variable names and different definitions. So there was an unexpected element of data cleaning required before the data could effectively be merged into one large analysis dataset. The final linkage was done using the securely encrypted CHI number for each patient.
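The kind of cleaning described above, where two sources hold the same information under different names and formats, can be sketched as follows. Everything specific here (board labels, column names, sex codes, date formats) is hypothetical and chosen only to illustrate the idea; it is not the trial's actual data dictionary.

```python
from datetime import datetime

# Hypothetical mappings from each board's local column names to agreed names.
COLUMN_MAPS = {
    "board_a": {"chi_enc": "patient_id", "Sex": "sex", "adm_date": "admission_date"},
    "board_b": {"EncryptedCHI": "patient_id", "gender": "sex", "DateOfAdmission": "admission_date"},
}
# Hypothetical local codings mapped onto one shared definition.
SEX_CODES = {"m": "male", "male": "male", "1": "male",
             "f": "female", "female": "female", "2": "female"}
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each known date format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def harmonise(record, board):
    """Rename columns and recode values so records from any board look alike."""
    renamed = {COLUMN_MAPS[board][name]: value for name, value in record.items()}
    renamed["sex"] = SEX_CODES[str(renamed["sex"]).strip().lower()]
    renamed["admission_date"] = parse_date(renamed["admission_date"])
    renamed["board"] = board
    return renamed


def combine(extracts):
    """Stack harmonised records from every board into one analysis list."""
    return [harmonise(record, board)
            for board, records in extracts.items()
            for record in records]


if __name__ == "__main__":
    extracts = {
        "board_a": [{"chi_enc": "ab12", "Sex": "F", "adm_date": "2015-02-03"}],
        "board_b": [{"EncryptedCHI": "cd34", "gender": 1, "DateOfAdmission": "03/02/2015"}],
    }
    for row in combine(extracts):
        print(row)
```

An unexpected column name or code raises an error rather than passing silently, which is usually what you want when validating extracts from more than one source.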
What do you see as the major benefits of using administrative data in this setting?
Use of administrative data in this context is a more efficient process: fewer resources are spent on the administrative aspects of trial enrolment, e.g. capturing demographic details such as age, sex, postcode or medical history. Using administrative data also gave us the opportunity to research a large, representative patient population, in comparison to an RCT, where a strict pre-specified population that is not necessarily representative of the target population is studied.
Overall, what were the major challenges of the study?
From the data side of things, ensuring the correct data was extracted was difficult. The diagram above is a very over-simplified view of what happened! The reality of picking up the required variables from two separate health boards which capture data very differently was difficult. Another challenging aspect was ensuring that a patient wasn’t enrolled more than once in the study. Patients can present in any hospital with heart attack symptoms more than once, so we needed to ensure they weren’t included in the study each time they came to hospital. This required a de-duplication algorithm using encrypted and de-identified patient data.
However, I think the biggest challenge was for those in the team tasked with obtaining the correct approvals. It was underestimated how complex this would be. While approval for the national datasets was straightforward and the eDRIS team were very helpful, processes for locally held data at the time of trial set-up were not established. Legislation around patient data confidentiality was continually changing, so we were faced with keeping abreast of new legislation as time progressed. The Safe Haven networks are now more established and, hopefully, the processes are more straightforward.
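The interview does not describe the de-duplication algorithm itself. One simple rule that fits the description, keeping only each patient's first presentation, can be sketched like this; the rule, the function name and the example identifiers are assumptions for illustration only.

```python
from datetime import date


def first_presentations(presentations):
    """Keep only the earliest presentation for each pseudonymised patient.

    presentations: iterable of (patient_id, presentation_date) pairs, where
    patient_id is an already de-identified key, never a raw identifier.
    Returns a dict mapping patient_id to the earliest date seen.
    """
    earliest = {}
    for patient_id, presented_on in presentations:
        if patient_id not in earliest or presented_on < earliest[patient_id]:
            earliest[patient_id] = presented_on
    return earliest


if __name__ == "__main__":
    visits = [
        ("p1", date(2014, 5, 2)),
        ("p2", date(2014, 6, 9)),
        ("p1", date(2013, 11, 20)),  # same patient, earlier visit elsewhere
    ]
    print(first_presentations(visits))
```

Because the comparison is on the de-identified key, the same logic works across hospitals and health boards as long as the key is generated consistently.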
Is there anything you would do differently next time?
I think the data validation aspect of the trial is crucial. Ideally we would have spent more time on this in order to ensure the data was as correct as possible. Involving the clinical team much sooner in this process would have helped; they have a really important role to play in terms of ensuring the data picked up makes sense from a clinical perspective. For High-STEACS, access to the data was highly restricted and did not include the clinical team. Many of the data discrepancies were only picked up at the final review stage, once data and results had been released out of the Safe Haven area. Working within the Safe Haven environment creates time lags on both sides of the process: importing data into the Safe Haven and exporting results at the end both take time. We hadn’t considered this time lag when working to tight timelines.
Do you know if anyone is using the learning from this trial for future trials of this kind?
The High-STEACS trial was directly followed by the HiSTORIC trial, addressing similar research questions and using many of the same data sources. So we have been through the loop again, which has made for a more streamlined process. Other trials within ECTU are also making use of the learnings from High-STEACS, particularly from the governance and approvals side of things.
Thanks for sharing this with us Catriona! It is great to see that administrative data are being utilised alongside clinical trials in Scotland. It is also interesting to hear that despite being part of a trials unit like ECTU, the High-STEACS team still faced many of the same challenges that we eCRUSADers have experienced when using administrative data for research. In particular, we can relate to the issues of permissions, timing and working within the Safe Haven environment. Overall, it seems that the timing issues were due to the use of the locally held data rather than using the national data.
Brief overview of David's research
Using the linked data set described above, the focus of my research has been investigating the association between multimorbidity (more than one long-term condition) and social care receipt. I am also analysing interactions between health and social care services, with a particular interest in unscheduled care. Good social care data has been difficult to come by in the past, not just in Scotland but internationally. I have been lucky to be one of the first group of researchers to get access to the Social Care Survey collected by the Scottish Government in a format that can be linked to health-based data sources.
So far, provisional results show us that increasing age and severity of multimorbidity are associated with higher social care receipt. This was anticipated, but we have never been able to show it empirically before the cross-sectoral linkage. We have also been able to describe the receipt of social care by socioeconomic position (SEP) using the Scottish Index of Multiple Deprivation (SIMD). This is new and, to my knowledge, hasn’t been described elsewhere on such a large scale. Here we find that those with lower SEP are more likely to receive social care. (All these patterns are shown in the figure below). However, due to a lack of good measures, we can’t tell if the provision of care matches need for care.
My latest piece of work has been looking at whether receipt of social care influences unplanned admission to hospital. Using time-to-event (survival) analysis we can see that, for those over 65, people who receive social care are twice as likely to have an unplanned admission (again these results are provisional at the moment).
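For readers new to time-to-event analysis, the sketch below shows the Kaplan-Meier estimator, one of its basic building blocks. The post does not say which model produced the provisional result above, so this is a generic illustration on made-up follow-up times, not a reconstruction of that analysis.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with an event.

    durations: follow-up time for each person.
    observed:  True if the event (e.g. an unplanned admission) happened at
               that time, False if follow-up simply ended (censored).
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    event_times = sorted({t for t, seen in zip(durations, observed) if seen})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone still under follow-up just before time t is at risk.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, seen in zip(durations, observed) if d == t and seen)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Made-up follow-up times in months; False marks a censored person.
    for time, prob in kaplan_meier([1, 2, 3, 4], [True, False, True, True]):
        print(f"month {time}: estimated proportion without the event = {prob:.3f}")
```

The key point is that censored people still count towards the at-risk group until their follow-up ends, which is what separates survival methods from simply comparing proportions.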
Summary of challenges faced
The barriers I have faced are, no doubt, similar to those faced by others using linked data, the main one being time. Approvals, extraction, linkage etc. all take considerable time and as a researcher you are not in control of these timescales. A good example is shown by a sub-project for my PhD which was to use social care data from one local authority area only. The council in question were exceptionally helpful and keen to share data. They were very patient whilst I organised ethics and approvals on the academic side. However, by the time I was ready to talk data sharing agreements they had operational pressures (specifically the 2017 local elections) which tied up their legal team. After this we were all hopeful about making progress, but a certain Prime Minister went for a walk in the woods at Easter and decided to call a general election! Cue another 6-week delay until the legal team could start negotiating an agreement. We eventually got there, but this illustrates that the data controllers are at the mercy of higher forces as well and it is impossible to set meaningful deadlines.
I am very fortunate to be in a position to keep working with my PhD data in my current role and keep asking questions of the large amount of data we have. However, I have moved university in order to do this. This means I now have to repeat the process of ethics, data sharing agreements, privacy impact assessments etc. This is absolutely necessary as my current employers need to make sure that all legal aspects are covered, but there is nothing more soul-destroying than recreating the (significant) amount of work that goes into the required forms (initially completed two years previously). Fortunately, work is afoot at the Scottish Government to make this process obsolete and centralise access to research data sets; however, this is still in early stages and we are currently unsure as to when this will be operational or what exactly will be available.
For now, the pain must endure!
My reflections
Although there are difficulties in using administrative data for research purposes and delays can be frustrating at times, it is still (incredibly) a really rewarding process. The ability to gain new insights from previously unseen data is something that should excite any researcher. More importantly, data linkage offers the potential to improve society by answering questions that can’t be asked with traditional methods. Well worth an extra ethics form (even if I grumble about it!).
The conference, organised by Administrative Data Research Wales (ADRW) at the University of Swansea and sponsored by Administrative Data Research UK (ADRUK), the Economic and Social Research Council (ESRC), and the Welsh Government, had the central theme of ‘Public data, for Public Good’. A theme which, most appropriately, reminds us that whilst carrying out our research, the data we are using belongs to the public and we must hold this in the forefront of our minds as we use it to generate better outcomes for them.
The three days were jam-packed with plenary keynotes, parallel sessions, rapid fire sessions, a visit to Cardiff Castle and, for the super geeks, a whole lot of Rubik’s Cubing 🤓
Unlike some other international conferences I have been to, where it can be difficult to see how research from one country translates to your own, ADR 2019 was highly relevant throughout. In fact, a clear takeaway from the conference was the message that there is a huge amount to be learned from how things are done elsewhere.
There were so many talks I wanted to go to and as always is the way with parallel sessions, it simply wasn’t possible to get to them all. I decided to try and attend as many as possible which focussed on administrative data infrastructure and ethics in using public data.
In this post, I offer a summary of take home messages and expand on a couple of the keynote talks and parallel session talks of interest in this area. I could have written a whole post on the excellent work presented from researchers in Scotland, including some from fellow eCRUSADers, but alas this will have to wait for another time!
ADR 2019 Take Home Messages
- The potential of administrative data for research is huge. Especially in Scotland where linkage across several research domains is possible.
- There is a general movement towards the use of large data repositories (or data lakes/data lochs/integrated data systems; perhaps we need to agree on one definition?) of ready-linked data, which will speed up access for researchers whilst maintaining public privacy and ultimately make better use of public data for public good.
- Issues with data access, particularly concerning timing, are not unique to Scotland.
- Some countries seem to be further forward than Scotland and the rest of the UK in this regard (notably Australia and Canada) and there is much to be learned from work going on around the world.
- Whilst the message of public trust and transparency was front and centre throughout the conference, I felt there was little demonstration of how this is being done in practice and how, beyond using public data safely, researchers can contribute directly to building that trust.
- Don’t be fooled by the ADR Rubik’s cube: it is a lot harder than it initially looks!
At the end of the three days, it was great to see Michael Fleming, researcher from the University of Glasgow, receiving the Best Paper Award for Evidence to Support Policy Making on his work using linked education and health data to explore outcomes for children treated for chronic conditions. Before presenting Michael with his award, Emma Gordon, Director of ADRUK, acknowledged the long wait (from memory just under 2 years) before Michael got access to the data for his research and asked the audience:
“Can you imagine having to wait so long for data?”
Sadly, we certainly can. In fact, there are a considerable number of us here in Scotland (and almost certainly elsewhere) who have.
It was great to hear Emma highlight this and I do think that the conference really sent a message of hope to eCRUSADers and researchers more generally, that things are improving in Scotland. It was certainly motivating to see the future of administrative data research already being put into practice in many countries around the world. But, there is still a long way to go.
In the meantime, you should definitely join eCRUSADers to hear the latest on the administrative data front and get in touch to share your experiences so that we can all learn from them.
Hit subscribe at the top of this page!
Keynote talks to highlight
Garry Coleman, Associate Director of Data Access at NHS Digital
The first keynote presentation on day one was given by NHS Digital’s Garry Coleman who, despite being likened to Scrooge and Smaug the dragon, fiercely guarding NHS administrative data, outlined the suite of changes that NHS Digital have made over the last year in order to improve access to NHS data for researchers. These included the introduction of a fast stream service for repeated applications and those with precedent, published ‘standards’ to help researchers know what is expected from their application, and the establishment of the Data Access Environment (DAE). DAE is the new cloud technology in England whereby researchers can access patient data for research without the need for the data to leave NHS Digital. The platform went live in May 2019 and aims to provide researchers with faster access to ready-linked data sets with built-in tools for more powerful data visualisation and analysis. There’s a YouTube video on it here. It all sounds very good, as well as very familiar. I wonder how this platform will compare to Research Data Scotland?
Whilst ensuring public trust in all that we do with public data was at the heart of Garry’s talk, it was not clear how much public engagement, if any, NHS Digital has done around the use of the new DAE system. I’ve had a quick peruse of the NHS Digital website and can’t see any evidence of it on there either. Perhaps it is there somewhere and I am missing it? In any case, given the need to be transparent and ensure that public trust is at the heart of using administrative data for research, we perhaps need more than the hope that the public are aware and are happy for this to be going ahead.
What was clear from Garry’s talk was that he was actively seeking feedback from the research community on how they have found the data access processes of NHS Digital, and he expressed a genuine interest in making things easier for researchers.
John Pullinger, Former Head of the Government Statistical Service and Chief Executive of the UK Statistics Authority.
On day two, the first plenary keynote was from John Pullinger, who offered his thoughts on “Lots of lovely numbers but why does everyone make it so difficult?” Clearly, John has an immense amount of experience in this field and he did an excellent job of taking us on a journey with him from the 70s, when he first began working with the limited administrative data that was available then, to the present day, where administrative data are all around us. John’s message was clear: for us to have a social licence to operate with the public’s data, it is incumbent on us to earn their trust. This is in fact just as important as our research itself. He highlighted the importance of seeing legislation around the use of public data, such as GDPR, as an enabler of research rather than an impediment. Finally, John pointed out the need to be realistic about what the data can tell us and not to say more than the evidence supports. Once again, this comes back to the need to earn the trust of the public and not to do anything that might undermine that trust.
For me, John really instilled in my mind the fundamental need to remember whose data we are using and that we are very much still on the journey to earning their trust.
Parallel sessions to highlight
In this talk, Robert McMillan talked about the Georgia Policy Labs, a ‘data lake’ which hosts many ready linkable administrative data sets for policy makers and researchers to access and analyse to conduct research on a number of key policy areas. Robert highlighted the secure cloud infrastructure, separation of duties and secure data rooms which ensure that data are stored and used in a safe way. He also mentioned the ‘master data sharing agreement’ which they have to allow access to this data lake. Time was tight so there wasn’t really time to go into detail on this, though I am sure the Scottish Government would be interested to know more as they work towards implementing Research Data Scotland. In her talk, Anna Ferrante discussed her work in merging the Data Linkage Western Australia and the Centre for Data linkage to form the Population Health Research Network (PHRN). The PHRN is a national network of data centres which links data collected across Australia on the entire population. Its infrastructure allows for the safe and secure linkage of data collections across a wide range of sources. Like some of the other talks throughout the conference, Anna talked about the Bloom filter structure PHRN uses to probabilistically and anonymously link between administrative datasets. Not surprisingly given the amount of research that comes out of Australia using linked administrative data, Anna’s was one of many Australian talks which highlighted the level of maturity of the administrative data infrastructure in Australia compared to Scotland. Michael Schull talked about a project he is involved in that is building a partnership between health service researchers and computer scientists to develop a high-performance computing platform for the analysis of large linked administrative datasets. 
The goal of the partnership is to use artificial intelligence and machine learning to improve health and health care, which of course requires an infrastructure that has the power to store and manage large quantities of data. Michael talked about some of the things they have learned from working with computer scientists. Michael stated that the hardware for this infrastructure was in fact the easy part.... Della Jenkins presented on the work being carried out at Actionable Intelligence for Social Policy (AIS), an organisation which works with state and local governments to implement Integrated Data Systems that link administrative data across government agencies. Della’s talk reiterated many of the messages that John Pullinger had highlighted in his Key Note speech earlier in the day. Namely, as the emergence of initiatives for integrated administrative data (AKA linked administrative data) in research continues to grow, it is vital that we build awareness and infrastructure with public involvement every step of the way. The group have a very useful report and toolkit on their website: Tools for Talking (and Listening) About Data Privacy for Integrated Data Systems. Although the report is aimed at government agencies and their partners who are using linked administrative data, the content is helpful more generally in terms of steps to develop a social licence for using linked public data. It’s well worth a look. Andy Boyd gave a great talk on work that was carried out by Closer and NHS Digital looking at the possibility of different infrastructures for the onward sharing of longitudinal study data that is linked to administrative records, which currently cannot be released outside of the cohort study institution. 
The work identified five onward data-sharing models and concluded that, although greater clarity is needed in order to share anonymised data effectively and to do so internationally, there are opportunities for development, and the large community of longitudinal cohort studies in the UK might be able to facilitate part of those processes. Full report available here.
One of the final talks I went to was from Mike Robling at the University of Cardiff. I’d been looking forward to this talk because I had already heard of the CENTRIC study, work which hopes to develop “training for UK researchers that enhances their understanding of public perspectives and governance requirements and improves their practice when working with routine data”. In his talk, Mike outlined the results from the focus groups with stakeholders, the workshops with members of the public, and the online survey filled in by researchers. In summary, the study found that there is both a need and an appetite for training researchers in public engagement and in the complex regulations and requirements around using routine data for research. I very much look forward to seeing the training resources that CENTRIC produce and think they will help to fill the existing gap when it comes to researchers improving public engagement and public trust.
One final reflection on privacy
This post might have been more visually exciting if I had been able to fill it with photographs of the speakers I went to see. Sadly, it isn’t, because none of the speakers said one way or the other whether they were happy to be photographed. Given the nature of the conference, I decided to err on the side of caution and assume that this meant the speakers had not given consent to having their photographs taken and plastered on social media. Of course, not everyone shared this view, which then made me wonder if I was silly not to take any photographs in the first place. Or maybe I should have asked each speaker beforehand if they would mind. And so I wonder, in the name of transparency, might it be possible for speakers to quickly mention at the beginning of their talks if they are happy to be photographed? Just an idea. Here's me giving my talk 🙂
The full 2019 ADR conference proceedings are available here, where you can access abstracts of all talks from the three days. Thanks to all of the organisers for putting on such a great event and to the speakers for sharing their exciting work.
As always, I would love to hear your thoughts on this so please comment/share/email me!
Brief overview of Catherine's research
My research investigates how we can assess the impact of oncology clinical trials. It is important to be able to demonstrate that trials testing new oncology treatments are having real-life impacts such as changing practice, changing health and saving money. Analysing this impact helps us to identify which trials are making real-world differences and, subsequently, to design more impactful trials in the future. I am conducting a case study to assess the impact of the Short Course Oncology Treatment (SCOT) trial (1). This study investigated whether treating patients with a diagnosis of colorectal cancer with 3 months of chemotherapy following surgery was non-inferior to treating with 6 months of chemotherapy. The trial results have shown that giving a shorter duration of treatment does not make a significant difference to the percentage of patients who are disease free at 3 years. Patients in the 3-month arm of the trial also had significantly fewer side effects from the treatment, especially with regard to peripheral nerve damage. Gaining access to GGC Chemocare data, linked to QPI and SMR06 data sets, has enabled me to assess the impact of the SCOT trial on changing clinical practice. There was a significant change in prescribing practices for patients with colorectal cancer after the results of the SCOT trial were publicised. This will translate to a cost saving for the GGC health board and will result in fewer patients in GGC experiencing debilitating peripheral nerve damage as a result of their adjuvant chemotherapy treatment. A poster with the preliminary results of this analysis was presented at National Cancer Research Institute 2018. In the next stages of my project, I plan to investigate the impact of the SCOT trial on prescribing on a national scale by using routinely collected chemotherapy data from the three cancer networks in Scotland (South East Scotland (SCAN), West of Scotland (WOSCAN) and North of Scotland (NOSCAN)).
My project is running alongside, and will be using a sub-set of, the COloRECTal Repository (CORECT-R) data at the University of Edinburgh (part of an even wider project at the University of Leeds). PBPP approval for my project was granted in June 2018; however, I do not yet have access to this data. Below, I outline some of the lessons I have learned during the application to access this national data.
Summary of challenges faced
(1) When data is held on databases outside Information Services Division (ISD), often at a local or regional level, the process of data linkage becomes more challenging and costly. Often, the expertise to extract and transfer data does not exist at a local level, and there are no working relationships between local analysts and those coordinating data linkage centrally. Specifically, there are few examples of previous linkage of chemotherapy prescribing data (held locally) on a national scale.
(2) Data linkage requires a pre-specified list of the data variables from each data set. Often these lists are not publicly available, or even defined, and it can be time consuming and difficult to generate the variable lists required for the data linkage process.
(3) Evidence of funding to perform data linkage and make use of national linkage services is often required for PBPP approval. However, depending on the time between submission and the data linkage occurring, it can be several years before the funds are used.
(4) If a researcher is funded for a specified period, the time taken for PBPP approval and data acquisition means that the researcher may not have an opportunity to analyse the data. There is also a risk that the research question will be less relevant than at the time of submission.
My reflections
There is huge potential to use routine data to improve the way we do clinical trials and ultimately to improve outcomes for patients. The potential to pioneer the use of routine data for research purposes in Scotland is obvious; however, the practicalities of currently accessing and using this data are not straightforward. My advice for anyone planning to work with national Scottish data, based on my experience:
- Apply for access to data early and be aware that data acquisition may take longer than expected depending on your project.
- Think about the costs of data linkage, especially if you want to link data sets that are not currently stored in ISD. The size and subsequent cost of a data linkage project is often based on the number of databases used (especially those outside ISD), rather than on the size of the finalised database.
- Define which variables from the data set you will require early and be clear why you require each variable for your analysis.
(1) Iveson TJ, Kerr RS, Saunders MP, Cassidy J, Hollander NH, Tabernero J, et al. 3 versus 6 months of adjuvant oxaliplatin-fluoropyrimidine combination therapy for colorectal cancer (SCOT): an international, randomised, phase 3, non-inferiority trial. The Lancet Oncology. 2018;19(4):562-78.
Brief overview of Matthew's research
My work has mainly focused on using data linkage to reconstruct the life courses of individuals who took part in the Scottish Mental Survey 1947, a nation-wide survey of age-11 thinking skills conducted in Scottish schools in 1947. These individuals, now aged over 80 years old, have experienced a lifetime of changes in health and socioeconomic circumstances, and offer an extremely important opportunity for examining how early-life circumstances can have a lasting impact on health and wellbeing across the life course. So far, I have used linked data to show that individuals with higher childhood cognitive ability, better socioeconomic circumstances and more education are less likely to die, less likely to report a long-term function-limiting illness in older age, more likely to be economically active in later life, more likely to retire later and of their own volition, and so on. I’ve also tried to establish the mechanisms by which childhood advantage affects health and wellbeing. I am currently waiting for data to examine whether factors from across the life course can be used to predict whether someone will require care in later life (including the type of care required), how well individuals can recover from a stroke, and whether someone will respond to a given antidepressant medication.
Summary of challenges faced
One of the biggest issues I faced was timing. In some instances I have been waiting over 3 years for data. There have been several delays along the way, due to changes to the data access process (both over time and between organisations), queues for submitting forms to data controllers, changes to the legal landscape for data sharing (such as GDPR) and loss of submitted paperwork. The problem is that these delays are relatively common, and they result in a timescale that is not achievable under normal funding conditions. Since most early-career researchers find themselves on short-term contracts, they risk not getting data before their contracts expire, and since they are judged more than most on their productivity, these delays can seriously hamper a researcher’s career trajectory. The delays also highlight the fragility of the data access process. Getting to know key people in each organisation is one of the best ways to get through the process smoothly, but if these people leave, their expertise often goes with them. One example is that, during my project, the lawyer in charge of reviewing requests for census data left. Their replacement was understandably less confident about data sharing, and decided to re-review the laws surrounding the use of census data for research. Data controllers and other involved organisations need to ensure that knowledge and expertise are distributed across their teams, and need to invest in the infrastructure and staff that can ensure a robust system for the future.
Thoughts for early-career researchers
While organisations need to make things easier, researchers themselves need to manage their own expectations: gaining access to routinely collected data, especially linked data, takes a very significant amount of time and effort. It’s worth planning well in advance and making sure that you can stay busy and productive while you wait for data to arrive. It’s also worth thinking about pre-linked datasets such as the Scottish Longitudinal Study if you’re short on time. Regardless of how you engage with routinely collected data and how long it takes, bear in mind that you’re learning an incredibly rare and valuable set of skills. Things are slowly getting better, faster and easier, but organisations are still fine-tuning their processes and a lot of the data is still new to the research scene. If you do have the time, and the perseverance, then administrative data is an extremely powerful tool that will help you to answer the largest and most difficult questions faced by society.
Is this blog going to be of interest to me?
Answer the following questions:
- Do you work with or want to work with administrative data (Scottish or otherwise)?
- Do you want to hear about interesting research that is going on (in Scotland and further afield) which uses administrative data?
- Are you interested in possible training opportunities for working with sensitive and complicated administrative data sets?
Why is there a need for the eCRUSADers blog?
Currently, Scotland is in a unique position to produce population-level research due to the way it routinely collects information about Scots across a number of key domains (health, education, social care, etc.). Additionally, these data sets can be linked together, creating an invaluable source of information for social research, which could ultimately have a positive impact on the lives of people living in Scotland and further afield. However, navigating the administrative data landscape is complex, working with administrative data is tricky, and the resources with which to carry out these tasks are scarce. These issues are particularly challenging for Early Career Researchers (ECRs), who have limited time and often limited knowledge about how to traverse this landscape. The eCRUSADers blog will provide somewhere for them to start.
The purpose of the eCRUSADers blog?
The purpose of the eCRUSADers blog is three-fold:
- To provide a platform for the sharing of information and experiences
- To enhance our understanding of what is working and where there is room for improvement
- To encourage discussion around what can be done to keep Scotland on the trajectory of becoming a world leader in research using administrative data