Pete’s contribution to the promotion of the safe use of patient data in research is second to none, and it is a real privilege to have him offer his insights in this post. In the neat analogy that follows, Pete helps us get to the bottom of what sharing patient data is all about.
“We’ve been sharing information since before language was developed”
At a most basic level, creatures that live in groups, from insects to humans, share information between themselves, such as the presence of danger and the location of food, because it is a good way of protecting the group and helping it to flourish. Whilst living in caves, our gestures, grunts and groans gradually became more sophisticated, allowing us to share more detailed information, and these evolved into language. Yet even today, much of our communication (and therefore information) is still non-verbal. You can tell things about a person just from their facial expressions, how they sit or move their body, their tone and volume of voice, and their level of eye contact. We all do this, both subconsciously and consciously, to enhance the communication of our thoughts and feelings, and it helps us to form relationships and friendships. Acquiring information is, surely, the reason we send our children to school and why we study, and we exchange information about our thoughts and feelings when we socialise.
But when it comes to personal medical information this is, of course, a little different – or is it? Whilst many of us like to share some of this information, there may be some aspects that we feel we want to keep to ourselves. Of course, it is our right to keep that information to ourselves if we wish, or to tell a trusted person in confidence.
Medical data is just bits of information held electronically. But information, when held as data, can be easily shared with others, both to our benefit and, potentially, to our disadvantage. However, if that data is anonymised (in other words, all information that might identify us is removed) and it is added to information from thousands of other people, might we hold a more relaxed view? And if that data was only accessible by trusted people, authorised to access it only for a very specific and approved purpose, should we have any substantial concern?
As current or future patients, we benefit from improved treatments and services because previous patients shared their medical information. Do we not, in turn, have a moral obligation to share our information to benefit our children, grand-children and future generations of humanity?
I believe that, provided the legally required data security controls are in place and those that hold the data are open and transparent (about who accesses the data and why), there is no logical reason why we should not share our anonymised medical data – for the benefit of us all.
In Drew’s account of working with administrative data, the familiar challenges of timing, unforeseen circumstances and working in a safe setting rear their heads. However, like the other researchers we have heard from, Drew retains a ‘glass half full’ attitude and an enduring determination to press on in spite of these challenges. In particular, Drew points out the useful discoveries he and his colleagues made whilst waiting for data access, which would ultimately improve their research output in the long run. This point rings true for me especially; after all, eCRUSADers wouldn’t exist if it weren’t for the wait for data.
Over to you Drew:
Overview of my research
I’ve yet to do much work with our main variables of interest, as we were only recently granted access to a few of the data sets we requested. However, while we were obtaining and waiting for access, we followed some side avenues, partly to prepare ourselves for working with the data, and partly because we thought of research questions that were interesting in their own right. For example, we are interested in how early life socioeconomic conditions, commonly represented by the father’s occupational social class, relate to mental health later in life. However, our data set is based on the participants of the Scottish Mental Survey 1947; these individuals were all born in 1936, and because of World War II, reports of fathers’ occupations from censuses carried out during participants’ early lives are unreliable, unrepresentative, and often missing. To improve our data set, we dug deeper into the data we were aiming to link, pulling out additional historical occupation information and coding these data ourselves. This in turn led to a machine learning approach to classifying historical occupation records into social classes, which others working with such data can use in the future. It goes to show how much interesting, useful work you can wind up doing along the way!
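The post does not describe the actual model Drew’s team built, but to give a flavour of what classifying free-text occupation records into social classes can look like, here is a minimal, purely illustrative sketch using scikit-learn. The occupation strings and class labels below are hypothetical examples (the labels loosely follow the Registrar General’s social class scheme); character n-grams are used because historical occupation titles are often variably spelled.

```python
# Illustrative sketch only: not the actual method used in the research described.
# Maps free-text occupation strings to (hypothetical) social class labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical hand-coded training examples (occupation -> social class label)
occupations = ["coal miner", "general practitioner", "dock labourer",
               "school teacher", "solicitor", "farm servant"]
classes = ["IV", "I", "V", "II", "I", "IV"]

# Character n-grams within word boundaries tolerate spelling variation
# common in historical records (e.g. "labourer" vs "laborer").
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(occupations, classes)

# Classify an unseen occupation string
print(model.predict(["colliery worker"]))
```

In practice a pipeline like this would be trained on thousands of hand-coded records and validated against an established occupational coding scheme, which is where the manual coding work described above pays off.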
Summary of any challenges faced
The process is long and convoluted, with hold-ups at seemingly every turn. I was fortunate because I joined the project relatively late, although when I joined we thought we would have access to the data in a few months’ time, rather than two years later. I did what I could to help with the application processes, but ultimately this work predominantly falls on the shoulders of a single person, and most of one’s time in this area is spent not working on forms, but waiting for other people to get back to you.
A large amount of time and effort goes into processing and preparing data before linkage, but that does not mean that the data are clean and easy to work with once you get a hold of them. You are likely going to need to spend significant time cleaning and otherwise processing your data before you can analyse them.
There are advantages to having to lay out analyses in advance during the application process: essentially, this forces you to pre-register your work, which is an important step in doing reproducible science. However, a run-of-the-mill pre-registration allows considerable flexibility, and this is not so much the case with the analyses we plan for our data. All output must be checked for privacy and security concerns, so if we want to tweak an analysis or run a sensitivity analysis, for instance at the request of a reviewer, every additional analysis that we want to take out of the safe haven environment needs to be checked, and that process can take weeks.
Thoughts for fellow and future eCRUSADers
You ought to think very carefully about timing; in particular, you ought to expect significant delays. If possible, plan for multiple scenarios, and make sure you have meaningful work you can do while you wait out the access process. The processes for accessing data are supposedly being streamlined and improved, but it is worth investing in your relationships with the people along the data access pipeline, as they are best placed to help you manage your expectations.
It can be a difficult and frustrating area to work in, but there are big potential payoffs, including large sample sizes and long-term follow-up, sometimes across many decades. These are types of data that sometimes cannot be obtained in any other way, and this allows for novel, meaningful research questions to be asked and answered.
Public Benefit Privacy Panel Timelines
Preparation of PBPP application: 01/06/2018 – 21/08/2018 (about 12 weeks)
Submission to initial PBPP approval: 05/10/2018 (about 6 weeks)
PBPP approval to data access: 16/06/2020 (about 1 year and 8 months)
In the second of our series, ‘People Make Data’, we hear from use MY data, an independent UK movement that was set up in June 2015 by a number of patients, carers and relatives to address the slowdown in cancer research in the wake of the Care.data programme. Since then, the remit of use MY data has grown and they now cover all types of patient data.
Alison Stone (Coordinator) and Chris Carrigan (Expert Data Adviser) are part of the use MY data Secretariat, which works on behalf of the use MY data members. Alison and Chris have kindly put this post together to share with eCRUSADers a little about how use MY data started, the work that they do and, most helpfully, some key messages for early career researchers to take away.
Involving patients and maintaining transparency are paramount to ensuring public trust. Any researcher who is working with patient data should make sure they do just that and a great place to start is by looking at the great work being carried out by use MY data. The importance of doing so has been duly amplified by the current COVID-19 pandemic and there is a need to learn from past mistakes at such a time.
Thank you very much to Alison and Chris for sharing this post with us.
Bringing the patient voice to discussions about patient data
Back in the middle of 2016, NHS England announced the closure of the Care.data programme. You might wonder why this is relevant to eCRUSADers, or indeed why we’d start a blog post that way. We should explain.
In October 2013 NHS England set out their intentions to link data, for the first time, from different NHS providers spanning primary and secondary providers, “supporting studies that identify patterns in diseases, responses to different treatments, and the effectiveness of different services.”
However well-intentioned the concept, by February 2014 ministers had halted the programme, following several media stories about data being sold and amidst concerns about lack of awareness, lack of consultation and lack of clarity about how the data would be used.
By 2015, around 1.5 million patient records had been flagged as opting out of Care.data. The loss of public trust, negative media coverage and huge risk-aversion among data controllers saw data access for researchers grind to a near, and in some cases complete, halt.
At the National Cancer Intelligence Network conference in Belfast in June 2015 the widespread dissatisfaction amongst researchers and patients came to a head, in a panel Q&A with the chief medical officers of the UK. Data access for cancer research had effectively stopped. But researchers needed to use the data, and patients wanted their data to be used.
The birth of use MY data
In response to these concerns, patients, carers and relatives gathered together, over a lunch time session. The session was jointly hosted by Cancer52, Cancer Research UK and the National Cancer Intelligence Network.
Its focus was on how current cancer patients could help turn their data into the best outcomes for future cancer patients. ‘Donate your data’ was proposed – an organisation where patients would willingly give their data for this purpose. The patients attending the session showed unanimous support for the concept of donating their data.
A Working Group discussed the practical ways forward. Most felt that the name ‘Donate Your Data’ was not accurate – as cancer patients said their data had already been taken, and the focus should therefore be on how their data would/could be used. This led to the name change of use MY data. The movement rapidly expanded to include patients, relatives and carers from all disease areas.
From its beginnings in 2015, use MY data has evolved through a rolling programme of engagement and communications. We have hosted at least two patient data workshops each year, alongside a programme of campaigns, networking, and presence at national events. We are increasingly being asked to 1) advise other organisations about patient engagement and 2) engage with organisations who hold patient data.
Most recently we have moved to deliver a series of webinars, open to all, but designed and delivered by and with our members.
“The strength of use MY data is the multiple patient voices that it brings to the discussion.”
Since the outset, our members have stated some clear, common, basic principles about how patient data should be used. But with a wide and varied membership, not all members agree on all areas, as you might expect.
We very much see this spectrum of views as an asset, as we believe that we need a diverse set of views to help in discussions and decision making. So, whilst some traditional organisations might see conflicting views as something of a challenge, we see this as a real benefit.
There are several principles which underpin use MY data and are about data being used for the benefit of patients and society:
Simple access to data
The patient voice
Recognition of the patient (citation).
“As a data user, most of our work relies on using data – it’s important we are transparent about how we use patient data.”
Anyone using patient data must ensure that this happens in a way that will engender trust. It needs to be done openly and transparently, be subject to challenge and allow individuals a choice in how their data is used.
The concept of transparency has grown in prominence but translating the concept into action is often seen as difficult.
As a researcher, what does it mean to be transparent? use MY data has produced a checklist, which we have encouraged people to use:
Accessible – easy access to information
Understandable – the right language for the audience
Relevant – addresses audience concerns
Useable – in a form that meets the audience needs
Assessable – is checkable/provides sufficient detail
Being as proactive with ‘bad news’ as with ‘good news’
Being timely with communication.
A particular frustration for members is the inconsistency in how the controls placed around patient data are interpreted and applied. If all parts of the UK are subject to the same laws (the General Data Protection Regulation (GDPR) and Common Law), why do the different parts of the UK implement different policies about data access?
“Is the patient voice heard enough so that we learn from patients’ experience? To be honest, no.”
We believe that the patient voice should be included in all discussions about patient data and are actively working to make this happen.
It can be hard for an individual voice to be heard, so one very practical thing that use MY data does is to collate the voices of patients in response to national consultations. That has included consultations on topics such as Artificial Intelligence, Trusted Research Environments and Private Healthcare Information, from organisations including the Information Commissioner’s Office, the National Data Guardian for England and the MHRA.
Because it has a philosophy of positive engagement and co-design, use MY data is seen increasingly as a trusted group through which patient opinions, views and indeed speakers can be sourced. Another practical thing we do is to work with the organisers of large events focused on patient data, and lobby for patient speaker(s) to be included.
Similarly, we highlight engagement opportunities to use MY data members – to sit on groups, panels and committees that are concerned with patient data.
“I think the policy makers should get out the message of the benefits of using the data and the controls around that data so that people feel better reassured.”
We have created a library of ‘case studies’ where patients, relatives and carers speak directly to camera, highlighting the benefits of using patient data correctly. We have seen these short videos being used in several different places, in the UK and beyond.
Then there is the point about recognition. Early on our members created the Patient Data Citation. The Citation acknowledges the use of patient data in analysis or research, highlighting that research is only possible because of patients:
“This work uses data provided by patients and collected by the NHS as part of their care and support”
Public Health England began to use the statement and it was then adopted by Understanding Patient Data, who were instrumental in spreading the message. It is now seen as a standard and has been widely adopted by national bodies, academia and commercial users.
“We need data and tissue and protein samples to enable us to do research which will lead to earlier diagnosis and better outcomes.”
Another example of action we have taken is The Issue with Tissue campaign. At our May 2018 workshop, members discovered that there is extremely low use of donated human tissue samples – only approximately 15% of samples are ever used. Our members wanted to understand this, highlight areas for improvement and lobby for these improvements to be made.
We are working jointly with partners at the Medicines Discovery Catapult and Incisive Health on this campaign, in which patients’ voices are directly influential. We have seen these voices and the report beginning to be used widely, despite the pressures of the pandemic response.
“Joining use MY data has helped me understand better the importance of my data. Research and all data should be shared.”
Our members are patient advocates who are either patients, relatives or carers.
We have another layer of membership – our associate members, who are united by their interest in supporting our work. These are clinicians, researchers, charity workers, academics, and public and commercial sector workers.
As an inclusive movement, we encourage new members from all backgrounds, so that we can collectively build confidence in the use of patient data, to save lives and improve outcomes.
Our current focus is on the uses of data in the COVID-19 response, and in particular on whether the positive advances made in data collection, linkage and access across all parts of the UK can be retained in a safe, open and transparent manner, improving the experience of researchers with the support of patients.
“We need to be clear and concise on the benefits, risks and how these will be managed, creating a clear strategy for engaging patients and the public.”
We will only realise the potential benefits of patient data by ensuring the patient voice has a fundamental role. And all this has to be done in an open and transparent way. That remains the aim for use MY data.
As one delegate from a recent workshop commented; “Despite the challenges, I think more could be done to help researchers and less blaming of them please.”
So, if we had to summarise some key points for early career researchers, we would give three things to remember and take away:
Don’t be afraid – the patient voice can often be the solution, not the problem
Be transparent in all that you do – the benefits of transparency outweigh the difficulties – and communicate this clearly.
Whenever you use patient data, please Say what you do, and do what you say.
In this post, we hear from Roger Halliday, Chief Statistician at the Scottish Government and lead on the new Research Data Scotland (RDS) service. RDS aims to make better use of existing administrative, public sector data in Scotland, with the ultimate goal of improving the wellbeing of the people of Scotland.
We managed to catch up with Roger and ask him a few questions on the current progress of RDS, how COVID-19 has impacted this progress, how researchers can engage more with government, and much more. Thanks to Roger and his team for answering our questions!
If you want to jump to a specific question you can click on the questions below. Otherwise, just keep scrolling!
Q: What are the main challenges being faced in Scotland in terms of reaching its full potential for producing quality research using routine data?
Scotland has been enabling high quality research in a secure and ethical way for many years. However, it can be unclear what public sector data is available for use in research and this data can be of unknown or poor quality. It can also take too long to access data with it being dispersed between and within public sector organisations.
Q: How do you think Scotland can contribute to research using administrative data?
Scotland has a rich and varied data ecosystem as well as access to a unique talent pool with the skills and experience to allow us to maximise the value of our data and deliver award-winning research projects. Thanks to investment from the Edinburgh and South East Scotland City Region Deal we are also investing in a 10-year data-driven innovation programme.
The Scottish Government, alongside Scotland’s leading academic institutions and public bodies, is committed to facilitating secure data sharing for research in the public good. We want to work with data controllers and users to improve the quality of data for research while also making access more cost-effective, faster and more streamlined. At the same time, we need to ensure there is ongoing trust, support and feedback from the public.
Q: How will Research Data Scotland help to overcome those challenges?
RDS’s overall mission is to improve economic, social and environmental wellbeing in Scotland by enabling access to, and linkage of, data about people, places and businesses for research in the public good.
To underpin RDS’s mission, we have outlined the following principles:
RDS will only enable access to data for research that is for the public good
RDS will ensure that researchers and RDS staff can only access data once an individual’s personal identity has been removed
RDS will ensure that all data about people, businesses or places is always kept in a controlled and secured environment
RDS will only create a dataset if it is requested for a research programme or study that is in the public good
All income that RDS generates will be re-invested into services to help researchers continue to access data
Firms that access public data for the public good through RDS will share any commercial benefits back into public services
RDS will be transparent about what data it provides access to and how it is being used for public benefit
In the development of RDS, as we are working through the service design, we’re speaking to researchers across Scotland to understand and test out approaches to addressing these challenges. These solutions may take some time to be realised, but RDS will put us on the right path and take leadership in seeing this through.
Q: How has the development of RDS been impacted by COVID-19?
In response to COVID-19, we have accelerated the development of RDS by delivering the COVID-19 research data service. We have achieved this by bringing together expertise, resource and capabilities from a range of existing data-led programmes across the public sector and by working closely with universities across Scotland. This service aims to provide consolidated guidance on accessing data, information governance and the analytical environment to support COVID-19 related research. It accelerates the work that was already underway to support the development of Research Data Scotland as well as ADR Scotland. This is an initial RDS baseline service to support Scotland’s response to the pandemic and will evolve as the management of the current public health emergency progresses.
RDS is building on and repurposing the existing data infrastructure in Scotland. This includes resources, expertise and capabilities offered by our service delivery partners and partner organisations, including Public Health Scotland (eDRIS), National Records of Scotland, The Edinburgh Parallel Computing Centre and HDR-UK, alongside accredited facilities, such as the Scottish National Safe Haven while also working in partnership with Scottish Universities.
Once fully established, RDS will provide a single point of access to help researchers access a suite of key data from across the public sector.
Q: How are you engaging with the public during the set-up of RDS?
We want to ensure that there is ongoing trust, support and feedback from the public as we build the service with a flexible and modular approach. To do this, we want to continue to gather feedback on its design and implementation from data controllers, users and the general public. We have set up a website – ResearchData.Scot – where we will share information on the new service, as well as opportunities for the public to provide input as they become available. There is a wealth of useful information and resources already provided by the eCRUSADers platform, and our intention is that RDS will complement the valuable work already done in this space. We are also working with the Scottish Centre for Administrative Data Research (SCADR) public panel to seek their feedback on RDS’s approach and objectives.
Q: Are there any other countries in particular that you think Scotland can look to as an exemplar in this area?
We have looked at a range of other countries including Wales, Singapore, Australia, Canada, Denmark and New Zealand in order to understand and learn from their experiences in developing solutions to similar challenges. This has shown us that a successful data-led research programme must be built with the support of data controllers and users throughout its development. Any new service must be cost-effective, faster and more streamlined and there must be ongoing trust, support and feedback from the public established throughout.
Q: How can we promote collaborative working between academic groups and analysts working within the public sector & government?
The Scottish Government and SCADR form the ADR Scotland partnership that aims to enable government policies and interventions to be informed by the best evidence available through the use and analysis of administrative data.
Specifically, ADR Scotland is developing a new model with the creation of curated, themed data sets that are maintained and used repeatedly to answer new and different questions. This is a sustainable research resource that represents greater value for money, and more efficient use of data already collected.
Research is divided into key themes called Strategic Impact Programmes (SIPs), which are designed to address the key social challenges identified in the Scottish Government’s National Performance Framework (NPF), as well as responding to policy priorities in the UK more broadly. We are committed to sharing our research findings in a form that is easily digestible and useable by government policy makers and wider society, as well as producing outputs for academics and the data community.
“Ultimately, health data is collected by people, from people and for people. If researchers want to be trusted with data, we should trust people to help us shape the rules so that the data revolution in healthcare benefits everyone.” Natalie Banner, Nature Medicine, 2020.
This post is the first of a series, in which we will hear from UK based organisations working in this area, and from patients themselves.
Background: using patient data in research
Patient data, for example hospital records and GP records, are collected as part of routine National Health Service (NHS) care. They constitute one of the largest sources of health data in existence. Over the years, researchers, policy makers and others have sought to harness their potential for evidence-based research, seeking to enhance our understanding of disease and improve patient care and service delivery. At no other time has using patient data for research been more in the spotlight than during the current COVID-19 pandemic.
As researchers, we have a duty to ensure that we recognise the individuals who sit behind that data. But even more than that, we should seek to involve patients in our research, because really, who understands what they have experienced better than them?
As an Early Career Researcher (ECR) working with patient data, and coming from a non-clinical background, appreciating the individuals behind the ‘numbers’ is not something that my training in Econometrics prepared me for. Of course, my primary motivation for pursuing a career in health research is to make a difference to individuals’ lives. Nonetheless, it is all too easy to become so buried in the methods, producing fancy charts and output displaying significance stars, that the people behind the numbers become blurred in the background.
The use of patient data by social scientists and ECRs, who are often limited in resources, contacts and time, is becoming more common. With this comes an increased need to ensure that those researchers know how to recognise and include the patient voice in their research, and how to be transparent about their uses of patient data.
In what follows are some questions and answers from UPD’s Communications Officer Grace Annan-Callcott, who kindly agreed to talk to eCRUSADers about using patient data in research and in particular about public/patient engagement.
A conversation with Understanding Patient Data
How much and what sort of public/patient engagement work does UPD do?
“We tend to commission public engagement work on issues around patient data. Our aim is to feed the insights we get through to policy and practice”.
Grace pointed me to a couple of recent things they have been working on. Firstly, the Fair Partnerships Report, one of UPD’s “largest pieces of engagement work to date, which looked into what the public thinks about different kinds of businesses and organisations using NHS data”.
Secondly, “a new project with the National Data Guardian and Sciencewise, which will explore ‘public benefit’ and shape new guidance to make public benefit assessments more consistent across both health & social care”.
So, how do patients feel about the use of their data?
“I’d suggest having a look at the fair partnerships report, or this deck where we collected lots of public attitudes research together in one place. It needs an update with a few bits of recent research, but many of the key messages still stand”.
The Fair Partnerships work was a mixed-methods public engagement programme consisting of round-table discussions, citizens’ juries and an online survey (completed by just over 2,000 adults from across the UK). A key finding of the report was that “all NHS data partnerships must aim to improve health and care”. I believe this point will resonate with many ECRs, who often struggle to answer the question “How will your research benefit the public?” Will our PhDs or first post-doc research projects actually translate into patient/public benefit? We can get ourselves all worked up when writing applications to use patient data, trying to demonstrate, and perhaps exaggerate, the public/patient benefit of our research. Could making false promises undermine trust further?
Whilst we are entirely motivated by the hope that our early career research will translate into public/patient benefit, it is likely that it will not, at least not to begin with. But as ECRs working with administrative health records, we discover things that we did not set out to, we develop skills in analysing complex data sets, we generate new research questions, all of which could have patient/public benefit in the future. That being said, the responsibility lies with us to be both realistic and transparent about the aims of our research and the potential public/patient benefit that it could have. After we have carried out our research, we must be transparent and document what we have learned and how that learning will go on to contribute towards patient/public benefit at a later stage. We need today’s ECRs to be trained in analysing patient data, otherwise tomorrow’s patient/public benefit might not emerge.
Have you done any public/patient engagement with Scottish patients?
“We haven’t done work with Scottish patients yet, and we’re very aware this is a gap for us. It’s something we’d like to do in the near future”.
It is great to hear that UPD are hoping to do work with Scottish patients. I am not aware of any groups in Scotland who are carrying out similar work with public/patients across the board (do get in touch if you are!). For now, can we assume the views from the Fair Partnerships participants would also hold for the Scottish population? As Research Data Scotland (RDS) looms on the horizon, it appears Scotland has much further to go in terms of gathering views from the public on how their data is used.
Should all researchers working with administrative health data do public/patient involvement?
“I’d suggest patient and public involvement is always valuable, to shape the work you’re doing too, not just because ‘it’s the right thing to do’. With the fair partnerships report, we got lots of incredibly interesting and useful insights on a challenging topic”.
In an ideal world, we would carry out public/patient involvement in our PhDs and post-docs. However, ECRs may have limited contacts, resources and time, meaning it might not be feasible to do so. In particular, if you are working with a large national dataset, would it be realistic to capture representative views of the country on how you plan to use their data?
Well maybe not, but there are other things we can do. For one, Grace pointed out that “use MY data have created a data citation to help researchers acknowledge the contribution patients make to research”. This citation is a means to show gratitude to patients for allowing researchers access to their data, as well as enhancing the visibility of that use.
Another thing that crossed my mind was getting someone you know, with no knowledge about the research you are doing, to read your research proposal. Can they see the public/patient benefit in what you are proposing to do?
The outbreak of COVID-19 has clearly pushed the use of patient data into the headlines and accelerated the use of patient data in research (see the OpenSAFELY project in England). I asked Grace if UPD feel this presents an opportunity to demonstrate how we can safely and successfully use patient data in research or a challenge to maintain public trust in the use of their data?
“Interesting question. I think it does both! It’s brought a lot of visibility to how important patient data is both for making decisions about public health and for research, especially research for new treatments or therapeutics. Patient data has never been more present in the national discourse as it is now. However, it also presents risks to public trust, as decisions are being made quickly, and sometimes with not enough timely transparency from government to the public about what’s happening”.
Are there any other UPD resources that you would recommend to eCRUSADers working with Scottish administrative health data?
“Definitely the data citation – it would be great if you’d be able to use that on your research papers and communications.”
“Also, lots of partners say this is really helpful – it’s research we did into the best words to talk about data, to make what’s happening more accessible”
Thanks very much for taking the time to answer these questions Grace. There’s clearly some great work going on at UPD and there is definitely a lot that researchers working with patient data can learn from it. It would be great to see more public and patient engagement work on using patient data in Scotland. If anyone reading this is familiar with any, do get in touch!
Look out for our next People Make Data post where we will be hearing from useMYdata.
Date of course: Wednesday 11 March 2020 Organised by: Wellcome Trust Clinical Research Facility Post summary: In this post I provide a run through of the course: The Whys and Hows of applying to the Public Benefit and Privacy Panel for Health and Social Care (PBPP). As the title suggests, the course – delivered by PBPP Manager Dr Marian Aldhous – covered two main areas: why you would need to apply to the PBPP and how you would go about doing it. My thanks go to Marian, who has kindly let me use her slides to write this post.
In a rush? Skip to the Top Tips for filling in your application and some of my reflections on the course (where you will also find links to an example Tooth Fairy PBPP application and associated documents!).
The PBPP is a combination of a patient privacy panel and an information governance panel, set up by Scottish Government eHealth to provide a single, consistent, open and transparent scrutiny process for health data to be used for different purposes, including research.
They exist to ensure the right balance between safeguarding the privacy of people in Scotland and the duty of Scottish public bodies to make the best use of data. PBPP provide leadership in the complex privacy and information governance domains so that:
Scottish people gain the benefits from the use of data
Emerging information risks are managed
Public concerns around privacy are addressed
Protection of privacy in the public interest is promoted
They have a scrutiny role on behalf of patients with respect to the information you are going to find out about the patient, in work that is not related to their direct care and that uses information not in the public domain. They seek to check whether the use of the data is justified and reasonable, and whether it will achieve its purpose. Further, they want to scrutinise how damaging it would be if the information were leaked.
They are there to ensure that applicants have considered the public benefits and privacy implications for participants and their data. Moreover, they are there to provide assurance of the ‘technical and organisational arrangements’ to ensure respect for the data minimisation principle (GDPR Article 89(1)).
What was really clear from Marian’s presentation on the role of PBPP was that they are not there to trip applicants up or to prevent work from going ahead.
2. What is the legislation and principles covering aspects of information governance for the use of NHS Scotland data for purposes other than direct care?
The UK Data Protection Act 2018 applies when processing (that basically means using or storing) personal data for living individuals; this includes pseudonymous data.
For personal data
For the lawful processing of personal data we look to Article 6(1) of the GDPR which states that the processing of personal data is lawful only if and to the extent that at least one of the following apply:
a) The subject has consented
b) Performance of a contract
c) Compliance with a legal obligation (under specific legislation)
d) Protection of vital interests, i.e. to save someone’s life
e) Performance of a task that is in the public interest
f) Legitimate interests of the controller
Point (e) is the most common legal basis given in PBPP applications for the processing of personal data. Note that there are very good reasons why the others are NOT used. Specifically, consent for taking part in research, under the Research Governance Framework, is different from consent obtained for processing data under GDPR. This is one of the reasons you are NOT encouraged to use consent as your legal basis under 6(1) or 9(2). Also, legitimate interests can only be used by non-public authority / sector bodies (commercial or charities).
So, 6.1(e) is the most common because it is the most appropriate for the tasks usually covered by PBPP applications.
For sensitive personal data
For the lawful processing of special category sensitive data, we look at Article 9 of the GDPR:
(1) Processing of personal data revealing:
racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data, data concerning health (physical and mental) or data concerning a natural person’s sex life or sexual orientation shall be prohibited.
(2) Paragraph 1 shall not apply if one of the following apply:
a) Subject has given explicit consent
b) Necessary for obligations and rights of controller/subject for employment or social security
c) Necessary for vital interests of the subject
d) Legitimate activity of a not-for-profit body with a political, philosophical, religious or trade union aim
e) Data made public by the subject
f) Necessary for legal claims or the judicial capacity of courts
g) Substantial public interest
h) Preventative or occupational health, assessment of working capacity of an employee, medical diagnosis, provision of health and social care
i) Public interest in public health
j) Necessary for archiving in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1)
(Article 89(1): subject to appropriate safeguards for the rights and freedoms of the data subject.)
The most appropriate basis chosen depends on the purpose of the application. If your application is for the use of health data, it would usually be covered by one of 9.2(h), 9.2(i) or 9.2(j), as these are the bases linked to health. For applications looking at NHS/medical processes (e.g. audits, health care planning or service improvement) then 9.2(h) would be used. For public health or infection control, you would most likely use 9.2(i). For any research, 9.2(j) should be used. If you are ever in doubt about this, you can always talk to your eDRIS coordinator to get advice.
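As a rough aide-memoire (and emphatically not legal advice), the rule of thumb above can be expressed as a simple lookup; the purpose labels, function name and fallback message below are all my own invention:

```python
# Hypothetical aide-memoire for the usual Article 9(2) condition by purpose.
# The purpose labels and fallback text are invented for illustration only.
ARTICLE_9_BASIS = {
    "audit": "9.2(h)",
    "service improvement": "9.2(h)",
    "healthcare planning": "9.2(h)",
    "public health": "9.2(i)",
    "infection control": "9.2(i)",
    "research": "9.2(j)",
}

def suggest_basis(purpose):
    """Return the condition usually used for a purpose, else defer to eDRIS."""
    return ARTICLE_9_BASIS.get(purpose.lower(), "ask your eDRIS coordinator")

print(suggest_basis("research"))  # 9.2(j)
```

If your project doesn’t fit neatly into one of these boxes, that fallback line is the real advice: talk to your eDRIS coordinator.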
The Common Law Duty of Confidentiality also applies to personal data that are not already in the public domain, for example patients have shared personal medical information with their GP and they expect it to be kept confidential. The Caldicott Principles and Data Protection Principles outline the special circumstances under which this information can be shared.
The PBPP replaces the Privacy Advisory Committee (which covered research), National Caldicott Scrutiny Panel (which covered both research and non-research), and CHI Advisory Group (which also covered research and non-research).
PBPP have the authority to scrutinise applications for the use of NHS Scotland controlled data and National Records of Scotland controlled NHS Central Registry data for research, healthcare service planning and improvement, audit and other well defined and bona fide purposes. This scrutiny covers the whole process from patient to data provision/analysis.
There is a single PBPP form for all applicants. Detailed guidance is also given on filling in the form (this is covered in the second part of this post). Entry to PBPP goes through the Electronic Data Research and Innovation Service (eDRIS). The eDRIS team provide advice to applicants on the data sets and variables that are available. They also advise on the capability of those data to meet the objectives of the applicant’s proposal. Further, they provide help with filling in the PBPP form itself. They also work closely with the PBPP team when helping applicants prepare their applications. The eDRIS team work on the provision of data from different sources, organise access to the Safe Haven and carry out disclosure checks. Finally, they offer support for data analysis. Clearly, a very busy team covering a wide range of areas! The diagram below outlines these roles:
Note as well that there are two PBPPs: a health one (the Health and Social Care PBPP) and a stats one. All non-NHSS (external) data go to the stats PBPP (S-PBPP). This includes ScotXEd education data, NRS census data (which takes a minimum of 6 months to be provided after S-PBPP approval), social care data, and HMRC and DWP data (though possible in theory, you are unlikely to be able to obtain these, but that’s another story…). There tend to be longer time frames involved in getting approval for external data sets.
So, the whole process (or the eDRIS sandwich) looks like:
I found this diagram really helpful in providing a picture of how the scrutiny process works. All applications go to Tier 1. Around 5 applications are scrutinised every fortnight (in 2017/18, the panel saw 136 applications). They are assessed according to a proportionate governance traffic light system relating to the criteria set out in the PBPP application. Those assessed as Green are all OK at Tier 1 and are approved, or approved with some conditions, e.g. ethical approval to be obtained. Sometimes they will require clarification of minor points or changes to the form, which would then be checked by the PBPP manager and approved. Those that are Amber (medium risk) may need further clarification from applicants. Those responses will need to be reviewed by the same people who reviewed the application at the panel meeting; this happens by email and the panel does not meet again. Those that are classed as Red have issues that cannot be tolerated; they are referred to Tier 2, with or without clarification. Applications can also be referred for re-submission if too many major changes are needed.

Amendments can also be made after approval, but this should be the exception. Any amendment must be within the original scope of approval. Amendments can be made for things like a change of institution, addition of variables, or changes to storage location/mechanisms. Amendment forms are available on the PBPP website and must be submitted via your eDRIS coordinator.
6. How long is your PBPP application going to take?
This is the question we all really want to know the answer to, especially when we are planning projects with limited funding. The timing can be split up into three puzzle pieces:
Writing your application to PBPP submission
This stage of the process is mainly down to you (at least once you have been allocated an eDRIS coordinator). The time taken at this stage depends on the number of iterations needed in your application, so making sure you have been thorough and clear when first filling it in will help. It will also be influenced by the complexity and clarity of the project: you’ve got to be incredibly clear and concise when outlining your research plans. Top Tip: use diagrams where you can!
PBPP submission to PBPP approval
This part of the process is mostly very well defined and evidence is available on these timings. The figure below shows data from the 2017/18 PBPP annual report. Clocked days is the number of working days the application is being processed by the PBPP. The time for applicants to respond to any queries regarding the application is not included in clocked days. The ‘total’ number of working days from submission until the final decision is made includes any time spent back with the applicant.
The Tier 1 panel meet every fortnight and see 5 applications. The timing for PBPP scrutiny and review depends on the number of iterations the application needs to go through and the speed of panel members responding. The complexity and clarity of the proposal are also important factors which could affect the time to approval. Tier 1 is faster than Tier 2 (the Tier 2 panel meets less often and, by definition, your application will already have been through the Tier 1 process).
PBPP approval to data access
This appears to be the most uncertain part as it depends on so many factors. These include the waiting list for an eDRIS analyst and whether you are requesting data from different sources. The timing is also affected by the overall complexity of the project, the amount of data required and the requirement for data sharing agreements.
7. How to fill in your application according to the 5 Safe Principles
So, we know that the PBPP are there to weigh up the public benefit versus the privacy risk of applications. They carry out this assessment by considering the Five Safe Principles, which conveniently correspond to sections in the application:
When you are filling in your application you must demonstrate how you meet the 5 Safe Principles. In what follows, I outline the main questions that PBPP ask you to answer in your application. Some of them overlap somewhat and they should not be treated as a complete check list (every project is different!), but they will help to ensure you demonstrate the 5 Safes.
The PBPP will be looking for:
Who has access to the data?
Who needs to know? Caldicott Principle 1!
How responsible are the applicants/analysts?
What is their knowledge and experience?
What training do they have?
IG training is required for an application (applicants, PhD supervisors, clinical leads, data custodians and anyone accessing patient-level data, including pseudonymised data, need to have up-to-date IG training)
Links to possible courses are on the PBPP website
Training must be renewed every 3 years
Who is responsible to ensure the applicants do what they say? Accountability principle!
The PBPP will be looking for:
Which organisation is responsible for the data?
Which organisation is the data controller? Affects main contact, which DPO should be consulted, purpose of the proposal
Responsible for the data
Researchers with NHS / University contracts
Who will keep the researchers accountable?
Does this change at different points in your proposal?
How safe is each organisation?
Is it a known public organisation / charity /company?
Write about the whole process- from patient to data analysis.
Is the use of data necessary? Can it be done another way?
Be clear about variables requested
Bear in mind the principles of data minimisation
Justify the need for every single variable
Is the project ethical?
Where will the data go? Who will access it? Top Tip: Use flow diagrams! This can really help you to see what agreements will be needed, between which organisations.
What is the population for which data are requested?
Would they expect their data to be used for this purpose?
How will the processing take place?
Is the processing lawful, fair and transparent?
You MUST state the legal basis for processing data. GDPR Article 6(1) for personal data (including pseudonymised data) and GDPR Article 9(2) for special category data.
How will the rights of the subjects be upheld?
What is the public benefit?
Has the applicant carried out any public engagement? (may not apply to all applications)
Have lay people been involved in the project design? If not, why not?
Do the public see the benefit in the project you wish to do?
Would they feel that the types of data requested are reasonable?
Has any peer review of the proposal been carried out?
Has there been a review from ethics?
NHS REC opinion
University ethics committee
Has the applicant assessed the privacy risks?
Have they carried out a Data Protection Impact Assessment? Note that this can be a legal requirement, depending on the nature of the processing. If not, why not? (It’s good practice to do this and a lot of it overlaps with the content required in the PBPP).
If you are a data processor, you will need a Data Processing Agreement setting out the processing instructions.
Approvals from out with Scotland
Approvals from another Data Controller for linkage to non-health data.
The PBPP will be looking for:
How identifiable are the data?
Are identifiers used for processing only? Make this clear!
Do combinations of variables make individuals identifiable e.g. rare diseases in small populations?
Are the data anonymised or pseudonymised?
Are the data highly sensitive?
Are you adhering to the principles of data minimisation?
Are the data relevant?
Too much data? Are all variables necessary? Can you use partial or derived variables?
Too little data? Will they fulfil the aims?
Justification for requesting these data variables
Are all the details necessary e.g. full dates, full postcodes?
What will happen to the data at end of project?
What are the sources of data requested?
For new data
How is it being collected?
Who is the data controller?
For existing datasets
Who are the data controllers?
If not NHSS do you have permission?
Who is carrying out the cohort identification and/or data linkage and how? This should be done by a third party.
How do individuals know about the use of their data?
What would individuals expect you to do with their data?
Participant information leaflets
Privacy notices on NHS Board websites
Generic NHS leaflets/website links
From where will the data be accessed?
Will it be accessed in a Safe Haven? This is what NHS Scotland prefers!
If not in Safe Haven, why not? Consider:
How secure is the data collection process?
How secure is the transfer of data?
Will the data be accessed securely (data protection principle 6)?
Will it be accessed remotely?
Can anyone see over your shoulder?
Will the data be pseudonymised?
How will access be monitored?
Will the data be transferred securely?
Will the data be stored securely?
For how long?
Will it be destroyed? If so how?
What will be the outputs of the analysis?
Disclosure control. Beware small numbers! Groups < 5-10
Who will do disclosure control?
How aggregated is the data?
How identifiable is the data within the outputs?
Is there any confidentiality risk from publication?
What will happen to the data at the end of the analysis and at the end of the project?
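To make the small-numbers point concrete, here is a minimal sketch of primary disclosure control, assuming a suppression threshold of 5 (actual thresholds and rules vary by data controller, and secondary suppression may also be needed; the table below is invented):

```python
# Hypothetical sketch of primary disclosure control: suppressing small
# counts (here, cells below 5) in an aggregated output table before release.
# The counts and categories are invented, not from any real dataset.

THRESHOLD = 5

counts = {
    ("Lothian", "condition A"): 120,
    ("Lothian", "condition B"): 3,    # small cell: must be suppressed
    ("Glasgow", "condition A"): 98,
    ("Glasgow", "condition B"): 11,
}

def suppress_small_cells(table, threshold=THRESHOLD):
    """Replace any count below the threshold with a '<threshold' marker."""
    return {key: (count if count >= threshold else f"<{threshold}")
            for key, count in table.items()}

safe_table = suppress_small_cells(counts)
print(safe_table[("Lothian", "condition B")])  # "<5"
```

In the Safe Haven setting this check is done by the eDRIS team before any output leaves the environment, but running something like it yourself first saves everyone time.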
9. Group discussions and reflection on the concerns raised
The general feeling in the room was that the course was very helpful. However, concerns were raised by some participants. One concern was around ethics and knowing what ethical approval is required. It seemed some were confused as to what ethical approval they needed and felt they were filling in a lot of forms. I disagreed with this: as an academic who has worked with administrative health data, the ethics side of things was actually the more straightforward part. But I’d be keen to hear others’ views on this. It’s no surprise that another concern was about timing, but clearly timing depends on so many factors which are highly individualised to specific projects.
On timings, we have those three pieces of the puzzle: writing your application to submission; submission to approval; approval to data access. The middle piece is very clear, at least for the majority of projects, and timings are published in the PBPP annual reports. The other two depend on many external factors. What can we do to influence them?
Puzzle Piece 1: Writing your application.
I’d strongly suggest taking this course or reading this blog post (hey if you’ve read this far, you’re already part way there!). If you’ve done the background work thoroughly and you write a good application, it won’t need to go through as many iterations with your eDRIS coordinator and you will save yourself some time and make the lives of eDRIS easier.
Puzzle Piece 3: PBPP approval to data access.
This is the tricky piece and the timing at this stage will vary hugely from project to project. At least, that’s what I assume. But the truth is, we don’t really know. So what can we do? This is one of the reasons I set up eCRUSADers: to try and build up an understanding of the time it will take to get access to data. But realistically I doubt every PBPP applicant is about to come forward and share their experiences with us. One suggestion might be to publish information at the point of data access which clearly outlines the data sets/variables requested and the timelines for the three parts of the puzzle. This could simply take the form of the PBPP application, or a table filled in with those timings. Alternatively, end of project reports could be made available which detail this information.
Once we know the timing from approval to data access, as well as the factors which might influence them e.g. what data sets are requested, how many years, etc, we would be better equipped to plan for research projects which have limited timelines.
Overall, The Whys and Hows of Applying to the Public Benefit Privacy Panel for Health and Social Care is a very useful course and I’d recommend you get a space on it if you are thinking about using Scotland’s administrative health data. It will take you half a day but it could save you much more time in the long run. I’d maybe even go further and say that it should be compulsory…. The PBPP is not there to trip you up, it’s there to ensure the balance of public benefit and privacy risk. They are on our side and just as keen to make the processes easier and quicker as we are. Timing remains our biggest challenge and there are bits and pieces we can do to speed things up. Having said that, the biggest timing challenge we face is from PBPP approval to data access. Unfortunately, there is little we can do to influence this and that has to change.
Anonymous data cannot be used to identify any individual in the data. Removal of identifiers does not necessarily make data anonymous: in anonymous data, no combination of variables would allow an individual to be directly or indirectly identified. Anonymisation is irreversible. Anonymous data are not subject to the Data Protection Act 2018.
Controllers are the main decision-makers – they exercise overall control over the purposes and means of the processing of personal data. If two or more controllers jointly determine the purposes and means of the processing of the same personal data, they are joint controllers. However, they are not joint controllers if they are processing the same data for different purposes. Controllers shoulder the highest level of compliance responsibility – you must comply with, and demonstrate compliance with, all the data protection principles as well as the other GDPR requirements. You are also responsible for the compliance of your processor(s). (from the Information Commissioner’s Office website)
Processors act on behalf of, and only on the instructions of, the relevant controller. Processors do not have the same obligations as controllers under the GDPR and do not have to pay a data protection fee. However, if you are a processor, you do have a number of direct obligations of your own under the GDPR. (from the Information Commissioner’s Office website)
Personal data are any information which, either alone or combined with any other data, leads to the identification of an individual. This could be a name or phone number, an IP address or a cookie identifier.
Pseudonymous data are data that have been altered so that no direct identification of any individual can occur. However, additional information is held by you or someone else that allows the identification of an individual. This is personal data and is subject to the Data Protection Act 2018.
Special category data are personal data which are subject to more scrutiny when determining lawful processing. They include things like race, ethnicity, medical conditions (physical and mental), sexual life, religion, philosophical beliefs, politics and trade union memberships, criminal convictions/alleged offences, and genetic and biometric data. (from the Information Commissioner’s Office website)
In this post, Catriona Keerie, Senior Statistician within Edinburgh Clinical Trials Unit (ECTU) talks to us about her work within ECTU and her involvement on a rare Scottish trial that used administrative health data. She provides some great diagrams to help along the way, which I can tell you are essential if you want to understand the complicated structure of the data! Catriona also highlights some of the key challenges the team faced in terms of data access and use and offers her reflections on what they learned from the project which could help other trials like this one in the future.
Can you tell us a little about your role in ECTU?
My role involves a variety of tasks – however, primarily my role is the statistical reporting of trials run from within ECTU. I typically have up to eight active trials throughout the year. My role varies on these – I am Trial Statistician for approximately half of them, and the ‘reporting’ statistician for the other half. When I have my reporting statistician hat on, I’m responsible for the statistical programming and generating the analysis and results.
How many trials have you worked on that have involved using administrative data?
Since I joined ECTU in 2014, I have worked on three trials using administrative data. Two of them used solely routine healthcare data and the third one is running currently, based on a blend of routine data plus data captured within the trial.
Is the use of administrative data in trials becoming more common over time?
The use of administrative data in the trials setting is definitely becoming more common since clinical trials are known to be expensive and time-consuming. The use of administrative healthcare data is viewed as a more efficient means of understanding the health of the population using readily available data. However, there is a trade-off in terms of the quality of the data being captured.
What is a stepped wedge trial?
It’s a relatively recent study design that’s increasingly being used to evaluate service delivery type interventions. The design involves crossover of clusters (usually hospitals or other healthcare settings) from control (standard care) to an alternative intervention until all the clusters are exposed to the intervention. This differs from traditional parallel studies, where only half of the clusters will receive the intervention and the other half will receive the control. This diagram helps to demonstrate the difference in designs:
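To illustrate the design, here is a small sketch of a stepped wedge rollout schedule: clusters cross from control to intervention at staggered, randomised steps until all are exposed. The hospital labels, step count and seed are invented, and a real trial would use a properly documented randomisation procedure:

```python
# Hypothetical sketch of a stepped wedge rollout: each cluster (hospital)
# is randomly assigned a crossover step; by the final period every cluster
# is on the intervention. Names, step count and seed are illustrative.

import random

def stepped_wedge_schedule(clusters, n_steps, seed=0):
    """Assign each cluster a crossover step; return who is exposed per period."""
    rng = random.Random(seed)
    order = clusters[:]
    rng.shuffle(order)  # randomise the order in which clusters cross over
    # Spread the crossover points evenly across the steps
    crossover = {c: 1 + i * n_steps // len(order) for i, c in enumerate(order)}
    # For each period, list which clusters are on the intervention
    return {t: sorted(c for c in clusters if crossover[c] <= t)
            for t in range(1, n_steps + 1)}

schedule = stepped_wedge_schedule(["H1", "H2", "H3", "H4"], n_steps=4)
print(schedule[4])  # all four clusters exposed by the final period
```

Contrast this with a parallel design, where half the clusters would stay on control for the whole trial.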
What was the trial about?
The population of interest were patients presenting in hospital with heart attack symptoms. The trial sought to test a new high-sensitivity cardiac troponin assay against the standard care contemporary assay. Specifically, to test if the new assay could detect heart attacks earlier and with a more accurate diagnosis.
How were patients enrolled into the trial and how does this differ from a standard trial?
Stepped wedge trials usually randomise at a cluster (hospital) level, rather than randomising patients individually, so this was the main difference to a standard trial. So patients were enrolled rather than randomised into the trial. Standard trials require patient consent before randomisation, but in this context, individual patient consent was not needed due to the randomisation being performed at hospital level. Appropriate approvals for consent were sought through the hospitals.
If patients presenting with heart attack symptoms at any of the hospitals were eligible for the trial (based on our pre-specified inclusion/exclusion criteria), then we had permission (at hospital level) to include them in the study and use their securely anonymised data.
How many patients were enrolled into the trial?
Approximately 48,000 patients were enrolled from 10 hospital sites in NHS Lothian (3 sites) and NHS Greater Glasgow and Clyde (7 sites), over a period of just under three years.
Which administrative data sets were used?
We used a total of 12 distinct data sources, which were a combination of general administrative datasets and datasets more specific to our area of research from locally held electronic health care records. Prescribing data were obtained from the Prescribing Information System, along with ECG data and general patient demographics. Trial-specific outcome data were obtained from the Scottish Morbidity Record (SMR01) and from the register of deaths (National Records of Scotland). All data were captured separately for each Health Board; there is currently no amalgamated data source which holds all the data. Health Boards are the owners of their own data.
The main linking mechanism for these 12 data sources was the patient CHI (Community Health Index) number. To ensure patient anonymity, CHI numbers were securely encrypted prior to use.
How did you get approval for these data sets? How long did this approvals process take?
Approvals were required at a number of levels. We required ethics approval, approval to use patient data without consent and Health and Social Care approval (through the Privacy Approvals Committee, predecessor to the Public Benefit Privacy Panel). There were also health board specific approvals required for local data to be released. In addition, we required data supplier approval. Finally, approval was needed for the data to be hosted on the Safe Haven platform.
This process was long! This was ongoing throughout the duration of the trial. Although the data was being captured automatically via routine records, the final dataset wasn’t confirmed until relatively late on in the process due to complexities of mapping locally held healthcare records. One of the advantages of the national datasets is that they are the same across all health boards.
Where were the data sets stored?
Datasets from NHS Lothian and NHS GG&C were supplied separately in their own Safe Havens. The combined dataset was hosted in the NHS Lothian Safe Haven space on the National Safe Haven analysis platform.
How did the linkage of the data sets happen?
The data sources from both health boards were combined and hosted on the National Safe Haven analysis platform. This wasn’t a straightforward process. Although we’d anticipated capturing exactly the same patient data across both health boards, the reality was quite different.
Data were captured in different formats with different variable names and different definitions. So there was an unexpected element of data cleaning required before the data could effectively be merged into one large analysis dataset.
The final linkage was done using the securely encrypted CHI number for each patient.
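The harmonise-then-link process Catriona describes can be sketched in a few lines of Python. This is purely illustrative: the variable names, rename maps and record layout below are invented for the example, not the actual High-STEACS data dictionary, and real linkage would of course happen inside the Safe Haven.

```python
def harmonise(record, rename_map):
    """Map one health board's local variable names onto shared trial names."""
    return {rename_map.get(k, k): v for k, v in record.items()}

# Hypothetical rename maps -- each board captured the same concept
# under a different variable name and these had to be reconciled.
LOTHIAN_MAP = {"pat_sex": "sex", "dob": "date_of_birth"}
GGC_MAP = {"gender": "sex", "birth_date": "date_of_birth"}

def link_on_encrypted_chi(sources):
    """Merge records from several datasets keyed on the encrypted CHI.

    Each source is a list of dicts carrying a 'chi_enc' field; records
    sharing an encrypted CHI are folded into one combined record.
    """
    linked = {}
    for source in sources:
        for rec in source:
            linked.setdefault(rec["chi_enc"], {}).update(rec)
    return linked
```

For example, a Lothian demographics record and a separately supplied test-result record with the same encrypted CHI would merge into a single analysis row.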
What do you see as the major benefits of using administrative data in this setting?
Use of administrative data in this context is a more efficient process – fewer resources are spent on the administrative aspects of trial enrolment, e.g. capturing demographic details such as age, sex, postcode or medical history.
Using administrative data also gave us the opportunity to research a large, representative patient population, in comparison to the setting of an RCT where a strict pre-specified population, not necessarily representative of the target population, is studied.
Overall, what were the major challenges of the study?
From the data side of things, ensuring the correct data was extracted was difficult. The diagram above is a very over-simplified view of what happened! The reality of picking up the required variables from two separate health boards which capture data very differently was difficult.
Another challenging aspect was ensuring that a patient wasn’t enrolled more than once in the study. Patients can present in any hospital with heart attack symptoms more than once, so we needed to ensure they weren’t included in the study each time they came to hospital. This required a de-duplication algorithm using encrypted and de-identified patient data.
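A minimal sketch of such a de-duplication step is shown below. The encryption function is a stand-in (a salted hash, not the actual scheme used by the health boards), and the record layout is invented for illustration: the idea is simply that repeat presentations by the same patient collapse to their first episode, keyed on a de-identified identifier.

```python
import hashlib

def encrypt_chi(chi, salt="project-secret"):
    # Stand-in for the secure encryption applied before data left the boards.
    return hashlib.sha256((salt + chi).encode()).hexdigest()

def first_presentations(episodes):
    """Keep only each patient's first hospital presentation.

    episodes: list of (chi, admission_date) tuples, dates as ISO strings,
    so lexicographic order matches chronological order.
    """
    seen = {}
    for chi, date in sorted(episodes, key=lambda e: e[1]):
        key = encrypt_chi(chi)
        if key not in seen:          # later presentations are dropped
            seen[key] = (key, date)
    return list(seen.values())
```

So a patient presenting in 2017 and again in 2018 contributes a single 2017 enrolment, identified only by the encrypted key.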
However, I think the biggest challenge was for those in the team tasked with obtaining the correct approvals. We underestimated how complex this would be. While approval for the national datasets was straightforward and the eDRIS team were very helpful, processes for locally held data at the time of trial set-up were not established. Legislation around patient data confidentiality was continually changing, so we were faced with keeping abreast of new legislation as time progressed. The Safe Haven networks are now more established and, hopefully, the processes are more straightforward.
Is there anything you would do differently next time?
I think the data validation aspect of the trial is crucial. Ideally, we would have spent more time on this to ensure the data were as correct as possible. Involving the clinical team much sooner in this process would have helped – they have a really important role to play in terms of ensuring the data picked up makes sense from a clinical perspective.
For High-STEACS, the access to the data was highly restricted and did not include the clinical team. Many of the data discrepancies were only picked up at the final review stage once data and results had been released out of the Safe Haven area.
Working within the Safe Haven environment creates time lags on both sides of the process – data being imported into the Safe Haven and also results exported out at the end take time. We hadn’t considered this time lag when working to tight timelines.
Do you know if anyone is using the learning from this trial for future trials of this kind?
The High-STEACS trial was directly followed by the HiSTORIC trial, addressing similar research questions and using many of the same data sources. So we have been through the loop again which has made for a more streamlined process. Other trials within ECTU are also making use of the learnings from High-STEACS, particularly from the governance and approvals side of things.
Thanks for sharing this with us Catriona! It is great to see that administrative data are being utilised alongside clinical trials in Scotland. It is also interesting to hear that despite being part of a trials unit like ECTU, the High-STEACS team still faced many of the same challenges that we eCRUSADers have experienced when using administrative data for research. In particular, we can relate to the issues of permissions, timing and working within the Safe Haven environment. Overall, it seems that the timing issues were due to the use of the locally held data rather than using the national data.
It’s a new year and this week we hear from a new researcher, namely, Dr David Henderson. David is a Research Fellow at Edinburgh Napier University and Scottish Centre for Administrative Data Research (SCADR). He is no new face to the eCRUSADers scene and has built up a wealth of knowledge and expertise in the administrative data sets he has worked with over the last four years. In particular, David has worked closely with the Scottish Social Care Survey (SCS), both at local (Renfrewshire Council) and national level. His PhD work utilised the national SCS linked to Prescribing Information System data, Unscheduled Care Data Mart and the NHS Central Register. Additionally, David has worked with the Scottish Programme for Improving Clinical Effectiveness in Primary Care (SPICE – PC) data.
In this post, David describes his PhD work and provides an outstanding demonstration of the wealth of knowledge that research using administrative data can offer. He also gives us an insight into some of the unexpected externalities that can significantly impact project timescales, but which are hard to plan for. Similarly to our previous Researcher Experience posts from Dr Catherine Hanna and Matthew Iveson, David highlights timing as one of the major difficulties he has experienced throughout his research career using administrative data.
David’s positivity emanates throughout this blog post and he does an excellent job of echoing the feelings that I hear time and time again from researchers in this area: a genuine understanding of the need for the legal processes in place to protect patient data, coupled with frustration at the parts of those processes which inhibit researchers’ abilities to use this data to its full potential, all combined with a positive attitude that things are slowly but surely improving. As David points out, things are changing in Scotland and we look forward to hearing very soon from the Chief Statistician Roger Halliday, on the Scottish Government’s plans for the new Research Data Scotland.
Brief overview of my research
Using the linked data set described above, the focus of my research has been investigating the association between multimorbidity (more than one long-term condition) and social care receipt. I am also analysing interactions between health and social care services, with a particular interest in unscheduled care.
Good social care data has been difficult to come by in the past – not just in Scotland, but internationally. I have been lucky to be one of the first group of researchers to get access to the Social Care Survey collected by the Scottish Government in a format that can be linked to health-based data sources.
So far, provisional results show us that increasing age and severity of multimorbidity are associated with higher social care receipt. This was anticipated, but we have never been able to show it empirically before the cross-sectoral linkage.
We have also been able to describe the receipt of social care by socioeconomic position (SEP) using the Scottish Index of Multiple Deprivation (SIMD). This is new and, to my knowledge, hasn’t been described elsewhere on such a large scale. Here we find that those with lower SEP are more likely to receive social care. (All these patterns are shown in the figure below). However, due to a lack of good measures, we can’t tell if the provision of care matches need for care.
My latest piece of work has been looking at whether receipt of social care influences unplanned admission to hospital. Using time-to-event (survival) analysis we can see that, for those over 65, people who receive social care are twice as likely to have an unplanned admission (again these results are provisional at the moment).
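The time-to-event approach David mentions rests on survival curves such as the Kaplan-Meier estimator. A toy pure-Python version is sketched below, with made-up follow-up times rather than any real social care data; in practice an established package would be used, but the sketch shows how censored observations (people who never had an unplanned admission during follow-up) are handled.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each person.
    events: 1 = event observed (e.g. unplanned admission), 0 = censored.
    Returns a list of (time, survival probability) at each event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)  # events per time
    totals = Counter(times)                                  # everyone leaving at each time
    at_risk = len(times)
    surv, curve = 1.0, []
    for t in sorted(totals):
        if deaths[t]:
            surv *= 1 - deaths[t] / at_risk   # step down at each event time
            curve.append((t, surv))
        at_risk -= totals[t]                  # censored and event cases leave the risk set
    return curve
```

Comparing the curves for social care recipients and non-recipients (or, more formally, fitting a Cox model) is what underpins a statement like "twice as likely to have an unplanned admission".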
Summary challenges faced
The barriers I have faced are, no doubt, similar to others using linked data – the main one being time. Approvals, extraction, linkage etc. all take considerable time and as a researcher you are not in control of these timescales. A good example is a sub-project of my PhD which was to use social care data from one local authority area only. The council in question were exceptionally helpful and keen to share data. They were very patient whilst I organised ethics and approvals on the academic side. However, by the time I was ready to talk data sharing agreements they had operational pressures (specifically the 2017 local elections) which tied up their legal team. After this we were all hopeful about making progress, but a certain Prime Minister went for a walk in the woods at Easter and decided to call a general election! Cue another 6-week delay until the legal team could start negotiating an agreement. We eventually got there, but this illustrates that data controllers are at the mercy of higher forces as well and it is impossible to set meaningful deadlines.
I am very fortunate to be in a position to keep working with my PhD data in my current role and keep asking questions of the large amount of data we have. However, I have moved university in order to do this. This means I now have to repeat the process of ethics, data sharing agreements, privacy impact assessments etc. This is absolutely necessary as my current employers need to make sure that all legal aspects are covered, but there is nothing more soul-destroying than recreating the (significant) amount of work that goes into the required forms (initially completed two years previously). Fortunately, work is afoot at the Scottish Government to make this process obsolete and centralise access to research data sets – however, this is still in its early stages and we are currently unsure as to when this will be operational or what exactly will be available. For now, the pain must endure!
Although there are difficulties in using administrative data for research purposes and delays can be frustrating at times, it is still (incredibly) a really rewarding process. The ability to gain new insights from previously unseen data is something that should excite any researcher. More importantly, data linkage offers the potential to improve society by answering questions that can’t be asked with traditional methods. Well worth an extra ethics form (even if I grumble about it!).
The conference, organised by Administrative Data Research Wales (ADRW) at the University of Swansea and sponsored by Administrative Data Research UK (ADRUK), the Economic and Social Research Council (ESRC), and the Welsh Government, had the central theme of ‘Public data, for Public Good’. A theme which, most appropriately, reminds us that whilst carrying out our research, the data we are using belongs to the public and we must hold this in the forefront of our minds as we use it to generate better outcomes for them.
The three days were jam packed with plenary keynotes, parallel sessions, rapid fire sessions, a visit to Cardiff Castle and for the super geeks- a whole lot of Rubik’s Cubing 🤓
Unlike some other international conferences I have been to, where it can be difficult to see the relevance of research from one country translate over to your own, the 2019 ADR conference was all highly relevant. In fact, a clear takeaway from the conference was the message that there is a huge amount to be learned from how things are done elsewhere.
There were so many talks I wanted to go to and as always is the way with parallel sessions, it simply wasn’t possible to get to them all. I decided to try and attend as many as possible which focussed on administrative data infrastructure and ethics in using public data.
In this post, I offer a summary of take home messages and expand on a couple of the keynote talks and parallel session talks of interest in this area. I could have written a whole post on the excellent work presented from researchers in Scotland, including some from fellow eCRUSADers, but alas this will have to wait for another time!
ADR 2019 Take Home Messages
The potential of administrative data for research is huge. Especially in Scotland where linkage across several research domains is possible.
There is a general movement towards the use of large data repositories (or data lakes/data lochs/integrated data systems – perhaps we need to agree on one definition?) of ready-linked data, which will speed up access for researchers whilst maintaining public privacy and ultimately make better use of public data for public good.
Issues with data access, particularly concerning timing, are not unique to Scotland.
Some countries seem to be further forward than Scotland and the rest of the UK in this regard (notably Australia and Canada) and there is much to be learned from work going on around the world.
Whilst the message of public trust and transparency was front and centre throughout the conference, I felt there was little demonstration of how this is being done in practice and how, beyond using public data safely, researchers can contribute directly to building that trust.
Don’t be fooled by the ADR Rubik’s cube- it is a lot harder than it initially looks!
At the end of the three days, it was great to see Michael Fleming, researcher from the University of Glasgow, receiving the Best Paper Award for Evidence to Support Policy Making on his work using linked education and health data to explore outcomes for children treated for chronic conditions. Before presenting Michael with his award, Emma Gordon, Director of ADRUK, acknowledged the long wait (from memory just under 2 years) before Michael got access to the data for his research and asked the audience:
“Can you imagine having to wait so long for data?”
Sadly, we certainly can. In fact, there are a considerable number of us here in Scotland (and almost certainly elsewhere) who have.
It was great to hear Emma highlight this and I do think that the conference really sent a message of hope to eCRUSADers and researchers more generally, that things are improving in Scotland. It was certainly motivating to see the future of administrative data research already being put into practice in many countries around the world. But, there is still a long way to go.
In the meantime, you should definitely join eCRUSADers to hear the latest on the administrative data front and get in touch to share your experiences so that we can all learn from them.
The first keynote presentation on day one was given by NHS Digital’s Garry Coleman who, despite being likened to Scrooge and Smaug the dragon, fiercely guarding NHS administrative data, outlined the suite of changes that NHS Digital have made over the last year in order to improve access to NHS data for researchers. These included the introduction of a fast-stream service for repeated applications and those with precedent, published ‘standards’ to help researchers know what is expected from their application, and the establishment of the Data Access Environment (DAE). DAE is the new cloud technology in England whereby researchers can access patient data for research without the need for the data to leave NHS Digital. The platform went live in May 2019 and aims to provide researchers with faster access to ready-linked data sets with built-in tools for more powerful data visualisation and analysis. There’s a YouTube video on it here. It all sounds very good, as well as very familiar. I wonder how this platform will compare to Research Data Scotland?
Whilst ensuring public trust in all that we do with public data was at the heart of Garry’s talk, it was not clear how much, if any, public engagement by NHS Digital has been done around the use of the new DAE system. I’ve had a quick peruse of the NHS Digital website and can’t see any evidence of it on there either. Perhaps it is there somewhere and I am missing it? In any case, given the need to be transparent and ensure that public trust is at the heart of using administrative data for research, we perhaps need more than the hope that the public are aware and are happy for this to be going ahead.
What was clear from Garry’s talk was that he was actively seeking feedback from the research community on how they have found the data access processes of NHS Digital and he expressed a genuine interest in making things easier for researchers.
John Pullinger, Former Head of the Government Statistical Service and Chief Executive of the UK Statistics Authority.
On day two, the first plenary keynote was from John Pullinger, who offered his thoughts on “Lots of lovely numbers but why does everyone make it so difficult?” Clearly, John has an immense amount of experience in this field and he did an excellent job of taking us on a journey with him from the 70s when he first began working with the limited administrative data that was available then, to present day where administrative data are all around us. John’s message was clear, for us to have a social licence to operate with the public’s data, it is incumbent on us to earn their trust. This is in fact just as important as our research itself. He highlighted the importance of seeing legislation around the use of public data like GDPR, as enablers to research rather than impediments. Finally, John pointed out the need to be realistic with what the data can tell us and not to say something more than what the evidence tells us. Once again, this comes back to the need to earn the trust of the public and not doing anything that might undermine that trust.
For me, John really instilled in my mind the fundamental need to remember whose data we are using and that we are very much still on the journey to earning their trust.
Parallel sessions to highlight
In this talk, Robert McMillan talked about the Georgia Policy Labs, a ‘data lake’ which hosts many ready-linkable administrative data sets for policy makers and researchers to access and analyse to conduct research on a number of key policy areas. Robert highlighted the secure cloud infrastructure, separation of duties and secure data rooms which ensure that data are stored and used in a safe way. He also mentioned the ‘master data sharing agreement’ which they have to allow access to this data lake. The session was short, so there wasn’t scope to go into detail on this, though I am sure the Scottish Government would be interested to know more as they work towards implementing Research Data Scotland.
In her talk, Anna Ferrante discussed her work in merging Data Linkage Western Australia and the Centre for Data Linkage to form the Population Health Research Network (PHRN). The PHRN is a national network of data centres which links data collected across Australia on the entire population. Its infrastructure allows for the safe and secure linkage of data collections across a wide range of sources. Like some of the other talks throughout the conference, Anna talked about the Bloom filter structure PHRN uses to probabilistically and anonymously link between administrative datasets. Not surprisingly, given the amount of research that comes out of Australia using linked administrative data, Anna’s was one of many Australian talks which highlighted the level of maturity of the administrative data infrastructure in Australia compared to Scotland.
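The Bloom-filter idea Anna described can be illustrated with a toy sketch: a name is broken into letter bigrams, each bigram sets a few bit positions in a fixed-length filter, and two encoded names can then be compared for similarity without ever exchanging the names themselves. All parameters below (filter size, number of hashes, the hash construction) are illustrative choices, not PHRN’s actual configuration.

```python
import hashlib

def bigrams(name):
    """Letter bigrams with padding, e.g. 'ann' -> {'_a', 'an', 'nn', 'n_'}."""
    s = f"_{name.lower()}_"
    return {s[i:i + 2] for i in range(len(s) - 1)}

def bloom_encode(name, size=100, k=3):
    """Encode a name as a bit array: each bigram sets k positions."""
    bits = [0] * size
    for g in bigrams(name):
        for i in range(k):
            h = int(hashlib.sha256(f"{i}:{g}".encode()).hexdigest(), 16)
            bits[h % size] = 1
    return bits

def dice(a, b):
    """Dice coefficient of two bit arrays: 1.0 = identical, 0.0 = disjoint."""
    overlap = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * overlap / total if total else 0.0
```

Because similar spellings share most of their bigrams, "Catherine" and "Katherine" produce highly overlapping filters while "Smith" does not, which is what lets records be matched probabilistically despite typos, using only the encoded forms.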
Michael Schull talked about a project he is involved in that is building a partnership between health service researchers and computer scientists to develop a high-performance computing platform for the analysis of large linked administrative datasets. The goal of the partnership is to use artificial intelligence and machine learning to improve health and health care, which of course requires an infrastructure that has the power to store and manage large quantities of data. Michael talked about some of the things they have learned from working with computer scientists. Michael stated that the hardware for this infrastructure was in fact the easy part….
Della Jenkins presented on the work being carried out at Actionable Intelligence for Social Policy (AIS), an organisation which works with state and local governments to implement Integrated Data Systems that link administrative data across government agencies. Della’s talk reiterated many of the messages that John Pullinger had highlighted in his Key Note speech earlier in the day. Namely, as the emergence of initiatives for integrated administrative data (AKA linked administrative data) in research continues to grow, it is vital that we build awareness and infrastructure with public involvement every step of the way.
Andy Boyd gave a great talk on work that was carried out by Closer and NHS Digital looking at the possibility of different infrastructures for the onward sharing of longitudinal study data that is linked to administrative records, which currently cannot be released outside of the cohort study institution. The work identified five onward data-sharing models and concluded that although greater clarity is needed in order to effectively share anonymised data and to do so internationally, there are opportunities for developments and the large community of longitudinal cohort studies in the UK might be able to facilitate part of those processes. Full report available here.
One of the final talks I went to was from Mike Robling at Cardiff University. I’d been looking forward to going to this talk because I had already heard of the CENTRIC study, work which hopes to develop “training for UK researchers that enhances their understanding of public perspectives and governance requirements and improves their practice when working with routine data”. In his talk, Mike outlined the results from the focus groups with stakeholders, workshops with members of the public, and an online survey filled in by researchers. In summary, the study found that there is both a need and an appetite for training researchers in public engagement and in the complex regulations and requirements around using routine data for research. I very much look forward to seeing the training resources that CENTRIC produce and think they will help to fill the existing gap when it comes to researchers improving public engagement and public trust.
One final reflection on privacy
This post may have been more aesthetically exciting if I had been able to fill it with photographs of the speakers whom I went to see present. Sadly, it isn’t, because none of the speakers said one way or the other whether they were happy to be photographed. Given the nature of the conference, I decided to err on the side of caution and assume that this meant the speaker had not given consent to having their photograph taken and plastered on social media. Of course, not everyone shared this view, which made me then wonder if I was silly not to take any photographs in the first place. Or maybe I should have asked the speakers beforehand if they would mind. And so I wonder, in the name of transparency, might it be possible for speakers to quickly mention at the beginning of their talks if they are happy to be photographed? Just an idea. Here’s me giving my talk 🙂
The full 2019 ADR conference proceedings are available here where you can access abstracts of all talks from the three days. Thanks to all of the organisers for putting on such a great event and to the speakers for sharing their exciting work.
As always, I would love to hear your thoughts on this so please comment/share/email me!
In this post, Catherine tells us a bit about her research and what she has done with the GGC data, as well as the challenges she has faced in terms of applying for and getting access to the national data.
Brief overview of Catherine’s research
My research investigates how we can assess the impact of oncology clinical trials. It is important to be able to demonstrate that trials testing new oncology treatments are having real life impacts such as changing practice, changing health and saving money. Analysing this impact helps us to identify which trials are making real world differences, and subsequently, to design more impactful trials in the future.
I am conducting a case study to assess the impact of the Short Course Oncology Treatment (SCOT) trial (1). This study investigated whether treating patients with a diagnosis of colorectal cancer with 3 months of chemotherapy following surgery was non-inferior to treating with 6 months of chemotherapy. The trial results have shown that giving a shorter duration of treatment does not make a significant difference to the percentage of patients who are disease free at 3 years. Patients in the 3-month arm of the trial also had significantly fewer side effects from the treatment, especially with regards to peripheral nerve damage.
Gaining access to GGC Chemocare data, linked to QPI and SMR06 data sets, has enabled me to assess the impact of the SCOT trial on changing clinical practice. There was a significant change in prescribing practices for patients with colorectal cancer after the results of the SCOT trial were publicised. This will translate to a cost saving for the GGC health board and will result in fewer patients in GGC experiencing debilitating peripheral nerve damage as a result of their adjuvant chemotherapy treatment. A poster with the preliminary results of this analysis was presented at the National Cancer Research Institute conference in 2018.
In the next stages of my project, I plan to investigate the impact of the SCOT trial on prescribing on a national scale by using routinely collected chemotherapy data from the three cancer networks in Scotland (South East Scotland (SCAN), West of Scotland (WOSCAN) and North of Scotland (NOSCAN)). My project is running alongside, and will be using a sub-set of, the COloRECTal Repository (CORECT-R) data at the University of Edinburgh (part of an even wider project at the University of Leeds). PBPP approval for my project was granted in June 2018, however, I do not yet have access to this data. Below, I outline some of the lessons I have learned during the application to access this national data.
Summary challenges faced
(1) When data are held on databases outwith the Information Services Division (ISD), often at a local or regional level, the process of data linkage becomes more challenging and costly. Often, there is not the expertise at a local level to extract and transfer data, and working relationships between local analysts and those coordinating data linkage centrally do not exist. Specifically, there are few examples of previous linkage of chemotherapy prescribing data (held locally) on a national scale.
(2) Data linkage requires a pre-specified list of the data variables from each data set. Often these lists are not publicly available, or even defined, and it can be time consuming and difficult to generate variable lists which are required for the data linkage process.
(3) Evidence of funding to perform data linkage and make use of national linkage services is often required for PBPP approval. However, depending on the time between submission and the data linkage occurring, it can be several years before the funds are used.
(4) If a researcher is funded for a specified period, the time taken for PBPP approval and data acquisition means that the researcher may not have an opportunity to analyse the data. There is also a risk that the research question will be less relevant than at the time of submission.
There is huge potential to use routine data to improve the way we do clinical trials and ultimately to improve outcomes for patients. The potential to pioneer the use of routine data for research purposes in Scotland is obvious; however, the practicalities of currently accessing and using this data are not straightforward.
My advice for anyone planning to work with national Scottish data, based on my experience:
Apply for access to data early and be aware that data acquisition may take longer than expected depending on your project.
Think about the costs of data linkage, especially if you want to link data sets that are not currently stored in ISD. The size and subsequent cost of a data linkage project is often based on the number of databases used (especially those outside ISD), rather than on the size of the finalised database.
Define which variables from the data set you will require early and be clear why you require each variable for your analysis.
Iveson TJ, Kerr RS, Saunders MP, Cassidy J, Hollander NH, Tabernero J, et al. 3 versus 6 months of adjuvant oxaliplatin-fluoropyrimidine combination therapy for colorectal cancer (SCOT): an international, randomised, phase 3, non-inferiority trial. The Lancet Oncology. 2018;19(4):562-78.