Author: Elizabeth Lemmon
In this post, I offer my thoughts, as an eCRUSADer, on the Administrative Data Research (ADR) Conference held in Cardiff between the 9th and 11th December. Given that these were three days of excellent talks and discussions, I can report that this has been no mean feat!
Overall Summary
The conference, organised by Administrative Data Research Wales (ADRW) at Swansea University and sponsored by Administrative Data Research UK (ADRUK), the Economic and Social Research Council (ESRC), and the Welsh Government, had the central theme of ‘Public data, for Public Good’. This theme, most appropriately, reminds us that the data we use in our research belongs to the public, and that we must keep this at the forefront of our minds as we use it to generate better outcomes for them.
The three days were jam-packed with plenary keynotes, parallel sessions, rapid-fire sessions, a visit to Cardiff Castle and, for the super geeks, a whole lot of Rubik’s Cubing 🤓
Unlike some other international conferences I have been to, where it can be difficult to see how research from one country translates to your own, the 2019 ADR conference was highly relevant throughout. In fact, a clear takeaway from the conference was the message that there is a huge amount to be learned from how things are done elsewhere.
There were so many talks I wanted to go to and, as is always the way with parallel sessions, it simply wasn’t possible to get to them all. I decided to attend as many as possible of those which focussed on administrative data infrastructure and ethics in using public data.
In this post, I offer a summary of take home messages and expand on a couple of the keynote talks and parallel session talks of interest in this area. I could have written a whole post on the excellent work presented from researchers in Scotland, including some from fellow eCRUSADers, but alas this will have to wait for another time!
ADR 2019 Take Home Messages
- The potential of administrative data for research is huge, especially in Scotland where linkage across several research domains is possible.
- There is a general movement towards the use of large data repositories (or data lakes/data lochs/integrated data systems; perhaps we need to agree on one definition?) of ready-linked data, which will speed up access for researchers whilst maintaining public privacy and ultimately make better use of public data for public good.
- Issues with data access, particularly concerning timing, are not unique to Scotland.
- Some countries seem to be further forward than Scotland and the rest of the UK in this regard (notably Australia and Canada) and there is much to be learned from work going on around the world.
- Whilst the message of public trust and transparency was front and centre throughout the conference, I felt there was little demonstration of how this is being done in practice and how, beyond using public data safely, researchers can contribute directly to building that trust.
- Don’t be fooled by the ADR Rubik’s cube: it is a lot harder than it initially looks!
At the end of the three days, it was great to see Michael Fleming, researcher from the University of Glasgow, receiving the Best Paper Award for Evidence to Support Policy Making on his work using linked education and health data to explore outcomes for children treated for chronic conditions. Before presenting Michael with his award, Emma Gordon, Director of ADRUK, acknowledged the long wait (from memory just under 2 years) before Michael got access to the data for his research and asked the audience:
“Can you imagine having to wait so long for data?”
Sadly, we certainly can. In fact, there are a considerable number of us here in Scotland (and almost certainly elsewhere) who have.
It was great to hear Emma highlight this and I do think that the conference really sent a message of hope to eCRUSADers, and researchers more generally, that things are improving in Scotland. It was certainly motivating to see the future of administrative data research already being put into practice in many countries around the world. But there is still a long way to go.
In the meantime, you should definitely join eCRUSADers to hear the latest on the administrative data front and get in touch to share your experiences so that we can all learn from them.
Hit subscribe at the top of this page!
Keynote talks to highlight
Garry Coleman, Associate Director of Data Access at NHS Digital
The first keynote presentation on day one was given by NHS Digital’s Garry Coleman who, despite being likened to Scrooge and Smaug the dragon, fiercely guarding NHS administrative data, outlined the suite of changes that NHS Digital have made over the last year in order to improve access to NHS data for researchers. These included the introduction of a fast stream service for repeated applications and those with precedent, published ‘standards’ to help researchers know what is expected from their application, and the establishment of the Data Access Environment (DAE). DAE is the new cloud technology in England whereby researchers can access patient data for research without the need for the data to leave NHS Digital. The platform went live in May 2019 and aims to provide researchers with faster access to ready-linked data sets with built-in tools for more powerful data visualisation and analysis. There’s a YouTube video on it here. It all sounds very good, as well as very familiar. I wonder how this platform will compare to Research Data Scotland?
Whilst ensuring public trust in all that we do with public data was at the heart of Garry’s talk, it was not clear how much public engagement, if any, NHS Digital has done around the use of the new DAE system. I’ve had a quick peruse of the NHS Digital website and can’t see any evidence of it on there either. Perhaps it is there somewhere and I am missing it? In any case, given the need to be transparent and ensure that public trust is at the heart of using administrative data for research, we perhaps need more than the hope that the public are aware of this and are happy for it to go ahead.
What was clear from Garry’s talk was that he was actively seeking feedback from the research community on how they have found the data access processes of NHS Digital and he expressed a genuine interest in making things easier for researchers.
John Pullinger, Former Head of the Government Statistical Service and Chief Executive of the UK Statistics Authority
On day two, the first plenary keynote was from John Pullinger, who offered his thoughts on “Lots of lovely numbers but why does everyone make it so difficult?” Clearly, John has an immense amount of experience in this field and he did an excellent job of taking us on a journey with him from the 70s, when he first began working with the limited administrative data that was available then, to the present day, where administrative data are all around us. John’s message was clear: for us to have a social licence to operate with the public’s data, it is incumbent on us to earn their trust. This is in fact just as important as our research itself. He highlighted the importance of seeing legislation around the use of public data, like GDPR, as an enabler of research rather than an impediment. Finally, John pointed out the need to be realistic about what the data can tell us and not to claim more than the evidence supports. Once again, this comes back to the need to earn the trust of the public and to avoid doing anything that might undermine that trust.
For me, John really drove home the fundamental need to remember whose data we are using and that we are very much still on the journey to earning their trust.
Parallel sessions to highlight
In his talk, Robert McMillan talked about the Georgia Policy Labs, a ‘data lake’ which hosts many ready-linkable administrative data sets for policy makers and researchers to access and analyse in order to conduct research on a number of key policy areas. Robert highlighted the secure cloud infrastructure, separation of duties and secure data rooms which ensure that data are stored and used in a safe way. He also mentioned the ‘master data sharing agreement’ which they have in place to allow access to this data lake. Time was tight so there wasn’t really a chance to go into detail on this, though I am sure the Scottish Government would be interested to know more as they work towards implementing Research Data Scotland.
In her talk, Anna Ferrante discussed her work in merging the Data Linkage Western Australia and the Centre for Data Linkage to form the Population Health Research Network (PHRN). The PHRN is a national network of data centres which links data collected across Australia on the entire population. Its infrastructure allows for the safe and secure linkage of data collections across a wide range of sources. Like some of the other speakers throughout the conference, Anna talked about the Bloom filter structure PHRN uses to link administrative datasets probabilistically and anonymously. Not surprisingly, given the amount of research that comes out of Australia using linked administrative data, Anna’s was one of many Australian talks which highlighted the level of maturity of the administrative data infrastructure in Australia compared to Scotland.
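For readers who haven't come across Bloom filter linkage before, here is a minimal sketch of the general idea in Python. To be clear, this is my own illustration and not PHRN's actual implementation: the function names, the filter size, the number of hash functions and the shared secret key are all assumptions made for the example. Each data custodian splits an identifier such as a surname into overlapping two-letter chunks, hashes each chunk with a shared secret key into a fixed-size bit array, and passes on only the bit positions. The linkage unit can then compare two encodings for similarity without ever seeing the names themselves.

```python
import hashlib
import hmac


def bigrams(value: str) -> set[str]:
    """Split a normalised string into overlapping two-character chunks."""
    padded = f"_{value.strip().lower()}_"  # padding marks the start and end
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, secret: bytes, size: int = 256,
                 num_hashes: int = 4) -> set[int]:
    """Return the set of bit positions switched on in the Bloom filter.

    `size` and `num_hashes` are illustrative values, not recommendations.
    """
    positions = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            # A keyed hash (HMAC) so the encoding can't be rebuilt without the key
            digest = hmac.new(secret, f"{k}:{gram}".encode(),
                              hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % size)
    return positions


def dice_similarity(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two encodings (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical key that the data custodians would agree on in advance
key = b"example-shared-secret"

smith = bloom_encode("Smith", key)
smyth = bloom_encode("Smyth", key)
jones = bloom_encode("Jones", key)

print(dice_similarity(smith, smyth))  # spelling variants share most chunks
print(dice_similarity(smith, jones))  # unrelated names share almost none
```

A pair of records would then be treated as a probable match when the similarity is above some chosen threshold, which is what makes the linkage probabilistic and tolerant of typos. Real systems combine several fields and take extra precautions against re-identification, so please treat this as a toy.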
Michael Schull talked about a project he is involved in that is building a partnership between health service researchers and computer scientists to develop a high-performance computing platform for the analysis of large linked administrative datasets. The goal of the partnership is to use artificial intelligence and machine learning to improve health and health care, which of course requires an infrastructure that has the power to store and manage large quantities of data. He talked about some of the things they have learned from working with computer scientists, and noted that the hardware for this infrastructure was in fact the easy part…
Della Jenkins presented on the work being carried out at Actionable Intelligence for Social Policy (AISP), an organisation which works with state and local governments to implement Integrated Data Systems that link administrative data across government agencies. Della’s talk reiterated many of the messages that John Pullinger had highlighted in his keynote speech earlier in the day. Namely, as initiatives for integrated administrative data (AKA linked administrative data) in research continue to grow, it is vital that we build awareness and infrastructure with public involvement every step of the way.
The group have a very useful report and toolkit on their website: Tools for Talking (and Listening) About Data Privacy for Integrated Data Systems. Although the report is aimed at government agencies and their partners who are using linked administrative data, the content is helpful more generally in terms of steps to develop a social licence for using linked public data. It’s well worth a look.
Andy Boyd gave a great talk on work carried out by CLOSER and NHS Digital exploring possible infrastructures for the onward sharing of longitudinal study data that is linked to administrative records, which currently cannot be released outside of the cohort study institution. The work identified five onward data-sharing models and concluded that, although greater clarity is needed in order to share anonymised data effectively and to do so internationally, there are opportunities for development and the large community of longitudinal cohort studies in the UK might be able to facilitate part of those processes. Full report available here.
One of the final talks I went to was from Mike Robling at Cardiff University. I’d been looking forward to this talk because I had already heard of the CENTRIC study, work which hopes to develop “training for UK researchers that enhances their understanding of public perspectives and governance requirements and improves their practice when working with routine data”. In his talk, Mike outlined the results from the focus groups with stakeholders, workshops with members of the public, and an online survey filled in by researchers. In summary, the study found that there is both a need and an appetite for training researchers in public engagement and in the complex regulations and requirements around using routine data for research. I very much look forward to seeing the training resources that CENTRIC produce and think they will help to fill the existing gap when it comes to researchers improving public engagement and public trust.
One final reflection on privacy
This post might have been more aesthetically exciting if I had been able to fill it with photographs of the speakers whom I went to see present. Sadly, it isn’t, because none of the speakers said one way or the other whether they were happy to be photographed. Given the nature of the conference, I decided to err on the side of caution and assume that this meant the speakers had not given consent to having their photographs taken and plastered on social media. Of course, not everyone shared this view, which then made me wonder if I was silly not to take any photographs in the first place. Or maybe I should have asked the speakers beforehand if they would mind. And so I wonder, in the name of transparency, might it be possible for speakers to quickly mention at the beginning of their talks whether they are happy to be photographed? Just an idea. Here’s me giving my talk 🙂
The full 2019 ADR conference proceedings are available here where you can access abstracts of all talks from the three days. Thanks to all of the organisers for putting on such a great event and to the speakers for sharing their exciting work.
As always, I would love to hear your thoughts on this so please comment/share/email me!