Author: Elizabeth Lemmon

A conversation with eDRIS: Part 2

In Part 1 of our conversation with the Electronic Data Research and Innovation Service (eDRIS), we heard from Research Coordinator (RC) Jules. In the post, Jules described what a typical day might look like for an RC. The idea of the post was to provide researchers with an insight into what the job of an RC involves. Overall, my hope was that having this insight might help to better align researchers’ and RCs’ understanding of what one another are doing on a daily basis. In this post, I reflect on the goings-on of Jules’s day and ask Jules some follow-up questions.

If you want to skip through to any specific question (for example, right to the end for some top tips!), then click on the question headings below:

Post Contents: 

  1. Q: How many projects does an RC normally have on the go at once? 
  2. Q: Are there instances where you would arrange meetings with researchers to have discussions about their projects?
  3. Q: What are the most common issues you see with SDC requests from researchers? Equally, what does the perfect SDC request look like?
  4. Q: Can you quickly explain the difference between HSCPBPP and SPBPP?
  5. Q: How often is it that you, as an RC, would flag up to researchers that their training is about to expire? Is this something you routinely check for?
  6. Q: Is there any kind of troubleshooting list that researchers can refer to before they contact their RC in panic?
  7. Q: Is there somewhere that researchers can see what the rules are around proportionate governance issued by HSC-PBPP?
  8. Q: What’s the most common email template you use?
  9. Q: Would these things not have been flagged during RC and researcher discussions prior to submission? But in any case, would you say these are the most common queries that come back from the panel?
  10. Q: Why are approved PBPPs not readily accessible to the public?
  11. Q: What would be the top three bits of advice that you would give researchers whilst they make their way through the process of applying for and using administrative data in their research?

The first thing that jumped out at me was the number of projects you said you would be working on at one time. Twenty-five seems like a lot to be juggling at once.

1: How many projects does an RC normally have on the go at once? 

Twenty-five is actually at the lower end of the scale for us. When demand is really high, it’s not unusual for RCs to carry up to 40 live projects and also deal with up to 20 enquiries on potential projects!

Back to contents.

It also struck me that the primary way that you communicate with researchers is via email. I guess this is important to maintain a paper trail of decisions etc.

2: Are there instances where you would arrange meetings with researchers to have discussions about their projects?

Yes, we frequently arrange phone calls to discuss issues. Like a lot of people, many of us are working from home and this has sped up the adoption of remote working tools. At Public Health Scotland, we have access to MS Teams and we are using this more and more for meetings with researchers. As a former researcher, I am aware of the need to plan research carefully. Now, with my role in eDRIS, I am aware of potential issues that researchers may not think of early on. We do try to have meetings with researchers early on, even before permissions applications start, mainly to ascertain if the data requests are feasible. It may seem like we don’t respond quickly and, though this can be true, I hope Part 1 of this blog has given a little bit of insight into some of the reasons for this!

Back to contents.

Statistical Disclosure Control (SDC) is clearly one of the routine things you guys are dealing with and you mentioned that you might look at Helen’s first because hers are usually filled in properly and explained well. Researchers should obviously read the NSH Requesting Outputs SDC Booklet to make sure they are requesting things in the correct way.

3: What are the most common issues you see with SDC requests from researchers? Equally, what does the perfect SDC request look like?

It doesn’t need to be perfect. We get that researchers are busy juggling competing demands, so the request just needs to be good enough that the reader can understand the context well enough to assess it easily. The framework eDRIS operates under is what is called the ‘Five Safes’. This is a way to model the whole process, from permissions through to outputs. The basic principle is to break projects into five themes: Safe People, Safe Projects, Safe Setting, Safe Data and Safe Outputs. SDC requests fall into this last theme. When we look at outputs we are asking ourselves if there is any risk of disclosing confidential information from this output. We have to judge several things: does this output disclose anything on its own; does this output, in conjunction with other data that may be available, increase the risk of disclosure; and lastly, what damage would be done if something were disclosed?

Output checking requires us to have a good working relationship with our researchers, as we may ask them to do something that they don’t agree with. With all that in mind, a good enough output would be one where we, as output checkers, can look at it and say “Ah, there’s a title, clearly labelled, acronyms expanded, this researcher has explained all of the outputs clearly to a non-specialist, there’s enough information in this file for me to understand what I’m looking at and make an informed assessment, and where there are disclosure risks, they have mitigated them and provided an explanation of what has been done, and they have asked us only for the most important outputs for their research”. If we can see all of these, then output checking becomes a routine check for us and outputs, I can’t emphasise this enough, are released much quicker. We know that us asking researchers to look at outputs again is frustrating for them, and it’s also frustrating and time-consuming for us (we don’t like going back to researchers either…).

Over time, if the outputs from researchers are easy for us to check, we trust that researcher more, and it will be easier and quicker for their outputs to be released.
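To make the idea of mitigating a disclosure risk a little more concrete, here is a minimal Python sketch of one common technique, suppressing small counts in a frequency table before requesting it as an output. The threshold of 5, the group labels and the function name are assumptions made up for illustration, not eDRIS rules, and the sketch does not deal with secondary suppression (where a hidden cell can be worked out from a total). The output guidance and your RC remain the authority on what applies to a given output.

```python
def suppress_small_counts(table, threshold=5):
    """Replace non-zero counts below `threshold` with a marker such as "<5".

    Both the threshold and the decision to leave zeros untouched are
    illustrative assumptions; real rules depend on the data controller.
    """
    marker = f"<{threshold}"
    return {
        label: (marker if 0 < count < threshold else count)
        for label, count in table.items()
    }


# Invented counts for a hypothetical output table.
counts = {"Group A": 120, "Group B": 3, "Group C": 0, "Group D": 47}
safe_counts = suppress_small_counts(counts)
print(safe_counts)
```

Whatever mitigation is used, the explanation that goes with the output should say what was done and why, as described above.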

Back to contents.

The ‘quick-question’ from James that turned out not to be a quick question tickled me. I think it highlights the differences in understanding about the processes and …. between researchers and eDRIS. What seems like a quick question to us often isn’t so quick from your perspective. I’m not sure I can think of an immediate solution to this mismatch, but hopefully bits and pieces of this blog post can go some way in helping. You mentioned in your response to James the HSCPBPP and the SPBPP.

4: Can you quickly explain the difference between HSCPBPP and SPBPP?

The Health and Social Care Public Benefit and Privacy Panel (HSC PBPP) and the Statistics Public Benefit and Privacy Panel (SPBPP) are the main bodies we interact with that assess project applications and decide whether permissions are given to access confidential data. The difference is that HSC PBPP represent NHS Scotland, so decide on applications to access NHS Scotland datasets, and SPBPP represent the Scottish Government (SG), and decide on applications where the researchers want to access e.g. Census data or Education data. Where the project intends to link NHS Scotland and SG data, applications to both panels are required.

Back to contents.

Information Governance (IG), unsurprisingly, also came up a lot in your account of your day. One of the recurring issues was researchers’ (and your own) IG training expiring. You also noted that the expiration dates are different for different data controllers. Of course, it is a researcher’s responsibility to ensure their IG training is up to date, and if their IG training certificate expires during the course of their project they must obtain new IG training within two weeks of the expiry date and provide eDRIS with the new certificates.

5: How often is it that you, as an RC, would flag up to researchers that their training is about to expire? Is this something you routinely check for?

As you say, this is the responsibility of the researchers. However, as we are also responsible for ensuring researchers meet the conditions of their permissions, we do make sure that IG training is still valid. For projects using the safe haven, we set accounts to be disabled on the IG training expiry date. In tandem with this measure, RCs also periodically remind researchers that their training is due to lapse, to avoid the situation where the researcher contacts us to let us know they can’t access the safe haven! Obviously this is not ideal, as it adds delays while training is renewed, so it’s far better that researchers take responsibility for being aware that their training is up to date.

Again, I don’t have any wonderful solution that comes to mind at the moment, but there must be some simple steps researchers can take to make sure we don’t end up in the position where our IG training is about to expire and we haven’t planned to get updates (i.e. by booking onto a Safe Researcher Training course). Something as simple as setting a calendar reminder when we first do and pass our training?! Or maybe a month in the year where we encourage researchers to check: Information Governance Ganuary?! This has actually prompted me to check mine…
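For anyone who likes to automate this sort of thing, the calendar-reminder idea can be sketched in a few lines of Python. The expiry date, the reminder lead times and the function name are all made up for illustration; how long a certificate is valid differs between data controllers, so the expiry date should come from your own certificate.

```python
from datetime import date, timedelta


def ig_reminder_dates(expiry, lead_days=(90, 30, 7)):
    """Return reminder dates ahead of an IG training expiry date.

    `lead_days` is an illustrative assumption: how many days before
    expiry each reminder should fire. The earliest reminder comes first.
    """
    return [expiry - timedelta(days=d) for d in sorted(lead_days, reverse=True)]


# Hypothetical expiry date read off a training certificate.
expiry = date(2022, 1, 31)
for reminder in ig_reminder_dates(expiry):
    print(f"{reminder}: IG training expires on {expiry}, book a course")
```

The longest lead time is deliberately generous, since a place on a training course may need to be booked well in advance.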

Back to contents.

Another thing you mentioned that made me laugh (though understandably this might not be so funny to you…) was the email from Bob to ask if you can help because he couldn’t access the safe haven. I can relate to this entirely: you try to log in and it doesn’t work. Mild panic – you promised your boss/supervisor to run some analysis that day. You frantically send an email to your RC begging for them to help. Then you realise it’s your own fault and you’ve jumped the gun in reaching out for help. You immediately feel bad and send another email to apologise and ask the RC to please ignore the first email.

6: Is there any kind of troubleshooting list that researchers can refer to before they contact their RC in panic? I suppose this might minimise the frequency of scenarios like the one you mention arising.

The main reasons we see for safe haven access problems are actually forgotten passwords, but this is usually obvious to the user!

The first thing to be aware of is that the safe haven web page is only accessible from recognised IP addresses, which are always associated with your host institution (e.g. the University). If you are not in the office (true for most people nowadays!), make sure you are on the university VPN. The page will give an ‘Access Denied’ message if the IP address is not recognised.

The least common issue is with the two-factor authentication (2FA) PIN codes. The system used for the 2FA PINs very rarely fails (only once that I am aware of!), so if PINs are not being received, it’s usually something else. The most common reason is that the user has entered the wrong username; these are different from the subsequent logins and researchers sometimes forget. The next most common reason is lack of mobile phone signal. If you have not received a PIN, check these first.

Finally, if you are not sure, just ask us!

Back to contents.

You also mentioned that some requests can be processed by yourself (i.e. the RC) under the proportionate governance rules issued by HSC-PBPP.

7: Is there somewhere that researchers can see what the rules are around proportionate governance issued by HSC-PBPP?

If we knew which of our requests would need to be sent for higher approval then we might reassess the situation and work out what is the best approach to take before coming to you with something that you then need to explain needs higher authorisation. This would be especially useful in cases where time constraints are tight for research projects. I suppose this comes back partly to understanding what is a ‘quick-question’ and what isn’t.

The governance around amendments is always evolving, so these rules are given to us as guidelines. RCs still frequently ask the panel managers for advice! This is purely because of the diversity of projects, so I think it would be difficult to pin down every scenario and how each may be treated by the panel and eDRIS. The best way for researchers to figure out which amendments are the least controversial is to ask their RC! Another source of information is the amendment request form available from the HSC PBPP website, in conjunction with the guidance for applicants document, also available from the same website.

Back to contents.

It’s interesting to know that you have template responses for certain things. I guess in themselves they might indicate which are the typical ‘issues’ that arise.

8: What’s the most common email template you use?

The most common template, by far, is to ask researchers to complete a new enquiry form. Internally, the most common template is the safe haven password reset form… Since I wrote this article, eDRIS have developed, and continue to develop, new tools, which have reduced the number of template emails we have had to use. It would be interesting to hear if researchers have noticed changes in the safe haven password reset processes, as this is now relatively painless for us!

Back to contents.

The request back from the PBPP with further questions really surprised me. The issues raised were:

1) Please provide a clear data flow diagram
2) Please provide a Data Privacy Impact Assessment or evidence that one is not needed. Your data protection officer should be able to offer advice.
3) Please provide evidence of public involvement in the research design
4) Please ensure your lay proposal is clearer to those with no experience of research
5) Please ensure anyone named in 1.1 to 1.5 of the PBPP form has valid IG training; there is a list in the ‘Guidance for Applicants’ available from the PBPP website.

These are all things that the researcher should have done before submission. You even suggested that this is a regular kind of feedback from the panel.

9: Would these things not have been flagged during RC and researcher discussions prior to submission? But in any case, would you say these are the most common queries that come back from the panel?

These issues will have been flagged by RCs before submission, but RCs can only offer advice, and it’s up to the researchers whether they choose to act on it! Most applications have, at most, one or two of these items in feedback from the panels. It’s worth remembering that the majority of the applicants to eDRIS are first time applicants from a huge variety of backgrounds and will not have had to do anything like this before. As a researcher, a role I was in for many years myself, your focus is on the scientific or methodological merit of your research. Being asked to think of wider issues can be difficult to get right first time. I am guessing that this is a major factor in accounting for the frequency of these issues being highlighted. My advice would be to pay attention to the advice that your RC gives you (and read the guidance documents when you are completing the application form). I would say the data flow diagram is probably the most important piece of information to get right. This needs to show the source of the data (usually the individual records in a dataset), and each step in the journey from there to the place where researchers will access the final dataset. Once that is pinned down, it’s much easier for eDRIS to figure out what needs done, and for the panel to see where any risks are.

Back to contents.

I’ve applied to PBPP several times so I like to think I am quite familiar with the application. But if you are a new researcher then there are plenty of resources out there to help you fill it in, including guidance on the PBPP website and an eCRUSADers blog post (which includes links to an example PBPP and DPIA).

10: Why are approved PBPPs not readily accessible to the public? I think this would improve transparency in the use of public/patient data, but also help avoid situations where researchers submit incomplete applications. Although all projects are different, it can often help to see what a successful application looks like, even if it is in an unrelated research project.

Do you have any view on this?

While a good idea in principle, the PBPPs contain confidential data! Examples include researchers’ emails, work addresses, professional registration numbers and signatures, to name just a few. PHS have to process these under the same laws (GDPR, Data Protection Act) that apply to e.g. patient and employee records. I believe the eCRUSADers website has a link to the ‘Tooth Fairy’ application. I would recommend this as a resource for researchers to see what a complete application would look like. For transparency, PBPP publish abbreviated lists of approved projects and end of project summaries provided by applicants once projects are completed; these are available on the HSC PBPP website.

Back to contents.

Overall, it’s interesting to see the overlap between the challenges we face: expiring passwords and IG training, safe haven outages. Not to mention the barrage of emails coming in regarding different projects. I guess most researchers (at least academic researchers) are in a similar position in that they are often juggling research on several projects alongside other things like teaching and administrative responsibilities.

11: What would be the top three bits of advice that you would give researchers whilst they make their way through the process of applying for and using administrative data in their research?

This is a tricky one, however, I will do my best!

1) Listen to your RC! We are well aware of the more problematic issues, but we usually have solutions to these. We can only offer advice, but we give this advice to help, not to make people’s lives difficult. We appreciate the governance arrangements are complex and can be confusing (see no 2 in this list), but we want to help.

2) Try to understand your project from the data controllers’ point of view. Data controllers want their data to be used for good, but they are also obliged by law to protect the privacy of the individuals whose data they hold. This applies from the moment you apply for access, right through to the point you request outputs for publication. RCs can help, but I would also recommend making use of the Information Commissioner’s website to understand data controllers’ obligations with regard to personal data.

3) Ask your peers and eDRIS. There are researchers and RCs that have many years of experience using administrative data in research. Your eCRUSADers website is a fantastic initiative.

If I could add a ‘bonus’ tip, please also let us know what causes you, as a researcher, the most pain. If it causes you pain, it causes us pain! We can’t promise to make swift changes, but we will do our best.

Back to contents.

Thanks again Jules for taking the time to talk to eCRUSADers over these last two posts. It has been great to get an insight into the day in the life of an RC and overall I hope that this conversation will improve the working relationship between eDRIS and the researchers who apply to use health data in their research going forward.

Course Round Up: An Introduction to Data Science for Administrative Data Research

Dates of course: May 2021
Organised by: Scottish Centre for Administrative Data (SCADR)
Post summary: I was lucky enough to bag myself a spot on this year’s SCADR course “An introduction to data science for administrative data research”. In this post, I present an overview of the course and its content. I also provide my thoughts on the parts I felt to be most useful for fellow eCRUSADers as well as where I felt there were things missing.

In a rush? Skip ahead in the contents to find out about:

  1. The course in a nutshell and overall thoughts
  2. The course structure
  3. The lectures
  4. The practical sessions
  5. Thoughts on the most useful parts
  6. What was missing or could be improved?
  7. Who should attend the course
  8. How much does it cost and what time commitment is involved
  9. When will the course be running again? 

1. The course in a nutshell and overall thoughts

Overall, the course is designed to introduce researchers to the world of administrative data, with a particular focus on Scotland. The course includes a series of lectures and practical sessions, allowing participants to get some hands-on experience working with a synthetic administrative dataset.

The course was very well structured and organised. The content was extremely useful and the course instructors were incredibly knowledgeable. The course provided participants with a lot of very helpful information. For anyone starting out doing research using administrative data I would most definitely recommend it.

Back to contents

2. The course structure

The course was split into a mixture of lectures and practical lab sessions over a four-week period. Since this year’s course was delivered solely online, there was also a live Q&A session each week.


Back to contents

3. The lectures

Week 1: Week 1 kicked off with an introduction to the course, the learning outcomes and to administrative data. Chris Dibben, SCADR Director, welcomed the participants and gave a very useful summary of the history of research using administrative data in Scotland and the UK. He also outlined the key stages a researcher will go through when carrying out research using administrative data (see figure below). The following lectures, delivered by experienced SCADR researchers, also covered some of the different sources of administrative data in the UK, the benefits and limitations of working with administrative data and the Five Safes Framework. There was also a brief introduction to programming in R to assist with the practical sessions.

Week 2: The focus of week 2 was to introduce participants to some of the administrative datasets available for research in Scotland (and the UK). These included:

  • Scottish Government Education and Analytical Services data and the education datasets (Scotland focus)
  • Department for Work and Pensions data (UK focus)
  • National Records of Scotland data including Scottish Longitudinal Study (Scotland focus)
  • Health data (Scotland focus)

Each lecturer discussed the datasets available, the type of information contained in them, how to access them and links/email addresses to find out more information.

A further lecture in week 2 covered record linkage, the different methods of linkage and some of the implications of incorrect linkage on research. The final two lectures to assist with the practical sessions looked at working with dates and times, and indexing, linking and joining data.

Week 3: Week 3 explored data provenance, law, trust and public engagement. The data provenance lecture, delivered by SCADR co-Director Iain Atherton, outlined the importance of understanding where your data come from because ultimately this will affect what you get out of it. He provided a useful outline of steps researchers should take when working with administrative datasets:

The lecture on law, trust and public engagement was incredibly helpful for any researcher starting out in administrative data research (I would go as far as to say that attendance at this lecture should be mandatory for anyone who wants to access administrative data for research!). The session highlighted how important it is to develop the social licence when carrying out research on unconsented personal information. Some useful questions to ask ourselves were considered when thinking about public engagement. For example, to whom does your research relate? Who will be impacted by the outcomes of your research? What conditions or social issues does your project explore? Do the findings have the potential to impact a wide proportion of the public?

The final week 3 lecture explored data visualisation in R.

Week 4: The lectures in week 4 gave participants some insight into some existing research that has used administrative data in Scotland. These included Scotland’s maternity dataset (SMR02), linked survey and DWP data, drug consignment data, and Scottish Government data on looked after children.

Each of the speakers also talked about the main challenges faced during those research projects and in particular imparted their words of wisdom for other researchers. The lecturers also pointed towards some published work using Scottish administrative data. For example:

Clemens, T., Dibben, C., Pearce, J. et al. Neighbourhood tobacco supply and individual maternal smoking during pregnancy: a fixed-effects longitudinal analysis using routine data. Tobacco Control 29, 7–14 (2020).

Pattaro, S., Bailey, N. & Dibben, C. Using Linked Longitudinal Administrative Data to Identify Social Disadvantage. Soc Indic Res 147, 865–895 (2020).

Back to contents

4. The practical sessions

Week 1: The practical sessions started in week 1 with an introduction to the R environment. Note that the course is designed for people who have used R before. However, even if you have no experience in R, the course instructors were very willing to help and the instructions for carrying out the data cleaning and analysis were extremely thorough.

The practical sessions use a synthetic Not in Education, Employment or Training (NEET) dataset. This dataset was created to mimic the relationships between variables in the original NEET data but the observations do not pertain to real individuals.

Week 2: The focus of week 2 was first to handle the date and time variables in the dataset, create new variables and check for any anomalies, all of which are typical exercises you will do when you work with administrative datasets. In the second part, participants had to convert the dataset from long to wide format, another common task when working with administrative datasets, and merge datasets together.
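To give a flavour of these exercises, here is a minimal sketch of checking dates for anomalies and converting from long to wide format. The course itself used R; this sketch uses Python’s standard library instead, and the records, column names and date format are invented for illustration rather than taken from the synthetic NEET dataset.

```python
from collections import defaultdict
from datetime import datetime

# Invented long-format records: one row per person per year.
long_rows = [
    {"id": "A", "year": 2019, "status": "employed", "start": "01/09/2019"},
    {"id": "A", "year": 2020, "status": "neet", "start": "15/03/2020"},
    {"id": "B", "year": 2019, "status": "education", "start": "31/02/2019"},  # impossible date
]


def parse_date(text, fmt="%d/%m/%Y"):
    """Parse a date string, returning None for anomalies instead of failing."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


def long_to_wide(rows, key="id", period="year", value="status"):
    """Pivot long rows into one record per key, with one column per period."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row[key]][f"{value}_{row[period]}"] = row[value]
    return {k: dict(v) for k, v in wide.items()}


# Flag records whose start date cannot be parsed, then pivot.
anomalies = [row["id"] for row in long_rows if parse_date(row["start"]) is None]
wide = long_to_wide(long_rows)
print(anomalies)
print(wide)
```

Returning None for an unparseable date, rather than stopping with an error, means every anomaly in a file can be listed in one pass and then investigated.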

Week 3: Week 3 was all about visualisation to explore the data to look for patterns and to check for any anomalies.

Week 4: The final practical session was focussed on modelling the data and producing tables of output.

Back to contents

5. What were the most useful parts?

  • The pre-recorded lectures worked very well. It meant you could listen in your own time and speed up the playback if you wanted to.
  • The links and information about using specific datasets in Scotland.
  • Hearing from researchers who have worked with specific administrative datasets in Scotland and seeing examples of published work using those datasets.
  • Learning top tips from experienced researchers that you can take into your own research.
  • Getting to do hands on work with some synthetic data to give researchers experience in dealing with common problems in administrative datasets.
  • The clear message that administrative data are often unconsented personal data and that it is vital we develop the social licence to use them.

Back to contents

6. What was missing or could be improved? 

  • More information on the specific data access processes and how long they typically take.
  • Information on the specific information governance training that researchers can do, for example the ONS Safe Researcher Training.
  • A practical session that includes carrying out disclosure control requests and checks.
  • A dataset that really looked like an administrative dataset: the NEET data were already pretty clean and even came with a codebook!
  • Information and guidance on how researchers can involve the public and patients in their research.
  • Information on how researchers can contact coders.

Back to contents

7. Who should attend this course?

The course is most suitable for those who are new to the world of administrative data research, particularly in Scotland. However, the course would also be useful for anyone working with administrative data in the UK, as many of the lessons learned will translate.

Back to contents

8. How much does it cost and what time commitment is involved?

In 2021, the course cost £120 per person. The course runs over four weeks with both lectures and practical sessions. In total, SCADR estimate that you will need around 4-6 hours per week to watch the lectures and read the teaching materials and a further 3-4 hours per week to join the practical sessions.

Back to contents

9. When will the course be running again?

To enquire when the next training course will run, you can email

Back to contents

People Make Data: Part 4

I’m thrilled to have Mr Steve Clark contributing this post to the eCRUSADers People Make Data series. Steve has a wealth of knowledge about cancer care throughout the UK and since his colorectal cancer diagnosis in May 2013, he has been involved in various projects and campaigns within the colorectal cancer sphere, including setting up his own Strive for Five campaign. Steve also has experience working closely with academics to shape research, including co-authoring a recent publication with myself and colleagues from Edinburgh, Glasgow and Oxford.

In this post, Steve talks to eCRUSADers about his experience as a patient, his involvement in the Patient and Public Group of Bowel Cancer Intelligence UK (PPG BCI-UK) and what he has learned about the system of cancer care and cancer data during his journey. Crucially, Steve highlights the importance of actively engaging patients and the public in the research process from the outset.

Over to you Steve, the floor is yours.

I was diagnosed with advanced (stage 4) colorectal cancer in May 2013. The initial prognosis looked fairly bleak as it had already spread to my liver and both lungs. I’ve been very lucky since then thanks to an excellent team who’ve given me great care – a skilled specialist colorectal surgeon was able to remove my very large primary tumour without the need for a stoma, and my chemotherapy was so successful that the planned ablation operations were cancelled. Since then I’ve been on maintenance chemo which has kept things nicely under control with only 2 recurrences in 8 years, and I’ve had clear scans for the past two years. At my recent review when I told my oncologist I was targeting 10 years he replied “and beyond”!

I’ve been aware of the fractured nature of cancer care for most of the 8 years since I was diagnosed. I’m referring here to both clinical care and patient data. I should say that I have been fortunate in my care, but through my voluntary work I’ve seen the variations evident throughout the UK.

As patients, we really need a united cancer network. A network that ensures our doctors have knowledge of, and access to, the best care for their patients. A network where the data on cancer care is available across borders to ensure best practice is readily recognised and gaps in care are addressed quickly.

As one, solitary patient, I have no way of effecting change in healthcare, but I do try to help my fellow patients as much as I can. I offer support through my Strive for Five campaign with the aim of giving hope to people with a stage 4 diagnosis, but my voice is too small to affect policy. This is one of the reasons that I was keen to volunteer with some of the charities. I’m a Campaigns Ambassador with Cancer Research UK and have volunteered for some time with Bowel Cancer UK (BCUK). Although most of my work with BCUK has been focussed on patient support, I’ve been involved directly with a number of key campaigns including reducing the bowel cancer screening age.

Almost two years ago I started volunteering on the Patient and Public Group of Bowel Cancer Intelligence UK (BCI-UK), my first time on a formal PPG. BCI-UK is the umbrella body overseeing two important initiatives: the UK Colorectal Cancer Intelligence Hub which runs the COloRECTal cancer Repository (CORECT-R), and the Bowel Cancer Intelligence Programme which aims to improve patient outcomes by identifying and addressing variations in care. The excellent work of the team at BCI-UK is really highlighting how vital it is that we connect the various datasets around cancer care, so that researchers can interrogate these data and directly guide improved clinical care.

The adage defines madness as “doing the same thing over and over again and expecting a different result”. Surely that applies to cancer care – if we don’t review what’s working best, how can we hope to improve?

And it’s not enough to simply analyse the data: the findings have to be followed up and implemented, and they can be used to educate medical and surgical practitioners to help them improve their care. This could give significant improvements in patient outcomes quickly, something that is urgently needed for all patients with advanced cancer who don’t have the time to wait for new treatments to come through research – we can improve how current treatments and procedures are utilised quickly and relatively cheaply.

It’s not always about new treatments, we need better use of existing therapies through recognising and sharing best practice.

I was lucky enough to be co-author on the recent paper “Creation of the first national linked colorectal cancer dataset in Scotland: prospects for future research and a reflection on lessons learned” which is a clear step in this direction by at least making the Scottish data accessible. That’s a start, now the real work begins by making the data work for us!

Some of the ways that this type of dataset could be used for colorectal cancer care could include:

  • Identifying hotspots across the country – good and bad – and addressing the gaps;
  • Maintenance chemo for long term care of stage 4 – what regimens get best balance of effect and lifestyle;
  • Impact of different support programmes on treatment success and tolerability;
  • Clear evidence to help drive significant investment to ensure early diagnosis of cancer

I am so pleased to see more Patient and Public Groups being set up for individual studies and study groups, this can only be a good thing. I would encourage all researchers to do this as a way of ensuring their work is truly patient-centric.

This really doesn’t have to be an arduous process. The BCI-UK PPG is a large group but that’s because of the volume and range of work it’s involved in. I’m a member of other PPGs relating to individual studies that only have one or two patients, and this can work well so long as those patients can represent more than their own individual experience.

The key advice for PPG involvement is the earlier the better, ideally right at the start, when you’re at concept stage, but you can bring us in at any point. The sorts of things a PPG can help with include the more obvious things like writing the plain English version of the proposals and reviewing manuscripts, but also giving insight into what the outcomes of the study should be and what may or may not be acceptable to a patient within the study.

I truly believe that we are achieving great improvement in the care of colorectal cancer, but so much more is needed, and the way forward is by collaboration between researchers, physicians and surgeons, with patients not just at the centre of the concepts, but actively engaged.

Working with administrative data in Scotland: A round up of researcher experiences

We’ve hit n = 5 in terms of eCRUSADer Researcher Experience posts! It’s not quite there in terms of a sample size for claiming any statistically significant findings but I thought that it was about time we took stock of them to see if there were any common themes emerging. So, that’s what this post will briefly do.

First, the key challenges that our researchers faced are outlined. Next, you’ll see some direct quotes taken from the posts, in particular lots of positive messages about carrying out research with administrative data in Scotland. Finally (and hopefully most usefully for you), there is a list of some ‘Top-Tips’ so that your administrative data journey runs as smoothly as possible!

Key Challenges

There were some common challenges that popped up throughout the five researcher experience posts which I try to summarise below.

    • Timing Timing Timing!!!

This was (as expected) a clear theme that emerged in each of the researcher experience posts, in particular the time taken between PBPP approval and data access.

    • Administrative datasets can be messy…

They aren’t made available to you in a ‘research ready’ format (even though a huge amount of work will have gone on behind the scenes to get them ready) and they don’t come with clearly defined data dictionaries.

    • ECR short term contracts

The nature of ECR work means that we are often on short term contracts. Together with the issues around timing, this can have knock on consequences for our career trajectories if we don’t get access to the data in time.

Key Messages

But, it’s not all bad! Although there are real challenges involved in accessing and working with administrative data, each of the researchers we have heard from has agreed on the massive potential for administrative data in research that ultimately aims to improve outcomes for society. Here’s what they had to say:

“Administrative data is an extremely powerful tool that will help you to answer the largest and most difficult questions faced by society”

“There is huge potential to use routine data to improve the way we do clinical trials and ultimately to improve outcomes for patients.”

“The ability to gain new insights from previously unseen data is something that should excite any researcher.”

“It can be a difficult and frustrating area to work in, but there are big potential payoffs, including large sample sizes and long-term follow-up, sometimes across many decades.”

“Working with administrative data is like learning to tame a dragon—albeit challenging, it is also exciting and rewarding!”

Top Tips and Solutions

    • Consider the time it can take to access the data and plan for this as far as possible

This is one of the issues that we are trying to shed some light on by putting together these researcher experience posts. It is rather tricky, not least because every project is different and has differing levels of complexity. However, there are some parts of the data acquisition process that are easier to plan for in terms of the time they will take. In particular, preparation of your PBPP application will probably take around 3-6 months. In terms of the time from submitting your application to the approval, this usually takes around 1-2 months. Knowing these timings means you can put them into funding applications etc. The harder bit is knowing how long it will take to get access to the data and we have heard from our researchers here that this can take up to three years! To try and understand how long things will take, it is well worth talking to your eDRIS coordinator about how frequently the datasets you have requested are linked for other projects. There may be some datasets that are harder to link than others or that have never been linked before. See if you can find any researchers who have previously worked with similar linked datasets and speak to them. They might have some good advice!

    • Have a plan B (and C!)

Unexpected things can (and probably will) crop up during your administrative data journey. And the longer things take, the more likely these unexpected events are to occur. The best thing you can do is have a back up plan. Better yet, have several! This may be using publicly available data, or settling for a subset of the datasets you have requested if there are particular hold ups with a specific dataset.

    • Prepare as much as you can before getting access to the data

There is actually a huge amount you can do whilst you are waiting for access to the data. You will still have to do a lot of data cleaning when you get access so one thing you can do is try and get as familiar as possible with the variables in the datasets you have requested. One idea might be to prepare a data dictionary (which includes codes) that you can ask to be transferred into the safe haven for when you begin analysis. You can also prepare some code for cleaning the data to some extent. For example, code to attach labels and value labels. You should also make sure you have done the relevant training (see the training section of the website for some useful links).
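As a rough illustration of the kind of preparation described above, here is a minimal Python sketch of a data dictionary and a helper that attaches value labels to coded records. Everything in it is an assumption made for illustration: the variable names, codes and labels are invented and are not taken from any real dataset, and the function name `apply_value_labels` is hypothetical.

```python
# Hypothetical data dictionary: variable name -> description and value labels.
# The variables and codes are invented for illustration only.
DATA_DICTIONARY = {
    "sex": {
        "label": "Sex of patient",
        "values": {"1": "Male", "2": "Female", "9": "Not known"},
    },
    "admission_type": {
        "label": "Type of admission",
        "values": {"1": "Elective", "2": "Emergency"},
    },
}


def apply_value_labels(rows, dictionary):
    """Return copies of the rows with coded values replaced by their labels.

    Codes that are missing from the dictionary are left unchanged, so they
    can be spotted and queried later rather than silently dropped.
    """
    labelled = []
    for row in rows:
        new_row = {}
        for variable, code in row.items():
            values = dictionary.get(variable, {}).get("values", {})
            new_row[variable] = values.get(code, code)
        labelled.append(new_row)
    return labelled


# Made-up records standing in for the real extract.
example_rows = [
    {"sex": "1", "admission_type": "2"},
    {"sex": "2", "admission_type": "7"},  # "7" is not in the dictionary
]
labelled_rows = apply_value_labels(example_rows, DATA_DICTIONARY)
```

Leaving unknown codes untouched is a deliberate choice in this sketch: unexpected codes are exactly the sort of thing you would want to raise with your RC or analyst once you have access.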

    • Acknowledge the limitations of administrative data

It is important to remember that administrative data has not been collected with research in mind. This can often mean that it won’t contain all of the information you need to carry out the ‘perfect analysis’. What is important is that you are able to answer your research question with the administrative data, so be realistic. In some cases, it might be that your question would be better answered using survey data for example.

    • Invest in the relationships with the key people involved in the data access pipeline

Get to know the people who are assisting you with data access and speak to people who have knowledge of the datasets you are requesting. Also, do both of these early on!

I hope this was useful in giving a summary of the researcher perspective of accessing and using administrative data in Scotland. It occurs to me that I haven’t yet contributed my own Researcher Experience post. I should say that overall my experience has been largely similar to those we have heard about. I gave a talk on my experience recently at a useMYdata event (if you haven’t heard of them then do check out the great work they are doing!). You can find my slides and the recording from the event here. The event was particularly focused around the researcher’s journey in accessing routine health data and how patients themselves are (or can be) involved throughout the process.

Researcher Experience: Dr Feifei Bu

In this first Research Experience post of 2021 we hear from Dr Feifei Bu, Senior Research Fellow in the Department of Behavioural Science and Health at University College London (UCL). Feifei first started working with administrative data in 2014 when she worked with the National Pupil Database linked to Understanding Society survey data (UK Household Longitudinal Study). In 2015, she joined the University of Stirling and started working on projects that were using administrative data extensively. In particular, she worked with Scottish Morbidity Record (SMR) data linked with the Social Care Survey (now Source) and Healthy Ageing in Scotland (HAGIS). From there, her interest in carrying out research using administrative data continued into her current position at UCL where she has worked with Hospital Episode Statistics (HES) linked with English Longitudinal Study of Ageing (ELSA). She has also worked with de-identified Whole Systems Integrated Care (WSIC) data. All in all, Feifei has been carrying out research using administrative datasets for around seven years.

Overview of my research

My work using administrative data has been mainly around health service utilisation. Collaborating with colleagues from Stirling and Dundee, we looked at the cost of hospital admissions for people with cognitive spectrum disorders using SMR data. In 2019, I worked on a project on the relationships between social factors and health outcomes amongst older adults using ELSA linked with HES. We looked at how loneliness and social isolation were associated with the risk of hospitalisation related to falls, cardiovascular disease and respiratory disease respectively. More recently, I led a project looking at how patient activation (a measure of people’s knowledge, skills and confidence to manage their own health and wellbeing) was related to the usage of different health care services, including GP and non-GP primary care, elective and emergency inpatient admissions, outpatient and A&E attendances. At the moment, I am involved in an ESRC funded project looking at how indoor temperature is related to secondary care health service utilisation using ELSA linked with HES.

Summary of any challenges faced

Unlike survey data that are usually thoroughly cleaned and well documented, administrative data often require some extra work. Based on my own experience, for example, the episode order variable that comes with the SMR or HES data cannot be taken for granted. In some cases, it could be important to further sort the episodes into the correct order. Also, it may take some detective work to find out what a specific variable measures or how data were collected in practice and by whom, which could be critical for data interpretation.
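One way to check an episode order variable is to re-derive the order from the dates and compare the two. The Python sketch below does this on made-up records; the field names (`patient_id`, `episode_order`, `admission_date`) are hypothetical and are not the actual SMR or HES variable names, and sorting on admission date alone is a simplifying assumption.

```python
from datetime import date

# Made-up episode records; the field names are hypothetical.
episodes = [
    {"patient_id": "A", "episode_order": 1, "admission_date": date(2016, 3, 10)},
    {"patient_id": "A", "episode_order": 2, "admission_date": date(2016, 1, 5)},
    {"patient_id": "B", "episode_order": 1, "admission_date": date(2015, 7, 1)},
]


def reorder_episodes(records):
    """Sort by patient and admission date, adding a derived order per patient."""
    ordered = sorted(records, key=lambda r: (r["patient_id"], r["admission_date"]))
    counts = {}
    result = []
    for record in ordered:
        patient = record["patient_id"]
        counts[patient] = counts.get(patient, 0) + 1
        result.append({**record, "derived_order": counts[patient]})
    return result


def find_mismatches(records):
    """Return episodes whose supplied order differs from the date-derived order."""
    return [
        r for r in reorder_episodes(records)
        if r["episode_order"] != r["derived_order"]
    ]


mismatches = find_mismatches(episodes)
```

In real data there may be ties (for example, several episodes starting on the same day), so a second sort key such as discharge date might be needed; that is left out here to keep the sketch short.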

A unique strength of administrative data is that they offer objective and detailed measures that are usually unavailable in surveys. However, as these data were not collected for research purposes, there is often a lack of other critical information that we would like to take into account in our research. If data linkage is not possible, this is an even tougher challenge than the one above.

For data protection purposes, administrative data often need to be analysed in a safe setting, like a data safe haven. This can usually be accessed via a remote desktop connection, but in some cases, you might need to go to a secure access point that is not necessarily local. This will slow down your progress significantly. Some administrative data are stored in data warehouses, in which case researchers need to extract data that are relevant to them using a programming language such as SQL. In other instances, researchers may not have access to the data warehouse directly and data extraction needs to be done by a data analyst. This would require a lot of planning ahead as well as communication back and forth. Finally, data access is time-limited in most cases. It may ‘expire’ before getting everything published. This is something that needs to be taken into account when applying for data access.
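To give a flavour of what extracting relevant data with SQL can look like, here is a small self-contained sketch using Python's built-in `sqlite3` module and an in-memory database. The table, columns and rows are all invented for illustration; a real data warehouse will have its own schema, SQL dialect and access route.

```python
import sqlite3

# In-memory database standing in for a data warehouse (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE admissions ("
    "patient_id TEXT, admission_date TEXT, admission_type TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?, ?, ?)",
    [
        ("A", "2016-01-05", "emergency", 70),
        ("A", "2016-03-10", "elective", 70),
        ("B", "2015-07-01", "emergency", 50),
        ("C", "2017-02-11", "emergency", 81),
    ],
)

# Pull only the rows and columns relevant to a hypothetical study:
# emergency admissions among people aged 65 and over.
query = """
SELECT patient_id, admission_date
FROM admissions
WHERE admission_type = ? AND age >= ?
ORDER BY patient_id, admission_date
"""
extract = conn.execute(query, ("emergency", 65)).fetchall()
conn.close()
```

Passing the filter values as parameters (the `?` placeholders) rather than pasting them into the query text keeps the query reusable and avoids quoting mistakes.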

Working with administrative data is like learning to tame a dragon—albeit challenging, it is also exciting and rewarding!

Thoughts for fellow and future eCRUSADers

As previous Researcher Experience posts have mentioned already, the access application can take a long time to go through. It is important to plan ahead especially if you are on a tight schedule—either for your PhD or other funded projects.

It is important to acknowledge the limitations of administrative data, in particular, the lack of critical information that needs to be ‘controlled for’ in analyses. We should not rule out the possibility that survey data may serve our research purposes better. Here is a note to myself, and to be shared with eCRUSADers: our passion for data should not outweigh a solid research design.

Public Benefit Privacy Panel Timelines

Project: Social Care Survey linked to Scottish Morbidity Record

Preparation of PBPP application: December 2015 – April 2016 (approximately 4 months)

Submission to initial PBPP approval: April 2016 – August 2016 (approximately 4 months)

PBPP approval to data access: August 2016 – April 2018 (approximately 2 years)

Publications using administrative data

Bu, F., Abell, J., Zaninotto, P., & Fancourt, D. (2020). A longitudinal analysis of loneliness, social isolation and falls amongst older people in England. Sci Rep, 10 (1), 20064. doi:10.1038/s41598-020-77104-z

Bu, F., Zaninotto, P., & Fancourt, D. (2020). Longitudinal associations between loneliness, social isolation and cardiovascular events. Heart. doi:10.1136/heartjnl-2020-316614

Bu, F., Philip, K., & Fancourt, D. (2020). Social isolation and loneliness as risk factors for hospital admissions for respiratory disease among older adults. Thorax. doi:10.1136/thoraxjnl-2019-214445

Hapca, S., Guthrie, B., Cvoro, V., Bu, F., Rutherford, A. C., Reynish, E., & Donnan, P. T. (2018). Mortality in people with dementia, delirium, and unspecified cognitive impairment in the general hospital: prospective cohort study of 6,724 patients with 2 years follow-up. Clin Epidemiol, 10, 1743-1753. doi:10.2147/CLEP.S174807

A conversation with eDRIS: Part 1

The Electronic Data Research and Innovation Service (eDRIS) is a small team within Public Health Scotland set up to facilitate access to administrative data for research. Sometime back in the beginning of 2020, I was invited along to talk to eDRIS about eCRUSADers at one of their Development Days. My main hope from the talk was to introduce eDRIS to the eCRUSADers platform and work out if we could come up with any ideas for improving the journey that researchers and eDRIS go through together, when applying to use and using administrative records in Scotland.

Based on the Researcher Experience posts on eCRUSADers at the time (and to this day), as well as personal and published evidence, a common theme is the lengthy wait for data access. As researchers (especially ECRs who are often on temporary research contracts), it is vital that we make the best use of the time from initial contact with eDRIS, right up until data access and beyond. To do this, we need to make sure that our interactions with eDRIS are productive and efficient for both parties. My belief is that if we are to identify any areas where this journey can be improved, both parties need to understand more about one-another’s work and roles in the process.

So, on the back of my presentation to eDRIS, we chatted about the prospect of beginning to create this understanding, by putting together a couple of blog posts in conversation with eDRIS.

In this first post, I am incredibly grateful to have Jules, one of eDRIS’s Research Coordinators (RC), to describe what a typical day looks like. Jules talks through his morning and afternoon, giving us an idea of some of the daily tasks he is involved in and providing an insight into the emails and requests he receives throughout the day.

For me (as a researcher who has worked with a number of RCs on different projects), this insight was very useful and as I read about Jules’s day I had lots of further questions to ask. Jules has kindly offered to answer those questions and these will be posted in Part 2- so stay tuned!

But first off, let’s hear from Jules on his account of a day in the life of an RC. Not quite sure what an RC’s role is? Have a quick read here.

A day in the life of a Research Coordinator

For statistical disclosure control purposes (SDC), the names used here are fictional but the events described are based loosely on real incidents.


Check for new emails, only 10 from last night, great, not bad for my 25 projects! Ok, first job, do we have any SDCs… Yes, two researchers on different projects have output requests, which one first? I think I’ll do Helen’s first, she usually has done a good job of explaining the outputs and making sure there are no disclosure risks. On top of that, she only has health data, so only one data controller’s requirements to worry about, result! So, let’s log in to the safe haven…. now, what is my password? Oh yes, the access path has changed, I need a new password. Oh well, let’s get that password reset first, that might take up to an hour and means I can’t do the other SDC.

Ok, what’s next in the Inbox. Ahh, a ‘quick question’ from James, this should be easy. Nope, he wants to add a Census variable, so…let’s check the existing permissions… Just as well, the Health and Social Care Public Benefit and Privacy Panel (HSC PBPP) and Statistics Public Benefit and Privacy Panel (SPBPP) end date is in two weeks! So, I need to ask James to submit an amendment to add the new Census variable, as well as extend the study date, so that means an amendment to SPBPP and HSC PBPP, and maybe get him to contact the National Records of Scotland (NRS) data access team to discuss if it’s possible first? Yes, that would be best. So, I’ll just email James…


Uh-oh email from HR, my own information governance (IG) training needs refreshed, perfect timing! That reminds me, does anyone on James’s application need their own IG training refreshed…. yep, James and two others are about to expire. Let’s see what the data controllers accept as valid IG training… So, Census accept Safe Researcher Training (SRT) as valid for five years, but HSC PBPP have this as three years… so it’s about to expire as far as HSC PBPP are concerned… I may just ask them to do the online Medical Research Council course (MRC), as that’s quicker, and we worry about the SRT in two years’ time… So, let’s email James.

“Dear James, thank you for your request to add a Census variable. The first thing to do would be to discuss feasibility with NRS, I have added their contact details below. Let me know if you need any help with your approach to them. I also noticed that your project permissions are due to expire, and some of your colleagues named on the form have IG training that is also about to expire, but only as far as HSC PBPP are concerned. Each of these changes needs to be recorded in the permissions, so we need to submit amendments to both SPBPP and HSC PBPP for: adding a new variable, extending the study duration and updating IG training. I think the best way to do this is to submit amendments to the PBPP panels for the end date and updated training, then, after you have got the go-ahead from NRS to add the new variable, we can process another amendment to add the variable, as this will take longer. Please let me know if that makes sense?”


“Dear Jules, I can’t access the safe haven, please can you help? Thanks, Bob”

Now, is the safe haven down? Nope… So where is the issue for Bob, he didn’t say…

“Dear Bob, sorry you are having problems accessing the safe haven. Please can you let me know at which stage you are having the problem? If you can access the safe haven page, are you receiving the 2FA PIN? If not…”


“Dear Jules, please ignore my last email, I wasn’t on the VPN, my mistake! I am in now. While I am here, please can you release the tables in my study area? These are quite urgent, and I need them today.”

Ok, delete my email draft. Now, do I have my own password yet… Nope. Ok Bob will have to wait, next email. Now, John wants to know where we are with his data sharing agreement. Which project is that? Oh yes, here it is, so… the data sharing agreement was sent back to the Shire Commissioners for signing three weeks ago, good question, where is that? Nothing from them…. so, let’s send an email chasing it.

“Dear Phyllis, Hope you are well. We have had a researcher chasing…”


“…the data sharing agreement for 1234-5678. We returned to you for review and signature three weeks ago, please can you let me know when you will be able to get to it? Thanks, Jules”

Ok, let’s email John

“Hi John, apologies for the delay, we sent to the Shire Commissioners three weeks ago for signing…”


“…and I have contacted them to ask for an update. I will let you know as soon as I hear from them. Thanks, Jules”

Ok, where was I? No safe haven access, so no SDCs for now… so, let’s check the task list… Next job is an amendment to add a researcher to Siobhan’s HSC PBPP. So this is 1.5, great, under the proportionate governance rules issued by HSC-PBPP I can process these myself.


Ok, let’s get back to it…


An email from HSC PBPP to researcher:

“Dear Prof. Urquhart,
The HSC PBPP panel have reviewed your application and have some further questions for you before your application can be properly considered. Please provide responses below the listed queries, and return to us within two weeks:
1) Please provide a clear data flow diagram
2) Please provide a Data Privacy Impact Assessment or evidence that one is not needed. Your data protection officer should be able to offer advice.
3) Please provide evidence of public involvement in the research design
4) Please ensure your lay proposal is clearer to those with no experience of research
5) Please ensure anyone named in 1.1 to 1.5 of the PBPP form has valid IG training, there is a list in the ‘Guidance for Applicants’ available from the PBPP website.”

Ah this is a shame, but at least chimes with the advice I gave to the Prof. that the panel would likely pick up on these issues if we didn’t address them before submitting the application. With tight funding cycle deadlines I can sympathise with the desire to get something submitted very quickly, sadly this often creates more work, now where’s that template response… send, done.

Now, has my new Safe Haven password turned up? Nope. Ok, next

“Dear Jules,
In order to avoid SDC, please can I share my safe haven screen with my collaborators? I would only need to do this using Zoom, and with a small number of colleagues, so nothing would leave the safe haven.”

Oh dear…

“Dear Gary,
Please do not do this!
Sharing the safe haven screen is not allowed in any circumstances, whether screen shots, screen sharing or in person. As a reminder, these terms are detailed in the user agreement you signed and are also on the statements you accept every time you log in to the safe haven. Any outputs from the safe haven must be assessed for disclosure, please complete the request form to help speed these assessments up.
Let me know if you have any questions.”


“Dear Jules,
I submitted a draft PBPP to you a few weeks ago. I know the data flow is missing, but this is because I don’t yet know what data I need. I was hoping you could just submit it anyway, to get the ball rolling.”

Ok… where’s that template…

“Dear XXXX,
Please note I have not submitted your incomplete PBPP; if I had, the panel would have returned to us asking where the missing sections were. It saves time if the required sections are completed, as indicated in the ‘Guidance for applicants’ available from the PBPP website. I believe I have already provided the minimum recommended changes for the PBPP to be able to consider your application.
In this case, if the panel do not know what confidential data you are asking for, they cannot assess the risks to the privacy of the individuals in the datasets, as they don’t know which individuals you are asking for data on.
Please let me know if you have any further questions.”

Ok, last thing, do I have my password?.. Yes!!! Now let’s finally look at Bob’s urgent SDC then Helen’s.


“Dear Safe Haven user,
We have experienced some network issues which means we need to shut down the Safe Haven for the rest of today. The Safe Haven will be unavailable from 1530 today until 1000 tomorrow morning. Please save any work and log off.
We apologise for any inconvenience caused by this unexpected outage.
The Safe Haven.”

What time is it??? 1528….

“Dear Bob,
Unfortunately, the Safe Haven has experienced an unexpected error and I am unable to look at your SDC request today.
Please also note that, as you have Census data, we need NRS to carry out checks and clear the outputs before we can check and release. I know you asked for the outputs today, I am afraid this is not possible; however, we will aim to have the outputs checked within our three-day turnaround target.

Apologies for the delays,

I’m going home…oh wait, I am home. (Please note we have flexible working, not all staff finish at 3:30 pm)

The role of an eDRIS Research Coordinator

The two main researcher-facing roles are RCs and Analysts.

The RC role is primarily project management. RCs are assigned a number of projects that they are then responsible for. The essence of the role is to enable access to administrative datasets for researchers, where that access is granted in line with confidentiality laws (e.g. GDPR, Data Protection Act). The RC is there to provide a service to researchers to enable high quality research. In practical terms, this requires the RC to make sure they are aware of current procedures (rather than knowing the jurisprudence around the common law of confidentiality!), so we can provide researchers with the best approach to meeting each data controller’s requirements within a legal framework. There are often multiple data controllers (even within a single organisation) and each data controller has their own requirements (this is why we sometimes ask researchers to provide the same information in slightly different ways). The sheer number of datasets, each with the quirks of their respective data controllers, requires a great breadth of knowledge of the administrative data landscape. As well as projects where data are provided as part of the service, there are numerous projects where the applicants need permissions only, to do all sorts of things, ranging from setting up clinical trials to changing the way health audits are carried out.

The Analyst role is distinct from the RC role and is primarily tasked with creating the extracts for the researchers, although there are often discussions with analysts at early stages to determine feasibility of the requests. The eDRIS analysts have in-depth knowledge of many of the common health data sets, so are a good source of information, for both researchers and eDRIS RCs.

Are data repositories the future? An eCRUSADers conversation with DataLoch

In this post, we will be discussing a very exciting project called DataLoch. DataLoch is a repository of linked health and social care administrative data sets from Edinburgh and the South East of Scotland. It was established in 2019 under the Data Driven Innovation (DDI) programme funded by the Edinburgh and South East Scotland City Region Deal. The programme is led by Professor Nick Mills at the University of Edinburgh but the programme is a collaborative partnership between NHS Lothian, Borders and Fife, Health and Social Care Partnerships, the University of Edinburgh, patients and the public. The ambition for the DataLoch repository is that it will help to solve some of the pressing health and social care challenges being faced in Scotland.

At present, the data linkage infrastructure and governance in Scotland is set up in such a way that bespoke linkages are created for the purposes of specific research projects that will be destroyed at the end of the project lifecycles. In contrast, within a research ready data repository like DataLoch, the access permissions to use and link data persist and expand over time without destruction. Such persistent curation increases the efficiency in creating data assets and greatly reduces the time required to provide data for projects. As is currently the case for bespoke linkages, all applications to DataLoch will go through an approvals process to ensure that the processing of the data is done in accordance with the required data protection principles, maintaining public trust and privacy of individuals’ data.

I was very excited when the DataLoch team said they would be happy to talk to eCRUSADers about their work because from my own experience, and from listening to the experiences of others, it appears that research ready data repositories may address some of the challenges that we eCRUSADers face. So, if you are interested in finding out about how such repositories might help, or if you are a researcher hoping to work with linked health and social care data in Scotland: read on!

Q: How will DataLoch help solve the problems researchers face around timing for access to data?

Q: Will DataLoch allow for flexibility in research proposals?

Q: Have you carried out any work with patients and the public to find out how they feel about storing their health and care data indefinitely?

Q: Do you envisage DataLoch extending to other parts of Scotland in the future?

Q: What have been the biggest challenges you have faced when setting up DataLoch?

Q: Where will researchers access the DataLoch data? Will remote access be an option?

Q: Where would researchers find the data dictionary for the data sets and variables included in the DataLoch?

Q: Is the plan for DataLoch to continually update the data to contain the most recent data?

Q: When can researchers hope to apply for and access the DataLoch?

Solving the current challenges around conducting research with administrative data

Q: As eCRUSADers, one of the main challenges we face is that we have limited time. This makes accessing administrative records for research tricky because in many cases this can take a long time (see some of our Researcher Experience posts for examples). How will DataLoch help solve this?

“DataLoch’s governance model is agile and based on agreements with our data controllers based on a model of precedence. This means with every project our approval process is quicker – meaning researchers no longer have to experience long waiting times – while remaining robust and in line with GDPR. Research will be reviewed through a Caldicott panel and the Safe Haven delegated ethics panel. Whilst there are no fixed timescales as each project is unique, our average turnaround (from application to data delivery) is currently 3 months.”

Q: One of the other challenges Early Career Researchers face is the exploratory nature of their research. For example, in a PhD project, the researcher might set off with one research question in mind but after understanding the data more, they realise that there might be a more useful/relevant/beneficial research question they can answer. Will DataLoch allow for flexibility around the research proposal?

“There are two ways we can enable flexibility for exploratory research. Firstly, we do allow applications that are explicit about their exploratory nature. Applicants would then need to clarify the specific research nature when the exploration phase is complete. The second route a researcher could follow would be via an explicit proposal which, if it changes to accommodate a new research interest, would need a second research application to be approved.”

Work with patients and the public

Q: Of course, any use of patient and public data in research should include their views and input throughout the process. Have you carried out any work with patients and the public to find out how they feel about storing their health and care data indefinitely?

“DataLoch have a public reference group which meets regularly and is part of our governance structure. The Public Reference Group have been extremely supportive of the DataLoch project and actually expressed some surprise that health data was not already more routinely linked. They are especially interested in ensuring easy to understand transparency of DataLoch’s work and that the data is being used in areas of public priority.

We would welcome volunteers to join this group; please contact us if you’re interested. We recognise that public perception does change over time and have a learning loop built into our structure that allows us to be sensitive to public opinion.”

Looking forward and reflecting back

Q: The Scottish Government’s initiative, Research Data Scotland, is trying to set up a similar model but on a Scotland wide level. Do you envisage DataLoch extending to other parts of Scotland in the future?

“Research Data Scotland have a fantastic national ambition and we would love to work with them as this idea develops. One of the points of difference for DataLoch is that we work at a regional level working closely with local clinicians to understand local context. This would be difficult to scale at a national level though there are interesting federation models that we would like to explore. Equally the granularity of the data we provide would be difficult to scale at a national level. It is also worth noting that our rich regional level data can contribute to answering some critical nationally-relevant questions.”

Q: What have been the biggest challenges you have faced when setting up DataLoch?

“Our greatest challenge has become a significant point of difference for us: we have created a virtual team that includes NHS clinicians, analysts and technical support as well as University of Edinburgh expertise. Working across boundaries has been tough but ultimately has created something unique and agile, gaining the benefits that come with the expertise from both NHS and academia.”

Practical questions around using DataLoch data

Q: Where will researchers access the DataLoch data? Will remote access be an option? Safe settings are few and far between!

“Our preference is that researchers access the data via the National Safe Haven which can be done remotely.”

Q: Where would researchers find the data dictionary for the data sets and variables included in the DataLoch?

“The latest version is available on the website here. This dictionary corresponds to the current COVID-19 data set. Note that the website is under construction and some pages may move.”

Q: Is the plan for DataLoch to continually update the data to contain the most recent data? Once again, this improves reproducibility of results and research transparency.

“Yes. DataLoch will be continually updating the data according to data source lag times, and we will be managing version control to enable reproducibility of research as needed. We recommend researchers save their own coding and statistical analysis.”

Q: When can researchers hope to apply for and access the DataLoch?

“You can do so today. Our current dataset is focused on COVID-19, but please do register for our DataLoch release newsflash via our website to receive updates.”

eCRUSADers: one year on

I still can’t quite believe that it has been one year to the day since I launched the eCRUSADers website with the first Welcome post!

For those of you who don’t know, before launching the platform, I had been contemplating the idea of eCRUSADers for some time whilst reflecting on the challenges I faced when attempting to carry out my PhD research using administrative data. I’d also had many conversations with other researchers who seemed to be in the same boat. When I started my new post-doc position in Edinburgh, I jumped straight into the ‘waiting for data’ pool again. It was really this that spurred me on to finally set up something tangible. It would be a place for other researchers to go to learn from the experiences of others and to find out useful nuggets of information.

A huge amount has happened since then and little did I know back in February that the impending pandemic would shine such a bright light on the use of administrative data for research purposes. In the last six months, countless COVID-19 studies using administrative health data have emerged, hopefully paving the way for the continued use of this data to generate public benefit.

What has been achieved?

  • 4 Researcher Experience posts
  • 3 People Make Data posts
  • 3 Other posts
  • 2 Reflections on courses/training/conferences
  • 44 subscribers
  • Approximately 2,400 page views
  • £1,000 funding from the Wellcome Trust Institutional Translational Partnership Award (iTPA) Hub to set up an eCRUSADers working group.
  • Invited to give a presentation to the Electronic Data Research and Innovation Service (eDRIS) at their Development Day and have an ongoing dialogue with them about eCRUSADers.
  • Statement of support from the Scottish Centre for Administrative Data Research (SCADR) and Chief Statistician Roger Halliday.

Since the platform was launched in November 2019, I’ve published a combination of Researcher Experience posts, People Make Data posts, reflections from courses and conferences and other bits and pieces. It hasn’t always been easy to squeeze in time for eCRUSADers alongside my role within the Edinburgh Health Economics group but I have done my best to spread the word and get researchers from around Scotland to contribute. This could easily be a full time job! Overall, I am pleased with the progress to date and have thoroughly enjoyed progressing a platform that I believe will be so useful for future early career researchers.

What happened that I didn’t expect?

Aside from COVID-19, when I set up eCRUSADers I hadn’t planned the People Make Data series. The idea for the series came about as I started trying to paint a picture of the administrative data landscape in Scotland and naturally that involved including patients and the public in that picture. That process helped me reflect on my own practice as a researcher and recognise the invaluable contribution that patients and the public have in shaping research. I think eCRUSADers provides a great place to share that message and point researchers towards useful information and resources regarding public and patient involvement.

What is still to do?

My ambitions for eCRUSADers are big and I have lots of ideas in the pipeline for expanding the information that is contained within the platform! I plan to keep chipping away at this and look forward to what next year will bring.

I still have to use the funds from the iTPA Hub. I had planned to arrange a number of face to face meetings and an event to formalise an eCRUSADers working group but COVID-19 has unfortunately put that on hold. But watch this space!

Finally, I’m always looking to expand the number of Researcher Experience posts so if you are a researcher working with Scottish administrative data then please do get in touch.

Thank you!

I just wanted to say a huge thank you to all of the contributors to the blog through the last year and thank you to everyone who has shared, subscribed and followed the work of eCRUSADers. Here’s to another year and to the future sharing of information and experiences about carrying out research using Scottish administrative data!

People Make Data: Part 3

I am very excited to share this post with you today! It is the third of our People Make Data series and this time, Pete Wheatstone talks to us about sharing patient data from a patient’s perspective.

I had the pleasure of meeting Pete at the end of 2019 at a useMYdata workshop in Leeds, where we got to talking and realised that we had in fact already met on Twitter?! Pete is a former cancer patient and a member of useMYdata (who we heard from in People Make Data: Part 2). Pete is an extremely active and experienced Patient and Public Involvement (PPI) representative for a number of data research programmes. These include the UK Colorectal Cancer Intelligence Hub Patient Public Group; Chair of the DATA-CAN (the HDR UK Hub for Cancer) PPIE Group; National Bowel Cancer Audit Patient & Carers Panel; IQVIA Patient Group; the National Institute for Health Research (NIHR) Royal Marsden/ICR Biomedical Research Centre (Digital Theme); Cancer Research UK Patient Data Reference Panel; the European Institute for Health Innovation Through Health Data (i-HD) and an association with (through i-HD) – to name but a few!

Pete’s contribution to the promotion of the safe use of patient data in research is second to none and it is a real privilege to have him offer his insights in this post. In the neat analogy that follows, Pete helps us really get to the bottom of what sharing patient data is all about.

“We’ve been sharing information since before language was developed”

At the most basic level, creatures that live in groups, from insects to humans, share information between themselves such as the presence of danger and the location of food. This is because it is a good method of protecting the group and helping it to flourish. Whilst living in caves, our gestures, grunts and groans gradually became more sophisticated, allowing us to share more detailed information that evolved into language. However, even today ninety percent of our communication (and therefore information) is still non-verbal. You can tell things about a person just by such things as their facial expressions, how they sit or move their body, their tone and volume of voice, the level of eye contact. We all subconsciously and consciously do this to enhance the communication of our thoughts and feelings. It helps us to form relationships and friendships. Surely, acquiring information is the reason we send our children to school and why we study. We exchange information about our thoughts and our feelings when we socialise.

But when it comes to personal medical information this is, of course, a little bit different – or is it? Whilst many of us like to share some of this information, there may be some aspects that we feel we want to keep to ourselves. Of course, it is our right to keep that information to ourselves if we wish or tell a trusted person in confidence.

Medical data is just bits of information held electronically. But information, when held as data, can be easily shared with others for both benefit and, potentially, disadvantage. However, if that data is anonymised (in other words all information is removed that might identify us) and it is added to information from thousands of other people, might we hold a more relaxed view? And if that data was only accessible by trusted people, authorised to access that information only for a very specific and approved purpose should we have any substantial concern?

As current or future patients, we benefit from improved treatments and services because previous patients shared their medical information. Do we not, in turn, have a moral obligation to share our information to benefit our children, grand-children and future generations of humanity?

I believe that, providing the current legally required data security controls are in place and those that hold the data are open and transparent (about who data is accessed by and why), there is no logical reason why we should not share our anonymised medical data – for the benefit of us all.

Researcher Experience: Dr Drew Altschul

With all that has been going on it has been a wee while since we heard from a researcher who is in the thick of working with administrative data in Scotland. In this Researcher Experience post, we hear from Dr Drew Altschul, Research Associate in the Department of Psychology at the University of Edinburgh, who has been navigating the administrative data landscape in Scotland for around two years. Drew works with a large linked data set of the Scottish Mental Survey 1947, 36-day sample, Scottish Longitudinal Study (census data), Prescribing Information System (PIS) and Scottish Morbidity Record for Mental Health Inpatient and Day Case (SMR04).

In Drew’s account of working with administrative data, the familiar challenges of timing, unforeseen circumstances and working in the safe setting rear their heads. However, like the other researchers we have heard from, the ‘seeing the glass half full’ attitude and optimism for the need to press on in spite of these challenges endures. In particular, Drew points out the useful discoveries he and his colleagues made whilst waiting for data access, which would ultimately improve their research output in the long run. I think this point rings true for me especially; after all, eCRUSADers wouldn’t exist if it weren’t for the wait for data.

Over to you Drew:

Overview of my research

I’ve yet to do much work with our main variables of interest, as we only recently were granted access to a few of the data sets we requested. However, while we were working on obtaining and waiting for access we followed some side avenues, in part to prepare ourselves for working with the data, and in part because we thought of research questions that were interesting in their own right. For example, we are interested in how early life socioeconomic conditions, commonly represented by the father’s occupational social class, relate to mental health later on in life. However, our data set is based on the participants of the Scottish Mental Survey 1947; these individuals were all born in 1936, and because of World War II, reports of fathers’ occupations from censuses carried out during participants’ early lives are unreliable, not representative, and often missing. In order to improve on our data set, we dug deeper into the data we were aiming to link, pulling out additional, historical occupation information, and coding these data ourselves. This in turn led to a machine learning approach to classifying historical social class data, which can be used in the future by people working with historical social class data. So it goes to show how much interesting, useful work you can wind up doing along the way!

Summary of any challenges faced

The process is long and convoluted at seemingly every turn. I was fortunate because I joined the project relatively late, although when I joined we thought we would have access to the data in a few months’ time, rather than two years later. I did what I could to help with the application processes, but ultimately this work predominantly falls on the shoulders of a single person, and most of one’s time in this area is not spent working on forms, but waiting for other people to get back to you.

A large amount of time and effort goes into processing and preparing data before linkage, but that does not mean that the data are clean and easy to work with once you get a hold of them. You are likely going to need to spend significant time cleaning and otherwise processing your data before you can analyse them.

There are advantages to having to lay out analyses in advance during the application process: essentially, this forces you to pre-register your work, which is an important step in doing reproducible science. However, a run-of-the-mill pre-registration has considerable flexibility, and this is not so much the case with the analyses we plan for our data. All output must be checked for privacy and security concerns, so if we want to tweak an analysis or run a sensitivity analysis, for instance at the request of a reviewer, every different analysis that we want to take out of the safe haven environment needs to be checked, and that process can take weeks.

Thoughts for fellow and future eCRUSADers

You ought to think very carefully about timing; in particular, you ought to expect significant delays. If possible, try to plan for multiple scenarios, and make sure you have meaningful work you can do while you wait out the access process. The processes for accessing data are supposedly being streamlined and improving, but it is worth investing in your relationships with the people along the data access pipeline, as they are best placed to help you manage your expectations.

It can be a difficult and frustrating area to work in, but there are big potential payoffs, including large sample sizes and long-term follow-up, sometimes across many decades. These are types of data that sometimes cannot be obtained in any other way, and this allows for novel, meaningful research questions to be asked and answered.

Public Benefit Privacy Panel Timelines

Preparation of PBPP application: 01/06/2018 – 21/08/2018 (about 12 weeks)

Submission to initial PBPP approval: 05/10/2018 (about 12 weeks)

PBPP approval to data access: 16/06/2020 (about 1 year and 6 months)