ROBINS-I: My Thoughts and Experience

I’ve been meaning to write this post for a few months, so I’ll warn you – it’s going to be long. Since I realised I’d be using the ROBINS-I (Risk Of Bias In Non-randomised Studies – of Interventions) tool for my systematic review I have been searching for people, blog posts, and snippets of experience via Twitter to tell me how people have found this tool, and I didn’t find much – probably because the tool is so new. I did find a brilliant blog post from the Methods in Evidence Synthesis Salon (University of Bristol), which explains why we need a risk-of-bias tool for non-randomised studies, and gives an overview of the domains of bias that are assessed. I’d recommend reading that post before you carry on reading this one if you’re new to ROBINS-I: here. The lack of experiential posts got me thinking though – if I was looking for advice/guidance/stories of experience, surely others would appreciate that too? So, given that I couldn’t find a whole lot of detail, here’s my two cents on ROBINS-I.

In this post, I’d like to give an overview of my experiences of using ROBINS-I (I will stress again that these are my experiences, if you’re going to use the tool for wildly different subject matters then my thoughts may not translate, and even if you use the tool for very similar studies you may still disagree – it’s all good), and then some ideas on what I think the tool does really well, and what it could do better.

First impressions
As a very new systematic reviewer (i.e. this was my first) I had no pre-conceived thoughts or views on ROBINS-I; it was just another thing for me to learn how to do – just like protocol writing, abstract screening and data extraction had been. Saying that, I’ll admit that I was a bit hesitant when I first downloaded the ROBINS-I pdf. I had expected the tool to be 2 or 3 pages at most, but this was 22 pages. I then looked immediately for the guidance document which was a further 53 pages. To reinforce – that’s a lot of pages to get your head around. Looking further at the tool itself calmed me down a bit; the domains were nicely split and well defined, and the tool itself contains a lot of guidance within it.

Right, using ROBINS-I. As I said, my first impressions were a mix of hesitation and ‘I’m sure this will be fine’, but when I went into the first meeting with my Supervisor to talk ROBINS-I tactics, I went right back to hesitant. He’d never used the tool either, and we were both sat with the guidance document, a print out of the tool, and a huge stack of studies to assess – all covered in scribbles and highlighted areas that were fighting for our attention. My thought process went something like this: ‘Yep, bit nervous about this now – how long is this going to take? Will I ever get all this done before I have to submit my thesis? This is probably going to kill me.’

I’ll say upfront, the whole process of using the ROBINS-I tool to assess risk-of-bias for 103 included studies was not as much of a nightmare as I thought it would be. We (my Supervisor and I) did all of the risk-of-bias assessments together; and when I say together I mean we sat in a room and talked through the entire assessment process. The decision to do the assessments in a pair was for a few reasons: 1) neither of us had used the tool before so it was good to talk through each domain, challenge each other and then reach agreement, 2) the time it would have taken for each of us to do risk-of-bias assessments individually and then meet up to discuss discrepancies would have meant the process took at least double the amount of time, and with 103 studies that wasn’t workable, 3) honestly, I was a bit nervous.

The first study we assessed took a relatively long time. I gave my Supervisor an outline of the study, he looked over the completed data extraction form, and we talked through any flaws we could see in study design. After that we went through the ROBINS-I tool domain by domain, making sure to refer to the guidance when we needed clarification. I also made notes throughout this process, which was invaluable when trying to ensure consistency between assessments. I’ll give you an example: if we pulled one study down to moderate risk-of-bias in the ‘classification of interventions’ domain, I’d write down why. That would ensure that the next time we saw the same flaw in a different study, we’d be sure to pull it down to moderate too.

Once we were happy with the first assessment the second took less time, and the third less still.

After about 10 assessments it was clear that the studies we were looking at were falling down in similar places, and I made a sort of crib sheet (example on the right). This was how a typical study came out for us; obviously not all of them did, but it was a good way to build a loose structure. Things sped up after that. We’d arrange to meet for one or two hours at a time every week, and we got through the assessments much quicker than we first anticipated. When they were all done my Supervisor provided baked goods in celebration, I think that helped.

Advice for future users

  • Do your risk-of-bias assessments in pairs if possible
  • Write everything down, yep, that’s everything underlined and in bold, if you don’t do this you’ll be really angry at yourself later
  • Make a loose crib sheet after you’ve got to grips with the assessment process, tweak it until you’re happy, and then apply to the rest of your studies
  • Invest in highlighter pens, and lots of them – highlighting specific parts of your documents will ensure you don’t forget where there are flaws in the study design, and you can highlight the tool itself so you can see your usual ‘path’ through it

What ROBINS-I is really good at

The studies that we were looking at were not particularly good quality. We were very open right from protocol stage (screenshot on the left) that the collated data may remain at low or very low quality. That made me panic a bit; what was I going to do with a big pile of poor quality studies?! ROBINS-I provided a way to distinguish between the poor quality studies, and the not-so-poor quality ones. Using the tool helped us to create a quality gradient within the pile, which (gladly) prevented me from hating the process of writing the review up. I say that now, I’ve only just started writing the results, so there’s still time yet.

The length of the tool wasn’t a big deal for me. As I said earlier, at the beginning of the process it really intimidated me, but the judgements you need to make are guided heavily, even without the separate guidance document. There are lots of ‘if you answered yes to X, go to Y’ which means you never answer every question within the 22 pages, and the entire process speeds up considerably because you don’t need to keep checking what each question/judgement means in minute detail – the tool holds a lot of information itself.

What changes I think ROBINS-I would benefit from

  • It’s longer than it needs to be

Let’s tackle the obvious thing first. The tool is really long, and whilst the guidance contained within it is good and it’s relatively easy to navigate, the process of doing an assessment could be very time-consuming. My review has one outcome that we could apply ROBINS-I to, but for non-randomised clinical studies, especially those that involve multiple outcomes, this is going to take an age.

  • What’s the difference between ‘yes’ and ‘probably yes’, ‘no’ and ‘probably no’, and ‘probably yes’ and ‘probably no’?

I know that the judgements you make throughout each and every domain in the tool are subjective, but the nuances between these responses make them even more subjective, which I’m not sure is a good thing. In the older version of the risk-of-bias tool for randomised controlled trials, the responses were simply ‘yes’, ‘no’ and ‘unclear’. That seems like an easier route to ensure consistency between reviewers. As well as that, the ‘probably yes’ and ‘yes’ responses, like ‘probably no’ and ‘no’, tend to result in the same judgement for that domain anyway, so I’m unsure what these subtleties are adding to the judgement itself.

Some clarification on the need for these additional degrees of judgement would be great; if they’re not adding much to the final judgement outcome then they could either be taken out, or at least if people know the finer judgements don’t have a huge impact, they won’t agonise over their decision-making.

  • When should you complete the optional question, ‘What is the predicted direction of bias due to selection of participants into the study?’ and how?

This one is a weird one for me – how do you make that judgement, and what is it adding to the process? For me, I don’t think I’d feel comfortable saying that the direction of bias could be characterised as favouring the experimental arm or the comparator. In my (perhaps incorrect – feel free to discuss!) view the fact that the study is at risk-of-bias means just that, it’s too difficult to tell what the direction of that bias is, and it ends up being another gut judgement that you can argue either way.

  • Is one ‘serious’ really the same as four ‘serious’ judgements?

This was my main problem with the tool. If an overall risk-of-bias judgement using ROBINS-I comes out at ‘serious’, that means that the study is judged to be at serious risk of bias in at least one domain, but not at critical risk of bias in any domain. Meaning then, that one ‘serious’ domain and four ‘serious’ domains equate to the same overall judgement. When I was thinking about this I decided to look at it in a completely out-of-context example; imagine you’re a child and you get a detention once over the course of an entire school term, if you get a detention 5 times in the space of one week is your punishment or judgement by parents/teachers etc the same? I wouldn’t have thought so. I got detention once (and it really was only once in my entire school career), my parents weren’t very happy, but it wasn’t something that they were particularly worried about. If I’d come home with detention every week though, I’m pretty sure I’d have been grounded. See what I mean?

This is more tricky because all of my studies started with a ‘serious’ judgement in the confounding domain, meaning they had no chance of redemption. We knew they were all going to be at serious risk-of-bias due to confounding from the type of studies they were, so it was the other domains that allowed us to see which studies were truly of poor quality.
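To make the ‘one serious versus four serious’ point concrete, here is a minimal sketch of the overall-judgement rule as I understand it: the overall rating is simply the worst rating across the domains. This is my own illustration, not anything shipped with the tool. The names `Risk`, `overall_judgement` and `count_at` are hypothetical, the two example studies are made up, and the sketch ignores the tool’s ‘no information’ option.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Domain-level judgements, ordered from best to worst."""
    LOW = 1
    MODERATE = 2
    SERIOUS = 3
    CRITICAL = 4


def overall_judgement(domains: dict[str, Risk]) -> Risk:
    """Assumed rule: the overall judgement is the worst domain judgement."""
    return max(domains.values())


def count_at(domains: dict[str, Risk], level: Risk) -> int:
    """Count how many domains were rated at a given level."""
    return sum(1 for rating in domains.values() if rating == level)


# Two made-up studies: one 'serious' domain versus four 'serious' domains.
study_a = {
    "confounding": Risk.SERIOUS,
    "selection of participants": Risk.LOW,
    "classification of interventions": Risk.MODERATE,
    "deviations from intended interventions": Risk.LOW,
    "missing data": Risk.LOW,
    "measurement of outcomes": Risk.MODERATE,
    "selection of the reported result": Risk.LOW,
}
study_b = {
    "confounding": Risk.SERIOUS,
    "selection of participants": Risk.SERIOUS,
    "classification of interventions": Risk.MODERATE,
    "deviations from intended interventions": Risk.LOW,
    "missing data": Risk.SERIOUS,
    "measurement of outcomes": Risk.SERIOUS,
    "selection of the reported result": Risk.LOW,
}

for name, study in (("A", study_a), ("B", study_b)):
    overall = overall_judgement(study).name
    n_serious = count_at(study, Risk.SERIOUS)
    print(f"Study {name}: overall {overall}, {n_serious} serious domain(s)")
```

Both studies come out as ‘serious’ overall, which is exactly the problem. Reporting something like the `count_at` tally alongside the overall judgement is one way to keep the gradient between studies visible.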

Have you used the ROBINS-I tool yet? What did you think? I’d really like to hear your thoughts on it, and I’m happy to answer any questions you have on my experiences. When a new tool comes out it’s always a bit tricky to navigate, and I think speaking to others and listening to their thoughts and experiences is invaluable. Leave a comment and let’s get talking.


Doing a Systematic Review and Not Being Beaten by Piles of Paper

As with most PhDs based in Health Services Research, my project started with a systematic review. This seems to differ hugely from lab-based PhDs which (from my experience anyway) largely begin with traditional literature reviews. Not sure what the difference is between the two types of review? I’ll point you in the direction of this blog post from Students 4 Best Evidence. In short, systematic reviews can take an absolute age and they require a certain level of patience and persistence that I didn’t realise I had.

Last year HealthPsychTam posted two different posts talking about her experience of doing a systematic review. ‘A Confession…’ which was a brutally honest post about the feeling of wanting to drop out, and ‘Conducting a Systematic Review’ with lots of absolutely brilliant tips on getting through the process. I’d recommend you read both. In this post I want to add to Tamsyn’s experiences and give my own thoughts on the process so far.

What do I aim to achieve with this review?
My primary PhD supervisor has a Cochrane review that looks at methods to improve recruitment to randomised controlled trials, and mine is sort of the mirror of that review. It looks at methods to improve recruitment to randomised controlled trials that are evaluated using only non-randomised evaluations. We know there are a lot of publications that cover this topic, but as yet there has been no systematic review including only data from non-randomised studies.

What stage am I at now?
Currently I’ve published the protocol for my systematic review (huge gold star to my supervisor for encouraging me to do this – it was a massive motivator), I’ve finished data extraction and we’re now tackling the task of data analysis and synthesis. In very simple terms, I have created a large pile of paper that I now need to shape into something useful.

Things I wish I’d known at the start that I know now

  • Search strategies can never ever weed out all the studies you don’t need
    I worked with an Information Specialist to create my search strategy – put bluntly, I am not an expert in search strategy development and the Information Specialist based in our unit is. She was brilliant to work with, and she made the whole process much easier and quicker than if I was going to figure out how to do this whole thing myself. Still, search strategies can never be perfect and you will always end up with a big pile of studies that won’t make it into your review. I began with over 9,500 abstracts, whittled that down to ~270 full texts to assess, and then ended up with 103 studies in the final review.
  • You will never finish a systematic review of this size in a year
    I still haven’t finished the review and I’m entering the 19th month of working on it. That’s a really long slog to go through, most of which was spent reading stuff and meticulously tracking where every abstract, full text and included study was in the biggest spreadsheet I’ve ever made. Be realistic, it’s unlikely you’ll be done within a year unless you’ve got a really small number of included studies (if this is the case well done you, I am very jealous).
  • A review cannot be done by one person – get people involved as soon as you can
    All of my abstract screening, full text assessments and data extraction were done in duplicate; once by me and once by whichever person I managed to sweet talk that week. It took a lot of time and effort to find people willing to help, and then explain tasks to them via telephone/Skype and a lot of Dropbox files. I couldn’t have done the review without them and I’m so grateful that they offered to help (I had no funds to offer them – they were just being top notch humans). I would thoroughly recommend getting other people involved in your review as early as you can; whether they can help with screening/data extraction or just give you a new perspective on how you’re going to analyse your data, it’s all helpful.
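To put the screening numbers from the first point in proportion, here is a rough bit of arithmetic. It treats ‘over 9,500’ as 9,500 and ‘~270’ as 270, so the percentages are approximate, and the variable and function names are just ones I’ve picked for illustration.

```python
# Approximate counts from my review; the first two are rounded in the post.
abstracts_screened = 9500   # "over 9,500", so treat as a lower bound
full_texts_assessed = 270   # "~270"
studies_included = 103


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


abstract_to_full_text = pct(full_texts_assessed, abstracts_screened)
full_text_to_included = pct(studies_included, full_texts_assessed)
abstract_to_included = pct(studies_included, abstracts_screened)

print(f"Abstracts that reached full-text assessment: {abstract_to_full_text}%")
print(f"Full texts that were included: {full_text_to_included}%")
print(f"Abstracts that ended up in the review: {abstract_to_included}%")
```

In other words, roughly 3 in every 100 abstracts made it to full-text assessment, and only about 1 in 100 ended up in the review.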

This systematic review has been a really brilliant learning process, but it’s been the longest slog I’ve had over the course of my PhD. Two of my desk drawers are now crammed with papers, some scribbled with ‘include’, others ‘exclude’ – the further down the pile the less clear and politically correct they become, my personal favourite being ‘this is crap, total crap, exclude on the basis it’s utter crap’. I’m on the way with it though! I’ve got the big cloud of screening and extraction out of the way, and I’m on to the fun stuff and seeing what the review itself shows! Hoorah!

If you’re thinking of doing a systematic review, please be realistic with your timescales – and make sure you have snacks along the way. It’s a long process but chocolate definitely helps.