In episode 44 of the Channels podcast, Phil Ewels talks with Alex Peltzer, a key figure in the Nextflow and nf-core communities, about regulatory aspects of bioinformatics and Nextflow pipelines. We talk about Alex’s background, his experience in academia (QBiC) and industry (Boehringer Ingelheim), and the importance of regulatory compliance in bioinformatics workflows. We discuss how documentation, software testing, and standardization all play a part in ensuring pipeline reliability for clinical trials. Finally, we highlight the newly formed nf-core Regulatory Special Interest Group, which aims to collaborate on best practices and standards for bioinformatics pipelines.
We’d love for interested listeners to join us at the upcoming Nextflow Summit and nf-core hackathon to get involved!
In this episode we talk about the critical topic of regulation, its impact on bioinformatics, and how it shapes the future of the field.
Alex recalls his early days working on bioinformatics pipelines without frameworks like Nextflow or Snakemake. He shares his journey from using a self-developed Java-based framework for ancient DNA analysis to transitioning to Nextflow, eventually leading to his involvement in nf-core.
Phil and Alex discuss how their collaborative efforts and the early involvement of other key figures helped establish nf-core. They reflect on the importance of standardization in bioinformatics workflows and how nf-core has facilitated collaboration and pipeline development across the global bioinformatics community.
Alex’s transition from QBiC to Boehringer Ingelheim (BI) marks a significant shift in his career. At BI, Alex has continued to advocate for Nextflow and nf-core, converting numerous colleagues to these tools despite the complexity of the larger pharmaceutical setting. This naturally leads to a discussion about the importance of regulatory compliance in the pharmaceutical industry.
Regulation in bioinformatics involves adhering to frameworks set by authorities to ensure analyses are reproducible and meet quality standards. Alex explains how this impacts everything from sample collection to pipeline processing and downstream analysis. He underscores the importance of being able to document and prove the validity of bioinformatics methods and tools to regulatory bodies.
Phil and Alex discuss the importance of trust in the software tools used for bioinformatics. Alex highlights the need for tests, documentation, and a strong user base to build trust in these tools. Open-source software, with its transparency and community-driven development, often provides the reliability needed in this highly regulated space.
The conversation touches on the differences in regulatory overhead between academia and the pharmaceutical industry. While academia has some flexibility, regulatory requirements in industry settings can add a significant overhead, sometimes up to 30%. This highlights the rigorous processes needed to ensure reproducibility and compliance in clinical trials and other settings.
When choosing bioinformatics tools, Alex emphasizes the need for open-source options that are well-documented and have a strong user base. Quality assurance processes include software tests and documentation to ensure tools are fit for purpose. This selection process is crucial for maintaining trust in the results generated by these tools.
Beyond pipeline processing, downstream analysis in bioinformatics involves handling patient data, ensuring differential gene expression is accurately measured, and maintaining reproducibility. Alex shares insights into how these analyses are performed in pharmaceutical settings, highlighting the importance of proper documentation and quality control.
The episode also introduces the nf-core special interest group for regulatory work. This group is dedicated to addressing the unique regulatory challenges faced by the bioinformatics community, fostering collaboration across different sectors, and ensuring high standards are maintained. The group’s kickoff meeting saw significant enthusiasm and participation, indicating a strong community interest in this area.
Phil and Alex discuss the future directions for regulatory work within nf-core. They highlight the importance of community involvement and collaboration with regulatory authorities to develop standards that can be widely adopted. The special interest group aims to streamline regulatory processes and ensure nf-core pipelines meet necessary compliance standards.
In closing, Phil reminds listeners about the upcoming Nextflow Summit in Barcelona and the early bird discount. Alex reinforces the importance of attending not just for the summit but also for the hackathon, where valuable collaborations and knowledge sharing occur. They encourage everyone interested in regulatory work to join the special interest group, get involved in the Slack channel, and participate in upcoming meetings and events.
This episode of Channels for Nextflow provides valuable insights into the current and future landscape of regulatory work in bioinformatics, highlighting the critical role of collaboration and community-driven standards in advancing the field.
Phil Ewels: Hello, and welcome to the Channels for Nextflow podcast. You are listening to episode 44 and today we’re really lucky to have Alex Peltzer. He’s a key figure in the Nextflow and nf-core community. Many of you, I’m sure, will know him and have heard his name before. And I’ve managed to twist his arm and bring him onto the podcast to talk to us all about the exciting topic of regulation around bioinformatics and Nextflow pipelines.
Alex, thanks very much for joining me today.
Alex Peltzer: Thank you, Phil, for the invitation.
Phil Ewels: Before we get too stuck in, I want to do an early flag for our upcoming Nextflow Summit. It’s in Barcelona at the end of October, and you can find out more details at summit.nextflow.io.
We have an early bird discount. If you go to register, you can see all the details: 20 percent off registration fees, and that expires at the end of August. So you don’t have many days left.
So I just wanted to flag that to anyone who’s listening that it’s going to be a really good event. We’ve got training, we’ve got nf-core hackathon, and we’ve got some fantastic speakers.
One of whom, of course, Alex, you’re going to be giving a talk. So yeah, if you’re listening to the podcast and you can’t get enough of Alex’s voice, you know where to find him.
Are you going to be at the hackathon as well?
Alex Peltzer: I plan to go for the full week, so with Summit and the Hackathon, and we were already thinking about doing something on the Hackathon, specifically on regulatory as well.
Phil Ewels: Fantastic. So if this is your thing, if you’re interested in working around this, then it’s an absolutely valuable time to come and meet other people interested in it and get up to speed.
Alex Peltzer: Absolutely.
Phil Ewels: Alex, you and I, we’ve known each other, gosh, longer than I’d like to think now, probably coming up on seven, eight years or something. I think maybe we met at the first Nextflow meeting ever, in Barcelona in 2017 or something.
Alex Peltzer: Yeah. Late 2017 it was.
Phil Ewels: Both of us were working in academia at the time. I was at SciLifeLab in Sweden and you were at QBiC in Tübingen, in Germany. Tell us a little bit about your background, how you got into using Nextflow, and what brought you here today.
Alex Peltzer: Yeah, I was working on bioinformatics pipelines, like pretty much everybody in bioinformatics, I think, but without using frameworks like Nextflow or Snakemake or others.
So in the past I was using my own, more or less self-developed, Java-based framework, which also had concepts like modules and things like that, but was used exclusively for ancient DNA analysis.
So the only pipeline that I was able to write was one of my PhD projects, which was called Eager, which some of you might know because it’s already on nf-core as well, but now in Nextflow obviously, and there’s also some other folks now working on this because I actually moved away from ancient DNA research.
Phil Ewels: I feel like writing your own bioinformatics pipeline software is like a rite of passage for anyone working in the field 10 years ago, you’re another one to add to that club.
Alex Peltzer: Yeah, pretty much, yeah. Some people say it’s also about writing your own aligner at some point in your career. But I think it also boils down to writing your own bioinformatics workflow.
Yeah, so I started working with that. And then when I was about to finish my PhD, I was looking for options, actually, where to work, what to do next.
A good friend of mine, who was also very much involved in the early days, Sven Fillinger, was asking me whether I want to join QBiC because they were also having interest in setting up standard core unit, bioinformatics pipelines for daily analysis.
And back then I thought, okay, yeah, that’s a cool thing to do, but I need to come up with a good solution on how to do this properly.
I was actually giving a talk about the Eager pipeline in 2014, 15-ish at the HPC conference in Frankfurt, where I met some people who were involved early in Nextflow as well. But I didn’t look into it further back then, because we had already developed a framework for ancient DNA analysis and I couldn’t really squeeze this in at that point.
But at that point I thought it might make sense to actually join one of these Nextflow meetings in Barcelona, at the CRG. Back then it was pretty small. It’s not too many people, actually, 30 people maybe, I don’t
Phil Ewels: Yeah, like 30, it was like a small meeting room, wasn’t it? It was
Alex Peltzer: Yeah, it was really small, but also quite nice because it was easy to talk to people. And that’s where we met, I think, the first time in person.
Phil Ewels: Yeah. If you look at all the people that were sat in that room in 2017, it’s amazing how many of them have gone on to become friends, colleagues, how many of them are still in the Nextflow space. It’s, it was a real formative group of people, I think.
Alex Peltzer: Yes, absolutely. It was really nice actually to talk to people, to be able to also influence, also collaborate on things. I think I talked about it with you. Because NGI back then had a lot of pipelines already out. We had already a little bit of a standardization effort. And we thought, okay, let’s not just reinvent the wheel, let’s just collaborate. And I think that’s when it kicked off, right?
Phil Ewels: So three podcasts ago, episode 41 or something was a podcast I did at SciLifeLab with Fran and with Maxime. And we were talking a little bit about this history of nf-core. It’s nice to chat to you now and see the other side of that coin. I remember you getting in touch. I think it was about a single cell analysis pipeline that you wanted to build, using the NGI framework and stuff.
Alex Peltzer: Yeah. So that was back then quite nice. And I thought about using NGI pipelines in the beginning and then you were bringing it up to rebrand them, to put everything under the nf-core hood. Then others also chimed in. There was also Andreas Wilm, from the A*STAR institute in Singapore, so we wanted to contribute and there were already a couple of people involved early on, bringing in their own expertise, bringing in their own pipelines.
I think from then it became history: people joined in, started collaborating, started developing, started contributing. I think that was quite nice to see as well for me in the new job at QBiC, because all the internal developments then also were more or less streamlined into contributing directly to nf-core rather than developing in-house stuff anymore.
I think that helped a lot later as well.
Phil Ewels: So you co-founded nf-core with me and we set that up, and you converted everyone at QBiC to using nf-core, which is a legacy that lasts to this day.
Alex Peltzer: I think so, yes. To my knowledge, at least, there’s still lots of people at QBiC doing nf-core work, doing contributions, using pipelines. It’s not a question anymore whether you are using nf-core pipelines. It’s the standard, the default. Back when I started, that wasn’t the case.
So that’s the difference now. It’s been like this everywhere nowadays that people are switching from in house developments mostly to the public nf-core pipelines.
Phil Ewels: So you’ve finished at QBiC and moved on into Pharma.
Alex Peltzer: Yeah. After a couple of years at QBiC, building nf-core there and also contributing to project work, I decided to move on to something different. In my case, this was a move to Boehringer Ingelheim, a pharma company.
At Boehringer Ingelheim we also use Nextflow quite a lot. Actually all the in house pipelines are in Nextflow as well or have been migrated to Nextflow, so yeah.
Phil Ewels: I was going to say, you joined QBiC and you converted QBiC to be a Nextflow, an nf-core shop. Was that your plan with BI? What was it like when you first joined? Were there many people using Nextflow already?
Alex Peltzer: Yeah, a good friend of mine called me an nf-core Terraformer sometimes. Wherever I enter, I seem to convert people to using nf-core tooling and Nextflow tooling. But it’s a bit difficult, because QBiC is rather small; we’re talking about 20, 30 people back when I joined QBiC. With BI, that’s a different story. We’re talking about hundreds of people in bioinformatics. So it’s more difficult. Also, there’s different applications and it’s not always the same shoe; it does not fit all of them. However, I also see that the uptake is actually speeding up, and people are now also more happy with how to use Nextflow pipelines than they were in the past, so I think it’s been moving that way. Yeah.
Phil Ewels: And for anyone who’s listening who’s not familiar with pharmaceutical companies, can you tell us a little bit about what BI is and what BI does?
Alex Peltzer: So we’re developing medicines for oncology, inflammation, cardiometabolic diseases, and trying to do clinical trials. So we do both the discovery part of research, early clinical research part, as well as then also developing stuff for application in humans in the end, also animal. We also have as a part of the company animal health.
So what we’re doing is applying these pipelines also on biomarker analyses across projects, across clinical trials, trying to figure out whether we see certain trends in the data that could, for example, explain the mode of action of a medication.
Phil Ewels: Is this mostly genomics data, like DNA sequencing, RNA sequencing, or a bit of everything?
Alex Peltzer: I think it’s a bit of everything, but the focus is strictly on genomics. Transcriptomics is the standard thing nowadays, but there’s also other applications. There’s also a bit of proteomics, these panel-based approaches where you have a couple of hundred targets that you measure, because it’s easier in clinical settings to measure these in a more standardized way.
Same for RNA-seq, for example: it’s exploratory analysis in 99 percent of the cases, but in the end it’s also helping you to confirm whether you see some trend in your data that you also can see, for example, in clinical safety or efficacy data or something.
Phil Ewels: The title of this podcast is regulatory revolution. So with that context of the work you’re doing, what does regulatory mean to you?
Alex Peltzer: I think it’s a bit of a catch-all phrase, honestly speaking. When we had the kickoff with the special interest group around regulatory, we had lots of people joining, lots of people with interest, but they came from a bit of a different background, a bit of a different angle to this.
And for us here in the pharma setting, it’s a different angle than I would envision for, let’s say, a hospital where they do clinical diagnostic testing, versus what we do in clinical trials, for example.
For me personally, I would say regulatory is the framework that is set by authorities that you have to obey. That you have to make sure that you do certain analysis in a way that they are reproducible, that you can guarantee that what you did is actually making sense, that you can guarantee that the quality of your analysis actually matches certain criteria, certain standards.
Meaning, whatever you sample at the hospital should be already in a very good shape, up to pipelines where we do the data analysis and then also downstream analysis. And you should be able, upon request, to tell them and show them exactly: this is how I did it. This is the pipeline version I used, this is how I did the testing, that this pipeline actually does what it should do, that it’s really fit for purpose for this type of analysis.
And then also downstream that you can really fully explain what’s going on here. What is done and how it is done and who reviewed this.
So there’s multiple things involved there in the entire chain that you should actually be aware of. And that’s, for me, that’s the part for regulatory, if you look at it from a pharma context, obviously.
Phil Ewels: And what’s, what are the consequences of this? If a regulatory body checks and doesn’t think you’re doing enough or not doing the right thing, what happens?
Alex Peltzer: There are regular checks and I think they are there for a very good reason. If you don’t do a hundred percent, then they will mark these as findings. They’ll tell you, okay, you have to improve here.
It depends a little bit on how severe these findings are. If they are severe, it’s getting worse. To a point where there will be consequences for you as a company, which also will mean a lot of money.
In the end, what you would like to do is to prepare for these types of audits. You would like to be able to show people at the authorities: we’re doing the right thing. We’re doing it properly. We’re doing it fully fleshed out into all of the details that you need to be aware of.
So that for anybody out there, given the data, you can point them to: this is how we did it. And if they can reproduce it, then they will have some trust that this has been done properly.
Phil Ewels: So what’s the bulk of the day-to-day work for this? It sounds like there’s a lot of record keeping, keeping notes of what it is that you’ve done. Are there other types of work you need to do? In terms of looking ahead of time, choosing pipelines or written documentation.
Alex Peltzer: Yes, so it boils down to doing a lot of record keeping, honestly. If you do analysis, you have to do a lot of record keeping. You have to make sure that you know exactly what you’ve done in the past. There’s a lot of quality-driven documentation on how these steps should be done, ideally.
Then, there’s also scouting. We know clinical trials usually don’t run in six months or something like that. They run on a much, much longer timescale. So it’s running these across a couple of years, maybe, or some of them might just start in two years and run over three, four years. You also have to scout for new methods. You also have to scout for new analysis technology. So we usually get input quite early in this development.
So whenever there is a new trial planned, they select assays. And then we give input on that from a bioinformatics perspective and check, okay, is the assay even doing what we expect it to do? What do the readouts look like? Can we have some test data? Can we run some pipelines on that already? In many cases, this is standard stuff like RNA-seq, for example. I mean, it’s not like you reinvent the wheel every six trials or something like that. But in other cases it might be something where it’s much more difficult, because it’s something not as established.
B cell receptor sequencing, for example, is a topic that is currently in vogue a little bit. So that could be something of interest as well.
Phil Ewels: And presumably in some cases there is no existing analysis method and you have to go and write something .
Alex Peltzer: Very rare cases, although I would say that’s a bit more of a problem for the early clinical people because usually they’re the ones who are breaking this type of data way, way earlier than we have in the clinical setting.
So in many cases, you also get some notification from colleagues in house already: this is what we’ve done in the past, maybe you want to check it out. And then we can either take the same tooling or figure out, okay, there’s a more production-ready way to do this for a clinical trial that we would like to validate, that we would like to check whether it’s fit for purpose.
Phil Ewels: And so if someone comes to you and it’s a new genomics method that you haven’t done before, you do some research to see what’s available already. What things are you looking for, and what work do you typically need to do once you select an existing tool?
Alex Peltzer: In my case, I usually tend to acquire some test data if this is possible. So if I can get test data from some big open source academic background, that’s usually a first step.
So applicability is, for me, something that I really like to focus on. Some people just publish stuff and they’re there for publishing. And if you look at the tool and the code in the end, you figure out bugs and issues with it. It’s a bit of a different story if it’s open source stuff; then usually you can contribute to it.
So there’s a decision that you have to make at some point: do we invest into this? Do we actually make the tooling work for us, do we invest at least as much that it works for us, for our use cases? Or do we reinvent something completely from scratch? In many cases, it’s enough to invest a little bit to make it work for your use cases, and that’s what we intend to do.
What I also look for is that it’s usually open tooling, if possible. So that it’s open source, ideally, that it’s something that we can use freely, that we can even adapt to novel use cases. If this is possible, then I tend to try to source that tooling, because it is also much easier, if something pops up, to fix something, to get help, even from outside the company. You can’t fix everything yourself.
If it’s a black box software tooling, then it’s usually very difficult to fix things without talking to the supplier. Might be fine for some use cases, but I found it very difficult, especially for this quick moving biotech, bioinformatics, tooling, assay development, to find somebody who does it in a way that you can actually just deploy it without actually checking that it does the right thing.
Phil Ewels: So that’s focusing on whether the tool works, by the sound of it. And we talked also about record keeping and other stuff, other considerations beyond just the raw functionality.
Alex Peltzer: Quality is also a big thing. You need to identify whether tooling is actually fit for purpose. That means you have to have software tests, ideally you would like to see documentation for the tooling so that it’s not completely unclear what it does.
It really depends a lot on what this tool does. So if it’s just an R script that somebody wrote, it’s relatively easy to check. You can even come up with a copy of that, document it properly and then just use this. In other cases, it’s much, much more difficult. For standard tools, let’s say samtools or something like that to read SAM or BAM files, I think there is documentation, there is code testing, there are hundreds, if not thousands, of people using this.
There’s a lot of trust in the tooling that it does what it should do. And that you can also then say, this is probably valid and okay code to use. Other cases, if it’s a one man show project from somebody who doesn’t have any documentation up, no tests, no user base, it’s getting much more tricky.
And that’s also something that we briefly touched upon when we had the kickoff with the regulatory special interest group. One of our in-house auditors was also joining and also commented on that: it’s a lot about the trust that you have in the software tooling, and trust comes with tests, trust comes with test data and documentation and also user base.
Because if you have thousands of people using this, it’s very likely that it’s either stable or if there is a bug that is there, then it’s going to be found and then hopefully also fixed.
Phil Ewels: So effectively you have to be prepared. If an auditor asks you, why did you choose this software? And how do you trust that it works? You need to be able to give them an answer.
Alex Peltzer: Yes, exactly. You have to have documentation for that. You have to have some document where you outline, this is what we did to check. This is why we trust this. This is why we think this is fit for purpose.
Phil Ewels: You were in academia and in QBiC, like how much of a change is this? What percentage of overhead does this requirement add?
Alex Peltzer: 30 percent overhead. At least, it depends heavily on what you do in the end. It depends also on the setting in which you do analysis, because analysis also has defined scopes. So for example, if you do analysis on a very exploratory level, because the data is not going to be used for anything where you do decision making in the end, then it’s fine to do it a little bit less strict, which means your overhead will be lower.
In other cases, maybe you have way more stringent requirements to be followed. So for example, if it’s something that you use for decision making in the clinical trial, to continue this, does it make sense, then it’s getting much more difficult. Because then you actually have to make sure that what you do is really properly a hundred percent documented and also fit for this specific decision making purpose.
So that’s a key difference I would say. It really depends. There’s no general answer to that, I think.
Phil Ewels: It’s pretty significant.
Phil Ewels: Before we started this podcast, we were chatting a little bit and you were talking about some work you’d been doing with biomarker analysis. Is that right? Can you tell us a little bit about your experience there?
Alex Peltzer: Yes, it’s not that different to what other people are also doing in academic institutions or bioinformatics core units, I would say. What we do is we get a lot of patient data from clinical trials. For example, let’s say standard types like RNA-seq, but also more exotic ones like mutational calling, some specific assays that you actually get readouts from.
And then you have to process that data yourself. You can actually use Nextflow pipelines, in-house, nf-core, whatever you want, as long as it’s good for purpose. And then with the outputs of that, you also have to do downstream analysis. Do I see some genes being differentially expressed between my treatment and a placebo group? Do I see something over time, for example longitudinal analysis, so that you see, okay, a patient got the first dose and I see a certain group of genes which we hypothesize about, for example, going up in regulation? This type of analysis is usually something that you do in R or Python or something like that, not in a pipeline setting anymore, and that also has to be reviewed and quality controlled.
You also have to check that the code that somebody wrote actually does the right thing. That it’s properly documented. That it doesn’t do nasty hacks, for example, with the data, so that the code is really also something that you can reproduce in a year, two years, or even a longer time frame, without having any issues.
That it is also stored away, that it’s versioned, that you can actually make sure that everything ends up in storage and archive.
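To make the archiving point concrete, here is a minimal Python sketch of one way to record what was archived and to check later that nothing has changed. It is an illustration only, not the process used at BI or in nf-core; the function names `sha256_of`, `build_manifest` and `changed_files` are hypothetical, and the directory passed in is whatever holds your analysis outputs.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(results_dir: Path) -> dict[str, str]:
    """Map every file under results_dir (by relative path) to its checksum."""
    return {
        path.relative_to(results_dir).as_posix(): sha256_of(path)
        for path in sorted(results_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(results_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files that were added, removed or modified since the manifest."""
    current = build_manifest(results_dir)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

Storing the manifest next to the versioned analysis code gives a reviewer a simple way to confirm, a year or two later, that the archived outputs are still the ones that were originally produced.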
Phil Ewels: And so I keep pointing back to previous podcast episodes, but the last one, 43, was about Data Studios, which is a feature we’re developing at Seqera. And this is exactly the thing we were talking about there. I think I was saying, there’s no point in having a super reproducible analysis pipeline in Nextflow if you then go off and do something totally undocumented and non-reproducible for the final stage of the analysis. I think that’s one of the more difficult parts of what we do with reproducible bioinformatics.
Alex Peltzer: It’s also more difficult in the sense that for a pipeline, it’s usually very much defined what it does in the end. It’s like a fixed input, fixed output, more or less. If you point it to a certain code revision, you just use that code tag and then it does exactly what you want. After that, freedom starts to some extent. You can actually do a little bit more in various directions, which is also, I think, fine, to explore what you can actually see, are there some trends from a different angle.
Now, this is a bit different than what you actually do with the pipeline, but I fully agree. It shouldn’t become the point where you actually lose all of this reproducibility and checkability of your analysis, because then what’s the point in doing pipeline reproducibility if you just give up on anything after
Phil Ewels: Exactly.
Phil Ewels: Tell us a little bit about your thoughts with regulatory work and how we started off this nf-core special interest group.
Alex Peltzer: So we had internal discussions on how to move forward with these regulatory discussions around the pharma context. From a pharma perspective, there’s a little bit of a push towards making also the pipeline part a little bit more open, in terms of that you can also run pipelines in a dedicated setting, so that there’s also some requests, for example, for future audits, that potentially FDA and EMA, as regulatory bodies in the U.S. and the European Union, are actually also trusting these pipelines. For example, the FDA also has this BioCompute project nowadays, where they also run pipelines and try to find out whether they are really reproducible. There’s some push from these authorities, I think, in that direction.
Phil Ewels: Yeah, earlier this year at the Nextflow Summit in Boston, there was a talk by Jonathan, who’s one of the original people working on BioCompute Objects, and he talked a little bit about his involvement with the FDA.
The BCO files are now officially recognized, I think, by the U.S. Food and Drug Administration. We also have RO-Crates in Europe, and the two are very similar. It’s a big JSON object describing all the outputs of a pipeline and what ran. So that you can look at a standardized file format, and it doesn’t necessarily matter what the pipeline was or what language the pipeline was written in, but you have a standardized JSON file that the auditors can look at and understand what ran and what was produced.
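As a rough illustration of the idea of a single JSON file describing what ran and what was produced, here is a small Python sketch. It is deliberately not the BioCompute Object or RO-Crate schema; the `RunRecord` class and its field names are hypothetical, and the pipeline name, revision, parameter and output path are placeholder values chosen only for the example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """Hypothetical, simplified record of one pipeline run."""

    pipeline: str                      # which pipeline was run
    revision: str                      # pinned version or tag of the pipeline
    parameters: dict[str, str] = field(default_factory=dict)
    outputs: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record with stable key order for easy diffing."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values for illustration only.
record = RunRecord(
    pipeline="nf-core/rnaseq",
    revision="3.14.0",
    parameters={"genome": "GRCh38"},
    outputs=["multiqc/multiqc_report.html"],
)
print(record.to_json())
```

The real standards carry far more detail, but the principle is the same: the record is independent of the language the pipeline was written in, so an auditor only has to understand one file format.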
Alex Peltzer: Yeah. I think exactly these types of developments are nowadays starting and also have been started like a year ago, two years ago, maybe. But we’re still early there. For example, a lot of the pipelines that people are using are usually in-house pipelines, usually validated in an in-house setting.
You also have to make sure that all the documentation fits with your own quality framework that you have within your company. So that might be different between companies as well, to some extent at least, although it’s usually aligned with what the authorities want. But what I thought with this push towards doing this with nf-core was that there is a large overlap between companies, I think, doing more or less the same thing.
We’ve also seen people from Novo Nordisk in the regulatory kickoff meeting. We’ve also seen people having interest from other pharma companies, other providers were also having interest in this topic because they’re seeing the same issues that we have as well. You have to make sure that the documentation is up to date, you have to make sure that there are tests for your pipeline, you have to make sure that there are certain guidelines that are adhered to, you have to prove that, you have to also document that in house to some extent.
But if we come up with good standards, good ideas on how to do this within nf-core, how to employ the nf-core standards that are already there: it’s not like we don’t have anything. It’s there is a huge framework around that, nf-test, and it’s been developing quite substantially over the last couple of years. It’s really amazing what you can actually do now. my opinion, it’s just that you have to point people to that, that you have to make sure that, for example, if there are new updates coming from pipeline, that they are reviewed properly, that there’s some tests done, that there’s some documentation updates.
And then I think we can push it in a direction that helps everybody.
Phil Ewels: That’s my feeling with much of this regulatory stuff. It’s not necessarily anything novel. It’s just best practices that everyone in the research environment should already be using, whether you’re in academia or pharma or anywhere: pull request reviews, tests, and documentation are a good thing.
I don’t think anyone disagrees with that. I guess the point of the regulatory work is just to ensure that is in place.
Alex Peltzer: Exactly. And I think that’s the point. We would like to look into nf-core: what’s already there, what is already available, and whether there are ways to promote this a little more openly and actively, because one piece of feedback we got from our quality colleagues was that it wasn’t so easy to find things like that.
For example, for somebody who’s not involved in nf-core at all, and usually an auditor would not be involved in nf-core, I would assume they would like to find this very easily. And our idea was: can we collect this feedback? Can we collect all of the interesting bits that we need to promote to these people with an audit background, and to authorities as well, showing that this is already done, that we adhere to certain standards, that there are guidelines out there that people have to follow? And then put this on a summary page, or add something like an extra label on pipelines that adhere to this very strictly.
Things like that could be discussed, brought up, and shaped in the community, and can then help people doing the qualification of these pipelines later on, much better than is currently the case.
Phil Ewels: At the moment you have to say to the auditor, we’ve got a three-day training course on Nextflow. You can try, but it’s maybe not going to fly.
Alex Peltzer: Maybe not.
Phil Ewels: I want to just quickly take a step back as well. The idea of this special interest group then is that it’s a way to bring people together working in a specific field or with a specific interest, but across multiple nf-core pipelines. So the discussion isn’t just siloed per pipeline.
And then try and foster collaboration on other things. The community collaborates in many ways. It’s not just writing code. It’s not just bug fixing a specific pipeline. This is such a good example of where we can collaborate in a secondary way.
The regulatory special interest group is, I think, the second one that’s been set up, and if anyone is interested, you go to the nf-core website, hit Community, and there’s a page for special interest groups there. So we’ve got four of them live as of today, and Regulatory is there with a little description and a list of key people and key pipelines and stuff.
Without going into loads of detail, maybe Alex, tell us a little bit about how that started and how it’s been received so far.
Alex Peltzer: The kickoff was in early July. Lots of people actually joined in with different backgrounds. We had people from a pharma context, we had people from academic institutions and core units. SciLifeLab had people, QBiC had people, for example, to just mention where we came from.
But there are also people just from the community who don’t have an affiliation at the moment. There are also care providers who joined in, and we even had people purely out of interest. And I think the kickoff went very well. It was a lot of collection of ideas, a lot of collection of what people see as in scope for this regulatory special interest group.
The difficulty now, I think, is to really come up with a more stringent way of looking at these things and collecting all of these ideas. We already started a Google Doc that people could contribute to, which already has quite substantial cross-links and cross-references to a lot of different initiatives where people have already started collecting material around regulatory topics.
Now we actually have to go through this and maybe also sort things into different subgroups, so that we can focus on them separately.
I think there is a strong community of people from a pharma background who would like to focus more on authority-related points.
However, we also have people with a clinical diagnostics background. I think there is an overlap between what we do in both settings, but we first need to identify where exactly this overlap is. How large is it really? Can we have an influence? Can we also come up with standards that would apply to both?
And in other cases, are there some standards that we would like to specifically highlight? For example: okay, if you want to use this pipeline for anything that goes past this point, then you would have to obey these rules and make sure that you follow these kinds of guidelines.
It’s a bit early to talk about that because I also have to digest it. It’s 20 pages already, and I haven’t had the chance to go through it entirely. But I think there are a lot of people who already have some background in this and are willing to contribute.
And that’s already a good success, I guess.
Phil Ewels: I think the attendance for that meeting and the enthusiasm that people met it with really shows that this is a key area that many people are interested in. You never really know until you actually try and organize a kickoff like this, but I think so, just from the number of people commenting on LinkedIn about it and stuff as well.
I think it’s a real hot area, and what I love about it as well is that, like all of nf-core, it’s very grassroots. The people in that meeting, the people driving this, are the people who are actually doing this work. And it’s in everyone’s best interests, and hopefully it’s a win-win situation for everybody.
Alex Peltzer: Yeah, just maybe as a comment from our end: my hope personally would be that if we come up with good standards based on the guidelines, and maybe also add some additional guidelines for the specific settings that we would like to target, then authorities like the FDA, EMA, etc. will just pick this up together with us, and maybe even just say, okay, this looks very good.
We can actually have this also as part of the recommendations on how to do these things properly. Because at the moment there is not that much around. Other people from other companies tell us more or less the same: they have their own standards, they have their own developments to some extent, and they try to make sure that they follow exactly what is out there.
If it’s not covered yet, then I think it’s a good time now to collect across the community how to do things on the pipelines properly, so that you can use them in such trial settings easily. And then also bring that up with authorities. Our intent also was to actually invite people from authorities to join these regulatory special interest group calls at some point, and maybe also do a presentation and bring their perspective in.
Phil Ewels: Exactly. Cause it’s in their interest as well. I imagine if you’re an auditor, you don’t want to have to figure out and understand 10 different systems. If everyone’s just using the same pipeline, it’s going to save you a lot of time and effort.
Alex Peltzer: Yes
Phil Ewels: So you’ve touched on how you use open source software, but obviously you’re a keen advocate of open source yourself. And you’ve written nf-core pipelines from scratch whilst being at BI. I’ve forgotten the name of the pipeline now.
Alex Peltzer: Nanostring pipeline, for example, is one of the contributions we made.
Phil Ewels: And you say you’ve written that from scratch there and pushed it to nf-core and made it open for the community, which I think is fantastic.
One of the things is, I think people outside of pharma often view pharma with this slant that it’s very commercial. There are lots of things about privacy, and data protection obviously is important, but it also comes down to the code and intellectual property and everything.
From your point of view inside such a company, obviously you’re using and contributing a lot of open source work, which might be surprising to some. Is there a cultural difference, or is it just a wrong assumption for people to make?
Alex Peltzer: I think it is something that is changing, or has been changing over the last couple of years. A lot of people that I’ve met at different conferences, also with a pharma background, are more or less telling me the same stories now. It’s impossible these days to actually do analysis on any of the novel data types that you see across clinical trials and clinical research without using open source tooling. You can’t just reinvent the wheel if there’s proper open source tooling out there. And that boils down to: is it something that you just use, or is it really something that you also have to understand? In my opinion, you also have to understand these tools, because in the end you also have to prove that you understand what you’re doing there. If something is broken, you have to fix it. And if you can fix it in an open source way, in my personal opinion, and I think a lot of people in the pharma context think very much the same nowadays, it’s much better to fix it upstream in the open source community, in the actual project that works with these tools. Let it be challenged there as well, critically challenged.
I know who I’m talking to here. Your pull request reviews are famous to me as well. For people who don’t know, I also have a special title in nf-core, the Merge Cowboy, because I was keen on merging pull requests a bit early.
Earlier than most people, maybe. Phil was always one person who was really very strict about pull request reviews and gave a lot of feedback, which in many cases also helps to improve things. That’s also something you have to be aware of.
Phil Ewels: I think it’s going back a bit now. I do fewer pull requests than I used to. I think it’s a somewhat dubious reputation to have. It’s, oh no, Phil’s going to drop 200 comments on my pull request. But I think, especially in the early days of nf-core, of course, it was important. It helps with that standardization a lot.
Alex Peltzer: Yeah, it really helps. And I think that’s also something changing in pharma now as well. I don’t know how it was 15 or 20 years ago. I wasn’t working there, but my personal experience nowadays is that we have multiple people who are in multiple open source communities and open source projects and who are actively contributing.
Obviously there has to be a clear focus. It’s not like I can contribute whatever I want. I can’t work on ancient DNA analysis pipelines anymore, but if there’s a good justification, it’s usually possible for people to contribute.
Phil Ewels: Unfortunately mammoths don’t need very many drugs.
Alex Peltzer: They won’t help anymore, I guess.
Phil Ewels: I’m personally super excited about this topic. It’s hugely important to good science, to good healthcare, to good development.
I feel like this is a bit of a pivotal moment for nf-core, and hopefully part of a larger bioinformatics picture of coming together, collaborating, and helping one another. And I’m really excited by the enthusiasm that we’ve seen in the community so far and very keen to see what comes out in the months to come.
The next regulatory meeting is coming up pretty soon.
Alex Peltzer: I think it was set up for September 9th.
Phil Ewels: Still plenty of time for people to join and contribute to that. And then after that, the next thing will be that you’re giving a talk at the summit in October, so there are two fairly big milestones coming up for that special interest group.
Is there anything else that would be useful for people to know who are listening, who are interested in this to find out more information or get involved?
Alex Peltzer: If you want to get involved, please just join the Slack channel. Please just join the group there. We will announce every meeting that we will have in the future on the nf-core webpage.
My focus is clearly on pharma, obviously, because that’s what I do. But there will probably also be subgroups working on these topics from a different angle and collecting information around them. We will discuss this, and it’s really early days still.
We just had the kick off. Please just get involved.
Phil Ewels: And the Slack channel is called #regulatory, so that’s the best place.
And all of the meetings are being recorded or live streamed and going to YouTube. I think we’ve got a playlist called nf-core regulatory special interest group.
So there’s a playlist which will be maintained. At the moment there’s just the one talk in it, which is the kickoff meeting. So even if you missed that kickoff, you can still go back and catch up.
Thanks so much for joining me today. It’s been an absolute pleasure, as always, chatting to you. I’m looking forward to seeing you in person in a few weeks. And yeah, for everyone listening, just a final plug: remember that the early bird 20 percent discount is going to run out in just a couple of weeks.
So get your skates on, get that ticket and hopefully meet many of you listening in person in a few weeks as well.
Alex Peltzer: Join the full week, not just the summit. The hackathon is also super fun to join. I really missed the last ones. And this time I really look forward to going.
Phil Ewels: This is not always super clear to people. People see the word hackathon and they think, oh, it’s just going to be super nerdy coding. But there’s a big chunk of people who really spend most of their time just chatting or planning, and you see all kinds of outputs from a hackathon which are not just pull requests. It’s bigger than that, and this is a great example of that.
Fantastic. Thanks very much, Alex.