PathoGFAIR

PathoGFAIR: a collection of FAIR and adaptable (meta)genomics workflows for (foodborne) pathogens detection and tracking

PathoGFAIR is a collection of Galaxy-based FAIR workflows that use state-of-the-art tools to detect and track pathogens in metagenomic Nanopore sequencing data. Although initially developed to detect pathogens in food datasets, the workflows can be applied to other metagenomic Nanopore datasets containing pathogens. PathoGFAIR also produces visualisations and reports that summarise the results.

PathoGFAIR implementation

The core of the PathoGFAIR project is a series of five Galaxy-based workflows designed to process Nanopore sequencing data, detect pathogens, and track their presence across samples:

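Tracking a pathogen across samples comes down to collecting per-sample detections into a single table. The minimal Python sketch below illustrates that idea as a presence/absence table. The input format, sample names and pathogens are hypothetical, and this is not how the Samples Aggregation and Visualisation workflow is implemented in Galaxy.

```python
def presence_matrix(detections):
    """Build a pathogen x sample presence/absence table.

    `detections` maps a sample name to the set of pathogens reported for it
    (hypothetical input format, not PathoGFAIR's actual output).
    """
    samples = sorted(detections)
    pathogens = sorted({p for found in detections.values() for p in found})
    # One row per pathogen, one boolean column per sample
    return {
        pathogen: {s: pathogen in detections[s] for s in samples}
        for pathogen in pathogens
    }


# Hypothetical per-sample results, for illustration only
detections = {
    "sample_A": {"Salmonella enterica"},
    "sample_B": {"Salmonella enterica", "Listeria monocytogenes"},
    "sample_C": set(),
}
matrix = presence_matrix(detections)
for pathogen, row in matrix.items():
    present_in = [s for s, hit in row.items() if hit]
    print(f"{pathogen}: {', '.join(present_in)}")
```

Sorting both pathogens and samples keeps the table order deterministic, which makes results easy to compare between runs.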

Where to find the workflows?

The workflows are available on two workflow registries (Dockstore and WorkflowHub) and on several Galaxy servers.

Workflow name | WorkflowHub | Dockstore | Galaxy servers
--- | --- | --- | ---
Nanopore Preprocessing (v0.1) | ID 1061 (v0.1) | nanopore-pre-processing/main:v0.1 | European, United States, Australian
Taxonomy Profiling and Visualization with Krona (v0.1) | ID 1059 (v0.1) | taxonomy-profiling-and-visualization-with-krona/main:v0.1 | European, United States, Australian
Gene-based Pathogen Identification (v0.1) | ID 1062 (v0.1) | gene-based-pathogen-identification/main:v0.1 | European, United States, Australian
Allele-based Pathogen Identification (v0.1) | ID 1063 (v0.1) | allele-based-pathogen-identification/main:v0.1 | European, United States, Australian
Samples Aggregation and Visualisation (v0.1) | ID 1060 (v0.1) | pathogen-detection-pathogfair-samples-aggregation-and-visualisation/main:v0.1 | European, United States, Australian
PathoGFAIR 5in1 (v0.1) | Coming soon | Coming soon | European, United States, Australian
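The registry pages of each workflow can be reached from the identifiers in the table above. The Python sketch below builds those links. The WorkflowHub pattern (https://workflowhub.eu/workflows/ followed by the ID) is the standard one, while the Dockstore prefix is an assumption that the workflows are published under the iwc-workflows GitHub organisation; check it against the actual Dockstore entries before relying on it.

```python
# Registry identifiers taken from the table above; the 5in1 workflow
# is not yet published on the registries, so it is omitted.
WORKFLOWS = {
    "Nanopore Preprocessing": (1061, "nanopore-pre-processing/main:v0.1"),
    "Taxonomy Profiling and Visualization with Krona": (
        1059, "taxonomy-profiling-and-visualization-with-krona/main:v0.1"),
    "Gene-based Pathogen Identification": (
        1062, "gene-based-pathogen-identification/main:v0.1"),
    "Allele-based Pathogen Identification": (
        1063, "allele-based-pathogen-identification/main:v0.1"),
    "Samples Aggregation and Visualisation": (
        1060,
        "pathogen-detection-pathogfair-samples-aggregation-and-visualisation/main:v0.1"),
}

WORKFLOWHUB_BASE = "https://workflowhub.eu/workflows/"
# Assumption: Dockstore entries live under the iwc-workflows GitHub organisation
DOCKSTORE_BASE = "https://dockstore.org/workflows/github.com/iwc-workflows/"


def registry_links(name):
    """Return WorkflowHub and (assumed) Dockstore URLs for one workflow."""
    hub_id, dockstore_path = WORKFLOWS[name]
    return {
        "WorkflowHub": f"{WORKFLOWHUB_BASE}{hub_id}",
        "Dockstore": f"{DOCKSTORE_BASE}{dockstore_path}",
    }


for name in WORKFLOWS:
    links = registry_links(name)
    print(f"{name}\n  {links['WorkflowHub']}\n  {links['Dockstore']}")
```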

How to learn to use the workflows?

To assist in understanding and using the workflows, we provide an extensive tutorial and a recording, available via the Galaxy Training Network (GTN).

Use Cases

To demonstrate PathoGFAIR and its features, 130 samples from two studies, one without and one with prior pathogen isolation, were analysed.

All samples contained pathogens known beforehand and were sequenced using Oxford Nanopore technology.

Samples Without Prior Pathogen Isolation

To mimic real-world scenarios, pathogens were deliberately spiked into 46 samples following a protocol developed in the context of PathoGFAIR.

The full analysis can be found in a dedicated Galaxy history.

Samples With Prior Pathogen Isolation

To further test PathoGFAIR, 84 public datasets were used. The full analysis can be found in a dedicated Galaxy history.

Benchmarking

To evaluate the effectiveness of the PathoGFAIR workflows, a benchmarking analysis was performed comparing PathoGFAIR’s pathogen detection capabilities with those of other systems and pipelines: CZID (IDseq) and BugSeq.

This section provides detailed instructions to replicate the PathoGFAIR benchmarking process, as outlined in our dedicated protocol on protocols.io. The focus here is on running the selected systems and pipelines used in our benchmarking.
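Because the pathogens in every sample were known beforehand, each system's output can be scored against that ground truth. The Python sketch below shows one common, micro-averaged way to do so: count true positives, false positives and false negatives per sample, then derive precision, recall and F1. The data are hypothetical, and these are not necessarily the metric definitions used in the PathoGFAIR protocol.

```python
def detection_metrics(expected, detected):
    """Compare detected pathogens with the known (expected) ones.

    Both arguments map sample name -> set of pathogen names.
    Counts are summed over all samples (micro-averaging) before
    computing precision, recall (sensitivity) and F1.
    """
    tp = fp = fn = 0
    for sample, truth in expected.items():
        found = detected.get(sample, set())
        tp += len(truth & found)   # known pathogens that were detected
        fp += len(found - truth)   # detections not in the ground truth
        fn += len(truth - found)   # known pathogens that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth and tool output for two samples
expected = {"s1": {"Salmonella enterica"}, "s2": {"Listeria monocytogenes"}}
detected = {"s1": {"Salmonella enterica", "Escherichia coli"}, "s2": set()}
metrics = detection_metrics(expected, detected)
print(metrics)
```

Running the same function on the outputs of each system gives directly comparable numbers, as long as pathogen names are normalised the same way for all of them.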

PathoGFAIR

CZID (IDseq)

BugSeq

GitHub Repository

In the GitHub repository, you will find the sources (Jupyter notebooks) used to reproduce the figures in the manuscript.

The notebooks are also designed to run on any Galaxy instance using Jupytool.

Folder structure

Requirements

To reproduce the figures in the results, run the notebooks in the bin folder. They require the following:

These requirements can be installed in a conda environment:

$ conda env create -f environment.yml
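Once the environment is created and activated, the notebooks can also be executed non-interactively. The Python sketch below uses nbformat and the ExecutePreprocessor from nbconvert to run every notebook in bin. It assumes both packages are installed in the environment, that the notebooks use a kernel named python3, and that running them in alphabetical order is acceptable. Note that it overwrites each notebook with its executed version.

```python
from pathlib import Path


def run_notebooks(folder="bin", timeout=600):
    """Execute every notebook in `folder` in place, in alphabetical order.

    Returns the list of executed notebook paths (empty if none were found).
    """
    notebooks = sorted(Path(folder).glob("*.ipynb"))
    if not notebooks:
        return []
    # Imported lazily so the helper can be loaded without Jupyter installed
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    # Assumption: the notebooks run on a kernel registered as "python3"
    executor = ExecutePreprocessor(timeout=timeout, kernel_name="python3")
    for path in notebooks:
        with open(path, encoding="utf-8") as f:
            nb = nbformat.read(f, as_version=4)
        # Use the notebook's own folder as working directory so that
        # relative data paths resolve as they would in Jupyter
        executor.preprocess(nb, {"metadata": {"path": str(path.parent)}})
        with open(path, "w", encoding="utf-8") as f:
            nbformat.write(nb, f)
    return notebooks


if __name__ == "__main__":
    for executed in run_notebooks():
        print(f"executed {executed}")
```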

Usage

Contributors

Contribution

Feel free to contribute, open issues, or provide feedback.

Citation

If you use or refer to this project in your research, please cite the associated paper:

Engy Nasr, Anna Henger, Björn Grüning, Paul Zierep, Bérénice Batut. PathoGFAIR: a collection of FAIR and adaptable (meta)genomics workflows for (foodborne) pathogens detection and tracking. bioRxiv 2024.06.26.600753; doi: https://doi.org/10.1101/2024.06.26.600753

Sources

All sources for the figures and tables in the paper can be found in this GitHub repository.

Figures

Tables