Instructions
Version
Choose the version of the software you want to use from the drop-down
list.
Input data type
Select the format of your input file. Currently, PathogenFinder2 only
accepts input files in FASTA format.
The FASTA file must contain the genomic data of a single bacterial
isolate. To analyse more than one isolate, consider running the version
from the GitHub repository locally. Files must not be compressed.
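Before uploading, a quick local check can catch the most common input problems. The sketch below is a hypothetical helper, not part of PathogenFinder2: it rejects data that starts with a gzip, bzip2, zip or xz signature and accepts data whose first non-blank character is the FASTA header marker ">". It cannot verify that the records come from a single isolate.

```python
# Magic-byte signatures of common compression formats: gzip, bzip2, zip, xz.
COMPRESSED_SIGNATURES = (b"\x1f\x8b", b"BZh", b"PK\x03\x04", b"\xfd7zXZ\x00")


def looks_like_plain_fasta(head: bytes) -> bool:
    """Return True if `head` (the first bytes of a file) looks like uncompressed FASTA.

    Hypothetical pre-upload check: it does not verify that all records
    belong to a single bacterial isolate.
    """
    if head.startswith(COMPRESSED_SIGNATURES):
        return False
    # A FASTA file starts with a ">" header line (leading whitespace tolerated).
    return head.lstrip().startswith(b">")


def check_fasta_file(path: str) -> bool:
    """Read the first 64 bytes of the file at `path` and apply the check above."""
    with open(path, "rb") as handle:
        return looks_like_plain_fasta(handle.read(64))
```

For example, `check_fasta_file("isolate_01.fasta")` returns False if the file was gzipped before upload.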
To avoid problems caused by file names, we only allow a limited
selection of ASCII characters: a-z, A-Z, 0-9, "_" (underscore),
"-" (hyphen) and "." (full stop).
Upload and submit job
After attaching your file, click the 'Submit job' button. A waiting page
is displayed and updated continuously until the job finishes, at which
point the server output page appears in your browser. Optionally, you can
enter your email address to be notified as soon as your results are ready.
Results remain available for one week after they are created.
Output
The PathogenFinder2 prediction comes from an ensemble of 4 neural networks.
Therefore, four different predictions are reported, each one a number
between 0 (without pathogenic capacity) and 1 (with pathogenic capacity).
This number does not measure the degree of pathogenic capacity; it reflects
how close the bacterium lies to the decision border of PathogenFinder2, and
therefore how certain the prediction is (the closer to 0.5, the more unsure
the neural network is about the pathogenic capacity). It is valid to use the
mean of the four values, but it is advisable to take the separate
predictions into account when making decisions about the nature of the
bacterium.
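The advice above (use the mean, but look at each network) can be made concrete with a small summary function. This is a sketch under stated assumptions: 0.5 is taken as the decision border, as the description implies, and the `margin` used to flag uncertain networks is a hypothetical value, not a PathogenFinder2 parameter.

```python
from statistics import mean

# Per the description, 0.5 is the decision border: values close to it
# mean the network is unsure.
DECISION_BORDER = 0.5


def summarize_ensemble(predictions, margin=0.1):
    """Summarize the four per-network predictions of one isolate.

    `margin` is a hypothetical cut-off: predictions within `margin`
    of 0.5 are flagged as uncertain.
    """
    if len(predictions) != 4:
        raise ValueError("expected one prediction per neural network (4 values)")
    average = mean(predictions)
    above = [p > DECISION_BORDER for p in predictions]
    return {
        "mean": average,
        # Assumption: a mean above 0.5 is read as "pathogenic"; exactly 0.5 is not.
        "mean_call": "pathogenic" if average > DECISION_BORDER else "non-pathogenic",
        # True when all four networks fall on the same side of the border.
        "networks_agree": all(above) or not any(above),
        # Indices of the networks whose prediction lies near the border.
        "uncertain": [i for i, p in enumerate(predictions) if abs(p - DECISION_BORDER) < margin],
    }
```

With predictions `[0.9, 0.8, 0.55, 0.3]` the mean is 0.6375 ("pathogenic"), but the networks disagree and network 2 sits close to the border, which is exactly the situation where the separate values deserve attention.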
By default, PathogenFinder2 reports a results file ("results.tsv"), as
well as the embeddings file and the attention scores file
("embeddings.npz" and "attentions.npz", respectively). Intermediate files,
such as the predicted proteins and/or the embeddings file, are also
reported if they were produced during the run.
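To inspect the standard output files locally, the Python standard library is enough. The sketch below assumes "results.tsv" is a tab-separated table whose first line is a header; the column names are not documented here, so the code does not rely on them. An ".npz" file is a zip archive holding one "<name>.npy" member per array, so listing its members shows which arrays "embeddings.npz" and "attentions.npz" contain.

```python
import csv
import zipfile


def read_results(path="results.tsv"):
    """Read a tab-separated file with a header line into a list of dicts, one per row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def list_npz_arrays(path):
    """List the array names stored in an .npz file (e.g. embeddings.npz).

    Each array is stored as a "<name>.npy" member of the zip archive.
    """
    with zipfile.ZipFile(path) as archive:
        return [name[:-4] for name in archive.namelist() if name.endswith(".npy")]
```

With NumPy installed, `numpy.load("embeddings.npz")` gives access to the arrays themselves; its `.files` attribute lists the same names.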
If the option to map the top proteins highlighted by the attention scores
to UniRef50 is selected, a table with the results will also be displayed
(currently unavailable) and can be downloaded ("meh.tsv"). If the option
to map the embeddings to the Bacterial Pathogenic Landscape is selected,
the landscape image and the closest neighbours will be available for
download.
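Picking the proteins "highlighted by the attention scores" amounts to ranking each network's proteins by score and keeping the top ones (20 per network, as described in the output list). The sketch assumes a simple dict layout, protein ID to attention score per network; this layout is hypothetical and is not the format of "attentions.npz".

```python
import heapq


def top_proteins(attention, k=20):
    """Return the k protein IDs with the highest attention score, best first.

    `attention` maps protein ID -> attention score for ONE network
    (hypothetical layout).
    """
    best = heapq.nlargest(k, attention.items(), key=lambda item: item[1])
    return [protein_id for protein_id, _ in best]


def top_proteins_per_network(attention_by_network, k=20):
    """Apply top_proteins separately to each network's scores."""
    return {net: top_proteins(scores, k) for net, scores in attention_by_network.items()}
```

Because each network is ranked separately, the four top-20 lists can overlap only partially.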
PathogenFinder2 has four outputs, two standard and two supplementary:
Bacterial Pathogenic Capacity prediction: Reports the neural networks'
predictions of bacterial pathogenic capacity. This is always part of the
output.
Highlighted proteins during the pathogenic capacity prediction:
Shows the UniRef50 matches of the 20 proteins most relevant to each
neural network's pathogenicity prediction. This is only part of the
output if the "Map the 20 most relevant proteins to UniRef50" option
has been selected.
Bacterial Pathogenic Landscape & Closest Bacteria in the Bacterial
Pathogenic Landscape: Shows the location of your sample in the Bacterial
Pathogenic Landscape, as well as the 10 bacteria closest to your sample.
This is only part of the output if the "Map your sequence to the
Pathogenic Bacteria Landscape" option has been selected.
Downloads: Section where you can download all the files produced
by and during the PathogenFinder2 run. This is always part of the
output, but the number of files depends on the options selected.
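The "10 closest bacteria" output is a nearest-neighbour query. How PathogenFinder2 measures closeness is not described here; the sketch below assumes each bacterium is a point (for example, its coordinates in the landscape) and uses Euclidean distance via `math.dist`, purely for illustration. The function name and data layout are hypothetical.

```python
import math


def closest_bacteria(sample, landscape, k=10):
    """Return the names of the k landscape points nearest to `sample`, nearest first.

    `sample` is a coordinate tuple; `landscape` maps a bacterium name to its
    coordinates. Euclidean distance is an illustrative assumption.
    """
    ranked = sorted(landscape.items(), key=lambda item: math.dist(sample, item[1]))
    return [name for name, _ in ranked[:k]]
```

Sorting the whole landscape is fine for a sketch; for a large reference set, a spatial index would avoid computing every distance.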