WrightonLabCSU/dram pipeline parameters
DRAM (Distilled and Refined Annotation of Metabolism) is a tool for annotating metagenome-assembled genomes.
Pipeline Steps
Which steps to run in the pipeline.
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
Rename FASTA headers based on file name. Example: sample1.fa -> (FASTA header renamed to) >sample1…… Why? DRAM output is organized by scaffold/contig with respect to each provided input_fasta. Without renamed FASTA headers, the individual scaffolds/contigs will not be distinguishable. If you have already renamed your FASTA headers, do not include this option with --call. |
|
|||
|
Whether to call genes on the input FASTA files using Prodigal. |
|
|||
|
Annotate called genes using downloaded databases. |
|
|||
|
Path to a directory of DRAM annotations to merge into one file. This is run as a separate pipeline. |
|
|||
|
Topic to distill. Options: carbon, energy, misc, nitrogen, transport. |
|
Ecosystem to distill. Options: eng_sys, ag. If more than one ecosystem is included, they must be separated by a comma (--distill_ecosystem eng_sys,ag or --distill_ecosystem 'eng_sys,ag'). |
|
||
|
<path/to/custom_distillate.tsv> Custom distill file to use. To supply more than one file, separate the paths with commas (--distill_custom path/to/file1.tsv,path/to/file2.tsv or --distill_custom 'file1.tsv,file2.tsv'). Each custom distill file must be in TSV format, with the gene ID in the first column and the distillate value in the second. |
|
|||
|
Generate a product visualization of the annotations and save the output to the output directory. |
|
|||
|
Using a DRAM annotations file, make a table of adjectives. Still in development, but can still be used. |
|
|||
|
Format KEGG database for use in DRAM. Standalone operation, will exit after completion. |
|
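As a rough illustration of the header renaming described for the rename step, the Python sketch below prefixes each FASTA header with the base name of its file. The exact header layout DRAM writes is not stated here, so the `sample_originalheader` format and the function name `rename_fasta_headers` are assumptions made for this example only.

```python
from pathlib import Path


def rename_fasta_headers(fasta_text: str, file_name: str) -> str:
    """Prefix every FASTA header with the file's base name.

    Assumption: headers become '>sample_original-header'. The real
    pipeline may use a different layout.
    """
    sample = Path(file_name).stem  # 'sample1.fa' -> 'sample1'
    renamed = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header line: keep the original text after the sample prefix.
            renamed.append(f">{sample}_{line[1:].strip()}")
        else:
            # Sequence line: leave untouched.
            renamed.append(line)
    return "\n".join(renamed)


example = ">contig_1\nACGTACGT\n>contig_2\nGGCC"
print(rename_fasta_headers(example, "sample1.fa"))
```

With headers prefixed this way, contigs that share a name across two input files (for example `contig_1` in both `sample1.fa` and `sample2.fa`) stay distinguishable in combined output.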
Input/output options
Define where the pipeline should find input data and save output data.
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure. |
|
True |
Reference genome options
Reference genome related files and options required for the workflow.
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
Path to FASTA directory |
|
|||
|
Input format for the FASTA file. |
|
*.f* |
||
|
Directory containing called genes. Only used when not running call. This allows you to provide pre-called genes to the pipeline. |
|
|||
|
Input format for the Genes file. |
|
*.faa |
||
|
Input format for the Genes fna file. Only needed if you are not running call and you are passing |
|
*.fna |
Call Prodigal Options
Call genes on the input FASTA files using Prodigal.
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
Mode for Prodigal gene calling. |
|
meta |
||
|
Translation table to use. |
|
11.0 |
||
|
Minimum contig length in base pairs. |
|
2500.0 |
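To make the minimum contig length concrete, here is a small Python sketch of a length filter using the default of 2500 base pairs. Whether the pipeline treats the cutoff as inclusive is not stated in this table, so the `>=` comparison and the name `filter_contigs` are assumptions.

```python
from typing import Iterable, List, Tuple

Contig = Tuple[str, str]  # (header, sequence)


def filter_contigs(contigs: Iterable[Contig], min_len: int = 2500) -> List[Contig]:
    """Keep contigs whose sequence length is at least min_len base pairs."""
    # Assumption: the cutoff is inclusive (length >= min_len is kept).
    return [(header, seq) for header, seq in contigs if len(seq) >= min_len]


contigs = [("short", "A" * 1200), ("edge", "C" * 2500), ("long", "G" * 9000)]
kept = filter_contigs(contigs)
print([header for header, _ in kept])  # ['edge', 'long']
```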
Annotate Options
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
Use the CAMPer database for annotation. |
|
|||
|
Use the Cant_Hyd database for annotation. |
|
|||
|
Use the DBCan database for annotation. |
|
|||
|
Use the FeGenie database for annotation. |
|
|||
|
Use the KEGG database for annotation. |
|
|||
|
Use the Kofam database for annotation. |
|
|||
|
Use the MEROPS database for annotation. |
|
|||
|
Use the Methyl database for annotation. |
|
|||
|
Use the PFAM database for annotation. Currently disabled for this release as a bug was discovered with the pfam DRAM2 implementation. It will be re-enabled in the next release. |
|
|||
|
Use the Sulfur database for annotation. |
|
|||
|
Use the UniRef database for annotation. |
|
|||
|
Path to old annotations to add to the current run. |
|
|||
|
Minimum bit score of search to retain hits. |
|
60.0 |
||
|
Minimum BitScore of reverse best hits to retain hits |
|
350.0 |
||
|
Database list for generating GFF and GBK. Comma-separated list of databases to include in the annotations. Use empty for all. Example: 'kegg,dbcan,kofam,merops,viral,camper,cant_hyd,fegenie,sulfur,methyl,uniref,pfam,vogdb' |
|
empty |
||
|
Generate GFF output for each input_fasta. |
|
|||
|
Generate GBK output for each input_fasta. |
|
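The two bit score defaults in the table above (60 for the search, 350 for reverse best hits) can be read as a two-stage filter. The Python sketch below is a simplified illustration of that reading, not the pipeline's actual search code; the `Hit` class, the function name, and the argument names `bit_score_threshold` and `rbh_bit_score_threshold` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    target: str
    forward_bits: float  # bit score of the query -> target search
    reverse_bits: float  # bit score of the target -> query search


def classify_hit(
    hit: Hit,
    bit_score_threshold: float = 60.0,
    rbh_bit_score_threshold: float = 350.0,
) -> str:
    """Return 'dropped', 'forward_only', or 'reciprocal' for one hit."""
    # Stage 1: the forward search must reach the minimum bit score.
    if hit.forward_bits < bit_score_threshold:
        return "dropped"
    # Stage 2: the reverse search must reach the stricter threshold
    # for the hit to count as a reverse best hit.
    if hit.reverse_bits >= rbh_bit_score_threshold:
        return "reciprocal"
    return "forward_only"


print(classify_hit(Hit("gene_1", "K00001", 120.0, 200.0)))  # forward_only
```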
Distill Options
The purpose of DRAM distill is to distill down annotations based on curated distillation summary form(s). It can be run with --distill_topic, --distill_ecosystem, or --distill_custom (or some combination). Users may also provide a custom distillate via --distill_custom <path/to/file> (TSV forms). Distill can be run independently of --call and --annotate; however, annotations must then be provided (--annotations <path/to/annotations.tsv>). Optional tRNA, rRNA, and bin quality may also be provided. When using distill, you must also use kegg, ko, dbcan, or merops when you do your annotation. To get the full results, you need to use kegg or ko, plus dbcan and merops.
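The database requirement for distill can be expressed as a small check. This Python sketch assumes the rule reads as "(kegg or ko) together with dbcan and merops" for full results, and "at least one of the four" as the minimum; the function name and the return labels are illustrative only.

```python
from typing import Iterable


def distill_readiness(databases: Iterable[str]) -> str:
    """Classify a set of annotation databases as 'full', 'partial', or 'none'."""
    used = {db.strip().lower() for db in databases}
    # Assumption: 'kegg' and 'ko' are interchangeable for this requirement.
    has_ko_source = bool(used & {"kegg", "ko"})
    if has_ko_source and {"dbcan", "merops"} <= used:
        return "full"
    if used & {"kegg", "ko", "dbcan", "merops"}:
        return "partial"
    return "none"


print(distill_readiness(["kegg", "dbcan", "merops"]))  # full
print(distill_readiness(["dbcan"]))                    # partial
print(distill_readiness(["uniref"]))                   # none
```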
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
Path to annotations tsv to distill. Required if running without annotate |
|
|||
|
File path to distill dummy sheet. |
|
True |
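The comma-separated form accepted by --distill_ecosystem can be checked before a run is launched. The Python sketch below splits such a value and rejects entries outside the two listed options (eng_sys and ag); the helper name `parse_ecosystems` is hypothetical, and the pipeline's own validation may behave differently.

```python
from typing import List

ECOSYSTEMS = {"eng_sys", "ag"}  # options listed for the ecosystem parameter


def parse_ecosystems(value: str) -> List[str]:
    """Split a comma-separated ecosystem value and reject unknown entries."""
    # Drop surrounding quotes in case the value was passed as 'eng_sys,ag'.
    items = [item.strip() for item in value.strip("'\"").split(",")]
    unknown = [item for item in items if item not in ECOSYSTEMS]
    if unknown:
        raise ValueError(f"unknown ecosystem(s): {unknown}")
    return items


print(parse_ecosystems("eng_sys,ag"))  # ['eng_sys', 'ag']
```

A value separated by spaces instead of commas, such as `eng_sys ag`, raises `ValueError` here, which mirrors the requirement that multiple values be comma-separated.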
Product Options
The purpose of DRAM --product is to generate a product visualization of the annotations and save the output to the output directory.
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
Column to group by in the annotations file for ETC and function groupings. Defaults to DRAM's annotation output fasta column name. |
|
Adjectives Options
The purpose of DRAM --adjectives is to use the DRAM annotations output file to make a table of adjectives. You need to use the KEGG, FeGenie, and Sulfur databases.
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
A comma-separated list of adjectives ('adj1,adj2,adj3'), by name, to evaluate. This limits the number of adjectives that are evaluated and is faster. |
|
RNA Options
rRNA and tRNA input sheets, used when running DRAM distill without --call.
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
Path to rRNA tsv. |
|
|||
|
Path to tRNA tsv. |
|
Bin Quality and Taxonomy Options
Paths to bin quality and taxonomy tsvs, used after annotate and before distill
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
Path to bin quality tsv. CheckM and CheckM2 compatible. |
|
|||
|
Path to bin taxonomy tsv. Compatible with GTDB. |
|
Database Options
File paths to databases used in the workflow.
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
|
${launchDir}/databases/kegg/ |
True |
||
|
|
${launchDir}/databases/uniref/ |
True |
||
|
|
${launchDir}/databases/pfam/mmseqs/ |
True |
||
|
|
${launchDir}/databases/merops/ |
True |
||
|
|
${launchDir}/databases/viral/ |
True |
||
|
|
${launchDir}/databases/kofam/ |
True |
||
|
|
${launchDir}/databases/kofam/kofam_ko_list.tsv |
True |
||
|
|
${launchDir}/databases/dbcan/ |
True |
||
|
|
${launchDir}/databases/dbcan/dbcan.fam-activities.tsv |
True |
||
|
|
${launchDir}/databases/dbcan/dbcan.fam-activities.tsv |
True |
||
|
|
${launchDir}/databases/vog/ |
True |
||
|
|
${launchDir}/databases/vogdb/vog_annotations_latest.tsv.gz |
True |
||
|
|
${launchDir}/databases/camper/hmm/ |
True |
||
|
|
${launchDir}/databases/camper/hmm/camper_hmm_scores.tsv |
True |
||
|
|
${launchDir}/databases/camper/mmseqs/ |
True |
||
|
|
${launchDir}/databases/camper/mmseqs/camper_scores.tsv |
True |
||
|
|
${launchDir}/databases/canthyd/hmm/ |
True |
||
|
|
${launchDir}/databases/canthyd/hmm/cant_hyd_HMM_scores.tsv |
True |
||
|
|
${launchDir}/databases/canthyd/mmseqs/ |
True |
||
|
|
${launchDir}/databases/canthyd/mmseqs/cant_hyd_BLAST_scores.tsv |
True |
||
|
|
${launchDir}/databases/fegenie/ |
True |
||
|
|
${launchDir}/databases/fegenie/fegenie_iron_cut_offs.txt |
True |
||
|
|
${launchDir}/databases/sulfur/ |
True |
||
|
|
${launchDir}/databases/methyl/ |
True |
||
|
|
${launchDir}/databases/db_descriptions/description_db.sqlite |
True |
||
|
|
1e-05 |
True |
||
|
|
1e-05 |
True |
||
|
|
1e-15 |
True |
||
|
|
1e-1 |
True |
||
|
|
1e-05 |
True |
||
|
|
1e-05 |
True |
||
|
|
1e-05 |
True |
||
|
|
1e-05 |
True |
||
|
|
1e-05 |
True |
||
|
|
1e-05 |
True |
Format KEGG Options
Options for preparing the KEGG database for use in DRAM.
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
Only required if you need to concatenate all of KEGG’s provided pep files. Root directory to downloaded KEGG peptide files. The pipeline will search for all pep files in this directory and concatenate them into a single file. In format <root_dir>//.pep. |
|
|||
|
Path to the gene FASTA files provided by the KEGG FTP server, or to a concatenated version of them. Either this or kegg_pep_root_dir must be provided. |
|
|||
|
Path to the KO file provided by the KEGG FTP server. |
|
|||
|
Skip the gene_ko_link file. If you are using an older version of KEGG that does not supply the gene_ko_link file you can use this option to skip the gene_ko_link file. Otherwise, the gene_ko_link file is required. |
|
|||
|
The date the KEGG database was downloaded. If not provided, the current date will be used. |
|
yyyy-MM-dd |
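The concatenation described for the KEGG peptide root directory can be pictured with the following Python sketch. It searches recursively for files ending in `.pep` and writes them into one file. The recursive search, the function name `concatenate_pep_files`, and the output handling are assumptions; the pipeline's exact directory pattern may differ.

```python
from pathlib import Path


def concatenate_pep_files(root_dir: str, out_path: str) -> int:
    """Concatenate every .pep file under root_dir into out_path.

    Returns the number of files merged. Assumption: a recursive search
    is acceptable; the real pipeline may expect a fixed directory depth.
    """
    out_file = Path(out_path).resolve()
    # Collect inputs first, in a stable order, skipping the output itself.
    pep_files = sorted(
        p for p in Path(root_dir).rglob("*.pep") if p.resolve() != out_file
    )
    with open(out_file, "w") as out:
        for pep in pep_files:
            text = pep.read_text()
            # Make sure each file ends with a newline so records do not run together.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return len(pep_files)
```

Calling `concatenate_pep_files("kegg_download", "kegg_all.pep")` (both paths hypothetical) would produce a single peptide file of the kind the gene FASTA parameter expects.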
SLURM Options
Generic options for SLURM job submission. More customized options can be configured in your own Nextflow config file. See https://www.nextflow.io/docs/latest/executor.html#slurm and example configs here: https://github.com/nf-core/configs/tree/master/conf
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
Launch the pipeline using the SLURM executor. Without this option, the pipeline will run in your current shell/environment. |
|
|||
|
Name of the SLURM partition to use for job submission. If not provided, the default partition will be used. |
|
|||
|
Name of the SLURM Node to use for job submission. If not provided, the default node will be used. |
|
|||
|
Maximum number of jobs to submit to the queue at once. |
|
10 |
Institutional config options
Parameters used to describe centralised config profiles. These should not be edited.
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
Git commit id for Institutional configs. |
|
master |
True |
|
|
Base directory for Institutional configs. |
|
https://raw.githubusercontent.com/nf-core/configs/master |
True |
|
|
Institutional config name. |
|
True |
||
|
Institutional config description. |
|
True |
||
|
Institutional config contact information. |
|
True |
||
|
Institutional config URL link. |
|
True |
Generic options
Less common options for the pipeline, typically set in a config file.
Parameter |
Description |
Type |
Default |
Required |
Hidden |
|---|---|---|---|---|---|
|
|
10 |
True |
||
|
Display version and exit. |
|
True |
||
|
Method used to save pipeline results to output directory. |
|
copy |
True |
|
|
Email address for completion summary. |
|
True |
||
|
Email address for completion summary, only when pipeline fails. |
|
True |
||
|
Send plain-text email instead of HTML. |
|
True |
||
|
File size limit when attaching MultiQC reports to summary emails. |
|
25.MB |
True |
|
|
Do not use coloured log outputs. |
|
True |
||
|
Incoming hook URL for messaging service |
|
True |
||
|
MultiQC report title. Printed as page header, used for filename if not otherwise specified. |
|
True |
||
|
Custom config file to supply to MultiQC. |
|
True |
||
|
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file |
|
True |
||
|
Custom MultiQC yaml file containing HTML including a methods description. |
|
|||
|
Boolean whether to validate parameters against the schema at runtime |
|
True |
True |
|
|
Base URL or local path to location of pipeline test dataset files |
|
https://raw.githubusercontent.com/nf-core/test-datasets/ |
True |
|
|
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss. |
|
yyyy-MM-dd_HH-mm-ssZ |
True |