WrightonLabCSU/dram pipeline parameters

DRAM (Distilled and Refined Annotation of Metabolism) is a tool for annotating metagenome-assembled genomes.

Pipeline Steps

Which steps to run in the pipeline.

Parameter

Description

Type

Default

Required

Hidden

rename

Rename FASTA headers based on the file name. Example: sample1.fa -> (FASTA header renamed to) > sample1…… Why? DRAM output is focused on scaffolds/contigs with respect to each provided input_fasta. Without renaming FASTA headers, the individual scaffolds/contigs will not be distinguishable. *If you have already renamed your FASTA headers, do not include this option with '--call'.

boolean
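As a rough illustration of what renaming by file name means, the sketch below prefixes each header with the file stem. The `rename_fasta_headers` helper and the `_` separator are assumptions made for illustration; the exact header format the pipeline writes is not given here and may differ.

```python
from pathlib import Path


def rename_fasta_headers(fasta_text, file_name):
    """Prefix every FASTA header line with the stem of file_name."""
    stem = Path(file_name).stem  # "sample1.fa" -> "sample1"
    renamed = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Assumed layout: ">" + file stem + "_" + original header text.
            renamed.append(f">{stem}_{line[1:].strip()}")
        else:
            renamed.append(line)
    return "\n".join(renamed)


# Two inputs with identical contig names become distinguishable after renaming.
sample1 = rename_fasta_headers(">contig_1\nACGT", "sample1.fa")
sample2 = rename_fasta_headers(">contig_1\nTTGA", "sample2.fa")
print(sample1.splitlines()[0])
print(sample2.splitlines()[0])
```

Without the prefix, both files would contribute a contig named contig_1 and the two could not be told apart in combined output.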

call

Whether to call genes on the input FASTA files using Prodigal.

boolean

annotate

Annotate called genes using downloaded databases.

boolean

qc

Run quality control steps such as collecting rRNA and tRNA scans. See the QC section for more options.

boolean

summarize

Summarize annotations into a distilled output file. Will use all available topic distillate forms. Ecosystem

boolean

visualize

Generate a product visualization of the annotations and save the output to the output directory.

boolean

traits

Using a DRAM annotations file, make a table of adjectives. Still in development, but can still be used.

boolean

merge_annotations

Path to a directory containing multiple DRAM annotation files to merge into one file. This is run as a separate pipeline.

string

format_kegg

Format KEGG database for use in DRAM. Standalone operation, will exit after completion.

boolean

Input/output options

Define where the pipeline should find input data and save output data.

Parameter

Description

Type

Default

Required

Hidden

outdir

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

string

True

input_fasta

Path to FASTA directory

This parameter is mandatory.

string

fasta_fmt

Input format for the FASTA file.

string

.f

input_genes

Directory containing called genes. Only used when not running call. This allows you to provide pre-called genes to the pipeline.

string

genes_fmt

Input format for the Genes file.

string

*.faa

genes_fna_fmt

Input format for the Genes fna file. Only needed if you are not running call and you are passing --generate_gff or --generate_gbk.

string

*.fna

Call Prodigal Options

Call genes on the input FASTA files using Prodigal.

Parameter

Description

Type

Default

Required

Hidden

prodigal_mode

Mode for Prodigal gene calling.

string

meta

prodigal_trans_table

Translation table to use.

number

11.0

min_contig_len

Minimum contig length in base pairs.

number

2500.0
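The effect of min_contig_len can be pictured with a small filter. This is a minimal sketch, assuming the cutoff is inclusive (contigs of exactly 2500 bp are kept); `filter_contigs` is a hypothetical helper, not part of the pipeline.

```python
def filter_contigs(records, min_contig_len=2500):
    """Return the (header, sequence) pairs that are at least min_contig_len bp long."""
    return [(header, seq) for header, seq in records if len(seq) >= min_contig_len]


records = [
    ("short_contig", "A" * 1200),
    ("exact_contig", "C" * 2500),
    ("long_contig", "G" * 9000),
]
kept = [header for header, _ in filter_contigs(records)]
print(kept)
```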

Annotate Options

Parameter

Description

Type

Default

Required

Hidden

anno_dbs

Alternative way to specify the database list for annotation. Comma-separated list of databases to include in the annotation. Use all for all. Example: ‘kegg,dbcan,kofam,merops,viral,camper,cant_hyd,fegenie,sulfur,methyl,uniref,pfam,vogdb’. When in doubt, use the name after use_ for each database. This option overrides the individual use_ database flags. (WARNING: this option name may change in the future.)

string
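A comma-separated database string like the one accepted by anno_dbs could be interpreted roughly as follows. This is only a sketch: `parse_db_list` and `db_selected` are hypothetical helpers, and which database names the pipeline actually accepts is not checked here.

```python
def parse_db_list(value):
    """Split a comma-separated database string into a set of lowercase names."""
    return {name.strip().lower() for name in value.split(",") if name.strip()}


def db_selected(db_name, selection):
    """True if db_name was requested explicitly or via the special value 'all'."""
    return "all" in selection or db_name.lower() in selection


selection = parse_db_list("kegg, dbcan,merops")
print(sorted(selection))
print(db_selected("kegg", selection), db_selected("pfam", selection))
print(db_selected("pfam", parse_db_list("all")))
```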

use_camper

Use the CAMPer database for annotation.

boolean

use_canthyd

Use the Cant_Hyd database for annotation.

boolean

use_dbcan

Use the DBCan database for annotation.

boolean

use_fegenie

Use the FeGenie database for annotation.

boolean

use_kegg

Use the KEGG database for annotation.

boolean

use_kofam

Use the Kofam database for annotation.

boolean

use_merops

Use the MEROPS database for annotation.

boolean

use_methyl

Use the Methyl database for annotation.

boolean

use_pfam

Use the PFAM database for annotation. Currently disabled for this release as a bug was discovered with the pfam DRAM2 implementation. It will be re-enabled in the next release.

boolean

use_sulfur

Use the Sulfur database for annotation.

boolean

use_uniref

Use the UniRef database for annotation.

boolean

use_metals

Use the Metals database for annotation.

boolean

use_vog

boolean

use_viral

boolean

True

add_annotations

Path to old annotations to add to the current run.

string

bit_score_threshold

Minimum bit score of search to retain hits.

number

60.0

rbh_bit_score_threshold

Minimum bit score of reverse best hits to retain hits.

number

350.0
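One way to read the two thresholds together is sketched below. It assumes, without confirmation from this page, that a forward hit is kept when its bit score reaches bit_score_threshold, and that it counts as a reverse best hit only when the reverse search also reaches rbh_bit_score_threshold. The function name and the returned labels are hypothetical.

```python
def classify_hit(forward_bits, reverse_bits=None,
                 bit_score_threshold=60.0, rbh_bit_score_threshold=350.0):
    """Label a hit using the two bit score cutoffs (assumed inclusive)."""
    if forward_bits < bit_score_threshold:
        return "discard"
    if reverse_bits is not None and reverse_bits >= rbh_bit_score_threshold:
        return "reverse_best_hit"
    return "forward_only"


print(classify_hit(45.0))
print(classify_hit(120.0, reverse_bits=200.0))
print(classify_hit(400.0, reverse_bits=380.0))
```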

database_list

Database list for generating GFF and GBK. Comma-separated list of databases to include in the annotation. Use all for all. Example: ‘kegg,dbcan,kofam,merops,viral,camper,cant_hyd,fegenie,sulfur,methyl,uniref,pfam,vogdb’

string

all

generate_gff

Generate GFF output for each input_fasta. This option is in development right now and may not work properly, and may not work with large numbers of inputs.

boolean

generate_gbk

Generate GBK output for each input_fasta. This option is in development right now and may not work properly, and may not work with large numbers of inputs.

boolean

QC Options

Quality control options for DRAM workflow. The QC step collects rRNA and tRNA scans using Barrnap and tRNAscan-SE for the genome_states.tsv output as a baseline. Additional options can be found below.

Parameter

Description

Type

Default

Required

Hidden

bin_quality

Path to bin quality tsv. CheckM and CheckM2 compatible.

string

taxa

Path to bin taxonomy tsv. Compatible with GTDB.

string

Summarize Options

The purpose of DRAM summarize is to distill down annotations based on curated distillation summary form(s). It can be run with just --summarize, or with curated sheets via --distill_topic, --distill_ecosystem, or --distill_custom (or some combination). Users may also provide a custom distillate via --distill_custom <path/to/file> (TSV forms). Summarize can be run independently of --annotate; however, annotations must then be provided (--annotations <path/to/annotations.tsv>). Optional tRNA, rRNA, and bin quality files may also be provided. When using distill, you must also use kegg, ko, dbcan, or merops when you do your annotation. To get the full results, you need to use kegg or ko, and dbcan, and merops.
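The database requirement for distillation can be written as a small check. This is a sketch under stated assumptions: `distill_coverage` is a hypothetical helper, "ko" is treated as equivalent to "kofam" (an interpretation, not something this page confirms), and the three labels are illustrative only.

```python
def distill_coverage(databases):
    """Classify how complete a distillate can be for the annotation databases used."""
    used = {name.lower() for name in databases}
    # Assumption: "ko" in the description refers to Kofam-based KO annotations.
    has_ko = bool(used & {"kegg", "ko", "kofam"})
    if has_ko and {"dbcan", "merops"} <= used:
        return "full"
    if has_ko or used & {"dbcan", "merops"}:
        return "partial"
    return "insufficient"


print(distill_coverage(["kegg", "dbcan", "merops"]))
print(distill_coverage(["dbcan"]))
print(distill_coverage(["pfam"]))
```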

Parameter

Description

Type

Default

Required

Hidden

sum_ecos

Ecosystem to distill. Options: <eng_sys|ag>. If more than one ecosystem is included, they must be separated by a comma (--sum_ecos eng_sys,ag or --sum_ecos ‘eng_sys,ag’).

string

distill_topic

A way to specify a subset of topics instead of all topics. By default, if --summarize is specified, all topics are included. Options: <carbon|energy|misc|nitrogen|transport>

sum_custom

<path/to/custom_distillate.tsv> Custom distill file to use. The custom distill file must be in TSV format with the first column being the gene ID and the second column being the distillate value.

string

annotations

Path to annotations tsv to distill. Required if running without annotate

string

rrnas

Path to folder containing individual rRNA files generated with Barrnap. Can be used when running distill without call to add additional information.

string

trnas

Path to folder containing individual tRNA files generated with tRNAscan-SE. Can be used when running distill without call to add additional information.

string

distill_dummy_sheet

File path to distill dummy sheet.

string

True

Product Options

The purpose of DRAM --product is to generate a product visualization of the annotations and save the output to the output directory.

Parameter

Description

Type

Default

Required

Hidden

groupby_column

Column to group by in the annotations file for etc and function groupings. Defaults to DRAM’s annotation output fasta column name.

string

Adjectives Options

The purpose of DRAM --adjectives is to use the DRAM annotations output file to make a table of adjectives. You need to use the KEGG, FeGenie, and Sulfur databases.

Parameter

Description

Type

Default

Required

Hidden

adjectives_list

A comma-separated list of adjectives (‘adj1,adj2,adj3’), by name, to evaluate. This limits the number of adjectives that are evaluated and is faster.

string

rules_tsv

This is an optional path to a rules file with strict formatting. It will take the place of the original rules file that is stored with the script. For formatting, see the original rules.tsv sheet stored with the script in the repo.

string

Database Options

File paths to databases used in the workflow.

Parameter

Description

Type

Default

Required

Hidden

kegg_db

string

${launchDir}/databases/kegg/

True

uniref_db

string

${launchDir}/databases/uniref/

True

metals_db

string

${launchDir}/databases/metals/

True

pfam_mmseq_db

string

${launchDir}/databases/pfam/mmseqs/

True

merops_db

string

${launchDir}/databases/merops/

True

viral_db

string

${launchDir}/databases/viral/

True

kofam_db

string

${launchDir}/databases/kofam/

True

kofam_list

string

${launchDir}/databases/kofam/kofam_ko_list.tsv

True

dbcan_db

string

${launchDir}/databases/dbcan/

True

dbcan_fam_activities

string

${launchDir}/databases/dbcan/dbcan.fam-activities.tsv

True

dbcan_subfam_activities

string

${launchDir}/databases/dbcan/dbcan.fam-activities.tsv

True

vog_db

string

${launchDir}/databases/vog/

True

vog_list

string

${launchDir}/databases/vogdb/vog_annotations_latest.tsv.gz

True

camper_hmm_db

string

${launchDir}/databases/camper/hmm/

True

camper_hmm_list

string

${launchDir}/databases/camper/hmm/camper_hmm_scores.tsv

True

camper_mmseqs_db

string

${launchDir}/databases/camper/mmseqs/

True

camper_mmseqs_list

string

${launchDir}/databases/camper/mmseqs/camper_scores.tsv

True

canthyd_hmm_db

string

${launchDir}/databases/canthyd/hmm/

True

cant_hyd_hmm_list

string

${launchDir}/databases/canthyd/hmm/cant_hyd_HMM_scores.tsv

True

canthyd_mmseqs_db

string

${launchDir}/databases/canthyd/mmseqs/

True

canthyd_mmseqs_list

string

${launchDir}/databases/canthyd/mmseqs/cant_hyd_BLAST_scores.tsv

True

fegenie_db

string

${launchDir}/databases/fegenie/

True

fegenie_list

string

${launchDir}/databases/fegenie/fegenie_iron_cut_offs.txt

True

sulfur_db

string

${launchDir}/databases/sulfur/

True

methyl_db

string

${launchDir}/databases/methyl/

True

sql_descriptions_db

string

${launchDir}/databases/db_descriptions/description_db.sqlite

True

kegg_e_value

string

1e-05

True

kofam_e_value

string

1e-05

True

dbcan_e_value

string

1e-15

True

merops_e_value

string

1e-1

True

vog_e_value

string

1e-05

True

camper_e_value

string

1e-05

True

uniref_e_value

string

1e-05

True

canthyd_e_value

string

1e-05

True

sulfur_e_value

string

1e-05

True

fegenie_e_value

string

1e-05

True

metals_e_value

string

1e-03

True
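The e-value defaults above are typed as strings, so they have to be converted to numbers before comparison. The sketch below shows that conversion and a cutoff check; `passes_e_value` is a hypothetical helper, and treating the cutoff as inclusive is an assumption.

```python
def passes_e_value(hit_e_value, threshold="1e-05"):
    """Return True if a hit's e-value is at or below the configured cutoff."""
    return hit_e_value <= float(threshold)  # float() parses strings such as "1e-05"


# A few of the defaults listed above, kept as strings as in the table.
DEFAULT_E_VALUES = {
    "kegg_e_value": "1e-05",
    "dbcan_e_value": "1e-15",
    "merops_e_value": "1e-1",
    "metals_e_value": "1e-03",
}

hit = 1e-8  # made-up e-value for one search hit
results = {name: passes_e_value(hit, cutoff) for name, cutoff in DEFAULT_E_VALUES.items()}
print(results)
```

The same hit clears the looser cutoffs but not the stricter dbcan_e_value default, which shows why the thresholds are set per database.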

Format Kegg Options

Options for preparing the KEGG database for use in DRAM.

Parameter

Description

Type

Default

Required

Hidden

kegg_pep_root_dir

Only required if you need to concatenate all of KEGG’s provided pep files. Root directory to downloaded KEGG peptide files. The pipeline will search for all pep files in this directory and concatenate them into a single file. In format <root_dir>//.pep.

string
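The concatenation step described for kegg_pep_root_dir might look like the following sketch. It assumes a recursive search for files ending in .pep, since the exact directory pattern is not legible here; `concatenate_pep_files` and the file names in the demonstration are made up.

```python
import tempfile
from pathlib import Path


def concatenate_pep_files(root_dir, out_path):
    """Concatenate every *.pep file found under root_dir into out_path."""
    out_path = Path(out_path)
    # Build the sorted list first so the output file is never read as input.
    pep_files = sorted(
        p for p in Path(root_dir).rglob("*.pep") if p.resolve() != out_path.resolve()
    )
    with out_path.open("w") as out:
        for pep in pep_files:
            text = pep.read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # keep the next file's header on its own line
            out.write(text)
    return len(pep_files)


# Tiny demonstration with two made-up organism folders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for organism, seq in [("org_a", "MKT"), ("org_b", "MSL")]:
        (root / organism).mkdir()
        (root / organism / f"{organism}.pep").write_text(f">{organism}:gene1\n{seq}")
    n_files = concatenate_pep_files(root, root / "combined_kegg.pep")
    combined = (root / "combined_kegg.pep").read_text()
print(n_files)
print(combined)
```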

kegg_pep_loc

Path to one of the gene FASTA files that are provided by the KEGG FTP server, or to a concatenated version of them. Either this or kegg_pep_root_dir must be provided.

string

gene_ko_link_loc

Path to the KO file that is provided by the KEGG FTP server.

string

skip_gene_ko_link

Skip the gene_ko_link file. If you are using an older version of KEGG that does not supply the gene_ko_link file you can use this option to skip the gene_ko_link file. Otherwise, the gene_ko_link file is required.

boolean

kegg_download_date

The date the KEGG database was downloaded. If not provided, the current date will be used.

string

yyyy-MM-dd
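The fallback to the current date can be sketched as below. The yyyy-MM-dd pattern corresponds to `%Y-%m-%d` in Python; `resolve_download_date` and its `today` argument are hypothetical and exist only to make the behaviour easy to demonstrate.

```python
from datetime import date, datetime


def resolve_download_date(value=None, today=None):
    """Return the KEGG download date as a yyyy-MM-dd string."""
    if value:
        # strptime raises ValueError for strings that are not valid dates.
        return datetime.strptime(value, "%Y-%m-%d").date().isoformat()
    return (today or date.today()).isoformat()


print(resolve_download_date("2024-03-01"))
print(resolve_download_date(today=date(2025, 1, 2)))
```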

SLURM Options

Generic options for SLURM job submission. More customized options can be configured in your own Nextflow config file. See https://www.nextflow.io/docs/latest/executor.html#slurm and example configs here: https://github.com/nf-core/configs/tree/master/conf

Parameter

Description

Type

Default

Required

Hidden

slurm

Launch the pipeline using the SLURM executor. Without this option, the pipeline will run in your current shell/environment.

boolean

partition

Name of the SLURM partition to use for job submission. If not provided, the default partition will be used.

string

slurm_node

Name of the SLURM Node to use for job submission. If not provided, the default node will be used.

string

cpu_provision_limit

The largest number of CPUs you can provision in one job. This is not the total CPU limit; you can control that with the process size or provision limits and queue_size.

integer

mem_gb_provision_limit

The largest amount of memory you can provision in one job. This is not the total memory limit; you can control that with the process size or provision limits and queue_size.

integer

time_hr_provision_limit

The longest time a job should be provisioned for. This is not the total length cutoff for all jobs; you can control that with the process size or provision limits and queue_size.

integer

Process Options

Total resource limits for a single process, not the limit on the total resources available to the pipeline. Up to queue_size processes of various sizes can run in parallel.

Parameter

Description

Type

Default

Required

Hidden

tiny_cpus_limit

Maximum number of CPUs to use/provision for this job size.

integer

small_cpus_limit

Maximum number of CPUs to use/provision for this job size.

integer

medium_cpus_limit

Maximum number of CPUs to use/provision for this job size.

integer

big_cpus_limit

Maximum number of CPUs to use/provision for this job size.

integer

huge_cpus_limit

Maximum number of CPUs to use/provision for this job size.

integer

tiny_gb_mem_limit

Maximum memory in GB to use/provision for this job size.

integer

small_gb_mem_limit

Maximum memory in GB to use/provision for this job size.

integer

medium_gb_mem_limit

Maximum memory in GB to use/provision for this job size.

integer

big_gb_mem_limit

Maximum memory in GB to use/provision for this job size.

integer

huge_gb_mem_limit

Maximum memory in GB to use/provision for this job size.

integer

tiny_hr_time_limit

Maximum time in hours to use/provision for this job size.

integer

small_hr_time_limit

Maximum time in hours to use/provision for this job size.

integer

medium_hr_time_limit

Maximum time in hours to use/provision for this job size.

integer

big_hr_time_limit

Maximum time in hours to use/provision for this job size.

integer

huge_hr_time_limit

Maximum time in hours to use/provision for this job size.

integer

queue_size

Maximum number of jobs to submit to the queue at once.

integer

10
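Because these limits apply per job, the worst-case load on a cluster is roughly the per-job limit multiplied by queue_size. The arithmetic is sketched below with made-up numbers; it assumes a job-size limit is capped by the matching provision limit when one is set, which this page implies but does not spell out. Both helper names are hypothetical.

```python
def effective_limit(size_limit, provision_limit=None):
    """Cap a job-size limit by the per-job provision limit, when one is set."""
    if provision_limit is None:
        return size_limit
    return min(size_limit, provision_limit)


def peak_usage(per_job_limit, queue_size=10):
    """Upper bound on simultaneous usage if every queued job is this size."""
    return per_job_limit * queue_size


# Hypothetical values: a 32-CPU job size on a system that allows 24 CPUs per job.
big_cpus = effective_limit(size_limit=32, provision_limit=24)
print(big_cpus, peak_usage(big_cpus))
```

In practice jobs of different sizes run together, so actual usage is usually below this bound.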

aliased_options

Aliased options for DRAM annotation workflow. These are provided for backward compatibility with older versions of the pipeline.

Parameter

Description

Type

Default

Required

Hidden

product

alias for visualize.

boolean

adjectives

alias for traits.

boolean

distill_ecosystem

alias for sum_ecos.

string

distill_custom

alias for sum_custom.

string

Institutional config options

Parameters used to describe centralised config profiles. These should not be edited.

Parameter

Description

Type

Default

Required

Hidden

custom_config_version

Git commit id for Institutional configs.

string

master

True

custom_config_base

Base directory for Institutional configs.

If you’re running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don’t need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.

string

https://raw.githubusercontent.com/nf-core/configs/master

True

config_profile_name

Institutional config name.

string

True

config_profile_description

Institutional config description.

string

True

config_profile_contact

Institutional config contact information.

string

True

config_profile_url

Institutional config URL link.

string

True

Generic options

Less common options for the pipeline, typically set in a config file.

Parameter

Description

Type

Default

Required

Hidden

threads

integer

10

True

version

Display version and exit.

boolean

True

publish_dir_mode

Method used to save pipeline results to output directory.

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

string

copy

True

email

Email address for completion summary.

Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don’t need to specify this on the command line for every run.

string

True

email_on_fail

Email address for completion summary, only when pipeline fails.

An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.

string

True

plaintext_email

Send plain-text email instead of HTML.

boolean

True

max_multiqc_email_size

File size limit when attaching MultiQC reports to summary emails.

string

25.MB

True

monochrome_logs

Do not use coloured log outputs.

boolean

True

hook_url

Incoming hook URL for messaging service

Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.

string

True

multiqc_title

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

string

True

multiqc_config

Custom config file to supply to MultiQC.

string

True

multiqc_logo

Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file

string

True

multiqc_methods_description

Custom MultiQC yaml file containing HTML including a methods description.

string

validate_params

Boolean whether to validate parameters against the schema at runtime

boolean

True

True

pipelines_testdata_base_path

Base URL or local path to location of pipeline test dataset files

string

https://raw.githubusercontent.com/nf-core/test-datasets/

True

trace_report_suffix

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

string

yyyy-MM-dd_HH-mm-ssZ

True