WrightonLabCSU/dram pipeline parameters

DRAM (Distilled and Refined Annotation of Metabolism) is a tool for annotating metagenome-assembled genomes.

Pipeline Steps

Which steps to run in the pipeline.

Parameter	Description	Type
`rename`	Rename FASTA headers based on the file name. Example: sample1.fa -> FASTA headers renamed to >sample1… Why? DRAM output is organized by scaffold/contig for each provided input FASTA, so without renamed FASTA headers the individual scaffolds/contigs will not be distinguishable. If you have already renamed your FASTA headers, do not include this option with `--call`.	`boolean`
`call`	Whether to call genes on the input FASTA files using Prodigal.	`boolean`
`annotate`	Annotate called genes using downloaded databases.	`boolean`
`qc`	Run quality control steps such as collecting rRNA and tRNA scans. See the QC section for more options.	`boolean`
`summarize`	Summarize annotations into a distilled output file. Will use all available topic distillate forms. Ecosystem	`boolean`
`visualize`	Generate a product visualization of the annotations and save the output to the output directory.	`boolean`
`traits`	Using a DRAM annotations file, make a table of adjectives. Still in development, but can still be used.	`boolean`
`merge_annotations`	Path to a directory containing multiple DRAM annotations to merge into one file. This is run as a separate pipeline.	`string`
`format_kegg`	Format KEGG database for use in DRAM. Standalone operation, will exit after completion.	`boolean`
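
As a rough illustration of how the step flags combine, the sketch below assembles a `nextflow run` command line from a list of step names. The helper `build_dram_command` and the example paths are hypothetical; only the parameter names come from the table above, and it assumes the pipeline is launched by its repository name and that boolean parameters are passed as bare `--name` flags.

```python
import shlex

# Boolean step parameters taken from the Pipeline Steps table.
STEP_FLAGS = ["rename", "call", "annotate", "qc", "summarize", "visualize", "traits"]


def build_dram_command(steps, input_fasta, outdir, pipeline="WrightonLabCSU/dram"):
    """Return an argv list for a hypothetical DRAM run with the given steps."""
    unknown = [s for s in steps if s not in STEP_FLAGS]
    if unknown:
        raise ValueError(f"Unknown pipeline step(s): {unknown}")
    argv = ["nextflow", "run", pipeline]
    # Emit flags in table order, regardless of the order they were requested in.
    argv += [f"--{s}" for s in STEP_FLAGS if s in steps]
    argv += ["--input_fasta", input_fasta, "--outdir", outdir]
    return argv


# Example paths are placeholders, not real locations.
cmd = build_dram_command(["annotate", "call", "rename"], "/data/fastas", "/data/dram_out")
print(shlex.join(cmd))
```

Building the command as a list (rather than a string) keeps paths with spaces intact if it is later passed to `subprocess.run`.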

Input/output options

Define where the pipeline should find input data and save output data.

Parameter	Description	Type	Default	Required
`outdir`	The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.	`string`		True
`input_fasta`	Path to FASTA directory. This parameter is mandatory.	`string`
`fasta_fmt`	Input format for the FASTA file.	`string`	.f
`input_genes`	Directory containing called genes. Only used when not running call. This allows you to provide pre-called genes to the pipeline.	`string`
`genes_fmt`	Input format for the Genes file.	`string`	*.faa
`genes_fna_fmt`	Input format for the Genes fna file. Only needed if you are not running call and you are passing `--generate_gff` or `--generate_gbk`.	`string`	*.fna
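
The defaults `*.faa` and `*.fna` suggest that the format parameters are glob-style patterns applied to the files in the input directory. Under that assumption, the following sketch shows which files such a pattern would select; the helper `select_inputs` and the file names are made up for illustration.

```python
from fnmatch import fnmatch


def select_inputs(filenames, pattern):
    """Return the file names that match a glob-style format pattern, sorted."""
    return sorted(name for name in filenames if fnmatch(name, pattern))


# Hypothetical directory listing.
listing = ["bin1.faa", "bin1.fna", "bin2.faa", "notes.txt"]
genes = select_inputs(listing, "*.faa")      # default genes_fmt
genes_fna = select_inputs(listing, "*.fna")  # default genes_fna_fmt
print(genes, genes_fna)
```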

Call Prodigal Options

Call genes on the input FASTA files using Prodigal.

Parameter	Description	Type	Default
`prodigal_mode`	Mode for Prodigal gene calling.	`string`	meta
`prodigal_trans_table`	Translation table to use.	`number`	11.0
`min_contig_len`	Minimum contig length in base pairs.	`number`	2500.0
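
To make the effect of `min_contig_len` concrete, here is a minimal sketch of a length filter over FASTA records. It is not the pipeline's implementation: the function name `filter_contigs` is hypothetical, and it assumes that contigs exactly at the threshold are kept.

```python
def filter_contigs(fasta_text, min_contig_len=2500):
    """Yield (header, sequence) pairs whose sequence is at least min_contig_len bp."""
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one if it is long enough.
            if header is not None and len("".join(chunks)) >= min_contig_len:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record.
    if header is not None and len("".join(chunks)) >= min_contig_len:
        yield header, "".join(chunks)


# contig_1 is 2800 bp and contig_2 is 400 bp.
example = ">contig_1\n" + "ACGT" * 700 + "\n>contig_2\n" + "ACGT" * 100 + "\n"
kept = [name for name, _ in filter_contigs(example)]
print(kept)
```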

Annotate Options

Parameter	Description	Type	Default	Hidden
`anno_dbs`	Alternative way to specify the database list for annotation. Comma-separated list of databases to include in the annotation. Use `all` for all. Example: ‘kegg,dbcan,kofam,merops,viral,camper,cant_hyd,fegenie,sulfur,methyl,uniref,pfam,vogdb’. When in doubt, use the name after `use_` for each database. This option overrides the individual `use_` database flags. (WARNING: this option name may change in the future.)	`string`
`use_camper`	Use the CAMPer database for annotation.	`boolean`
`use_canthyd`	Use the Cant_Hyd database for annotation.	`boolean`
`use_dbcan`	Use the DBCan database for annotation.	`boolean`
`use_fegenie`	Use the FeGenie database for annotation.	`boolean`
`use_kegg`	Use the KEGG database for annotation.	`boolean`
`use_kofam`	Use the Kofam database for annotation.	`boolean`
`use_merops`	Use the MEROPS database for annotation.	`boolean`
`use_methyl`	Use the Methyl database for annotation.	`boolean`
`use_pfam`	Use the PFAM database for annotation. Currently disabled for this release as a bug was discovered with the pfam DRAM2 implementation. It will be re-enabled in the next release.	`boolean`
`use_sulfur`	Use the Sulfur database for annotation.	`boolean`
`use_uniref`	Use the UniRef database for annotation.	`boolean`
`use_metals`	Use the Metals database for annotation.	`boolean`
`use_vog`	Use the VOG database for annotation.	`boolean`
`use_viral`	Use the Viral database for annotation.	`boolean`		True
`add_annotations`	Path to old annotations to add to the current run.	`string`
`bit_score_threshold`	Minimum bit score of a search hit for it to be retained.	`number`	60.0
`rbh_bit_score_threshold`	Minimum bit score of a reverse best hit for it to be retained.	`number`	350.0
`database_list`	Database list for generating GFF and GBK. Comma-separated list of databases to include. Use `all` for all. Example: ‘kegg,dbcan,kofam,merops,viral,camper,cant_hyd,fegenie,sulfur,methyl,uniref,pfam,vogdb’	`string`	all
`generate_gff`	Generate GFF output for each input_fasta. This option is in development right now and may not work properly, and may not work with large numbers of inputs.	`boolean`
`generate_gbk`	Generate GBK output for each input_fasta. This option is in development right now and may not work properly, and may not work with large numbers of inputs.	`boolean`
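
The description of `anno_dbs` says it takes a comma-separated list, accepts `all`, and overrides the individual `use_` flags. A minimal sketch of that precedence follows. The function `resolve_databases` is hypothetical, and it assumes the database names from the `anno_dbs` example are the valid keys; the pipeline's actual name matching may differ.

```python
# Database names as listed in the `anno_dbs` example.
ALL_DBS = ["kegg", "dbcan", "kofam", "merops", "viral", "camper", "cant_hyd",
           "fegenie", "sulfur", "methyl", "uniref", "pfam", "vogdb"]


def resolve_databases(anno_dbs=None, use_flags=None):
    """Return the databases to annotate with; `anno_dbs` wins over `use_*` flags."""
    if anno_dbs:
        names = [n.strip().lower() for n in anno_dbs.split(",") if n.strip()]
        if "all" in names:
            return list(ALL_DBS)
        unknown = [n for n in names if n not in ALL_DBS]
        if unknown:
            raise ValueError(f"Unknown database(s): {unknown}")
        return names
    # Fall back to the individual flags, e.g. {"kofam": True, "dbcan": False}.
    return [db for db in ALL_DBS if (use_flags or {}).get(db)]


print(resolve_databases("kegg, dbcan", {"merops": True}))
```

In this sketch the `use_flags` argument is ignored whenever `anno_dbs` is non-empty, mirroring the stated override behavior.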

QC Options

Quality control options for DRAM workflow. The QC step collects rRNA and tRNA scans using Barrnap and tRNAscan-SE for the genome_states.tsv output as a baseline. Additional options can be found below.

Parameter	Description	Type	Default	Required	Hidden
`bin_quality`	Path to bin quality tsv. CheckM and CheckM2 compatible.	`string`
`taxa`	Path to bin taxonomy tsv. Compatible with GTDB.	`string`

Summarize Options

The purpose of DRAM summarize is to distill annotations down using curated distillation summary form(s). It can be run with just `--summarize`, or with curated sheets selected via `--distill_topic`, `--distill_ecosystem`, or `--distill_custom` (or some combination). Users may also provide a custom distillate via `--distill_custom <path/to/file>` (TSV format). Summarize can be run independently of `--annotate`; however, annotations must then be provided (`--annotations <path/to/annotations.tsv>`). Optional tRNA, rRNA, and bin quality files may also be provided. When using distill, your annotation must include kegg, ko, dbcan, or merops. To get the full results, you need kegg or ko, plus dbcan and merops.

Parameter	Description	Type	Default	Required	Hidden
`sum_ecos`	Ecosystem to distill. Options: `eng_sys`, `ag`. If more than one ecosystem is included, they must be separated by a comma (`--sum_ecos eng_sys,ag` or `--sum_ecos 'eng_sys,ag'`).	`string`
`distill_topic`	A way to specify a subset of topics instead of all topics. By default, if `--summarize` is specified, all topics are included. Options include `carbon`, `energy`, `misc`, `nitrogen`, and `transport`.
`sum_custom`	<path/to/custom_distillate.tsv> Custom distill file to use. The custom distill file must be in TSV format with the first column being the gene ID and the second column being the distillate value.	`string`
`annotations`	Path to annotations tsv to distill. Required if running without annotate	`string`
`rrnas`	Path to folder containing individual rRNA files generated with Barrnap. Can be used when running distill without call to add additional information.	`string`
`trnas`	Path to folder containing individual tRNA files generated with tRNAscan-SE. Can be used when running distill without call to add additional information.	`string`
`distill_dummy_sheet`	File path to distill dummy sheet.	`string`			True
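
The database requirement stated for summarize (at least one of kegg, ko, dbcan, or merops; kegg or ko together with dbcan and merops for full results) can be expressed as a small check. This is only a sketch: `distill_coverage` is a hypothetical helper, and the database labels are taken verbatim from the text above rather than from the pipeline's code.

```python
def distill_coverage(databases):
    """Classify annotation databases as 'full', 'partial', or 'none' for distill."""
    dbs = {d.strip().lower() for d in databases}
    has_ko = bool(dbs & {"kegg", "ko"})
    if has_ko and {"dbcan", "merops"} <= dbs:
        return "full"      # kegg or ko, plus both dbcan and merops
    if has_ko or dbs & {"dbcan", "merops"}:
        return "partial"   # at least one of the required databases
    return "none"          # distill has nothing it can use


for used in (["kegg", "dbcan", "merops"], ["dbcan"], ["pfam"]):
    print(used, "->", distill_coverage(used))
```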

Product Options

The purpose of DRAM `--product` is to generate a product visualization of the annotations and save the output to the output directory.

Parameter	Description	Type	Default	Required	Hidden
`groupby_column`	Column to group by in the annotations file for etc and function groupings. Defaults to DRAM’s annotation output fasta column name.	`string`

Adjectives Options

The purpose of DRAM `--adjectives` is to use the DRAM annotations output file to make a table of adjectives. You need to use the KEGG, FeGenie, and Sulfur databases.

Parameter	Description	Type	Default	Required	Hidden
`adjectives_list`	A comma-separated list of adjectives (‘adj1,adj2,adj3’), by name, to evaluate. This limits the number of adjectives that are evaluated and is faster.	`string`
`rules_tsv`	This is an optional path to a rules file with strict formatting. It will take the place of the original rules file that is stored with the script. For formatting, see the original rules.tsv sheet stored with the script in the repo.	`string`

Database Options

File paths to databases used in the workflow.

Parameter	Type	Default	Hidden
`kegg_db`	`string`	${launchDir}/databases/kegg/	True
`uniref_db`	`string`	${launchDir}/databases/uniref/	True
`metals_db`	`string`	${launchDir}/databases/metals/	True
`pfam_mmseq_db`	`string`	${launchDir}/databases/pfam/mmseqs/	True
`merops_db`	`string`	${launchDir}/databases/merops/	True
`viral_db`	`string`	${launchDir}/databases/viral/	True
`kofam_db`	`string`	${launchDir}/databases/kofam/	True
`kofam_list`	`string`	${launchDir}/databases/kofam/kofam_ko_list.tsv	True
`dbcan_db`	`string`	${launchDir}/databases/dbcan/	True
`dbcan_fam_activities`	`string`	${launchDir}/databases/dbcan/dbcan.fam-activities.tsv	True
`dbcan_subfam_activities`	`string`	${launchDir}/databases/dbcan/dbcan.fam-activities.tsv	True
`vog_db`	`string`	${launchDir}/databases/vog/	True
`vog_list`	`string`	${launchDir}/databases/vogdb/vog_annotations_latest.tsv.gz	True
`camper_hmm_db`	`string`	${launchDir}/databases/camper/hmm/	True
`camper_hmm_list`	`string`	${launchDir}/databases/camper/hmm/camper_hmm_scores.tsv	True
`camper_mmseqs_db`	`string`	${launchDir}/databases/camper/mmseqs/	True
`camper_mmseqs_list`	`string`	${launchDir}/databases/camper/mmseqs/camper_scores.tsv	True
`canthyd_hmm_db`	`string`	${launchDir}/databases/canthyd/hmm/	True
`cant_hyd_hmm_list`	`string`	${launchDir}/databases/canthyd/hmm/cant_hyd_HMM_scores.tsv	True
`canthyd_mmseqs_db`	`string`	${launchDir}/databases/canthyd/mmseqs/	True
`canthyd_mmseqs_list`	`string`	${launchDir}/databases/canthyd/mmseqs/cant_hyd_BLAST_scores.tsv	True
`fegenie_db`	`string`	${launchDir}/databases/fegenie/	True
`fegenie_list`	`string`	${launchDir}/databases/fegenie/fegenie_iron_cut_offs.txt	True
`sulfur_db`	`string`	${launchDir}/databases/sulfur/	True
`methyl_db`	`string`	${launchDir}/databases/methyl/	True
`sql_descriptions_db`	`string`	${launchDir}/databases/db_descriptions/description_db.sqlite	True
`kegg_e_value`	`string`	1e-05	True
`kofam_e_value`	`string`	1e-05	True
`dbcan_e_value`	`string`	1e-15	True
`merops_e_value`	`string`	1e-1	True
`vog_e_value`	`string`	1e-05	True
`camper_e_value`	`string`	1e-05	True
`uniref_e_value`	`string`	1e-05	True
`canthyd_e_value`	`string`	1e-05	True
`sulfur_e_value`	`string`	1e-05	True
`fegenie_e_value`	`string`	1e-05	True
`metals_e_value`	`string`	1e-03	True
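
Every default path above is written relative to `${launchDir}`, the Nextflow variable for the directory the pipeline is launched from. The sketch below expands a few of these defaults for a given launch directory and then applies overrides. The helper `resolve_defaults` and the example directories are hypothetical; only the parameter names and default strings are copied from the table.

```python
from string import Template

# A few defaults copied from the table above.
DEFAULTS = {
    "kegg_db": "${launchDir}/databases/kegg/",
    "kofam_list": "${launchDir}/databases/kofam/kofam_ko_list.tsv",
    "sql_descriptions_db": "${launchDir}/databases/db_descriptions/description_db.sqlite",
}


def resolve_defaults(launch_dir, overrides=None):
    """Expand ${launchDir} in each default, then apply user-supplied overrides."""
    base = launch_dir.rstrip("/")
    resolved = {name: Template(path).substitute(launchDir=base)
                for name, path in DEFAULTS.items()}
    resolved.update(overrides or {})
    return resolved


# Placeholder directories for illustration.
paths = resolve_defaults("/home/user/run1", {"kegg_db": "/shared/db/kegg/"})
for name, path in paths.items():
    print(name, path)
```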

Format Kegg Options

Options for preparing the KEGG database for use in DRAM.

Parameter	Description	Type	Default
`kegg_pep_root_dir`	Only required if you need to concatenate all of KEGG’s provided pep files. Root directory to downloaded KEGG peptide files. The pipeline will search for all pep files in this directory and concatenate them into a single file. In format <root_dir>//.pep.	`string`
`kegg_pep_loc`	Path to the gene FASTA files provided by the KEGG FTP server, or to a concatenated version of them. Either this or `kegg_pep_root_dir` must be provided.	`string`
`gene_ko_link_loc`	Path to the KO file provided by the KEGG FTP server.	`string`
`skip_gene_ko_link`	Skip the gene_ko_link file. If you are using an older version of KEGG that does not supply the gene_ko_link file you can use this option to skip the gene_ko_link file. Otherwise, the gene_ko_link file is required.	`boolean`
`kegg_download_date`	The date the KEGG database was downloaded. If not provided, the current date will be used.	`string`	yyyy-MM-dd
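
The constraints in this table (one of `kegg_pep_loc` or `kegg_pep_root_dir` is needed, `gene_ko_link_loc` is needed unless `skip_gene_ko_link` is set, and the download date falls back to the current date) can be checked before launching. The following is a sketch under those stated rules; the function names are hypothetical and the pipeline's own validation may be stricter.

```python
from datetime import date


def check_format_kegg(params):
    """Return a list of problems with a format_kegg parameter dict (empty if none)."""
    problems = []
    if not (params.get("kegg_pep_loc") or params.get("kegg_pep_root_dir")):
        problems.append("provide kegg_pep_loc or kegg_pep_root_dir")
    if not params.get("skip_gene_ko_link") and not params.get("gene_ko_link_loc"):
        problems.append("gene_ko_link_loc is required unless skip_gene_ko_link is set")
    return problems


def download_date(params):
    """Use the supplied kegg_download_date, else today's date as yyyy-MM-dd."""
    return params.get("kegg_download_date") or date.today().isoformat()


# Placeholder path for illustration.
print(check_format_kegg({"kegg_pep_loc": "/db/kegg.pep", "skip_gene_ko_link": True}))
print(check_format_kegg({}))
```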

SLURM Options

Generic options for SLURM job submission. More customized options can be configured in your own Nextflow config file. See https://www.nextflow.io/docs/latest/executor.html#slurm and example configs here: https://github.com/nf-core/configs/tree/master/conf

Parameter	Description	Type
`slurm`	Launch the pipeline using the SLURM executor. Without this option, the pipeline will run in your current shell/environment.	`boolean`
`partition`	Name of the SLURM partition to use for job submission. If not provided, the default partition will be used.	`string`
`slurm_node`	Name of the SLURM Node to use for job submission. If not provided, the default node will be used.	`string`
`cpu_provision_limit`	The largest number of CPUs you can provision in one job. This is not the total CPU limit; you can control the total with the process size or provision limits together with `queue_size`.	`integer`
`mem_gb_provision_limit`	The largest amount of memory (in GB) you can provision in one job. This is not the total memory limit; you can control the total with the process size or provision limits together with `queue_size`.	`integer`
`time_hr_provision_limit`	The longest time (in hours) a job should be provisioned for. This is not the total time cutoff for all jobs; you can control that with the process size or provision limits together with `queue_size`.	`integer`

Process Options

Resource limits for a single process of each size. These are not limits on the total resources available to the pipeline: up to `queue_size` processes of various sizes can run in parallel.

Parameter	Description	Type	Default
`tiny_cpus_limit`	Maximum number of CPUs to use/provision for this job size.	`integer`
`small_cpus_limit`	Maximum number of CPUs to use/provision for this job size.	`integer`
`medium_cpus_limit`	Maximum number of CPUs to use/provision for this job size.	`integer`
`big_cpus_limit`	Maximum number of CPUs to use/provision for this job size.	`integer`
`huge_cpus_limit`	Maximum number of CPUs to use/provision for this job size.	`integer`
`tiny_gb_mem_limit`	Maximum memory in GB to use/provision for this job size.	`integer`
`small_gb_mem_limit`	Maximum memory in GB to use/provision for this job size.	`integer`
`medium_gb_mem_limit`	Maximum memory in GB to use/provision for this job size.	`integer`
`big_gb_mem_limit`	Maximum memory in GB to use/provision for this job size.	`integer`
`huge_gb_mem_limit`	Maximum memory in GB to use/provision for this job size.	`integer`
`tiny_hr_time_limit`	Maximum time in hours to use/provision for this job size.	`integer`
`small_hr_time_limit`	Maximum time in hours to use/provision for this job size.	`integer`
`medium_hr_time_limit`	Maximum time in hours to use/provision for this job size.	`integer`
`big_hr_time_limit`	Maximum time in hours to use/provision for this job size.	`integer`
`huge_hr_time_limit`	Maximum time in hours to use/provision for this job size.	`integer`
`queue_size`	Maximum number of jobs to submit to the queue at once.	`integer`	10
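
One plausible reading of how the per-size limits interact with the SLURM provision limits is that each job requests the smaller of the two, and that `queue_size` bounds how many such jobs run at once. The sketch below works through that arithmetic. This interaction is an assumption rather than documented behavior, the helper names are hypothetical, and the numbers are invented because the tables list no defaults for these limits.

```python
def clamp_to_provision(size_limits, provision_limits):
    """Return size_limits with each value capped by the matching provision limit."""
    return {key: min(value, provision_limits.get(key, value))
            for key, value in size_limits.items()}


def peak_usage(per_job, queue_size=10):
    """Upper bound on concurrent CPUs and memory if queue_size such jobs run at once."""
    return {key: per_job[key] * queue_size for key in ("cpus", "mem_gb")}


# Hypothetical "huge" job size and per-job provision limits.
huge = {"cpus": 32, "mem_gb": 200, "time_hr": 48}
limits = {"cpus": 16, "mem_gb": 256}
job = clamp_to_provision(huge, limits)
print(job, peak_usage(job, queue_size=10))
```

Under these example numbers a "huge" job would be provisioned with 16 CPUs, and ten of them running together would occupy at most 160 CPUs.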

Aliased options

Aliased options for DRAM annotation workflow. These are provided for backward compatibility with older versions of the pipeline.

Parameter	Description	Type
`product`	Alias for `visualize`.	`boolean`
`adjectives`	Alias for `traits`.	`boolean`
`distill_ecosystem`	Alias for `sum_ecos`.	`string`
`distill_custom`	Alias for `sum_custom`.	`string`
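
The alias table amounts to a simple name mapping. As an illustration, the sketch below rewrites a parameter dictionary so that only the current names remain. `normalize_params` is a hypothetical helper, and raising an error when an alias and its current name disagree is a design choice of this sketch, not documented pipeline behavior.

```python
# Alias -> current parameter name, from the table above.
ALIASES = {
    "product": "visualize",
    "adjectives": "traits",
    "distill_ecosystem": "sum_ecos",
    "distill_custom": "sum_custom",
}


def normalize_params(params):
    """Rewrite aliased parameter names to their current names."""
    normalized = {}
    for name, value in params.items():
        canonical = ALIASES.get(name, name)
        if canonical in normalized and normalized[canonical] != value:
            raise ValueError(f"Conflicting values for '{canonical}'")
        normalized[canonical] = value
    return normalized


print(normalize_params({"product": True, "distill_ecosystem": "eng_sys,ag"}))
```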

Institutional config options

Parameters used to describe centralised config profiles. These should not be edited.

Parameter	Description	Type	Default	Hidden
`custom_config_version`	Git commit id for Institutional configs.	`string`	master	True
`custom_config_base`	Base directory for Institutional configs. If you’re running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don’t need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.	`string`	https://raw.githubusercontent.com/nf-core/configs/master	True
`config_profile_name`	Institutional config name.	`string`		True
`config_profile_description`	Institutional config description.	`string`		True
`config_profile_contact`	Institutional config contact information.	`string`		True
`config_profile_url`	Institutional config URL link.	`string`		True

Generic options

Less common options for the pipeline, typically set in a config file.

Parameter	Description	Type	Default	Hidden
`threads`		`integer`	10	True
`version`	Display version and exit.	`boolean`		True
`publish_dir_mode`	Method used to save pipeline results to output directory. The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.	`string`	copy	True
`email`	Email address for completion summary. Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don’t need to specify this on the command line for every run.	`string`		True
`email_on_fail`	Email address for completion summary, only when pipeline fails. An email address to send a summary email to when the pipeline is completed, sent ONLY if the pipeline does not exit successfully.	`string`		True
`plaintext_email`	Send plain-text email instead of HTML.	`boolean`		True
`max_multiqc_email_size`	File size limit when attaching MultiQC reports to summary emails.	`string`	25.MB	True
`monochrome_logs`	Do not use coloured log outputs.	`boolean`		True
`hook_url`	Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.	`string`		True
`multiqc_title`	MultiQC report title. Printed as page header, used for filename if not otherwise specified.	`string`		True
`multiqc_config`	Custom config file to supply to MultiQC.	`string`		True
`multiqc_logo`	Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file	`string`		True
`multiqc_methods_description`	Custom MultiQC yaml file containing HTML including a methods description.	`string`
`validate_params`	Boolean whether to validate parameters against the schema at runtime	`boolean`	True	True
`pipelines_testdata_base_path`	Base URL or local path to location of pipeline test dataset files	`string`	https://raw.githubusercontent.com/nf-core/test-datasets/	True
`trace_report_suffix`	Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.	`string`	yyyy-MM-dd_HH-mm-ssZ	True