Rule Expression Grammar and Parser
DRAM uses a small, explicit expression language to define rules, conditions, and logical combinations of features. This page documents the grammar, available operators and functions, and an important design decision regarding boolean operator ambiguity.
Overview
Rule expressions are used throughout DRAM to:
Summarize (distill) annotated genes into a summary sheet (see distill_metals.tsv for examples)
Evaluate complex rules for gene traits (see trait_rules.tsv for examples)
Build visualizations (see dram_viz for examples)
The grammar is intentionally restrictive to avoid ambiguous interpretations that can easily arise in boolean logic.
Expression Components
The lowest-level building blocks of expressions are genes:
KXXXXX (KEGG Orthology ID)
Each gene is evaluated individually as present or absent in your annotation sheet, and genes are then combined with boolean logic to obtain overall trait presence/absence or scores.
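As a rough illustration of the presence/absence step, the Python sketch below collects the KO IDs seen in an annotation table and checks individual genes against them. It is not DRAM's code: the row layout, the `ko_id` column name, and the helper names `present_kos` and `gene_present` are hypothetical.

```python
# Hypothetical sketch: the annotation layout and the "ko_id" column are assumptions.
def present_kos(annotation_rows, column="ko_id"):
    """Collect every non-empty KO ID that appears in the annotation rows."""
    return {row[column] for row in annotation_rows if row.get(column)}


def gene_present(ko_id, kos):
    """A single gene evaluates to True (present) or False (absent)."""
    return ko_id in kos


rows = [
    {"gene": "g1", "ko_id": "K00001"},
    {"gene": "g2", "ko_id": "K00002"},
    {"gene": "g3", "ko_id": ""},  # unannotated gene, ignored
]
kos = present_kos(rows)
print(gene_present("K00001", kos), gene_present("K00003", kos))  # True False
```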
Boolean Operators
| Operator | Meaning | Notes |
|---|---|---|
| & | Logical AND | May only be chained with other & |
| \| | Logical OR | May only be chained with other \| |

❗ Mixing & and | requires explicit grouping.
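Because each operator may only be chained with itself, an ungrouped expression reduces to either "every term present" (&) or "at least one term present" (|). The Python sketch below shows that reduction and rejects a mixed chain. `eval_flat_chain` and the `present` set are hypothetical names, and brackets are deliberately not handled here.

```python
import re


def eval_flat_chain(expr, present):
    """Evaluate an ungrouped chain such as 'A & B & C' or 'A | B | C'.

    `present` is the set of identifiers found in the annotations (assumed input).
    """
    ops = set(re.findall(r"[&|]", expr))
    if len(ops) > 1:
        raise ValueError(f"mixing & and | requires brackets: {expr!r}")
    terms = [t.strip() for t in re.split(r"[&|]", expr)]
    if ops == {"|"}:
        return any(t in present for t in terms)  # OR chain: one hit is enough
    return all(t in present for t in terms)  # AND chain or a single term


present = {"K00001", "K00002"}
print(eval_flat_chain("K00001 & K00002", present))  # True
print(eval_flat_chain("K00003 | K00002", present))  # True
print(eval_flat_chain("K00001 & K00003", present))  # False
```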
Grouping
Square brackets are used to explicitly group expressions:
[KXXXXX & KXXXXX] | KXXXXX
Brackets may contain any valid expression, including nested boolean logic.
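Bracket grouping maps naturally onto a recursive-descent parser: each [ starts a nested expression that is evaluated before the surrounding chain. The Python sketch below is one way to do this, assuming identifiers are evaluated by membership in a `present` set. It is an illustration, not DRAM's parser, and it omits functions and step lists.

```python
import re

# Tokens: brackets, the two boolean operators, and identifiers such as K00001.
TOKEN = re.compile(r"\s*(\[|\]|&|\||[A-Za-z0-9_.]+)")


def tokenize(expr):
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos}: {expr[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(expr, present):
    tokens, pos = tokenize(expr), 0

    def parse_expr():
        # A chain of terms joined by a single, consistent operator.
        nonlocal pos
        value, op = parse_term(), None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires brackets")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()  # always parsed so every token is consumed
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    def parse_term():
        # Either a bracketed sub-expression or a single identifier.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "[":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != "]":
                raise ValueError("missing closing bracket")
            pos += 1
            return value
        if tok in ("&", "|", "]"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in present

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate("[K00001 & K00002] | K00003", {"K00001", "K00002"}))  # True
```

Keeping the "one operator per level" check inside `parse_expr` means a mixed chain is rejected wherever it appears, including inside brackets.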
Step Expressions
Comma-separated expressions define step rules:
KXXXXX & KXXXXX, KXXXXX | KXXXXX
These steps represent a sequence of evaluations, such as the steps of a pathway. A step list can be passed to functions that operate on multiple steps, such as percent, which checks what percentage of the steps is satisfied.
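Splitting a step list only works if commas inside brackets or function calls are left alone. The Python sketch below splits at top-level commas and then evaluates each step as a flat chain; `split_steps` and `eval_step` are hypothetical helpers, and `eval_step` assumes a step does not mix & and |.

```python
import re


def split_steps(expr):
    """Split a step list at commas that sit outside brackets and parentheses."""
    steps, current, depth = [], [], 0
    for ch in expr:
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced brackets")
        if ch == "," and depth == 0:
            steps.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced brackets")
    steps.append("".join(current).strip())
    return steps


def eval_step(step, present):
    """Evaluate one ungrouped step (assumes & and | are not mixed)."""
    hits = [t.strip() in present for t in re.split(r"[&|]", step)]
    return any(hits) if "|" in step else all(hits)


steps = split_steps("K00001 & K00002, K00003 | K00004")
print(steps)  # ['K00001 & K00002', 'K00003 | K00004']
print([eval_step(s, {"K00001", "K00002", "K00004"}) for s in steps])  # [True, True]
```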
Core Design Principle: No Implicit Boolean Precedence
DRAM does not allow mixing | (OR) and & (AND) operators without explicit grouping.
This means expressions like:
A | B | C & D
are invalid and will result in a parsing error.
Instead, users must write:
A | B | [C & D]
or:
[A | B | C] & D
Why this matters
In many languages, & has higher precedence than |, but this is:
Often misunderstood
Easy to misread in complex biological rules
A frequent source of subtle bugs
DRAM therefore disallows implicit precedence and requires explicit grouping with brackets ([...]) whenever boolean operators are mixed.
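To see why the ambiguity matters, the Python snippet below enumerates every presence/absence combination of A, B, C, and D and compares the two groupings of A | B | C & D shown above. It only illustrates the logic with Python's own boolean operators; it is not DRAM's evaluator.

```python
from itertools import product


def and_binds_tighter(a, b, c, d):
    # A | B | [C & D]: the conventional "& before |" reading.
    return a or b or (c and d)


def or_grouped_first(a, b, c, d):
    # [A | B | C] & D: the ORs are grouped first.
    return (a or b or c) and d


disagreements = [
    combo for combo in product([False, True], repeat=4)
    if and_binds_tighter(*combo) != or_grouped_first(*combo)
]
print(f"{len(disagreements)} of 16 combinations disagree")  # 6 of 16 combinations disagree
```

For instance, when A is present and B, C, and D are absent, the first reading is satisfied and the second is not, so the same rule text would flag different genomes depending on the reader's assumption.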
Functions
Functions operate on expressions or values and may appear anywhere an expression is allowed.
General Form
function_name(arg1, arg2, ...)
Arguments may be:
Expressions
Numbers
Strings
Identifiers
Functions
not
percent
at_least
column_count_values
not
Negates a boolean expression.
not(A & B)
percent
Calculates the percentage of steps satisfied and compares it against a threshold. Returns true if the percentage of satisfied steps meets or exceeds the threshold.
percent(60, [A & B, C | D, E])
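A minimal Python sketch of the percent semantics, assuming each step has already been evaluated to True or False. The name mirrors the rule function, but the signature (a list of booleans rather than expressions) and raising an error on an empty step list are assumptions.

```python
def percent(threshold, step_results):
    """Return True if the percentage of satisfied steps is >= threshold.

    Assumption: steps arrive already evaluated to True/False, and an empty
    step list is treated as an error.
    """
    if not step_results:
        raise ValueError("percent() needs at least one step")
    satisfied = sum(bool(r) for r in step_results)
    return 100 * satisfied / len(step_results) >= threshold


present = {"A", "B", "E"}
step_results = [
    {"A", "B"} <= present,       # A & B -> True
    bool({"C", "D"} & present),  # C | D -> False
    "E" in present,              # E     -> True
]
print(percent(60, step_results))  # True: 2 of 3 steps is about 66.7%
```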
at_least
Checks if at least a certain number of steps are satisfied.
at_least(2, [A & B, C | D, E])
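Unlike percent, at_least compares an absolute count of satisfied steps. The sketch below makes the same assumption as a plain Python function over pre-evaluated steps; the signature is illustrative, not DRAM's.

```python
def at_least(n, step_results):
    """Return True if at least n of the already-evaluated steps are satisfied."""
    return sum(bool(r) for r in step_results) >= n


present = {"A", "B", "E"}
step_results = [
    {"A", "B"} <= present,       # A & B -> True
    bool({"C", "D"} & present),  # C | D -> False
    "E" in present,              # E     -> True
]
print(at_least(2, step_results), at_least(3, step_results))  # True False
```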
column_count_values
/**
* Counts the number of values in a specified column and evaluates conditions based on the provided operations and thresholds.
*
* @param column_name The name of the column to be analyzed.
* @param value_op The operation to be used for comparing the column values against the value_threshold (e.g., "gt", "ge", "lt", "le", "eq", "ne").
* @param value_threshold The threshold value to compare the column values against using value_op.
* @param count_op The operation to be used for comparing the counted values against the count_threshold (e.g., "gt", "ge", "lt", "le", "eq", "ne").
* @param count_threshold The threshold value to compare the counted values against using count_op.
* @return boolean Returns true if the conditions defined by value_op and count_op are satisfied; otherwise, returns false.
*/
column_count_values("column_name", "value_op", value_threshold, "count_op", count_threshold)
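The docstring above describes a two-stage comparison: each value in the column is compared with value_threshold, then the number of matching values is compared with count_threshold. The Python sketch below mirrors that logic over a list of row dictionaries. Passing `rows` explicitly, skipping missing values, and the `OPS` mapping are assumptions for illustration, not DRAM's implementation.

```python
import operator

# Assumed mapping from the operation names listed in the docstring.
OPS = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt,
       "le": operator.le, "eq": operator.eq, "ne": operator.ne}


def column_count_values(rows, column_name, value_op, value_threshold,
                        count_op, count_threshold):
    """Count rows whose value passes value_op, then test that count with count_op."""
    count = sum(
        1 for row in rows
        if row.get(column_name) is not None  # missing values are skipped
        and OPS[value_op](row[column_name], value_threshold)
    )
    return OPS[count_op](count, count_threshold)


rows = [{"heme_regulatory_motif_count": v} for v in (5, 4, 1, 6, None)]
# Three values are >= 4 (5, 4, 6), and 3 >= 3.
print(column_count_values(rows, "heme_regulatory_motif_count", "ge", 4, "ge", 3))  # True
```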
Piped Function Usage
Filter functions may be chained into other functions with the pipe operator (->):
not(filter_contains(kegg_description,"nitrate reductase")) -> column_count_values(heme_regulatory_motif_count,ge,4,ge,3)
These filter functions can be used to prefilter data before applying counting or other operations. Filter functions are under the filter_ namespace.
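The piped example above can be read as "drop matching rows, then count on what remains." The Python sketch below models that flow with row predicates. The treatment of not() as inverting a filter, the `pipe` helper, and the sample rows are all assumptions made up for illustration.

```python
import operator

OPS = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt,
       "le": operator.le, "eq": operator.eq, "ne": operator.ne}


def filter_contains(column, substring):
    """Row predicate: True if the column's text contains the substring."""
    return lambda row: substring in str(row.get(column, ""))


def not_(predicate):
    """Assumed semantics: not(filter) keeps the rows the filter would reject."""
    return lambda row: not predicate(row)


def column_count_values(rows, column, value_op, value_threshold,
                        count_op, count_threshold):
    count = sum(1 for row in rows
                if row.get(column) is not None
                and OPS[value_op](row[column], value_threshold))
    return OPS[count_op](count, count_threshold)


def pipe(rows, predicate, column, *count_args):
    """filter -> count: only rows that pass `predicate` are counted."""
    kept = [row for row in rows if predicate(row)]
    return column_count_values(kept, column, *count_args)


def keep_all(row):
    return True


# Made-up rows for illustration only.
rows = [
    {"kegg_description": "nitrate reductase", "heme_regulatory_motif_count": 6},
    {"kegg_description": "cytochrome c", "heme_regulatory_motif_count": 5},
    {"kegg_description": "cytochrome c oxidase", "heme_regulatory_motif_count": 4},
    {"kegg_description": "hypothetical protein", "heme_regulatory_motif_count": 2},
]
no_nitrate = not_(filter_contains("kegg_description", "nitrate reductase"))
print(pipe(rows, keep_all, "heme_regulatory_motif_count", "ge", 4, "ge", 3))    # True
print(pipe(rows, no_nitrate, "heme_regulatory_motif_count", "ge", 4, "ge", 3))  # False
```

The filter changes the answer here: without it three rows reach 4 or more, but once the nitrate reductase row is excluded only two remain.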
filter_contains
filter_compare
filter_contains
Checks if a specified column contains a given substring.
filter_contains(column_name, substring)
filter_compare
Compares values in a specified column against a threshold using a given operation.
filter_compare(column_name, operation, threshold)
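Both filter functions can be modeled as row selectors. The Python sketch below assumes rows are dictionaries, that filter_contains is a case-sensitive substring test, and that filter_compare accepts the same operation names (gt, ge, lt, le, eq, ne) listed for column_count_values; the `score` column is hypothetical.

```python
import operator

OPS = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt,
       "le": operator.le, "eq": operator.eq, "ne": operator.ne}


def filter_contains(rows, column_name, substring):
    """Keep rows whose column value contains substring (case-sensitive here)."""
    return [row for row in rows if substring in str(row.get(column_name, ""))]


def filter_compare(rows, column_name, operation, threshold):
    """Keep rows whose column value satisfies `value <operation> threshold`."""
    return [row for row in rows
            if row.get(column_name) is not None
            and OPS[operation](row[column_name], threshold)]


rows = [
    {"kegg_description": "cytochrome c", "score": 120.0},
    {"kegg_description": "cytochrome c oxidase", "score": 45.5},
    {"kegg_description": "hypothetical protein", "score": 300.0},
]
print(len(filter_contains(rows, "kegg_description", "cytochrome")))  # 2
print(len(filter_compare(rows, "score", "gt", 200)))                 # 1
```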
Valid vs Invalid Examples
✅ Valid
A | B | C
A & B & C
[A & B] | C
A | [B & C]
filter_contains(col, "substring A") -> column_count_values(col,ge,2,ge,1)
❌ Invalid
A | B | C & D
A & B | C
A | B & C | D
[A, B, C] -> percent(50)