# Rule Expression Grammar and Parser

DRAM uses a small, explicit expression language to define **rules**, **conditions**, and **logical combinations** of features.
This page documents the grammar, available operators and functions, and an important design decision regarding **boolean operator ambiguity**.

---

## Overview

Rule expressions are used throughout DRAM to:

* Distill annotated genes into a summary sheet (see `distill_metals.tsv` for examples)
* Evaluate complex rules for gene traits (see `trait_rules.tsv` for examples)
* Build visualizations (see `dram_viz` for examples)

The grammar is intentionally **restrictive** to avoid ambiguous interpretations that can easily arise in boolean logic.

---

## Expression Components

The lowest-level building blocks of expressions are genes:
        
```text
KXXXXX (KEGG Orthology ID)
```

Each gene can be evaluated individually as present or absent in your annotation sheet, then combined with other genes using boolean logic to determine overall trait presence/absence or scores.

### Boolean Operators

| Operator | Meaning     | Notes                                         |
| -------- | ----------- | --------------------------------------------- |
| `&`      | Logical AND | May only be chained with other `&` operators  |
| `\|`     | Logical OR  | May only be chained with other `\|` operators |

❗ Mixing `&` and `|` **requires explicit grouping**.


### Grouping

Square brackets are used to explicitly group expressions:

```text
[KXXXXX & KXXXXX] | KXXXXX
```

Brackets may contain **any valid expression**, including nested boolean logic.

### Step Expressions

Comma-separated expressions define **step rules**:

```text
KXXXXX & KXXXXX, KXXXXX | KXXXXX
```

Each step represents one stage in a sequence or pathway of evaluations. A list of steps can be passed to functions that operate on multiple steps, for example to check what percentage of steps is satisfied.
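To illustrate how a step rule breaks apart, the sketch below splits an expression on commas that sit outside any brackets or parentheses. The name `split_steps` is hypothetical and this is not DRAM's actual parser; it only assumes that a comma nested inside `[...]` or `(...)` belongs to the enclosed group rather than to the outer step list.

```python
def split_steps(expression: str) -> list[str]:
    """Split a rule expression into steps on top-level commas (illustrative only)."""
    steps = []
    current = []
    depth = 0  # how many brackets/parentheses are currently open
    for ch in expression:
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma outside every group ends the current step.
            steps.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    steps.append("".join(current).strip())
    return steps


two_steps = split_steps("KXXXXX & KXXXXX, KXXXXX | KXXXXX")
grouped = split_steps("[A & B] | C, D")
```

Here `two_steps` holds the two steps from the example above, and `grouped` shows that a bracketed group stays inside its step.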

---

## Core Design Principle: No Implicit Boolean Precedence

DRAM **does not allow mixing `|` (OR) and `&` (AND) operators without explicit grouping**.

This means expressions like:

```text
A | B | C & D
```

are **invalid** and will result in a parsing error.

Instead, users must write:

```text
A | B | [C & D]
```

or:

```text
[A | B | C] & D
```

### Why this matters

In many languages, `&` has higher precedence than `|`, but this is:

* Often misunderstood
* Easy to misread in complex biological rules
* A frequent source of subtle bugs

DRAM therefore **disallows implicit precedence** and requires explicit grouping with brackets (`[...]`) whenever boolean operators are mixed.
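The rule can be sketched as a simple scan that tracks which operators have appeared at each grouping level. This is a minimal illustration, not DRAM's parser: the function name `has_mixed_operators` is hypothetical, brackets and parentheses are both treated as grouping, each comma-separated step is checked independently, and operators inside quoted strings are not handled.

```python
def has_mixed_operators(expression: str) -> bool:
    """Return True if `&` and `|` share a grouping level without brackets."""
    # Stack of operator sets: one per open group, with the top level at index 0.
    seen = [set()]
    for ch in expression:
        if ch in "[(":
            seen.append(set())      # entering a group starts a fresh level
        elif ch in "])":
            if len(seen) == 1:
                raise ValueError("unbalanced closing bracket")
            seen.pop()              # leaving a group discards its operators
        elif ch == ",":
            seen[-1].clear()        # each comma-separated step is checked on its own
        elif ch in "&|":
            seen[-1].add(ch)
            if len(seen[-1]) == 2:  # both operators seen at this level
                return True
    if len(seen) != 1:
        raise ValueError("unbalanced opening bracket")
    return False


rejected = has_mixed_operators("A | B | C & D")
accepted = has_mixed_operators("A | B | [C & D]")
```

Under these assumptions `rejected` is `True` (the expression would need brackets) and `accepted` is `False`.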

---

## Functions

Functions operate on expressions or values and may appear anywhere an expression is allowed.

### General Form

```text
function_name(arg1, arg2, ...)
```

Arguments may be:

* Expressions
* Numbers
* Strings
* Identifiers

---

### Available Functions

- not
- percent
- at_least
- column_count_values


#### not

Negates a boolean expression.

```text
not(A & B)
```

#### percent

Takes a threshold and a list of steps, and calculates the percentage of steps satisfied. Returns true if that percentage meets or exceeds the threshold.

```text
percent(60, [A & B, C | D, E])
```

#### at_least

Takes a minimum count and a list of steps. Returns true if at least that many steps are satisfied.

```text
at_least(2, [A & B, C | D, E])
```
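Both step functions reduce to counting satisfied steps. The sketch below shows that arithmetic on a list of already-evaluated step results. It is an illustration under stated assumptions, not DRAM's implementation: the threshold for `percent` is taken to be on a 0 to 100 scale (as the `60` in the example suggests), an empty step list is treated as unsatisfied, and the presence/absence values for `A` to `E` are made up.

```python
def percent(threshold: float, step_results: list[bool]) -> bool:
    """True if the share of satisfied steps, in percent, meets the threshold."""
    if not step_results:
        return False  # assumption: an empty step list never satisfies the rule
    return 100 * sum(step_results) / len(step_results) >= threshold


def at_least(minimum: int, step_results: list[bool]) -> bool:
    """True if at least `minimum` steps are satisfied."""
    return sum(step_results) >= minimum


# Hypothetical presence/absence calls for genes A to E.
present = {"A": True, "B": True, "C": False, "D": False, "E": True}

# The three steps from the examples: A & B, C | D, E
steps = [
    present["A"] and present["B"],
    present["C"] or present["D"],
    present["E"],
]

percent_ok = percent(60, steps)    # 2 of 3 steps satisfied
at_least_ok = at_least(2, steps)
```

With two of three steps satisfied, both `percent(60, ...)` and `at_least(2, ...)` hold, while a threshold of 70 percent or a minimum of 3 steps would not.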

#### column_count_values

Counts the number of values in a specified column and evaluates conditions based on the provided operations and thresholds. Returns true if the conditions defined by `value_op` and `count_op` are satisfied; otherwise, returns false.

Parameters:

* `column_name`: The name of the column to be analyzed.
* `value_op`: The operation used to compare the column values against `value_threshold` (e.g., `"gt"`, `"ge"`, `"lt"`, `"le"`, `"eq"`, `"ne"`).
* `value_threshold`: The threshold value to compare the column values against using `value_op`.
* `count_op`: The operation used to compare the counted values against `count_threshold` (e.g., `"gt"`, `"ge"`, `"lt"`, `"le"`, `"eq"`, `"ne"`).
* `count_threshold`: The threshold value to compare the counted values against using `count_op`.

```text
column_count_values("column_name", "value_op", value_threshold, "count_op", count_threshold)
```
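One plausible reading of this function is a two-stage check: count the rows whose column value passes `value_op`, then compare that count using `count_op`. The sketch below follows that reading. It is an assumption about the semantics, not DRAM's implementation: the extra `rows` argument (a list of dictionaries standing in for the annotation table), the `OPS` mapping, and the sample values are all hypothetical.

```python
import operator

# Assumed mapping from the operation names to Python comparisons.
OPS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
}


def column_count_values(rows, column_name, value_op, value_threshold,
                        count_op, count_threshold):
    """Count rows passing the value test, then test that count (illustrative)."""
    matching = sum(
        1 for row in rows if OPS[value_op](row[column_name], value_threshold)
    )
    return OPS[count_op](matching, count_threshold)


# Hypothetical annotation rows.
rows = [
    {"heme_regulatory_motif_count": 5},
    {"heme_regulatory_motif_count": 4},
    {"heme_regulatory_motif_count": 1},
    {"heme_regulatory_motif_count": 7},
]

# "At least 3 rows have a motif count of at least 4."
result = column_count_values(rows, "heme_regulatory_motif_count", "ge", 4, "ge", 3)
```

Three of the four sample rows have a count of 4 or more, so the check passes for a count threshold of 3 and would fail for 4.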

---

### Piped Function Usage

Filter functions may be chained into another function with the pipe operator (`->`):

```text
not(filter_contains(kegg_description,"nitrate reductase")) -> column_count_values(heme_regulatory_motif_count,ge,4,ge,3)
```

These filter functions can be used to prefilter data before applying counting or other operations. Filter functions are under the `filter_` namespace.

- filter_contains
- filter_compare

#### filter_contains

Checks if a specified column contains a given substring.

```text
filter_contains(column_name, substring)
```

#### filter_compare

Compares values in a specified column against a threshold using a given operation.

```text
filter_compare(column_name, operation, threshold)
```
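The two filters can be pictured as row selections that run before any counting. The sketch below is illustrative only and not DRAM's implementation: the `rows` argument, the `OPS` mapping, the case-sensitive substring match, and the sample descriptions are assumptions.

```python
import operator

# Assumed mapping from operation names to Python comparisons.
OPS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
}


def filter_contains(rows, column_name, substring):
    """Keep rows whose column text contains the substring (case-sensitive here)."""
    return [row for row in rows if substring in str(row[column_name])]


def filter_compare(rows, column_name, operation, threshold):
    """Keep rows whose column value passes the comparison against the threshold."""
    return [row for row in rows if OPS[operation](row[column_name], threshold)]


# Hypothetical annotation rows.
rows = [
    {"kegg_description": "nitrate reductase alpha subunit", "heme_regulatory_motif_count": 5},
    {"kegg_description": "cytochrome c", "heme_regulatory_motif_count": 6},
    {"kegg_description": "cytochrome c oxidase", "heme_regulatory_motif_count": 1},
]

reductases = filter_contains(rows, "kegg_description", "nitrate reductase")
heme_rich = filter_compare(rows, "heme_regulatory_motif_count", "ge", 4)
```

In a piped expression, the rows that survive the filter would then be handed to the function on the right-hand side of `->`.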

---

## Valid vs Invalid Examples

### ✅ Valid

```text
A | B | C
A & B & C
[A & B] | C
A | [B & C]
filter_contains(col, "substring A") -> column_count_values(col,ge,2,ge,1)
```

---

### ❌ Invalid

```text
A | B | C & D
A & B | C
A | B & C | D
[A, B, C] -> percent(50)
```