Why DELTA? Transformer vs. Graph Neural Net vs. DELTA

Three paradigms for relational reasoning, and the core architectural difference between them.

Transformer. Tokens sit in a flat sequence, and every token attends to every other token. Attention is the only mechanism for discovering relationships, so the model must reconstruct relational structure from the flat sequence alone.

[Figure: Transformer view. Legend: query token; high attention weight; low attention / neutral.]
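To make "every token attends to every other" concrete, here is a minimal sketch of single-head scaled dot-product self-attention. It assumes identity query/key/value projections (real Transformers learn these) and uses made-up 3-dimensional embeddings for "Paris is capital of France"; the name `self_attention` is illustrative. Nothing in the input marks Paris and France as related, so any relation has to surface in the attention scores.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> Tuple[List[Vector], List[List[float]]]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections.

    Every token (as query) scores every token (as key), itself included. No graph
    is supplied: any relation between tokens has to show up in these scores.
    """
    d = len(tokens[0])
    weights: List[List[float]] = []
    outputs: List[Vector] = []
    for q in tokens:
        row = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
        weights.append(row)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([sum(w * v[i] for w, v in zip(row, tokens)) for i in range(d)])
    return outputs, weights


# Hypothetical 3-d embeddings for "Paris is capital of France".
words = ["Paris", "is", "capital", "of", "France"]
embeddings = [
    [1.0, 0.0, 1.0],
    [0.0, 0.1, 0.0],
    [0.2, 1.0, 0.2],
    [0.0, 0.1, 0.0],
    [1.0, 0.0, 0.9],
]
outputs, weights = self_attention(embeddings)
for word, row in zip(words, weights):
    print(f"{word:>8}: " + " ".join(f"{w:.2f}" for w in row))
```

Each row of `weights` sums to 1 and covers all five tokens, which is where the n-by-n cost of attending to everything comes from.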

Graph Neural Net. Nodes pass messages to their neighbors, and relationships are edges with scalar weights. Edges are passive conduits: they carry signal from node to node but cannot reason about what kind of relationship they represent.

[Figure: Graph Neural Net view. Legend: concept node; strong edge; weak edge.]
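The scalar-edge limitation shows up in a generic message-passing step. This sketch is not any particular GNN library's API; `message_passing_step`, the sum aggregation, the self-loop, and the ReLU are assumptions chosen for brevity. The toy edges stand for "capital of" and "borders", but that distinction lives only in the comments: the update sees two weights of 1.0.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def message_passing_step(
    features: Dict[str, Vector],
    edges: List[Tuple[str, str, float]],
) -> Dict[str, Vector]:
    """One round of weighted-sum message passing over scalar-weighted edges.

    Each edge (src, dst, w) sends w * features[src] to dst. The edge itself has
    no vector of its own: w is all the model knows about the relationship.
    """
    dim = len(next(iter(features.values())))
    incoming = {name: [0.0] * dim for name in features}
    for src, dst, w in edges:
        for i in range(dim):
            incoming[dst][i] += w * features[src][i]
    # Self-loop (keep own features) plus summed messages, then ReLU.
    return {
        name: [max(0.0, features[name][i] + incoming[name][i]) for i in range(dim)]
        for name in features
    }


features = {"Paris": [1.0, 0.0], "France": [0.0, 1.0], "Germany": [0.5, 0.5]}
edges = [
    ("Paris", "France", 1.0),    # "capital of" (label exists only in this comment)
    ("Germany", "France", 1.0),  # "borders"
]
updated = message_passing_step(features, edges)
print(updated)
```

France's new vector would be identical if the two relation labels were swapped, because each edge contributes only its weight.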

DELTA promotes edges to first-class citizens. Nodes and edges both carry rich representations and attend to each other in two parallel streams. Edge-to-edge attention enables reasoning about relationships between relationships, which is the key to compositional inference.

[Figure: DELTA view. Legend: concept node (+ memory); rich edge node (novel); edge-to-edge attention (novel); typed relationship.]
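A toy version of the dual-stream idea follows, assuming a single shared vector dimension for nodes and edges, identity projections, and residual sums. It is a hypothetical sketch, not DELTA's implementation: `dual_stream_step`, the node-to-incident-edge attention, and the endpoint-mean term are illustrative choices, and node memory is omitted. Both streams read the same old state, so they can be computed in parallel.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
EdgeKey = Tuple[str, str]  # (source concept, target concept)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, items: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over items (keys = values)."""
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in items]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e / total * v[i] for e, v in zip(exps, items)) for i in range(len(query))]


def dual_stream_step(
    nodes: Dict[str, Vector], edges: Dict[EdgeKey, Vector]
) -> Tuple[Dict[str, Vector], Dict[EdgeKey, Vector]]:
    """One hypothetical dual-stream layer; both streams read the same old state."""
    # Node stream: each concept node attends over the vectors of its incident edges.
    new_nodes: Dict[str, Vector] = {}
    for name, h in nodes.items():
        incident = [e for (s, t), e in edges.items() if name in (s, t)]
        update = attend(h, incident) if incident else [0.0] * len(h)
        new_nodes[name] = [x + u for x, u in zip(h, update)]

    # Edge stream: each edge attends over every edge (edge-to-edge attention),
    # then mixes in the mean of its two (old) endpoint node vectors.
    all_edges = list(edges.values())
    new_edges: Dict[EdgeKey, Vector] = {}
    for (s, t), e in edges.items():
        relational = attend(e, all_edges)
        endpoints = [(a + b) / 2 for a, b in zip(nodes[s], nodes[t])]
        new_edges[(s, t)] = [x + r + p for x, r, p in zip(e, relational, endpoints)]
    return new_nodes, new_edges


nodes = {
    "Paris": [1.0, 0.0, 0.0],
    "France": [0.0, 1.0, 0.0],
    "Berlin": [1.0, 0.0, 0.1],
    "Germany": [0.0, 1.0, 0.1],
    "Rome": [0.5, 0.5, 0.5],  # no incident edges in this toy graph
}
edges = {
    ("Paris", "France"): [0.0, 0.0, 1.0],
    ("Berlin", "Germany"): [0.0, 0.1, 1.0],
}
new_nodes, new_edges = dual_stream_step(nodes, edges)
print(new_edges)
```

Unlike the scalar-edge step, every edge here keeps and updates its own vector, which is what lets one edge attend to another.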

Why this matters: the "capital of" edge Paris→France can attend to the "capital of" edge Berlin→Germany and recognize that both encode the same relationship type. That is structural analogy, and it is impossible in a GNN whose edges are just scalars. At 80% feature noise, this mechanism gives DELTA a +24% accuracy advantage over standard GNNs (Phase 28).
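The Paris→France / Berlin→Germany analogy can be sketched by computing edge-to-edge attention weights directly. The edge vectors below are hand-picked for illustration, not learned, and `edge_to_edge_weights` is a hypothetical helper; the sketch shows the mechanism only and does not reproduce the Phase 28 result. When every edge is reduced to a scalar weight of 1.0, the query edge has nothing that distinguishes the other "capital of" edge from the "borders" edge.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
EdgeKey = Tuple[str, str]


def edge_to_edge_weights(query: EdgeKey, edges: Dict[EdgeKey, Vector]) -> Dict[EdgeKey, float]:
    """Softmax attention weights from one edge over every *other* edge."""
    q = edges[query]
    others = [k for k in edges if k != query]
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, edges[k])) / scale for k in others]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {k: e / total for k, e in zip(others, exps)}


# Hypothetical rich edge vectors; dims loosely read as [capital-of, borders, other].
rich = {
    ("Paris", "France"): [0.9, 0.1, 0.0],    # capital of
    ("Berlin", "Germany"): [0.8, 0.0, 0.2],  # capital of
    ("France", "Germany"): [0.1, 0.9, 0.0],  # borders
}
# The same edges as a scalar-weight GNN sees them: one weight each, all 1.0.
scalar = {k: [1.0] for k in rich}

rich_w = edge_to_edge_weights(("Paris", "France"), rich)
scalar_w = edge_to_edge_weights(("Paris", "France"), scalar)
print(rich_w)
print(scalar_w)
```

With rich vectors, Paris→France puts more weight on Berlin→Germany than on France→Germany; with equal scalar weights the split is exactly even.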

The diagram shows how each paradigm handles the relationship "Paris is capital of France". In the DELTA view, the red dashed arrow marks the key innovation: edge-to-edge attention.

See Architecture Overview for full component descriptions. See Key Findings for the experimental evidence behind each claim.