Why DELTA? — Transformer vs Graph Neural Net vs DELTA

Three paradigms for relational reasoning — click through the tabs to see the core architectural difference.

Transformer: tokens in a flat sequence. Every token attends to every other — attention is the only mechanism by which relationships are discovered, so the model must reconstruct relational structure from sequential position alone.

[Diagram: token embeddings for "Paris is capital of France" flow through multi-head self-attention (O(N²)) and a feed-forward layer to the output, which must reconstruct the relationships. Legend: query token; high attention weight; low attention / neutral.]
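The all-pairs attention step can be sketched in a few lines of NumPy. The (N, N) score matrix is exactly the O(N²) term in the diagram: every token scores every other token, and relational structure has to emerge from those pairwise weights alone. This is a generic single-head sketch, not any particular model's implementation:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a flat token
    sequence. The (N, N) score matrix is why the cost is O(N^2)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (N, N) pairwise scores
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax over tokens
    return w @ V                                     # each token mixes all others

rng = np.random.default_rng(0)
N, d = 5, 8                              # 5 tokens, e.g. "Paris is capital of France"
X = rng.normal(size=(N, d))              # token embeddings
W = [rng.normal(size=(d, d)) for _ in range(3)]
out = self_attention(X, *W)              # shape (5, 8)
```

Relationships like "capital of" are never represented explicitly here; they exist only implicitly in the learned attention weights.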

Graph neural net: nodes pass messages to their neighbors, and relationships are edges with scalar weights. Edges are passive conduits — they carry signal from node to node, but can't reason about what kind of relationship they represent.

[Diagram: Paris, France, Berlin, and Germany nodes connected by edges with scalar weights (0.91, 0.44, 0.12). Edges are passive — scalar weights only, no reasoning. Legend: concept node; strong edge; weak edge.]
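A minimal sketch of one such message-passing step, assuming the simplest possible aggregation (a weighted sum of neighbor features). The point is that each edge contributes only a scalar `w`: it can amplify or attenuate a message, but it has no representation of its own for the model to reason over:

```python
import numpy as np

def gnn_layer(H, edges):
    """One message-passing step. Each edge is (src, dst, weight): a scalar
    conduit that scales the source node's features but carries no state of
    its own -- it cannot represent *what kind* of relation it is."""
    out = H.copy()
    for src, dst, w in edges:
        out[dst] += w * H[src]         # message = scalar weight * neighbor features
    return out

H = np.eye(4)                          # 4 concept nodes: Paris, France, Berlin, Germany
edges = [(0, 1, 0.91),                 # Paris  -> France  (strong edge)
         (2, 3, 0.44),                 # Berlin -> Germany
         (0, 3, 0.12)]                 # weak edge
H1 = gnn_layer(H, edges)               # France's row now mixes in 0.91 * Paris
```

Whether 0.91 means "capital of", "located in", or "similar to" is invisible to the network — every relation type collapses to the same kind of number.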

DELTA promotes edges to first-class citizens. Nodes and edges both carry rich representations and attend to each other in parallel dual streams. Edge-to-edge attention enables reasoning about relationships between relationships — the key to compositional inference.

[Diagram: parallel dual attention — nodes and edges attended simultaneously. The node attention stream holds Paris, France, Berlin, and Germany nodes, each with memory; the edge attention stream holds rich "capital of" and "located in" edge nodes linked by edge→edge attention. An importance router scores nodes and edges, guides sparse attention, and assigns them to a hot tier (active), warm tier (compressed), or cold tier (archived). Legend: concept node (+ memory); rich edge node (novel); edge-to-edge attention (novel); typed relationship.]
Why this matters: The "capital of" edge between Paris→France can attend to the "capital of" edge between Berlin→Germany and recognize they encode the same relationship type. That's structural analogy — impossible in a GNN where edges are just scalars. At 80% feature noise, this mechanism gives DELTA a +24% accuracy advantage over standard GNNs (Phase 28).
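The structural-analogy mechanism can be illustrated with a toy edge-to-edge attention step. Everything below is hypothetical — the edge vectors and the dot-product scoring are illustrative assumptions, not DELTA's actual parameterization — but it shows the principle: when edges carry full embedding vectors, two edges encoding the same relation type attend strongly to each other:

```python
import numpy as np

def edge_to_edge_attention(E):
    """Edges carry full embedding vectors and attend to *each other*.
    Illustrative sketch only: plain dot-product scoring plus softmax,
    not DELTA's actual parameterization."""
    scores = E @ E.T / np.sqrt(E.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

# Hypothetical relation-type directions in edge-embedding space.
capital_of = np.array([1.0, 0.0, 0.2])
located_in = np.array([0.0, 1.0, 0.1])

E = np.stack([capital_of + 0.05,       # Paris  -> France  ("capital of")
              capital_of - 0.05,       # Berlin -> Germany ("capital of")
              located_in])             # a "located in" edge, for contrast
A = edge_to_edge_attention(E)
# A[0, 1] > A[0, 2]: the Paris->France edge attends more strongly to the
# Berlin->Germany edge than to the unrelated "located in" edge.
```

With scalar-weight edges, `E` would be three numbers and this comparison could not even be expressed.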

Click through all three tabs to see how each paradigm handles the "Paris is capital of France" relationship differently. The red dashed arrow in the DELTA tab is the key innovation: edge-to-edge attention enables reasoning about relationships between relationships.

See Architecture Overview for full component descriptions. See Key Findings for the experimental evidence behind each claim.