Why DELTA? Transformer vs Graph Neural Net vs DELTA
Three paradigms for relational reasoning, compared by their core architectural difference.
Transformer: tokens sit in a flat sequence. Every token attends to every other token, and attention is the only way relationships are discovered. The model must reconstruct relational structure from sequential position alone.
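To make the "every token attends to every other" point concrete, here is a minimal sketch of single-head dot-product self-attention in plain Python. It assumes identity query/key/value projections and made-up toy embeddings for "Paris", "capital", and "France"; a real transformer learns its projections and adds positional information, so treat this as an illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head dot-product attention with identity Q/K/V (an assumption).

    tokens: list of equal-length float vectors, one per sequence position.
    Returns (outputs, weights); weights[i][j] is how much token i attends to token j.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Score the query against every token in the sequence, itself included.
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * tokens[j][d] for j in range(len(tokens)))
                        for d in range(dim)])
    return outputs, weights

# Hypothetical toy embeddings for "Paris", "capital", "France".
tokens = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [1.0, 1.0, 0.0]]
outputs, weights = self_attention(tokens)
```

The weight matrix is dense: no entry is structurally zero, so nothing in the input says which token pairs are actually related. Any relational structure has to emerge from the learned attention scores.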
Graph Neural Net: nodes pass messages to their neighbors. Relationships are edges with scalar weights. Edges are passive conduits: they carry signal from node to node but cannot reason about what kind of relationship they represent.
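The "passive conduit" role of an edge can be sketched as one round of weighted-sum message passing. This is a simplified illustration, not any particular GNN library's API; the function name, the feature values, and the weight 0.5 are all hypothetical.

```python
def message_passing_step(node_feats, edges):
    """One round of weighted-sum message passing.

    node_feats: dict mapping node name -> list of floats
    edges: list of (src, dst, weight); the scalar weight is all an edge carries
    Returns a new dict: each node's own features plus its incoming messages.
    """
    updated = {node: list(feats) for node, feats in node_feats.items()}
    for src, dst, weight in edges:
        # The edge only scales the sender's features; it has no state of its own.
        for d, value in enumerate(node_feats[src]):
            updated[dst][d] += weight * value
    return updated

node_feats = {"Paris": [1.0, 0.0], "France": [0.0, 1.0]}
edges = [("Paris", "France", 0.5)]  # "capital of" is reduced to the number 0.5
updated = message_passing_step(node_feats, edges)
print(updated["France"])  # [0.5, 1.0]
```

Note that nothing in the edge tuple distinguishes "capital of" from any other relation with the same weight, which is the limitation described above.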
DELTA: edges are promoted to first-class citizens. Nodes and edges both carry rich representations and attend to each other in parallel dual streams. Edge-to-edge attention enables reasoning about relationships between relationships, which is the key to compositional inference.
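As a rough illustration of edge-to-edge attention, the sketch below gives every edge its own feature vector and lets each edge attend to the edges it shares a node with. This is not DELTA's actual implementation: the shared-endpoint neighborhood rule, the identity projections, the function name, and the toy edges are all assumptions made for the example, and the node stream of the dual-stream design is omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def edge_to_edge_attention(edge_feats, endpoints):
    """Each edge attends over edges that share an endpoint with it.

    edge_feats: dict mapping edge name -> feature vector (list of floats)
    endpoints:  dict mapping edge name -> (src, dst) node names
    Returns (outputs, attended); attended[e] maps each attended edge to its weight.
    """
    dim = len(next(iter(edge_feats.values())))
    outputs, attended = {}, {}
    for name, query in edge_feats.items():
        # Assumption: neighborhood = the edge itself plus edges sharing a node.
        neighbors = [other for other in edge_feats
                     if set(endpoints[other]) & set(endpoints[name])]
        scores = [sum(a * b for a, b in zip(query, edge_feats[other])) / math.sqrt(dim)
                  for other in neighbors]
        weights = softmax(scores)
        outputs[name] = [sum(w * edge_feats[other][d]
                             for w, other in zip(weights, neighbors))
                         for d in range(dim)]
        attended[name] = dict(zip(neighbors, weights))
    return outputs, attended

# Hypothetical toy graph: two connected relations and one unrelated relation.
endpoints = {
    "capital_of": ("Paris", "France"),
    "part_of": ("France", "Europe"),
    "flows_through": ("Nile", "Egypt"),
}
edge_feats = {
    "capital_of": [1.0, 0.0],
    "part_of": [0.5, 0.5],
    "flows_through": [0.0, 1.0],
}
outputs, attended = edge_to_edge_attention(edge_feats, endpoints)
```

Here `capital_of` attends to `part_of` because both touch "France", so the updated `capital_of` representation mixes in information about a neighboring relationship. `flows_through` shares no node with the others and attends only to itself.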
Each paradigm handles the "Paris is capital of France" relationship differently. In the DELTA diagram, the red dashed arrow marks the key innovation: edge-to-edge attention.
See Architecture Overview for full component descriptions. See Key Findings for the experimental evidence behind each claim.