
geon-decoder

GEON: Structure-first decoding via equivalence classes and field closure


README

GEON: Structure-First Decoding

A structural decoding layer for language models. LLMs guess tokens.
GEON enforces structure.

Resolve structure first, then select tokens.


🔥 Demo (why this matters)


🚀 Run the benchmark

This runs a 10-task code generation benchmark comparing:

  • Baseline token decoding
  • GEON (structure-first decoding)

Reproduce the benchmark results locally:

python geon_eval_harness_v2.py

Baseline LLM:

next token ← p(token | context)

GEON:

structure → valid options → token selection
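The difference between the two pipelines can be sketched in a few lines of Python. Everything below (the toy score table and the `admissible` rule) is illustrative, not GEON's actual machinery:

```python
# Minimal sketch: plain argmax decoding vs. structure-first selection.
# The admissibility rule here is a toy bracket-balance check.

def admissible(prefix: str, token: str) -> bool:
    """Toy structural rule: ')' is only admissible if a '(' is open."""
    open_parens = prefix.count("(") - prefix.count(")")
    if token == ")":
        return open_parens > 0
    return True

def decode_step(prefix, probs):
    """probs maps token -> p(token | prefix)."""
    # Baseline: argmax over the raw distribution.
    baseline = max(probs, key=probs.get)
    # Structure-first: restrict to admissible tokens, then argmax.
    valid = {t: p for t, p in probs.items() if admissible(prefix, t)}
    structural = max(valid, key=valid.get)
    return baseline, structural

# ')' scores highest, but no '(' is open, so it is structurally invalid.
baseline, structural = decode_step("a + b", {")": 0.6, "+": 0.3, "a": 0.1})
print(baseline, structural)  # prints: ) +
```

The baseline happily emits the high-probability but invalid `)`; the structure-first path masks it out before selection.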


Input:

def is_prime(n):
    if n < 2:

Baseline LLM:

return True  ❌

GEON (Structure-First Decoding):

return False  ✅

Input:

def add(a, b):
    return (a + b

Baseline LLM:

return (a + b  ❌

GEON (Structure-First Decoding):

return (a + b)  ✅
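The closure behavior in this example can be illustrated with a small sketch; the bracket-counting helpers below are hypothetical stand-ins for GEON's constraint logic, not its implementation:

```python
# Toy version of syntactic closure: count unmatched '(' and force them closed.

def unclosed(prefix: str) -> int:
    """Number of '(' in `prefix` not yet matched by ')'."""
    depth = 0
    for ch in prefix:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
    return depth

def close_brackets(prefix: str) -> str:
    """Append the ')' characters needed to restore syntactic closure."""
    return prefix + ")" * unclosed(prefix)

print(close_brackets("return (a + b"))  # prints: return (a + b)
```

A decoder with this check available can refuse to end a line while `unclosed(...)` is nonzero, which is exactly the failure the baseline exhibits above.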

Demo output

[GEON demo screenshot]


Benchmark Snapshot

We first evaluated GEON against the baseline on a small Python code-generation harness with three tasks:

  • factorial
  • sum_list
  • max_element

Results

Method        Syntax/load pass rate   Semantic pass rate   Canonical pattern rate
Baseline LLM  100.0%                  3.3%                 70.0%
GEON          100.0%                  100.0%               100.0%

Interpretation

Both methods produced syntactically valid Python.

The difference appears at the structural and semantic level:

  • Baseline LLM often generates plausible but incorrect programs
  • GEON restricts generation to structurally admissible continuations

This improves not just syntax, but functional correctness.

Harness output

[GEON demo screenshot]

Benchmark (10-task harness)

We evaluated GEON against a baseline token-selection approach on 10 Python code generation tasks:

  • factorial
  • sum_list
  • max_element
  • count_vowels
  • reverse_string
  • is_even
  • is_sorted
  • count_positive
  • first_char
  • square_list

Results

Method        Syntax/load pass rate   Semantic pass rate   Canonical pattern rate
Baseline LLM  100.0%                  13.3%                90.0%
GEON          100.0%                  100.0%               100.0%

Per-task semantic pass rates (%)

Task            Baseline   GEON
factorial       0.0        100.0
sum_list        0.0        100.0
max_element     0.0        100.0
count_vowels    0.0        100.0
reverse_string  33.3       100.0
is_even         66.7       100.0
is_sorted       0.0        100.0
count_positive  33.3       100.0
first_char      0.0        100.0
square_list     0.0        100.0

Interpretation

Both methods produce syntactically valid Python.

However:

  • The baseline often generates plausible but incorrect programs
  • GEON restricts generation to structurally admissible continuations

This leads to consistent semantic correctness across all tasks.

GEON enforces structure before token selection.

How GEON Works (intuition)

Standard decoding selects the next token by probability alone: argmax p(token | context)

GEON changes this process:

  1. Tokens are mapped into equivalence classes (ECs)
  2. ECs are evaluated under structural constraints (S1 field)
  3. Only structurally admissible tokens are allowed
  4. Sampling happens within this reduced set

This enforces:

  • syntactic closure (e.g. parentheses must match)
  • logical consistency (e.g. conditional branches align)
  • structural validity before generation
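The four steps above can be sketched with toy equivalence classes and a bracket-stack constraint. The class names and the admissibility rule below are illustrative assumptions, not GEON's actual ECs or S1 field:

```python
# Toy structure-first pipeline: map tokens to classes, filter classes under
# a structural constraint, then select within the admissible set.

EC = {"(": "OPEN", ")": "CLOSE", "x": "ATOM", "+": "OP"}  # 1. token -> class

def admissible_classes(stack):
    """2.-3. Evaluate classes under a bracket-stack constraint."""
    allowed = {"OPEN", "ATOM", "OP"}
    if stack:                      # a '(' is open, so closing is admissible
        allowed.add("CLOSE")
    return allowed

def select(prefix_stack, scored_tokens):
    """4. Pick the highest-scoring token within the admissible set."""
    ok = admissible_classes(prefix_stack)
    candidates = {t: s for t, s in scored_tokens.items() if EC[t] in ok}
    return max(candidates, key=candidates.get)

# With no open bracket, ')' is masked even though the model scores it highest.
print(select([], {")": 0.7, "x": 0.2, "(": 0.1}))  # prints: x
```

Filtering at the class level rather than per token is what keeps the admissible set cheap to compute: one constraint check covers every token in the class.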

Why This Matters

Most LLM errors stem not from a lack of knowledge but from a lack of structure:

  • incomplete code
  • invalid syntax
  • inconsistent logic

GEON addresses this at the decoding level.

Instead of correcting outputs after generation, it prevents invalid outputs from being produced.


Key Idea

GEON does not ask:

"what token is most likely?"

It asks:

"what tokens are structurally valid?"

Then selects among them.

Comparison

Property                         Standard LLM   GEON
Guaranteed syntax validity       ❌             ✅
Guaranteed logical consistency   ❌             ✅
Structural guarantees            ❌             ✅

Why “GEON”?

The name GEON is inspired by the concept of geons introduced by John Archibald Wheeler.

Wheeler used “geons” to describe structures that are defined not by material substance, but by the relationships and constraints that hold them together.

GEON applies a similar idea to generated sequences such as code and structured language:

  • meaning is not just in tokens
  • it emerges from structure and admissibility
  • validity is defined by constraints, not probability

In this sense, GEON treats sequence generation as a structural process rather than a purely probabilistic one.

Release History

Version          Changes                              Urgency   Date
main@2026-04-11  Latest activity on main branch       High      4/11/2026
0.0.0            No release found — using repo HEAD   High      4/8/2026
