The v2.0 release transforms Ora from a pipeline-based NL2SQL system into a fully agentic runtime
that reasons about its own work, learns from every interaction, and gets measurably smarter over time.
The v1.0 multi-node pipeline (12 nodes, conditional routing) is replaced by a single ReAct orchestrator that thinks, delegates, validates, and re-routes in one loop.
The orchestrator validates its own work at four checkpoints:
After decomposition: are all entity groups covered? If the query says "ALL ASEAN" but only 2 countries resolve, retry with feedback.
After schema pruning: does the schema support the query? A missing time dimension for a trend query is flagged as a data gap.
After SQL generation: do filter values exist in the actual schema? Non-existent columns are stripped with a warning.
After execution: semantic fitness. Does the SQL plus result actually answer the original question? Not just "rows > 0" but "are the right dimensions, entities, and metrics present?"
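The retry-with-feedback pattern behind these checkpoints can be sketched as follows. All names here (`CheckResult`, `coverage_check`, `run_with_retries`) are illustrative, not Ora's actual API; only the behavior (validate, then re-route with a specific hint) comes from the text above.

```python
# Hypothetical sketch of one validation checkpoint gating a ReAct-style loop.
from dataclasses import dataclass
from typing import Callable

@dataclass
class CheckResult:
    ok: bool
    feedback: str = ""  # fix hint fed back to the orchestrator on retry

def coverage_check(requested: set[str], resolved: set[str]) -> CheckResult:
    """After decomposition: are all entity groups covered?"""
    missing = requested - resolved
    if missing:
        return CheckResult(False, f"unresolved entities: {sorted(missing)}")
    return CheckResult(True)

def run_with_retries(step: Callable[[str], set[str]], requested: set[str],
                     max_retries: int = 2) -> set[str]:
    feedback = ""
    for _ in range(max_retries + 1):
        resolved = step(feedback)
        result = coverage_check(requested, resolved)
        if result.ok:
            return resolved
        feedback = result.feedback  # re-route with the specific hint
    return resolved
```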
Semantic Layer Evolution
The semantic layer evolves through 4 layers, each building on the previous:
Layers 1 and 2 (Foundation and Inferred): entity aliases, value mappings (e.g., "friday" -> 5), and cross-source joins detected via column name/type matching.
Layer 3: Confirmed from successful queries
evolve_semantic_layer() runs after every successful query. It saves aliases (+3% confidence per confirmation), relationships (with query counts), filter patterns (auto-injected after 3+ uses), and column enrichments.
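The bookkeeping can be modeled roughly like this. The dict fields and helper names are assumptions, not the actual evolve_semantic_layer() internals; the numbers (+0.03 per confirmation, auto-inject at 3+ uses) are the ones stated above.

```python
# Hypothetical model of the post-query evolution step.
def confirm_alias(alias: dict) -> dict:
    """Bump a learned alias toward certainty (+3% per confirmation), capped at 1.0."""
    alias["confidence"] = min(1.0, alias["confidence"] + 0.03)
    alias["confirmations"] = alias.get("confirmations", 0) + 1
    return alias

def confirm_filter_pattern(pattern: dict) -> dict:
    """Count a filter use; after 3+ uses it becomes auto-injected."""
    pattern["uses"] = pattern.get("uses", 0) + 1
    pattern["auto_inject"] = pattern["uses"] >= 3
    return pattern
```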
Layer 4: Corrected from user feedback
Structured rules created from corrections with confidence lifecycle. Resolution failures recorded as anti-patterns so the agent avoids repeating mistakes.
Semantic Agent Reasoning Loop
Pre-check — high-confidence aliases (>=0.93) resolved deterministically. Known patterns and column enrichments injected. Past failures loaded as anti-patterns.
LLM reasoning — entity mapping with full schema context, column meanings, and learned vocabulary.
Schema search — for unresolved entities, targeted DB lookups across text columns.
Refinement — merge findings, update confidence, save resolution log entry.
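The four phases above can be sketched as a single skeleton. The function name, the workspace dict shape, and the stubbed phases are assumptions, not the Semantic Agent's real interface; only the >= 0.93 deterministic pre-check threshold is from the text.

```python
# Illustrative skeleton of the four-phase resolution loop.
def resolve_entities(query: str, workspace: dict) -> dict:
    resolved, unresolved = {}, []

    # 1. Pre-check: high-confidence aliases (>= 0.93) resolve without an LLM call.
    for term, alias in workspace.get("aliases", {}).items():
        if alias["confidence"] >= 0.93 and term in query.lower():
            resolved[term] = alias["value"]

    # 2. LLM reasoning over full schema context and learned vocabulary (stubbed).
    # 3. Schema search: targeted DB lookups across text columns for the rest (stubbed).
    # 4. Refinement: merge findings, update confidence, save a resolution log entry.
    return {"resolved": resolved, "unresolved": unresolved}
```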
Persistence
All semantic layer state persisted to ~/.sqlagent/uploads/{workspace_id}/:
```python
from sqlagent.rules import load_rules, create_rule, record_rule_outcome

rules = load_rules(workspace_id)  # sorted by confidence * log(hit_count)
create_rule(workspace_id, text="DuckDB doesn't support YEAR()", source="user_correction")
record_rule_outcome(workspace_id, rule_ids, succeeded=True)  # +0.05 confidence
```
Rule lifecycle
Created from user correction (0.9) or pattern detection (0.7)
Applied — top 5 rules injected into SQL generation prompt
Confirmed — query succeeds with rule: +0.05 confidence
Weakened — query fails with rule: -0.10 confidence
Expired — confidence below 0.30: rule deactivated
Semantic Fitness Check
After SQL executes successfully (rows > 0), Ora asks the LLM: "Does this SQL + result actually answer the original question?"
The fitness check catches:
UNION without grouping columns (can't do trends with no time axis)
Split comparative queries (entities being compared end up in separate queries)
Missing correlation analysis (user asked for correlation but got raw numbers)
Dropped decomposition parts (query had 4 parts but SQL only addresses 2)
If not fit, Ora re-routes with the specific fix hint from the LLM.
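A minimal sketch of such a gate: `ask_llm` is a stand-in for whatever LLM client Ora actually uses, and the prompt wording and JSON verdict shape are illustrative assumptions, not Ora's real prompt.

```python
import json

# Hypothetical semantic-fitness gate run after a "successful" execution.
FITNESS_PROMPT = """Question: {question}
SQL: {sql}
Result sample: {rows}
Does this SQL and result actually answer the question? Check that the right
dimensions, entities, and metrics are present, not just that rows were
returned. Reply as JSON: {{"fit": true/false, "fix_hint": "..."}}"""

def fitness_check(ask_llm, question: str, sql: str, rows: list) -> tuple[bool, str]:
    reply = ask_llm(FITNESS_PROMPT.format(question=question, sql=sql, rows=rows[:5]))
    verdict = json.loads(reply)
    # On "not fit", the caller re-routes generation using the fix hint.
    return verdict["fit"], verdict.get("fix_hint", "")
```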
REST API Connectors
New framework: sqlagent/connectors/rest_connector.py
```python
from sqlagent.connectors.catalog.shopify import ShopifyConnector

conn = ShopifyConnector(source_id="shop", store_name="mystore", api_key="shpat_xxx")
await conn.connect()  # pulls data into DuckDB
result = await conn.execute("SELECT * FROM orders LIMIT 10")
```
The RestConnector base class handles OAuth2 (with refresh), API key, Bearer token, Basic auth, cursor/offset/link-header pagination, and token-bucket rate limiting.
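Of those pieces, token-bucket rate limiting is the easiest to show standalone. This is an illustrative sketch of the pattern, not RestConnector's actual limiter: tokens refill continuously at `rate` per second up to `capacity`, and a request proceeds only if it can spend a token.

```python
import time

# Minimal token-bucket rate limiter (illustrative, synchronous version).
class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def acquire(self, n: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False  # caller should sleep and retry
```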
Knowledge Page
The Knowledge page is the Semantic Agent's working memory — three tabs:
Graph — force-directed semantic graph with table nodes, relationship edges, confidence halos, and learned term annotations
Taxonomy — 4-layer knowledge feed (Foundation, Inferred, Confirmed, Corrected) with expandable entries showing exact items learned per query
Agent — conversational chat with the Semantic Agent. Full context: aliases, patterns, rules, relationships, evolution history. Conversation memory across messages. Typewriter streaming.
Learning System
How learning improves queries
Training pairs — every thumbs-up/correction saved to Qdrant (persistent on disk). Retrieved via cosine similarity for few-shot prompting.
Context rules — extracted from corrections, injected into every SQL generation prompt.
Entity aliases — learned mappings (e.g., "friday" -> 5) pre-resolved before LLM call at >=93% confidence.
Filter patterns — common filters (e.g., sex='Total') auto-injected after 3+ confirmations.
Failure reflection — past resolution failures loaded as anti-patterns.
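The training-pair retrieval step is plain nearest-neighbor search. The real system stores embeddings in Qdrant; this self-contained sketch shows the same cosine-similarity ranking over an in-memory list, with illustrative names throughout.

```python
import math

# Illustrative few-shot retrieval by cosine similarity (Qdrant does this at scale).
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k_examples(query_vec, training_pairs, k=3):
    """training_pairs: [(embedding, (question, sql)), ...] -> k most similar pairs."""
    ranked = sorted(training_pairs, key=lambda p: cosine(query_vec, p[0]), reverse=True)
    return [pair for _, pair in ranked[:k]]
```

The retrieved (question, SQL) pairs are then formatted as few-shot examples in the generation prompt.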
Observable in traces
Every query trace shows an "Applied learned context" node: