The v2.0 release transforms Ora from a pipeline-based NL2SQL system into a fully agentic runtime
that reasons about its own work, learns from every interaction, and gets measurably smarter over time.
The v1.0 multi-node pipeline (12 nodes, conditional routing) is replaced by a single ReAct orchestrator that thinks, delegates, validates, and re-routes in one loop.
The orchestrator validates its own work at four checkpoints:
After decomposition: are all entity groups covered? If the query says "ALL ASEAN" but only 2 countries resolve, retry with feedback.
After schema pruning: does the schema support the query? A missing time dimension for a trend query is flagged as a data gap.
After SQL generation: do filter values exist in the actual schema? Non-existent columns are stripped with a warning.
After execution: semantic fitness. Does the SQL plus result actually answer the original question? Not just "rows > 0" but "are the right dimensions, entities, and metrics present?"
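The retry-with-feedback pattern behind these checkpoints can be sketched as follows. All names here (`CheckResult`, `coverage_check`, `run_with_retries`) are illustrative, not Ora's actual API; only the behavior (validate, then re-route with a specific hint) comes from the text above.

```python
# Hypothetical sketch of one validation checkpoint gating a ReAct-style loop.
from dataclasses import dataclass
from typing import Callable

@dataclass
class CheckResult:
    ok: bool
    feedback: str = ""  # fix hint fed back to the orchestrator on retry

def coverage_check(requested: set[str], resolved: set[str]) -> CheckResult:
    """After decomposition: are all entity groups covered?"""
    missing = requested - resolved
    if missing:
        return CheckResult(False, f"unresolved entities: {sorted(missing)}")
    return CheckResult(True)

def run_with_retries(step: Callable[[str], set[str]], requested: set[str],
                     max_retries: int = 2) -> set[str]:
    feedback = ""
    for _ in range(max_retries + 1):
        resolved = step(feedback)
        result = coverage_check(requested, resolved)
        if result.ok:
            return resolved
        feedback = result.feedback  # re-route with the specific hint
    return resolved
```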
Semantic Layer Evolution
The semantic layer evolves through 4 layers, each building on the previous:
Layers 1 and 2 (Foundation and Inferred): entity aliases, value mappings (e.g., "friday" -> 5), and cross-source joins detected via column name/type matching.
Layer 3: Confirmed from successful queries
evolve_semantic_layer() runs after every successful query. It saves aliases (+3% confidence per confirmation), relationships (with query counts), filter patterns (auto-injected after 3+ uses), and column enrichments.
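The bookkeeping can be modeled roughly like this. The dict fields and helper names are assumptions, not the actual evolve_semantic_layer() internals; the numbers (+0.03 per confirmation, auto-inject at 3+ uses) are the ones stated above.

```python
# Hypothetical model of the post-query evolution step.
def confirm_alias(alias: dict) -> dict:
    """Bump a learned alias toward certainty (+3% per confirmation), capped at 1.0."""
    alias["confidence"] = min(1.0, alias["confidence"] + 0.03)
    alias["confirmations"] = alias.get("confirmations", 0) + 1
    return alias

def confirm_filter_pattern(pattern: dict) -> dict:
    """Count a filter use; after 3+ uses it becomes auto-injected."""
    pattern["uses"] = pattern.get("uses", 0) + 1
    pattern["auto_inject"] = pattern["uses"] >= 3
    return pattern
```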
Layer 4: Corrected from user feedback
Structured rules created from corrections with confidence lifecycle. Resolution failures recorded as anti-patterns so the agent avoids repeating mistakes.
Semantic Agent Reasoning Loop
Pre-check — high-confidence aliases (>=0.93) resolved deterministically. Known patterns and column enrichments injected. Past failures loaded as anti-patterns.
LLM reasoning — entity mapping with full schema context, column meanings, and learned vocabulary.
Schema search — for unresolved entities, targeted DB lookups across text columns.
Refinement — merge findings, update confidence, save resolution log entry.
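The four phases above can be sketched as a single skeleton. The function name, the workspace dict shape, and the stubbed phases are assumptions, not the Semantic Agent's real interface; only the >= 0.93 deterministic pre-check threshold is from the text.

```python
# Illustrative skeleton of the four-phase resolution loop.
def resolve_entities(query: str, workspace: dict) -> dict:
    resolved, unresolved = {}, []

    # 1. Pre-check: high-confidence aliases (>= 0.93) resolve without an LLM call.
    for term, alias in workspace.get("aliases", {}).items():
        if alias["confidence"] >= 0.93 and term in query.lower():
            resolved[term] = alias["value"]

    # 2. LLM reasoning over full schema context and learned vocabulary (stubbed).
    # 3. Schema search: targeted DB lookups across text columns for the rest (stubbed).
    # 4. Refinement: merge findings, update confidence, save a resolution log entry.
    return {"resolved": resolved, "unresolved": unresolved}
```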
Persistence
All semantic layer state persisted to ~/.sqlagent/uploads/{workspace_id}/:
```python
from sqlagent.rules import load_rules, create_rule, record_rule_outcome

rules = load_rules(workspace_id)  # sorted by confidence * log(hit_count)
create_rule(workspace_id, text="DuckDB doesn't support YEAR()", source="user_correction")
record_rule_outcome(workspace_id, rule_ids, succeeded=True)  # +0.05 confidence
```
Rule lifecycle
Created from user correction (0.9) or pattern detection (0.7)
Applied — top 5 rules injected into SQL generation prompt
Confirmed — query succeeds with rule: +0.05 confidence
Weakened — query fails with rule: -0.10 confidence
Expired — confidence below 0.30: rule deactivated
Semantic Fitness Check
After SQL executes successfully (rows > 0), Ora asks the LLM: "Does this SQL + result actually answer the original question?"
The fitness check catches:
UNION without grouping columns (can't do trends with no time axis)
Split comparative queries (entities being compared end up in separate queries)
Missing correlation analysis (user asked for correlation but got raw numbers)
Dropped decomposition parts (query had 4 parts but SQL only addresses 2)
If not fit, Ora re-routes with the specific fix hint from the LLM.
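A minimal sketch of such a gate: `ask_llm` is a stand-in for whatever LLM client Ora actually uses, and the prompt wording and JSON verdict shape are illustrative assumptions, not Ora's real prompt.

```python
import json

# Hypothetical semantic-fitness gate run after a "successful" execution.
FITNESS_PROMPT = """Question: {question}
SQL: {sql}
Result sample: {rows}
Does this SQL and result actually answer the question? Check that the right
dimensions, entities, and metrics are present, not just that rows were
returned. Reply as JSON: {{"fit": true/false, "fix_hint": "..."}}"""

def fitness_check(ask_llm, question: str, sql: str, rows: list) -> tuple[bool, str]:
    reply = ask_llm(FITNESS_PROMPT.format(question=question, sql=sql, rows=rows[:5]))
    verdict = json.loads(reply)
    # On "not fit", the caller re-routes generation using the fix hint.
    return verdict["fit"], verdict.get("fix_hint", "")
```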
REST API Connectors
New framework: sqlagent/connectors/rest_connector.py
```python
from sqlagent.connectors.catalog.shopify import ShopifyConnector

conn = ShopifyConnector(source_id="shop", store_name="mystore", api_key="shpat_xxx")
await conn.connect()  # pulls data into DuckDB
result = await conn.execute("SELECT * FROM orders LIMIT 10")
```
The RestConnector base class handles OAuth2 (with refresh), API key, Bearer token, Basic auth, cursor/offset/link-header pagination, and token-bucket rate limiting.
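Of those pieces, token-bucket rate limiting is the easiest to show standalone. This is an illustrative sketch of the pattern, not RestConnector's actual limiter: tokens refill continuously at `rate` per second up to `capacity`, and a request proceeds only if it can spend a token.

```python
import time

# Minimal token-bucket rate limiter (illustrative, synchronous version).
class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def acquire(self, n: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False  # caller should sleep and retry
```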
Knowledge Page
The Knowledge page is the Semantic Agent's working memory — three tabs:
Graph — force-directed semantic graph with table nodes, relationship edges, confidence halos, and learned term annotations
Taxonomy — 4-layer knowledge feed (Foundation, Inferred, Confirmed, Corrected) with expandable entries showing exact items learned per query
Agent — conversational chat with the Semantic Agent. Full context: aliases, patterns, rules, relationships, evolution history. Conversation memory across messages. Typewriter streaming.
Learning System
How learning improves queries
Training pairs — every thumbs-up/correction saved to Qdrant (persistent on disk). Retrieved via cosine similarity for few-shot prompting.
Context rules — extracted from corrections, injected into every SQL generation prompt.
Entity aliases — learned mappings (e.g., "friday" -> 5) pre-resolved before LLM call at >=93% confidence.
Filter patterns — common filters (e.g., sex='Total') auto-injected after 3+ confirmations.
Failure reflection — past resolution failures loaded as anti-patterns.
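The training-pair retrieval step is plain nearest-neighbor search. The real system stores embeddings in Qdrant; this self-contained sketch shows the same cosine-similarity ranking over an in-memory list, with illustrative names throughout.

```python
import math

# Illustrative few-shot retrieval by cosine similarity (Qdrant does this at scale).
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k_examples(query_vec, training_pairs, k=3):
    """training_pairs: [(embedding, (question, sql)), ...] -> k most similar pairs."""
    ranked = sorted(training_pairs, key=lambda p: cosine(query_vec, p[0]), reverse=True)
    return [pair for _, pair in ranked[:k]]
```

The retrieved (question, SQL) pairs are then formatted as few-shot examples in the generation prompt.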
Observable in traces
Every query trace shows an "Applied learned context" node: