Executive Summary
For CTOs and Senior Data Engineers, the challenge of Agentic AI is the “Reasoning-to-Execution” gap. How do we allow users to query complex semantic data models using natural language without risking unoptimised queries or hallucinations? The answer lies in the Cube Semantic Layer. By utilising a hybrid architecture of local vector stores (powered by Ollama) and Amazon Nova Lite (via Bedrock), we built a system that translates intent into `cube.json`-compatible requests. This system ensures semantic integrity by employing a Three-Agent Governance Triad before a single API call is made.
1. The Hybrid Compute Strategy: Cloud Reasoning, Local RAG
Our architecture splits the AI workload to optimize for both cost and intelligence:
- Reasoning (Amazon Nova Lite): We utilize `strands.models.BedrockModel` to access Amazon Nova Lite. This model was selected for its strict adherence to system prompts, treating the Cube schema as a set of inviolable rules rather than suggestions.
- Embeddings (Local Ollama): We shifted our embedding strategy from cloud models to local `mxbai-embed-large` via Ollama. This decoupled our RAG costs from our reasoning costs and eliminated latency during schema retrieval.
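To make the split concrete, here is a minimal sketch of the two request shapes involved. The endpoint URL, the Bedrock model ID string, and the helper names are illustrative assumptions, not the script's actual wiring:

```python
# Hypothetical sketch of the hybrid split: embeddings stay on a local Ollama
# endpoint, reasoning goes to Bedrock. Names and IDs below are assumptions.

OLLAMA_EMBED_URL = "http://localhost:11434/api/embeddings"  # assumed local default
BEDROCK_MODEL_ID = "amazon.nova-lite-v1:0"                  # assumed Nova Lite ID

def embedding_request(text: str) -> dict:
    """Payload for Ollama's embeddings endpoint (local, no cloud cost)."""
    return {"model": "mxbai-embed-large", "prompt": text}

def reasoning_request(system_prompt: str, user_prompt: str) -> dict:
    """Payload shape for a Bedrock converse-style call (cloud reasoning)."""
    return {
        "modelId": BEDROCK_MODEL_ID,
        "system": [{"text": system_prompt}],
        "messages": [{"role": "user", "content": [{"text": user_prompt}]}],
    }
```

In the actual script the cloud half is wrapped by `strands.models.BedrockModel` rather than built by hand, but the economic point is the same: every schema-retrieval embedding is free and local, and only the final reasoning step pays cloud rates.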
2. The “Governance Triad” Architecture
Instead of a single prompt trying to do everything, `getreflect_ask_aws.py` implements a multi-agent pipeline:
- The Architect: First, this agent scans the user request and the retrieved schema to generate a Governance Policy. It outputs constraints (e.g., “Must filter by `orders.status = 'completed'`”) rather than code.
- The Compiler: This agent acts as the developer. It takes the Architect’s policy and drafts a raw JSON query.
- The Auditor: The final gatekeeper. It grades the Compiler’s work against the policy, assigning a Confidence Score (0-100). If the score drops below 95%, the system rejects the query rather than guessing.
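The triad’s control flow can be sketched with stubbed agents. The role split and the 95-point threshold come from the design above; the `Policy` dataclass, the stub bodies, and all helper names are hypothetical stand-ins for what are really three LLM calls:

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 95  # queries scoring below this are rejected, not guessed

@dataclass
class Policy:
    constraints: list  # e.g. ["orders.status = 'completed'"]

def architect(user_request: str, schema: dict) -> Policy:
    # Stub: a real Architect derives constraints from the request + schema.
    return Policy(constraints=["orders.status = 'completed'"])

def compiler(policy: Policy) -> dict:
    # Stub: a real Compiler drafts a Cube-style JSON query honoring the policy.
    return {
        "measures": ["orders.total_revenue"],
        "filters": [{"member": "orders.status", "operator": "equals",
                     "values": ["completed"]}],
    }

def auditor(query: dict, policy: Policy) -> int:
    # Stub: a real Auditor grades the query against the policy (0-100).
    return 100 if query.get("filters") else 0

def run_pipeline(user_request: str, schema: dict):
    policy = architect(user_request, schema)
    query = compiler(policy)
    score = auditor(query, policy)
    # Zero-load reasoning: nothing reaches the Cube API below the threshold.
    return query if score >= CONFIDENCE_THRESHOLD else None
```

The key property is that each stage consumes only the previous stage’s artifact (policy, then query, then score), so no single prompt has to hold the whole problem at once.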
3. The “Fuzzy” Standardization Layer
A common failure mode in Text-to-SQL is “near-miss” hallucinations (e.g., requesting revenue_total when the schema defines total_revenue).
- The Problem: LLMs often hallucinate synonyms or suffix variations.
- The Fix: We implemented a `QueryStandardization` class that acts as middleware. It uses `difflib` (fuzzy matching) and regex to intercept the AI’s JSON output. It maps every dimension and measure back to the official Member Map loaded from the Cube API, ensuring that only valid, existing identifiers are ever requested.
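The repair step can be approximated in a few lines with the standard library’s `difflib.get_close_matches`. The member names and the 0.6 cutoff below are illustrative; the actual `QueryStandardization` internals may differ:

```python
import difflib

# Official members as loaded from the Cube API (sample values).
OFFICIAL_MEMBERS = ["orders.total_revenue", "orders.count", "orders.status"]

def standardize(member: str, cutoff: float = 0.6):
    """Snap a possibly hallucinated member name to the closest official
    member, or return None so the query can be rejected outright."""
    matches = difflib.get_close_matches(member, OFFICIAL_MEMBERS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

standardize("orders.revenue_total")   # -> "orders.total_revenue"
standardize("customers.nonexistent")  # -> None
```

Note the two failure modes are handled differently: a near-miss like `revenue_total` is silently repaired, while a member with no plausible match returns `None` and the query is rejected rather than guessed at.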
4. The 5 Principles of Agentic AI Development
Our methodology was guided by AWS best practices and frameworks, adapted for a hybrid stack:
- Model Selection: Choosing Nova Lite for instruction-following over the “creative” Pro variant.
- RAG & Grounding: Using LanceDB with Ollama embeddings to ground Cloud LLM outputs in actual Cube metadata.
- Agent Specialization: Splitting the task into Architect, Compiler, and Auditor roles to enforce separation of concerns.
- Operational Efficiency: Leveraging the “Hybrid Split”—using free, local compute for embeddings and highly optimized cloud compute for reasoning.
- Zero-Load Reasoning: No query executes until it passes the Auditor’s confidence threshold and the Standardization layer’s validation.
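As a toy illustration of the grounding principle, the retrieval step reduces to nearest-neighbour search over embedded schema chunks. The production stack uses LanceDB with `mxbai-embed-large` vectors; the hand-made 3-d vectors and chunk names below exist only to show the ranking logic:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

SCHEMA_CHUNKS = {  # chunk of Cube metadata -> toy embedding (illustrative)
    "orders.total_revenue (measure)": [0.9, 0.1, 0.0],
    "orders.status (dimension)":      [0.1, 0.9, 0.0],
    "users.city (dimension)":         [0.0, 0.1, 0.9],
}

def retrieve(question_vec, k=2):
    """Return the k schema chunks closest to the question embedding."""
    ranked = sorted(SCHEMA_CHUNKS,
                    key=lambda c: cosine(question_vec, SCHEMA_CHUNKS[c]),
                    reverse=True)
    return ranked[:k]
```

Only the top-k chunks are injected into the cloud LLM’s prompt, which is what keeps Nova Lite’s answers grounded in metadata that actually exists.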
Script: `getreflect_ask_aws.py`

```python
# Implementation Highlights:
# 1. Hybrid Stack: Amazon Nova Lite (Reasoning) + Ollama mxbai (Embeddings)
# 2. Triad Architecture: Architect -> Compiler -> Auditor pipeline
# 3. Standardization Engine: Fuzzy logic repair for schema hallucinations
# 4. Strict Confidence Thresholding (95%)
# 5. Direct integration via strands.models.BedrockModel
```