AI Knowledge Chatbot (RAG) | Knowledge Base

Overview

An AI-powered chatbot embedded in the Intelligence Platform that answers questions about HPWH regulations, energy efficiency schemes, and relevant standards. The chatbot uses a Retrieval-Augmented Generation (RAG) architecture: user questions are answered by an LLM grounded in a curated document knowledge base, with source references shown for every answer. This is not a general-purpose AI assistant. It is a domain-specific tool whose value comes entirely from the quality and currency of the knowledge base behind it. The chatbot is only as good as the documents it has access to.

User Stories

As an EnergyAE consultant, I want to ask questions about VEU Activity 44 methodology in plain English and get accurate, referenced answers so I don’t have to manually search PDF documents. As a manufacturer client, I want to understand what testing and registration is required for a specific scheme so I can plan a compliance pathway without waiting for a consultant. As a user, I want to see which document and section an answer came from so I can verify it and cite it myself. As Alastair, I want to add new documents to the knowledge base and have them immediately available to the chatbot without a developer being involved.

Knowledge Base: Document Scope

The RAG system is only as useful as its source documents. The following categories should be included at launch. Alastair is responsible for sourcing and uploading documents; the developer is responsible for building the ingestion pipeline that processes them. Scheme methodology documents:

VEU Activity 44 methodology (current version)
VEU Activity 44 calculation workbook documentation
ESS HEAB method (NSW)
SRES/STC calculation rules
EECA/MBIE NZ scheme documentation

Standards

AS/NZS 4234 (current edition - licensed copy, access-controlled)
AS/NZS 5125.1
Relevant sections of NZ Building Code (G12 hot water)

Regulatory guidance

Essential Services Commission VEU guidelines
IPART ESS auditor guidance
Clean Energy Regulator STC guidance documents

EnergyAE internal documents (optional, staff-only)

Internal methodology notes Project templates and checklist documents This could be a separate knowledge base with different access controls - see open questions

Functional Requirements

Chat interface

Standard conversational UI: user types question, chatbot responds
Each response includes cited sources: document name, section or page number where possible
Sources shown as expandable references below each answer, not inline footnotes
Conversation history maintained within a session
User can start a new conversation (clears context)
Conversation history optionally saved to user account for retrieval

Source citations

Every factual claim in a response must be traceable to a source document If the knowledge base does not contain relevant information, the chatbot must say so explicitly rather than hallucinating an answer Citation shows: document name, version/date, section heading or page number User can click a citation to see the exact source chunk that was retrieved

Confidence and limitations

Chatbot should preface answers on complex regulatory questions with a note that outputs should be verified against current scheme documentation Chatbot should not present itself as providing legal or compliance advice Out-of-scope questions (e.g. general cooking recipes) should be politely declined with a redirect to the intended purpose

Knowledge base management (admin UI)

Alastair (admin role) can upload new documents via a simple UI without developer involvement Supported formats: PDF, DOCX, TXT On upload, document is automatically chunked, embedded, and added to the vector store Admin can tag documents with: market (AU/NZ/both), category (scheme/standard/guidance/internal), and active/inactive status Inactive documents remain in the system but are excluded from retrieval Admin can view all documents in the knowledge base with their status and last-updated date Document versioning: when a new version of a document is uploaded, old version is marked superseded but retained for audit purposes

RAG Architecture

the developer to propose the full technical stack before building. The following are constraints and preferences, not a complete specification. Preferred approach:

Embedding model: OpenAI text-embedding-3-small or equivalent - the developer to recommend Vector store: Pinecone, Supabase pgvector, or Chroma - the developer to recommend based on hosting and cost LLM: Claude via Anthropic API (claude-sonnet-4-20250514) - this is a firm preference, not optional Chunk size and overlap: the developer to propose based on document types; methodology PDFs with dense tabular content may need different chunking to narrative guidance documents Retrieval: top-k semantic search with metadata filtering (market, category) to narrow retrieval scope based on conversation context

Query pipeline

User submits question Question is embedded Vector search retrieves top-k relevant chunks, filtered by applicable metadata Retrieved chunks plus conversation history passed to LLM with system prompt LLM generates answer grounded in retrieved context Source references extracted from retrieved chunks and attached to response Response and sources displayed to user

System prompt requirements

Alastair to draft the system prompt in collaboration with the developer Must instruct the LLM to: answer only from retrieved context, cite sources, acknowledge when information is not available, maintain professional tone, not provide legal or compliance advice System prompt should be stored as a configurable parameter, not hardcoded, so it can be updated without a code deploy

Access Control

This feature has more sensitivity than the news feed because source documents may include licensed standards and internal EnergyAE documents. Access tiers to consider (the developer to implement, Alastair to decide on policy):

EnergyAE staff: full knowledge base including internal documents and licensed standards Registered manufacturer/importer clients: scheme and guidance documents only, no internal or licensed standards Public/unauthenticated: not available

Open question: whether to offer a limited public demo with a restricted document set for marketing purposes.

UI / UX Direction

Chat panel embedded in the platform, not a separate page
Clean minimal interface: input at bottom, conversation scrolls up
Source citations expandable/collapsible below each response to keep the conversation readable
“Suggested questions” shown on first load to guide users toward useful queries (e.g. “What are the eligibility requirements for VEU Activity 44?” / “What testing is required for ESS HEAB registration?”)
Typing indicator while response is being generated
Mobile-responsive
the developer to produce wireframe for review before building

Out of Scope (v1)

Voice input Multi-language support Automated knowledge base updates (documents added manually by admin in v1) Fine-tuned LLM (RAG only in v1) Comparison of two documents or scheme methodologies side by side Integration with simulation results (chatbot cannot reference a user’s specific simulation run in v1) Public demo version

Data Model (indicative)

Documents table:

document_id filename display_name version market (AU / NZ / both) category (scheme / standard / guidance / internal) status (active / inactive / superseded) uploaded_at uploaded_by

Chunks table

chunk_id document_id (foreign key) chunk_text embedding (vector) page_number (where available) section_heading (where extractable)

Conversations table

conversation_id user_id created_at messages (JSON array: role, content, sources, timestamp)

Acceptance Criteria

The feature is done when:

Open Questions (resolve before build starts)

Should the internal EnergyAE knowledge base (methodology notes, templates) be in the same vector store as public scheme documents, or a separate store with stricter access? Separate stores are cleaner architecturally but add complexity. Licensed standards (AS/NZS 4234 etc.) are copyright-protected. Can they be ingested into a vector store for internal use? Alastair to take a view on this, potentially seek advice. A conservative approach would be to include only publicly available documents in v1. What are the 10 test questions Alastair will use for acceptance testing? These should be agreed before build starts, not after. Should manufacturer/importer clients have access to this feature in v1, or is it EnergyAE staff only to start? the developer to recommend vector store and embedding model with rough cost estimate before architecture is finalised. Should the chatbot have visibility of the current date so it can flag when documents may be outdated?