AI Knowledge Chatbot (RAG)
Overview
An AI-powered chatbot embedded in the Intelligence Platform that answers questions about heat pump water heater (HPWH) regulations, energy efficiency schemes, and relevant standards. The chatbot uses a Retrieval-Augmented Generation (RAG) architecture: an LLM answers each user question from passages retrieved from a curated document knowledge base, and source references are shown for every answer. This is not a general-purpose AI assistant. It is a domain-specific tool whose value comes entirely from the quality and currency of the knowledge base behind it.
User Stories
- As an EnergyAE consultant, I want to ask questions about VEU Activity 44 methodology in plain English and get accurate, referenced answers so I don’t have to manually search PDF documents.
- As a manufacturer client, I want to understand what testing and registration is required for a specific scheme so I can plan a compliance pathway without waiting for a consultant.
- As a user, I want to see which document and section an answer came from so I can verify it and cite it myself.
- As Alastair, I want to add new documents to the knowledge base and have them immediately available to the chatbot without a developer being involved.
Knowledge Base: Document Scope
The RAG system is only as useful as its source documents. The following categories should be included at launch. Alastair is responsible for sourcing and uploading documents; the developer is responsible for building the ingestion pipeline that processes them.
Scheme methodology documents
- VEU Activity 44 methodology (current version)
- VEU Activity 44 calculation workbook documentation
- ESS HEAB method (NSW)
- SRES/STC calculation rules
- EECA/MBIE NZ scheme documentation
Standards
- AS/NZS 4234 (current edition - licensed copy, access-controlled)
- AS/NZS 5125.1
- Relevant sections of NZ Building Code (G12 hot water)
Regulatory guidance
- Essential Services Commission VEU guidelines
- IPART ESS auditor guidance
- Clean Energy Regulator STC guidance documents
EnergyAE internal documents (optional, staff-only)
- Internal methodology notes
- Project templates and checklist documents
This could be a separate knowledge base with different access controls (see Open Questions).
Functional Requirements
Chat interface
- Standard conversational UI: user types question, chatbot responds
- Each response includes cited sources: document name, section or page number where possible
- Sources shown as expandable references below each answer, not inline footnotes
- Conversation history maintained within a session
- User can start a new conversation (clears context)
- Conversation history optionally saved to user account for retrieval
Source citations
- Every factual claim in a response must be traceable to a source document
- If the knowledge base does not contain relevant information, the chatbot must say so explicitly rather than hallucinating an answer
- Citation shows: document name, version/date, section heading or page number
- User can click a citation to see the exact source chunk that was retrieved
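As a rough illustration of the citation requirements above, the following Python sketch renders a citation from a retrieved chunk and falls back to an explicit "not found" message when nothing was retrieved. The `RetrievedChunk` shape, the field names, and the wording of `NO_ANSWER_MESSAGE` are assumptions made for the example, not part of this specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetrievedChunk:
    # Hypothetical shape of a retrieved chunk; field names are illustrative.
    document_name: str
    version: str
    text: str
    section_heading: Optional[str] = None
    page_number: Optional[int] = None


# Placeholder wording; the real message would be agreed with Alastair.
NO_ANSWER_MESSAGE = (
    "The knowledge base does not contain information relevant to this question."
)


def format_citation(chunk: RetrievedChunk) -> str:
    """Render document name, version/date, and section heading or page number."""
    if chunk.section_heading:
        locator = f"section: {chunk.section_heading}"
    elif chunk.page_number is not None:
        locator = f"page {chunk.page_number}"
    else:
        locator = "location not available"
    return f"{chunk.document_name} ({chunk.version}), {locator}"


def build_response(answer: str, chunks: list[RetrievedChunk]) -> dict:
    """Attach citations to an answer, or say explicitly that nothing was found."""
    if not chunks:
        return {"answer": NO_ANSWER_MESSAGE, "sources": []}
    sources = [
        # chunk_text is kept so the UI can show the exact chunk on click.
        {"citation": format_citation(c), "chunk_text": c.text}
        for c in chunks
    ]
    return {"answer": answer, "sources": sources}
```

Keeping the raw chunk text next to each citation is one way to support the click-through requirement without a second lookup.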
Confidence and limitations
- Chatbot should preface answers on complex regulatory questions with a note that outputs should be verified against current scheme documentation
- Chatbot should not present itself as providing legal or compliance advice
- Out-of-scope questions (e.g. general cooking recipes) should be politely declined with a redirect to the intended purpose
Knowledge base management (admin UI)
- Alastair (admin role) can upload new documents via a simple UI without developer involvement
- Supported formats: PDF, DOCX, TXT
- On upload, document is automatically chunked, embedded, and added to the vector store
- Admin can tag documents with: market (AU/NZ/both), category (scheme/standard/guidance/internal), and active/inactive status
- Inactive documents remain in the system but are excluded from retrieval
- Admin can view all documents in the knowledge base with their status and last-updated date
- Document versioning: when a new version of a document is uploaded, old version is marked superseded but retained for audit purposes
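The versioning rule above could look roughly like the following Python sketch. It uses an in-memory list in place of the real database, and it matches versions by `display_name`, which is an assumption; the actual matching key is for the developer to decide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Document:
    document_id: int
    display_name: str
    version: str
    status: str  # "active", "inactive", or "superseded"
    uploaded_at: datetime


def upload_new_version(
    library: list[Document], display_name: str, version: str
) -> Document:
    """Add a document and mark earlier active versions of it as superseded."""
    for doc in library:
        # Assumption: versions of the same document share a display_name.
        if doc.display_name == display_name and doc.status == "active":
            doc.status = "superseded"  # retained for audit, never deleted
    new_doc = Document(
        document_id=len(library) + 1,
        display_name=display_name,
        version=version,
        status="active",
        uploaded_at=datetime.now(timezone.utc),
    )
    library.append(new_doc)
    return new_doc


def retrievable(library: list[Document]) -> list[Document]:
    """Only active documents take part in retrieval."""
    return [d for d in library if d.status == "active"]
```

Superseded and inactive rows stay in the list, so the audit trail is preserved while retrieval only ever sees the current version.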
RAG Architecture
The developer is to propose the full technical stack before building. The following are constraints and preferences, not a complete specification.
Preferred approach:
- Embedding model: OpenAI text-embedding-3-small or equivalent - the developer to recommend
- Vector store: Pinecone, Supabase pgvector, or Chroma - the developer to recommend based on hosting and cost
- LLM: Claude via Anthropic API (claude-sonnet-4-20250514) - this is a firm preference, not optional
- Chunk size and overlap: the developer to propose based on document types; methodology PDFs with dense tabular content may need different chunking to narrative guidance documents
- Retrieval: top-k semantic search with metadata filtering (market, category) to narrow retrieval scope based on conversation context
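To make "chunk size and overlap" concrete, here is a minimal Python sketch of fixed-size chunking with overlap. It splits on whitespace and counts words, which is a simplification chosen for the example; the default values of `chunk_size` and `overlap` are placeholders, since the real strategy (and whether tables need separate handling) is for the developer to propose.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; neighbours share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and less than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

For example, `chunk_text("a b c d e f g h i j", chunk_size=4, overlap=1)` yields three chunks, each starting with the last word of the previous one. The overlap reduces the chance that a sentence relevant to a question is cut in half at a chunk boundary.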
Query pipeline
- User submits question
- Question is embedded
- Vector search retrieves top-k relevant chunks, filtered by applicable metadata
- Retrieved chunks plus conversation history passed to LLM with system prompt
- LLM generates answer grounded in retrieved context
- Source references extracted from retrieved chunks and attached to response
- Response and sources displayed to user
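The pipeline above can be sketched end to end in a few lines of Python. This is an illustration under stated assumptions, not the proposed implementation: the vector search is a brute-force cosine similarity over an in-memory list rather than a call to a real vector store, and `embed` and `generate` are caller-supplied stand-ins for the embedding model and the LLM call. All function and field names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Chunk:
    chunk_text: str
    embedding: list[float]
    document_name: str
    market: str    # "AU", "NZ", or "both"
    category: str  # "scheme", "standard", "guidance", or "internal"


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: list[float],
    chunks: list[Chunk],
    k: int = 3,
    market: Optional[str] = None,
    categories: Optional[set[str]] = None,
) -> list[Chunk]:
    """Top-k semantic search after filtering on market and category metadata."""
    candidates = [
        c for c in chunks
        if (market is None or c.market in (market, "both"))
        and (categories is None or c.category in categories)
    ]
    ranked = sorted(
        candidates, key=lambda c: cosine(query_vec, c.embedding), reverse=True
    )
    return ranked[:k]


def answer_question(
    question: str,
    history: list[dict],
    chunks: list[Chunk],
    embed: Callable[[str], list[float]],
    generate: Callable[[str, str, list[dict]], str],
    k: int = 3,
    market: Optional[str] = None,
) -> dict:
    """Embed the question, retrieve context, generate, and attach sources."""
    query_vec = embed(question)
    top = retrieve(query_vec, chunks, k, market)
    context = "\n\n".join(c.chunk_text for c in top)
    answer = generate(question, context, history)  # LLM call stand-in
    sources = [c.document_name for c in top]
    return {"answer": answer, "sources": sources}
```

Passing `embed` and `generate` in as functions keeps the sketch independent of any particular vendor SDK and makes the pipeline easy to test with fakes.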
System prompt requirements
- Alastair to draft the system prompt in collaboration with the developer
- Must instruct the LLM to: answer only from retrieved context, cite sources, acknowledge when information is not available, maintain professional tone, not provide legal or compliance advice
- System prompt should be stored as a configurable parameter, not hardcoded, so it can be updated without a code deploy
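One possible way to keep the system prompt configurable is to read it from a small JSON file (or an equivalent database row) at request time. The Python sketch below assumes a file with a single `system_prompt` key; the file layout and function name are hypothetical, and a settings table edited through the admin UI would serve the same purpose.

```python
import json
from pathlib import Path


def load_system_prompt(path: str) -> str:
    """Read the system prompt from a JSON config file.

    Assumed (hypothetical) file layout: {"system_prompt": "..."}
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    prompt = str(config.get("system_prompt", "")).strip()
    if not prompt:
        # Fail loudly rather than call the LLM without its instructions.
        raise ValueError("system_prompt is missing or empty")
    return prompt
```

Loading the prompt on each request (or caching it briefly) means an edit to the stored value takes effect without a code deploy.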
Access Control
This feature has more sensitivity than the news feed because source documents may include licensed standards and internal EnergyAE documents. Access tiers to consider (the developer to implement, Alastair to decide on policy):
- EnergyAE staff: full knowledge base including internal documents and licensed standards
- Registered manufacturer/importer clients: scheme and guidance documents only, no internal or licensed standards
- Public/unauthenticated: not available
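The tiers above map naturally onto the document `category` tag. The Python sketch below shows one way to express that mapping; the tier names are hypothetical, and it assumes every document in the `standard` category is a licensed standard, which is a policy decision for Alastair rather than something this specification settles.

```python
# Hypothetical tier names mapped to the document categories each may retrieve.
TIER_CATEGORIES: dict[str, set[str]] = {
    "staff": {"scheme", "standard", "guidance", "internal"},
    "client": {"scheme", "guidance"},  # no internal docs, no licensed standards
    "public": set(),                   # feature not available
}


def allowed_categories(tier: str) -> set[str]:
    """Categories a tier may retrieve from; unknown tiers get nothing."""
    return set(TIER_CATEGORIES.get(tier, set()))


def can_access(tier: str, category: str) -> bool:
    return category in allowed_categories(tier)
```

Applying `allowed_categories` as a metadata filter at retrieval time, rather than filtering the answer afterwards, means restricted chunks never reach the LLM in the first place.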
Open question: whether to offer a limited public demo with a restricted document set for marketing purposes.
UI / UX Direction
- Chat panel embedded in the platform, not a separate page
- Clean minimal interface: input at bottom, conversation scrolls up
- Source citations expandable/collapsible below each response to keep the conversation readable
- “Suggested questions” shown on first load to guide users toward useful queries (e.g. “What are the eligibility requirements for VEU Activity 44?” / “What testing is required for ESS HEAB registration?”)
- Typing indicator while response is being generated
- Mobile-responsive
- The developer to produce a wireframe for review before building
Out of Scope (v1)
- Voice input
- Multi-language support
- Automated knowledge base updates (documents added manually by admin in v1)
- Fine-tuned LLM (RAG only in v1)
- Comparison of two documents or scheme methodologies side by side
- Integration with simulation results (chatbot cannot reference a user’s specific simulation run in v1)
- Public demo version
Data Model (indicative)
Documents table
- document_id
- filename
- display_name
- version
- market (AU / NZ / both)
- category (scheme / standard / guidance / internal)
- status (active / inactive / superseded)
- uploaded_at
- uploaded_by
Chunks table
- chunk_id
- document_id (foreign key)
- chunk_text
- embedding (vector)
- page_number (where available)
- section_heading (where extractable)
Conversations table
- conversation_id
- user_id
- created_at
- messages (JSON array: role, content, sources, timestamp)
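For orientation, the indicative data model can be written out as a schema. The Python sketch below uses the standard-library `sqlite3` module purely so the example runs anywhere; the column types are assumptions, and because SQLite has no vector type the `embedding` column is stored as JSON text here. With a store such as pgvector the developer would use a native vector column instead.

```python
import sqlite3

SCHEMA = """
CREATE TABLE documents (
    document_id  INTEGER PRIMARY KEY,
    filename     TEXT NOT NULL,
    display_name TEXT NOT NULL,
    version      TEXT,
    market       TEXT CHECK (market IN ('AU', 'NZ', 'both')),
    category     TEXT CHECK (category IN ('scheme', 'standard', 'guidance', 'internal')),
    status       TEXT CHECK (status IN ('active', 'inactive', 'superseded')),
    uploaded_at  TEXT,
    uploaded_by  TEXT
);
CREATE TABLE chunks (
    chunk_id        INTEGER PRIMARY KEY,
    document_id     INTEGER NOT NULL REFERENCES documents(document_id),
    chunk_text      TEXT NOT NULL,
    embedding       TEXT,     -- JSON-encoded vector (assumption for SQLite)
    page_number     INTEGER,  -- where available
    section_heading TEXT      -- where extractable
);
CREATE TABLE conversations (
    conversation_id INTEGER PRIMARY KEY,
    user_id         TEXT NOT NULL,
    created_at      TEXT,
    messages        TEXT      -- JSON array: role, content, sources, timestamp
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three indicative tables on the given connection."""
    conn.executescript(SCHEMA)
```

The `CHECK` constraints mirror the allowed tag values listed above, so a mistyped market or status is rejected at insert time rather than silently breaking retrieval filters.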
Acceptance Criteria
The feature is done when:
- Chatbot answers domain-relevant questions accurately using retrieved context
- Every response includes source citations showing document name and section/page
- Chatbot explicitly states when it cannot find relevant information rather than fabricating an answer
- Admin can upload a new PDF and it is retrievable by the chatbot within 5 minutes, without developer involvement
- Document metadata (market, category, status) correctly filters retrieval scope
- Superseded documents are excluded from retrieval when a newer version exists
- Access control correctly restricts document access by user tier
- Conversation history is saved and retrievable per user
- System prompt is configurable without a code deploy
- Alastair has tested the chatbot against a set of 10 known questions with verified correct answers before sign-off
- Chatbot declines out-of-scope questions gracefully
- The developer has documented the chunking strategy and rationale
Open Questions (resolve before build starts)
- Should the internal EnergyAE knowledge base (methodology notes, templates) be in the same vector store as public scheme documents, or a separate store with stricter access? Separate stores are cleaner architecturally but add complexity.
- Licensed standards (AS/NZS 4234 etc.) are copyright-protected. Can they be ingested into a vector store for internal use? Alastair to take a view on this, potentially seek advice. A conservative approach would be to include only publicly available documents in v1.
- What are the 10 test questions Alastair will use for acceptance testing? These should be agreed before build starts, not after.
- Should manufacturer/importer clients have access to this feature in v1, or is it EnergyAE staff only to start?
- The developer to recommend vector store and embedding model with rough cost estimate before architecture is finalised.
- Should the chatbot have visibility of the current date so it can flag when documents may be outdated?