Internal Framework Processing Diagram
This diagram focuses on the logic inside the NestJS Framework API, not the FE/Admin UI.
Clear ERD relationship table
| Group | Relationship | Type | Join key | Meaning |
|---|---|---|---|---|
| Tenant | workspaces → client_configs, sources, users_roles, visitors, sessions, atoms, pages, leads | 1:N | workspace_id | Separates data by client/project and prevents cross-client data mixing. |
| Ingestion | sources → source_files → source_pages → raw_chunks | 1:N chain | source_id, source_file_id, source_page_id | Source documents are split into files, pages, and chunks for OCR, debugging, and indexing. |
| Knowledge | atoms → atom_versions, atom_embeddings | 1:N | atom_id | One atom has multiple versions and embeddings/indexes for RAG. |
| Evidence | atoms ↔ raw_chunks via atom_chunk_links | N:N | atom_id, raw_chunk_id | An atom can have evidence/citations from many chunks; one chunk can support many atoms. |
| Atom classification | atoms ↔ taxonomy_terms via atom_taxonomy_links | N:N | atom_id, taxonomy_term_id | An atom can be linked to many labels such as topic, product, industry, persona, and pain point. |
| Page context | pages ↔ taxonomy_terms via page_taxonomy_links | N:N | page_id, taxonomy_term_id | Pages/routes are labeled so AI understands the user context. |
| Anonymous user | visitors → sessions → conversations → conversation_messages | 1:N chain | visitor_id, session_id, conversation_id | Anonymous visitors still have sessions and transcripts before becoming leads. |
| Runtime state | sessions → session_slots, events | 1:N | session_id | Stores known fields, event tracking, Gen UI actions, and RAG queries per session. |
| Lead/demo | leads → demo_requests | 1:N | lead_id | One lead can have many demo, quote, or handoff requests. |
| Memory | memory_facts → visitor / lead / user / account | Polymorphic | subject_type, subject_id | One memory table can serve visitors, leads, logged-in users, or company accounts. |
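The N:N relationships in this table resolve through a link table. As a minimal sketch, assuming in-memory arrays in place of Postgres tables, the example below reads atom_chunk_links in both directions. Only `atom_id` and `raw_chunk_id` come from the table; the row shape, the sample IDs, the function names `chunkIdsForAtom` and `atomIdsForChunk`, and the presence of `workspace_id` on the link row are illustrative assumptions.

```typescript
// Hypothetical row shape: atom_id and raw_chunk_id are the documented join keys;
// workspace_id on the link row is an assumption made for tenant scoping.
type AtomChunkLink = { workspace_id: string; atom_id: string; raw_chunk_id: string };

// In-memory stand-in for the atom_chunk_links table (sample IDs are invented).
const atomChunkLinks: AtomChunkLink[] = [
  { workspace_id: "ws_1", atom_id: "atom_pricing", raw_chunk_id: "chunk_10" },
  { workspace_id: "ws_1", atom_id: "atom_pricing", raw_chunk_id: "chunk_11" },
  { workspace_id: "ws_1", atom_id: "atom_process", raw_chunk_id: "chunk_11" },
  { workspace_id: "ws_2", atom_id: "atom_pricing", raw_chunk_id: "chunk_99" },
];

// Evidence for one atom: every chunk linked to it inside the same workspace.
function chunkIdsForAtom(links: AtomChunkLink[], workspaceId: string, atomId: string): string[] {
  return links
    .filter((l) => l.workspace_id === workspaceId && l.atom_id === atomId)
    .map((l) => l.raw_chunk_id);
}

// Reverse direction: every atom that one chunk supports in the same workspace.
function atomIdsForChunk(links: AtomChunkLink[], workspaceId: string, chunkId: string): string[] {
  return links
    .filter((l) => l.workspace_id === workspaceId && l.raw_chunk_id === chunkId)
    .map((l) => l.atom_id);
}

console.log(chunkIdsForAtom(atomChunkLinks, "ws_1", "atom_pricing"));
console.log(atomIdsForChunk(atomChunkLinks, "ws_1", "chunk_11"));
```

Filtering on the workspace first keeps a link from one client out of another client's citations, even when atom IDs happen to collide across workspaces.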
Plain-English explanation for each table
| Table | Clear explanation | When data is created | Why it is required |
|---|---|---|---|
| workspaces | A workspace is the working area for one client/project. If the framework serves 10 different websites, each website has its own workspace. | Created when onboarding a new client. | Keeps DataSys data separate from other clients; every important table is scoped by workspace_id. |
| client_configs | The per-workspace configuration: brand, language, AI tone, answer rules, lead rules, and CRM/Slack/email connector settings. | Created during client setup and updated when rules, prompts, connectors, or policies change. | Keeps the core framework reusable, without hardcoding client-specific logic into code. |
| users_roles | The list of internal Admin users and their permissions, such as document uploaders, knowledge reviewers, and sales users who view leads. | Created when inviting users into Admin or changing permissions. | Controls who can upload, edit, approve, publish, reindex, or view customer data. |
| sources | The source record for original material. It represents an information source such as a PDF, old website, FAQ, slide deck, pricing sheet, or contract template. | Created when Admin uploads/imports a new data source. | Shows which document an AI answer came from, whether it is still valid, who uploaded it, and whether it is approved. |
| source_files | A physical file in storage under a source. One source can contain many files, for example a document package with multiple PDFs. | Created when a file is uploaded to storage. | Manages file path, format, checksum, size, parse status, and retries when errors occur. |
| source_pages | Parsed/OCR content by page or section. For example, page 5 of a PDF has its own OCR text. | Created after the system parses files, OCRs PDFs/images, or crawls HTML. | Lets Admin inspect OCR quality and lets AI cite the exact page/section. |
| raw_chunks | Small text segments cut from source_pages for easier retrieval. This is technical data, not the main content Admin should edit directly. | Created after document chunking. | Used to debug RAG: which segment was retrieved, which segment is noisy, and which should be linked to an atom. |
| atoms | An approved, normalized knowledge unit used by AI to answer. An atom is not only a product; it can be FAQ, policy, pricing, process, case study, company information, or technical documentation. | Created from raw chunks or directly written/edited by Admin in AMS. | This is the official source of truth for AI. AI should answer from approved/published atoms instead of unreviewed raw text. |
| atom_versions | The change history of an atom. Each edit stores a version so the previous content is preserved. | Created when an atom is edited, merged, split, approved, or republished. | Supports rollback, audit, old/new comparison, and prevents loss of important content. |
| atom_embeddings | The semantic vector of an atom. It helps the system find the right atom even when the user phrases the question differently. | Created when an atom is indexed or reindexed. | Lets RAG find relevant knowledge by meaning, not only by keyword. |
| atom_chunk_links | A link table connecting atoms to original chunks as evidence. One atom can rely on many chunks, and one chunk can support many atoms. | Created when an atom is generated from documents or when Admin manually links evidence. | Allows AI answers to include citations and lets Admin verify which document the atom came from. |
| taxonomy_terms | A shared taxonomy label set. Labels can represent industry, product, pain point, persona, topic, funnel stage, region, or language. | Created when Admin configures taxonomy or imports it from documents. | Used to filter RAG, understand page context, and personalize answers without hardcoding into atoms/pages. |
| atom_taxonomy_links | A link table attaching atoms to multiple taxonomy labels. For example, one atom can belong to ERP, manufacturing, and inventory. | Created when Admin tags an atom or when the system auto-tags after ingestion. | Lets one atom be reused in many contexts without duplicating data. |
| pages | A record describing a website route/page. It does not store full page HTML; it stores context so AI understands where the user is. | Created when the website has a new route, landing page, or campaign page. | Helps AI know the page topic, which atoms to prioritize, and which form/CTA to ask next. |
| page_taxonomy_links | A link table attaching a page to multiple taxonomy labels. For example, an ERP manufacturing page maps to manufacturing, ERP, and inventory_accuracy. | Created when Admin maps a route to industries/topics or when the system auto-maps it. | Lets AI understand context as soon as a user lands on the page, without asking again for industry/product if the page already shows it. |
| visitors | An anonymous visitor whose identity is not known yet. The system identifies them by anonymous_id in cookie/localStorage, without requiring email at first. | Created when a new browser visits the website. | Keeps anonymous visitor context, tracks consent, and merges into a lead when the visitor leaves contact information. |
| sessions | A visit/interaction session for a visitor/lead/user. One visitor can have many sessions across multiple returns to the website. | Created when a user starts a new visit or interaction. | Groups current page, UTM, campaign, known slots, events, and conversations within one visit. |
| conversations | A specific conversation within a session. One session can have one or more conversations depending on UI design. | Created when the user opens chat or sends the first message. | Manages conversation status, summary, handoff, and transcript. |
| conversation_messages | Each line in the conversation: user question, assistant answer, tool API call, or RAG source result. | Created every time there is a message or tool result. | Stores the full transcript, debugs answers, extracts memory/lead insights, and audits tokens/citations. |
| session_slots | Known fields within the session, such as industry, company size, need, timeline, and budget. | Created/updated when the user speaks in chat, clicks Gen UI, or fills a form. | Prevents repeated questions and lets forms automatically hide fields that are already known. |
| events | Runtime behavior and event log. An event is not necessarily a message; it is any action/state worth recording. | Created for page views, CTA clicks, chat turns, Gen UI actions, RAG queries, and CRM syncs. | Used for funnel analysis, lead scoring, flow debugging, and action audit. |
| leads | A profile for a potential customer. A visitor becomes a lead when contact information or clear buying intent exists. | Created when the user leaves email/phone, books a demo, requests a quote, or reaches a high-intent score. | Enables sales follow-up, CRM sync, summaries, owner assignment, and sales status tracking. |
| demo_requests | A specific request from a lead for a demo, consultation, quote, or contact. | Created when the user clicks book demo, submits a consultation form, or AI confirms a demo need. | Used to send Slack/email/CRM notifications and manage schedule and sales handling status. |
| memory_facts | Internal long-term memory fallback when Mem0 is not used. Each record is a memorable fact about a visitor, lead, user, or account. | Created when the system extracts stable information, such as industry, company size, pain point, or preference, and the subject has given proper consent. | Lets AI avoid asking again next time and personalize consultation based on long-term history. |
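The session_slots behaviour described above (do not repeat questions, hide form fields that are already known) can be sketched as a small pure function. This is an illustrative sketch, not the framework's implementation: the slot names follow the examples given for session_slots, while `SlotName`, `SessionSlots`, `mergeSlots`, and `missingSlots` are hypothetical names, and storing every slot value as a string is an assumption.

```typescript
// Slot names taken from the session_slots examples; string values are an assumption.
type SlotName = "industry" | "company_size" | "need" | "timeline" | "budget";
type SessionSlots = Partial<Record<SlotName, string>>;

// Merge newly captured values (chat, Gen UI click, form) into the known slots.
// Empty or whitespace-only values never overwrite something already known.
function mergeSlots(known: SessionSlots, update: SessionSlots): SessionSlots {
  const merged: SessionSlots = { ...known };
  for (const name of Object.keys(update) as SlotName[]) {
    const value = update[name];
    if (value !== undefined && value.trim() !== "") {
      merged[name] = value;
    }
  }
  return merged;
}

// Fields a form still has to show: required slots that have no usable value yet.
function missingSlots(required: SlotName[], known: SessionSlots): SlotName[] {
  return required.filter((name) => {
    const value = known[name];
    return value === undefined || value.trim() === "";
  });
}

// Example: the page context already supplied the industry, chat supplied the size.
const demoFormFields: SlotName[] = ["industry", "company_size", "timeline"];
let knownSlots: SessionSlots = { industry: "manufacturing" };
knownSlots = mergeSlots(knownSlots, { company_size: "200" });
console.log(missingSlots(demoFormFields, knownSlots));
```

In this example only `timeline` remains to be asked, so a demo form would render a single field instead of three.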
Complete data table dictionary
| Table | Clear description | Example data | How AI/Admin uses it |
|---|---|---|---|
| workspaces | Tenant/project root. Each customer or project using the framework has its own workspace. | datasys, education_client, real_estate_client | Separates data, config, knowledge, users, and leads by client. |
| client_configs | Per-workspace configuration: brand, theme, prompt rules, connectors, and policies. | Consulting tone, language, CRM connector, and rules for not answering without sources. | NestJS loads config per client so the same core can run many projects. |
| users_roles | Internal users and permissions in Admin/AMS. | admin, editor, reviewer, sales, viewer | Controls permission to upload, edit atoms, approve, reindex, view leads, or audit. |
| sources | Original source documents imported into the system by Admin. | Company profile PDF, old website, FAQ, slides, transcript, pricing sheet. | Tracks which source created which knowledge; supports audit and re-parse when documents change. |
| source_files | Physical source files in storage with technical metadata. | Storage path, MIME type, checksum, and parse status. | Shows which files parsed successfully, failed, or need reprocessing. |
| source_pages | Content by page/section after OCR or parsing. | Page 3 OCR text, screenshot path, quality score. | Admin reviews original content, checks OCR, and traces citations back. |
| raw_chunks | Technical chunks cut from source pages for retrieval/debugging. | A 300-800 token segment about an ERP feature or policy. | Not the main editable source; used to link evidence, debug RAG, and mark noisy chunks. |
| atoms | The official normalized/approved knowledge unit. It is not only a product record. | FAQ, policy, pricing, case study, implementation process, technical doc, sales script. | AI uses it as the official answer source; Admin edits, merges/splits, approves, and publishes it. |
| atom_versions | Atom change history. | Version 1 old pricing, version 2 updated policy. | Rollback, audit who changed what, and compare before/after content. |
| atom_embeddings | Atom vector embedding for semantic search. | Embedding model, vector, metadata filter, indexed_at. | RAG finds relevant atoms based on the question and context. |
| atom_chunk_links | Link table connecting atoms to raw chunks as evidence/citations. | The “implementation process” atom links to chunks from a proposal PDF. | AI answers include sources; Admin verifies which segment an atom came from. |
| taxonomy_terms | Shared taxonomy label system for atoms/pages/queries. | product=ERP, industry=manufacturing, persona=COO, topic=pricing. | Filters RAG, understands page context, and classifies content without hardcoding by product. |
| atom_taxonomy_links | Many-to-many link between atoms and taxonomy terms. | One atom tagged with ERP + manufacturing + inventory_accuracy. | One knowledge item can belong to many topics/industries/personas at once. |
| pages | Route/page context on the website, not hardcoded page content. | /erp-manufacturing, /pricing, /case-study | AI knows where the user is, which knowledge to prioritize, and which CTA/form fits. |
| page_taxonomy_links | Links pages to taxonomy terms. | ERP Manufacturing page tagged with industry=manufacturing and product=ERP. | When the user is on this page, AI automatically understands the initial context. |
| visitors | Anonymous visitor before email, phone, or login is known. | anonymous_id from cookie/localStorage, consent_status, first_seen_at. | Tracks context and temporary memory; merges into a lead when the visitor provides information. |
| sessions | One website visit or interaction session. | visitor_id, lead_id, current_page, UTM, started_at. | Keeps runtime context: which page the user is on, which campaign, and which slots are known. |
| conversations | One conversation in a session. | web chat, status=open, summary=customer asks about ERP for manufacturing. | Groups messages, creates summaries, and analyzes lead insights. |
| conversation_messages | Each message in the conversation: user, assistant, or tool. | role=user, content="I need ERP for 200 employees". | Stores transcript, citation, tokens, and tool result; used for audit and memory extraction. |
| session_slots | Known fields in the session. | industry=manufacturing, company_size=200, pain_point=inventory. | Does not ask known information again; forms automatically skip fields already provided. |
| events | Tracks behavior and system events. | page_view, chat_turn, cta_click, gen_ui_action, rag_query. | Analyzes funnel, debugs flow, calculates lead score, and audits actions. |
| leads | Potential customer profile valuable enough for sales/marketing follow-up. | name, email, company_size, industry, interest, score, owner. | Creates sales summaries, syncs CRM, supports follow-up, and assigns owners. |
| demo_requests | Demo/quote/consultation request attached to a lead. | preferred_time, solution_interest, handoff_payload. | Sales receives schedule/demo requests; CRM/Slack/email receives notifications. |
| memory_facts | Long-term memory fallback when Mem0 is not used. | subject_type=lead, fact_key=company_size, fact_value=200. | Remembers long-term information by visitor/lead/user/account with consent and confidence. |
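The dictionary says page taxonomy tells AI which knowledge to prioritize and that answers should come from published atoms. One possible reading is sketched below: rank published atoms by how many taxonomy terms they share with the current page. The join keys `page_id`, `atom_id`, and `taxonomy_term_id` come from the relationship table; the `status` values, the term ID format, the scoring rule (count of shared terms), and the name `rankAtomsForPage` are assumptions made for illustration, and a real retrieval step would combine this with embedding similarity.

```typescript
// Hypothetical row shapes; only the *_id join keys are taken from the document.
type AtomRecord = { id: string; status: "draft" | "approved" | "published" };
type PageTaxonomyLink = { page_id: string; taxonomy_term_id: string };
type AtomTaxonomyLink = { atom_id: string; taxonomy_term_id: string };
type RankedAtom = { atomId: string; sharedTerms: number };

// Rank published atoms by the number of taxonomy terms shared with the page.
function rankAtomsForPage(
  pageId: string,
  pageLinks: PageTaxonomyLink[],
  atomLinks: AtomTaxonomyLink[],
  atoms: AtomRecord[],
): RankedAtom[] {
  const pageTermIds = pageLinks
    .filter((l) => l.page_id === pageId)
    .map((l) => l.taxonomy_term_id);

  // Count, per atom, how many of its terms also label the page.
  const sharedByAtom: Record<string, number> = {};
  for (const link of atomLinks) {
    if (pageTermIds.indexOf(link.taxonomy_term_id) !== -1) {
      sharedByAtom[link.atom_id] = (sharedByAtom[link.atom_id] ?? 0) + 1;
    }
  }

  // Drafts and unrelated atoms are excluded; the strongest overlap comes first.
  return atoms
    .filter((a) => a.status === "published" && (sharedByAtom[a.id] ?? 0) > 0)
    .map((a) => ({ atomId: a.id, sharedTerms: sharedByAtom[a.id] ?? 0 }))
    .sort((x, y) => y.sharedTerms - x.sharedTerms);
}

// Sample data modeled on the ERP manufacturing page example (IDs are invented).
const samplePageLinks: PageTaxonomyLink[] = [
  { page_id: "page_erp_mfg", taxonomy_term_id: "industry:manufacturing" },
  { page_id: "page_erp_mfg", taxonomy_term_id: "product:ERP" },
  { page_id: "page_erp_mfg", taxonomy_term_id: "topic:inventory_accuracy" },
];
const sampleAtomLinks: AtomTaxonomyLink[] = [
  { atom_id: "atom_a", taxonomy_term_id: "industry:manufacturing" },
  { atom_id: "atom_a", taxonomy_term_id: "product:ERP" },
  { atom_id: "atom_a", taxonomy_term_id: "topic:inventory_accuracy" },
  { atom_id: "atom_b", taxonomy_term_id: "product:ERP" },
  { atom_id: "atom_c", taxonomy_term_id: "product:ERP" },
  { atom_id: "atom_d", taxonomy_term_id: "topic:pricing" },
];
const sampleAtoms: AtomRecord[] = [
  { id: "atom_a", status: "published" },
  { id: "atom_b", status: "published" },
  { id: "atom_c", status: "draft" },
  { id: "atom_d", status: "published" },
];

console.log(rankAtomsForPage("page_erp_mfg", samplePageLinks, sampleAtomLinks, sampleAtoms));
```

Here `atom_c` is dropped because it is still a draft and `atom_d` because it shares no term with the page, which mirrors the rule that unreviewed or off-topic content should not drive answers.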
Detailed storage table
| Data group | Where it is stored | Suggested table/collection | Purpose | How Admin handles it |
|---|---|---|---|---|
| Source documents | Powabase Storage + Postgres | sources, source_files, source_pages | Store PDFs, old web pages, OCR text, and source metadata. | Upload, inspect pages, re-parse, archive. |
| Technical chunks | Powabase Postgres/RAG index | raw_chunks, atom_chunk_links | Debug retrieval and inspect which text was chunked/indexed. | View, filter, mark noisy, link to atoms, and reindex. Do not edit as the main source. |
| Knowledge atoms | Powabase Postgres + vector/RAG | atoms, atom_versions, atom_embeddings | Official knowledge source for RAG/runtime; not just products, can be FAQ, policy, case study, pricing, or process. | Edit, merge/split, attach taxonomy, approve/publish, inspect versions. |
| Page context | Powabase Postgres | pages, page_taxonomy_links | Identifies which topic, industry, or funnel stage a route/page belongs to. | Attach taxonomy to each route, set context hint, and default CTA. |
| Anonymous visitor | Powabase Postgres | visitors | Identify anonymous visitors via anonymous_id/cookie before email/phone is known. | Check consent and merge visitor into lead when contact info is provided. |
| Context/session | Powabase Postgres | sessions, session_slots | Know which page/campaign the user is in and which fields they already provided. | Debug sessions and check known/missing slots. |
| Conversation | Powabase Postgres | conversations, conversation_messages | Store conversation history: user, assistant, tool result, citation, token, trace. | View transcripts, debug answers, create summaries, extract lead insights. |
| User memory | Mem0 or Powabase fallback | Mem0 memories or memory_facts | Remember long-term facts such as industry, company size, pain point, and preference. | View/edit/delete when permission and consent allow it; avoid unnecessary sensitive PII. |
| Lead/event | Powabase Postgres + CRM connector | leads, events, demo_requests, crm_sync_logs | Track funnel, create sales handoff, sync CRM. | View lead summary, retry CRM sync, export/report. |
| Governance | Powabase Postgres | audit_logs, content_gaps, workflow_tasks | Track who changed what, which questions lack data, and which tasks need review. | Review, assign, resolve content gaps, inspect audit. |
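The "Mem0 or Powabase fallback" row suggests that user memory sits behind one interface with two possible backends. The sketch below shows one way to express that, with the polymorphic `subject_type` / `subject_id` key used by memory_facts. It is a hedged illustration: `MemoryStore`, `FallbackFactStore`, and `selectMemoryStore` are hypothetical names, the `confidence` and `consented` fields are assumed columns, the array stands in for the Postgres table, and no Mem0 API is modeled (an adapter for it would simply implement the same interface). The calls are synchronous for brevity; a real store would be asynchronous.

```typescript
// Polymorphic subject key from the relationship table: subject_type + subject_id.
type SubjectType = "visitor" | "lead" | "user" | "account";

type MemoryFact = {
  subject_type: SubjectType;
  subject_id: string;
  fact_key: string;
  fact_value: string;
  confidence: number; // assumed column, expected in the range 0..1
  consented: boolean; // assumed flag derived from the subject's consent status
};

// Hypothetical common interface for any memory backend.
interface MemoryStore {
  remember(fact: MemoryFact): void;
  recall(subjectType: SubjectType, subjectId: string): MemoryFact[];
}

// In-memory stand-in for the memory_facts fallback table.
class FallbackFactStore implements MemoryStore {
  private facts: MemoryFact[] = [];

  remember(fact: MemoryFact): void {
    // Nothing is persisted without consent.
    if (!fact.consented) return;
    // Upsert: one current value per (subject_type, subject_id, fact_key).
    this.facts = this.facts.filter(
      (f) =>
        !(
          f.subject_type === fact.subject_type &&
          f.subject_id === fact.subject_id &&
          f.fact_key === fact.fact_key
        ),
    );
    this.facts.push(fact);
  }

  recall(subjectType: SubjectType, subjectId: string): MemoryFact[] {
    return this.facts.filter(
      (f) => f.subject_type === subjectType && f.subject_id === subjectId,
    );
  }
}

// Use the external backend when one is configured, otherwise the fallback.
function selectMemoryStore(external: MemoryStore | undefined, fallback: MemoryStore): MemoryStore {
  return external ?? fallback;
}

const memory = selectMemoryStore(undefined, new FallbackFactStore());
memory.remember({
  subject_type: "lead",
  subject_id: "lead_1",
  fact_key: "company_size",
  fact_value: "200",
  confidence: 0.9,
  consented: true,
});
console.log(memory.recall("lead", "lead_1"));
```

Keying recall on both `subject_type` and `subject_id` matters because a visitor and a lead may carry the same raw ID in different tables; the type discriminator keeps their facts apart.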