Internal Framework Processing Diagram

ERD relationship table

Group	Relationship	Type	Join key	Meaning
Tenant	`workspaces` → `client_configs`, `sources`, `users_roles`, `visitors`, `sessions`, `atoms`, `pages`, `leads`	1:N	`workspace_id`	Separates data by client/project and prevents cross-client data mixing.
Ingestion	`sources` → `source_files` → `source_pages` → `raw_chunks`	1:N chain	`source_id`, `source_file_id`, `source_page_id`	Source documents are split into files, pages, and chunks for OCR, debugging, and indexing.
Knowledge	`atoms` → `atom_versions`, `atom_embeddings`	1:N	`atom_id`	One atom has multiple versions and embeddings/indexes for RAG.
Evidence	`atoms` ↔ `raw_chunks` via `atom_chunk_links`	N:N	`atom_id`, `raw_chunk_id`	An atom can have evidence/citations from many chunks; one chunk can support many atoms.
Atom classification	`atoms` ↔ `taxonomy_terms` via `atom_taxonomy_links`	N:N	`atom_id`, `taxonomy_term_id`	An atom can be linked to many labels such as topic, product, industry, persona, and pain point.
Page context	`pages` ↔ `taxonomy_terms` via `page_taxonomy_links`	N:N	`page_id`, `taxonomy_term_id`	Pages/routes are labeled so AI understands the user context.
Anonymous user	`visitors` → `sessions` → `conversations` → `conversation_messages`	1:N chain	`visitor_id`, `session_id`, `conversation_id`	Anonymous visitors still have sessions and transcripts before becoming leads.
Runtime state	`sessions` → `session_slots`, `events`	1:N	`session_id`	Stores known fields, event tracking, Gen UI actions, and RAG queries per session.
Lead/demo	`leads` → `demo_requests`	1:N	`lead_id`	One lead can have many demo, quote, or handoff requests.
Memory	`memory_facts` → `visitor` / `lead` / `user` / `account`	Polymorphic	`subject_type`, `subject_id`	One memory table can serve visitors, leads, logged-in users, or company accounts.
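
The N:N evidence relationship and the polymorphic memory reference in the table above can be sketched as TypeScript types. This is a minimal illustration, not the framework's actual schema: the `id` and `text` columns and the helpers `chunksForAtom` and `factsFor` are assumptions introduced here.

```typescript
// Sketch only: everything except the join keys named in the table is hypothetical.
type SubjectType = "visitor" | "lead" | "user" | "account";

interface MemoryFact {
  subject_type: SubjectType; // says which table subject_id points into
  subject_id: string;
  fact_key: string;
  fact_value: string;
}

interface RawChunk {
  id: string;
  text: string;
}

interface AtomChunkLink {
  atom_id: string;
  raw_chunk_id: string;
}

// Resolve the evidence chunks for one atom through the N:N link table.
function chunksForAtom(
  atomId: string,
  links: AtomChunkLink[],
  chunks: RawChunk[],
): RawChunk[] {
  const byId = new Map<string, RawChunk>(
    chunks.map((c): [string, RawChunk] => [c.id, c]),
  );
  return links
    .filter((l) => l.atom_id === atomId)
    .map((l) => byId.get(l.raw_chunk_id))
    .filter((c): c is RawChunk => c !== undefined);
}

// The pair (subject_type, subject_id) acts as the polymorphic key.
function factsFor(
  type: SubjectType,
  id: string,
  facts: MemoryFact[],
): MemoryFact[] {
  return facts.filter((f) => f.subject_type === type && f.subject_id === id);
}
```

Because `subject_id` alone is ambiguous across tables, a lookup always needs both columns; a database cannot enforce a foreign key on such a pair, so integrity checks would have to live in application code.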

Plain-English explanation for each table

Table	Clear explanation	When data is created	Why it is required
`workspaces`	A workspace is the working area for one client/project. If the framework serves 10 different websites, each website has its own workspace.	Created when onboarding a new client.	Keeps each client's data (for example, DataSys) separate from other clients; every important table is scoped by `workspace_id`.
`client_configs`	The per-workspace configuration: brand, language, AI tone, answer rules, lead rules, and CRM/Slack/email connector settings.	Created during client setup and updated when rules, prompts, connectors, or policies change.	Keeps the core framework reusable, without hardcoding client-specific logic into code.
`users_roles`	The list of internal Admin users and their permissions, such as document uploaders, knowledge reviewers, and sales users who view leads.	Created when inviting users into Admin or changing permissions.	Controls who can upload, edit, approve, publish, reindex, or view customer data.
`sources`	The source record for original material. It represents an information source such as a PDF, old website, FAQ, slide deck, pricing sheet, or contract template.	Created when Admin uploads/imports a new data source.	Shows which document an AI answer came from, whether it is still valid, who uploaded it, and whether it is approved.
`source_files`	A physical file in storage under a source. One source can contain many files, for example a document package with multiple PDFs.	Created when a file is uploaded to storage.	Manages file path, format, checksum, size, parse status, and retries when errors occur.
`source_pages`	Parsed/OCR content by page or section. For example, page 5 of a PDF has its own OCR text.	Created after the system parses files, OCRs PDFs/images, or crawls HTML.	Lets Admin inspect OCR quality and lets AI cite the exact page/section.
`raw_chunks`	Small text segments cut from source_pages for easier retrieval. This is technical data, not the main content Admin should edit directly.	Created after document chunking.	Used to debug RAG: which segment was retrieved, which segment is noisy, and which should be linked to an atom.
`atoms`	An approved, normalized knowledge unit used by AI to answer. An atom is not only a product; it can be FAQ, policy, pricing, process, case study, company information, or technical documentation.	Created from raw chunks or directly written/edited by Admin in AMS.	This is the official source of truth for AI. AI should answer from approved/published atoms instead of unreviewed raw text.
`atom_versions`	The change history of an atom. Each edit stores a version so the previous content is preserved.	Created when an atom is edited, merged, split, approved, or republished.	Supports rollback, audit, old/new comparison, and prevents loss of important content.
`atom_embeddings`	The semantic vector of an atom. It helps the system find the right atom even when the user phrases the question differently.	Created when an atom is indexed or reindexed.	Lets RAG find relevant knowledge by meaning, not only by keyword.
`atom_chunk_links`	A link table connecting atoms to original chunks as evidence. One atom can rely on many chunks, and one chunk can support many atoms.	Created when an atom is generated from documents or when Admin manually links evidence.	Allows AI answers to include citations and lets Admin verify which document the atom came from.
`taxonomy_terms`	A shared taxonomy label set. Labels can represent industry, product, pain point, persona, topic, funnel stage, region, or language.	Created when Admin configures taxonomy or imports it from documents.	Used to filter RAG, understand page context, and personalize answers without hardcoding into atoms/pages.
`atom_taxonomy_links`	A link table attaching atoms to multiple taxonomy labels. For example, one atom can belong to ERP, manufacturing, and inventory.	Created when Admin tags an atom or when the system auto-tags after ingestion.	Lets one atom be reused in many contexts without duplicating data.
`pages`	A record describing a website route/page. It does not store full page HTML; it stores context so AI understands where the user is.	Created when the website has a new route, landing page, or campaign page.	Helps AI know the page topic, which atoms to prioritize, and which form/CTA to offer next.
`page_taxonomy_links`	A link table attaching a page to multiple taxonomy labels. For example, an ERP manufacturing page maps to manufacturing, ERP, and inventory_accuracy.	Created when Admin maps a route to industries/topics or when the system auto-maps it.	Lets AI understand context as soon as a user lands on the page, without asking again for industry/product if the page already shows it.
`visitors`	An anonymous visitor whose identity is not known yet. The system identifies them by anonymous_id in cookie/localStorage, without requiring email at first.	Created when a new browser visits the website.	Keeps anonymous visitor context, tracks consent, and merges into a lead when the visitor leaves contact information.
`sessions`	A visit/interaction session for a visitor/lead/user. One visitor can have many sessions across multiple returns to the website.	Created when a user starts a new visit or interaction.	Groups current page, UTM, campaign, known slots, events, and conversations within one visit.
`conversations`	A specific conversation within a session. One session can have one or more conversations depending on UI design.	Created when the user opens chat or sends the first message.	Manages conversation status, summary, handoff, and transcript.
`conversation_messages`	Each line in the conversation: user question, assistant answer, tool API call, or RAG source result.	Created every time there is a message or tool result.	Stores the full transcript; used to debug answers, extract memory/lead insights, and audit tokens/citations.
`session_slots`	Known fields within the session, such as industry, company size, need, timeline, and budget.	Created/updated when the user speaks in chat, clicks Gen UI, or fills a form.	Prevents repeated questions and lets forms automatically hide fields that are already known.
`events`	Runtime behavior and event log. An event is not necessarily a message; it is any action/state worth recording.	Created for page views, CTA clicks, chat turns, Gen UI actions, RAG queries, and CRM syncs.	Used for funnel analysis, lead scoring, flow debugging, and action audit.
`leads`	A profile for a potential customer. A visitor becomes a lead when contact information or clear buying intent exists.	Created when the user leaves email/phone, books a demo, requests a quote, or reaches a high-intent score.	Enables sales follow-up, CRM sync, summaries, owner assignment, and sales status tracking.
`demo_requests`	A specific request from a lead for a demo, consultation, quote, or contact.	Created when the user clicks book demo, submits a consultation form, or AI confirms a demo need.	Used to send Slack/email/CRM notifications and manage schedule and sales handling status.
`memory_facts`	Internal long-term memory fallback when Mem0 is not used. Each record is a memorable fact about a visitor, lead, user, or account.	Created when the system extracts stable information such as industry, company size, pain point, or preference, and proper consent has been given.	Lets AI avoid asking again next time and personalize consultation based on long-term history.
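
The `session_slots` behavior described above (forms hide fields that are already known) can be sketched as follows. The `SessionSlot` shape with `key`/`value` columns and the helper `missingFields` are assumptions for illustration, not the framework's actual API.

```typescript
// Sketch only: the slot columns and helper name are hypothetical.
interface SessionSlot {
  session_id: string;
  key: string; // e.g. "industry", "company_size"
  value: string;
}

// Return the form fields that still need to be asked in this session.
function missingFields(
  formFields: string[],
  slots: SessionSlot[],
  sessionId: string,
): string[] {
  const known = new Set<string>(
    slots
      .filter((s) => s.session_id === sessionId && s.value.trim() !== "")
      .map((s) => s.key),
  );
  return formFields.filter((f) => !known.has(f));
}

const exampleSlots: SessionSlot[] = [
  { session_id: "s1", key: "industry", value: "manufacturing" },
  { session_id: "s1", key: "company_size", value: "200" },
  { session_id: "s2", key: "timeline", value: "Q3" },
];

// Only "timeline" is still unknown for session s1.
console.log(missingFields(["industry", "company_size", "timeline"], exampleSlots, "s1"));
```

Slots from other sessions are ignored here; whether known values should carry over between sessions is a design decision this sketch does not settle.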

Complete data table dictionary

Table	Clear description	Example data	How AI/Admin uses it
`workspaces`	Tenant/project root. Each customer or project using the framework has its own workspace.	`datasys`, `education_client`, `real_estate_client`	Separates data, config, knowledge, users, and leads by client.
`client_configs`	Per-workspace configuration: brand, theme, prompt rules, connectors, and policies.	Consulting tone, language, CRM connector, and rules for not answering without sources.	NestJS loads config per client so the same core can run many projects.
`users_roles`	Internal users and permissions in Admin/AMS.	`admin`, `editor`, `reviewer`, `sales`, `viewer`	Controls permission to upload, edit atoms, approve, reindex, view leads, or audit.
`sources`	Original source documents imported into the system by Admin.	Company profile PDF, old website, FAQ, slides, transcript, pricing sheet.	Tracks which source created which knowledge; supports audit and re-parse when documents change.
`source_files`	Physical source files in storage with technical metadata.	Storage path, MIME type, checksum, and parse status.	Shows which files parsed successfully, failed, or need reprocessing.
`source_pages`	Content by page/section after OCR or parsing.	Page 3 OCR text, screenshot path, quality score.	Admin reviews original content, checks OCR, and traces citations back.
`raw_chunks`	Technical chunks cut from source pages for retrieval/debugging.	A 300-800 token segment about an ERP feature or policy.	Not the main editable source; used to link evidence, debug RAG, and mark noisy chunks.
`atoms`	The official normalized/approved knowledge unit. It is not only a product record.	FAQ, policy, pricing, case study, implementation process, technical doc, sales script.	AI uses it as the official answer source; Admin edits, merges/splits, approves, and publishes it.
`atom_versions`	Atom change history.	Version 1 old pricing, version 2 updated policy.	Rollback, audit who changed what, and compare before/after content.
`atom_embeddings`	Atom vector embedding for semantic search.	Embedding model, vector, metadata filter, indexed_at.	RAG finds relevant atoms based on the question and context.
`atom_chunk_links`	Link table connecting atoms to raw chunks as evidence/citations.	The “implementation process” atom links to chunks from a proposal PDF.	AI answers include sources; Admin verifies which segment an atom came from.
`taxonomy_terms`	Shared taxonomy label system for atoms/pages/queries.	product=ERP, industry=manufacturing, persona=COO, topic=pricing.	Filters RAG, understands page context, and classifies content without hardcoding by product.
`atom_taxonomy_links`	Many-to-many link between atoms and taxonomy terms.	One atom tagged with ERP + manufacturing + inventory_accuracy.	One knowledge item can belong to many topics/industries/personas at once.
`pages`	Route/page context on the website, not hardcoded page content.	`/erp-manufacturing`, `/pricing`, `/case-study`	AI knows where the user is, which knowledge to prioritize, and which CTA/form fits.
`page_taxonomy_links`	Links pages to taxonomy terms.	ERP Manufacturing page tagged with industry=manufacturing and product=ERP.	When the user is on this page, AI automatically understands the initial context.
`visitors`	Anonymous visitor before email, phone, or login is known.	anonymous_id from cookie/localStorage, consent_status, first_seen_at.	Tracks context and temporary memory; merges into a lead when the visitor provides information.
`sessions`	One website visit or interaction session.	visitor_id, lead_id, current_page, UTM, started_at.	Keeps runtime context: which page the user is on, which campaign, and which slots are known.
`conversations`	One conversation in a session.	web chat, status=open, summary=customer asks about ERP for manufacturing.	Groups messages, creates summaries, and analyzes lead insights.
`conversation_messages`	Each message in the conversation: user, assistant, or tool.	role=user, content="I need ERP for 200 employees".	Stores transcript, citation, tokens, and tool result; used for audit and memory extraction.
`session_slots`	Known fields in the session.	industry=manufacturing, company_size=200, pain_point=inventory.	AI does not ask again for known information; forms automatically skip fields already provided.
`events`	Tracks behavior and system events.	page_view, chat_turn, cta_click, gen_ui_action, rag_query.	Analyzes funnel, debugs flow, calculates lead score, and audits actions.
`leads`	Potential customer profile valuable enough for sales/marketing follow-up.	name, email, company_size, industry, interest, score, owner.	Creates sales summaries, syncs CRM, supports follow-up, and assigns owners.
`demo_requests`	Demo/quote/consultation request attached to a lead.	preferred_time, solution_interest, handoff_payload.	Sales receives schedule/demo requests; CRM/Slack/email receives notifications.
`memory_facts`	Long-term memory fallback when Mem0 is not used.	subject_type=lead, fact_key=company_size, fact_value=200.	Remembers long-term information by visitor/lead/user/account with consent and confidence.
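
The way page context narrows retrieval can be sketched as a metadata pre-filter: collect the taxonomy terms of the current page, then keep only published atoms in the same workspace that share at least one term. This is an illustration under assumptions: the `status` values and the helper `candidateAtoms` are hypothetical, and a real pipeline would presumably combine this filter with the vector search over `atom_embeddings`.

```typescript
// Sketch only: status values and the helper name are hypothetical.
interface Atom {
  id: string;
  workspace_id: string;
  status: "draft" | "approved" | "published";
}

interface AtomTaxonomyLink {
  atom_id: string;
  taxonomy_term_id: string;
}

interface PageTaxonomyLink {
  page_id: string;
  taxonomy_term_id: string;
}

// Published atoms in the workspace that share a taxonomy term with the page.
function candidateAtoms(
  workspaceId: string,
  pageId: string,
  atoms: Atom[],
  atomLinks: AtomTaxonomyLink[],
  pageLinks: PageTaxonomyLink[],
): Atom[] {
  const pageTerms = new Set<string>(
    pageLinks.filter((l) => l.page_id === pageId).map((l) => l.taxonomy_term_id),
  );
  const taggedAtomIds = new Set<string>(
    atomLinks.filter((l) => pageTerms.has(l.taxonomy_term_id)).map((l) => l.atom_id),
  );
  return atoms.filter(
    (a) =>
      a.workspace_id === workspaceId &&
      a.status === "published" &&
      taggedAtomIds.has(a.id),
  );
}
```

Filtering on `workspace_id` first keeps one client's atoms out of another client's answers, and the `published` check reflects the rule that AI answers from reviewed atoms rather than raw text.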

Detailed storage table

Data group	Where it is stored	Suggested table/collection	Purpose	How Admin handles it
Source documents	Powabase Storage + Postgres	`sources`, `source_files`, `source_pages`	Store PDFs, old web pages, OCR text, and source metadata.	Upload, inspect pages, re-parse, archive.
Technical chunks	Powabase Postgres/RAG index	`raw_chunks`, `atom_chunk_links`	Debug retrieval and inspect which text was chunked/indexed.	View, filter, mark noisy, link to atoms, and reindex. Do not edit as the main source.
Knowledge atoms	Powabase Postgres + vector/RAG	`atoms`, `atom_versions`, `atom_embeddings`	Official knowledge source for RAG/runtime; not just products, can be FAQ, policy, case study, pricing, or process.	Edit, merge/split, attach taxonomy, approve/publish, inspect versions.
Page context	Powabase Postgres	`pages`, `page_taxonomy_links`	Identifies which topic, industry, or funnel stage a route/page belongs to.	Attach taxonomy to each route, set context hint, and default CTA.
Anonymous visitor	Powabase Postgres	`visitors`	Identify anonymous visitors via anonymous_id/cookie before email/phone is known.	Check consent and merge visitor into lead when contact info is provided.
Context/session	Powabase Postgres	`sessions`, `session_slots`	Know which page/campaign the user is on and which fields they already provided.	Debug sessions and check known/missing slots.
Conversation	Powabase Postgres	`conversations`, `conversation_messages`	Store conversation history: user, assistant, tool result, citation, token, trace.	View transcripts, debug answers, create summaries, extract lead insights.
User memory	Mem0 or Powabase fallback	Mem0 memories or `memory_facts`	Remember long-term facts such as industry, company size, pain point, and preference.	View/edit/delete when permission and consent allow it; avoid unnecessary sensitive PII.
Lead/event	Powabase Postgres + CRM connector	`leads`, `events`, `demo_requests`, `crm_sync_logs`	Track funnel, create sales handoff, sync CRM.	View lead summary, retry CRM sync, export/report.
Governance	Powabase Postgres	`audit_logs`, `content_gaps`, `workflow_tasks`	Track who changed what, which questions lack data, and which tasks need review.	Review, assign, resolve content gaps, inspect audit.
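
The "Mem0 or Powabase fallback" row suggests a storage-agnostic memory interface. The sketch below shows one possible shape, with an in-memory class standing in for the `memory_facts` table. The `MemoryStore` interface, the `FallbackFactStore` class, the `hasConsent` flag, and the overwrite-by-key behavior are all assumptions; no Mem0 API is shown or implied.

```typescript
// Sketch only: interface, class, and consent flag are hypothetical.
type SubjectType = "visitor" | "lead" | "user" | "account";

interface Fact {
  subject_type: SubjectType;
  subject_id: string;
  fact_key: string;
  fact_value: string;
}

interface MemoryStore {
  remember(fact: Fact, hasConsent: boolean): boolean;
  recall(subjectType: SubjectType, subjectId: string): Fact[];
  forget(subjectType: SubjectType, subjectId: string): number;
}

// In-memory stand-in for the memory_facts table.
class FallbackFactStore implements MemoryStore {
  private facts = new Map<string, Fact>();

  // Store a fact only when consent exists; a newer value replaces the old one.
  remember(fact: Fact, hasConsent: boolean): boolean {
    if (!hasConsent) return false;
    const key = `${fact.subject_type}:${fact.subject_id}:${fact.fact_key}`;
    this.facts.set(key, fact);
    return true;
  }

  recall(subjectType: SubjectType, subjectId: string): Fact[] {
    return Array.from(this.facts.values()).filter(
      (f) => f.subject_type === subjectType && f.subject_id === subjectId,
    );
  }

  // Delete every fact for a subject and return how many were removed.
  forget(subjectType: SubjectType, subjectId: string): number {
    const prefix = `${subjectType}:${subjectId}:`;
    const keys = Array.from(this.facts.keys()).filter((k) => k.startsWith(prefix));
    keys.forEach((k) => this.facts.delete(k));
    return keys.length;
  }
}
```

Keeping the consent check inside `remember` means no caller can persist a fact without it, and `forget` gives Admin the delete path mentioned in the table.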