Research · AI Systems · Security

GroupGPT: A Production-Grade, End-to-End Encrypted AI Group Chat Platform with Structured Knowledge Memory and Hybrid Edge-Cloud Inference

GroupGPT Team · groupgpt.tech · Live System · E2EE · Open Access

Abstract

We present GroupGPT, a production-deployed, privacy-first AI group chat platform that simultaneously addresses five challenges underserved by prior agentic group chat frameworks: end-to-end encryption with AI participation (where even the server cannot read private message content yet the AI responds coherently); structured room-scoped knowledge memory via a typed Neuron graph; hybrid edge-cloud inference routing tasks to on-device WebLLM or streaming cloud models; a rich multimodal command ecosystem (live web search, deep research, sandboxed code execution, image generation, and voice with STT); and a dual-context architecture maintaining strict database-level isolation between ephemeral public and persistent private rooms. Internal evaluation shows AI responses scoring 4.71 / 5.0 with token usage reduced by up to 3× versus naive full-history injection. GroupGPT has been publicly deployed with paying users since April 4, 2025, and is freely accessible at groupgpt.tech.

1 · Introduction

The landscape of LLM-based assistants has been dominated by single-user dialogue systems. Extending these to group chat settings—where multiple participants interact simultaneously, conversations shift topic rapidly, and sensitive personal information is regularly shared—introduces a fundamentally different set of requirements.

Related work has addressed group chat AI through frameworks that focus primarily on when the AI should intervene, but leave critical production concerns unaddressed: How does the AI accumulate room-specific knowledge across sessions? How do users protect sensitive information from the platform operator itself? How does the system remain affordable as group size and message volume grow?

GroupGPT addresses all of these in a single, coherent, production-deployed system. Figure 1 summarizes its key contributions against related frameworks.

Figure 1 — GroupGPT vs. Related Group Chat Frameworks
Feature | MUCA / HUMA | Shen et al. 2026 | GroupGPT (ours)
Intervention timing | Rule-based | SLM judge | Token budget
Privacy model | None | PII rewrite | Full E2EE
AI reads E2EE messages | No | No | Yes
Persistent memory | None | Sliding window | Neuron graph
On-device inference | No | Partial | Full WebLLM
Command tools | None | None | 7 ! commands
Voice + STT | No | Yes | Yes
Production deployed | No | No | Yes
Multi-DB isolation | No | No | Yes

2 · Related Work

2.1 Multi-User Group Chat AI Frameworks

MUCA [4] formalized the "3W" dimensions for group chat agents and introduced fixed-interval LLM evaluation. HUMA [5] extended this with human-like behavioral timing. MAP [6] explored multi-agent personalization. Of these, only MUCA (January 2024) predates GroupGPT's public launch on April 4, 2025; HUMA and MAP appeared as concurrent work later in 2025. These frameworks share a common limitation: they are research prototypes that do not address E2EE, persistent structured memory, or production-scale token economics.

A concurrent paper (Shen et al., 2026) [7] proposes an edge-cloud architecture decoupling intervention timing from response generation, introducing the MUIR benchmark. GroupGPT is complementary: where Shen et al. focus on when to intervene and privacy sanitization, GroupGPT addresses what the AI knows (Neurons), cryptographic privacy (full E2EE), and full production deployment.

2.2 Memory-Augmented LLMs

MemGPT [8] introduced hierarchical memory management for LLMs. RAG systems [9] retrieve relevant documents at inference time. GroupGPT's Neuron system differs from both: rather than automatic extraction or document retrieval, Neurons are explicitly curated by room members, giving users full authorship over what the AI knows and ensuring no unintended information leaks between rooms.

2.3 End-to-End Encryption in AI Systems

Signal [10] demonstrated E2EE in messaging. Extending E2EE to AI-assisted conversations — where the AI must read encrypted messages to respond — is an open problem that, to our knowledge, no prior group chat AI system addresses. GroupGPT resolves this by issuing the AI assistant its own ECDH keypair so it participates as a cryptographic peer, identical to human users.

2.4 Hybrid Edge-Cloud Inference

WebLLM [11] enabled in-browser LLM inference via WebGPU. Prior group chat systems do not integrate on-device inference. GroupGPT uses WebLLM as a genuine fallback for offline use and as a privacy-preserving alternative for queries users prefer not to send to cloud APIs.

3 · System Architecture

3.1 High-Level Overview

GroupGPT consists of two independent services: a Frontend (React 18 + Vite + TypeScript, Chakra UI) and a Backend (Node.js + Express + Socket.io, dual Prisma-managed SQLite databases).

Figure 2 — Full System Architecture
Browser (Client)
  SocketService: Socket.io client
  E2EEService: ECDH + AES-GCM
  WebLLMService: @mlc-ai/web-llm
  ChatContainer.tsx: worldCacheRef[ ] · roomCacheRef Map<id, Msg[]> · displayMode: 'world' | 'private'

Transport: Socket.io over HTTPS/WSS · JWT in auth.token handshake

ChatServer (Node.js)
  SecurityManager + IntrusionDetectionSystem: IP ban · Connection rate limits · Auth middleware · XSS filter · Socket.io handshake (fires before connection)
  ChatController:
    message.handler.ts → commands/detect.ts → system-prompt.ts (inject Neurons) → ai-stream.handler.ts (Gemini 2.5 Flash)
    room.handler: join/leave/history
    voice.handler: WebRTC + STT
    TranslationService: Gemini 2.0 Flash fan-out
    cortezRoomKeys: Map<roomId, CryptoKey>
  Managers: ChatManager · UserManager · RoomManager · AuthManager · SecurityManager

Databases
  dev.db (main app): User · Room · RoomMessage · Neuron · PromptLog · RoomKeyBundle · BannedIP · Feedback
  world.db (world chat): WorldMessage (no roomId)

3.2 Message Routing Pipeline

Every incoming message passes through a multi-stage decision tree before a response is emitted: security checks → rate limiting → optional E2EE decryption → token budget check → command detection → Neuron-augmented AI streaming → optional re-encryption and translation fan-out.

Figure 3 — Message Routing Pipeline
  1. SecurityManager (auth middleware)
       Banned IP / invalid JWT → rejected, no response
       Authenticated connection proceeds
  2. ChatManager (XSS · rate limit · dedup)
       Rate limit exceeded → notice to user
       Duplicate detected → silently dropped
       Clean message passes through
  3. E2EE enabled?
       Yes → decrypt with cortezRoomKeys[roomId] (transient, in-memory only)
       No → plaintext passes through as-is
  4. Token budget (availableTokens ≥ 500?)
       No → reject + upgrade prompt sent to user
       Yes → continue to command detection
  5. Command detection (commands/detect.ts)
       search! → Tavily Search API
       browse! → Tavily Scrape API
       learn! → Multi-step Tavily
       code! → E2B Sandbox
       image → OpenRouter
       URL → Auto Preview
       No command match → continue
  6. Neurons + system prompt (system-prompt.ts)
       Room Neurons injected into AI system prompt alongside sliding window of recent messages
  7. Inference routing
       WebLLM (offline / privacy): on-device · WebGPU · zero API cost
       Gemini 2.5 Flash (cloud): via OpenRouter · streaming axios POST
  8. Post-process & emit
       E2EE re-encrypt (if room is E2EE-enabled)
       Translation fan-out via TranslationService (only to users with translator mode on)
       PromptLog write (thoughtProcess, toolsUsed, commandUsed, metadata)
       Emit ai_response to room Socket.io channel
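
The decision tree in Figure 3 can be read as an ordered chain of early exits. The following is a minimal sketch under stated assumptions: the `IncomingMessage` shape, the `Outcome` labels, and `routeMessage` are hypothetical names for illustration, and only the stage order and the 500-token threshold come from the text.

```typescript
// Hypothetical shapes for illustration; not GroupGPT's actual types.
type Outcome =
  | "rejected"        // banned IP or invalid JWT: no response
  | "rate_limited"    // notice sent to the user
  | "dropped"         // duplicate, silently dropped
  | "upgrade_prompt"  // token budget below the threshold
  | "command"         // handled by a ! command
  | "ai_stream";      // Neuron-augmented AI streaming

interface IncomingMessage {
  authenticated: boolean;
  rateLimited: boolean;
  duplicate: boolean;
  availableTokens: number;
  matchesCommand: boolean;
}

const MIN_TOKENS = 500; // threshold stated in the pipeline description

function routeMessage(m: IncomingMessage): Outcome {
  if (!m.authenticated) return "rejected";
  if (m.rateLimited) return "rate_limited";
  if (m.duplicate) return "dropped";
  // Optional E2EE decryption would happen here, before the budget check.
  if (m.availableTokens < MIN_TOKENS) return "upgrade_prompt";
  if (m.matchesCommand) return "command";
  return "ai_stream";
}
```

Ordering the checks this way means the cheapest rejections run first, and no decryption or model call is attempted for traffic that will be discarded anyway.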

3.3 World Chat vs. Private Rooms: Dual-Database Isolation

GroupGPT enforces strict isolation at every layer. World chat (world.db, table WorldMessage, no roomId) and private rooms (dev.db, table RoomMessage, indexed by (roomId, timestamp)) are backed by completely separate SQLite files. The critical invariant is that session.roomId is mutated in-place on join_room; creating a new session object was the historical root cause of world-chat history bleeding into private rooms.

Aspect | World Chat | Private Room
Encryption | None | E2EE optional
Knowledge memory | None | Neurons injected
History export | No | Yes
In-memory cache | worldHistory[] | roomHistory Map<id,Msg[]>
Persistence function | persistWorldMessage() | persistRoomMessage()
Database table | WorldMessage (world.db) | RoomMessage (dev.db)
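
The in-place mutation invariant described above can be illustrated with a small sketch. It shows one plausible mechanism for the bug, not the actual GroupGPT code: the `Session` shape, the `sessions` map, and both function names are hypothetical, and only `session.roomId` and `join_room` appear in the text.

```typescript
// Hypothetical session shape; only roomId is named in the text.
interface Session {
  userId: string;
  roomId: string | null; // assumed: null while the user is in world chat
}

const sessions = new Map<string, Session>();

// Mutates the existing object, so every holder of the reference
// (socket handlers, caches) observes the new room immediately.
function joinRoom(userId: string, roomId: string): Session {
  const session = sessions.get(userId);
  if (!session) throw new Error(`no session for ${userId}`);
  session.roomId = roomId;
  return session;
}

// Buggy variant for contrast: replacing the object leaves older references
// pointing at a stale session that still reports the previous context.
function joinRoomByReplacing(userId: string, roomId: string): Session {
  const fresh: Session = { userId, roomId };
  sessions.set(userId, fresh);
  return fresh;
}
```

Any handler that captured the session before the join keeps working correctly with `joinRoom`, whereas with `joinRoomByReplacing` it would continue to route by the old `roomId`.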

4 · Neuron / Brain System

4.1 Motivation and Design

Prior group chat AI systems use flat chat history as their only source of context. This is expensive (token cost grows linearly with history length), fragile (key facts scroll out of context), and nonspecific (the AI must rediscover room norms from history on every session). GroupGPT's Neuron system solves all three by treating room context the way Neuron Surgery [13] treats model cognition: distinguishing raw knowledge from filtered experience from applied wisdom. Flat chat history is raw knowledge — a growing ledger the AI must re-parse on every request. Neurons are curated wisdom: room members act as human mentors, explicitly shaping what the AI should know, how it should behave, and what constraints apply, rather than leaving it to infer context from noise.

Room members curate a graph of typed knowledge nodes, stored in dev.db and injected into the AI's system prompt alongside a sliding window of recent messages:

Type | Purpose | Example
skill | Behavioral instruction for AI | "Write Python 3.12 code with full type annotations"
knowledge | Factual context about the project | "Our API uses OAuth 2.0 with PKCE, not Basic Auth"
process | Team workflow rules | "Always check test coverage ≥ 80% before approving PRs"
memory | Persistent historical facts | "Migrated from MySQL to Postgres in Q1 2025"
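
To make the injection step concrete, here is a minimal sketch of how typed Neurons and a sliding window might be combined into a system prompt. The four type names come from the table above; the `Neuron` and `ChatMessage` shapes, the bracketed section labels, and `buildSystemPrompt` are assumptions made for illustration and are not taken from system-prompt.ts.

```typescript
type NeuronType = "skill" | "knowledge" | "process" | "memory";

interface Neuron {
  type: NeuronType;
  content: string;
}

interface ChatMessage {
  author: string;
  text: string;
}

// Builds a prompt from curated Neurons plus the last `windowSize` messages.
// Prompt size depends on the Neuron count and window size, not on total history.
function buildSystemPrompt(neurons: Neuron[], history: ChatMessage[], windowSize: number): string {
  const order: NeuronType[] = ["skill", "knowledge", "process", "memory"];
  const sections = order
    .map((t) => {
      const items = neurons.filter((n) => n.type === t).map((n) => `- ${n.content}`);
      return items.length > 0 ? `[${t}]\n${items.join("\n")}` : "";
    })
    .filter((s) => s.length > 0);
  // slice(-0) would return the whole array, so guard the zero case explicitly.
  const recent = windowSize > 0 ? history.slice(-windowSize) : [];
  const lines = recent.map((m) => `${m.author}: ${m.text}`);
  return [...sections, "[recent messages]", ...lines].join("\n");
}
```

Because older messages never enter the prompt, a fact that must persist has to live in a Neuron, which is the curation trade-off the section describes.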

4.2 Token Efficiency

Figure 6 compares token usage across three context strategies for an active room with 90 days of history.

Figure 6 — Average Tokens per AI Request by Strategy
Full History Injection: 4,200 tokens
Sliding Window Only: 1,800 tokens
GroupGPT (Neurons + Window): 1,400 tokens

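Neurons plus a sliding window average 1,400 tokens per request, a 3× reduction versus full-history injection (4,200 tokens), while keeping prompt size bounded and preserving room-specific context.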

Figure 15 — Annual Token Projection by Group Activity
500 msg/day: $630 → $210 / yr (saves $420 / yr)
1,500 msg/day: $1,890 → $630 / yr (saves $1,260 / yr)
3,000 msg/day: $3,780 → $1,260 / yr (saves $2,520 / yr)

At $0.30 / 1M tokens (Gemini Flash). Full-history baseline vs. GroupGPT Neurons + sliding window.

4.3 Neuron Portability

Neurons can be exported as .neurons files and imported into other rooms, enabling teams to share AI configuration across projects. The frontend provides a canvas-based graph visualization (NeuronPanel.tsx) for visual editing.

5 · End-to-End Encryption with AI Participation

5.1 Cryptographic Protocol

GroupGPT implements E2EE where the AI assistant participates as a full cryptographic peer. The scheme uses ECDH P-256 key agreement to derive per-room shared secrets, then wraps an AES-256-GCM room key for every participant — including Cortez (the AI), whose ECDH keypair is loaded at server start from the CORTEZ_PRIVATE_KEY_JWK environment variable.

The key exchange sequence:

  1. Each user generates a permanent ECDH keypair stored in localStorage and uploads the public key to /api/e2ee/public-key.
  2. The room creator generates an AES-256-GCM room key and ECDH-wraps it for every member and Cortez, storing all bundles at /api/e2ee/room-key.
  3. On join, each user unwraps their bundle to recover the room key.
  4. The server unwraps Cortez's bundle at ChatController initialization and caches it in cortezRoomKeys: Map<roomId, CryptoKey>.
  5. All subsequent messages are AES-GCM encrypted; the server decrypts transiently in memory for Cortez, never writing plaintext to the database.
Figure 7 — E2EE Key Exchange Sequence Diagram
Participants: User A (creator), User B (member), Cortez AI, Server

  1. Public key upload
       User A: Generate permanent ECDH P-256 keypair, stored in localStorage; POST /api/e2ee/public-key
       User B: Same, on own device
       Server: Store public keys by userId in dev.db; no private key ever leaves the device
  2. Room key creation
       User A: Generate AES-256-GCM room key; ECDH-wrap for each member + Cortez; POST /api/e2ee/room-key (all bundles)
       Server: Store wrapped key bundles in the RoomKeyBundle table (ciphertext only)
  3. Member join
       User B: On room join, GET own bundle; ECDH-unwrap with local private key; AES-GCM room key recovered in browser
       Server: Return User B's wrapped bundle
  4. Cortez initialization
       Cortez AI: At server start, ECDH keypair loaded from CORTEZ_PRIVATE_KEY_JWK; unwrap Cortez bundle and cache in cortezRoomKeys[roomId]
       Server: Provide Cortez bundle on init
  5. Message send
       User A: Encrypt message with AES-GCM room key; Socket.io emit { encrypted: true, iv, ciphertext }
       User B: Receive encrypted payload; decrypt in browser with room key; plaintext never leaves device
       Cortez AI: Server decrypts transiently for Cortez using cortezRoomKeys[roomId]; plaintext in memory only, never written to DB
       Server: Persist ciphertext + IV only (dev.db, RoomMessage row)
  6. AI response
       User A: Receive Cortez response (encrypted); decrypt in browser with room key
       User B: Same, decrypt Cortez reply
       Cortez AI: Generate AI reply → re-encrypt with room key; emit encrypted response to room
       Server: Relay encrypted AI response; persist ciphertext only

At no point does the server hold an unencrypted message at rest. Cortez participates as a cryptographic peer — its plaintext window exists only transiently in process memory during inference.
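
The key exchange and message encryption steps above map closely onto the Web Crypto API. The sketch below is one way to express them, assuming Node.js with `webcrypto` from `node:crypto`; all function and type names are hypothetical. It also assumes that the ECDH shared secret is used directly as an AES-GCM wrapping key, which the text does not specify (a production scheme might add a KDF step such as HKDF).

```typescript
import { webcrypto } from "node:crypto";

const { subtle } = webcrypto;
type Key = webcrypto.CryptoKey;

interface WrappedBundle {
  iv: Uint8Array;       // nonce used for the wrap operation
  wrapped: ArrayBuffer; // AES-GCM ciphertext of the raw room key
}

interface EncryptedMessage {
  encrypted: true;
  iv: Uint8Array;
  ciphertext: ArrayBuffer;
}

// One ECDH P-256 identity per participant (human users and the AI alike).
async function generateIdentity(): Promise<webcrypto.CryptoKeyPair> {
  return subtle.generateKey({ name: "ECDH", namedCurve: "P-256" }, false, ["deriveKey"]);
}

// Assumption: the ECDH shared secret is used directly as the wrapping key.
async function deriveWrappingKey(ownPrivate: Key, peerPublic: Key): Promise<Key> {
  return subtle.deriveKey(
    { name: "ECDH", public: peerPublic },
    ownPrivate,
    { name: "AES-GCM", length: 256 },
    false,
    ["wrapKey", "unwrapKey"],
  );
}

// Step 2: the creator wraps the (extractable) room key for one member.
async function wrapRoomKey(roomKey: Key, creatorPrivate: Key, memberPublic: Key): Promise<WrappedBundle> {
  const wrappingKey = await deriveWrappingKey(creatorPrivate, memberPublic);
  const iv = webcrypto.getRandomValues(new Uint8Array(12));
  const wrapped = await subtle.wrapKey("raw", roomKey, wrappingKey, { name: "AES-GCM", iv });
  return { iv, wrapped };
}

// Steps 3 and 4: a member (or the AI) recovers the room key from its bundle.
async function unwrapRoomKey(bundle: WrappedBundle, memberPrivate: Key, creatorPublic: Key): Promise<Key> {
  const wrappingKey = await deriveWrappingKey(memberPrivate, creatorPublic);
  return subtle.unwrapKey(
    "raw",
    bundle.wrapped,
    wrappingKey,
    { name: "AES-GCM", iv: bundle.iv },
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

// Step 5: messages carry a fresh IV plus ciphertext, matching the payload shape in Figure 7.
async function encryptMessage(roomKey: Key, plaintext: string): Promise<EncryptedMessage> {
  const iv = webcrypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await subtle.encrypt({ name: "AES-GCM", iv }, roomKey, new TextEncoder().encode(plaintext));
  return { encrypted: true, iv, ciphertext };
}

async function decryptMessage(roomKey: Key, message: EncryptedMessage): Promise<string> {
  const plain = await subtle.decrypt({ name: "AES-GCM", iv: message.iv }, roomKey, message.ciphertext);
  return new TextDecoder().decode(plain);
}
```

In this sketch the AI's path is the same `unwrapRoomKey` and `decryptMessage` calls a browser client would make, which is the sense in which it acts as a cryptographic peer.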

5.2 Security Properties

Figure 8 — E2EE Threat Model and Mitigations
Threat | Mitigation
Database breach (dev.db leaked) | All message content is AES-GCM ciphertext. DB contains only ciphertext + IVs. Plaintext is unrecoverable without the room key.
Platform operator reads messages | Server never holds room key in plaintext at rest. Cortez's copy is server-side but messages are only decrypted transiently in memory, never written back.
Man-in-the-middle on socket | WSS transport + JWT auth. Ciphertext provides a second layer of protection even if transport is compromised.
Rogue room member | Room key rotation re-wraps only for current members. Removed members lose access to future messages.
AI backdoor / privileged path | Cortez is a cryptographic peer. No privileged plaintext path exists; it uses the same ECDH protocol as human users.

Figure 17 — E2EE Latency Overhead
Operation | Median | p95 | Impact
Message encrypt (AES-GCM) | <2 ms | <3 ms | Imperceptible
Message decrypt (AES-GCM) | <2 ms | <3 ms | Imperceptible
Key unwrap (ECDH, 1 member) | ~8 ms | ~15 ms | One-time on join
Room key distribution (5 members) | ~15 ms | ~25 ms | One-time on create
Room key distribution (10 members) | ~28 ms | ~45 ms | One-time on create
Room key distribution (20 members) | ~50 ms | ~80 ms | One-time on create

Encryption and decryption together add less than 4 ms of median per-message overhead (less than 6 ms at p95). Key operations are one-time costs, so the user-perceptible latency impact is effectively zero.

6 · Multimodal Command Ecosystem

GroupGPT provides a natural-language ! command interface that extends AI capabilities with live data and execution tools. Command detection in commands/detect.ts runs before the LLM streaming path, bypassing the AI entirely for commands that don't require it.

Figure 9 — Command Ecosystem: Routing, Tools, and Latency
Command | Triggers | Backend Service | Median Latency
search! <query> | Explicit prefix | Tavily Search API → ranked links + snippets | ~1.1 s
browse! <url> | Explicit prefix | Tavily Scrape API → inline preview card | ~1.4 s
learn! <topic> | Explicit prefix | Multi-step Tavily pipeline → long-form synthesis | ~4.8 s
code! <description> | Explicit prefix | E2B Cloud Sandbox → scaffold + build + zip download | ~12 s
Image intent | detectImageCommand() | OpenRouter image model → inline in chat | ~6.0 s
URL in message | detectUrlsInMessage() | Passive Tavily preview (no user syntax needed) | ~1.3 s
report! / suggest! | Explicit prefix | Writes to Feedback table in dev.db | <0.1 s
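
Explicit-prefix detection of the kind listed in Figure 9 can be sketched as a single pattern match that runs before any model call. This is an illustrative sketch, not the contents of commands/detect.ts: `detectCommand`, `findUrls`, and the `Detected` shape are hypothetical names, case-insensitive matching is an assumption, and `findUrls` is only a stand-in for the role played by detectUrlsInMessage().

```typescript
type Command = "search" | "browse" | "learn" | "code" | "report" | "suggest";

interface Detected {
  command: Command;
  argument: string;
}

const PREFIXES: Command[] = ["search", "browse", "learn", "code", "report", "suggest"];

// Returns the matched command, or null so the caller can fall through to
// image-intent detection, URL preview, and finally AI streaming.
function detectCommand(text: string): Detected | null {
  const match = /^\s*([a-z]+)!\s*([\s\S]*)$/i.exec(text);
  if (!match) return null;
  const word = (match[1] ?? "").toLowerCase();
  const command = PREFIXES.find((c) => c === word);
  return command ? { command, argument: (match[2] ?? "").trim() } : null;
}

// Hypothetical stand-in for passive URL detection: extracts http(s) URLs.
function findUrls(text: string): string[] {
  return text.match(/https?:\/\/[^\s<>"']+/g) ?? [];
}
```

For example, `detectCommand("search! latest Node.js release")` yields the `search` command with the remainder of the line as its argument, while an ordinary sentence ending in an exclamation mark returns `null` and proceeds to the AI path.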

7 · Voice Chat and Speech-to-Text Pipeline

7.1 WebRTC Peer-to-Peer Architecture

GroupGPT supports real-time P2P audio calls within any room. The server acts as a pure signaling relay — no audio traverses the server during live calls. ICE negotiation uses Google STUN and openrelay.metered.ca TURN servers. Clients exchange voice_offer, voice_answer, and voice_ice_candidate Socket.io events, then establish a direct WebRTC audio track.

7.2 Voice Message → AI Pipeline

Beyond live calls, users can send voice messages that route through the full AI pipeline:

  1. Browser records audio and emits a voice_message Socket.io event with a raw buffer + MIME type.
  2. Server converts the buffer to WAV via ffmpeg-static.
  3. WAV is sent to OpenAI Whisper API → returns transcript text.
  4. Transcript is routed through the standard message.handler.ts pipeline — including command detection and AI streaming.

A voice message saying "search! latest Node.js release" will trigger the Tavily search command and return results to the entire group, just as a typed message would.
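
The four pipeline steps above can be sketched as a small orchestration function with its external services injected. The dependency names (`toWav`, `transcribe`, `handleTextMessage`) and the message shape are hypothetical; in the described system they would correspond to ffmpeg-static, the Whisper API, and message.handler.ts respectively. Skipping empty transcripts is an assumption, not documented behavior.

```typescript
// Injected dependencies are hypothetical stand-ins for the real services.
interface VoiceDeps {
  toWav: (audio: Uint8Array, mimeType: string) => Promise<Uint8Array>;
  transcribe: (wav: Uint8Array) => Promise<string>;
  handleTextMessage: (roomId: string, userId: string, text: string) => Promise<void>;
}

interface VoiceMessage {
  roomId: string;
  userId: string;
  audio: Uint8Array; // raw recording buffer from the browser
  mimeType: string;
}

// Converts, transcribes, then hands the transcript to the normal text pipeline,
// so commands spoken aloud are detected exactly like typed ones.
async function handleVoiceMessage(msg: VoiceMessage, deps: VoiceDeps): Promise<string | null> {
  const wav = await deps.toWav(msg.audio, msg.mimeType);
  const transcript = (await deps.transcribe(wav)).trim();
  if (transcript.length === 0) return null; // assumed: nothing intelligible, skip routing
  await deps.handleTextMessage(msg.roomId, msg.userId, transcript);
  return transcript;
}
```

Keeping the transcript on the same path as typed text is what lets a spoken "search! ..." reach command detection without any voice-specific routing logic.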

8 · Hybrid Edge-Cloud Inference

8.1 Inference Routing

GroupGPT supports two inference backends, selected transparently based on availability and user preference:

Aspect | Cloud Path (default) | On-Device Path (WebLLM)
Engine | Gemini 2.5 Flash via OpenRouter | @mlc-ai/web-llm (WebGPU)
When used | Default for all connected users | Offline, or user enables privacy-first mode
Streaming | Yes | Yes (in-browser)
Data leaves browser | Yes (to OpenRouter) | No (fully on-device)
API cost | Per-user billing via token budget | Zero
First token latency | ~0.8 s | Model + VRAM dependent
Constraint | Requires internet connection | VRAM and model size limited
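
The selection rule in the table can be sketched as a pure function. The names below are hypothetical, and one behavior is an assumption rather than something the text states: when a user has enabled privacy-first mode but WebGPU is unavailable, this sketch reports `unavailable` instead of silently falling back to the cloud.

```typescript
type Backend = "cloud" | "webllm" | "unavailable";

interface RoutingInput {
  online: boolean;
  privacyFirst: boolean;    // user-enabled privacy-first mode
  webGpuAvailable: boolean; // whether the browser can run WebLLM
}

function chooseBackend(input: RoutingInput): Backend {
  const wantsOnDevice = input.privacyFirst || !input.online;
  if (!wantsOnDevice) return "cloud"; // default for connected users
  // Assumption: never route a privacy-first or offline request to the cloud.
  return input.webGpuAvailable ? "webllm" : "unavailable";
}
```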

8.2 Multilingual Translation Fan-Out

A background TranslationService (Gemini 2.0 Flash) fans translated message copies only to users who have translator mode enabled and whose configured language differs from the message language. A 10-person English-only room incurs zero translation calls per message. A mixed 10-person room with 3 users needing translation incurs 3 calls — never 10.
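
The fan-out rule amounts to a filter over room members. The sketch below assumes one translation call per qualifying user, as the example counts above imply; `RoomUser` and `translationTargets` are hypothetical names.

```typescript
interface RoomUser {
  id: string;
  language: string;        // configured language, e.g. "en"
  translatorMode: boolean; // whether the user opted into translation
}

// Returns only the users who need a translated copy of this message.
function translationTargets(users: RoomUser[], messageLanguage: string): RoomUser[] {
  return users.filter((u) => u.translatorMode && u.language !== messageLanguage);
}
```

The number of translation calls per message is then `translationTargets(users, lang).length`, which is zero for a room where everyone already reads the message language.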

9 · Security Subsystem

GroupGPT employs a five-layer concentric security model from network edge to AI response.

Figure 12 — Layered Security Model
  1. Network / Transport: BannedIP table, connection rate limits via IntrusionDetectionSystem, HTTPS/WSS transport. Monitors connection bursts, auth failure rates, and message flooding, and triggers auto-ban.
  2. Socket Authentication: JWT validated in Socket.io handshake middleware, which fires before the connection event. Invalid tokens are rejected at the transport layer and never reach application code.
  3. Message Sanitization: ChatManager applies XSS filtering to all incoming message content, duplicate detection (prevents replay floods), and per-user rate limiting.
  4. Token Budget Enforcement: availableTokens >= 500 check before every AI call. Subscription tier controls per-period limits. Pay-as-you-go pool for over-limit users.
  5. End-to-End Encryption (private rooms): ECDH P-256 + AES-256-GCM. Server stores ciphertext only, with no plaintext at rest. Admin routes gate on a separate x-admin-secret header.
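
Layer 2 can be sketched as a handshake middleware. Socket.io middleware receives `(socket, next)` and exposes client-supplied credentials at `socket.handshake.auth`; the sketch uses minimal structural types so it compiles without socket.io installed. The `verify` function, the error message, and storing the user id on `socket.data` are assumptions for illustration.

```typescript
// Minimal structural types mirroring the parts of a Socket.io socket used here.
interface HandshakeSocket {
  handshake: { auth: { token?: string } };
  data: { userId?: string };
}
type Next = (err?: Error) => void;

// Hypothetical verifier: returns the token's claims or throws if invalid.
type VerifyJwt = (token: string) => { userId: string };

function makeAuthMiddleware(verify: VerifyJwt) {
  return (socket: HandshakeSocket, next: Next): void => {
    const token = socket.handshake.auth.token;
    if (!token) return next(new Error("unauthorized"));
    try {
      socket.data.userId = verify(token).userId;
      next(); // only now does the connection event fire
    } catch {
      next(new Error("unauthorized"));
    }
  };
}
```

With a real server this would be registered once, for example via `io.use(makeAuthMiddleware(verify))`, so that rejected handshakes never reach any application event handler.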

10 · Authentication and Subscription

10.1 Authentication Paths

Three authentication paths converge onto JWT issuance by AuthManager: email/password (bcrypt), Google OAuth via Passport.js (/auth/callback), and Firebase (frontend Google sign-in fallback). All paths result in a JWT that is validated on Socket.io handshake.

10.2 Token Budget Model

Tier | Monthly Token Limit | Reset Cadence
Free | Low (entry) | Monthly
Starter | Medium | Monthly
Pro | High | Monthly
Team | Very high | Monthly
Pay-as-you-go | Unlimited (billed per use) | Per-use

Before each AI call: check availableTokens >= 500. After: UserManager.addTokenUsage(tokensUsed). Stripe manages billing; webhooks update UserManager state.
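
The check-then-record cycle can be sketched as follows. The `TokenAccount` shape and the derivation of available tokens from a per-period limit are assumptions; only the 500-token threshold and the role of UserManager.addTokenUsage(tokensUsed) come from the text.

```typescript
const MIN_AVAILABLE_TOKENS = 500; // threshold stated in the text

// Hypothetical per-user accounting record.
interface TokenAccount {
  limit: number; // per-period limit for the subscription tier
  used: number;  // tokens consumed in the current period
}

function availableTokens(account: TokenAccount): number {
  return Math.max(0, account.limit - account.used);
}

// Gate applied before each AI call.
function canCallAI(account: TokenAccount): boolean {
  return availableTokens(account) >= MIN_AVAILABLE_TOKENS;
}

// Mirrors the role of UserManager.addTokenUsage(tokensUsed) after a call.
function addTokenUsage(account: TokenAccount, tokensUsed: number): TokenAccount {
  return { ...account, used: account.used + tokensUsed };
}
```

Checking against a fixed floor rather than the exact cost of the upcoming request keeps the gate cheap, at the price of letting a single response overshoot the limit slightly.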

11 · Evaluation

11.1 Response Quality (N = 300)

AI responses were evaluated on four dimensions using LLM-as-judge methodology [12] on a stratified sample of 300 responses across 50 private room sessions spanning engineering, creative writing, and support use cases.

Fluency: 4.90
Coherence: 4.79
Groundedness: 4.68
Helpfulness: 4.46
Figure 14 — Score Distribution by Dimension (% at each Likert score)
Fluency: 93.3% of responses scored 5
Coherence: 85.6% of responses scored 5
Groundedness: 82.0% of responses scored 5
Helpfulness: 67.7% of responses scored 5

Overall average: 4.71 / 5.0. Comparable to Shen et al. [7]: 4.72 / 5.0.
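
Assuming the overall figure is the unweighted mean of the four dimension scores reported above (the text does not state the weighting), the arithmetic reproduces the reported value:

```latex
\[
\bar{s} = \frac{4.90 + 4.79 + 4.68 + 4.46}{4} = \frac{18.83}{4} = 4.7075 \approx 4.71
\]
```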

11.2 Command Latency Profile

Figure 16 — Median Response Latency by Command Type
report! / suggest!: <0.1 s
AI stream (first token): 0.8 s
search!: 1.1 s
auto URL preview: 1.3 s
browse!: 1.4 s
learn!: 4.8 s
image generation: 6.0 s
code!: 12.0 s

12 · Discussion

12.1 Comparison with Shen et al. (2026)

Figure 18 — GroupGPT Platform vs. Shen et al. Framework [7]
Dimension | Shen et al. 2026 | GroupGPT (ours)
Primary focus | Intervention timing | Full production stack
Privacy model | PII rewriting | Cryptographic E2EE
Server sees plaintext | Yes (sanitized) | Transiently in memory only (never at rest)
AI reads E2EE messages | No | Yes (ECDH peer)
Memory system | Sliding window | Typed Neuron graph
Memory persistence | Session-scoped | Permanent (DB-backed)
Memory curation | Automatic | User-curated
On-device inference | Classifier only | Full WebLLM
Command tools | None | 7 ! commands
Voice + STT | Yes (caption model) | Yes (Whisper)
Image generation | Input only | Input + generation
Code execution | No | Yes (E2B sandbox)
Multi-DB isolation | No | Yes
Benchmark | MUIR (2,500 samples) | Live user traffic
Deployment status | Research prototype | Live at groupgpt.tech since April 4, 2025

These systems are complementary. MUIR is a valuable benchmark; future work will evaluate GroupGPT's intervention timing on MUIR to enable direct comparison.

12.2 Neuron System vs. Automatic Memory Extraction

Automatic memory extraction (MemGPT [8], RAG [9]) requires no user effort but introduces risk: automatically extracted "memories" can silently include sensitive or incorrect information, and the AI's knowledge base grows opaque over time. GroupGPT's Neurons require explicit curation — a higher bar — but give room members full authorship over what the AI knows. Neuron Surgery [13] frames this as the difference between knowledge and wisdom: an AI with automatically harvested facts may be well-informed but still miss the mark, while one whose context has been deliberately shaped by the people it serves is far more likely to produce outputs that feel right to them. The visual Neuron graph editor in NeuronPanel.tsx lowers the curation effort substantially, and the .neurons export format enables reuse across rooms.

12.3 Limitations and Future Work

Figure 19 — Current Limitations and Planned Mitigations
Limitation | Status / Planned Mitigation
SQLite scales to moderate load | PostgreSQL migration path identified; schema is Prisma-abstracted for straightforward swap.
WebLLM constrained by GPU memory | Model selection guided by VRAM detection at init. Smaller quantized models available as fallback.
Neurons require manual curation | Planned: background summarizer suggests Neurons from chat history for user review.
Dynamic room member addition requires key re-wrap | Currently handled by room creator. Full key re-distribution protocol in development.
No MUIR benchmark evaluation | Planned: evaluate GroupGPT's intervention pipeline on MUIR for direct comparison with Shen et al. [7].

13 · Conclusion

GroupGPT is a production-grade AI group chat platform that advances the state of the art across five dimensions simultaneously: structured persistent knowledge memory (Neurons), cryptographic end-to-end encryption with AI participation, hybrid edge-cloud inference for privacy and offline support, a rich multimodal ! command ecosystem spanning voice, code, research, and image generation, and a dual-database architecture that enforces strict isolation between public and private contexts. Having served paying users since April 4, 2025, the platform demonstrates that these five capabilities can coexist in a single maintainable production system — not merely as research prototypes — and remains freely accessible at groupgpt.tech.

References

  1. OpenAI. GPT-4 Technical Report. arXiv:2303.08774, 2023.
  2. Bai, Y., et al. Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073, 2022.
  3. Comanici, G., et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv:2507.06261, 2025.
  4. Mao, M., et al. Multi-user chat assistant (MUCA): a framework using LLMs to facilitate group conversations. arXiv:2401.04883, 2024.
  5. Jacniacki, M. and Carmona Serrat, M. Humanlike Multi-user Agent (HUMA). arXiv:2511.17315, 2025.
  6. Lee, C. P., Choi, J., and Mutlu, B. MAP: Multi-user Personalization with Collaborative LLM-powered Agents. CHI Extended Abstracts, 2025.
  7. Shen, Z., et al. GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant. arXiv:2603.01059, 2026.
  8. Packer, C., et al. MemGPT: Towards LLMs as Operating Systems. arXiv:2310.08560, 2023.
  9. Lewis, P., et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS, 2020.
  10. Marlinspike, M. and Perrin, T. The Signal Protocol: Double Ratchet Algorithm. Signal Foundation, 2016 (Revision 4, November 2025).
  11. MLC AI. WebLLM: High-Performance In-Browser LLM Inference. GitHub: mlc-ai/web-llm, 2024.
  12. Liu, Y., et al. G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment. EMNLP, 2023.
  13. Madison III, W. Neuron Surgery: Sculpting Smarter SLMs Through Task-Based Experience, Human-Guided Introspection, and Acceptability Mapping. Cortex Research Group Blog, March 27, 2025. cortexresearch.group/blog.