We present GroupGPT, a production-deployed, privacy-first AI group chat platform that simultaneously addresses five challenges underserved by prior agentic group chat frameworks: end-to-end encryption with AI participation (where even the server cannot read private message content yet the AI responds coherently); structured room-scoped knowledge memory via a typed Neuron graph; hybrid edge-cloud inference routing tasks to on-device WebLLM or streaming cloud models; a rich multimodal command ecosystem (live web search, deep research, sandboxed code execution, image generation, and voice with STT); and a dual-context architecture maintaining strict database-level isolation between ephemeral public and persistent private rooms. Internal evaluation shows AI responses scoring 4.71 / 5.0 with token usage reduced by up to 3× versus naive full-history injection. GroupGPT has been publicly deployed with paying users since April 4, 2025, and is freely accessible at groupgpt.tech.
The landscape of LLM-based assistants has been dominated by single-user dialogue systems. Extending these to group chat settings—where multiple participants interact simultaneously, conversations shift topic rapidly, and sensitive personal information is regularly shared—introduces a fundamentally different set of requirements.
Related work has addressed group chat AI through frameworks that focus primarily on when the AI should intervene, but leave critical production concerns unaddressed: How does the AI accumulate room-specific knowledge across sessions? How do users protect sensitive information from the platform operator itself? How does the system remain affordable as group size and message volume grow?
GroupGPT addresses all of these in a single, coherent, production-deployed system. Our key contributions are:
! commands (search!, browse!, learn!, code!, image generation, automatic URL preview, and report!) plus WebRTC P2P voice chat with Whisper transcription.

| Feature | MUCA / HUMA | Shen et al. 2026 | GroupGPT (ours) |
|---|---|---|---|
| Intervention timing | Rule-based | SLM judge | Token budget |
| Privacy model | None | PII rewrite | Full E2EE |
| AI reads E2EE messages | No | No | Yes |
| Persistent memory | None | Sliding window | Neuron graph |
| On-device inference | No | Partial | WebLLM full |
| Command tools | None | None | 7 ! commands |
| Voice + STT | No | Yes | Yes |
| Production deployed | No | No | Yes |
| Multi-DB isolation | No | No | Yes |
MUCA [4] formalized the "3W" dimensions for group chat agents and introduced fixed-interval LLM evaluation. HUMA [5] extended this with human-like behavioral timing. MAP [6] explored multi-agent personalization. Of these, only MUCA (January 2024) predates GroupGPT's public launch on April 4, 2025; HUMA and MAP appeared as concurrent work later in 2025. These frameworks share a common limitation: they are research prototypes that do not address E2EE, persistent structured memory, or production-scale token economics.
A concurrent paper (Shen et al., 2026) [7] proposes an edge-cloud architecture decoupling intervention timing from response generation, introducing the MUIR benchmark. GroupGPT is complementary: where Shen et al. focus on when to intervene and privacy sanitization, GroupGPT addresses what the AI knows (Neurons), cryptographic privacy (full E2EE), and full production deployment.
MemGPT [8] introduced hierarchical memory management for LLMs. RAG systems [9] retrieve relevant documents at inference time. GroupGPT's Neuron system differs from both: rather than automatic extraction or document retrieval, Neurons are explicitly curated by room members, giving users full authorship over what the AI knows and ensuring no unintended information leaks between rooms.
Signal [10] demonstrated E2EE in messaging. Extending E2EE to AI-assisted conversations — where the AI must read encrypted messages to respond — is an open problem that, to our knowledge, no prior group chat AI system addresses. GroupGPT resolves this by issuing the AI assistant its own ECDH keypair so it participates as a cryptographic peer, identical to human users.
WebLLM [11] enabled in-browser LLM inference via WebGPU. Prior group chat systems do not integrate on-device inference. GroupGPT uses WebLLM as a genuine fallback for offline use and as a privacy-preserving alternative for queries users prefer not to send to cloud APIs.
GroupGPT consists of two independent services: a Frontend (React 18 + Vite + TypeScript, Chakra UI) and a Backend (Node.js + Express + Socket.io, dual Prisma-managed SQLite databases).
Every incoming message passes through a multi-stage decision tree before a response is emitted: security checks → rate limiting → optional E2EE decryption → token budget check → command detection → Neuron-augmented AI streaming → optional re-encryption and translation fan-out.
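The stage ordering above can be read as a chain of guards in which the first failing stage short-circuits the rest. The following is a minimal sketch under that assumption, not the production handler: `PipelineMsg`, `PipelineStage`, `runPipeline`, and the two example stages are hypothetical names introduced only for illustration.

```typescript
// Hypothetical sketch: every name here is illustrative, not from the GroupGPT codebase.
interface PipelineMsg {
  userId: string;
  text: string;
  availableTokens: number;
}

type StageVerdict = { pass: true } | { pass: false; reason: string };

interface PipelineStage {
  label: string;
  check: (msg: PipelineMsg) => StageVerdict;
}

// Stages run in order; the first failing stage stops processing.
function runPipeline(stages: PipelineStage[], msg: PipelineMsg): string {
  for (const stage of stages) {
    const verdict = stage.check(msg);
    if (!verdict.pass) {
      return `stopped at ${stage.label}: ${verdict.reason}`;
    }
  }
  return "forwarded to AI streaming";
}

// Two simplified stand-in stages (the real pipeline has more).
const sketchStages: PipelineStage[] = [
  {
    label: "security",
    check: (m) =>
      m.text.indexOf("<script") >= 0
        ? { pass: false, reason: "unsafe content" }
        : { pass: true },
  },
  {
    label: "token budget",
    check: (m) =>
      m.availableTokens >= 500
        ? { pass: true }
        : { pass: false, reason: "insufficient tokens" },
  },
];
```

Modelling each stage as a small predicate keeps the ordering explicit and makes individual stages easy to test in isolation.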
GroupGPT enforces strict isolation at every layer. World chat (world.db → WorldMessage, no roomId) and private rooms (dev.db → RoomMessage, indexed by (roomId, timestamp)) are backed by completely separate SQLite files. The critical invariant is that session.roomId is mutated in-place on join_room; creating a new session object was the historical root cause of world-chat history bleeding into private rooms.
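One plausible way the session bug described above can arise is that handlers registered at connection time close over the original session object, so a replacement object is never seen by them. The sketch below illustrates that mechanism only; `SocketSession`, `makeHistoryRouter`, and both join functions are hypothetical names, and the actual handler code may differ.

```typescript
// Hypothetical sketch; SocketSession and the functions below are illustrative names.
interface SocketSession {
  userId: string;
  roomId: string | null; // null means the user is in world chat
}

// A handler created at connect time closes over the session reference.
function makeHistoryRouter(session: SocketSession): () => "world.db" | "dev.db" {
  return () => (session.roomId === null ? "world.db" : "dev.db");
}

// Correct: mutate the existing object so every holder of the reference sees the new room.
function joinRoomInPlace(session: SocketSession, roomId: string): void {
  session.roomId = roomId;
}

// Buggy pattern: a fresh object leaves earlier closures pointing at stale world-chat state.
function joinRoomByCopy(session: SocketSession, roomId: string): SocketSession {
  return { userId: session.userId, roomId: roomId };
}

const liveSession: SocketSession = { userId: "u1", roomId: null };
const routeHistory = makeHistoryRouter(liveSession);

joinRoomByCopy(liveSession, "room-42");
const afterCopy = routeHistory(); // the closure never saw the copy

joinRoomInPlace(liveSession, "room-42");
const afterMutation = routeHistory(); // the closure sees the updated roomId
```

In this sketch the copy-based join leaves the router reading from world chat, while the in-place mutation switches it to the private-room database.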
| Aspect | World Chat | Private Room |
|---|---|---|
| Encryption | None | E2EE optional |
| Knowledge memory | None | Neurons injected |
| History export | No | Yes |
| In-memory cache | worldHistory[] | roomHistory Map<id,Msg[]> |
| Persistence function | persistWorldMessage() | persistRoomMessage() |
| Database table | WorldMessage (world.db) | RoomMessage (dev.db) |
Prior group chat AI systems use flat chat history as their only source of context. This is expensive (token cost grows linearly with history length), fragile (key facts scroll out of context), and nonspecific (the AI must rediscover room norms from history on every session). GroupGPT's Neuron system solves all three by treating room context the way Neuron Surgery [13] treats model cognition: distinguishing raw knowledge from filtered experience from applied wisdom. Flat chat history is raw knowledge — a growing ledger the AI must re-parse on every request. Neurons are curated wisdom: room members act as human mentors, explicitly shaping what the AI should know, how it should behave, and what constraints apply, rather than leaving it to infer context from noise.
Room members curate a graph of typed knowledge nodes, stored in dev.db and injected into the AI's system prompt alongside a sliding window of recent messages:
| Type | Purpose | Example |
|---|---|---|
| skill | Behavioral instruction for AI | "Write Python 3.12 code with full type annotations" |
| knowledge | Factual context about the project | "Our API uses OAuth 2.0 with PKCE, not Basic Auth" |
| process | Team workflow rules | "Always check test coverage ≥ 80% before approving PRs" |
| memory | Persistent historical facts | "Migrated from MySQL to Postgres in Q1 2025" |
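
The four Neuron types and their injection alongside a sliding window can be sketched as follows. This is a minimal illustration, assuming a flat list of Neurons and a plain-text prompt layout; the `Neuron` and `ChatTurn` field names, the section headers, and `buildSystemPrompt` are hypothetical and not taken from the GroupGPT codebase.

```typescript
// Hypothetical sketch; field names and the prompt layout are assumptions.
type NeuronType = "skill" | "knowledge" | "process" | "memory";

interface Neuron {
  id: string;
  type: NeuronType;
  content: string;
}

interface ChatTurn {
  author: string;
  text: string;
}

// Group Neurons by type, then append only the most recent `windowSize` messages.
function buildSystemPrompt(neurons: Neuron[], recent: ChatTurn[], windowSize: number): string {
  const order: NeuronType[] = ["skill", "knowledge", "process", "memory"];
  const sections = order
    .map((t) => {
      const items = neurons.filter((n) => n.type === t).map((n) => `- ${n.content}`);
      return items.length > 0 ? `[${t}]\n${items.join("\n")}` : "";
    })
    .filter((s) => s !== "");
  // slice(-0) would return the whole array, so guard the zero case explicitly.
  const tail = windowSize > 0 ? recent.slice(-windowSize) : [];
  const tailLines = tail.map((m) => `${m.author}: ${m.text}`);
  return sections.concat(["[recent messages]"], tailLines).join("\n");
}

const demoPrompt = buildSystemPrompt(
  [
    { id: "n1", type: "knowledge", content: "Our API uses OAuth 2.0 with PKCE, not Basic Auth" },
    { id: "n2", type: "skill", content: "Write Python 3.12 code with full type annotations" },
  ],
  [
    { author: "ana", text: "older message" },
    { author: "bo", text: "newest message" },
  ],
  1,
);
```

Because the Neuron sections have a fixed size and the window is capped, the prompt length stays bounded regardless of how long the room history grows.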
Figure 6 compares token usage across three context strategies for an active room with 90 days of history.
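Figure 6 highlights: combining Neurons with a sliding window reduces token usage by up to 3× versus full-history injection, while keeping context bounded and preserving room-specific knowledge. Costs are computed at $0.30 per 1M tokens (Gemini Flash), comparing the full-history baseline against GroupGPT's Neurons plus sliding window.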
Neurons can be exported as .neurons files and imported into other rooms, enabling teams to share AI configuration across projects. The frontend provides a canvas-based graph visualization (NeuronPanel.tsx) for visual editing.
GroupGPT implements E2EE where the AI assistant participates as a full cryptographic peer. The scheme uses ECDH P-256 key agreement to derive per-room shared secrets, then wraps an AES-256-GCM room key for every participant — including Cortez (the AI), whose ECDH keypair is loaded at server start from the CORTEZ_PRIVATE_KEY_JWK environment variable.
The key exchange sequence:
1. Each user generates a permanent ECDH P-256 keypair, stores it in localStorage, and uploads the public key to /api/e2ee/public-key.
2. The room creator generates an AES-256-GCM room key, wraps it for each member and for Cortez, and posts all bundles to /api/e2ee/room-key.
3. Cortez unwraps its bundle during ChatController initialization and caches it in cortezRoomKeys: Map<roomId, CryptoKey>.

| # | User A (creator) | User B (member) | Cortez AI | Server |
|---|---|---|---|---|
| 1 | Generate permanent ECDH P-256 keypair, stored in localStorage → POST /api/e2ee/public-key | (same, on own device) | – | Store public keys by userId in dev.db; no private key ever leaves the device |
| 2 | Generate AES-256-GCM room key; ECDH-wrap for each member + Cortez → POST /api/e2ee/room-key (all bundles) | – | – | Store wrapped key bundles in the RoomKeyBundle table; ciphertext only |
| 3 | – | On room join: GET own bundle; ECDH-unwrap with local private key; AES-GCM room key recovered in browser | – | Return User B's wrapped bundle |
| 4 | – | – | At server start: ECDH keypair loaded from CORTEZ_PRIVATE_KEY_JWK; unwrap Cortez bundle → cache in cortezRoomKeys[roomId] | Provide Cortez bundle on init |
| 5 | Encrypt message with AES-GCM room key: { encrypted: true, iv, ciphertext } → Socket.io emit | Receive encrypted payload; decrypt in browser with room key; plaintext never leaves device | Server decrypts transiently for Cortez using cortezRoomKeys[roomId]; plaintext in memory only, never written to DB | Persist ciphertext + IV only (dev.db, RoomMessage row) |
| 6 | Receive Cortez response (encrypted); decrypt in browser with room key | Same: decrypt Cortez reply | Generate AI reply → re-encrypt with room key → emit encrypted response to room | Relay encrypted AI response; persist ciphertext only |

At no point does the server hold an unencrypted message at rest. Cortez participates as a cryptographic peer — its plaintext window exists only transiently in process memory during inference.
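The wrap and unwrap steps can be sketched with the standard Web Crypto API. This is an illustrative sketch rather than the deployed scheme: the function names and the `WrappedBundle` shape are hypothetical, the wrapping key is assumed to be derived directly from the ECDH shared secret (the source does not state which KDF is used), and a runtime exposing a global `crypto` object (a browser or a recent Node.js release) is assumed.

```typescript
// Hypothetical sketch using the standard Web Crypto API; names are illustrative.
// Assumption: the AES-GCM wrapping key is derived directly from the ECDH shared secret.

async function makeIdentity(): Promise<CryptoKeyPair> {
  return crypto.subtle.generateKey({ name: "ECDH", namedCurve: "P-256" }, false, ["deriveKey"]);
}

// Both sides derive the same wrapping key from (own private key, peer public key).
async function deriveWrappingKey(ownPrivate: CryptoKey, peerPublic: CryptoKey): Promise<CryptoKey> {
  return crypto.subtle.deriveKey(
    { name: "ECDH", public: peerPublic },
    ownPrivate,
    { name: "AES-GCM", length: 256 },
    false,
    ["wrapKey", "unwrapKey"],
  );
}

interface WrappedBundle {
  iv: BufferSource;
  wrapped: ArrayBuffer;
}

async function wrapRoomKey(roomKey: CryptoKey, wrappingKey: CryptoKey): Promise<WrappedBundle> {
  const iv = crypto.getRandomValues(new Uint8Array(12)); // fresh 96-bit IV per wrap
  const wrapped = await crypto.subtle.wrapKey("raw", roomKey, wrappingKey, { name: "AES-GCM", iv });
  return { iv, wrapped };
}

async function unwrapRoomKey(bundle: WrappedBundle, wrappingKey: CryptoKey): Promise<CryptoKey> {
  return crypto.subtle.unwrapKey(
    "raw",
    bundle.wrapped,
    wrappingKey,
    { name: "AES-GCM", iv: bundle.iv },
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

// The creator wraps the room key for the AI peer, which unwraps it and decrypts a message.
async function demoRoundTrip(): Promise<string> {
  const creator = await makeIdentity();
  const aiPeer = await makeIdentity(); // the AI holds a keypair like any other member
  const roomKey = await crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 },
    true, // must be extractable so it can be wrapped
    ["encrypt", "decrypt"],
  );

  const bundle = await wrapRoomKey(roomKey, await deriveWrappingKey(creator.privateKey, aiPeer.publicKey));
  const aiRoomKey = await unwrapRoomKey(bundle, await deriveWrappingKey(aiPeer.privateKey, creator.publicKey));

  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, roomKey, new TextEncoder().encode("hello room"));
  const plaintext = await crypto.subtle.decrypt({ name: "AES-GCM", iv }, aiRoomKey, ciphertext);
  return new TextDecoder().decode(plaintext);
}
```

The same `wrapRoomKey` call would be repeated once per member, which matches the roughly linear growth of key distribution time with room size.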
| Threat | Mitigation |
|---|---|
| Database breach (dev.db leaked) | All message content is AES-GCM ciphertext. DB contains only ciphertext + IVs. Plaintext is unrecoverable without the room key. |
| Platform operator reads messages | Server never holds room key in plaintext at rest. Cortez's copy is server-side but messages are only decrypted transiently in memory, never written back. |
| Man-in-the-middle on socket | WSS transport + JWT auth. Ciphertext provides a second layer of protection even if transport is compromised. |
| Rogue room member | Room key rotation re-wraps only for current members. Removed members lose access to future messages. |
| AI backdoor / privileged path | Cortez is a cryptographic peer. No privileged plaintext path exists; Cortez uses the same ECDH protocol as human users. |
| Operation | Median | p95 | Impact |
|---|---|---|---|
| Message encrypt (AES-GCM) | <2 ms | <3 ms | Imperceptible |
| Message decrypt (AES-GCM) | <2 ms | <3 ms | Imperceptible |
| Key unwrap (ECDH, 1 member) | ~8 ms | ~15 ms | One-time on join |
| Room key distribution — 5 members | ~15 ms | ~25 ms | One-time on create |
| Room key distribution — 10 members | ~28 ms | ~45 ms | One-time on create |
| Room key distribution — 20 members | ~50 ms | ~80 ms | One-time on create |
Encryption adds <4 ms of per-message overhead. Key operations are one-time costs. User-perceptible latency impact: effectively zero.
GroupGPT provides a natural-language ! command interface that extends AI capabilities with live data and execution tools. Command detection in commands/detect.ts runs before the LLM streaming path, bypassing the AI entirely for commands that don't require it.
| Command | Triggers | Backend Service | Median Latency |
|---|---|---|---|
| search! <query> | Explicit prefix | Tavily Search API → ranked links + snippets | ~1.1 s |
| browse! <url> | Explicit prefix | Tavily Scrape API → inline preview card | ~1.4 s |
| learn! <topic> | Explicit prefix | Multi-step Tavily pipeline → long-form synthesis | ~4.8 s |
| code! <description> | Explicit prefix | E2B Cloud Sandbox → scaffold + build + zip download | ~12 s |
| Image intent | detectImageCommand() | OpenRouter image model → inline in chat | ~6.0 s |
| URL in message | detectUrlsInMessage() | Passive Tavily preview; no user syntax needed | ~1.3 s |
| report! / suggest! | Explicit prefix | Writes to Feedback table in dev.db | <0.1 s |
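
Explicit-prefix detection of the kind listed in the table can be sketched as a simple prefix match that runs before any model call. The sketch below is an assumption about how such a detector might look, not the contents of commands/detect.ts: `detectPrefixCommand`, `DetectedCommand`, and the case-insensitive matching rule are all hypothetical.

```typescript
// Hypothetical sketch; the real commands/detect.ts may differ.
type PrefixCommand = "search" | "browse" | "learn" | "code" | "report" | "suggest";

interface DetectedCommand {
  command: PrefixCommand;
  argument: string;
}

const PREFIX_COMMANDS: PrefixCommand[] = ["search", "browse", "learn", "code", "report", "suggest"];

// Returns the matched command, or null so the caller can fall through to the LLM path.
function detectPrefixCommand(text: string): DetectedCommand | null {
  const trimmed = text.trim();
  // Assumption: matching is case-insensitive.
  const lowered = trimmed.toLowerCase();
  for (const command of PREFIX_COMMANDS) {
    const prefix = `${command}!`;
    if (lowered.indexOf(prefix) === 0) {
      const rest = trimmed.slice(prefix.length);
      // Require end-of-input or whitespace after "!" so "code!x" is not a match.
      if (rest === "" || /^\s/.test(rest)) {
        return { command, argument: rest.trim() };
      }
    }
  }
  return null;
}
```

For example, `detectPrefixCommand("search! latest Node.js release")` yields the `search` command with the remainder as its argument, while an ordinary sentence containing the word "search" returns `null` and proceeds to the AI path.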
GroupGPT supports real-time P2P audio calls within any room. The server acts as a pure signaling relay — no audio traverses the server during live calls. ICE negotiation uses Google STUN and openrelay.metered.ca TURN servers. Clients exchange voice_offer, voice_answer, and voice_ice_candidate Socket.io events, then establish a direct WebRTC audio track.
Beyond live calls, users can send voice messages that route through the full AI pipeline:
1. The client emits a voice_message Socket.io event with a raw buffer + MIME type.
2. The server processes the audio with ffmpeg-static.
3. The audio is transcribed with Whisper.
4. The transcript enters the message.handler.ts pipeline, including command detection and AI streaming.

A voice message saying "search! latest Node.js release" will trigger the Tavily search command and return results to the entire group, just as a typed message would.
GroupGPT supports two inference backends, selected transparently based on availability and user preference:
| | Cloud Path (default) | On-Device Path (WebLLM) |
|---|---|---|
| Engine | Gemini 2.5 Flash via OpenRouter | @mlc-ai/web-llm (WebGPU) |
| When used | Default for all connected users | Offline, or user enables privacy-first mode |
| Streaming | Yes | Yes (in-browser) |
| Data leaves browser | Yes (to OpenRouter) | No — fully on-device |
| API cost | Per-user billing via token budget | Zero |
| First token latency | ~0.8 s | Model + VRAM dependent |
| Constraint | Requires internet connection | VRAM and model size limited |
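
The selection rule in the table can be sketched as a small pure function. This is an illustrative sketch only: `ClientState`, `chooseBackend`, and the `"unavailable"` outcome are hypothetical, and the sketch assumes that privacy-first mode never silently falls back to the cloud, which the source does not state.

```typescript
// Hypothetical sketch; the flags and function name are illustrative.
type InferenceBackend = "cloud" | "webllm" | "unavailable";

interface ClientState {
  online: boolean;
  privacyFirstMode: boolean;
  webGpuAvailable: boolean;
}

function chooseBackend(state: ClientState): InferenceBackend {
  const wantsOnDevice = state.privacyFirstMode || !state.online;
  if (wantsOnDevice) {
    // Assumption: privacy-first mode never silently falls back to the cloud.
    return state.webGpuAvailable ? "webllm" : "unavailable";
  }
  return "cloud";
}
```

Keeping the decision in one function makes the privacy guarantee auditable: there is a single place where a request could be routed off-device.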
A background TranslationService (Gemini 2.0 Flash) fans translated message copies only to users who have translator mode enabled and whose configured language differs from the message language. A 10-person English-only room incurs zero translation calls per message. A mixed 10-person room with 3 users needing translation incurs 3 calls — never 10.
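The fan-out rule amounts to a filter over room members. A minimal sketch, assuming each user record carries a language code and a translator-mode flag; `RoomUser` and `translationRecipients` are hypothetical names, not identifiers from the TranslationService.

```typescript
// Hypothetical sketch; type and function names are illustrative.
interface RoomUser {
  id: string;
  language: string; // e.g. "en", "es"
  translatorMode: boolean;
}

// Users who need a translated copy of a message written in `messageLanguage`.
function translationRecipients(users: RoomUser[], messageLanguage: string): RoomUser[] {
  return users.filter((u) => u.translatorMode && u.language !== messageLanguage);
}
```

An English message in an all-English room produces an empty recipient list, so no translation call is made; only users whose configured language differs and who have enabled translator mode receive a translated copy.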
GroupGPT employs a five-layer concentric security model from network edge to AI response.
1. BannedIP table, connection rate limits via IntrusionDetectionSystem, and HTTPS/WSS transport. The IntrusionDetectionSystem monitors connection bursts, auth failure rates, and message flooding, and triggers auto-ban.
2. JWT validation on the Socket.io connection event. Invalid tokens are rejected at the transport layer and never reach application code.
3. ChatManager applies XSS filtering to all incoming message content, duplicate detection (prevents replay floods), and per-user rate limiting.
4. An availableTokens >= 500 check before every AI call. Subscription tier controls per-period limits, with a pay-as-you-go pool for over-limit users.
5. Admin operations require the x-admin-secret header.
Three authentication paths converge onto JWT issuance by AuthManager: email/password (bcrypt), Google OAuth via Passport.js (/auth/callback), and Firebase (frontend Google sign-in fallback). All paths result in a JWT that is validated on Socket.io handshake.
| Tier | Monthly Token Limit | Reset Cadence |
|---|---|---|
| Free | Low (entry) | Monthly |
| Starter | Medium | Monthly |
| Pro | High | Monthly |
| Team | Very high | Monthly |
| Pay-as-you-go | Unlimited (billed per use) | Per-use |
Before each AI call: check availableTokens >= 500. After: UserManager.addTokenUsage(tokensUsed). Stripe manages billing; webhooks update UserManager state.
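The check-then-record flow can be sketched as a guard around the AI call. Only the 500-token threshold and the `addTokenUsage` name come from the text above; `TokenAccount`, `canCallAi`, `guardedAiCall`, and the in-memory accounting are hypothetical simplifications (the real UserManager is updated via Stripe webhooks and persistent state).

```typescript
// Hypothetical sketch; only the 500-token threshold and the addTokenUsage name come from the text.
const MIN_AVAILABLE_TOKENS = 500;

interface TokenAccount {
  periodLimit: number;
  usedThisPeriod: number;
}

function availableTokens(acct: TokenAccount): number {
  return Math.max(0, acct.periodLimit - acct.usedThisPeriod);
}

function canCallAi(acct: TokenAccount): boolean {
  return availableTokens(acct) >= MIN_AVAILABLE_TOKENS;
}

function addTokenUsage(acct: TokenAccount, tokensUsed: number): void {
  acct.usedThisPeriod += tokensUsed;
}

// Gate the call, run it, then record the tokens it actually consumed.
// `call` stands in for the AI request and returns the number of tokens used.
function guardedAiCall(acct: TokenAccount, call: () => number): "answered" | "blocked" {
  if (!canCallAi(acct)) return "blocked";
  addTokenUsage(acct, call());
  return "answered";
}
```

Checking a fixed minimum before the call, and recording actual usage afterwards, means a single response can overshoot the remaining budget slightly but the next request is blocked.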
AI responses were evaluated on four dimensions using LLM-as-judge methodology [12] on a stratified sample of 300 responses across 50 private room sessions spanning engineering, creative writing, and support use cases.
Overall average: 4.71 / 5.0, comparable to the 4.72 / 5.0 reported by Shen et al. [7].
| Dimension | Shen et al. 2026 | GroupGPT (ours) |
|---|---|---|
| Primary focus | Intervention timing | Full production stack |
| Privacy model | PII rewriting | Cryptographic E2EE |
| Server sees plaintext | Yes (sanitized) | Transiently in memory only (never at rest) |
| AI reads E2EE messages | No | Yes (ECDH peer) |
| Memory system | Sliding window | Typed Neuron graph |
| Memory persistence | Session-scoped | Permanent (DB-backed) |
| Memory curation | Automatic | User-curated |
| On-device inference | Classifier only | Full WebLLM |
| Command tools | None | 7 ! commands |
| Voice + STT | Yes (caption model) | Yes (Whisper) |
| Image generation | Input only | Input + generation |
| Code execution | No | Yes (E2B sandbox) |
| Multi-DB isolation | No | Yes |
| Benchmark | MUIR (2,500 samples) | Live user traffic |
| Deployment status | Research prototype | Live at groupgpt.tech since April 4, 2025 |
These systems are complementary. MUIR is a valuable benchmark; future work will evaluate GroupGPT's intervention timing on MUIR to enable direct comparison.
Automatic memory extraction (MemGPT [8], RAG [9]) requires no user effort but introduces risk: automatically extracted "memories" can silently include sensitive or incorrect information, and the AI's knowledge base grows opaque over time. GroupGPT's Neurons require explicit curation — a higher bar — but give room members full authorship over what the AI knows. Neuron Surgery [13] frames this as the difference between knowledge and wisdom: an AI with automatically harvested facts may be well-informed but still miss the mark, while one whose context has been deliberately shaped by the people it serves is far more likely to produce outputs that feel right to them. The visual Neuron graph editor in NeuronPanel.tsx lowers the curation effort substantially, and the .neurons export format enables reuse across rooms.
| Limitation | Status / Planned Mitigation |
|---|---|
| SQLite scales to moderate load | PostgreSQL migration path identified; schema is Prisma-abstracted for straightforward swap. |
| WebLLM constrained by GPU memory | Model selection guided by VRAM detection at init. Smaller quantized models available as fallback. |
| Neurons require manual curation | Planned: background summarizer suggests Neurons from chat history for user review. |
| Dynamic room member addition requires key re-wrap | Currently handled by room creator. Full key re-distribution protocol in development. |
| No MUIR benchmark evaluation | Planned: evaluate GroupGPT's intervention pipeline on MUIR for direct comparison with Shen et al. [7]. |
GroupGPT is a production-grade AI group chat platform that advances the state of the art across five dimensions simultaneously: structured persistent knowledge memory (Neurons), cryptographic end-to-end encryption with AI participation, hybrid edge-cloud inference for privacy and offline support, a rich multimodal ! command ecosystem spanning voice, code, research, and image generation, and a dual-database architecture that enforces strict isolation between public and private contexts. Having served paying users since April 4, 2025, the platform demonstrates that these five capabilities can coexist in a single maintainable production system — not merely as research prototypes — and remains freely accessible at groupgpt.tech.