GroupGPT: A Production-Grade, End-to-End Encrypted AI Group Chat Platform with Structured Knowledge Memory and Hybrid Edge-Cloud Inference

GroupGPT Team · groupgpt.tech

Abstract

We present GroupGPT, a production-deployed, privacy-first AI group chat platform that simultaneously addresses five challenges underserved by prior agentic group chat frameworks: end-to-end encryption with AI participation (where the server never stores private message content in plaintext, yet the AI responds coherently); structured room-scoped knowledge memory via a typed Neuron graph; hybrid edge-cloud inference routing tasks to on-device WebLLM or streaming cloud models; a rich multimodal command ecosystem (live web search, deep research, sandboxed code execution, image generation, and voice with STT); and a dual-context architecture maintaining strict database-level isolation between ephemeral public and persistent private rooms. Internal evaluation shows AI responses scoring 4.71 / 5.0 with token usage reduced by up to 3× versus naive full-history injection. GroupGPT has been publicly deployed with paying users since April 4, 2025, and is freely accessible at groupgpt.tech.

1 · Introduction

The landscape of LLM-based assistants has been dominated by single-user dialogue systems. Extending these to group chat settings—where multiple participants interact simultaneously, conversations shift topic rapidly, and sensitive personal information is regularly shared—introduces a fundamentally different set of requirements.

Related work has addressed group chat AI through frameworks that focus primarily on when the AI should intervene, but leave critical production concerns unaddressed: How does the AI accumulate room-specific knowledge across sessions? How do users protect sensitive information from the platform operator itself? How does the system remain affordable as group size and message volume grow?

GroupGPT addresses all of these in a single, coherent, production-deployed system. Our key contributions are:

End-to-end encryption with cryptographic AI participation. Private rooms use ECDH P-256 + AES-256-GCM. The AI holds its own ECDH keypair and participates as a cryptographic peer — enabling encrypted AI responses without exposing plaintext to the database.
Neuron/Brain system: typed, room-scoped knowledge memory. Each room maintains a graph of typed knowledge nodes (skill, knowledge, process, memory) injected into every AI request — persistent context without growing token cost.
Hybrid edge-cloud inference. On-device WebLLM handles offline and privacy-sensitive queries; cloud inference (Gemini 2.5 Flash via OpenRouter) handles streaming complex responses. The switch is transparent to users.
Rich multimodal command ecosystem. Seven ! commands — search!, browse!, learn!, code!, image generation, auto URL preview, and report! — plus WebRTC P2P voice chat with Whisper transcription.
Dual-context architecture with database-level isolation. World chat (public) and private rooms are backed by completely separate SQLite databases.
Multilingual fan-out translation. Gemini 2.0 Flash fans translated messages only to users with translator mode on — zero overhead for monolingual groups.

Figure 1 — GroupGPT vs. Related Group Chat Frameworks

Feature	MUCA / HUMA	Shen et al. 2026	GroupGPT (ours)
Intervention timing	Rule-based	SLM judge	Token budget
Privacy model	None	PII rewrite	Full E2EE
AI reads E2EE messages	No	No	Yes
Persistent memory	None	Sliding window	Neuron graph
On-device inference	No	Partial	WebLLM full
Command tools	None	None	7 ! commands
Voice + STT	No	Yes	Yes
Production deployed	No	No	Yes
Multi-DB isolation	No	No	Yes

2 · Related Work

2.1 Multi-User Group Chat AI Frameworks

MUCA [4] formalized the "3W" dimensions for group chat agents and introduced fixed-interval LLM evaluation. HUMA [5] extended this with human-like behavioral timing. MAP [6] explored multi-agent personalization. Of these, only MUCA (January 2024) predates GroupGPT's public launch on April 4, 2025; HUMA and MAP appeared as concurrent work later in 2025. These frameworks share a common limitation: they are research prototypes that do not address E2EE, persistent structured memory, or production-scale token economics.

A concurrent paper (Shen et al., 2026) [7] proposes an edge-cloud architecture decoupling intervention timing from response generation, introducing the MUIR benchmark. GroupGPT is complementary: where Shen et al. focus on when to intervene and privacy sanitization, GroupGPT addresses what the AI knows (Neurons), cryptographic privacy (full E2EE), and full production deployment.

2.2 Memory-Augmented LLMs

MemGPT [8] introduced hierarchical memory management for LLMs. RAG systems [9] retrieve relevant documents at inference time. GroupGPT's Neuron system differs from both: rather than automatic extraction or document retrieval, Neurons are explicitly curated by room members, giving users full authorship over what the AI knows and ensuring no unintended information leaks between rooms.

2.3 End-to-End Encryption in AI Systems

Signal [10] demonstrated E2EE in messaging. Extending E2EE to AI-assisted conversations — where the AI must read encrypted messages to respond — is an open problem that, to our knowledge, no prior group chat AI system addresses. GroupGPT resolves this by issuing the AI assistant its own ECDH keypair so it participates as a cryptographic peer, identical to human users.

2.4 Hybrid Edge-Cloud Inference

WebLLM [11] enabled in-browser LLM inference via WebGPU. Prior group chat systems do not integrate on-device inference. GroupGPT uses WebLLM as a genuine fallback for offline use and as a privacy-preserving alternative for queries users prefer not to send to cloud APIs.

3 · System Architecture

3.1 High-Level Overview

GroupGPT consists of two independent services: a Frontend (React 18 + Vite + TypeScript, Chakra UI) and a Backend (Node.js + Express + Socket.io, dual Prisma-managed SQLite databases).

Figure 2 — Full System Architecture

Browser (Client)
  Services: SocketService (Socket.io client) · E2EEService (ECDH + AES-GCM) · WebLLMService (@mlc-ai/web-llm)
  ↓
  ChatContainer.tsx: worldCacheRef[ ] · roomCacheRef Map<id, Msg[]> · displayMode: 'world' | 'private'

↕ Socket.io over HTTPS/WSS · JWT in auth.token handshake

ChatServer (Node.js)
  SecurityManager + IntrusionDetectionSystem: IP ban · connection rate limits · auth middleware · XSS filter · Socket.io handshake (fires before connection)
  ↓
  ChatController: message.handler.ts → commands/detect.ts → system-prompt.ts (inject Neurons) → ai-stream.handler.ts (Gemini 2.5 Flash)
  Handlers and services: room.handler (join/leave/history) · voice.handler (WebRTC + STT) · TranslationService (Gemini 2.0 Flash fan-out) · cortezRoomKeys Map<roomId, CryptoKey>
  Managers: ChatManager · UserManager · RoomManager · AuthManager · SecurityManager
  ↓
  dev.db (main app): User · Room · RoomMessage · Neuron · PromptLog · RoomKeyBundle · BannedIP · Feedback
  world.db (world chat): WorldMessage (no roomId)

3.2 Message Routing Pipeline

Every incoming message passes through a multi-stage decision tree before a response is emitted: security checks → rate limiting → optional E2EE decryption → token budget check → command detection → Neuron-augmented AI streaming → optional re-encryption and translation fan-out.

Figure 3 — Message Routing Pipeline

1. SecurityManager (auth middleware)
   Banned IP / invalid JWT → rejected, no response
   Authenticated connection proceeds
↓
2. ChatManager (XSS · rate limit · dedup)
   Rate limit exceeded → notice to user
   Duplicate detected → silently dropped
   Clean message passes through
↓
3. E2EE enabled?
   Yes → decrypt with cortezRoomKeys[roomId] (transient, in-memory only)
   No → plaintext passes through as-is
↓
4. Token budget: availableTokens ≥ 500?
   No → reject + upgrade prompt sent to user
   Yes → continue to command detection
↓
5. Command detection (commands/detect.ts)
   search! → Tavily Search API
   browse! → Tavily Scrape API
   learn! → multi-step Tavily
   code! → E2B Sandbox
   image → OpenRouter
   URL → auto preview
↓ (no command match)
6. Neurons + system prompt (system-prompt.ts)
   Room Neurons injected into AI system prompt alongside sliding window of recent messages
↓
7. Inference routing
   WebLLM (offline / privacy): on-device · WebGPU · zero API cost
   Gemini 2.5 Flash (cloud): via OpenRouter · streaming axios POST
↓
8. Post-process & emit
   E2EE re-encrypt (if room is E2EE-enabled)
   Translation fan-out via TranslationService (only to users with translator mode on)
   PromptLog write (thoughtProcess, toolsUsed, commandUsed, metadata)
   Emit ai_response to room Socket.io channel

3.3 World Chat vs. Private Rooms: Dual-Database Isolation

GroupGPT enforces strict isolation at every layer. World chat (world.db → WorldMessage, no roomId) and private rooms (dev.db → RoomMessage, indexed by (roomId, timestamp)) are backed by completely separate SQLite files. The critical invariant is that session.roomId is mutated in-place on join_room; creating a new session object was the historical root cause of world-chat history bleeding into private rooms.
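
The in-place mutation invariant can be illustrated with a minimal sketch. Only `session.roomId`, `join_room`, and the two database files come from the description above; the `Session` shape, the `sessions` map, and the `joinRoom` and `persistenceTarget` helpers are hypothetical names chosen for illustration and are not the project's actual code.

```typescript
// Hypothetical session shape; only roomId is named in the text above.
interface Session {
  socketId: string;
  userId: string;
  roomId: string | null; // null means the user is in world chat
}

// One session object per socket, shared by every handler that looked it up.
const sessions = new Map<string, Session>();

// Sketch of a join_room handler: mutate the existing session in place so
// that every holder of the reference observes the new roomId.
function joinRoom(socketId: string, roomId: string): Session {
  const session = sessions.get(socketId);
  if (!session) {
    throw new Error(`no session for socket ${socketId}`);
  }
  session.roomId = roomId;
  return session;
}

// Route persistence by the session's current context.
function persistenceTarget(session: Session): "world.db" | "dev.db" {
  return session.roomId === null ? "world.db" : "dev.db";
}
```

If `joinRoom` stored a fresh object instead, any handler still holding the old reference would keep seeing `roomId === null` and would route private-room traffic to the world-chat path, which is the kind of bleed the invariant prevents.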

Aspect	World Chat	Private Room
Encryption	None	E2EE optional
Knowledge memory	None	Neurons injected
History export	No	Yes
In-memory cache	`worldHistory[]`	`roomHistory Map<id,Msg[]>`
Persistence function	`persistWorldMessage()`	`persistRoomMessage()`
Database table	`WorldMessage` (world.db)	`RoomMessage` (dev.db)

4 · Neuron / Brain System

4.1 Motivation and Design

Prior group chat AI systems use flat chat history as their only source of context. This is expensive (token cost grows linearly with history length), fragile (key facts scroll out of context), and nonspecific (the AI must rediscover room norms from history on every session). GroupGPT's Neuron system solves all three by treating room context the way Neuron Surgery [13] treats model cognition: distinguishing raw knowledge from filtered experience from applied wisdom. Flat chat history is raw knowledge — a growing ledger the AI must re-parse on every request. Neurons are curated wisdom: room members act as human mentors, explicitly shaping what the AI should know, how it should behave, and what constraints apply, rather than leaving it to infer context from noise.

Room members curate a graph of typed knowledge nodes, stored in dev.db and injected into the AI's system prompt alongside a sliding window of recent messages:

Type	Purpose	Example
skill	Behavioral instruction for AI	"Write Python 3.12 code with full type annotations"
knowledge	Factual context about the project	"Our API uses OAuth 2.0 with PKCE, not Basic Auth"
process	Team workflow rules	"Always check test coverage ≥ 80% before approving PRs"
memory	Persistent historical facts	"Migrated from MySQL to Postgres in Q1 2025"
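
A minimal sketch of how typed Neurons might be assembled into a system prompt, assuming a simple text layout. The four type names come from the table above; the `Neuron` and `ChatMessage` shapes, the `buildSystemPrompt` function, the section labels, and the default window size are illustrative assumptions rather than the contents of system-prompt.ts.

```typescript
type NeuronType = "skill" | "knowledge" | "process" | "memory";

// Hypothetical record shapes for illustration.
interface Neuron {
  roomId: string;
  type: NeuronType;
  content: string;
}

interface ChatMessage {
  author: string;
  text: string;
}

const TYPE_ORDER: NeuronType[] = ["skill", "knowledge", "process", "memory"];

// Build a prompt from the room's Neurons plus a bounded window of recent
// messages. Neurons that belong to other rooms are never included.
function buildSystemPrompt(
  roomId: string,
  neurons: Neuron[],
  history: ChatMessage[],
  windowSize = 20, // assumed window length, not taken from the paper
): string {
  const roomNeurons = neurons.filter((n) => n.roomId === roomId);
  const sections = TYPE_ORDER.flatMap((type) => {
    const items = roomNeurons.filter((n) => n.type === type);
    if (items.length === 0) return [];
    return [`[${type}]\n` + items.map((n) => `- ${n.content}`).join("\n")];
  });
  // slice(-0) would return the whole array, so guard the zero case.
  const recent = windowSize > 0 ? history.slice(-windowSize) : [];
  const transcript = recent.map((m) => `${m.author}: ${m.text}`).join("\n");
  return [...sections, "[recent messages]", transcript].join("\n\n");
}
```

Because the Neuron sections have a size set by curation and the message window is capped, the prompt length stays bounded regardless of how long the room's history grows.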

4.2 Token Efficiency

Figure 6 compares token usage across three context strategies for an active room with 90 days of history.

Figure 6 — Average Tokens per AI Request by Strategy

Strategy	Average tokens per request
Full History Injection	4,200
Sliding Window Only	1,800
GroupGPT (Neurons + Window)	1,400

3× reduction vs. full history injection (4,200 → 1,400). Token usage stays bounded and room-specific context is preserved.

Figure 15 — Annual Token Projection by Group Activity

Group activity	Full-history baseline	GroupGPT	Annual savings
500 msg/day	$630 / yr	$210 / yr	$420 / yr
1,500 msg/day	$1,890 / yr	$630 / yr	$1,260 / yr
3,000 msg/day	$3,780 / yr	$1,260 / yr	$2,520 / yr

At $0.30 / 1M tokens (Gemini Flash). Full-history baseline vs. GroupGPT Neurons + sliding window.

4.3 Neuron Portability

Neurons can be exported as .neurons files and imported into other rooms, enabling teams to share AI configuration across projects. The frontend provides a canvas-based graph visualization (NeuronPanel.tsx) for visual editing.

5 · End-to-End Encryption with AI Participation

5.1 Cryptographic Protocol

GroupGPT implements E2EE where the AI assistant participates as a full cryptographic peer. The scheme uses ECDH P-256 key agreement to derive per-room shared secrets, then wraps an AES-256-GCM room key for every participant — including Cortez (the AI), whose ECDH keypair is loaded at server start from the CORTEZ_PRIVATE_KEY_JWK environment variable.

The key exchange sequence:

1. Each user generates a permanent ECDH keypair stored in localStorage and uploads the public key to /api/e2ee/public-key.
2. The room creator generates an AES-256-GCM room key and ECDH-wraps it for every member and Cortez, storing all bundles at /api/e2ee/room-key.
3. On join, each user unwraps their bundle to recover the room key.
4. The server unwraps Cortez's bundle at ChatController initialization and caches it in cortezRoomKeys: Map<roomId, CryptoKey>.
5. All subsequent messages are AES-GCM encrypted; the server decrypts transiently in memory for Cortez, never writing plaintext to the database.
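
The wrap and unwrap steps of this sequence can be sketched with the Web Crypto API as exposed by Node.js. This is an illustration under stated assumptions, not the E2EEService implementation: the function names, the `WrappedBundle` shape, and the choice to derive the AES-GCM wrapping key directly from the ECDH shared secret are all assumptions.

```typescript
import { webcrypto } from "node:crypto";

const { subtle } = webcrypto;
type Key = webcrypto.CryptoKey;

// Hypothetical bundle shape: the wrapped room key plus the IV used to wrap it.
interface WrappedBundle {
  iv: Uint8Array;
  wrappedKey: ArrayBuffer;
}

// Long-lived ECDH P-256 identity keypair (each user and Cortez hold one).
async function generateIdentity(): Promise<webcrypto.CryptoKeyPair> {
  return subtle.generateKey({ name: "ECDH", namedCurve: "P-256" }, false, ["deriveKey"]);
}

// Both sides derive the same AES-256-GCM wrapping key from their own private
// key and the peer's public key. Assumption: no extra KDF step; a hardened
// design would typically pass the shared secret through HKDF first.
async function deriveWrappingKey(ownPrivate: Key, peerPublic: Key): Promise<Key> {
  return subtle.deriveKey(
    { name: "ECDH", public: peerPublic },
    ownPrivate,
    { name: "AES-GCM", length: 256 },
    false,
    ["wrapKey", "unwrapKey"],
  );
}

// The room key must be extractable so that it can be wrapped for each member.
async function generateRoomKey(): Promise<Key> {
  return subtle.generateKey({ name: "AES-GCM", length: 256 }, true, ["encrypt", "decrypt"]);
}

async function wrapRoomKey(roomKey: Key, wrappingKey: Key): Promise<WrappedBundle> {
  const iv = webcrypto.getRandomValues(new Uint8Array(12));
  const wrappedKey = await subtle.wrapKey("raw", roomKey, wrappingKey, { name: "AES-GCM", iv });
  return { iv, wrappedKey };
}

async function unwrapRoomKey(bundle: WrappedBundle, wrappingKey: Key): Promise<Key> {
  return subtle.unwrapKey(
    "raw",
    bundle.wrappedKey,
    wrappingKey,
    { name: "AES-GCM", iv: bundle.iv },
    { name: "AES-GCM", length: 256 },
    false, // the recovered key is usable but not exportable
    ["encrypt", "decrypt"],
  );
}
```

In this sketch the creator calls `deriveWrappingKey(creator.privateKey, member.publicKey)` once per recipient, and each recipient calls it with the roles swapped to arrive at the same wrapping key.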

Figure 7 — E2EE Key Exchange Sequence Diagram

#	User A (creator)	User B (member)	Cortez AI	Server
1	Generate permanent ECDH P-256 keypair, stored in localStorage; POST public key to `/api/e2ee/public-key`	(same, on own device)	–	Store public keys by userId in dev.db; no private key ever leaves the device
2	Generate AES-256-GCM room key; ECDH-wrap for each member + Cortez; POST all bundles to `/api/e2ee/room-key`	–	–	Store wrapped key bundles in the RoomKeyBundle table (ciphertext only)
3	–	On room join: GET own bundle; ECDH-unwrap with local private key; AES-GCM room key recovered in browser	–	Return User B's wrapped bundle
4	–	–	At server start: ECDH keypair loaded from `CORTEZ_PRIVATE_KEY_JWK`; unwrap Cortez bundle and cache in `cortezRoomKeys[roomId]`	Provide Cortez bundle on init
5	Encrypt message with AES-GCM room key; Socket.io emit of { encrypted: true, iv, ciphertext }	Receive encrypted payload; decrypt in browser with room key; plaintext never leaves device	Server decrypts transiently for Cortez using `cortezRoomKeys[roomId]`; plaintext in memory only, never written to DB	Persist ciphertext + IV only (dev.db, RoomMessage row)
6	Receive Cortez response (encrypted); decrypt in browser with room key	Same: decrypt Cortez reply	Generate AI reply, re-encrypt with room key, emit encrypted response to room	Relay encrypted AI response; persist ciphertext only

At no point does the server hold an unencrypted message at rest. Cortez participates as a cryptographic peer — its plaintext window exists only transiently in process memory during inference.
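
Per-message encryption with the room key can be sketched as follows. The `{ encrypted: true, iv, ciphertext }` payload shape comes from the sequence table; the base64 encoding of the binary fields and the helper names are assumptions made for illustration.

```typescript
import { webcrypto } from "node:crypto";

const { subtle } = webcrypto;

// Payload shape from the sequence table; base64 field encoding is assumed.
interface EncryptedPayload {
  encrypted: true;
  iv: string;
  ciphertext: string;
}

const toB64 = (bytes: Uint8Array): string => Buffer.from(bytes).toString("base64");
const fromB64 = (text: string): Uint8Array => new Uint8Array(Buffer.from(text, "base64"));

async function encryptMessage(
  roomKey: webcrypto.CryptoKey,
  plaintext: string,
): Promise<EncryptedPayload> {
  // Fresh random 96-bit IV per message: AES-GCM must never reuse an IV under one key.
  const iv = webcrypto.getRandomValues(new Uint8Array(12));
  const data = new TextEncoder().encode(plaintext);
  const ciphertext = await subtle.encrypt({ name: "AES-GCM", iv }, roomKey, data);
  return { encrypted: true, iv: toB64(iv), ciphertext: toB64(new Uint8Array(ciphertext)) };
}

// Decryption fails (the promise rejects) if the ciphertext or IV was altered,
// because AES-GCM verifies an authentication tag before releasing plaintext.
async function decryptMessage(
  roomKey: webcrypto.CryptoKey,
  payload: EncryptedPayload,
): Promise<string> {
  const plaintext = await subtle.decrypt(
    { name: "AES-GCM", iv: fromB64(payload.iv) },
    roomKey,
    fromB64(payload.ciphertext),
  );
  return new TextDecoder().decode(plaintext);
}
```

On the server side, the same `decryptMessage` call would run with the cached Cortez copy of the room key, with the result held only in a local variable for the duration of inference.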

5.2 Security Properties

Figure 8 — E2EE Threat Model and Mitigations

Threat	Mitigation
Database breach (`dev.db` leaked)	All message content is AES-GCM ciphertext. DB contains only ciphertext + IVs. Plaintext is unrecoverable without the room key.
Platform operator reads messages	Server never holds room key in plaintext at rest. Cortez's copy is server-side but messages are only decrypted transiently in memory, never written back.
Man-in-the-middle on socket	WSS transport + JWT auth. Ciphertext provides a second layer of protection even if transport is compromised.
Rogue room member	Room key rotation re-wraps only for current members. Removed members lose access to future messages.
AI backdoor / privileged path	Cortez is a cryptographic peer. No privileged plaintext path exists — uses the same ECDH protocol as human users.

Figure 17 — E2EE Latency Overhead

Operation	Median	p95	Impact
Message encrypt (AES-GCM)	<2 ms	<3 ms	Imperceptible
Message decrypt (AES-GCM)	<2 ms	<3 ms	Imperceptible
Key unwrap (ECDH, 1 member)	~8 ms	~15 ms	One-time on join
Room key distribution — 5 members	~15 ms	~25 ms	One-time on create
Room key distribution — 10 members	~28 ms	~45 ms	One-time on create
Room key distribution — 20 members	~50 ms	~80 ms	One-time on create

Encryption adds <4 ms of per-message overhead. Key operations are one-time costs. User-perceptible latency impact: effectively zero.

6 · Multimodal Command Ecosystem

GroupGPT provides a natural-language ! command interface that extends AI capabilities with live data and execution tools. Command detection in commands/detect.ts runs before the LLM streaming path, bypassing the AI entirely for commands that don't require it.

Figure 9 — Command Ecosystem: Routing, Tools, and Latency

Command	Triggers	Backend Service	Median Latency
search! <query>	Explicit prefix	Tavily Search API → ranked links + snippets	~1.1 s
browse! <url>	Explicit prefix	Tavily Scrape API → inline preview card	~1.4 s
learn! <topic>	Explicit prefix	Multi-step Tavily pipeline → long-form synthesis	~4.8 s
code! <description>	Explicit prefix	E2B Cloud Sandbox → scaffold + build + zip download	~12 s
Image intent	`detectImageCommand()`	OpenRouter image model → inline in chat	~6.0 s
URL in message	`detectUrlsInMessage()`	Passive Tavily preview — no user syntax needed	~1.3 s
report! / suggest!	Explicit prefix	Writes to `Feedback` table in `dev.db`	<0.1 s
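
A minimal sketch of prefix-based command detection, assuming explicit prefixes take precedence over passive URL detection. The command names come from the table above; the `detectCommand` function, its return shape, and the URL regular expression are illustrative assumptions, and image-intent detection (`detectImageCommand()`) is omitted because its logic is not described.

```typescript
const PREFIX_COMMANDS = ["search", "browse", "learn", "code", "report", "suggest"] as const;
type PrefixCommand = (typeof PREFIX_COMMANDS)[number];

// Hypothetical result shape for illustration.
type Detected =
  | { kind: PrefixCommand; arg: string }
  | { kind: "url-preview"; urls: string[] };

function isPrefixCommand(name: string): name is PrefixCommand {
  return (PREFIX_COMMANDS as readonly string[]).includes(name);
}

// Returns the matched command, or null when the message should continue
// to the normal AI streaming path.
function detectCommand(text: string): Detected | null {
  const trimmed = text.trim();
  // "<word>! <rest>" at the start of the message.
  const match = /^([a-z]+)!\s*([\s\S]*)$/i.exec(trimmed);
  if (match) {
    const name = (match[1] ?? "").toLowerCase();
    if (isPrefixCommand(name)) {
      return { kind: name, arg: (match[2] ?? "").trim() };
    }
  }
  // No explicit command: fall back to passive URL detection.
  const urls = trimmed.match(/https?:\/\/[^\s<>"']+/g);
  if (urls) return { kind: "url-preview", urls };
  return null;
}
```

For example, `detectCommand("search! latest Node.js release")` yields a `search` command with the argument `latest Node.js release`, while a plain message with no prefix and no URL yields `null`.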

7 · Voice Chat and Speech-to-Text Pipeline

7.1 WebRTC Peer-to-Peer Architecture

GroupGPT supports real-time P2P audio calls within any room. The server acts as a pure signaling relay — no audio traverses the server during live calls. ICE negotiation uses Google STUN and openrelay.metered.ca TURN servers. Clients exchange voice_offer, voice_answer, and voice_ice_candidate Socket.io events, then establish a direct WebRTC audio track.

7.2 Voice Message → AI Pipeline

Beyond live calls, users can send voice messages that route through the full AI pipeline:

1. Browser records audio and emits a voice_message Socket.io event with a raw buffer + MIME type.
2. Server converts the buffer to WAV via ffmpeg-static.
3. WAV is sent to OpenAI Whisper API → returns transcript text.
4. Transcript is routed through the standard message.handler.ts pipeline, including command detection and AI streaming.

A voice message saying "search! latest Node.js release" will trigger the Tavily search command and return results to the entire group, just as a typed message would.

8 · Hybrid Edge-Cloud Inference

8.1 Inference Routing

GroupGPT supports two inference backends, selected transparently based on availability and user preference:

	Cloud Path (default)	On-Device Path (WebLLM)
Engine	Gemini 2.5 Flash via OpenRouter	`@mlc-ai/web-llm` (WebGPU)
When used	Default for all connected users	Offline, or user enables privacy-first mode
Streaming	Yes	Yes (in-browser)
Data leaves browser	Yes (to OpenRouter)	No — fully on-device
API cost	Per-user billing via token budget	Zero
First token latency	~0.8 s	Model + VRAM dependent
Constraint	Requires internet connection	VRAM and model size limited
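
The routing rule in the table can be sketched as a small pure function. The `RoutingInput` fields and the `chooseBackend` name are hypothetical; in particular, returning `null` when on-device inference is wanted but WebGPU is unavailable (rather than silently falling back to the cloud) is an assumption of this sketch, not documented behavior.

```typescript
type Backend = "cloud" | "webllm";

// Hypothetical inputs to the routing decision.
interface RoutingInput {
  online: boolean;          // browser currently has connectivity
  privacyFirst: boolean;    // user enabled privacy-first mode
  webGpuAvailable: boolean; // WebLLM can run on this device
}

// Cloud is the default; WebLLM is used when offline or in privacy-first mode.
function chooseBackend(input: RoutingInput): Backend | null {
  const wantsOnDevice = !input.online || input.privacyFirst;
  if (!wantsOnDevice) return "cloud";
  // Assumption: never route a privacy-first or offline request to the cloud.
  return input.webGpuAvailable ? "webllm" : null;
}
```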

8.2 Multilingual Translation Fan-Out

A background TranslationService (Gemini 2.0 Flash) fans translated message copies only to users who have translator mode enabled and whose configured language differs from the message language. A 10-person English-only room incurs zero translation calls per message. A mixed 10-person room with 3 users needing translation incurs 3 calls — never 10.
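
The recipient selection described above amounts to a filter over room members. The `Member` shape and the `translationTargets` name are assumptions for illustration; the sketch covers only who receives a translated copy, not the translation call itself.

```typescript
// Hypothetical per-member translation settings.
interface Member {
  userId: string;
  translatorMode: boolean;
  language: string; // configured language, e.g. "en"
}

// A member receives a translated copy only if translator mode is on and
// their configured language differs from the message language.
function translationTargets(members: Member[], messageLanguage: string): Member[] {
  return members.filter((m) => m.translatorMode && m.language !== messageLanguage);
}
```

With ten English-only members the result is empty, so no translation call is issued; with three members configured for other languages and translator mode on, the result has exactly three entries.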

9 · Security Subsystem

GroupGPT employs a five-layer concentric security model from network edge to AI response.

Figure 12 — Layered Security Model

1. Network / Transport: BannedIP table, connection rate limits via IntrusionDetectionSystem, HTTPS/WSS transport. Monitors connection bursts, auth failure rates, and message flooding; triggers auto-ban.

2. Socket Authentication: JWT validated in Socket.io handshake middleware, which fires before the connection event. Invalid tokens are rejected at the transport layer and never reach application code.

3. Message Sanitization: ChatManager applies XSS filtering to all incoming message content, duplicate detection (prevents replay floods), and per-user rate limiting.

4. Token Budget Enforcement: availableTokens >= 500 check before every AI call. Subscription tier controls per-period limits. Pay-as-you-go pool for over-limit users.

5. End-to-End Encryption (private rooms): ECDH P-256 + AES-256-GCM. Server stores ciphertext only, with no plaintext at rest. Admin routes gate on a separate x-admin-secret header.
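
The duplicate-detection and per-user rate-limiting parts of the message sanitization layer can be sketched as a small in-memory gate. The class name, the sliding-window approach, and every threshold below are assumptions for illustration, since the actual ChatManager limits are not stated; XSS filtering is omitted.

```typescript
type GateResult = "ok" | "rate_limited" | "duplicate";

// Hypothetical in-memory gate; all limits are illustrative defaults.
class MessageGate {
  private readonly timestamps = new Map<string, number[]>();
  private readonly lastMessage = new Map<string, { text: string; at: number }>();
  private readonly maxPerWindow: number;
  private readonly windowMs: number;
  private readonly dedupMs: number;

  constructor(maxPerWindow = 5, windowMs = 10_000, dedupMs = 2_000) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
    this.dedupMs = dedupMs;
  }

  check(userId: string, text: string, now: number = Date.now()): GateResult {
    // Same text from the same user within dedupMs: caller drops it silently.
    const last = this.lastMessage.get(userId);
    if (last && last.text === text && now - last.at < this.dedupMs) {
      return "duplicate";
    }
    // Sliding window: keep only timestamps still inside the window.
    const recent = (this.timestamps.get(userId) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.maxPerWindow) {
      this.timestamps.set(userId, recent);
      return "rate_limited"; // caller sends a notice to the user
    }
    recent.push(now);
    this.timestamps.set(userId, recent);
    this.lastMessage.set(userId, { text, at: now });
    return "ok";
  }
}
```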

10 · Authentication and Subscription

10.1 Authentication Paths

Three authentication paths converge onto JWT issuance by AuthManager: email/password (bcrypt), Google OAuth via Passport.js (/auth/callback), and Firebase (frontend Google sign-in fallback). All paths result in a JWT that is validated on Socket.io handshake.

10.2 Token Budget Model

Tier	Monthly Token Limit	Reset Cadence
Free	Low (entry)	Monthly
Starter	Medium	Monthly
Pro	High	Monthly
Team	Very high	Monthly
Pay-as-you-go	Unlimited (billed per use)	Per-use

Before each AI call: check availableTokens >= 500. After: UserManager.addTokenUsage(tokensUsed). Stripe manages billing; webhooks update UserManager state.
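
The check-then-charge flow can be sketched as two small functions. The 500-token threshold and the `addTokenUsage` name come from the text above; the `TokenBudget` shape, the `checkBudget` helper, and clamping the balance at zero are assumptions of this sketch.

```typescript
const MIN_AVAILABLE_TOKENS = 500; // threshold stated in the text

// Hypothetical budget record.
interface TokenBudget {
  availableTokens: number;
}

type BudgetDecision =
  | { allowed: true }
  | { allowed: false; reason: "upgrade_required" };

// Run before every AI call.
function checkBudget(budget: TokenBudget): BudgetDecision {
  return budget.availableTokens >= MIN_AVAILABLE_TOKENS
    ? { allowed: true }
    : { allowed: false, reason: "upgrade_required" };
}

// Run after the call completes. Assumption: the balance never goes negative.
function addTokenUsage(budget: TokenBudget, tokensUsed: number): TokenBudget {
  return { availableTokens: Math.max(0, budget.availableTokens - tokensUsed) };
}
```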

11 · Evaluation

11.1 Response Quality (N = 300)

AI responses were evaluated on four dimensions using LLM-as-judge methodology [12] on a stratified sample of 300 responses across 50 private room sessions spanning engineering, creative writing, and support use cases.

Dimension	Mean score (out of 5)
Fluency	4.90
Coherence	4.79
Groundedness	4.68
Helpfulness	4.46

Figure 14 — Score Distribution by Dimension (% at each Likert score)

Dimension	Share of responses scored 5
Fluency	93.3%
Coherence	85.6%
Groundedness	82.0%
Helpfulness	67.7%

Overall average: 4.71 / 5.0. Comparable to Shen et al. [7]: 4.72 / 5.0.
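
The overall figure is consistent with an unweighted mean of the four dimension scores reported above (equal weighting is an assumption here, as the aggregation method is not stated):

```latex
\[
\bar{s} = \frac{4.90 + 4.79 + 4.68 + 4.46}{4} = \frac{18.83}{4} = 4.7075 \approx 4.71
\]
```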

11.2 Command Latency Profile

Figure 16 — Median Response Latency by Command Type

Command type	Median latency
report! / suggest!	<0.1 s
AI stream (first token)	0.8 s
search!	1.1 s
auto URL preview	1.3 s
browse!	1.4 s
learn!	4.8 s
image generation	6.0 s
code!	12.0 s

12 · Discussion

12.1 Comparison with Shen et al. (2026)

Figure 18 — GroupGPT Platform vs. Shen et al. Framework [7]

Dimension	Shen et al. 2026	GroupGPT (ours)
Primary focus	Intervention timing	Full production stack
Privacy model	PII rewriting	Cryptographic E2EE
Server sees plaintext	Yes (sanitized)	Transiently in memory only (never at rest)
AI reads E2EE messages	No	Yes (ECDH peer)
Memory system	Sliding window	Typed Neuron graph
Memory persistence	Session-scoped	Permanent (DB-backed)
Memory curation	Automatic	User-curated
On-device inference	Classifier only	Full WebLLM
Command tools	None	7 ! commands
Voice + STT	Yes (caption model)	Yes (Whisper)
Image generation	Input only	Input + generation
Code execution	No	Yes (E2B sandbox)
Multi-DB isolation	No	Yes
Benchmark	MUIR (2,500 samples)	Live user traffic
Deployment status	Research prototype	Live at groupgpt.tech since April 4, 2025

These systems are complementary. MUIR is a valuable benchmark; future work will evaluate GroupGPT's intervention timing on MUIR to enable direct comparison.

12.2 Neuron System vs. Automatic Memory Extraction

Automatic memory extraction (MemGPT [8], RAG [9]) requires no user effort but introduces risk: automatically extracted "memories" can silently include sensitive or incorrect information, and the AI's knowledge base grows opaque over time. GroupGPT's Neurons require explicit curation — a higher bar — but give room members full authorship over what the AI knows. Neuron Surgery [13] frames this as the difference between knowledge and wisdom: an AI with automatically harvested facts may be well-informed but still miss the mark, while one whose context has been deliberately shaped by the people it serves is far more likely to produce outputs that feel right to them. The visual Neuron graph editor in NeuronPanel.tsx lowers the curation effort substantially, and the .neurons export format enables reuse across rooms.

12.3 Limitations and Future Work

Figure 19 — Current Limitations and Planned Mitigations

Limitation	Status / Planned Mitigation
SQLite scales to moderate load	PostgreSQL migration path identified; schema is Prisma-abstracted for straightforward swap.
WebLLM constrained by GPU memory	Model selection guided by VRAM detection at init. Smaller quantized models available as fallback.
Neurons require manual curation	Planned: background summarizer suggests Neurons from chat history for user review.
Dynamic room member addition requires key re-wrap	Currently handled by room creator. Full key re-distribution protocol in development.
No MUIR benchmark evaluation	Planned: evaluate GroupGPT's intervention pipeline on MUIR for direct comparison with Shen et al. [7].

13 · Conclusion

GroupGPT is a production-grade AI group chat platform that advances the state of the art across five dimensions simultaneously: structured persistent knowledge memory (Neurons), cryptographic end-to-end encryption with AI participation, hybrid edge-cloud inference for privacy and offline support, a rich multimodal ! command ecosystem spanning voice, code, research, and image generation, and a dual-database architecture that enforces strict isolation between public and private contexts. Having served paying users since April 4, 2025, the platform demonstrates that these five capabilities can coexist in a single maintainable production system — not merely as research prototypes — and remains freely accessible at groupgpt.tech.

References

[1] OpenAI. GPT-4 Technical Report. arXiv:2303.08774, 2023.
[2] Bai, Y., et al. Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073, 2022.
[3] Comanici, G., et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv:2507.06261, 2025.
[4] Mao, M., et al. Multi-user chat assistant (MUCA): a framework using LLMs to facilitate group conversations. arXiv:2401.04883, 2024.
[5] Jacniacki, M. and Carmona Serrat, M. Humanlike Multi-user Agent (HUMA). arXiv:2511.17315, 2025.
[6] Lee, C. P., Choi, J., and Mutlu, B. MAP: Multi-user Personalization with Collaborative LLM-powered Agents. CHI Extended Abstracts, 2025.
[7] Shen, Z., et al. GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant. arXiv:2603.01059, 2026.
[8] Packer, C., et al. MemGPT: Towards LLMs as Operating Systems. arXiv:2310.08560, 2023.
[9] Lewis, P., et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS, 2020.
[10] Marlinspike, M. and Perrin, T. The Signal Protocol: Double Ratchet Algorithm. Signal Foundation, 2016 (Revision 4, November 2025).
[11] MLC AI. WebLLM: High-Performance In-Browser LLM Inference. GitHub: mlc-ai/web-llm, 2024.
[12] Liu, Y., et al. G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment. EMNLP, 2023.
[13] Madison III, W. Neuron Surgery: Sculpting Smarter SLMs Through Task-Based Experience, Human-Guided Introspection, and Acceptability Mapping. Cortex Research Group Blog, March 27, 2025. cortexresearch.group/blog.