WebLLM In-Browser AI, Offline Models & Mobile Ban Fix

GroupGPT can now run AI inference entirely in the browser using WebGPU: no server, no API key, and, once the model has been downloaded, no internet connection required. Also: mobile users were being silently banned by a reconnect-detection false positive. Fixed.

WebLLM: fully offline AI in the browser

This is the most experimental feature we've shipped. When the GroupGPT backend is unreachable, or when the user explicitly loads a local model, the app runs inference in the browser through WebLLM (WebGPU/WASM) instead of calling the server. After the one-time model download, nothing leaves the device: no server, no API key, no internet connection.

The model is downloaded once and cached in browser storage scoped to the app's origin. On subsequent loads, the service checks for a cached model before initiating any download, so a user who has loaded the model once does not download it again unless the cached data is cleared or evicted. When the model is loaded and the user is offline, all regular chat messages are routed through the local model instead of the socket server.
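The cache-first load can be sketched roughly as follows. This is a minimal illustration, not GroupGPT's actual code: `ModelStore`, `ensureModel`, and `MemoryStore` are hypothetical names, and in a real integration the three store methods would wrap whatever cache check and engine creation the WebLLM library provides.

```typescript
// Hypothetical abstraction over the model cache; NOT WebLLM's real API.
interface ModelStore {
  isCached(modelId: string): Promise<boolean>;
  download(modelId: string): Promise<void>;
  // Returns an engine handle (a plain string here, to keep the sketch small).
  loadFromCache(modelId: string): Promise<string>;
}

// Load a model, downloading only when no cached copy exists.
async function ensureModel(store: ModelStore, modelId: string): Promise<string> {
  if (!(await store.isCached(modelId))) {
    await store.download(modelId);
  }
  return store.loadFromCache(modelId);
}

// In-memory stand-in for browser storage, for illustration only.
class MemoryStore implements ModelStore {
  downloads = 0;
  private cached = new Set<string>();

  async isCached(modelId: string): Promise<boolean> {
    return this.cached.has(modelId);
  }

  async download(modelId: string): Promise<void> {
    this.downloads += 1;
    this.cached.add(modelId);
  }

  async loadFromCache(modelId: string): Promise<string> {
    if (!this.cached.has(modelId)) {
      throw new Error(`model ${modelId} is not cached`);
    }
    return `engine:${modelId}`;
  }
}
```

Calling `ensureModel` twice with the same model ID against the same store triggers one download; the second call goes straight to the cached copy.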

The OfflineService is the new coordinator. It monitors online/offline state and mediates between socket-based message paths (when the server is reachable) and WebLLM-based paths (when it isn't). The handoff is transparent: the same chat input, the same message rendering, different inference path.
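As a rough sketch of that mediation, the routing decision can be reduced to a pure function over the current state. Everything here is an assumption made for illustration: `ConnectivityState`, `selectPath`, `routeMessage`, and the `preferLocal` flag are hypothetical names, not the OfflineService's real interface.

```typescript
// Hypothetical sketch of the routing decision; names are illustrative only.
type InferencePath = "socket" | "local" | "unavailable";

interface ConnectivityState {
  serverReachable: boolean;  // is the socket server reachable right now?
  localModelLoaded: boolean; // has a local model finished loading?
  preferLocal: boolean;      // assumption: set when the user explicitly loads a local model
}

interface InferenceHandlers {
  viaSocket(text: string): string;
  viaLocalModel(text: string): string;
}

// Decide which inference path a chat message should take.
function selectPath(state: ConnectivityState): InferencePath {
  if (state.localModelLoaded && (state.preferLocal || !state.serverReachable)) {
    return "local";
  }
  return state.serverReachable ? "socket" : "unavailable";
}

// Same input and same return shape regardless of path, so the UI
// does not need to know which backend produced the reply.
function routeMessage(
  text: string,
  state: ConnectivityState,
  handlers: InferenceHandlers,
): string {
  switch (selectPath(state)) {
    case "local":
      return handlers.viaLocalModel(text);
    case "socket":
      return handlers.viaSocket(text);
    default:
      return "AI is unavailable: server unreachable and no local model loaded.";
  }
}
```

Keeping the decision in one side-effect-free function makes the handoff easy to test without a network or a GPU.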

Mobile model size restriction

WebLLM crashed on mobile Chrome. The default model selection exceeded GPU memory limits on mobile hardware, producing a hard crash with no useful error message. The fix: when a mobile user agent is detected, model selection is restricted to smaller quantizations (4-bit or lower, under 2 GB). Desktop Chrome on capable hardware can still load larger models. This tradeoff is intentional: a smaller model that works is better than a larger model that crashes.
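A minimal sketch of that restriction, assuming each model entry carries its quantization width and download size. The `ModelOption` shape, the model IDs, and the user-agent pattern are all hypothetical; user-agent sniffing is only a heuristic, and real detection may need to be more careful.

```typescript
// Hypothetical model descriptor; field names are illustrative.
interface ModelOption {
  id: string;
  quantBits: number; // quantization width, e.g. 4 for 4-bit weights
  sizeBytes: number; // approximate download size
}

const MOBILE_MAX_BITS = 4;
const MOBILE_MAX_BYTES = 2 * 1024 * 1024 * 1024; // 2 GB

// Heuristic only: matches common mobile user-agent tokens.
function isMobileUserAgent(userAgent: string): boolean {
  return /Android|iPhone|iPad|iPod|Mobile/i.test(userAgent);
}

// Desktop sees every model; mobile sees only small, low-bit quantizations.
function selectableModels(models: ModelOption[], userAgent: string): ModelOption[] {
  if (!isMobileUserAgent(userAgent)) {
    return models;
  }
  return models.filter(
    (m) => m.quantBits <= MOBILE_MAX_BITS && m.sizeBytes < MOBILE_MAX_BYTES,
  );
}
```

Passing the user agent in as a string, rather than reading a global, keeps the filter usable in tests and on the server.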

Skip AI for bare URLs

When a message was a bare YouTube link or a pure URL with no surrounding text, the AI generated low-value "I see you shared a link" replies. The message handler now detects this pattern and skips the AI call entirely. The URL renders as-is, linkified in the chat, with no token cost and no unhelpful commentary.
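One way to detect the pattern is a single anchored regular expression over the trimmed message. This is a sketch under the assumption that "bare URL" means one http(s) link and nothing else; `isBareUrl` and `shouldCallAi` are hypothetical names, not the handler's real functions.

```typescript
// Matches a message that is exactly one http(s) URL with no other text.
const BARE_URL = /^https?:\/\/\S+$/i;

function isBareUrl(message: string): boolean {
  return BARE_URL.test(message.trim());
}

// Skip the AI for empty messages and for bare links; call it otherwise.
function shouldCallAi(message: string): boolean {
  const trimmed = message.trim();
  return trimmed.length > 0 && !isBareUrl(trimmed);
}
```

A link with any surrounding text, such as "thoughts on https://example.com?", still goes to the AI, since the user is asking for a response.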

Mobile ban false positives

This was a silent production bug. Socket.IO's reconnection loop on mobile networks, where connections drop and re-establish frequently as users move between Wi-Fi and cellular, was being misread by the IntrusionDetectionSystem as rapid-fire connection attempts from a malicious client. The result: legitimate mobile users were getting IP-banned.

The reconnect detection is now tuned to distinguish Socket.IO's own reconnect handshake from genuine connection flooding. The key signal is the presence of a valid session ID in the reconnecting socket's auth payload: genuine reconnects have one, flooding bots typically don't. The connection-rate threshold that triggers a ban was also relaxed for the reconnect path.
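The idea can be sketched as a sliding-window counter with two limits, one for fresh connections and a looser one for reconnects that carry a valid session ID. This is not the IntrusionDetectionSystem's actual code: the class name, the window length, and both limits are made-up values for illustration, and the caller is assumed to have already pulled the session ID out of the handshake's auth payload.

```typescript
interface ConnectionAttempt {
  ip: string;
  timestampMs: number;
  sessionId?: string; // assumed to come from the socket's auth payload
}

// Hypothetical sliding-window monitor; all thresholds are illustrative.
class ConnectionRateMonitor {
  private attempts = new Map<string, number[]>();
  private isValidSession: (id: string) => boolean;
  private windowMs = 60_000;
  private newConnectionLimit = 10; // strict limit for unknown clients
  private reconnectLimit = 60;     // relaxed limit for known sessions

  constructor(isValidSession: (id: string) => boolean) {
    this.isValidSession = isValidSession;
  }

  // Records the attempt and returns true when it should be flagged as flooding.
  record(attempt: ConnectionAttempt): boolean {
    const isReconnect =
      attempt.sessionId !== undefined && this.isValidSession(attempt.sessionId);

    // Keep only this IP's attempts that fall inside the window.
    const cutoff = attempt.timestampMs - this.windowMs;
    const recent = (this.attempts.get(attempt.ip) ?? []).filter((t) => t > cutoff);
    recent.push(attempt.timestampMs);
    this.attempts.set(attempt.ip, recent);

    const limit = isReconnect ? this.reconnectLimit : this.newConnectionLimit;
    return recent.length > limit;
  }
}
```

A forged or expired session ID fails `isValidSession` and falls back to the strict limit, so the relaxed path is only available to clients the server already knows.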

Logo upload path fix

The logo upload endpoint was returning an absolute URL tied to the backend's current Railway deployment URL. When Railway rotated the deployment URL (which happens on every redeploy without a custom domain), all stored logos became 404s. The endpoint now returns a relative path (/uploads/logo.png), so stored references stay valid regardless of where the backend is hosted.
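The difference is easiest to see in a small sketch: the server stores only a host-independent path, and the client resolves it against whatever backend origin is current at render time. The function names and the example.com hosts are hypothetical; only the /uploads/logo.png path comes from the fix itself.

```typescript
// Server side (sketch): return a path with no host baked in.
function logoUploadResponse(filename: string): { path: string } {
  return { path: `/uploads/${filename}` };
}

// Client side (sketch): resolve the stored path against the current backend origin.
function resolveAssetUrl(path: string, backendOrigin: string): string {
  return new URL(path, backendOrigin).toString();
}
```

Because the stored value never contains a hostname, the same database row resolves correctly before and after a redeploy changes the backend's URL.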

ToS and Privacy pages

Static Terms of Service and Privacy Policy pages are now live at /terms and /privacy. Required for any app handling user data (authentication, payment, stored messages), and linked from the auth page footer. Better late than never.

Why it matters

WebLLM is a genuine product differentiator. An AI group chat that works with no server, no API key, and no internet connection is a different product category than a cloud-only chatbot. Whether you're on a plane, in a low-connectivity environment, or simply don't want your conversations routed through any server, the offline mode delivers the same core AI experience. The mobile ban fix was a production correctness issue — silently locking out mobile users is exactly the kind of problem that shows up in churn before it shows up in logs.