Scaling Architecture
Cortex uses a hybrid local-first + cloud approach to scale real-time connections beyond 50,000 concurrent users while keeping cloud costs near zero.
The Problem
The current WebSocket infrastructure runs on a single Cloudflare Durable Object (WebSocketRoom). This works well up to ~50,000 concurrent connections per DO instance. Beyond that:
- Single DO hits Cloudflare's connection limit
- Cloud WebSocket costs grow linearly with connections
- Every open connection keeps the DO active, accruing billable wall-clock duration
The Solution: Hybrid Local + Cloud
```
DESKTOP APP (user's machine)            ACROBI CLOUD (Cloudflare)
============================            ========================
Next.js Renderer (3691)                 Cloudflare Worker
       |                                       |
       v                                       v
WebSocket client                        POST /broadcast
  ws://localhost:3693                          |
       |                                       v
       v                                WebSocketRoom DO
LocalRealtimeServer (Electron)          (per-org sharded,
       |                                 web-only users only)
       |<---- HTTP POST /ingest ----           |
       |      (cloud push events)              |
       v                                       v
EventSyncBridge ----HTTPS poll/push---->  Cloud API
       |            (adaptive 5-30s)
       v
better-sqlite3
(event replay)
```
How It Works
Desktop users (majority): Real-time events are delivered via an embedded WebSocket server running locally in the Electron app on port 3693. Zero cloud WebSocket connections needed.
Web-only users (minority): Connect to the cloud WebSocket DO, which is sharded per-organization for scalability.
Cross-user sync: When Team Member A's agent completes, the cloud pushes the event to Team Member B's desktop via webhook callback. No persistent cloud connection needed.
Cost Comparison
| Scale | All Cloud | Hybrid (Desktop + Cloud) | Savings |
|---|---|---|---|
| 100K users | ~$150-300/mo | ~$27-55/mo | 80-85% |
| 500K users | ~$750-1,500/mo | ~$115-230/mo | 85% |
| 1M users | ~$1,500-3,000/mo | ~$225-450/mo | 85% |
Assumptions: 85% desktop adoption, average 8hr/day connection, 22 workdays/month.
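A back-of-envelope check of the key assumption: with 85% desktop adoption, only the remaining 15% of users ever open a cloud WebSocket. A minimal sketch (the function name and rounding are illustrative, not part of the design):

```typescript
// Assumption from the table above: 85% of users run the desktop app
const DESKTOP_ADOPTION = 0.85;

// Cloud WebSocket connections = only the web-only minority
function cloudConnections(totalUsers: number): number {
  return Math.round(totalUsers * (1 - DESKTOP_ADOPTION));
}

cloudConnections(100_000);   // 15,000 cloud connections
cloudConnections(1_000_000); // 150,000 cloud connections
```

This is why costs stay roughly flat relative to the all-cloud column: the expensive persistent connections scale with web-only users, not total users.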
Desktop: LocalRealtimeServer
The Electron desktop app embeds a WebSocket server in the main process, following the same pattern as the existing OllamaProxy service.
Architecture
- Port: 3693 (next in sequence: 3690=API, 3691=Next.js, 3692=Ollama)
- WebSocket: `ws://127.0.0.1:3693/ws` (renderer connects here)
- HTTP: `POST /ingest` (receives events pushed from cloud)
- Health: `GET /health` (standard health check)
- Storage: `better-sqlite3` event log for offline replay and deduplication
Local-Only Auth
No JWT needed for localhost connections. The server binds to 127.0.0.1 only — not accessible from the network.
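A minimal sketch of the loopback-only bind (the port comes from this doc; the function name and handler details are illustrative). Binding to 127.0.0.1 is the entire auth boundary: the OS refuses connections from other machines, so the renderer needs no token.

```typescript
import http from "node:http";

// Hypothetical startup helper for the embedded local server.
export function startLocalRealtimeServer(port = 3693): http.Server {
  const server = http.createServer((req, res) => {
    if (req.url === "/health") {
      // Standard health check endpoint from the architecture list above
      res.writeHead(200, { "content-type": "application/json" });
      res.end(JSON.stringify({ ok: true }));
      return;
    }
    res.writeHead(404);
    res.end();
  });

  // In the real app, a WebSocketServer (e.g. the "ws" package) would be
  // attached to this server at path /ws for renderer connections.

  // Loopback-only bind: never pass "0.0.0.0" here.
  server.listen(port, "127.0.0.1");
  return server;
}
```

Anything that must be reachable from the network (the cloud's webhook push, for example) goes through a tunnel or polling instead, never by widening this bind.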
Event Persistence
Events are stored in a local SQLite table for:
- Offline replay: When the app restarts, the last 24 hours of events are available
- Deduplication: Events have UUIDs to prevent duplicates from poll + push
- Auto-cleanup: Events older than 24 hours are pruned
```sql
CREATE TABLE realtime_events (
  id          TEXT PRIMARY KEY,
  org_id      TEXT NOT NULL,
  user_id     TEXT NOT NULL,
  type        TEXT NOT NULL,
  payload     TEXT NOT NULL,
  timestamp   INTEGER NOT NULL,
  received_at INTEGER NOT NULL
);

CREATE INDEX idx_events_org_ts ON realtime_events(org_id, timestamp);
```
Cloud: Per-Org DO Sharding
For web-only users, the cloud WebSocket is sharded by organization:
```typescript
// Before: single global DO
const doId = env.WEBSOCKET_ROOMS.idFromName("global");

// After: per-org sharding
const doId = env.WEBSOCKET_ROOMS.idFromName(`org:${orgId}`);
```
This means each organization gets its own DO instance. With per-org sharding, you'd need a single org with 50,000+ concurrent web-only users to hit the limit, which is extremely unlikely.
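To show where that key derivation sits, here is a sketch of the Worker-side routing (the `WEBSOCKET_ROOMS` binding name comes from the snippet above; the helper name and request shape are illustrative). Keeping the shard key in one pure helper gives it a single definition:

```typescript
// Hypothetical helper: the one place the shard key format is defined.
export function doKeyForOrg(orgId: string): string {
  return `org:${orgId}`;
}

// Inside the Worker's fetch handler (Cloudflare runtime types):
//
//   const orgId = new URL(request.url).searchParams.get("orgId");
//   if (!orgId) return new Response("orgId required", { status: 400 });
//   const doId = env.WEBSOCKET_ROOMS.idFromName(doKeyForOrg(orgId));
//   return env.WEBSOCKET_ROOMS.get(doId).fetch(request);
```

`idFromName` is deterministic, so every request for the same org lands on the same DO instance, and different orgs are fully isolated from each other.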
Migration
The current WebSocketRoom DO uses zero persistent storage (all in-memory). Migration is seamless:
- Deploy new code with org-based DO key
- Existing connections close naturally
- Clients reconnect to their org-specific DO
- Backward-compatible `"global"` fallback during rollout
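The backward-compatible fallback can be a one-line key selection behind a rollout flag. A sketch (the flag name is hypothetical, not part of the design):

```typescript
// Hypothetical rollout flag: route to the per-org DO once sharding is
// enabled, otherwise keep using the legacy "global" instance.
export function doKey(orgId: string, shardingEnabled: boolean): string {
  return shardingEnabled ? `org:${orgId}` : "global";
}
```

Because the DO holds no persistent state, flipping the flag only forces one reconnect per client; no data migration is involved.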
Cross-User Sync: EventSyncBridge
When Team Member A triggers an event, Team Member B's desktop needs to know. This is handled without persistent cloud WebSocket connections:
Primary: Webhook Push
- Desktop app registers a callback URL with the cloud on startup
- Cloud's `broadcastActivity()` POSTs events to registered desktop callback URLs
- Desktop's `/ingest` endpoint receives the event and broadcasts locally
- Latency: sub-second when tunnels are active
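The receiving half of that push path can be sketched as follows (validation shape taken from the Event Envelope below; function names and the 202/400 status choices are illustrative):

```typescript
import http from "node:http";

// Validate the cloud's POST body before fanning it out locally.
export function parseCrossUserEvent(
  body: string
): { id: string; type: string } | null {
  try {
    const e = JSON.parse(body);
    return typeof e.id === "string" && typeof e.type === "string" ? e : null;
  } catch {
    return null; // not JSON at all
  }
}

// Hypothetical /ingest handler: accept the event, broadcast to every
// connected renderer socket via the supplied callback.
export function handleIngest(
  req: http.IncomingMessage,
  res: http.ServerResponse,
  broadcast: (json: string) => void
): void {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const event = parseCrossUserEvent(body);
    if (!event) {
      res.writeHead(400);
      res.end();
      return;
    }
    broadcast(JSON.stringify(event)); // fan out locally
    res.writeHead(202);
    res.end();
  });
}
```

The event also gets written to the local SQLite log at this point, so the UUID dedup catches the case where the same event later arrives again via polling.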
Fallback: Adaptive Polling
If webhook push is unavailable (NAT, firewall):
- Desktop polls `GET /api/events/since?cursor=<timestamp>&orgId=<id>`
- Poll interval adapts: 5s when activity is high, 30s when idle
- Latency: 5-30 seconds
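The adaptive interval above can be sketched as a simple snap-and-backoff loop (the 5s/30s bounds come from this doc; doubling as the backoff curve and the function names are assumptions):

```typescript
const MIN_MS = 5_000;  // floor while activity is high
const MAX_MS = 30_000; // ceiling while idle

// Snap to 5s whenever events arrive; double toward 30s while idle.
export function nextInterval(current: number, gotEvents: boolean): number {
  return gotEvents ? MIN_MS : Math.min(current * 2, MAX_MS);
}

// Illustrative loop around GET /api/events/since
export async function pollLoop(
  fetchSince: (cursor: number) => Promise<{ timestamp: number }[]>
): Promise<never> {
  let cursor = Date.now();
  let interval = MAX_MS;
  while (true) {
    const events = await fetchSince(cursor);
    if (events.length > 0) {
      // Advance the cursor past everything we've seen
      cursor = Math.max(...events.map((e) => e.timestamp));
    }
    interval = nextInterval(interval, events.length > 0);
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}
```

Since events carry UUIDs, a poll result overlapping an earlier webhook push is harmless; the local dedup drops the repeats.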
Event Envelope
```typescript
interface CrossUserEvent {
  id: string;        // UUID for deduplication
  orgId: string;     // Organization scope
  userId: string;    // Originator
  type: string;      // Event type (agent.completed, task.created, etc.)
  payload: object;   // Event data
  timestamp: number; // Unix ms
}
```
Client URL Resolution
The WebSocket client automatically selects the right server:
```typescript
function getWebSocketUrl(): string {
  if (window.cortex) {
    // Running in Electron desktop app — use local server
    return 'ws://127.0.0.1:3693/ws';
  }
  // Running in browser — use cloud WebSocket
  return `wss://api.cortex.acrobi.com/api/ws?token=${getToken()}`;
}
```
Why Not P2P?
We evaluated peer-to-peer mesh networking (desktop-to-desktop WebRTC/libp2p) and decided against it:
- Requires STUN/TURN servers ($50-200/month at scale)
- NAT traversal fails for ~15-20% of corporate networks
- Exposes user IP addresses (security concern)
- Marginal savings (~10-15%) over the hybrid approach
- Significant implementation complexity
The hybrid approach already eliminates 85% of cloud connections. P2P provides diminishing returns.