
Scaling Architecture

Cortex uses a hybrid local-first + cloud approach to scale real-time connections beyond 50,000 concurrent users while keeping cloud costs near zero.

The Problem

The current WebSocket infrastructure runs on a single Cloudflare Durable Object (WebSocketRoom). This works well up to ~50,000 concurrent connections per DO instance. Beyond that:

  • A single DO hits Cloudflare's per-instance connection limit
  • Cloud WebSocket costs grow linearly with connection count
  • Every open connection accrues billable wall-clock time on the DO

The Solution: Hybrid Local + Cloud

DESKTOP APP (user's machine)              ACROBI CLOUD (Cloudflare)
============================              ========================

Next.js Renderer (3691)                   Cloudflare Worker
  |                                         |
  v                                         v
WebSocket client ----ws://127.0.0.1:3693    POST /broadcast
  |                                         |
  v                                         v
LocalRealtimeServer (Electron)            WebSocketRoom DO
  |                                       (per-org sharded,
  |<---- HTTP POST /ingest ----           web-only users only)
  |       (cloud push events)               |
  v                                         v
EventSyncBridge ----HTTPS poll/push----> Cloud API
  |                                      (adaptive 5-30s)
  v
better-sqlite3
(event replay)

How It Works

  1. Desktop users (majority): Real-time events are delivered via an embedded WebSocket server running locally in the Electron app on port 3693. Zero cloud WebSocket connections needed.

  2. Web-only users (minority): Connect to the cloud WebSocket DO, which is sharded per-organization for scalability.

  3. Cross-user sync: When Team Member A's agent completes, the cloud pushes the event to Team Member B's desktop via webhook callback. No persistent cloud connection needed.

Cost Comparison

Scale        All Cloud          Hybrid (Desktop + Cloud)   Savings
100K users   ~$150-300/mo       ~$27-55/mo                 80-85%
500K users   ~$750-1,500/mo     ~$115-230/mo               85%
1M users     ~$1,500-3,000/mo   ~$225-450/mo               85%

Assumptions: 85% desktop adoption, average 8hr/day connection, 22 workdays/month.
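
The savings column can be sanity-checked with a short calculation. Under the stated assumptions, cloud cost is roughly proportional to cloud connection-hours, which scale with the web-only share of users (a rough sketch; the dollar figures also fold in per-message and DO duration pricing, which this ignores):

```typescript
// Rough sanity check of the savings figure under the stated assumptions:
// 85% desktop adoption, 8 hr/day connections, 22 workdays/month.
function cloudConnectionHours(
  users: number,
  desktopShare: number,
  hoursPerDay = 8,
  workdays = 22,
): number {
  const webOnlyUsers = users * (1 - desktopShare);
  return webOnlyUsers * hoursPerDay * workdays;
}

const allCloud = cloudConnectionHours(100_000, 0); // everyone on cloud WS
const hybrid = cloudConnectionHours(100_000, 0.85); // 85% on local WS
const savings = 1 - hybrid / allCloud; // ~0.85, i.e. ~85% savings
```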

Desktop: LocalRealtimeServer

The Electron desktop app embeds a WebSocket server in the main process, following the same pattern as the existing OllamaProxy service.

Architecture

  • Port: 3693 (next in sequence: 3690=API, 3691=Next.js, 3692=Ollama)
  • WebSocket: ws://127.0.0.1:3693/ws — renderer connects here
  • HTTP: POST /ingest — receives events pushed from cloud
  • Health: GET /health — standard health check
  • Storage: better-sqlite3 event log for offline replay and deduplication

Local-Only Auth

No JWT is required for localhost connections. The server binds to 127.0.0.1 only, so it is not reachable from the network.

Event Persistence

Events are stored in a local SQLite table for:

  • Offline replay: When the app restarts, the last 24 hours of events are available
  • Deduplication: Events have UUIDs to prevent duplicates from poll + push
  • Auto-cleanup: Events older than 24 hours are pruned

sql
CREATE TABLE realtime_events (
  id TEXT PRIMARY KEY,
  org_id TEXT NOT NULL,
  user_id TEXT NOT NULL,
  type TEXT NOT NULL,
  payload TEXT NOT NULL,
  timestamp INTEGER NOT NULL,
  received_at INTEGER NOT NULL
);

CREATE INDEX idx_events_org_ts ON realtime_events(org_id, timestamp);
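
The dedup-and-prune behavior backed by that table can be sketched independently of the database (a simplified in-memory version for illustration; the real store is the better-sqlite3 table above):

```typescript
// Sketch of the dedup + 24h-prune logic, using an in-memory Map in place
// of the better-sqlite3 table. `id` mirrors the UUID primary key: an event
// seen via both poll and push is stored (and broadcast) only once.
const DAY_MS = 24 * 60 * 60 * 1000;

class EventLog {
  private seen = new Map<string, number>(); // id -> received_at (unix ms)

  // Returns true if the event is new and should be broadcast locally.
  ingest(id: string, now = Date.now()): boolean {
    if (this.seen.has(id)) return false; // duplicate from poll + push
    this.seen.set(id, now);
    return true;
  }

  // Drop events older than 24 hours (mirrors the auto-cleanup rule).
  prune(now = Date.now()): void {
    for (const [id, receivedAt] of this.seen) {
      if (now - receivedAt > DAY_MS) this.seen.delete(id);
    }
  }

  size(): number {
    return this.seen.size;
  }
}
```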

Cloud: Per-Org DO Sharding

For web-only users, the cloud WebSocket is sharded by organization:

typescript
// Before: single global DO
const doId = env.WEBSOCKET_ROOMS.idFromName("global");

// After: per-org sharding
const doId = env.WEBSOCKET_ROOMS.idFromName(`org:${orgId}`);

This means each organization gets its own DO instance. With per-org sharding, you'd need a single org with 50,000+ concurrent web-only users to hit the limit — extremely unlikely.

Migration

The current WebSocketRoom DO uses zero persistent storage (all in-memory). Migration is seamless:

  1. Deploy new code with org-based DO key
  2. Existing connections close naturally
  3. Clients reconnect to their org-specific DO
  4. Backward-compatible "global" fallback during rollout
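
Steps 1 and 4 together suggest a small key-resolution helper (hypothetical, not from the codebase): route clients carrying an orgId to their sharded DO, and fall back to the legacy "global" key during rollout.

```typescript
// Hypothetical helper sketching the rollout: per-org sharding with a
// backward-compatible "global" fallback for requests without an orgId.
function resolveDoKey(orgId: string | null | undefined): string {
  if (orgId && orgId.length > 0) {
    return `org:${orgId}`; // per-org sharded DO
  }
  return "global"; // legacy fallback during rollout
}

// Usage inside the Worker would then be:
//   const doId = env.WEBSOCKET_ROOMS.idFromName(resolveDoKey(orgId));
```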

Cross-User Sync: EventSyncBridge

When Team Member A triggers an event, Team Member B's desktop needs to know. This is handled without persistent cloud WebSocket connections:

Primary: Webhook Push

  1. Desktop app registers a callback URL with the cloud on startup
  2. Cloud's broadcastActivity() POSTs events to registered desktop callback URLs
  3. Desktop's /ingest endpoint receives the event and broadcasts locally
  4. Latency: sub-second when tunnels are active

Fallback: Adaptive Polling

If webhook push is unavailable (NAT, firewall):

  1. Desktop polls GET /api/events/since?cursor=<timestamp>&orgId=<id>
  2. Poll interval adapts: 5s when activity is high, 30s when idle
  3. Latency: 5-30 seconds
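
The adaptive interval can be sketched as a pure function. The 5s/30s bounds come from the text; the linear ramp between them is an illustrative assumption.

```typescript
// Sketch of the adaptive poll interval. Bounds (5s active, 30s idle) are
// from the design above; the linear back-off in between is illustrative.
const MIN_POLL_MS = 5_000;
const MAX_POLL_MS = 30_000;

// msSinceLastEvent: time since the last event arrived for this org.
function nextPollInterval(msSinceLastEvent: number): number {
  if (msSinceLastEvent <= 60_000) return MIN_POLL_MS; // active: poll fast
  if (msSinceLastEvent >= 300_000) return MAX_POLL_MS; // idle: poll slowly
  // Ramp linearly from 5s to 30s between 1 and 5 minutes of quiet.
  const t = (msSinceLastEvent - 60_000) / (300_000 - 60_000);
  return Math.round(MIN_POLL_MS + t * (MAX_POLL_MS - MIN_POLL_MS));
}
```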

Event Envelope

typescript
interface CrossUserEvent {
  id: string;          // UUID for deduplication
  orgId: string;       // Organization scope
  userId: string;      // Originator
  type: string;        // Event type (agent.completed, task.created, etc.)
  payload: object;     // Event data
  timestamp: number;   // Unix ms
}
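
A desktop /ingest handler would want to validate this envelope at runtime before persisting and re-broadcasting; a minimal type guard (an assumed helper, not from the codebase) could look like:

```typescript
// Envelope as defined above, restated so this sketch is self-contained.
interface CrossUserEvent {
  id: string;
  orgId: string;
  userId: string;
  type: string;
  payload: object;
  timestamp: number;
}

// Runtime validation, e.g. on the /ingest endpoint, before the event is
// persisted to the local event log and broadcast to renderer clients.
function isCrossUserEvent(x: unknown): x is CrossUserEvent {
  if (typeof x !== "object" || x === null) return false;
  const e = x as Record<string, unknown>;
  return (
    typeof e.id === "string" &&
    typeof e.orgId === "string" &&
    typeof e.userId === "string" &&
    typeof e.type === "string" &&
    typeof e.payload === "object" && e.payload !== null &&
    typeof e.timestamp === "number"
  );
}
```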

Client URL Resolution

The WebSocket client automatically selects the right server:

typescript
function getWebSocketUrl(): string {
  if (window.cortex) {
    // Running in Electron desktop app — use local server
    return 'ws://127.0.0.1:3693/ws';
  }
  // Running in browser — use cloud WebSocket
  return `wss://api.cortex.acrobi.com/api/ws?token=${getToken()}`;
}

Why Not P2P?

We evaluated peer-to-peer mesh networking (desktop-to-desktop WebRTC/libp2p) and decided against it:

  • Requires STUN/TURN servers ($50-200/month at scale)
  • NAT traversal fails for ~15-20% of corporate networks
  • Exposes user IP addresses (security concern)
  • Marginal savings (~10-15%) over the hybrid approach
  • Significant implementation complexity

The hybrid approach already eliminates 85% of cloud connections. P2P provides diminishing returns.
