# Scaling Guide
Enzyme is designed for single-server deployment with SQLite. The default configuration works well on a 2 GB / 1 vCPU box (~500 concurrent users). This guide explains how to scale up.
For a full list of configurable options, see Configuration Reference.
## SQLite Tuning
SQLite handles all storage in Enzyme. These pragmas are set per-connection via DSN parameters, so every connection in the pool gets them.
| Setting | Config Key | Default | What It Does |
|---|---|---|---|
| `max_open_conns` | `database.max_open_conns` | `2` | Number of connections in the pool. With WAL mode, readers don't block writers, so >1 is safe. |
| `busy_timeout` | `database.busy_timeout` | `5000` | Milliseconds to retry when the database is locked, before returning `SQLITE_BUSY`. |
| `cache_size` | `database.cache_size` | `-2000` | Page cache size per connection. Negative = KB (`-2000` = ~2 MB). Larger cache = fewer disk reads. |
| `mmap_size` | `database.mmap_size` | `0` | Memory-mapped I/O in bytes. `0` = disabled. Enables the OS to page database data directly into memory. |
### When to Adjust
- More concurrent users: Increase `max_open_conns` (4-8 is reasonable for most workloads). Each read query can run on its own connection without blocking writes. With the default of `2`, a single slow query (e.g., notification aggregation across many channels) can hold one connection while every other handler — auth validation, message sends, SSE event persistence — queues behind the remaining one.
- Write contention errors: Increase `busy_timeout`. If you see `SQLITE_BUSY` in logs, the default 5 seconds isn't enough for your write volume.
- Slow queries on large databases: Increase `cache_size` (e.g., `-64000` for ~64 MB) and enable `mmap_size` (e.g., `268435456` for 256 MB). This keeps hot pages in memory.
- Small VPS with limited RAM: Keep the defaults. The ~2 MB per-connection cache is intentionally conservative.
Note: `cache_size` is per-connection. Total cache memory is roughly `cache_size × max_open_conns`. With the defaults (`-2000` and `2`), that's ~4 MB total.
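That total can be sanity-checked with quick arithmetic. The sketch below plugs in the defaults and the Large example from later in this guide:

```shell
# Total page cache in KB: |cache_size| (KB per connection) x max_open_conns.
total_kb() { echo $(( $1 * $2 )); }

total_kb 2000 2      # defaults: 4000 KB, ~4 MB
total_kb 64000 8     # Large profile: 512000 KB, ~512 MB
```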
## HTTP Server Tuning
| Setting | Config Key | Default | What It Does |
|---|---|---|---|
| `read_timeout` | `server.read_timeout` | `30s` | Max time to read the full request (headers + body). |
| `write_timeout` | `server.write_timeout` | `60s` | Max time to write the response. SSE connections disable this per-connection. |
| `idle_timeout` | `server.idle_timeout` | `120s` | How long to keep idle keep-alive connections open. |
### When to Adjust
- Large file uploads over slow connections: Increase `read_timeout` (e.g., `120s`).
- Clients on high-latency networks: Increase `write_timeout`.
- Many idle connections consuming file descriptors: Decrease `idle_timeout`.
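As a sketch, the first two adjustments combined (keys as in the table above):

```yaml
server:
  read_timeout: '120s'    # slow uploads get longer to stream the request body
  write_timeout: '120s'   # high-latency clients get longer to receive responses
```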
## SSE Tuning
| Setting | Config Key | Default | What It Does |
|---|---|---|---|
| `heartbeat_interval` | `sse.heartbeat_interval` | `30s` | How often heartbeat events are sent to keep connections alive. |
| `client_buffer_size` | `sse.client_buffer_size` | `256` | Go channel buffer per connected SSE client. |
| `event_retention` | `sse.event_retention` | `24h` | How long events are stored in the database for reconnection catch-up. |
### When to Adjust
- High-traffic workspaces (many messages/second): Increase `client_buffer_size` (e.g., `512` or `1024`). If the buffer fills, the slow client misses events and must reconnect.
- Aggressive proxies/load balancers dropping idle connections: Decrease `heartbeat_interval` (e.g., `15s`).
- Database growing too large from event storage: Decrease `event_retention`.
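For a busy workspace behind an aggressive proxy, the first two adjustments might look like this (a sketch using the keys from the table above):

```yaml
sse:
  heartbeat_interval: '15s'   # keep proxies from dropping idle SSE connections
  client_buffer_size: 512     # more headroom for bursty broadcasts
```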
## OS-Level Tuning
### File Descriptors
Each SSE connection uses a file descriptor. The default Linux limit (1024) can be a bottleneck with many concurrent users. Add to your systemd unit:
```ini
[Service]
LimitNOFILE=65536
```

Or set system-wide in `/etc/security/limits.conf`:

```
enzyme soft nofile 65536
enzyme hard nofile 65536
```
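To see how close a running instance is to the limit, compare its open descriptors against its soft limit. The sketch below inspects the current shell via `$$`; substitute `$(pidof enzyme)` for the real process:

```shell
# Count open fds and read the soft "Max open files" limit from /proc.
pid=$$   # in production: pid=$(pidof enzyme)
used=$(ls /proc/$pid/fd | wc -l)
limit=$(awk '/Max open files/ {print $4}' /proc/$pid/limits)
echo "fds in use: $used (soft limit: $limit)"
```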
### TCP Memory
Under heavy broadcast load, the kernel's TCP buffer memory can hit the pressure threshold and start dropping connections. This manifests as periodic waves of disconnections followed by reconnections.
Check current TCP memory usage:
```shell
cat /proc/net/sockstat | grep TCP
# TCP: inuse 5012 ... mem 18432
```
The `mem` value is in pages (4 KB each). Compare it to the kernel's pressure threshold:

```shell
sysctl net.ipv4.tcp_mem
# net.ipv4.tcp_mem = 383139 510854 766278
#                    min    pressure max  (in pages)
```
If `mem` approaches the `pressure` value during load, raise the thresholds. All three values are in pages and must be set higher than what `sysctl net.ipv4.tcp_mem` currently reports; the kernel computes the defaults from available RAM, so the example below suits a small VPS with low defaults:

```shell
sysctl -w net.ipv4.tcp_mem="50000 75000 100000"
```

Add to `/etc/sysctl.conf` to persist across reboots. On a 1 GB server with 5,000 SSE connections, TCP memory can peak at ~80 MB (20,000 pages) during broadcast bursts.
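Since sockstat reports pages, a quick conversion helps when comparing against the thresholds. The page count below is the broadcast-burst peak mentioned above:

```shell
# TCP "mem" is reported in 4 KB pages; convert to MB.
pages=20000
echo "$(( pages * 4 / 1024 )) MB"   # ~78 MB
```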
### TCP Backlog
For servers handling many simultaneous connections:
```shell
sysctl -w net.core.somaxconn=4096
sysctl -w net.ipv4.tcp_max_syn_backlog=4096
sysctl -w net.core.netdev_max_backlog=5000
```
The default `tcp_max_syn_backlog` (128 on many systems) is too low: if connections drop and thousands of clients reconnect simultaneously, the SYN queue overflows and reconnections fail.

Add to `/etc/sysctl.conf` to persist across reboots.
## Capabilities and SQLite
If you use `setcap cap_net_bind_service=+ep` on the Enzyme binary to bind to port 443 without root, be aware that file capabilities interact badly with `modernc.org/sqlite` (the pure-Go SQLite driver). The capability flag changes how the kernel handles the process's memory mappings, causing disk I/O errors on all write operations.
Instead, grant the capability through systemd:
```ini
[Service]
User=enzyme
AmbientCapabilities=CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
```
This achieves the same result without modifying the binary.
## Profiling
To investigate performance issues, you can add Go's built-in pprof profiler. Import `net/http/pprof` and start a listener on a localhost-only port in a goroutine:
```go
import (
	"log"
	"net/http"
	"net/http/pprof"
)

mux := http.NewServeMux()
mux.HandleFunc("/debug/pprof/", pprof.Index)
mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
mux.HandleFunc("/debug/pprof/trace", pprof.Trace)

// Bind to localhost only so the profiler is never exposed publicly.
go func() { log.Println(http.ListenAndServe("localhost:6060", mux)) }()
```
Then access it via an SSH tunnel:

```shell
ssh -L 6060:localhost:6060 root@your-server
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'  # CPU
go tool pprof http://localhost:6060/debug/pprof/heap                  # memory
```
## SSE Performance
Enzyme broadcasts real-time events to all connected clients via Server-Sent Events. At high connection counts, three bottlenecks can appear:
### Per-subscriber serialization
Each SSE event is serialized to JSON once before broadcast and sent as pre-marshaled bytes to all subscribers, so serialization cost does not grow with subscriber count. High CPU in `json.Marshal` under load would indicate per-subscriber serialization; current versions already avoid this.
### HTTP/2 vs HTTP/1.1
HTTP/2 multiplexes streams over a shared TCP connection, while HTTP/1.1 gives each SSE client its own socket. In practice, real browsers open one SSE connection per user (not multiple streams over one TCP connection), so HTTP/2's multiplexing doesn't affect SSE fan-out. Enzyme leaves HTTP/2 enabled (Go's default) — testing showed no performance difference for SSE workloads.
### Flush overhead
SSE writes are batched — when an event arrives, all pending events in the client's buffer are drained and written before a single `Flush()` call. This reduces syscall overhead under burst traffic. The `client_buffer_size` setting controls how many events can queue before the client is considered too slow.
## Reverse Proxy Tuning
See the Self-Hosting Guide for nginx and Caddy configuration examples. The key considerations for scaling:
- Disable response buffering for SSE endpoints (`proxy_buffering off` in nginx, `flush_interval -1` in Caddy)
- Set long read timeouts on SSE paths (e.g., `proxy_read_timeout 86400s`) to prevent the proxy from killing idle SSE connections
- Increase worker connections in nginx (`worker_connections 4096`) if you expect thousands of concurrent SSE clients
## Example Configs
### Small (2 GB / 1 vCPU, ~500 users)
Use the defaults. No config changes needed. A single vCPU comfortably handles ~2,000 SSE connections, and real-world message rates are far lower than stress-test conditions.
### Medium (8 GB / 4 vCPU, ~5,000 users)
```yaml
database:
  max_open_conns: 4
  busy_timeout: 10000
  cache_size: -16000     # ~16 MB per conn, ~64 MB total
  mmap_size: 134217728   # 128 MB
server:
  read_timeout: '60s'
  write_timeout: '120s'
  idle_timeout: '120s'
sse:
  client_buffer_size: 512
```
### Large (16 GB / 8 vCPU, ~20,000 users)
```yaml
database:
  max_open_conns: 8
  busy_timeout: 15000
  cache_size: -64000       # ~64 MB per conn, ~512 MB total
  mmap_size: 1073741824    # 1 GB
server:
  read_timeout: '60s'
  write_timeout: '120s'
  idle_timeout: '300s'
sse:
  heartbeat_interval: '20s'
  client_buffer_size: 1024
telemetry:
  sample_rate: 0.1   # 10% — full sampling is too expensive at this scale
```
Also set `LimitNOFILE=65536` in the systemd unit for this profile.
## Telemetry Sampling
If OpenTelemetry is enabled, the default `sample_rate` of `1.0` traces every request. This is fine for small deployments but adds overhead at scale — each trace generates spans for the HTTP request, database queries, and SSE operations.
For the Medium profile, `0.5` (50%) is a reasonable starting point. For the Large profile, drop to `0.1` (10%) or lower. You can also disable tracing entirely (`0.0`) and keep only metrics, which are always aggregated and cheap regardless of traffic volume.
```yaml
telemetry:
  enabled: true
  sample_rate: 0.1   # adjust based on traffic
```
## Monitoring
Key metrics to watch when scaling:
- SQLite busy retries: `SQLITE_BUSY` errors in logs indicate write contention. Increase `busy_timeout` or reduce `max_open_conns`. Occasional `SQLITE_BUSY` under peak load is normal — the important thing is that they don't cascade into persistent I/O errors (which would indicate an outdated `modernc.org/sqlite` version; v1.46.1+ is required).
- SSE connection count: Monitor the number of active SSE clients. Each consumes memory proportional to `client_buffer_size`.
- Memory usage: (`cache_size` × `max_open_conns`) + `mmap_size` + (SSE clients × buffer size × avg event size) gives a rough memory floor.
- File descriptors: `ls /proc/$(pidof enzyme)/fd | wc -l` shows current usage. Compare to `LimitNOFILE`.
- Response latency: If P99 response times degrade, check `max_open_conns` first. Expensive read queries (like notification aggregation) can hold connections for hundreds of milliseconds, starving other handlers. Increasing the pool size (e.g., to 4-8) gives handlers their own connections. If latency persists after that, the database may benefit from a larger cache or mmap. Use `EXPLAIN QUERY PLAN` on slow queries to verify they're using indexes.
- TCP memory: During SSE broadcast bursts, check `cat /proc/net/sockstat | grep TCP` — if the `mem` value approaches the kernel's `tcp_mem` pressure threshold, connections will be dropped. See TCP Memory above.
- Trace export backpressure: If the OTLP exporter queue fills up, you'll see dropped spans in logs. Lower `sample_rate` or scale your collector.
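The memory-usage formula above can be turned into a quick estimate. The sketch below plugs in the Medium profile's database settings plus assumed SSE numbers (5,000 clients, ~10 queued events each, ~1 KB per event; the queue depth and event size are illustrative assumptions, not measured values):

```shell
# Rough memory floor: cache x conns + mmap + SSE client queues (all in MB).
cache_mb=$(( 16 * 4 ))                  # 16 MB per connection x 4 connections
mmap_mb=128                             # mmap_size: 134217728 bytes
sse_mb=$(( 5000 * 10 * 1 / 1024 ))      # clients x queued events x KB, to MB
echo "~$(( cache_mb + mmap_mb + sse_mb )) MB"
```

Actual usage will be higher (Go runtime, HTTP buffers), but this gives a baseline for choosing an instance size.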