Performance & Scalability
Part of: MPAC SmartPOS Cloud Platform - Product Requirements
Version: 2.0 | Last Updated: 2026-01-28
Overview
This document defines performance targets, scalability strategies, and observability requirements for the MPAC platform. The architecture is designed to support 400,000+ concurrent devices, 80M daily transactions, and 15,000 RPS sustained throughput with 99.9% availability. The system leverages horizontal scaling, database optimization, caching strategies, and comprehensive monitoring through the mpac-obs observability stack.
Performance Targets
Purpose: Quantifiable metrics for system performance and reliability.
Key Performance Indicators
| Metric | Target | Measurement | Alert Threshold |
|---|---|---|---|
| API Latency (P50) | <200ms | CloudWatch/Prometheus | >300ms |
| API Latency (P95) | <500ms | CloudWatch/Prometheus | >800ms |
| API Latency (P99) | <1000ms | CloudWatch/Prometheus | >1500ms |
| PGW Latency (P95) | <200ms | Tempo tracing | >400ms |
| Database Query (P95) | <50ms | pgBadger, Prometheus | >100ms |
| Throughput | 15,000 RPS | Load testing, Prometheus | <12,000 RPS |
| Concurrent Devices | 400,000+ | Active WebSocket connections | N/A |
| Daily Transactions | 80M | Business metrics | N/A |
| Availability | 99.9% | Uptime monitoring | <99.5% |
| Error Rate | <0.5% | Prometheus | >1% |
| Database Connections | <80% pool | pgBouncer metrics | >90% pool |
| Redis Hit Rate | >95% | Redis INFO stats | <90% |
Latency Breakdown by Endpoint Category
| Endpoint Category | P50 Target | P95 Target | P99 Target |
|---|---|---|---|
| Authentication (JWT issue) | <100ms | <200ms | <500ms |
| Device Token Request | <150ms | <300ms | <600ms |
| Order Creation | <200ms | <500ms | <1000ms |
| Payment Intent Creation | <150ms | <300ms | <800ms |
| Payment Confirmation | <200ms | <400ms | <1000ms |
| Report Generation (sync) | <500ms | <2000ms | <5000ms |
| Report Generation (async) | <50ms | <100ms | <200ms (queue time) |
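The percentile targets above can be verified against recorded latency samples. A minimal sketch using only Python's standard library (the function name `latency_percentiles` is illustrative, not part of the platform):

```python
import statistics

def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Compute P50/P95/P99 from raw latency samples (milliseconds)."""
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99
    cuts = statistics.quantiles(sorted(samples_ms), n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

# Synthetic sample set: 90% fast, 9% slow, 1% outlier
samples = [50.0] * 90 + [400.0] * 9 + [1200.0]
result = latency_percentiles(samples)
```

Note that P99 is dominated by the tail: a single 1200ms outlier in 100 requests pushes P99 close to it even when P95 is healthy, which is why the table alerts on each percentile separately.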
Load Testing Scenarios
Scenario 1: Peak Transaction Volume
Duration: 1 hour
Target: 15,000 RPS sustained
Traffic Mix:
- 40% Order Creation (6,000 RPS)
- 30% Payment Confirmation (4,500 RPS)
- 20% Bill Retrieval (3,000 RPS)
- 10% Device Token Requests (1,500 RPS)
Success Criteria:
- P95 latency < 500ms
- Error rate < 0.5%
- No database connection exhaustion
- CPU < 70% per container
Scenario 2: Device Fleet Connectivity
Duration: 30 minutes
Target: 400,000 concurrent WebSocket connections
Traffic Mix:
- 400,000 devices maintain WebSocket
- 5,000 devices/min reconnect (network churn)
- Heartbeat every 30 seconds
Success Criteria:
- WebSocket connection success rate > 99%
- Reconnection time < 5 seconds
- Memory usage stable (no leaks)
- Network bandwidth < 100 Mbps
Scenario 3: Burst Load (Black Friday)
Duration: 2 hours
Target: 25,000 RPS peak (167% of normal)
Traffic Pattern:
- Ramp up: 15k → 25k RPS over 15 minutes
- Sustain: 25k RPS for 90 minutes
- Ramp down: 25k → 15k RPS over 15 minutes
Success Criteria:
- Auto-scaling triggers within 2 minutes
- P99 latency < 2000ms during peak
- No service outages
- Graceful degradation (queue async tasks)
Scalability Strategies
Purpose: Architectural patterns to handle growth in devices, transactions, and users.
Horizontal Scaling
Stateless Services:
All Backend Services (svc-portal, svc-smarttab, mpac-pgw)
└─ Stateless design (no in-memory session state)
└─ Session data stored in Redis
└─ ECS Auto Scaling configuration:
├─ Min capacity: 2 tasks per service
├─ Max capacity: 50 tasks per service
├─ Target CPU: 60%
├─ Target memory: 70%
└─ Scale-out: Add 2 tasks if CPU > 60% for 2 minutes
Scale-in: Remove 1 task if CPU < 40% for 5 minutes
Auto Scaling Policy Example:
{
"ServiceName": "svc-smarttab",
"ScalableTargetAction": {
"MinCapacity": 2,
"MaxCapacity": 50
},
"PolicyConfiguration": {
"TargetTrackingScaling": {
"TargetValue": 60.0,
"PredefinedMetric": {
"PredefinedMetricType": "ECSServiceAverageCPUUtilization"
},
"ScaleOutCooldown": 60,
"ScaleInCooldown": 300
}
}
}
Load Balancing:
Application Load Balancer (ALB)
└─ Target Group: svc-smarttab
├─ Health Check: GET /health (every 30s)
├─ Deregistration Delay: 30s (graceful shutdown)
├─ Stickiness: Disabled (stateless)
└─ Load Balancing Algorithm: Round robin
Database Optimization
Read Replicas:
Primary Database (Write)
└─ svc-portal-primary.rds.amazonaws.com
├─ Instance: db.r6g.xlarge (4 vCPU, 32 GB)
├─ IOPS: 12,000 (gp3)
└─ Connections: 200 max
Read Replicas (Read-only)
├─ svc-portal-replica-1.rds.amazonaws.com
├─ svc-portal-replica-2.rds.amazonaws.com
└─ svc-portal-replica-3.rds.amazonaws.com (reporting)
├─ Instance: db.r6g.large (2 vCPU, 16 GB)
├─ Lag: <5 seconds
└─ Connections: 100 max per replica
Query Routing:
- Write operations → Primary
- Read operations (reports) → Replica 3
- Read operations (API) → Replica 1 or 2 (round robin)
Connection Pooling (pgBouncer):
pgBouncer Configuration
└─ Mode: Transaction pooling
└─ Max connections per service: 100
└─ Server connections: 200 (shared)
└─ Pool size: 25 per database
└─ Reserve pool: 5 (for admin)
# pgbouncer.ini
[databases]
mpac_portal = host=rds-primary port=5432 dbname=mpac_portal
mpac_smarttab = host=rds-primary port=5432 dbname=mpac_smarttab
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 600
Benefits:
- Reduced connection overhead (connection reuse)
- Protection against connection exhaustion
- Lower database CPU usage
- Faster query execution (no handshake overhead)
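The pool sizes above can be sanity-checked with Little's law: busy server connections ≈ query arrival rate × mean query time. A back-of-envelope helper (illustrative only, not part of the codebase; the 1.5x headroom factor is an assumption to cover bursts):

```python
import math

def required_pool_size(qps: float, avg_query_seconds: float, headroom: float = 1.5) -> int:
    """Estimate server connections needed via Little's law.

    Concurrency ≈ qps × average query duration; headroom covers bursts
    and uneven load across pooled clients.
    """
    return math.ceil(qps * avg_query_seconds * headroom)

# 1,000 queries/s at a 50ms average → ~75 server connections with 1.5x headroom
size = required_pool_size(1000, 0.05)
```

At the stated P95 budget of 50ms per query, the 25-connection default pool per database is adequate for a few hundred QPS per service; the shared 200-connection server limit is the binding constraint at full throughput.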
Partitioning Strategy:
-- Monthly partitions for high-volume tables
CREATE TABLE payment_transactions (
id UUID NOT NULL,
payment_intent_id TEXT NOT NULL,
provider_name TEXT NOT NULL,
amount DECIMAL(15, 2) NOT NULL,
status TEXT NOT NULL,
created_at TIMESTAMP WITH TIME ZONE NOT NULL,
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- Auto-create partitions with pg_partman
SELECT partman.create_parent(
p_parent_table := 'public.payment_transactions',
p_control := 'created_at',
p_type := 'native',
p_interval := '1 month',
p_premake := 3 -- Pre-create 3 future partitions
);
Partitioned Tables:
- payment_transactions - 80M+ rows/month
- authentication_logs - High write volume (1M+/day)
- webhook_events - 90-day retention
Maintenance:
-- Drop old partitions (automated via cron)
SELECT partman.run_maintenance('public.payment_transactions');
-- Detach partition before drop (instant)
ALTER TABLE payment_transactions
DETACH PARTITION payment_transactions_2025_01;
-- Archive to S3 (PostgreSQL COPY cannot emit Parquet natively;
-- export CSV and convert to Parquet downstream, e.g. via AWS Glue)
COPY payment_transactions_2025_01 TO PROGRAM
'aws s3 cp - s3://mpac-archive/payment_transactions/2025-01.csv'
WITH (FORMAT csv, HEADER);
-- Drop detached partition
DROP TABLE payment_transactions_2025_01;
Indexing Best Practices:
-- Example: Bills table indexing strategy
CREATE TABLE bills (
id UUID PRIMARY KEY,
merchant_id INT NOT NULL,
store_id INT NOT NULL,
device_id UUID,
bill_number TEXT NOT NULL,
status TEXT NOT NULL,
total_amount DECIMAL(15, 2),
created_at TIMESTAMP WITH TIME ZONE NOT NULL,
updated_at TIMESTAMP WITH TIME ZONE NOT NULL
);
-- Tenant-scoped queries (most common)
CREATE INDEX idx_bills_merchant_store ON bills(merchant_id, store_id);
-- Status queries (frequent)
CREATE INDEX idx_bills_status ON bills(status)
WHERE status IN ('open', 'pending'); -- Partial index
-- Time-range queries (reports)
CREATE INDEX idx_bills_created_at ON bills(created_at DESC);
-- Composite for complex filters
CREATE INDEX idx_bills_store_status_created
ON bills(store_id, status, created_at DESC)
WHERE status = 'closed'; -- Reports on closed bills
-- Device lookup (rare, but critical)
CREATE INDEX idx_bills_device_id ON bills(device_id);
Index Monitoring:
-- Find unused indexes
SELECT
schemaname,
tablename,
indexname,
idx_scan AS scans,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan < 100 -- Less than 100 scans
AND schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Drop unused indexes (manual review)
DROP INDEX idx_bills_rarely_used;
Caching Strategy
Redis Configuration:
ElastiCache Redis
└─ Cluster Mode: Enabled (3 shards)
└─ Replication: 2 replicas per shard
└─ Instance: cache.r6g.large (13.07 GB memory)
└─ Eviction Policy: allkeys-lru
└─ Max Memory: 12 GB (with 1 GB reserved)
Cache Usage by Service:
| Data Type | TTL | Invalidation | Hit Rate Target |
|---|---|---|---|
| User session tokens | 15 minutes | On logout | 98% |
| Device tokens | 90 seconds | On expiry | 95% |
| Rate limiting counters | 1 minute | Time-based | 100% |
| Idempotency keys | 24 hours | On expiry | 80% |
| Merchant/store config | 5 minutes | On update | 99% |
| JWT public keys | 5 minutes | On rotation | 99.9% |
| Payment provider configs | 10 minutes | On change | 95% |
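The merchant/store config entry above follows the cache-aside pattern with a 5-minute TTL. A minimal sketch of that read path; `TTLStore` is a stand-in mimicking Redis SETEX/GET semantics, and `loader` represents the database fallback (both are illustrative, not actual platform code):

```python
import json
import time

class TTLStore:
    """In-memory stand-in for Redis SETEX/GET semantics."""
    def __init__(self):
        self._data = {}

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        self._data[key] = (time.monotonic() + ttl_seconds, value)

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

def get_merchant_config(store, merchant_id: int, loader, ttl: int = 300) -> dict:
    """Cache-aside read: try cache, fall back to loader, populate with TTL."""
    key = f"merchant:{merchant_id}"
    cached = store.get(key)
    if cached is not None:
        return json.loads(cached)
    config = loader(merchant_id)           # cache miss: hit the database
    store.setex(key, ttl, json.dumps(config))
    return config
```

On update, the same `merchant:{id}` key is deleted rather than overwritten, which is what keeps the table's "On update" invalidation column consistent with this read path.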
Application-Level Caching:
# JWT public key cache (in-memory with TTL)
from cachetools import TTLCache
import asyncio
jwt_key_cache = TTLCache(maxsize=10, ttl=300) # 5 minutes
async def get_jwt_public_key(key_id: str) -> str:
"""Get JWT public key with in-memory caching."""
if key_id in jwt_key_cache:
return jwt_key_cache[key_id]
# Cache miss: fetch from database
key = await db.query(
"SELECT public_key_pem FROM jwt_keys WHERE key_id = :key_id",
{"key_id": key_id}
)
if key:
jwt_key_cache[key_id] = key.public_key_pem
return key.public_key_pem
raise KeyError(f"JWT key not found: {key_id}")
Cache Invalidation:
# Merchant config cache invalidation on update
async def update_merchant(merchant_id: int, data: dict):
"""Update merchant and invalidate cache."""
# Update database
await db.execute(
"UPDATE merchants SET name = :name, updated_at = NOW() WHERE id = :id",
{"id": merchant_id, "name": data["name"]}
)
# Invalidate cache
cache_key = f"merchant:{merchant_id}"
await redis.delete(cache_key)
# Sync to PGW
await pgw_client.sync_merchant(merchant_id)
Message Queue (Future)
NATS Configuration (Planned):
NATS JetStream Cluster
└─ 3 nodes for high availability
└─ Streams:
├─ receipts.delivery (email/SMS delivery)
├─ settlements.generation (daily batch)
├─ reports.generation (async reports)
└─ devices.sync (device config updates)
Stream Configuration:
- Retention: Work queue (delete after ack)
- Max age: 24 hours (unprocessed messages)
- Replicas: 3 (for durability)
- Max consumers: 10 per stream
Use Cases:
- Async receipt delivery (email/SMS)
- Settlement generation (daily batch)
- Report generation (large reports)
- Device sync events (config updates)
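The work-queue retention policy above (delete after ack, redeliver if unacked) can be modeled in a few lines. This is a behavioral sketch of the planned semantics, not the nats-py client API:

```python
import collections

class WorkQueueStream:
    """Toy model of work-queue retention: deliver → ack deletes; timeout redelivers."""
    def __init__(self):
        self._pending = collections.deque()   # published, not yet delivered
        self._inflight = {}                   # delivered, awaiting ack
        self._next_id = 0

    def publish(self, msg):
        self._pending.append((self._next_id, msg))
        self._next_id += 1

    def deliver(self):
        if not self._pending:
            return None
        msg_id, msg = self._pending.popleft()
        self._inflight[msg_id] = msg
        return msg_id, msg

    def ack(self, msg_id):
        self._inflight.pop(msg_id)            # acked: message is deleted for good

    def redeliver_unacked(self):
        # e.g. consumer crashed or ack window expired: in-flight goes back on queue
        for msg_id, msg in self._inflight.items():
            self._pending.append((msg_id, msg))
        self._inflight.clear()
```

The key property this models: a receipt-delivery worker that dies mid-task loses nothing, because the unacked message is redelivered to the next consumer.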
Monitoring & Observability (mpac-obs)
Purpose: Comprehensive visibility into system health, performance, and business metrics.
Logging (Loki)
Log Format:
{
"timestamp": "2026-01-28T11:05:00.123Z",
"level": "INFO",
"service": "svc-smarttab",
"correlation_id": "req_abc123",
"trace_id": "01HXYZ...",
"span_id": "span_456",
"caller": {
"type": "device",
"id": "device_uuid",
"merchant_id": 1,
"store_id": 2
},
"message": "Order created successfully",
"metadata": {
"order_id": "ORDER-123",
"amount": 100000,
"duration_ms": 45
}
}
Log Aggregation:
All Services
└─ Structured JSON logs to stdout
└─ Alloy collector (DaemonSet on ECS)
├─ Parse JSON logs
├─ Add labels: service, environment, region
├─ Filter sensitive data (PII, credentials)
└─ Forward to Loki (OTLP HTTP)
Loki Storage
└─ Retention: 14 days (configurable)
└─ Compression: gzip
└─ Indexing: By service, level, correlation_id
Correlation ID Propagation:
# Middleware injects correlation ID
from uuid import uuid4
@app.middleware("http")
async def correlation_id_middleware(request: Request, call_next):
correlation_id = request.headers.get("X-Correlation-ID", str(uuid4()))
request.state.correlation_id = correlation_id
# Propagate to response
response = await call_next(request)
response.headers["X-Correlation-ID"] = correlation_id
return response
# Logger includes correlation ID
logger.info(
"Order created",
extra={"correlation_id": request.state.correlation_id}
)
Metrics (Prometheus)
Metric Categories:
1. HTTP Metrics:
from prometheus_client import Counter, Gauge, Histogram
http_requests_total = Counter(
"http_requests_total",
"Total HTTP requests",
["method", "endpoint", "status"]
)
http_request_duration_seconds = Histogram(
"http_request_duration_seconds",
"HTTP request latency",
["method", "endpoint"],
buckets=[0.01, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
)
# Usage in middleware
import time
@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
start_time = time.time()
response = await call_next(request)
duration = time.time() - start_time
http_requests_total.labels(
method=request.method,
endpoint=request.url.path,
status=response.status_code
).inc()
http_request_duration_seconds.labels(
method=request.method,
endpoint=request.url.path
).observe(duration)
return response
2. Database Metrics:
db_query_duration_seconds = Histogram(
"db_query_duration_seconds",
"Database query latency",
["operation", "table"],
buckets=[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0]
)
db_connection_pool_size = Gauge(
"db_connection_pool_size",
"Current database connection pool size"
)
db_connection_pool_available = Gauge(
"db_connection_pool_available",
"Available database connections in pool"
)
3. Redis Metrics:
redis_hit_rate = Gauge(
"redis_hit_rate",
"Redis cache hit rate (percentage)"
)
redis_operations_total = Counter(
"redis_operations_total",
"Total Redis operations",
["operation", "result"] # result: hit, miss, error
)
4. Business Metrics:
payments_total = Counter(
"payments_total",
"Total payments processed",
["payment_method", "status"]
)
payment_amount_total = Counter(
"payment_amount_total",
"Total payment amount processed (JPY)",
["payment_method"]
)
devices_online = Gauge(
"devices_online",
"Number of devices currently online"
)
Metric Retention:
- High-resolution: 7 days (15s scrape interval)
- Low-resolution: 30 days (5m downsampled)
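Assuming the `redis_operations_total` counter above is labeled with `result="hit"` and `result="miss"`, the `redis_hit_rate` gauge can instead be derived server-side with a Prometheus recording rule, avoiding a second code path in the application (rule group name is illustrative):

```yaml
groups:
  - name: redis-derived
    rules:
      - record: redis_hit_rate
        expr: >
          100 *
          sum(rate(redis_operations_total{result="hit"}[5m]))
          /
          sum(rate(redis_operations_total{result=~"hit|miss"}[5m]))
```

Deriving the rate from the raw counters keeps it correct across process restarts, whereas a gauge set by the application resets with it.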
Tracing (Tempo)
OpenTelemetry Instrumentation:
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
# Configure OTLP exporter and wire it into the tracer provider
trace_exporter = OTLPSpanExporter(
endpoint="http://alloy:4317",
insecure=True
)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(trace_exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)
# Auto-instrument FastAPI
FastAPIInstrumentor.instrument_app(app)
# Manual span creation
async def create_order(order_data: dict):
with tracer.start_as_current_span("create_order") as span:
span.set_attribute("order.amount", order_data["amount"])
span.set_attribute("order.merchant_id", order_data["merchant_id"])
# Database operation
with tracer.start_as_current_span("db.insert_order"):
order_id = await db.insert_order(order_data)
# External API call
with tracer.start_as_current_span("pgw.create_payment_intent"):
payment_intent = await pgw_client.create_payment_intent(order_id)
return order_id
Trace Context Propagation:
# Propagate trace context to downstream services
async def call_pgw(data: dict):
headers = {}
# Inject trace context into headers
from opentelemetry.propagate import inject
inject(headers)
# Make HTTP request with trace headers (module-level httpx.post is synchronous;
# use AsyncClient in async code)
async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://api.pgw.mpac-cloud.com/v1/payment_intents",
        json=data,
        headers=headers  # Carries the W3C traceparent header
    )
return response.json()
Trace Retention:
- Full traces: 7 days
- Sampled traces: 30 days (10% sample rate)
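The 10% retention sample can be made deterministic by hashing the trace ID, so every service keeps or drops the same trace and sampled traces stay complete end to end. A stdlib sketch of the idea (in practice OpenTelemetry's `TraceIdRatioBased` sampler does this):

```python
import hashlib

def keep_trace(trace_id: str, sample_percent: float = 10.0) -> bool:
    """Deterministically keep ~sample_percent of traces by hashing the trace ID."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Map the first 8 bytes of the hash to [0, 1) and compare to the ratio
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_percent / 100.0

# Every service evaluating the same trace ID reaches the same decision
decision = keep_trace("01HXYZEXAMPLETRACE")
```

Because the decision is a pure function of the trace ID, no coordination between services is needed: each hop samples independently yet consistently.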
Dashboards (Grafana)
Pre-built Dashboards:
1. Service Health Dashboard
- Request rate (RPS)
- Latency (P50, P95, P99)
- Error rate (5xx responses)
- Active connections
- CPU and memory usage
2. Payment Processing Dashboard
- Payment intent creation rate
- Payment confirmation latency
- Payment success/failure rates by provider
- Webhook processing time
- Idempotency key hit rate
3. Device Fleet Dashboard
- Devices online count
- Device authentication rate
- WebSocket connections active
- Device token refresh rate
- Offline devices (by merchant/store)
4. Database Performance Dashboard
- Query latency (P95, P99)
- Connection pool usage
- Slow query count (>100ms)
- Table size growth
- Index hit rate
5. Business Metrics Dashboard
- Daily transaction volume
- Payment method breakdown
- Top merchants by volume
- Average transaction amount
- Refund rate
Alerting
Alert Routing:
PagerDuty Integration
└─ Critical Alerts → Immediate page (24/7)
└─ Warning Alerts → Slack notification
└─ Info Alerts → Dashboard only
Alert Rules:
Critical (PagerDuty):
- alert: ServiceDown
expr: up{job="svc-smarttab"} == 0
for: 2m
annotations:
summary: "Service {{ $labels.instance }} is down"
description: "Service has been down for 2 minutes"
- alert: HighErrorRate
expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
for: 5m
annotations:
summary: "High error rate on {{ $labels.service }}"
description: "Error ratio is {{ $value }} (threshold: 0.05, i.e. 5%)"
- alert: DatabaseConnectionExhaustion
expr: db_connection_pool_available < 10
for: 2m
annotations:
summary: "Database connection pool nearly exhausted"
description: "Only {{ $value }} connections available"
Warning (Slack):
- alert: HighLatency
expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint)) > 1.0
for: 10m
annotations:
summary: "High latency on {{ $labels.endpoint }}"
description: "P95 latency is {{ $value }}s (threshold: 1s)"
- alert: LowRedisHitRate
expr: redis_hit_rate < 90
for: 15m
annotations:
summary: "Low Redis cache hit rate"
description: "Hit rate is {{ $value }}% (threshold: 90%)"
Cross-References
Related Domains
- Payment Gateway - Payment processing performance requirements
- Device Management - WebSocket scalability
- Reporting & Analytics - Report generation performance
Related Technical Sections
- Communication Patterns - Rate limiting and throttling
- Database Architecture - Database optimization strategies
- Security Architecture - Security overhead on performance
Related Deployment Sections
- AWS Infrastructure - ECS auto-scaling configuration
- Load Testing - Performance test scenarios
- Capacity Planning - Resource sizing guidelines
Navigation
Previous: Security Architecture | Next: Deployment Overview | Up: Technical Architecture Index