
Performance & Scalability

Part of: MPAC SmartPOS Cloud Platform - Product Requirements
Version: 2.0
Last Updated: 2026-01-28


Overview

This document defines performance targets, scalability strategies, and observability requirements for the MPAC platform. The architecture is designed to support 400,000+ concurrent devices, 80M daily transactions, and 15,000 RPS sustained throughput with 99.9% availability. The system leverages horizontal scaling, database optimization, caching strategies, and comprehensive monitoring through the mpac-obs observability stack.


Performance Targets

Purpose: Quantifiable metrics for system performance and reliability.

Key Performance Indicators

| Metric | Target | Measurement | Alert Threshold |
| --- | --- | --- | --- |
| API Latency (P50) | <200ms | CloudWatch/Prometheus | >300ms |
| API Latency (P95) | <500ms | CloudWatch/Prometheus | >800ms |
| API Latency (P99) | <1000ms | CloudWatch/Prometheus | >1500ms |
| PGW Latency (P95) | <200ms | Tempo tracing | >400ms |
| Database Query (P95) | <50ms | pgBadger, Prometheus | >100ms |
| Throughput | 15,000 RPS | Load testing, Prometheus | <12,000 RPS |
| Concurrent Devices | 400,000+ | Active WebSocket connections | N/A |
| Daily Transactions | 80M | Business metrics | N/A |
| Availability | 99.9% | Uptime monitoring | <99.5% |
| Error Rate | <0.5% | Prometheus | >1% |
| Database Connections | <80% pool | pgBouncer metrics | >90% pool |
| Redis Hit Rate | >95% | Redis INFO stats | <90% |

Latency Breakdown by Endpoint Category

| Endpoint Category | P50 Target | P95 Target | P99 Target |
| --- | --- | --- | --- |
| Authentication (JWT issue) | <100ms | <200ms | <500ms |
| Device Token Request | <150ms | <300ms | <600ms |
| Order Creation | <200ms | <500ms | <1000ms |
| Payment Intent Creation | <150ms | <300ms | <800ms |
| Payment Confirmation | <200ms | <400ms | <1000ms |
| Report Generation (sync) | <500ms | <2000ms | <5000ms |
| Report Generation (async) | <50ms | <100ms | <200ms (queue time) |

Load Testing Scenarios

Scenario 1: Peak Transaction Volume

Duration: 1 hour
Target: 15,000 RPS sustained
Traffic Mix:
  - 40% Order Creation (6,000 RPS)
  - 30% Payment Confirmation (4,500 RPS)
  - 20% Bill Retrieval (3,000 RPS)
  - 10% Device Token Requests (1,500 RPS)

Success Criteria:
  - P95 latency < 500ms
  - Error rate < 0.5%
  - No database connection exhaustion
  - CPU < 70% per container

Scenario 2: Device Fleet Connectivity

Duration: 30 minutes
Target: 400,000 concurrent WebSocket connections
Traffic Mix:
  - 400,000 devices maintain WebSocket
  - 5,000 devices/min reconnect (network churn)
  - Heartbeat every 30 seconds

Success Criteria:
  - WebSocket connection success rate > 99%
  - Reconnection time < 5 seconds
  - Memory usage stable (no leaks)
  - Network bandwidth < 100 Mbps

Scenario 3: Burst Load (Black Friday)

Duration: 2 hours
Target: 25,000 RPS peak (167% of normal)
Traffic Pattern:
  - Ramp up: 15k → 25k RPS over 15 minutes
  - Sustain: 25k RPS for 90 minutes
  - Ramp down: 25k → 15k RPS over 15 minutes

Success Criteria:
  - Auto-scaling triggers within 2 minutes
  - P99 latency < 2000ms during peak
  - No service outages
  - Graceful degradation (queue async tasks)
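As a sketch, the Scenario 1 traffic mix can be encoded as a weighted sampler that a load generator (k6, Locust, etc.) would draw requests from. The category names below are illustrative labels, not actual API paths:

```python
import random

# Scenario 1 traffic mix: endpoint category -> share of the 15,000 RPS target
TRAFFIC_MIX = {
    "order_creation": 0.40,         # 6,000 RPS
    "payment_confirmation": 0.30,   # 4,500 RPS
    "bill_retrieval": 0.20,         # 3,000 RPS
    "device_token_request": 0.10,   # 1,500 RPS
}

def rps_for(category: str, total_rps: int = 15_000) -> int:
    """Requests per second allotted to one endpoint category."""
    return round(TRAFFIC_MIX[category] * total_rps)

def next_request(rng: random.Random) -> str:
    """Sample the next request's category according to the mix."""
    return rng.choices(list(TRAFFIC_MIX), weights=TRAFFIC_MIX.values())[0]
```

Driving `next_request` at the target rate reproduces the 40/30/20/10 split over the one-hour run.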

Scalability Strategies

Purpose: Architectural patterns to handle growth in devices, transactions, and users.

Horizontal Scaling

Stateless Services:

All Backend Services (svc-portal, svc-smarttab, mpac-pgw)
  └─ Stateless design (no in-memory session state)
  └─ Session data stored in Redis
  └─ ECS Auto Scaling configuration:
     ├─ Min capacity: 2 tasks per service
     ├─ Max capacity: 50 tasks per service
     ├─ Target CPU: 60%
     ├─ Target memory: 70%
     ├─ Scale-out: Add 2 tasks if CPU > 60% for 2 minutes
     └─ Scale-in: Remove 1 task if CPU < 40% for 5 minutes

Auto Scaling Policy Example:

json
{
  "ServiceName": "svc-smarttab",
  "ScalableTargetAction": {
    "MinCapacity": 2,
    "MaxCapacity": 50
  },
  "PolicyConfiguration": {
    "TargetTrackingScaling": {
      "TargetValue": 60.0,
      "PredefinedMetric": {
        "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
      },
      "ScaleOutCooldown": 60,
      "ScaleInCooldown": 300
    }
  }
}

Load Balancing:

Application Load Balancer (ALB)
  └─ Target Group: svc-smarttab
     ├─ Health Check: GET /health (every 30s)
     ├─ Deregistration Delay: 30s (graceful shutdown)
     ├─ Stickiness: Disabled (stateless)
     └─ Load Balancing Algorithm: Round robin

Database Optimization

Read Replicas:

Primary Database (Write)
  └─ svc-portal-primary.rds.amazonaws.com
     ├─ Instance: db.r6g.xlarge (4 vCPU, 32 GB)
     ├─ IOPS: 12,000 (gp3)
     └─ Connections: 200 max

Read Replicas (Read-only)
  ├─ svc-portal-replica-1.rds.amazonaws.com
  ├─ svc-portal-replica-2.rds.amazonaws.com
  └─ svc-portal-replica-3.rds.amazonaws.com (reporting)
     ├─ Instance: db.r6g.large (2 vCPU, 16 GB)
     ├─ Lag: <5 seconds
     └─ Connections: 100 max per replica

Query Routing:
  - Write operations → Primary
  - Read operations (reports) → Replica 3
  - Read operations (API) → Replica 1 or 2 (round robin)
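The routing rules above can be sketched as a small host selector; the hostnames come from the replica list, while the function shape and the round-robin state are simplified for illustration (a real service would build full DSNs and handle replica failover):

```python
from itertools import cycle

PRIMARY = "svc-portal-primary.rds.amazonaws.com"
API_REPLICAS = cycle([
    "svc-portal-replica-1.rds.amazonaws.com",
    "svc-portal-replica-2.rds.amazonaws.com",
])
REPORTING_REPLICA = "svc-portal-replica-3.rds.amazonaws.com"

def route(operation: str, workload: str = "api") -> str:
    """Pick a database host per the routing rules above.

    operation: "read" or "write"; workload: "api" or "report".
    """
    if operation == "write":
        return PRIMARY             # all writes go to the primary
    if workload == "report":
        return REPORTING_REPLICA   # reporting reads are isolated on replica 3
    return next(API_REPLICAS)      # API reads round-robin replicas 1 and 2
```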

Connection Pooling (pgBouncer):

pgBouncer Configuration
  └─ Mode: Transaction pooling
  └─ Max connections per service: 100
  └─ Server connections: 200 (shared)
  └─ Pool size: 25 per database
  └─ Reserve pool: 5 (for admin)

# pgbouncer.ini
[databases]
mpac_portal = host=rds-primary port=5432 dbname=mpac_portal
mpac_smarttab = host=rds-primary port=5432 dbname=mpac_smarttab

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 600

Benefits:

  • Reduced connection overhead (connection reuse)
  • Protection against connection exhaustion
  • Lower database CPU usage
  • Faster query execution (no handshake overhead)

Partitioning Strategy:

sql
-- Monthly partitions for high-volume tables
CREATE TABLE payment_transactions (
  id UUID NOT NULL,
  payment_intent_id TEXT NOT NULL,
  provider_name TEXT NOT NULL,
  amount DECIMAL(15, 2) NOT NULL,
  status TEXT NOT NULL,
  created_at TIMESTAMP WITH TIME ZONE NOT NULL,
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Auto-create partitions with pg_partman
SELECT partman.create_parent(
  p_parent_table := 'public.payment_transactions',
  p_control := 'created_at',
  p_type := 'native',
  p_interval := '1 month',
  p_premake := 3  -- Pre-create 3 future partitions
);

Partitioned Tables:

  • payment_transactions - 80M+ rows/month
  • authentication_logs - High write volume (1M+/day)
  • webhook_events - Retention 90 days

Maintenance:

sql
-- Drop old partitions (automated via cron)
SELECT partman.run_maintenance('public.payment_transactions');

-- Detach partition before drop (instant)
ALTER TABLE payment_transactions
  DETACH PARTITION payment_transactions_2025_01;

-- Archive to S3 (CSV; PostgreSQL COPY cannot emit Parquet natively,
-- so convert to Parquet downstream, e.g. with a Glue job)
COPY payment_transactions_2025_01 TO PROGRAM
  'aws s3 cp - s3://mpac-archive/payment_transactions/2025-01.csv'
  WITH (FORMAT csv);

-- Drop detached partition
DROP TABLE payment_transactions_2025_01;

Indexing Best Practices:

sql
-- Example: Bills table indexing strategy
CREATE TABLE bills (
  id UUID PRIMARY KEY,
  merchant_id INT NOT NULL,
  store_id INT NOT NULL,
  device_id UUID,
  bill_number TEXT NOT NULL,
  status TEXT NOT NULL,
  total_amount DECIMAL(15, 2),
  created_at TIMESTAMP WITH TIME ZONE NOT NULL,
  updated_at TIMESTAMP WITH TIME ZONE NOT NULL
);

-- Tenant-scoped queries (most common)
CREATE INDEX idx_bills_merchant_store ON bills(merchant_id, store_id);

-- Status queries (frequent)
CREATE INDEX idx_bills_status ON bills(status)
  WHERE status IN ('open', 'pending');  -- Partial index

-- Time-range queries (reports)
CREATE INDEX idx_bills_created_at ON bills(created_at DESC);

-- Composite for complex filters
CREATE INDEX idx_bills_store_status_created
  ON bills(store_id, status, created_at DESC)
  WHERE status = 'closed';  -- Reports on closed bills

-- Device lookup (rare, but critical)
CREATE INDEX idx_bills_device_id ON bills(device_id);

Index Monitoring:

sql
-- Find unused indexes
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan AS scans,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan < 100  -- Less than 100 scans
  AND schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Drop unused indexes (manual review)
DROP INDEX idx_bills_rarely_used;

Caching Strategy

Redis Configuration:

ElastiCache Redis
  └─ Cluster Mode: Enabled (3 shards)
  └─ Replication: 2 replicas per shard
  └─ Instance: cache.r6g.large (13.07 GB memory)
  └─ Eviction Policy: allkeys-lru
  └─ Max Memory: 12 GB (with 1 GB reserved)

Cache Usage by Service:

| Data Type | TTL | Invalidation | Hit Rate Target |
| --- | --- | --- | --- |
| User session tokens | 15 minutes | On logout | 98% |
| Device tokens | 90 seconds | On expiry | 95% |
| Rate limiting counters | 1 minute | Time-based | 100% |
| Idempotency keys | 24 hours | On expiry | 80% |
| Merchant/store config | 5 minutes | On update | 99% |
| JWT public keys | 5 minutes | On rotation | 99.9% |
| Payment provider configs | 10 minutes | On change | 95% |
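The 24-hour idempotency keys are typically claimed atomically with Redis `SET key value NX EX ttl` (redis-py: `redis.set(key, value, nx=True, ex=ttl)`). The sketch below mimics that semantic with a dict-backed stand-in so the first-caller-wins logic is visible without a live Redis:

```python
import time

class InMemoryStore:
    """Dict-backed stand-in for Redis SET NX EX semantics (illustration only;
    production uses ElastiCache via redis-py)."""

    def __init__(self):
        self._data = {}

    def set_nx_ex(self, key: str, value: str, ttl_s: int) -> bool:
        now = time.monotonic()
        entry = self._data.get(key)
        if entry and entry[1] > now:
            return False  # key exists and is unexpired: NX fails
        self._data[key] = (value, now + ttl_s)
        return True

def claim_idempotency_key(store: InMemoryStore, key: str) -> bool:
    """First caller wins; a retry within 24 h sees False and replays the
    stored response instead of re-executing the payment."""
    return store.set_nx_ex(f"idem:{key}", "in_progress", ttl_s=24 * 3600)
```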

Application-Level Caching:

python
# JWT public key cache (in-memory with TTL)
from cachetools import TTLCache
import asyncio

jwt_key_cache = TTLCache(maxsize=10, ttl=300)  # 5 minutes

async def get_jwt_public_key(key_id: str) -> str:
    """Get JWT public key with in-memory caching."""
    if key_id in jwt_key_cache:
        return jwt_key_cache[key_id]

    # Cache miss: fetch from database
    key = await db.query(
        "SELECT public_key_pem FROM jwt_keys WHERE key_id = :key_id",
        {"key_id": key_id}
    )

    if key:
        jwt_key_cache[key_id] = key.public_key_pem
        return key.public_key_pem

    raise KeyError(f"JWT key not found: {key_id}")

Cache Invalidation:

python
# Merchant config cache invalidation on update
async def update_merchant(merchant_id: int, data: dict):
    """Update merchant and invalidate cache."""
    # Update database
    await db.execute(
        "UPDATE merchants SET name = :name, updated_at = NOW() WHERE id = :id",
        {"id": merchant_id, "name": data["name"]}
    )

    # Invalidate cache
    cache_key = f"merchant:{merchant_id}"
    await redis.delete(cache_key)

    # Sync to PGW
    await pgw_client.sync_merchant(merchant_id)

Message Queue (Future)

NATS Configuration (Planned):

NATS JetStream Cluster
  └─ 3 nodes for high availability
  └─ Streams:
     ├─ receipts.delivery (email/SMS delivery)
     ├─ settlements.generation (daily batch)
     ├─ reports.generation (async reports)
     └─ devices.sync (device config updates)

Stream Configuration:
  - Retention: Work queue (delete after ack)
  - Max age: 24 hours (unprocessed messages)
  - Replicas: 3 (for durability)
  - Max consumers: 10 per stream

Use Cases:

  • Async receipt delivery (email/SMS)
  • Settlement generation (daily batch)
  • Report generation (large reports)
  • Device sync events (config updates)

Monitoring & Observability (mpac-obs)

Purpose: Comprehensive visibility into system health, performance, and business metrics.

Logging (Loki)

Log Format:

json
{
  "timestamp": "2026-01-28T11:05:00.123Z",
  "level": "INFO",
  "service": "svc-smarttab",
  "correlation_id": "req_abc123",
  "trace_id": "01HXYZ...",
  "span_id": "span_456",
  "caller": {
    "type": "device",
    "id": "device_uuid",
    "merchant_id": 1,
    "store_id": 2
  },
  "message": "Order created successfully",
  "metadata": {
    "order_id": "ORDER-123",
    "amount": 100000,
    "duration_ms": 45
  }
}
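A minimal stdlib formatter emitting this schema might look like the following sketch; in a real service the trace/span IDs and caller context would be filled in from the tracing context and request middleware rather than hard-coded or omitted:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit log records as single-line JSON matching the schema above
    (sketch: only a subset of the fields is populated here)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "service": "svc-smarttab",
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
            "metadata": getattr(record, "metadata", {}),
        }
        return json.dumps(entry)
```

Attach it with `handler.setFormatter(JsonFormatter())` and pass per-request fields via `logger.info(..., extra={"correlation_id": ..., "metadata": {...}})`.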

Log Aggregation:

All Services
  └─ Structured JSON logs to stdout
  └─ Alloy collector (DaemonSet on ECS)
     ├─ Parse JSON logs
     ├─ Add labels: service, environment, region
     ├─ Filter sensitive data (PII, credentials)
     └─ Forward to Loki (OTLP HTTP)

Loki Storage
  └─ Retention: 14 days (configurable)
  └─ Compression: gzip
  └─ Indexing: By service, level, correlation_id

Correlation ID Propagation:

python
# Middleware injects correlation ID
@app.middleware("http")
async def correlation_id_middleware(request: Request, call_next):
    correlation_id = request.headers.get("X-Correlation-ID", str(uuid4()))
    request.state.correlation_id = correlation_id

    # Propagate to response
    response = await call_next(request)
    response.headers["X-Correlation-ID"] = correlation_id

    return response

# Logger includes correlation ID
logger.info(
    "Order created",
    extra={"correlation_id": request.state.correlation_id}
)

Metrics (Prometheus)

Metric Categories:

1. HTTP Metrics:

python
from prometheus_client import Counter, Gauge, Histogram

http_requests_total = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"]
)

http_request_duration_seconds = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency",
    ["method", "endpoint"],
    buckets=[0.01, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
)

# Usage in middleware
@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    start_time = time.time()

    response = await call_next(request)

    duration = time.time() - start_time
    http_requests_total.labels(
        method=request.method,
        endpoint=request.url.path,
        status=response.status_code
    ).inc()
    http_request_duration_seconds.labels(
        method=request.method,
        endpoint=request.url.path
    ).observe(duration)

    return response

2. Database Metrics:

python
db_query_duration_seconds = Histogram(
    "db_query_duration_seconds",
    "Database query latency",
    ["operation", "table"],
    buckets=[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0]
)

db_connection_pool_size = Gauge(
    "db_connection_pool_size",
    "Current database connection pool size"
)

db_connection_pool_available = Gauge(
    "db_connection_pool_available",
    "Available database connections in pool"
)

3. Redis Metrics:

python
redis_hit_rate = Gauge(
    "redis_hit_rate",
    "Redis cache hit rate (percentage)"
)

redis_operations_total = Counter(
    "redis_operations_total",
    "Total Redis operations",
    ["operation", "result"]  # result: hit, miss, error
)
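Redis does not report a hit-rate percentage directly; it is derived from the `keyspace_hits` and `keyspace_misses` counters in `INFO stats`. A small helper, assuming the INFO output has already been parsed into a dict (as redis-py's `info("stats")` returns):

```python
def cache_hit_rate(stats: dict) -> float:
    """Hit rate in percent: hits / (hits + misses), 0.0 when there is no traffic."""
    hits = stats.get("keyspace_hits", 0)
    misses = stats.get("keyspace_misses", 0)
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

# Exported periodically, e.g.: redis_hit_rate.set(cache_hit_rate(stats))
```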

4. Business Metrics:

python
payments_total = Counter(
    "payments_total",
    "Total payments processed",
    ["payment_method", "status"]
)

payment_amount_total = Counter(
    "payment_amount_total",
    "Total payment amount processed (JPY)",
    ["payment_method"]
)

devices_online = Gauge(
    "devices_online",
    "Number of devices currently online"
)

Metric Retention:

  • High-resolution: 7 days (15s scrape interval)
  • Low-resolution: 30 days (5m downsampled)

Tracing (Tempo)

OpenTelemetry Instrumentation:

python
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Configure the OTLP exporter and register it with a tracer provider
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://alloy:4317", insecure=True))
)
trace.set_tracer_provider(provider)

# Initialize tracer
tracer = trace.get_tracer(__name__)

# Auto-instrument FastAPI
FastAPIInstrumentor.instrument_app(app)

# Manual span creation
async def create_order(order_data: dict):
    with tracer.start_as_current_span("create_order") as span:
        span.set_attribute("order.amount", order_data["amount"])
        span.set_attribute("order.merchant_id", order_data["merchant_id"])

        # Database operation
        with tracer.start_as_current_span("db.insert_order"):
            order_id = await db.insert_order(order_data)

        # External API call
        with tracer.start_as_current_span("pgw.create_payment_intent"):
            payment_intent = await pgw_client.create_payment_intent(order_id)

        return order_id

Trace Context Propagation:

python
# Propagate trace context to downstream services
async def call_pgw(data: dict):
    headers = {}

    # Inject trace context into headers
    from opentelemetry.propagate import inject
    inject(headers)

    # Make HTTP request with trace headers
    import httpx  # imported here to keep the snippet self-contained
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.pgw.mpac-cloud.com/v1/payment_intents",
            json=data,
            headers=headers,  # Contains traceparent header
        )

    return response.json()

Trace Retention:

  • Full traces: 7 days
  • Sampled traces: 30 days (10% sample rate)
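The 10% sample rate is usually applied deterministically on the trace ID, so every service in a request path makes the same keep/drop decision. A sketch of the idea behind OpenTelemetry's `TraceIdRatioBased` sampler:

```python
def should_sample(trace_id: int, ratio: float = 0.10) -> bool:
    """Keep a trace iff the low 64 bits of its ID fall below ratio * 2**64.

    Because the decision depends only on the trace ID, all services sampling
    at the same ratio keep exactly the same set of traces.
    """
    bound = int(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound
```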

Dashboards (Grafana)

Pre-built Dashboards:

1. Service Health Dashboard

  • Request rate (RPS)
  • Latency (P50, P95, P99)
  • Error rate (5xx responses)
  • Active connections
  • CPU and memory usage

2. Payment Processing Dashboard

  • Payment intent creation rate
  • Payment confirmation latency
  • Payment success/failure rates by provider
  • Webhook processing time
  • Idempotency key hit rate

3. Device Fleet Dashboard

  • Devices online count
  • Device authentication rate
  • WebSocket connections active
  • Device token refresh rate
  • Offline devices (by merchant/store)

4. Database Performance Dashboard

  • Query latency (P95, P99)
  • Connection pool usage
  • Slow query count (>100ms)
  • Table size growth
  • Index hit rate

5. Business Metrics Dashboard

  • Daily transaction volume
  • Payment method breakdown
  • Top merchants by volume
  • Average transaction amount
  • Refund rate

Alerting

Alert Routing:

PagerDuty Integration
  └─ Critical Alerts → Immediate page (24/7)
  └─ Warning Alerts → Slack notification
  └─ Info Alerts → Dashboard only
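This routing can be expressed in an Alertmanager configuration. The receiver names below, and the assumption that alert rules attach a `severity` label, are illustrative:

```yaml
route:
  receiver: dashboard-only          # default: info alerts surface in Grafana only
  routes:
    - match:
        severity: critical
      receiver: pagerduty-oncall    # immediate page, 24/7
    - match:
        severity: warning
      receiver: slack-platform      # Slack notification

receivers:
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: <PagerDuty integration key>
  - name: slack-platform
    slack_configs:
      - channel: "#mpac-alerts"
  - name: dashboard-only
```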

Alert Rules:

Critical (PagerDuty):

yaml
- alert: ServiceDown
  expr: up{job="svc-smarttab"} == 0
  for: 2m
  annotations:
    summary: "Service {{ $labels.instance }} is down"
    description: "Service has been down for 2 minutes"

- alert: HighErrorRate
  expr: |
    sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
      / sum by (job) (rate(http_requests_total[5m])) > 0.05
  for: 5m
  annotations:
    summary: "High error rate on {{ $labels.job }}"
    description: "Error ratio is {{ $value | humanizePercentage }} (threshold: 5%)"

- alert: DatabaseConnectionExhaustion
  expr: db_connection_pool_available < 10
  for: 2m
  annotations:
    summary: "Database connection pool nearly exhausted"
    description: "Only {{ $value }} connections available"

Warning (Slack):

yaml
- alert: HighLatency
  expr: |
    histogram_quantile(0.95,
      sum by (le, endpoint) (rate(http_request_duration_seconds_bucket[5m]))) > 1.0
  for: 10m
  annotations:
    summary: "High latency on {{ $labels.endpoint }}"
    description: "P95 latency is {{ $value }}s (threshold: 1s)"

- alert: LowRedisHitRate
  expr: redis_hit_rate < 90
  for: 15m
  annotations:
    summary: "Low Redis cache hit rate"
    description: "Hit rate is {{ $value }}% (threshold: 90%)"

Cross-References

  • AWS Infrastructure - ECS auto-scaling configuration
  • Load Testing - Performance test scenarios
  • Capacity Planning - Resource sizing guidelines

Previous: Security Architecture | Next: Deployment Overview | Up: Technical Architecture Index

MPAC — MP-Solution Advanced Cloud Service