Performance & Scalability
Part of: MPAC SmartPOS Cloud Platform - Product Requirements
Version: 2.0 | Last Updated: 2026-01-28
Overview
This document defines performance targets, scalability strategies, and observability requirements for the MPAC platform. The architecture is designed to support 400,000+ concurrent devices, 80M daily transactions, and 15,000 RPS sustained throughput with 99.9% availability. The system leverages horizontal scaling, database optimization, caching strategies, and comprehensive monitoring through the mpac-obs observability stack.
Performance Targets
Purpose: Quantifiable metrics for system performance and reliability.
Key Performance Indicators
| Metric | Target | Measurement | Alert Threshold |
|---|---|---|---|
| API Latency (P50) | <200ms | CloudWatch/Prometheus | >300ms |
| API Latency (P95) | <500ms | CloudWatch/Prometheus | >800ms |
| API Latency (P99) | <1000ms | CloudWatch/Prometheus | >1500ms |
| PGW Latency (P95) | <200ms | Tempo tracing | >400ms |
| Database Query (P95) | <50ms | pgBadger, Prometheus | >100ms |
| Throughput | 15,000 RPS | Load testing, Prometheus | <12,000 RPS |
| Concurrent Devices | 400,000+ | Active WebSocket connections | N/A |
| Daily Transactions | 80M | Business metrics | N/A |
| Availability | 99.9% | Uptime monitoring | <99.5% |
| Error Rate | <0.5% | Prometheus | >1% |
| Database Connections | <80% pool | pgBouncer metrics | >90% pool |
| Redis Hit Rate | >95% | Redis INFO stats | <90% |
Latency Breakdown by Endpoint Category
| Endpoint Category | P50 Target | P95 Target | P99 Target |
|---|---|---|---|
| Authentication (JWT issue) | <100ms | <200ms | <500ms |
| Device Token Request | <150ms | <300ms | <600ms |
| Order Creation | <200ms | <500ms | <1000ms |
| Payment Intent Creation | <150ms | <300ms | <800ms |
| Payment Confirmation | <200ms | <400ms | <1000ms |
| Report Generation (sync) | <500ms | <2000ms | <5000ms |
| Report Generation (async) | <50ms | <100ms | <200ms (queue time) |
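The percentile targets above can be verified against recorded latency samples. A minimal sketch using only Python's standard library (the function name `latency_percentiles` is illustrative, not part of the platform):

```python
import statistics

def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Compute P50/P95/P99 from raw latency samples (milliseconds)."""
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99
    cuts = statistics.quantiles(sorted(samples_ms), n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

# Synthetic sample set: 90% fast, 9% slow, 1% outlier
samples = [50.0] * 90 + [400.0] * 9 + [1200.0]
result = latency_percentiles(samples)
```

Note that P99 is dominated by the tail: a single 1200ms outlier in 100 requests pushes P99 close to it even when P95 is healthy, which is why the table alerts on each percentile separately.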
Load Testing Scenarios
Scenario 1: Peak Transaction Volume
Duration: 1 hour
Target: 15,000 RPS sustained
Traffic Mix:
- 40% Order Creation (6,000 RPS)
- 30% Payment Confirmation (4,500 RPS)
- 20% Bill Retrieval (3,000 RPS)
- 10% Device Token Requests (1,500 RPS)
Success Criteria:
- P95 latency < 500ms
- Error rate < 0.5%
- No database connection exhaustion
- CPU < 70% per container
Scenario 2: Device Fleet Connectivity
Duration: 30 minutes
Target: 400,000 concurrent WebSocket connections
Traffic Mix:
- 400,000 devices maintain WebSocket
- 5,000 devices/min reconnect (network churn)
- Heartbeat every 30 seconds
Success Criteria:
- WebSocket connection success rate > 99%
- Reconnection time < 5 seconds
- Memory usage stable (no leaks)
- Network bandwidth < 100 Mbps
Scenario 3: Burst Load (Black Friday)
Duration: 2 hours
Target: 25,000 RPS peak (167% of normal)
Traffic Pattern:
- Ramp up: 15k → 25k RPS over 15 minutes
- Sustain: 25k RPS for 90 minutes
- Ramp down: 25k → 15k RPS over 15 minutes
Success Criteria:
- Auto-scaling triggers within 2 minutes
- P99 latency < 2000ms during peak
- No service outages
- Graceful degradation (queue async tasks)
Scalability Strategies
Purpose: Architectural patterns to handle growth in devices, transactions, and users.
Horizontal Scaling
Stateless Services:
All Backend Services (svc-portal, svc-smarttab, mpac-pgw)
└─ Stateless design (no in-memory session state)
└─ Session data stored in Redis
└─ ECS Auto Scaling configuration:
├─ Min capacity: 2 tasks per service
├─ Max capacity: 50 tasks per service
├─ Target CPU: 60%
├─ Target memory: 70%
└─ Scale-out: Add 2 tasks if CPU > 60% for 2 minutes
Scale-in: Remove 1 task if CPU < 40% for 5 minutes
Auto Scaling Policy Example:
{
"ServiceName": "svc-smarttab",
"ScalableTargetAction": {
"MinCapacity": 2,
"MaxCapacity": 50
},
"PolicyConfiguration": {
"TargetTrackingScaling": {
"TargetValue": 60.0,
"PredefinedMetric": {
"PredefinedMetricType": "ECSServiceAverageCPUUtilization"
},
"ScaleOutCooldown": 60,
"ScaleInCooldown": 300
}
}
}
Load Balancing:
Application Load Balancer (ALB)
└─ Target Group: svc-smarttab
├─ Health Check: GET /health (every 30s)
├─ Deregistration Delay: 30s (graceful shutdown)
├─ Stickiness: Disabled (stateless)
└─ Load Balancing Algorithm: Round robin
Database Optimization
Read Replicas:
Primary Database (Write)
└─ svc-portal-primary.rds.amazonaws.com
├─ Instance: db.r6g.xlarge (4 vCPU, 32 GB)
├─ IOPS: 12,000 (gp3)
└─ Connections: 200 max
Read Replicas (Read-only)
├─ svc-portal-replica-1.rds.amazonaws.com
├─ svc-portal-replica-2.rds.amazonaws.com
└─ svc-portal-replica-3.rds.amazonaws.com (reporting)
├─ Instance: db.r6g.large (2 vCPU, 16 GB)
├─ Lag: <5 seconds
└─ Connections: 100 max per replica
Query Routing:
- Write operations → Primary
- Read operations (reports) → Replica 3
- Read operations (API) → Replica 1 or 2 (round robin)
Connection Pooling (pgBouncer):
pgBouncer Configuration
└─ Mode: Transaction pooling
└─ Max connections per service: 100
└─ Server connections: 200 (shared)
└─ Pool size: 25 per database
└─ Reserve pool: 5 (for admin)
# pgbouncer.ini
[databases]
mpac_portal = host=rds-primary port=5432 dbname=mpac_portal
mpac_smarttab = host=rds-primary port=5432 dbname=mpac_smarttab
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 600
Benefits:
- Reduced connection overhead (connection reuse)
- Protection against connection exhaustion
- Lower database CPU usage
- Faster query execution (no handshake overhead)
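The pool sizes above can be sanity-checked with Little's law: busy server connections ≈ query arrival rate × mean query time. A back-of-envelope helper (illustrative only, not part of the codebase; the 1.5x headroom factor is an assumption to cover bursts):

```python
import math

def required_pool_size(qps: float, avg_query_seconds: float, headroom: float = 1.5) -> int:
    """Estimate server connections needed via Little's law.

    Concurrency ≈ qps × average query duration; headroom covers bursts
    and uneven load across pooled clients.
    """
    return math.ceil(qps * avg_query_seconds * headroom)

# 1,000 queries/s at a 50ms average → ~75 server connections with 1.5x headroom
size = required_pool_size(1000, 0.05)
```

At the stated P95 budget of 50ms per query, the 25-connection default pool per database is adequate for a few hundred QPS per service; the shared 200-connection server limit is the binding constraint at full throughput.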
Partitioning Strategy:
-- Monthly partitions for high-volume tables
CREATE TABLE payment_transactions (
id UUID NOT NULL,
payment_intent_id TEXT NOT NULL,
provider_name TEXT NOT NULL,
amount DECIMAL(15, 2) NOT NULL,
status TEXT NOT NULL,
created_at TIMESTAMP WITH TIME ZONE NOT NULL,
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- Auto-create partitions with pg_partman
SELECT partman.create_parent(
p_parent_table := 'public.payment_transactions',
p_control := 'created_at',
p_type := 'native',
p_interval := '1 month',
p_premake := 3 -- Pre-create 3 future partitions
);
Partitioned Tables:
- payment_transactions - 80M+ rows/month
- authentication_logs - High write volume (1M+/day)
- webhook_events - 90-day retention
Maintenance:
-- Drop old partitions (automated via cron)
SELECT partman.run_maintenance('public.payment_transactions');
-- Detach partition before drop (instant)
ALTER TABLE payment_transactions
DETACH PARTITION payment_transactions_2025_01;
-- Archive to S3 (PostgreSQL COPY cannot emit Parquet natively;
-- export CSV and convert to Parquet downstream, e.g. via AWS Glue)
COPY payment_transactions_2025_01 TO PROGRAM
'aws s3 cp - s3://mpac-archive/payment_transactions/2025-01.csv'
WITH (FORMAT csv, HEADER);
-- Drop detached partition
DROP TABLE payment_transactions_2025_01;
Indexing Best Practices:
-- Example: Bills table indexing strategy
CREATE TABLE bills (
id UUID PRIMARY KEY,
merchant_id INT NOT NULL,
store_id INT NOT NULL,
device_id UUID,
bill_number TEXT NOT NULL,
status TEXT NOT NULL,
total_amount DECIMAL(15, 2),
created_at TIMESTAMP WITH TIME ZONE NOT NULL,
updated_at TIMESTAMP WITH TIME ZONE NOT NULL
);
-- Tenant-scoped queries (most common)
CREATE INDEX idx_bills_merchant_store ON bills(merchant_id, store_id);
-- Status queries (frequent)
CREATE INDEX idx_bills_status ON bills(status)
WHERE status IN ('open', 'pending'); -- Partial index
-- Time-range queries (reports)
CREATE INDEX idx_bills_created_at ON bills(created_at DESC);
-- Composite for complex filters
CREATE INDEX idx_bills_store_status_created
ON bills(store_id, status, created_at DESC)
WHERE status = 'closed'; -- Reports on closed bills
-- Device lookup (rare, but critical)
CREATE INDEX idx_bills_device_id ON bills(device_id);
Index Monitoring:
-- Find unused indexes
SELECT
schemaname,
tablename,
indexname,
idx_scan AS scans,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan < 100 -- Less than 100 scans
AND schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Drop unused indexes (manual review)
DROP INDEX idx_bills_rarely_used;
Caching Strategy
Redis Configuration:
ElastiCache Redis
└─ Cluster Mode: Enabled (3 shards)
└─ Replication: 2 replicas per shard
└─ Instance: cache.r6g.large (13.07 GB memory)
└─ Eviction Policy: allkeys-lru
└─ Max Memory: 12 GB (with 1 GB reserved)
Cache Usage by Service:
| Data Type | TTL | Invalidation | Hit Rate Target |
|---|---|---|---|
| User session tokens | 15 minutes | On logout | 98% |
| Device tokens | 90 seconds | On expiry | 95% |
| Rate limiting counters | 1 minute | Time-based | 100% |
| Idempotency keys | 24 hours | On expiry | 80% |
| Merchant/store config | 5 minutes | On update | 99% |
| JWT public keys | 5 minutes | On rotation | 99.9% |
| Payment provider configs | 10 minutes | On change | 95% |
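The merchant/store config entry above follows the cache-aside pattern with a 5-minute TTL. A minimal sketch of that read path; `TTLStore` is a stand-in mimicking Redis SETEX/GET semantics, and `loader` represents the database fallback (both are illustrative, not actual platform code):

```python
import json
import time

class TTLStore:
    """In-memory stand-in for Redis SETEX/GET semantics."""
    def __init__(self):
        self._data = {}

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        self._data[key] = (time.monotonic() + ttl_seconds, value)

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

def get_merchant_config(store, merchant_id: int, loader, ttl: int = 300) -> dict:
    """Cache-aside read: try cache, fall back to loader, populate with TTL."""
    key = f"merchant:{merchant_id}"
    cached = store.get(key)
    if cached is not None:
        return json.loads(cached)
    config = loader(merchant_id)           # cache miss: hit the database
    store.setex(key, ttl, json.dumps(config))
    return config
```

On update, the same `merchant:{id}` key is deleted rather than overwritten, which is what keeps the table's "On update" invalidation column consistent with this read path.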
Application-Level Caching:
# JWT public key cache (in-memory with TTL)
from cachetools import TTLCache
import asyncio
jwt_key_cache = TTLCache(maxsize=10, ttl=300) # 5 minutes
async def get_jwt_public_key(key_id: str) -> str:
"""Get JWT public key with in-memory caching."""
if key_id in jwt_key_cache:
return jwt_key_cache[key_id]
# Cache miss: fetch from database
key = await db.query(
"SELECT public_key_pem FROM jwt_keys WHERE key_id = :key_id",
{"key_id": key_id}
)
if key:
jwt_key_cache[key_id] = key.public_key_pem
return key.public_key_pem
raise KeyError(f"JWT key not found: {key_id}")
Cache Invalidation:
# Merchant config cache invalidation on update
async def update_merchant(merchant_id: int, data: dict):
"""Update merchant and invalidate cache."""
# Update database
await db.execute(
"UPDATE merchants SET name = :name, updated_at = NOW() WHERE id = :id",
{"id": merchant_id, "name": data["name"]}
)
# Invalidate cache
cache_key = f"merchant:{merchant_id}"
await redis.delete(cache_key)
# Sync to PGW
await pgw_client.sync_merchant(merchant_id)
Message Queue (Future)
NATS Configuration (Planned):
NATS JetStream Cluster
└─ 3 nodes for high availability
└─ Streams:
├─ receipts.delivery (email/SMS delivery)
├─ settlements.generation (daily batch)
├─ reports.generation (async reports)
└─ devices.sync (device config updates)
Stream Configuration:
- Retention: Work queue (delete after ack)
- Max age: 24 hours (unprocessed messages)
- Replicas: 3 (for durability)
- Max consumers: 10 per stream
Use Cases:
- Async receipt delivery (email/SMS)
- Settlement generation (daily batch)
- Report generation (large reports)
- Device sync events (config updates)
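The work-queue retention policy above (delete after ack, redeliver if unacked) can be modeled in a few lines. This is a behavioral sketch of the planned semantics, not the nats-py client API:

```python
import collections

class WorkQueueStream:
    """Toy model of work-queue retention: deliver → ack deletes; timeout redelivers."""
    def __init__(self):
        self._pending = collections.deque()   # published, not yet delivered
        self._inflight = {}                   # delivered, awaiting ack
        self._next_id = 0

    def publish(self, msg):
        self._pending.append((self._next_id, msg))
        self._next_id += 1

    def deliver(self):
        if not self._pending:
            return None
        msg_id, msg = self._pending.popleft()
        self._inflight[msg_id] = msg
        return msg_id, msg

    def ack(self, msg_id):
        self._inflight.pop(msg_id)            # acked: message is deleted for good

    def redeliver_unacked(self):
        # e.g. consumer crashed or ack window expired: in-flight goes back on queue
        for msg_id, msg in self._inflight.items():
            self._pending.append((msg_id, msg))
        self._inflight.clear()
```

The key property this models: a receipt-delivery worker that dies mid-task loses nothing, because the unacked message is redelivered to the next consumer.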
Monitoring & Observability (mpac-obs)
Purpose: Comprehensive visibility into system health, performance, and business metrics.
Logging (Loki)
Log Format:
{
"timestamp": "2026-01-28T11:05:00.123Z",
"level": "INFO",
"service": "svc-smarttab",
"correlation_id": "req_abc123",
"trace_id": "01HXYZ...",
"span_id": "span_456",
"caller": {
"type": "device",
"id": "device_uuid",
"merchant_id": 1,
"store_id": 2
},
"message": "Order created successfully",
"metadata": {
"order_id": "ORDER-123",
"amount": 100000,
"duration_ms": 45
}
}
Log Aggregation:
All Services
└─ Structured JSON logs to stdout
└─ Alloy collector (DaemonSet on ECS)
├─ Parse JSON logs
├─ Add labels: service, environment, region
├─ Filter sensitive data (PII, credentials)
└─ Forward to Loki (OTLP HTTP)
Loki Storage
└─ Retention: 14 days (configurable)
└─ Compression: gzip
└─ Indexing: By service, level, correlation_id
Correlation ID Propagation:
# Middleware injects correlation ID
from uuid import uuid4
@app.middleware("http")
async def correlation_id_middleware(request: Request, call_next):
correlation_id = request.headers.get("X-Correlation-ID", str(uuid4()))
request.state.correlation_id = correlation_id
# Propagate to response
response = await call_next(request)
response.headers["X-Correlation-ID"] = correlation_id
return response
# Logger includes correlation ID
logger.info(
"Order created",
extra={"correlation_id": request.state.correlation_id}
)
Metrics (Prometheus)
Metric Categories:
1. HTTP Metrics:
from prometheus_client import Counter, Gauge, Histogram
http_requests_total = Counter(
"http_requests_total",
"Total HTTP requests",
["method", "endpoint", "status"]
)
http_request_duration_seconds = Histogram(
"http_request_duration_seconds",
"HTTP request latency",
["method", "endpoint"],
buckets=[0.01, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
)
# Usage in middleware
import time
@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
start_time = time.time()
response = await call_next(request)
duration = time.time() - start_time
http_requests_total.labels(
method=request.method,
endpoint=request.url.path,
status=response.status_code
).inc()
http_request_duration_seconds.labels(
method=request.method,
endpoint=request.url.path
).observe(duration)
return response
2. Database Metrics:
db_query_duration_seconds = Histogram(
"db_query_duration_seconds",
"Database query latency",
["operation", "table"],
buckets=[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0]
)
db_connection_pool_size = Gauge(
"db_connection_pool_size",
"Current database connection pool size"
)
db_connection_pool_available = Gauge(
"db_connection_pool_available",
"Available database connections in pool"
)
3. Redis Metrics:
redis_hit_rate = Gauge(
"redis_hit_rate",
"Redis cache hit rate (percentage)"
)
redis_operations_total = Counter(
"redis_operations_total",
"Total Redis operations",
["operation", "result"] # result: hit, miss, error
)
4. Business Metrics:
payments_total = Counter(
"payments_total",
"Total payments processed",
["payment_method", "status"]
)
payment_amount_total = Counter(
"payment_amount_total",
"Total payment amount processed (JPY)",
["payment_method"]
)
devices_online = Gauge(
"devices_online",
"Number of devices currently online"
)
Metric Retention:
- High-resolution: 7 days (15s scrape interval)
- Low-resolution: 30 days (5m downsampled)
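Assuming the `redis_operations_total` counter above is labeled with `result="hit"` and `result="miss"`, the `redis_hit_rate` gauge can instead be derived server-side with a Prometheus recording rule, avoiding a second code path in the application (rule group name is illustrative):

```yaml
groups:
  - name: redis-derived
    rules:
      - record: redis_hit_rate
        expr: >
          100 *
          sum(rate(redis_operations_total{result="hit"}[5m]))
          /
          sum(rate(redis_operations_total{result=~"hit|miss"}[5m]))
```

Deriving the rate from the raw counters keeps it correct across process restarts, whereas a gauge set by the application resets with it.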
Tracing (Tempo)
OpenTelemetry Instrumentation:
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
# Configure OTLP exporter and wire it into the tracer provider
trace_exporter = OTLPSpanExporter(
endpoint="http://alloy:4317",
insecure=True
)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(trace_exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)
# Auto-instrument FastAPI
FastAPIInstrumentor.instrument_app(app)
# Manual span creation
async def create_order(order_data: dict):
with tracer.start_as_current_span("create_order") as span:
span.set_attribute("order.amount", order_data["amount"])
span.set_attribute("order.merchant_id", order_data["merchant_id"])
# Database operation
with tracer.start_as_current_span("db.insert_order"):
order_id = await db.insert_order(order_data)
# External API call
with tracer.start_as_current_span("pgw.create_payment_intent"):
payment_intent = await pgw_client.create_payment_intent(order_id)
return order_id
Trace Context Propagation:
# Propagate trace context to downstream services
async def call_pgw(data: dict):
headers = {}
# Inject trace context into headers
from opentelemetry.propagate import inject
inject(headers)
# Make HTTP request with trace headers (module-level httpx.post is synchronous;
# use AsyncClient in async code)
async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://api.pgw.mpac-cloud.com/v1/payment_intents",
        json=data,
        headers=headers  # Carries the W3C traceparent header
    )
return response.json()
Trace Retention:
- Full traces: 7 days
- Sampled traces: 30 days (10% sample rate)
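The 10% retention sample can be made deterministic by hashing the trace ID, so every service keeps or drops the same trace and sampled traces stay complete end to end. A stdlib sketch of the idea (in practice OpenTelemetry's `TraceIdRatioBased` sampler does this):

```python
import hashlib

def keep_trace(trace_id: str, sample_percent: float = 10.0) -> bool:
    """Deterministically keep ~sample_percent of traces by hashing the trace ID."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Map the first 8 bytes of the hash to [0, 1) and compare to the ratio
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_percent / 100.0

# Every service evaluating the same trace ID reaches the same decision
decision = keep_trace("01HXYZEXAMPLETRACE")
```

Because the decision is a pure function of the trace ID, no coordination between services is needed: each hop samples independently yet consistently.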
Dashboards (Grafana)
Pre-built Dashboards:
1. Service Health Dashboard
- Request rate (RPS)
- Latency (P50, P95, P99)
- Error rate (5xx responses)
- Active connections
- CPU and memory usage
2. Payment Processing Dashboard
- Payment intent creation rate
- Payment confirmation latency
- Payment success/failure rates by provider
- Webhook processing time
- Idempotency key hit rate
3. Device Fleet Dashboard
- Devices online count
- Device authentication rate
- WebSocket connections active
- Device token refresh rate
- Offline devices (by merchant/store)
4. Database Performance Dashboard
- Query latency (P95, P99)
- Connection pool usage
- Slow query count (>100ms)
- Table size growth
- Index hit rate
5. Business Metrics Dashboard
- Daily transaction volume
- Payment method breakdown
- Top merchants by volume
- Average transaction amount
- Refund rate
Alerting
Alert Routing:
PagerDuty Integration
└─ Critical Alerts → Immediate page (24/7)
└─ Warning Alerts → Slack notification
└─ Info Alerts → Dashboard only
Alert Rules:
Critical (PagerDuty):
- alert: ServiceDown
expr: up{job="svc-smarttab"} == 0
for: 2m
annotations:
summary: "Service {{ $labels.instance }} is down"
description: "Service has been down for 2 minutes"
- alert: HighErrorRate
expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
for: 5m
annotations:
summary: "High error rate on {{ $labels.service }}"
description: "Error ratio is {{ $value }} (threshold: 0.05, i.e. 5%)"
- alert: DatabaseConnectionExhaustion
expr: db_connection_pool_available < 10
for: 2m
annotations:
summary: "Database connection pool nearly exhausted"
description: "Only {{ $value }} connections available"
Warning (Slack):
- alert: HighLatency
expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint)) > 1.0
for: 10m
annotations:
summary: "High latency on {{ $labels.endpoint }}"
description: "P95 latency is {{ $value }}s (threshold: 1s)"
- alert: LowRedisHitRate
expr: redis_hit_rate < 90
for: 15m
annotations:
summary: "Low Redis cache hit rate"
description: "Hit rate is {{ $value }}% (threshold: 90%)"
Cross-References
Related Domains
- Payment Gateway - Payment processing performance requirements
- Device Management - WebSocket scalability
- Reporting & Analytics - Report generation performance
Related Technical Sections
- Communication Patterns - Rate limiting and throttling
- Database Architecture - Database optimization strategies
- Security Architecture - Security overhead on performance
Related Deployment Sections
- AWS Infrastructure - ECS auto-scaling configuration
- Load Testing - Performance test scenarios
- Capacity Planning - Resource sizing guidelines
Navigation
Previous: Security Architecture | Next: Deployment Overview | Up: Technical Architecture Index