Deployment Strategy

Part of: MPAC SmartPOS Cloud Platform - Product Requirements
Version: 2.0
Last Updated: 2026-01-28


Overview

This document defines the deployment strategy for the MPAC SmartPOS Cloud Platform. The strategy emphasizes zero-downtime deployments, safe rollback mechanisms, and backward-compatible database migrations. The blue-green deployment pattern ensures that new versions can be thoroughly tested before switching production traffic, with instant rollback capability if issues arise.


Blue-Green Deployment

Purpose: Enable zero-downtime deployments with instant rollback capability.

Deployment Flow

Blue Environment (current production)
  └─ Running v2.0.0
  └─ Handling 100% production traffic
  └─ Target Group: TG-Blue

Green Environment (new version)
  └─ Deploy v2.1.0
  └─ ECS tasks started with new image
  └─ Health checks pass
  └─ Run smoke tests
  └─ Run integration tests
  └─ If all pass:
     └─ Switch Route 53 weighted routing to Green (5% traffic)
     └─ Monitor for 15 minutes (error rates, latency, logs)
     └─ If stable: Increase to 50% traffic
     └─ Monitor for 30 minutes
     └─ If stable: Increase to 100% traffic
     └─ Monitor for 1 hour
     └─ If stable: Decommission Blue environment
     └─ If issues at any stage: Roll back to Blue by shifting traffic (30-60 seconds)

Deployment Phases

Phase 1: Green Environment Preparation

  1. Provision Green Environment:

    • Create new ECS task definitions with v2.1.0 image
    • Start ECS tasks in Green target group
    • Wait for health checks to pass (all tasks healthy)
  2. Pre-Production Validation:

    • Run smoke tests against Green environment
    • Verify database connectivity
    • Check external API integrations
    • Validate authentication flows

Phase 2: Traffic Shifting (Gradual Cutover)

  1. 5% Traffic (Canary):

    • Update Route 53 weighted routing policy:
      yaml
      Blue: Weight 95
      Green: Weight 5
    • Monitor for 15 minutes:
      • Error rate < 0.1%
      • P95 latency < 500ms
      • No 5xx errors
    • Alert on anomalies
  2. 50% Traffic:

    • Update Route 53 weighted routing policy:
      yaml
      Blue: Weight 50
      Green: Weight 50
    • Monitor for 30 minutes:
      • Compare Blue vs Green metrics
      • Check for memory leaks
      • Validate transaction success rates
  3. 100% Traffic (Full Cutover):

    • Update Route 53 weighted routing policy:
      yaml
      Blue: Weight 0
      Green: Weight 100
    • Monitor for 1 hour:
      • Sustained performance
      • No anomalies in logs
      • Business metrics stable
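
The staged cutover above can be sketched as a small driver loop. This is a minimal illustration, not the platform's actual tooling: `Stage`, `run_cutover`, and the `set_weights` / `is_stable` callbacks are hypothetical names, and the real monitoring wait is assumed to happen inside `is_stable`.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    green_weight: int     # percent of traffic sent to Green
    monitor_minutes: int  # observation window before advancing


# Stages taken from Phase 2: 5% / 15 min, 50% / 30 min, 100% / 60 min.
STAGES = (Stage(5, 15), Stage(50, 30), Stage(100, 60))


def run_cutover(
    set_weights: Callable[[int, int], None],
    is_stable: Callable[[Stage], bool],
    stages: Sequence[Stage] = STAGES,
) -> Tuple[str, int]:
    """Walk the stages in order; roll back to Blue on the first unstable stage.

    set_weights(blue, green) is assumed to update the weighted routing policy.
    is_stable(stage) is assumed to block for the monitoring window and report
    whether the metrics stayed within thresholds.
    Returns ("completed", 100) or ("rolled_back", <green weight at failure>).
    """
    for stage in stages:
        set_weights(100 - stage.green_weight, stage.green_weight)
        if not is_stable(stage):
            set_weights(100, 0)  # send all traffic back to Blue
            return ("rolled_back", stage.green_weight)
    return ("completed", stages[-1].green_weight)
```

Keeping the weight update and the stability check behind callbacks lets the same loop be exercised in tests with fakes before it is wired to Route 53 and real metrics.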

Phase 3: Blue Environment Decommission

  1. Wait Period: Keep Blue running for 1 hour after 100% cutover
  2. Verification: Confirm no rollback needed
  3. Decommission:
    • Stop Blue ECS tasks
    • Preserve Blue task definition for emergency rollback
    • Archive Blue environment configuration

Traffic Shifting Automation

CloudFormation Configuration (the same weighted records can also be defined in Terraform):

yaml
# Route 53 Weighted Routing Policy
Route53RecordBlue:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneId: !Ref HostedZoneId
    Name: api.mpac-cloud.com
    Type: A
    SetIdentifier: "blue-environment"
    Weight: !Ref BlueWeight  # Initially 100
    AliasTarget:
      DNSName: !GetAtt ALBBlue.DNSName
      HostedZoneId: !GetAtt ALBBlue.CanonicalHostedZoneID

Route53RecordGreen:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneId: !Ref HostedZoneId
    Name: api.mpac-cloud.com
    Type: A
    SetIdentifier: "green-environment"
    Weight: !Ref GreenWeight  # Initially 0
    AliasTarget:
      DNSName: !GetAtt ALBGreen.DNSName
      HostedZoneId: !GetAtt ALBGreen.CanonicalHostedZoneID
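
For weight changes made outside a stack update, a change batch for `aws route53 change-resource-record-sets` could be generated rather than hand-edited. The sketch below is an assumption-laden illustration: `weight_change_batch` is a hypothetical helper, the alias values are supplied by the caller, and `EvaluateTargetHealth` is set to `False` on the assumption that the records above rely on the CloudFormation default.

```python
import json


def weight_change_batch(green_weight, record_name, blue_alias, green_alias):
    """Build a Route 53 change batch that sets the Blue/Green weights.

    blue_alias and green_alias are dicts with "DNSName" and "HostedZoneId"
    for the respective load balancers (values supplied by the caller).
    """
    if not 0 <= green_weight <= 100:
        raise ValueError("green_weight must be between 0 and 100")

    def record(set_id, weight, alias):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "SetIdentifier": set_id,
                "Weight": weight,
                "AliasTarget": {
                    "DNSName": alias["DNSName"],
                    "HostedZoneId": alias["HostedZoneId"],
                    # Assumption: health evaluation is not used for these records.
                    "EvaluateTargetHealth": False,
                },
            },
        }

    return {
        "Changes": [
            record("blue-environment", 100 - green_weight, blue_alias),
            record("green-environment", green_weight, green_alias),
        ]
    }


if __name__ == "__main__":
    # Placeholder alias targets for illustration only.
    blue = {"DNSName": "blue-alb.example.com", "HostedZoneId": "ZEXAMPLEBLUE"}
    green = {"DNSName": "green-alb.example.com", "HostedZoneId": "ZEXAMPLEGREEN"}
    print(json.dumps(weight_change_batch(5, "api.mpac-cloud.com", blue, green), indent=2))
```

Deriving the Blue weight from the Green weight keeps the two records from drifting out of sync when a stage is changed.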

Monitoring During Deployment

Key Metrics to Watch:

yaml
Metrics:
  - ErrorRate:
      Threshold: < 0.1%
      Alert: error_rate > 0.5%

  - Latency:
      P50: < 100ms
      P95: < 500ms
      P99: < 1000ms
      Alert: p95 > 1000ms

  - Success Rate:
      Threshold: > 99.9%
      Alert: success_rate < 99.5%

  - Database Connections:
      Alert: connection_errors > 10

  - Memory Usage:
      Alert: memory_usage > 85%
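
As a rough illustration of how the alert rules above might be evaluated against one metrics sample, the following sketch encodes each alert condition as a predicate. The metric key names (`error_rate_pct`, `p95_latency_ms`, and so on) are hypothetical and would need to match whatever the monitoring system actually emits.

```python
# Alert conditions copied from the metrics list; key names are illustrative.
ALERT_RULES = {
    "error_rate_pct": lambda v: v > 0.5,
    "p95_latency_ms": lambda v: v > 1000,
    "success_rate_pct": lambda v: v < 99.5,
    "db_connection_errors": lambda v: v > 10,
    "memory_usage_pct": lambda v: v > 85,
}


def firing_alerts(sample: dict) -> list:
    """Return the sorted names of metrics in `sample` that breach an alert rule.

    Metrics missing from the sample are skipped rather than treated as failures.
    """
    return sorted(
        name
        for name, breached in ALERT_RULES.items()
        if name in sample and breached(sample[name])
    )


if __name__ == "__main__":
    print(firing_alerts({"error_rate_pct": 0.7, "p95_latency_ms": 420}))
```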

Rollback Triggers

Automatic Rollback Conditions:

  • Error rate > 1% for 5 consecutive minutes
  • P95 latency > 2000ms for 5 consecutive minutes
  • Critical service unavailable (database, Redis)
  • More than 10 5xx errors in 1 minute
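
The automatic conditions combine sustained thresholds ("for 5 consecutive minutes") with immediate ones. A minimal sketch of that evaluation, assuming one metrics sample per minute ordered oldest first, might look like the following; `MinuteSample` and `should_roll_back` are hypothetical names.

```python
from typing import List, NamedTuple


class MinuteSample(NamedTuple):
    error_rate_pct: float
    p95_latency_ms: float
    count_5xx: int
    critical_services_up: bool  # database and Redis reachable


def sustained(values: List[bool], minutes: int = 5) -> bool:
    """True if the most recent `minutes` values are all True."""
    return len(values) >= minutes and all(values[-minutes:])


def should_roll_back(history: List[MinuteSample]) -> bool:
    """Evaluate the automatic rollback conditions on per-minute samples."""
    if not history:
        return False
    latest = history[-1]
    return (
        sustained([s.error_rate_pct > 1.0 for s in history])
        or sustained([s.p95_latency_ms > 2000 for s in history])
        or not latest.critical_services_up
        or latest.count_5xx > 10
    )
```

A single healthy minute resets the sustained conditions, which avoids rolling back on a brief spike while still reacting immediately to an unavailable dependency or a burst of 5xx responses.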

Manual Rollback Decision:

  • Business metric anomalies (transaction volume drop)
  • Customer reports of issues
  • Security concerns
  • Data integrity issues

Database Migrations

Purpose: Safe, backward-compatible schema changes that don't block deployments or require downtime.

Backward-Compatible Migrations Only

Principle: All migrations must support both old and new code running simultaneously during blue-green deployment.

Two-Phase Deploy

Phase 1: Additive Schema Changes

Deploy schema changes that ADD new structures without removing old ones:

  1. Add new columns (nullable or with defaults):

    sql
    -- Migration 001: Add new column
    ALTER TABLE merchants
      ADD COLUMN external_id TEXT NULL;
    
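    -- Add index for new column. In PostgreSQL, CONCURRENTLY avoids blocking
    -- writes to the table, but it cannot run inside a transaction block.
    CREATE INDEX CONCURRENTLY idx_merchants_external_id ON merchants(external_id);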
  2. Deploy code changes:

    • New code v2.1.0 uses external_id column
    • Old code v2.0.0 ignores external_id column
    • Both versions work correctly
  3. Gradual cutover:

    • Blue (v2.0.0) and Green (v2.1.0) run simultaneously
    • No conflicts because migration is additive
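
The compatibility claim in Phase 1 can be checked with a small experiment. The sketch below uses an in-memory SQLite database purely as a stand-in for the production database, and the `old_code_*` / `new_code_*` functions are hypothetical simplifications of what v2.0.0 and v2.1.0 would do.

```python
import sqlite3


def old_code_insert(conn, name):
    # v2.0.0 does not know about external_id and never mentions it.
    conn.execute("INSERT INTO merchants (name) VALUES (?)", (name,))


def new_code_insert(conn, name, external_id):
    # v2.1.0 writes the new column.
    conn.execute(
        "INSERT INTO merchants (name, external_id) VALUES (?, ?)",
        (name, external_id),
    )


def new_code_read(conn):
    # v2.1.0 must tolerate NULL external_id on rows written by v2.0.0.
    return conn.execute(
        "SELECT name, external_id FROM merchants ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merchants (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
old_code_insert(conn, "before-migration")

# Additive migration: a nullable column, no change to existing columns.
conn.execute("ALTER TABLE merchants ADD COLUMN external_id TEXT")

# Both versions keep working against the migrated schema.
old_code_insert(conn, "old-after-migration")
new_code_insert(conn, "new-after-migration", "ext-123")
rows = new_code_read(conn)
```

The point to notice is that rows written by the old version carry NULL in the new column, so the new version has to handle that case for as long as Blue is serving traffic.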

Phase 2: Deprecation and Cleanup

After full cutover, remove deprecated columns in a separate migration:

  1. Wait period: 24 hours after 100% traffic on Green
  2. Verify: No Blue environment running
  3. Deploy cleanup migration:
    sql
    -- Migration 002: Remove deprecated column (after v2.1.0 rollout)
    ALTER TABLE merchants
      DROP COLUMN old_column_name;

Migration Examples

Example 1: Renaming a Column

Avoid this (breaks old code):

sql
-- BAD: Old code will break
ALTER TABLE payments
  RENAME COLUMN amount TO total_amount;

Do this instead (two-phase):

sql
-- Phase 1: Add new column, populate from old column
ALTER TABLE payments
  ADD COLUMN total_amount DECIMAL(15, 2);

UPDATE payments
  SET total_amount = amount
  WHERE total_amount IS NULL;

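-- Deploy new code that writes both amount and total_amount (dual-write)
-- and reads total_amount. Old code still reads/writes only amount, so
-- re-run the backfill above after 100% cutover to cover rows it wrote.

-- Phase 2: After full cutover (24h later) and a final backfill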
ALTER TABLE payments
  DROP COLUMN amount;
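
On a large table, a single backfill `UPDATE` can hold locks for a long time. One common alternative is to backfill in small batches and commit between them. The sketch below illustrates the idea with SQLite as a stand-in database; it assumes `payments` has an integer primary key named `id`, which the source does not state.

```python
import sqlite3


def backfill_in_batches(conn, batch_size=1000):
    """Copy amount into total_amount in small batches; return rows updated."""
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE payments
               SET total_amount = amount
             WHERE id IN (
                   SELECT id FROM payments
                    WHERE total_amount IS NULL
                      AND amount IS NOT NULL  -- avoids re-selecting rows forever
                    LIMIT ?)
            """,
            (batch_size,),
        )
        conn.commit()  # release locks between batches
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, amount NUMERIC, total_amount NUMERIC)"
)
conn.executemany(
    "INSERT INTO payments (amount) VALUES (?)", [(i * 100,) for i in range(1, 8)]
)
updated = backfill_in_batches(conn, batch_size=3)
```

Because the function only touches rows where `total_amount` is still NULL, it is safe to run again after cutover to pick up rows written by the old version.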

Example 2: Changing Column Type

Avoid this (breaks old code):

sql
-- BAD: Type change breaks old code
ALTER TABLE devices
  ALTER COLUMN device_id TYPE UUID USING device_id::UUID;

Do this instead (two-phase):

sql
-- Phase 1: Add new column with new type
ALTER TABLE devices
  ADD COLUMN device_uuid UUID;

-- Populate new column
UPDATE devices
  SET device_uuid = device_id::UUID
  WHERE device_uuid IS NULL;

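-- Deploy new code that writes both device_id and device_uuid and reads device_uuid
-- Old code continues using device_id

-- Phase 2: After full cutover (24h later) and a final backfill
ALTER TABLE devices
  DROP COLUMN device_id;

-- Keep the name device_uuid. Renaming it back to device_id would break the
-- deployed code that reads device_uuid, so it needs its own two-phase rename
-- (see Example 1).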

Example 3: Adding NOT NULL Constraint

Safe approach:

sql
-- Phase 1: Add column as nullable with default
ALTER TABLE stores
  ADD COLUMN timezone TEXT DEFAULT 'Asia/Tokyo';

-- Backfill existing rows
UPDATE stores
  SET timezone = 'Asia/Tokyo'
  WHERE timezone IS NULL;

-- Phase 2: After verification (24h later)
ALTER TABLE stores
  ALTER COLUMN timezone SET NOT NULL;

Migration Tools

svc-portal (Python/Alembic)

bash
# Create new migration
cd mpac-smartpos/svc-portal
uv run alembic revision --autogenerate -m "Add external_id to merchants"

# Review generated migration
# Edit migration to ensure backward compatibility

# Apply migration
uv run alembic upgrade head

# Rollback if needed
uv run alembic downgrade -1

svc-smarttab and mpac-pgw (Go/golang-migrate)

bash
# Create new migration
cd mpac-smartpos/svc-smarttab
migrate create -ext sql -dir db/migrations -seq add_external_id_to_merchants

# Edit migration files:
# - 000001_add_external_id_to_merchants.up.sql
# - 000001_add_external_id_to_merchants.down.sql

# Apply migration
migrate -path db/migrations -database "postgresql://user:pass@localhost/smarttab" up

# Rollback if needed
migrate -path db/migrations -database "postgresql://user:pass@localhost/smarttab" down 1

Migration Validation

Pre-Deployment Checklist:

  • [ ] Migration tested in local development environment
  • [ ] Migration tested in staging with production-like data volume
  • [ ] Rollback migration tested and verified
  • [ ] Both old and new code tested with migration applied
  • [ ] No DROP statements for columns still in use
  • [ ] No ALTER TYPE without dual-column approach
  • [ ] Indexes added for new columns
  • [ ] Migration time estimated (< 5 minutes for production)
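
Parts of this checklist could be automated in CI. The following is a rough sketch of a lint step that flags statements the checklist treats as unsafe; `find_risky_statements` is a hypothetical helper, and the regular expressions are deliberately simple (they ignore `--` line comments but do not parse SQL, so a `--` inside a string literal would be misread).

```python
import re

# Statements that break old code during a blue-green deployment.
RISKY_PATTERNS = {
    "DROP COLUMN": re.compile(r"\bDROP\s+COLUMN\b", re.IGNORECASE),
    "RENAME COLUMN": re.compile(r"\bRENAME\s+COLUMN\b", re.IGNORECASE),
    "ALTER COLUMN TYPE": re.compile(
        r"\bALTER\s+COLUMN\s+\w+\s+(SET\s+DATA\s+)?TYPE\b", re.IGNORECASE
    ),
}


def strip_sql_comments(sql):
    """Remove `-- ...` line comments so commented-out statements are ignored."""
    return re.sub(r"--[^\n]*", "", sql)


def find_risky_statements(sql):
    """Return the sorted labels of risky patterns found in a migration script."""
    body = strip_sql_comments(sql)
    return sorted(label for label, pat in RISKY_PATTERNS.items() if pat.search(body))


if __name__ == "__main__":
    migration = "ALTER TABLE payments\n  RENAME COLUMN amount TO total_amount;"
    print(find_risky_statements(migration))
```

A Phase 2 cleanup migration would legitimately trip the `DROP COLUMN` rule, so a check like this is better used as a prompt for review than as a hard gate.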

Rollback Procedure

Purpose: Quick recovery from failed deployments or issues discovered post-deployment.

Instant Rollback (Traffic Shift)

When to Use: Issues detected during or immediately after deployment.

Steps:

  1. Detect Issue:

    • Automated alerts (error rate, latency)
    • Manual identification (customer reports, monitoring)
  2. Initiate Rollback:

    bash
    # Update Route 53 weights back to Blue
    aws route53 change-resource-record-sets \
      --hosted-zone-id Z123456 \
      --change-batch file://rollback-to-blue.json
    json
    {
      "Changes": [
        {
          "Action": "UPSERT",
          "ResourceRecordSet": {
            "Name": "api.mpac-cloud.com",
            "Type": "A",
            "SetIdentifier": "blue-environment",
            "Weight": 100,
            "AliasTarget": {
              "DNSName": "blue-alb.us-east-1.elb.amazonaws.com",
              "HostedZoneId": "Z1234567890ABC",
              "EvaluateTargetHealth": false
            }
          }
        },
        {
          "Action": "UPSERT",
          "ResourceRecordSet": {
            "Name": "api.mpac-cloud.com",
            "Type": "A",
            "SetIdentifier": "green-environment",
            "Weight": 0,
            "AliasTarget": {
              "DNSName": "green-alb.us-east-1.elb.amazonaws.com",
              "HostedZoneId": "Z1234567890ABC",
              "EvaluateTargetHealth": false
            }
          }
        }
      ]
    }
  3. Verify Rollback:

    • Monitor error rates drop
    • Verify traffic routing to Blue
    • Check health metrics stabilize
  4. Post-Rollback Actions:

    • Investigate root cause
    • Fix issues in Green environment
    • Test fix in staging
    • Retry deployment when ready

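Timeline: 30-60 seconds for traffic to fully shift back to Blue, assuming resolvers honor the DNS TTL (Route 53 alias records that point to a load balancer use a 60-second TTL). Clients that cache DNS for longer may keep reaching Green until their cache expires.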

Database Migration Rollback

When to Use: Schema changes caused issues or need to be reverted.

Important: Only roll back migrations if all of the following hold:

  • No data has been written to the new columns
  • The old schema is still compatible with the running code
  • The rollback has been tested in staging first
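
The first condition can be verified with a query before running any downgrade. The sketch below shows one way to do that; `column_has_data` is a hypothetical helper, and SQLite stands in for the production database only so the example is self-contained.

```python
import sqlite3


def column_has_data(conn, table, column):
    """True if any row has a non-NULL value in the given column."""
    # Identifiers cannot be bound as query parameters, so validate them
    # before interpolating into the statement.
    for ident in (table, column):
        if not ident.isidentifier():
            raise ValueError(f"unsafe identifier: {ident!r}")
    row = conn.execute(
        f"SELECT EXISTS (SELECT 1 FROM {table} WHERE {column} IS NOT NULL)"
    ).fetchone()
    return bool(row[0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merchants (id INTEGER PRIMARY KEY, external_id TEXT)")
conn.execute("INSERT INTO merchants (external_id) VALUES (NULL)")
safe_before_write = not column_has_data(conn, "merchants", "external_id")

conn.execute("INSERT INTO merchants (external_id) VALUES ('ext-1')")
safe_after_write = not column_has_data(conn, "merchants", "external_id")
```

If the check reports data in the new column, dropping it in a downgrade would lose that data, which is one reason a forward fix is usually the safer choice.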

Steps:

  1. Stop Green Environment:

    bash
    # Scale down Green ECS tasks to 0
    aws ecs update-service \
      --cluster mpac-prod \
      --service svc-portal-green \
      --desired-count 0
  2. Rollback Migration (svc-portal):

    bash
    cd mpac-smartpos/svc-portal
    uv run alembic downgrade -1
  3. Rollback Migration (svc-smarttab):

    bash
    cd mpac-smartpos/svc-smarttab
    migrate -path db/migrations -database "$DB_URL" down 1
  4. Verify Database State:

    sql
    -- Check schema version
    SELECT * FROM alembic_version;  -- svc-portal
    SELECT * FROM schema_migrations; -- svc-smarttab
    
    -- Verify tables (\d is a psql meta-command, not SQL)
    \d merchants
  5. Restart Blue Environment:

    • Ensure Blue uses old code compatible with rolled-back schema

Warning: Database rollbacks are risky. Prefer forward-fixes when possible.

Feature Flag Rollback

When to Use: Disable specific features without full deployment rollback.

Mechanism:

python
# Feature flag service (LaunchDarkly or custom)
from feature_flags import is_enabled

@app.post("/v1/payments/create")
async def create_payment(request: PaymentRequest):
    if is_enabled("new_payment_flow", user=request.user):
        return await new_payment_flow(request)
    else:
        return await legacy_payment_flow(request)
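
If a custom flag service is used instead of LaunchDarkly, `is_enabled` could be backed by something like the sketch below. This is a hypothetical in-memory implementation for illustration (a real service would load flag state from shared storage so that every instance sees a change); the `FlagStore` class and its methods are not part of the source.

```python
import hashlib


class FlagStore:
    """In-memory flag store with a kill switch and percentage rollout."""

    def __init__(self):
        self._flags = {}  # name -> (enabled, rollout_percent)

    def set_flag(self, name, enabled, rollout_percent=100):
        self._flags[name] = (enabled, rollout_percent)

    def is_enabled(self, name, user):
        # Unknown flags are treated as off, so a missing flag never
        # activates a new code path by accident.
        enabled, percent = self._flags.get(name, (False, 0))
        if not enabled:
            return False
        # Stable bucket in [0, 100) so a user keeps the same variant
        # across requests.
        digest = hashlib.sha256(f"{name}:{user}".encode()).hexdigest()
        return int(digest, 16) % 100 < percent


store = FlagStore()
store.set_flag("new_payment_flow", True, rollout_percent=100)
enabled_before = store.is_enabled("new_payment_flow", "user-42")

# Rollback: flip the kill switch; no deployment is needed.
store.set_flag("new_payment_flow", False)
enabled_after = store.is_enabled("new_payment_flow", "user-42")
```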

Rollback Steps:

  1. Identify Problematic Feature:

    • Correlate errors with feature flag usage
  2. Disable Feature Flag:

    bash
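    # Via LaunchDarkly UI or REST API (semantic patch).
    # "production" is a placeholder environment key.
    curl -X PATCH https://app.launchdarkly.com/api/v2/flags/default/new_payment_flow \
      -H "Authorization: api-key-xyz" \
      -H "Content-Type: application/json; domain-model=launchdarkly.semanticpatch" \
      -d '{"environmentKey": "production", "instructions": [{"kind": "turnFlagOff"}]}'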
  3. Verify Rollback:

    • Monitor metrics improve
    • Confirm feature disabled for all users

Timeline: Instant (< 5 seconds for flag propagation)

Rollback Decision Matrix

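| Issue Type              | Rollback Method                   | Timeline | Risk     |
|-------------------------|-----------------------------------|----------|----------|
| High Error Rate         | Traffic shift to Blue             | 30-60s   | Low      |
| Performance Degradation | Traffic shift to Blue             | 30-60s   | Low      |
| Feature Bug             | Feature flag disable              | < 5s     | Very Low |
| Database Corruption     | Database rollback + traffic shift | 5-10min  | High     |
| Security Vulnerability  | Immediate traffic shift           | < 30s    | Low      |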


MPAC — MP-Solution Advanced Cloud Service