CI/CD Pipeline

Part of: MPAC SmartPOS Cloud Platform - Product Requirements
Version: 2.0
Last Updated: 2026-01-28


Overview

This document defines the CI/CD pipeline for the MPAC SmartPOS Cloud Platform. The pipeline automates testing, building, and deployment processes using GitHub Actions. The workflow ensures code quality through automated checks, builds Docker images for each service, and orchestrates deployments to staging and production environments with appropriate approvals and monitoring.

GitHub Actions Workflow

Purpose: Automated continuous integration and deployment pipeline triggered by code changes.

Workflow Triggers

yaml
on:
  push:
    branches:
      - main        # Production deployments
      - develop     # Staging deployments
  pull_request:
    branches:
      - main
      - develop

Trigger Behavior:

  • Pull Request: Run all test jobs (including the frontend build check); no image push, no deployment
  • Push to develop: Run all jobs + deploy to staging
  • Push to main: Run all jobs + deploy to production (with approval)
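The trigger rules above amount to a small decision table. The sketch below restates them as a shell function; `pipeline_stages` is a hypothetical helper used only for illustration and is not part of the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the stages that run for a given event and branch.
# For pull requests, "branch" means the target (base) branch.
pipeline_stages() {
  local event="$1" branch="$2"
  case "$event:$branch" in
    pull_request:main | pull_request:develop) echo "test" ;;
    push:develop) echo "test build deploy-staging" ;;
    push:main)    echo "test build deploy-production" ;;  # production is approval-gated
    *)            echo "none" ;;                          # the workflow is not triggered
  esac
}

pipeline_stages push develop
```

Any other branch or event falls through to `none`, which matches the `branches` filters in the trigger block.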

Complete Workflow Definition

yaml
name: MPAC CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

env:
  AWS_REGION: ap-northeast-1
  ECR_REGISTRY: 123456789012.dkr.ecr.ap-northeast-1.amazonaws.com

jobs:
  # ============================================
  # Test Jobs (run in parallel)
  # ============================================

  test-svc-portal:
    name: Test svc-portal (Python)
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh

      - name: Install dependencies
        working-directory: mpac-smartpos/svc-portal
        run: |
          uv venv
          source .venv/bin/activate
          uv pip install -e ".[dev]"

      - name: Lint with ruff
        working-directory: mpac-smartpos/svc-portal
        run: |
          source .venv/bin/activate
          ruff check .

      - name: Type check with mypy
        working-directory: mpac-smartpos/svc-portal
        run: |
          source .venv/bin/activate
          mypy app/

      - name: Run unit tests
        working-directory: mpac-smartpos/svc-portal
        run: |
          source .venv/bin/activate
          pytest tests/unit --cov=app --cov-report=xml --cov-report=term

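      - name: Check coverage
        # Must run in the service directory: the virtualenv and .coverage data live there
        working-directory: mpac-smartpos/svc-portal
        run: |
          source .venv/bin/activate
          coverage report --fail-under=80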

      - name: Upload coverage report
        uses: codecov/codecov-action@v3
        with:
          files: ./mpac-smartpos/svc-portal/coverage.xml
          flags: svc-portal

  test-svc-smarttab:
    name: Test svc-smarttab (Go)
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.24'

      - name: Install dependencies
        working-directory: mpac-smartpos/svc-smarttab
        run: go mod download

      - name: Lint with golangci-lint
        uses: golangci/golangci-lint-action@v3
        with:
          version: latest
          working-directory: mpac-smartpos/svc-smarttab

      - name: Run unit tests
        working-directory: mpac-smartpos/svc-smarttab
        run: go test -v -race -coverprofile=coverage.out ./...

      - name: Check coverage
        working-directory: mpac-smartpos/svc-smarttab
        run: |
          go tool cover -func=coverage.out
          # Fail if coverage < 80%
          COVERAGE=$(go tool cover -func=coverage.out | grep total | awk '{print $3}' | sed 's/%//')
          if (( $(echo "$COVERAGE < 80" | bc -l) )); then
            echo "Coverage $COVERAGE% is below 80%"
            exit 1
          fi

      - name: Upload coverage report
        uses: codecov/codecov-action@v3
        with:
          files: ./mpac-smartpos/svc-smarttab/coverage.out
          flags: svc-smarttab

  test-mpac-pgw:
    name: Test mpac-pgw (Go)
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.24'

      - name: Install dependencies
        working-directory: mpac-pgw
        run: go mod download

      - name: Lint
        uses: golangci/golangci-lint-action@v3
        with:
          version: latest
          working-directory: mpac-pgw

      - name: Run unit tests
        working-directory: mpac-pgw
        run: go test -v -race -coverprofile=coverage.out ./...

      - name: Check coverage
        working-directory: mpac-pgw
        run: |
          go tool cover -func=coverage.out
          COVERAGE=$(go tool cover -func=coverage.out | grep total | awk '{print $3}' | sed 's/%//')
          if (( $(echo "$COVERAGE < 80" | bc -l) )); then
            echo "Coverage $COVERAGE% is below 80%"
            exit 1
          fi

      - name: Upload coverage report
        uses: codecov/codecov-action@v3
        with:
          files: ./mpac-pgw/coverage.out
          flags: mpac-pgw

  test-frontend:
    name: Test Frontend (React/TypeScript)
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 8

      - name: Install dependencies
        working-directory: mpac-frontend
        run: pnpm install --frozen-lockfile

      - name: Lint with ESLint
        working-directory: mpac-frontend
        run: pnpm lint

      - name: Type check with TypeScript
        working-directory: mpac-frontend
        run: pnpm check-types

      - name: Run unit tests
        working-directory: mpac-frontend
        run: pnpm test --coverage

      - name: Build check
        working-directory: mpac-frontend
        run: pnpm build

      - name: Upload coverage report
        uses: codecov/codecov-action@v3
        with:
          files: ./mpac-frontend/coverage/coverage-final.json
          flags: frontend

  # ============================================
  # Build and Push Docker Images
  # ============================================

  build-and-push:
    name: Build and Push Docker Images
    runs-on: ubuntu-latest
    needs: [test-svc-portal, test-svc-smarttab, test-mpac-pgw, test-frontend]
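    if: github.event_name == 'push'
    permissions:
      id-token: write   # required for OIDC role assumption
      contents: read    # required for actions/checkout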
    outputs:
      image-tag: ${{ steps.image-tag.outputs.tag }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GithubActionsRole
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Generate image tag
        id: image-tag
        env:
          # Pass context values through env so they are never interpolated into the script text
          COMMIT_SHA: ${{ github.sha }}
          REF_NAME: ${{ github.ref_name }}
        run: |
          SHORT_SHA="${COMMIT_SHA:0:7}"
          BRANCH_NAME="${REF_NAME//\//-}"
          TAG="${BRANCH_NAME}-${SHORT_SHA}-$(date +%Y%m%d%H%M%S)"
          echo "tag=$TAG" >> "$GITHUB_OUTPUT"

      - name: Build and push svc-portal
        working-directory: mpac-smartpos/svc-portal
        run: |
          docker build -t $ECR_REGISTRY/mpac-svc-portal:${{ steps.image-tag.outputs.tag }} .
          docker push $ECR_REGISTRY/mpac-svc-portal:${{ steps.image-tag.outputs.tag }}

      - name: Build and push svc-smarttab
        working-directory: mpac-smartpos/svc-smarttab
        run: |
          docker build -t $ECR_REGISTRY/mpac-svc-smarttab:${{ steps.image-tag.outputs.tag }} .
          docker push $ECR_REGISTRY/mpac-svc-smarttab:${{ steps.image-tag.outputs.tag }}

      - name: Build and push mpac-pgw
        working-directory: mpac-pgw
        run: |
          docker build -t $ECR_REGISTRY/mpac-pgw:${{ steps.image-tag.outputs.tag }} .
          docker push $ECR_REGISTRY/mpac-pgw:${{ steps.image-tag.outputs.tag }}

      - name: Build and push frontend
        working-directory: mpac-frontend
        run: |
          docker build -t $ECR_REGISTRY/mpac-frontend:${{ steps.image-tag.outputs.tag }} .
          docker push $ECR_REGISTRY/mpac-frontend:${{ steps.image-tag.outputs.tag }}

  # ============================================
  # Deploy to Staging
  # ============================================

  deploy-staging:
    name: Deploy to Staging
    runs-on: ubuntu-latest
    needs: build-and-push
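    if: github.ref == 'refs/heads/develop'
    permissions:
      id-token: write   # required for OIDC role assumption
      contents: read    # required for actions/checkout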
    environment:
      name: staging
      url: https://mpac-cloud-stg.com
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GithubActionsRole
          aws-region: ${{ env.AWS_REGION }}

      - name: Update ECS task definitions
        run: |
          # Patch only the image in each task definition file so CPU, memory, and
          # env vars are kept. Passing --container-definitions next to
          # --cli-input-json would replace the whole container list.
          TAG="${{ needs.build-and-push.outputs.image-tag }}"
          for PAIR in svc-portal:mpac-svc-portal svc-smarttab:mpac-svc-smarttab mpac-pgw:mpac-pgw; do
            SERVICE="${PAIR%%:*}"   # container and task definition name
            REPO="${PAIR##*:}"      # ECR repository name
            jq --arg name "$SERVICE" --arg image "$ECR_REGISTRY/$REPO:$TAG" \
              '(.containerDefinitions[] | select(.name == $name) | .image) = $image' \
              "infra/ecs/task-definitions/${SERVICE}-staging.json" > "/tmp/${SERVICE}-staging.json"
            aws ecs register-task-definition \
              --cli-input-json "file:///tmp/${SERVICE}-staging.json"
          done

      - name: Deploy to ECS
        run: |
          # A task definition family without a revision resolves to the latest
          # ACTIVE revision; ":latest" is not a valid revision.

          # Update svc-portal service
          aws ecs update-service \
            --cluster mpac-staging \
            --service svc-portal \
            --task-definition svc-portal-staging \
            --force-new-deployment

          # Update svc-smarttab service
          aws ecs update-service \
            --cluster mpac-staging \
            --service svc-smarttab \
            --task-definition svc-smarttab-staging \
            --force-new-deployment

          # Update mpac-pgw service
          aws ecs update-service \
            --cluster mpac-staging \
            --service mpac-pgw \
            --task-definition mpac-pgw-staging \
            --force-new-deployment

      - name: Wait for deployment
        run: |
          aws ecs wait services-stable \
            --cluster mpac-staging \
            --services svc-portal svc-smarttab mpac-pgw

      - name: Run smoke tests
        run: |
          # Health check
          curl -f https://mpac-cloud-stg.com/health || exit 1

          # API smoke test
          curl -f https://api.mpac-cloud-stg.com/v1/health || exit 1

      - name: Notify Slack
        if: always()
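        uses: slackapi/slack-github-action@v1
        env:
          # v1 of this action reads the incoming webhook from the environment, not from a `with` input
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
        with: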
          payload: |
            {
              "text": "Staging Deployment: ${{ job.status }}",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "*Staging Deployment*\nStatus: ${{ job.status }}\nCommit: ${{ github.sha }}\nBranch: ${{ github.ref_name }}\nImage: ${{ needs.build-and-push.outputs.image-tag }}"
                  }
                }
              ]
            }

  # ============================================
  # Deploy to Production
  # ============================================

  deploy-production:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: build-and-push
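    if: github.ref == 'refs/heads/main'
    permissions:
      id-token: write   # required for OIDC role assumption
      contents: read    # required for actions/checkout
      issues: write     # the manual-approval action tracks approvals in an issue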
    environment:
      name: production
      url: https://mpac-cloud.com
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GithubActionsRole
          aws-region: ${{ env.AWS_REGION }}

      - name: Manual approval required
        uses: trstringer/manual-approval@v1
        with:
          secret: ${{ github.TOKEN }}
          approvers: devops-team
          minimum-approvals: 2

      - name: Blue-green deployment
        run: |
          # This is a placeholder for blue-green deployment logic
          # Actual implementation would involve:
          # 1. Create new task definitions with new image
          # 2. Deploy to "green" target group
          # 3. Run smoke tests on green
          # 4. Gradually shift traffic (5% -> 50% -> 100%)
          # 5. Monitor for 1 hour
          # 6. Decommission blue if stable
          echo "Executing blue-green deployment..."
          ./scripts/deploy-blue-green.sh \
            --image-tag ${{ needs.build-and-push.outputs.image-tag }} \
            --cluster mpac-prod \
            --services svc-portal,svc-smarttab,mpac-pgw

      - name: Monitor for 1 hour
        run: |
          # Monitor key metrics for 1 hour after deployment
          ./scripts/monitor-deployment.sh \
            --duration 3600 \
            --alert-on-error-rate 0.5 \
            --alert-on-latency-p95 1000

      - name: Auto-rollback on errors
        if: failure()
        run: |
          echo "Deployment failed or errors detected. Rolling back..."
          ./scripts/rollback-deployment.sh \
            --cluster mpac-prod \
            --services svc-portal,svc-smarttab,mpac-pgw

      - name: Notify Slack
        if: always()
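        uses: slackapi/slack-github-action@v1
        env:
          # v1 of this action reads the incoming webhook from the environment, not from a `with` input
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
        with: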
          payload: |
            {
              "text": "Production Deployment: ${{ job.status }}",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "*Production Deployment*\nStatus: ${{ job.status }}\nCommit: ${{ github.sha }}\nBranch: ${{ github.ref_name }}\nImage: ${{ needs.build-and-push.outputs.image-tag }}\n<!channel>"
                  }
                }
              ]
            }

Pipeline Stages

Purpose: Break down the CI/CD pipeline into distinct stages with clear responsibilities.

Stage 1: Test Stage

Purpose: Validate code quality, type safety, and correctness through automated tests.

test-svc-portal (Python/FastAPI)

Steps:

  1. Lint with ruff:

    • Check Python code style
    • Enforce import ordering
    • Detect unused imports and variables
    • Exit on any linting errors
  2. Type check with mypy:

    • Verify type annotations
    • Check for type safety violations
    • Ensure all functions have type hints
    • Strict mode enabled
  3. Unit tests with pytest:

    • Run all tests in tests/unit/
    • Generate coverage report
    • Require 80% code coverage minimum
    • Fail if coverage drops below threshold
  4. Coverage report:

    • Upload to Codecov
    • Track coverage trends over time
    • Comment on PRs with coverage changes

Success Criteria:

  • All linting checks pass
  • No type errors
  • All tests pass
  • Coverage ≥ 80%

test-svc-smarttab (Go/Gin)

Steps:

  1. Lint with golangci-lint:

    • Run multiple linters (gofmt, govet, staticcheck)
    • Detect common Go mistakes
    • Enforce code style consistency
  2. Unit tests with go test:

    • Run all tests in all packages
    • Generate coverage report
    • Race detector enabled (-race flag)
  3. Coverage check:

    • Calculate total coverage percentage
    • Fail if coverage < 80%
    • Upload to Codecov

Success Criteria:

  • All linting checks pass
  • All tests pass
  • Coverage ≥ 80%
  • No race conditions detected
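The coverage gate for the Go services can be sketched without `bc` by reading the `total:` line of `go tool cover -func` output directly. `coverage_gate` below is a hypothetical helper, and the sample input in the usage line is invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `go tool cover -func` output on stdin and
# succeed only if the total coverage is at or above the threshold (default 80).
coverage_gate() {
  local threshold="${1:-80}" total
  # The summary line looks like: "total:   (statements)   81.5%"
  total="$(awk '$1 == "total:" { sub(/%/, "", $NF); print $NF }')"
  if [ -z "$total" ]; then
    echo "No total line found in coverage output" >&2
    return 2
  fi
  echo "Total coverage: ${total}% (threshold ${threshold}%)"
  # awk handles the floating-point comparison; exit status 0 means the gate passed
  awk -v c="$total" -v t="$threshold" 'BEGIN { exit !(c + 0 >= t + 0) }'
}

printf 'total:\t\t\t(statements)\t\t81.5%%\n' | coverage_gate 80
```

In a workflow step this would be used as `go tool cover -func=coverage.out | coverage_gate 80`.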

test-mpac-pgw (Go/Gin)

Steps: Same as test-svc-smarttab

Success Criteria:

  • All linting checks pass
  • All tests pass
  • Coverage ≥ 80%

test-frontend (React/TypeScript)

Steps:

  1. Lint with ESLint:

    • Check JavaScript/TypeScript code style
    • Enforce React best practices
    • Detect unused variables and imports
  2. Type check with TypeScript:

    • Compile all TypeScript files
    • Check for type errors
    • Strict mode enabled
  3. Unit tests with Vitest:

    • Run all component and utility tests
    • Generate coverage report
    • Require 80% coverage (configurable per package)
  4. Build check:

    • Build all apps and packages
    • Ensure no build errors
    • Verify bundle sizes

Success Criteria:

  • All linting checks pass
  • No TypeScript errors
  • All tests pass
  • Build succeeds

Stage 2: Build and Push

Purpose: Build production-ready Docker images and push to Amazon ECR.

Steps:

  1. Configure AWS credentials:

    • Use OIDC to assume IAM role
    • No long-lived credentials stored in GitHub
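    • Role should follow least privilege; note that the workflow above reuses the same role in the ECS deploy jobs, so ECR push permissions alone are not sufficient as written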
  2. Login to Amazon ECR:

    • Authenticate Docker with ECR registry
    • Valid for 12 hours
  3. Generate image tag:

    • Format: {branch}-{short-sha}-{timestamp}
    • Example: main-a1b2c3d-20260128103045
    • Ensures unique, traceable image tags
  4. Build Docker images:

    • Build each service's Dockerfile
    • Multi-stage builds for smaller images
    • Cache layers for faster builds
  5. Push to ECR:

    • Tag images with the generated {branch}-{short-sha}-{timestamp} tag
    • Also tag with latest for branch (e.g., main-latest); the workflow above does not yet push this tag
    • Upload to ECR registry

Output:

  • Image tag: {branch}-{short-sha}-{timestamp}
  • All services built and pushed
  • Images available in ECR
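The tag format can be reproduced outside the workflow with a few lines of shell. This is a minimal sketch; `make_image_tag` is a hypothetical helper, and the timestamp is passed in explicitly so the result is repeatable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a {branch}-{short-sha}-{timestamp} image tag.
# The timestamp defaults to the current time when omitted.
make_image_tag() {
  local branch="$1" sha="$2" timestamp="${3:-$(date +%Y%m%d%H%M%S)}"
  local safe_branch="${branch//\//-}"   # slashes are not allowed in Docker tags
  printf '%s-%s-%s\n' "$safe_branch" "${sha:0:7}" "$timestamp"
}

make_image_tag main a1b2c3d4e5f60718293a4b5c6d7e8f9012345678 20260128103045
```

With the arguments shown, the function prints the example tag from the list above, `main-a1b2c3d-20260128103045`.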

Docker Image Optimization:

dockerfile
# Example: svc-portal Dockerfile
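FROM python:3.11-slim AS builder
WORKDIR /app
RUN pip install uv
COPY pyproject.toml .
# Install only the declared dependencies; an editable install (-e .) would need the app source here
RUN uv pip install --system -r pyproject.toml

FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY app/ app/
EXPOSE 8002
# Run uvicorn as a module: only site-packages is copied, so the /usr/local/bin/uvicorn script is absent
CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8002"]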

Stage 3: Deploy to Staging

Purpose: Automatically deploy to staging environment for testing.

Trigger: Push to develop branch

Steps:

  1. Update ECS task definitions:

    • Register new task definition with new image tag
    • Keep existing configuration (CPU, memory, env vars)
    • Create new revision number
  2. Deploy to ECS:

    • Update ECS service to use new task definition
    • Force new deployment (stop old tasks, start new)
    • ECS performs rolling update:
      • Start new task
      • Wait for health check
      • Stop old task
      • Repeat for all tasks
  3. Wait for deployment:

    • Use aws ecs wait services-stable
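    • Timeout: 15 minutes (the built-in waiter polls every 15 seconds for up to 40 attempts, about 10 minutes, so reaching 15 minutes requires re-running the wait)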
    • Fail if deployment doesn't stabilize
  4. Run smoke tests:

    • Health check endpoints
    • Basic API functionality
    • Authentication flow
    • Database connectivity
  5. Notify team:

    • Send Slack notification with deployment status
    • Include commit SHA, branch, image tag
    • Link to staging environment

Success Criteria:

  • All services deployed successfully
  • All health checks pass
  • Smoke tests pass
  • Team notified
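Smoke tests that run right after a rolling update can fail briefly while the load balancer registers new tasks. One way to make them more tolerant is a small retry wrapper, sketched below; `retry` is a hypothetical helper, and the attempt count and delay are illustrative values rather than requirements from this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example with the staging endpoints used in the workflow:
#   retry 5 10 curl -fsS https://mpac-cloud-stg.com/health
#   retry 5 10 curl -fsS https://api.mpac-cloud-stg.com/v1/health
```

The same wrapper could also be used to re-run `aws ecs wait services-stable` when a longer overall timeout is wanted.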

Stage 4: Deploy to Production

Purpose: Deploy to production with blue-green strategy and manual approval.

Trigger: Push to main branch

Steps:

  1. Manual approval:

    • Require approval from 2 members of devops-team
    • Prevents accidental production deployments
    • Approval via GitHub UI
  2. Blue-green deployment:

    • Execute blue-green deployment script
    • Create "green" environment with new images
    • Run smoke tests on green
    • Gradually shift traffic (5% → 50% → 100%)
    • See Deployment Strategy for details
  3. Monitor for 1 hour:

    • Watch key metrics (error rate, latency, throughput)
    • Alert on anomalies
    • Auto-rollback if thresholds exceeded
  4. Auto-rollback on errors:

    • If any step fails, trigger rollback
    • Switch traffic back to blue environment
    • Alert devops team
  5. Notify team:

    • Send Slack notification with deployment status
    • Tag @channel for production deployments
    • Include commit SHA, image tag, deployment duration

Success Criteria:

  • Approvals obtained
  • Green environment healthy
  • Traffic shifted successfully
  • Metrics stable for 1 hour
  • Team notified
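The interaction between gradual traffic shifting, monitoring, and auto-rollback can be sketched as a single loop: shift, hold and check, and roll back on the first failure. `run_traffic_schedule` and the three callbacks it receives are hypothetical names; the real scripts may be organized differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper:
#   run_traffic_schedule <shift-cmd> <check-cmd> <rollback-cmd> <percent:hold-seconds>...
# <shift-cmd> receives the green percentage, <check-cmd> receives the hold time in seconds.
run_traffic_schedule() {
  local shift_cmd="$1" check_cmd="$2" rollback_cmd="$3" step percent hold
  shift 3
  for step in "$@"; do
    percent="${step%%:*}"
    hold="${step##*:}"
    # Roll back as soon as either the shift or the health check fails
    if ! "$shift_cmd" "$percent" || ! "$check_cmd" "$hold"; then
      "$rollback_cmd"
      return 1
    fi
  done
}

# Example matching the schedule in this document (callbacks must be defined by the caller):
#   run_traffic_schedule shift_traffic monitor_for rollback 5:900 50:1800 100:3600
```

Keeping the rollback inside the loop means blue is still serving traffic whenever a check fails, which is the precondition for switching back to it.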

Deployment Automation

Purpose: Automated scripts for deployment orchestration and monitoring.

Blue-Green Deployment Script

Script: scripts/deploy-blue-green.sh

Purpose: Orchestrate blue-green deployment with gradual traffic shifting.

Usage:

bash
./scripts/deploy-blue-green.sh \
  --image-tag main-a1b2c3d-20260128103045 \
  --cluster mpac-prod \
  --services svc-portal,svc-smarttab,mpac-pgw

Implementation:

bash
#!/bin/bash
set -e

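# Parse the named flags shown under Usage; positional $1..$3 would capture the flag names themselves
IMAGE_TAG="" CLUSTER="" SERVICES=""
while [ $# -gt 0 ]; do
  case "$1" in
    --image-tag) IMAGE_TAG="$2"; shift 2 ;;
    --cluster)   CLUSTER="$2";   shift 2 ;;
    --services)  SERVICES="$2";  shift 2 ;;
    *) echo "Unknown argument: $1" >&2; exit 1 ;;
  esac
done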

echo "Starting blue-green deployment..."
echo "Image tag: $IMAGE_TAG"
echo "Cluster: $CLUSTER"
echo "Services: $SERVICES"

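# 1. Create green task definitions
# Patch only the image with jq so CPU, memory, and env vars from the base file are kept;
# --container-definitions next to --cli-input-json would replace the whole container list.
for SERVICE in ${SERVICES//,/ }; do
  echo "Creating green task definition for $SERVICE..."
  REPO="${SERVICE/#svc-/mpac-svc-}"  # ECR repos are mpac-svc-portal, mpac-svc-smarttab, mpac-pgw
  jq --arg name "$SERVICE" --arg image "${ECR_REGISTRY}/${REPO}:${IMAGE_TAG}" \
    '(.containerDefinitions[] | select(.name == $name) | .image) = $image' \
    "infra/ecs/task-definitions/${SERVICE}-prod.json" > "/tmp/${SERVICE}-green.json"
  aws ecs register-task-definition \
    --cli-input-json "file:///tmp/${SERVICE}-green.json" \
    --family "${SERVICE}-green"
done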

# 2. Create green target groups
for SERVICE in ${SERVICES//,/ }; do
  echo "Creating green target group for $SERVICE..."
  aws elbv2 create-target-group \
    --name ${SERVICE}-green-tg \
    --protocol HTTP \
    --port 8002 \
    --vpc-id $VPC_ID \
    --health-check-path /health
done

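# 3. Deploy to green environment
GREEN_SERVICES=()
for SERVICE in ${SERVICES//,/ }; do
  echo "Deploying $SERVICE to green environment..."
  # Omitting the revision selects the latest ACTIVE revision; ":latest" is not a valid revision
  aws ecs update-service \
    --cluster "$CLUSTER" \
    --service "${SERVICE}-green" \
    --task-definition "${SERVICE}-green" \
    --force-new-deployment
  GREEN_SERVICES+=("${SERVICE}-green")
done

# 4. Wait for green to be healthy
# Use the collected names: ${SERVICES//,/-green } leaves the last service without its suffix
echo "Waiting for green environment to be healthy..."
aws ecs wait services-stable --cluster "$CLUSTER" --services "${GREEN_SERVICES[@]}"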

# 5. Run smoke tests on green
echo "Running smoke tests on green environment..."
./scripts/smoke-tests.sh --target green

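# 6. Shift traffic gradually
# NOTE: shift_traffic is not defined in this script as shown; it must be defined or sourced before this point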
echo "Shifting traffic to green..."
shift_traffic 5   # 5% to green
sleep 900         # Monitor for 15 minutes
shift_traffic 50  # 50% to green
sleep 1800        # Monitor for 30 minutes
shift_traffic 100 # 100% to green
sleep 3600        # Monitor for 1 hour

# 7. Decommission blue
echo "Decommissioning blue environment..."
for SERVICE in ${SERVICES//,/ }; do
  aws ecs update-service \
    --cluster $CLUSTER \
    --service ${SERVICE}-blue \
    --desired-count 0
done

echo "Blue-green deployment completed successfully!"
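The script calls `shift_traffic` with a green percentage but does not show its definition. Whatever mechanism performs the shift (weighted load balancer target groups or weighted DNS records), it needs a validated pair of weights. The sketch below covers only that calculation; `green_blue_weights` is a hypothetical helper, not the missing function itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a green percentage into a blue/green weight pair.
# Rejects anything that is not an integer from 0 to 100.
green_blue_weights() {
  local green="$1"
  if ! [[ "$green" =~ ^[0-9]+$ ]] || ((10#$green > 100)); then
    echo "green percentage must be an integer from 0 to 100" >&2
    return 1
  fi
  # 10# forces base 10 so inputs like "08" are not parsed as octal
  echo "blue=$((100 - 10#$green)) green=$((10#$green))"
}

green_blue_weights 5
```

A `shift_traffic` implementation could call this first and refuse to touch routing when it fails.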

Monitoring Script

Script: scripts/monitor-deployment.sh

Purpose: Monitor key metrics during deployment and trigger rollback if needed.

Usage:

bash
./scripts/monitor-deployment.sh \
  --duration 3600 \
  --alert-on-error-rate 0.5 \
  --alert-on-latency-p95 1000

Implementation:

bash
#!/bin/bash
set -e

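# Parse the named flags shown under Usage; positional $1..$3 would capture the flag names themselves
DURATION="" ERROR_RATE_THRESHOLD="" LATENCY_THRESHOLD=""
while [ $# -gt 0 ]; do
  case "$1" in
    --duration)             DURATION="$2";             shift 2 ;;
    --alert-on-error-rate)  ERROR_RATE_THRESHOLD="$2"; shift 2 ;;
    --alert-on-latency-p95) LATENCY_THRESHOLD="$2";    shift 2 ;;
    *) echo "Unknown argument: $1" >&2; exit 1 ;;
  esac
done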

START_TIME=$(date +%s)
END_TIME=$((START_TIME + DURATION))

echo "Monitoring deployment for $DURATION seconds..."
echo "Error rate threshold: $ERROR_RATE_THRESHOLD%"
echo "Latency threshold: $LATENCY_THRESHOLD ms"

while [ $(date +%s) -lt $END_TIME ]; do
  # Query CloudWatch metrics
  ERROR_RATE=$(aws cloudwatch get-metric-statistics \
    --namespace MPAC \
    --metric-name ErrorRate \
    --start-time $(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%S) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
    --period 300 \
    --statistics Average \
    --query 'Datapoints[0].Average' \
    --output text)

  # Percentiles are requested with --extended-statistics, not --statistics
  LATENCY_P95=$(aws cloudwatch get-metric-statistics \
    --namespace MPAC \
    --metric-name Latency \
    --start-time $(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%S) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
    --period 300 \
    --extended-statistics p95 \
    --query 'Datapoints[0].ExtendedStatistics.p95' \
    --output text)

  echo "Current metrics: Error rate: $ERROR_RATE%, Latency P95: $LATENCY_P95 ms"

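  # CloudWatch returns "None" when there is no datapoint yet; bc cannot compare it
  if [ "$ERROR_RATE" = "None" ] || [ "$LATENCY_P95" = "None" ]; then
    echo "No datapoints yet; checking again in 60 seconds..."
    sleep 60
    continue
  fi

  # Check thresholds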
  if (( $(echo "$ERROR_RATE > $ERROR_RATE_THRESHOLD" | bc -l) )); then
    echo "ERROR: Error rate ($ERROR_RATE%) exceeds threshold ($ERROR_RATE_THRESHOLD%)"
    exit 1
  fi

  if (( $(echo "$LATENCY_P95 > $LATENCY_THRESHOLD" | bc -l) )); then
    echo "ERROR: Latency P95 ($LATENCY_P95 ms) exceeds threshold ($LATENCY_THRESHOLD ms)"
    exit 1
  fi

  sleep 60  # Check every minute
done

echo "Monitoring completed. Metrics stable."
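The threshold checks in the loop depend on both values being numeric. A defensive comparison that treats missing data as "no breach" can be sketched as follows; `exceeds_threshold` is a hypothetical helper, and treating a missing datapoint as healthy is an assumption that a real deployment may want to reverse.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if <value> is a plain decimal number
# strictly greater than <threshold>.
exceeds_threshold() {
  local value="$1" threshold="$2"
  # Non-numeric input (for example "None" when CloudWatch has no datapoint)
  # is treated as "not exceeded" in this sketch
  [[ "$value" =~ ^-?[0-9]+([.][0-9]+)?$ ]] || return 1
  awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v + 0 > t + 0) }'
}

if exceeds_threshold 0.7 0.5; then
  echo "error rate above threshold"
fi
```

Values in exponent notation are not matched by the pattern and would also be treated as "not exceeded", so the pattern would need widening if the metric can produce them.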

Rollback Script

Script: scripts/rollback-deployment.sh

Purpose: Roll back to the previous version by shifting traffic back to blue.

Usage:

bash
./scripts/rollback-deployment.sh \
  --cluster mpac-prod \
  --services svc-portal,svc-smarttab,mpac-pgw

Implementation:

bash
#!/bin/bash
set -e

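# Parse the named flags shown under Usage; positional $1 and $2 would capture the flag names themselves
CLUSTER="" SERVICES=""
while [ $# -gt 0 ]; do
  case "$1" in
    --cluster)  CLUSTER="$2";  shift 2 ;;
    --services) SERVICES="$2"; shift 2 ;;
    *) echo "Unknown argument: $1" >&2; exit 1 ;;
  esac
done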

echo "ROLLBACK: Reverting to blue environment..."

# Shift all traffic back to blue
aws route53 change-resource-record-sets \
  --hosted-zone-id $HOSTED_ZONE_ID \
  --change-batch file://rollback-to-blue.json

echo "Traffic shifted back to blue environment."

# Scale down green environment
for SERVICE in ${SERVICES//,/ }; do
  echo "Scaling down $SERVICE green environment..."
  aws ecs update-service \
    --cluster $CLUSTER \
    --service ${SERVICE}-green \
    --desired-count 0
done

echo "Rollback completed. System restored to previous version."

# Send alert
./scripts/send-alert.sh \
  --message "ROLLBACK: Production deployment rolled back to blue environment" \
  --severity critical
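The rollback script reads its Route 53 changes from `rollback-to-blue.json`, whose contents are not shown. As an illustration only, the sketch below generates a change batch for weighted CNAME records that sends all traffic to blue. The function name, record name, and DNS targets are all hypothetical, and the real file may use alias records or a different routing policy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Route 53 change batch that sets the blue record
# to weight 100 and the green record to weight 0.
rollback_change_batch() {
  local record="$1" blue="$2" green="$3"
  cat <<EOF
{
  "Comment": "Send all traffic back to blue",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$record", "Type": "CNAME", "SetIdentifier": "blue",
        "Weight": 100, "TTL": 60,
        "ResourceRecords": [ { "Value": "$blue" } ]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$record", "Type": "CNAME", "SetIdentifier": "green",
        "Weight": 0, "TTL": 60,
        "ResourceRecords": [ { "Value": "$green" } ]
      }
    }
  ]
}
EOF
}

rollback_change_batch api.example.com blue-lb.example.com green-lb.example.com
```

Because DNS answers are cached for the record's TTL, a shift done this way takes effect over roughly that interval rather than immediately.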

Cross-References


Previous: Environments
Up: Deployment Index

MPAC — MP-Solution Advanced Cloud Service