Deployment Guide

How to deploy the mpac-obs observability stack locally and to AWS.

Local Development

Prerequisites

Docker Desktop (macOS/Windows) or Docker Engine + Docker Compose v2 (Linux)
At least 4 GB of RAM allocated to Docker

Start the Stack

From the repository root:

bash

make up

Or manually:

bash

cd stack
docker compose up -d

Verify Health

bash

make status

All five services should show as "running" (or "healthy" for those with health checks). Each service also exposes a health endpoint you can query directly:

Service	Health Check URL
Prometheus	http://localhost:9090/-/healthy
Loki	http://localhost:3100/ready
Tempo	http://localhost:3200/ready
Alloy	http://localhost:12345/-/ready
Grafana	http://localhost:3000/api/health
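
As a rough sketch, the endpoints in the table above can be polled in one pass with curl. The function name check_all is hypothetical and not part of the repository, and the 5 second timeout is an arbitrary choice.

```shell
# Poll each health endpoint from the table above and report its status.
# check_all is a hypothetical helper name, not part of the repository.
check_all() {
  failed=0
  while read -r name url; do
    # -f: treat HTTP errors as failures, -sS: quiet but show errors,
    # --max-time: do not hang on an unresponsive service.
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "OK   $name"
    else
      echo "FAIL $name ($url)"
      failed=1
    fi
  done <<'EOF'
Prometheus http://localhost:9090/-/healthy
Loki http://localhost:3100/ready
Tempo http://localhost:3200/ready
Alloy http://localhost:12345/-/ready
Grafana http://localhost:3000/api/health
EOF
  return "$failed"
}
```

Calling check_all after make up prints one line per service and returns a non-zero status if any endpoint fails, which makes it usable in scripts.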

Access Points

Service	URL	Credentials
Grafana	http://localhost:3000	admin / admin
Prometheus	http://localhost:9090	-
Loki	http://localhost:3100	-
Tempo	http://localhost:3200	-
Alloy UI	http://localhost:12345	-

Stop the Stack

bash

make down

Clean Restart (Remove All Data)

bash

make clean
make up

AWS Deployment

Architecture

In AWS, the observability stack runs on ECS Fargate with the following topology:

Alloy: ECS Fargate service, receives OTLP from application services
Prometheus, Loki, Tempo: ECS Fargate services (or EC2 for Dev Hybrid)
Grafana: ECS Fargate service with persistent EFS volume

Service discovery is handled by AWS Cloud Map, providing DNS names like:

alloy.mpac-obs.{env}.local:4317
prometheus.mpac-obs.{env}.local:9090
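
For illustration, an application service could derive its OTLP endpoint from the environment name as sketched below. The helper name otlp_endpoint is hypothetical, and the http:// scheme assumes plaintext OTLP/gRPC inside the VPC, which this guide does not confirm. The OTEL_EXPORTER_OTLP_* variables are the standard OpenTelemetry SDK settings.

```shell
# Build the Cloud Map DNS endpoint for a given environment.
# otlp_endpoint is a hypothetical helper; the http:// scheme is an
# assumption (plaintext OTLP/gRPC inside the VPC).
otlp_endpoint() {
  env_name="$1"
  echo "http://alloy.mpac-obs.${env_name}.local:4317"
}

# Standard OpenTelemetry SDK environment variables.
export OTEL_EXPORTER_OTLP_ENDPOINT="$(otlp_endpoint dev)"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
echo "$OTEL_EXPORTER_OTLP_ENDPOINT"   # prints http://alloy.mpac-obs.dev.local:4317
```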

Prerequisites

AWS CLI configured with appropriate credentials
Access to the deployment IAM role
The mpac-infra repository (contains CloudFormation templates)

Deploy via GitHub Actions

Go to Actions -> CD - Deploy Observability Stack
Click Run workflow to open the dispatch form
Select the target environment (dev / staging / prod)
Optionally enable Dry run to validate without deploying
Click Run workflow again to start the run
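
The same workflow can probably be dispatched from a terminal with the GitHub CLI. The sketch below is an assumption-heavy illustration: the input names environment and dry_run are guesses, so check the workflow's workflow_dispatch inputs for the real names. trigger_deploy is a hypothetical helper.

```shell
# Trigger the deploy workflow from the terminal with the GitHub CLI.
# The input names "environment" and "dry_run" are assumptions; check the
# workflow file's workflow_dispatch inputs for the real names.
trigger_deploy() {
  env_name="$1"
  dry_run="${2:-false}"
  case "$env_name" in
    dev|staging|prod) ;;
    *) echo "unknown environment: $env_name" >&2; return 1 ;;
  esac
  gh workflow run "CD - Deploy Observability Stack" \
    -f "environment=${env_name}" \
    -f "dry_run=${dry_run}"
}
```

For example, trigger_deploy dev true would request a dry run against dev, provided the input names match.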

Deploy Manually

bash

# Ensure AWS credentials are configured
aws sts get-caller-identity

# Deploy infrastructure (from mpac-infra repo)
cd mpac-infra/observability
make deploy ENV=dev

# Update configs (push to S3); the stack/ paths are relative to this repository's root
aws s3 sync stack/alloy/ s3://mpac-obs-configs-dev/alloy/
aws s3 sync stack/prometheus/ s3://mpac-obs-configs-dev/prometheus/
aws s3 sync stack/loki/ s3://mpac-obs-configs-dev/loki/
aws s3 sync stack/tempo/ s3://mpac-obs-configs-dev/tempo/

# Trigger ECS service update (shown for alloy; repeat for each service whose config changed)
aws ecs update-service \
  --cluster mpac-obs-dev \
  --service alloy \
  --force-new-deployment
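
The sync and redeploy steps above can be folded into a loop, sketched here. It assumes the ECS service names match the config directory names; only alloy is confirmed by this guide. push_configs is a hypothetical helper and should be run from the root of this repository so the stack/ paths resolve.

```shell
# Sync each component's config directory and force a new deployment.
# Assumes ECS service names match the config directory names; only
# "alloy" is confirmed by the guide. push_configs is a hypothetical name.
push_configs() {
  env_name="$1"
  for svc in alloy prometheus loki tempo; do
    aws s3 sync "stack/${svc}/" "s3://mpac-obs-configs-${env_name}/${svc}/" || return 1
    aws ecs update-service \
      --cluster "mpac-obs-${env_name}" \
      --service "$svc" \
      --force-new-deployment >/dev/null || return 1
  done
}
```

After push_configs dev, running aws ecs wait services-stable --cluster mpac-obs-dev --services alloy blocks until the new deployment has settled.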

Environment Configuration

Parameter	Dev	Staging	Production
Prometheus retention	14 days	14 days	30 days
Loki retention	14 days	14 days	30 days
Tempo retention	14 days	14 days	30 days
Alloy batch size	1000	1000	2000
Grafana replicas	1	1	2
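
As an illustration of how the retention row might feed a deployment script, the sketch below maps an environment name to the Prometheus retention flag. How these values are actually injected is not stated in this guide, and retention_for is a hypothetical helper.

```shell
# Map an environment name to the retention period from the table above.
# retention_for is a hypothetical helper, not part of the repository.
retention_for() {
  case "$1" in
    dev|staging) echo "14d" ;;
    prod)        echo "30d" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Example: build the Prometheus retention flag for production.
echo "--storage.tsdb.retention.time=$(retention_for prod)"
# prints --storage.tsdb.retention.time=30d
```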

Rollback

To roll back to a previous configuration:

bash

# List previous task definition revisions
aws ecs list-task-definitions --family-prefix mpac-obs-alloy

# Update service to use previous revision
aws ecs update-service \
  --cluster mpac-obs-dev \
  --service alloy \
  --task-definition mpac-obs-alloy:<previous-revision>
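
The two rollback steps can be combined, as in this sketch. It picks the second most recently registered revision, which is an assumption: that revision is not necessarily the one that was last deployed, and --family-prefix also matches any other family sharing the prefix. previous_task_def and rollback_service are hypothetical helper names.

```shell
# Look up the second most recently registered revision of a family.
# previous_task_def and rollback_service are hypothetical helper names.
previous_task_def() {
  aws ecs list-task-definitions \
    --family-prefix "$1" \
    --sort DESC \
    --query 'taskDefinitionArns[1]' \
    --output text
}

rollback_service() {
  cluster="$1"; service="$2"; family="$3"
  prev=$(previous_task_def "$family") || return 1
  # With --output text the CLI prints "None" when the query matches nothing.
  if [ -z "$prev" ] || [ "$prev" = "None" ]; then
    echo "no previous revision found for $family" >&2
    return 1
  fi
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --task-definition "$prev"
}
```

Usage would look like rollback_service mpac-obs-dev alloy mpac-obs-alloy. Confirm the chosen revision before running this against production.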

Troubleshooting

Alloy Not Receiving Data

Check Alloy is running: curl http://localhost:12345/-/ready
Verify the OTLP gRPC port is open: nc -zv localhost 4317
Check Alloy logs: docker compose logs alloy
Verify the bearer token matches between sender and receiver

Grafana Shows No Data

Check datasource connectivity in Grafana: Settings -> Data Sources -> Test
Verify Prometheus targets are healthy: curl http://localhost:9090/api/v1/targets (each active target should report "health":"up")
Verify Loki has data: curl http://localhost:3100/loki/api/v1/labels
Check that services are sending telemetry (look for OTLP connection logs)
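
To tell an empty backend apart from a Grafana datasource problem, a quick check along these lines may help. It treats a response as "has data" when the JSON data array contains at least one string, which fits the Loki labels endpoint above and Prometheus's /api/v1/label/__name__/values endpoint (an addition not mentioned elsewhere in this guide). The string match is deliberately crude and assumes compact JSON; has_data is a hypothetical helper.

```shell
# Succeed when the endpoint returns a non-empty "data" array of strings.
# Returns 2 when the request itself fails, 1 when the array is empty or
# missing. has_data is a hypothetical helper name.
has_data() {
  body=$(curl -fsS --max-time 5 "$1") || return 2
  case "$body" in
    *'"data":["'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, has_data http://localhost:3100/loki/api/v1/labels succeeds once Loki has ingested at least one labelled stream, and has_data http://localhost:9090/api/v1/label/__name__/values succeeds once Prometheus holds at least one metric name.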

High Memory Usage

Loki: Reduce max_size_mb in embedded cache config
Prometheus: Reduce storage.tsdb.retention.time
Alloy: Reduce send_batch_size in batch processor
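
Before changing any of these settings locally, it can help to see which container is actually consuming memory. A minimal sketch using docker stats follows; mem_report is a hypothetical helper name.

```shell
# Print a one-shot memory snapshot for every running container.
# mem_report is a hypothetical helper name.
mem_report() {
  docker stats --no-stream \
    --format 'table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}'
}
```

Running mem_report while the stack is up lists each container with its current memory usage and percentage of the limit.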
