Observability Stack (mpac-obs)
Part of: MPAC SmartPOS Cloud Platform - Product Requirements
Version: 2.0 | Last Updated: 2026-01-28
Overview
mpac-obs is the centralized observability stack for the MPAC SmartPOS platform, providing comprehensive monitoring, logging, tracing, and alerting capabilities. The stack collects telemetry data from all services (svc-portal, svc-smarttab, mpac-pgw) and infrastructure components, enabling real-time visibility into system health, performance, and business metrics.
Stack Components:
- Prometheus (v3.8) - Time-series metrics storage and querying
- Loki (v3.6) - Log aggregation and indexing
- Tempo (v2.9) - Distributed tracing backend
- Grafana (v12.3) - Unified visualization and dashboards
- Alloy (v1.12) - OTLP collector for metrics, logs, and traces
Deployment Models:
- Local Development: Docker Compose
- AWS Production: ECS Fargate with CloudFormation
Table of Contents
- Directory Structure
- Local Development Setup
- AWS Deployment
- Service Integration
- Configuration Files
- Access & Credentials
- Data Retention
- Performance Considerations
Directory Structure
mpac-obs/
├── docker-compose.yml          # Local development stack
├── .env.example                # Environment variables template
│
├── alloy/                      # OTLP collector configuration
│   ├── config.alloy            # Main configuration file
│   └── README.md               # Alloy setup guide
│
├── prometheus/                 # Metrics storage configuration
│   ├── prometheus.yml          # Scrape configs and rules
│   ├── alerts.yml              # Alert rules
│   └── recording_rules.yml     # Recording rules for aggregations
│
├── loki/                       # Log aggregation configuration
│   ├── loki-config.yml         # Storage and ingestion config
│   └── README.md               # Loki setup guide
│
├── tempo/                      # Distributed tracing configuration
│   ├── tempo.yml               # Trace storage and query config
│   └── README.md               # Tempo setup guide
│
├── grafana/                    # Dashboards and provisioning
│   ├── provisioning/
│   │   ├── datasources/        # Auto-configure data sources
│   │   │   ├── prometheus.yml
│   │   │   ├── loki.yml
│   │   │   └── tempo.yml
│   │   └── dashboards/         # Auto-import dashboards
│   │       ├── dashboards.yml
│   │       ├── service-health.json
│   │       ├── payment-processing.json
│   │       ├── device-fleet.json
│   │       ├── database-performance.json
│   │       └── business-metrics.json
│   └── README.md               # Grafana setup guide
│
└── cloudformation/             # AWS deployment (ECS Fargate)
    ├── Makefile                # Deployment automation
    ├── parameters.json         # CloudFormation parameters
    ├── mpac-obs-stack.yml      # Main CloudFormation template
    ├── networking.yml          # VPC, subnets, security groups
    ├── ecs-cluster.yml         # ECS cluster and services
    ├── storage.yml             # EFS for persistent storage
    └── README.md               # AWS deployment guide
Local Development Setup
Prerequisites
- Docker Desktop (or Docker Engine + Docker Compose)
- 8GB+ RAM available for Docker
- Ports available: 3000 (Grafana), 9090 (Prometheus), 4317/4318 (Alloy)
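Before starting the stack, the required host ports can be checked with a short script. This is a sketch using only the Python standard library; the port list combines the ports named above with the other local service ports (3100, 3200, 12345) used by this stack:

```python
import socket

# Host ports the local stack binds: Grafana, Prometheus, Loki, Tempo,
# Alloy OTLP gRPC/HTTP, and the Alloy UI.
REQUIRED_PORTS = [3000, 9090, 3100, 3200, 4317, 4318, 12345]

def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if we can bind the port, i.e. nothing is listening on it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

if __name__ == "__main__":
    busy = [p for p in REQUIRED_PORTS if not is_port_free(p)]
    if busy:
        print(f"Ports already in use: {busy}")
    else:
        print("All required ports are free")
```

Run it before `docker compose up -d`; any port listed as busy will cause the corresponding container to fail to publish.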
Quick Start
cd mpac-obs
# Copy environment template
cp .env.example .env
# Start all services
docker compose up -d
# Verify services are running
docker compose ps
# View logs
docker compose logs -f
# Stop all services
docker compose down
# Stop and remove volumes (clean slate)
docker compose down -v
Service Endpoints (Local)
| Service | URL | Credentials |
|---|---|---|
| Grafana | http://localhost:3000 | admin / admin (change on first login) |
| Prometheus | http://localhost:9090 | - |
| Loki | http://localhost:3100 | - |
| Tempo | http://localhost:3200 | - |
| Alloy (OTLP) | grpc://localhost:4317, http://localhost:4318 | - |
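The endpoints above can be smoke-tested with a small script. This is a sketch using only the Python standard library; the health paths (`/api/health`, `/-/healthy`, `/ready`) are the standard readiness endpoints for these components, but verify them against the versions you run:

```python
import urllib.request
from urllib.error import URLError

# Health/readiness paths for each local service.
HEALTH_CHECKS = {
    "grafana":    "http://localhost:3000/api/health",
    "prometheus": "http://localhost:9090/-/healthy",
    "loki":       "http://localhost:3100/ready",
    "tempo":      "http://localhost:3200/ready",
}

def check(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError):
        return False

if __name__ == "__main__":
    for name, url in HEALTH_CHECKS.items():
        print(f"{name:11s} {'UP' if check(url) else 'DOWN'} ({url})")
```

Note that Loki and Tempo report `/ready` only after their ingesters have joined the ring, which can take a few seconds after startup.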
Docker Compose Configuration
Services Defined:
version: '3.8'

services:
  # OTLP Collector
  alloy:
    image: grafana/alloy:1.12.0
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "12345:12345" # Alloy UI
    volumes:
      - ./alloy/config.alloy:/etc/alloy/config.alloy
    command: run --server.http.listen-addr=0.0.0.0:12345 /etc/alloy/config.alloy
    networks:
      - mpac-obs

  # Metrics Storage
  prometheus:
    image: prom/prometheus:v3.8.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus/alerts.yml:/etc/prometheus/alerts.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=14d'
      - '--web.enable-lifecycle'
      - '--web.enable-remote-write-receiver'  # accept remote-write from Alloy
    networks:
      - mpac-obs

  # Log Aggregation
  loki:
    image: grafana/loki:3.6.0
    ports:
      - "3100:3100"
    volumes:
      - ./loki/loki-config.yml:/etc/loki/loki-config.yml
      - loki-data:/loki
    command: -config.file=/etc/loki/loki-config.yml
    networks:
      - mpac-obs

  # Distributed Tracing
  tempo:
    image: grafana/tempo:2.9.0
    ports:
      - "3200:3200" # Tempo HTTP
    # Tempo's OTLP receiver (4317) is reached by Alloy over the internal
    # network and is intentionally not published to the host, which would
    # conflict with Alloy's 4317 binding.
    volumes:
      - ./tempo/tempo.yml:/etc/tempo.yml
      - tempo-data:/tmp/tempo
    command: -config.file=/etc/tempo.yml
    networks:
      - mpac-obs

  # Visualization
  grafana:
    image: grafana/grafana:12.3.0
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
      - grafana-data:/var/lib/grafana
    networks:
      - mpac-obs

volumes:
  prometheus-data:
  loki-data:
  tempo-data:
  grafana-data:

networks:
  mpac-obs:
    driver: bridge
Verifying Local Setup
# Check Prometheus is scraping targets
curl http://localhost:9090/api/v1/targets
# Send test log to Loki
curl -X POST http://localhost:3100/loki/api/v1/push \
-H "Content-Type: application/json" \
-d '{"streams":[{"stream":{"service":"test"},"values":[["'$(date +%s)000000000'","test log message"]]}]}'
# Query Loki logs
curl -G http://localhost:3100/loki/api/v1/query \
--data-urlencode 'query={service="test"}'
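The Loki push above can also be scripted; this is a sketch in Python using only the standard library, mirroring the payload shape of the curl call:

```python
import json
import time
import urllib.request

def loki_payload(service: str, message: str) -> dict:
    """Build a Loki push payload: one stream, one log line, ns-precision timestamp."""
    ts_ns = str(time.time_ns())
    return {"streams": [{"stream": {"service": service},
                         "values": [[ts_ns, message]]}]}

def push_log(base_url: str, service: str, message: str) -> int:
    """POST a single log line to Loki; returns the HTTP status (204 on success)."""
    req = urllib.request.Request(
        f"{base_url}/loki/api/v1/push",
        data=json.dumps(loki_payload(service, message)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status

if __name__ == "__main__":
    print(push_log("http://localhost:3100", "test", "test log message"))
```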
# Send test trace to Tempo (via Alloy OTLP endpoint)
# Use OpenTelemetry SDK from application services
AWS Deployment
Architecture
┌─────────────────────────────────────────────────────────────┐
│ AWS Region │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ ECS Cluster: mpac-obs-cluster │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Alloy │ │Prometheus│ │ Loki │ │ │
│ │ │(Fargate) │ │(Fargate) │ │(Fargate) │ │ │
│ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ │
│ │ │ │ │ │ │
│ │ ┌────▼─────────────▼─────────────▼──────┐ │ │
│ │ │ Tempo (Fargate) │ │ │
│ │ └────────────────┬───────────────────────┘ │ │
│ │ │ │ │
│ │ ┌────────────────▼───────────────────────┐ │ │
│ │ │ Grafana (Fargate) │ │ │
│ │ └────────────────┬───────────────────────┘ │ │
│ └───────────────────┼───────────────────────────────────┘ │
│ │ │
│ ┌───────────────────▼───────────────────────┐ │
│ │ Application Load Balancer │ │
│ │ - grafana.obs.mpac-cloud.com │ │
│ │ - alloy.obs.mpac-cloud.com:4317/4318 │ │
│ └───────────────────┬───────────────────────┘ │
│ │ │
│ ┌───────────────────▼───────────────────────┐ │
│ │ Amazon EFS (Persistent Storage) │ │
│ │ - Prometheus TSDB │ │
│ │ - Loki chunks │ │
│ │ - Tempo blocks │ │
│ └────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
CloudFormation Deployment
Prerequisites:
- AWS CLI configured with appropriate credentials
- Route 53 hosted zone for DNS (e.g., mpac-cloud.com)
- ACM certificate for *.obs.mpac-cloud.com
Deployment Steps:
cd mpac-obs/cloudformation
# 1. Configure parameters
cp parameters.json.example parameters.json
# Edit parameters.json with your AWS account details
# 2. Deploy full stack (networking + ECS + services)
make deploy-full PARENT_HOSTED_ZONE_ID=Z1234567890ABC
# 3. Check deployment status
make status
# 4. Show service endpoints
make show-endpoints
# 5. Update existing stack
make update
# 6. Delete stack (WARNING: destroys all data)
make delete-stack
CloudFormation Parameters:
{
  "Parameters": {
    "Environment": "production",
    "VpcCIDR": "10.20.0.0/16",
    "PublicSubnet1CIDR": "10.20.1.0/24",
    "PublicSubnet2CIDR": "10.20.2.0/24",
    "PrivateSubnet1CIDR": "10.20.10.0/24",
    "PrivateSubnet2CIDR": "10.20.11.0/24",
    "DomainName": "obs.mpac-cloud.com",
    "ParentHostedZoneId": "Z1234567890ABC",
    "CertificateArn": "arn:aws:acm:ap-southeast-1:123456789012:certificate/abc-def-123",
    "PrometheusRetentionDays": "14",
    "LokiRetentionDays": "14",
    "TempoRetentionDays": "7"
  }
}
Service Configuration:
| Service | Task Definition | Memory | CPU | Replicas |
|---|---|---|---|---|
| Alloy | alloy:1.12.0 | 2GB | 1 vCPU | 2 |
| Prometheus | prometheus:3.8.0 | 4GB | 2 vCPU | 2 |
| Loki | loki:3.6.0 | 4GB | 2 vCPU | 2 |
| Tempo | tempo:2.9.0 | 4GB | 2 vCPU | 2 |
| Grafana | grafana:12.3.0 | 2GB | 1 vCPU | 2 |
AWS Endpoints (Production)
| Service | URL | Access |
|---|---|---|
| Grafana | https://grafana.obs.mpac-cloud.com | SSO / IAM auth |
| Alloy (OTLP) | grpc://alloy.obs.mpac-cloud.com:4317 | VPC internal |
| Prometheus | http://prometheus.obs.mpac-cloud.internal:9090 | VPC internal |
| Loki | http://loki.obs.mpac-cloud.internal:3100 | VPC internal |
| Tempo | http://tempo.obs.mpac-cloud.internal:3200 | VPC internal |
Service Integration
Application Configuration
Environment Variables for Services:
# svc-portal, svc-smarttab, mpac-pgw
export OTLP_ENDPOINT="http://alloy.obs.mpac-cloud.internal:4318" # HTTP
export OTLP_ENDPOINT_GRPC="http://alloy.obs.mpac-cloud.internal:4317" # gRPC
export OTEL_SERVICE_NAME="svc-smarttab"
export OTEL_RESOURCE_ATTRIBUTES="environment=production,region=ap-southeast-1"
Service Auto-Discovery (AWS)
Alloy automatically discovers services from ECS task metadata:
// alloy/config.alloy
discovery.ecs "services" {
  region = "ap-southeast-1"

  // Discover tasks with the prometheus.io/scrape=true label
  filter {
    name   = "tag:prometheus.io/scrape"
    values = ["true"]
  }
}

// Scrape discovered services
prometheus.scrape "ecs_services" {
  targets    = discovery.ecs.services.targets
  forward_to = [prometheus.remote_write.default.receiver]
}
ECS Task Definition Labels:
{
  "containerDefinitions": [{
    "name": "svc-smarttab",
    "dockerLabels": {
      "prometheus.io/scrape": "true",
      "prometheus.io/port": "8080",
      "prometheus.io/path": "/metrics"
    }
  }]
}
Instrumentation Examples
Python (svc-portal):
# requirements.txt
opentelemetry-api==1.20.0
opentelemetry-sdk==1.20.0
opentelemetry-instrumentation-fastapi==0.41b0
opentelemetry-exporter-otlp==1.20.0
prometheus-client

# main.py
import os

from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

# Initialize tracing
trace.set_tracer_provider(TracerProvider())
otlp_exporter = OTLPSpanExporter(
    endpoint=os.getenv("OTLP_ENDPOINT_GRPC", "http://localhost:4317"),
    insecure=True,
)
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(otlp_exporter)
)

# Auto-instrument FastAPI
app = FastAPI()
FastAPIInstrumentor.instrument_app(app)

# Expose Prometheus metrics
from prometheus_client import make_asgi_app
metrics_app = make_asgi_app()
app.mount("/metrics", metrics_app)
Go (svc-smarttab, mpac-pgw):
// go.mod
require (
    go.opentelemetry.io/otel v1.20.0
    go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.20.0
    go.opentelemetry.io/otel/sdk v1.20.0
    github.com/prometheus/client_golang v1.17.0
)

// main.go
package main

import (
    "context"
    "log"
    "net/http"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/trace"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func initTracing() error {
    exporter, err := otlptracegrpc.New(
        context.Background(),
        otlptracegrpc.WithEndpoint("alloy.obs.mpac-cloud.internal:4317"),
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        return err
    }
    tp := trace.NewTracerProvider(
        trace.WithBatcher(exporter),
    )
    otel.SetTracerProvider(tp)
    return nil
}

func main() {
    if err := initTracing(); err != nil {
        log.Fatalf("init tracing: %v", err)
    }

    // Expose Prometheus metrics
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8080", nil))
}
Configuration Files
Alloy Configuration (OTLP Collector)
File: alloy/config.alloy
// OTLP receiver for logs, metrics, traces
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  http {
    endpoint = "0.0.0.0:4318"
  }
  output {
    metrics = [otelcol.processor.batch.default.input]
    logs    = [otelcol.processor.batch.default.input]
    traces  = [otelcol.processor.batch.default.input]
  }
}

// Batch processor (improves throughput)
otelcol.processor.batch "default" {
  timeout         = "5s"
  send_batch_size = 1000
  output {
    metrics = [otelcol.exporter.prometheus.default.input]
    logs    = [otelcol.exporter.loki.default.input]
    traces  = [otelcol.exporter.otlp.tempo.input]
  }
}

// Export metrics to Prometheus via remote write
otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus:9090/api/v1/write"
  }
}

// Export logs to Loki
otelcol.exporter.loki "default" {
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}

// Export traces to Tempo
otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls {
      insecure = true
    }
  }
}
Prometheus Configuration
File: prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'mpac-production'
    region: 'ap-southeast-1'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager:9093'

# Load alert rules
rule_files:
  - 'alerts.yml'
  - 'recording_rules.yml'

# Scrape configurations
scrape_configs:
  # Self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # EC2-based discovery of service hosts (AWS)
  - job_name: 'ecs-services'
    ec2_sd_configs:
      - region: ap-southeast-1
        port: 8080
        filters:
          - name: tag:prometheus.io/scrape
            values: ['true']
    relabel_configs:
      - source_labels: [__meta_ec2_tag_Name]
        target_label: service_name

  # Static targets (local development)
  - job_name: 'svc-portal'
    static_configs:
      - targets: ['svc-portal:8002']
        labels:
          service: 'svc-portal'

  - job_name: 'svc-smarttab'
    static_configs:
      - targets: ['svc-smarttab:8080']
        labels:
          service: 'svc-smarttab'

  - job_name: 'mpac-pgw'
    static_configs:
      - targets: ['mpac-pgw:8080']
        labels:
          service: 'mpac-pgw'
Loki Configuration
File: loki/loki-config.yml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

ingester:
  chunk_idle_period: 5m
  chunk_retain_period: 30s

# Loki 3.x requires the TSDB index with schema v13; the legacy
# boltdb-shipper/v11 schema and table_manager config were removed.
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/cache
  filesystem:
    directory: /loki/chunks

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h  # 7 days
  retention_period: 336h            # 14 days

# Retention deletes are handled by the compactor in Loki 3.x
compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem
Tempo Configuration
File: tempo/tempo.yml
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

ingester:
  trace_idle_period: 10s
  max_block_bytes: 1_000_000
  max_block_duration: 5m

compactor:
  compaction:
    block_retention: 168h # 7 days

storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/blocks
    wal:
      path: /tmp/tempo/wal
    pool:
      max_workers: 100
      queue_depth: 10000
Access & Credentials
Local Development
| Service | Default Credentials | Change Method |
|---|---|---|
| Grafana | admin / admin | Change on first login |
| Prometheus | No auth | - |
| Loki | No auth | - |
| Tempo | No auth | - |
AWS Production
| Service | Authentication | Authorization |
|---|---|---|
| Grafana | AWS SSO / SAML | Role-based (Viewer, Editor, Admin) |
| Internal Services | VPC security groups | Network-level isolation |
Grafana SSO Configuration (AWS):
# Environment variables for Grafana container
GF_AUTH_GENERIC_OAUTH_ENABLED=true
GF_AUTH_GENERIC_OAUTH_NAME=AWS SSO
GF_AUTH_GENERIC_OAUTH_CLIENT_ID=${SSO_CLIENT_ID}
GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET=${SSO_CLIENT_SECRET}
GF_AUTH_GENERIC_OAUTH_SCOPES=openid profile email
GF_AUTH_GENERIC_OAUTH_AUTH_URL=https://portal.sso.ap-southeast-1.amazonaws.com/oauth2/authorize
GF_AUTH_GENERIC_OAUTH_TOKEN_URL=https://portal.sso.ap-southeast-1.amazonaws.com/oauth2/token
GF_AUTH_GENERIC_OAUTH_API_URL=https://portal.sso.ap-southeast-1.amazonaws.com/oauth2/userInfo
Data Retention
| Component | Default Retention | Configurable | Storage Location |
|---|---|---|---|
| Prometheus | 14 days | Yes (via --storage.tsdb.retention.time) | EFS: /mnt/prometheus |
| Loki | 14 days | Yes (retention_period in config) | EFS: /mnt/loki/chunks |
| Tempo | 7 days | Yes (block_retention in config) | EFS: /mnt/tempo/blocks |
| Grafana Dashboards | Permanent | - | EFS: /var/lib/grafana |
Adjusting Retention (AWS):
# Update CloudFormation parameters
# (e.g., increase Prometheus and Loki to 30 days, Tempo to 14 days;
# JSON does not allow inline comments)
{
  "PrometheusRetentionDays": "30",
  "LokiRetentionDays": "30",
  "TempoRetentionDays": "14"
}

# Deploy update
make update
Estimated Storage Requirements:
| Retention Period | Prometheus | Loki | Tempo | Total |
|---|---|---|---|---|
| 7 days | 50 GB | 100 GB | 200 GB | 350 GB |
| 14 days | 100 GB | 200 GB | 400 GB | 700 GB |
| 30 days | 200 GB | 400 GB | 800 GB | 1.4 TB |
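The table scales roughly linearly with retention, so intermediate periods can be extrapolated. The sketch below derives per-component daily rates from the 7-day row; these are rough sizing assumptions only (real growth depends on ingest volume, and the 30-day row above is rounded, so it differs slightly from a strict linear estimate):

```python
# Approximate daily growth in GB, derived from the 7-day row
# (50, 100 and 200 GB over 7 days). Assumptions for sizing only.
DAILY_GB = {
    "prometheus": 50 / 7,
    "loki":       100 / 7,
    "tempo":      200 / 7,
}

def estimate_storage_gb(retention_days: int) -> dict:
    """Linear storage estimate per component plus total, in GB."""
    est = {name: rate * retention_days for name, rate in DAILY_GB.items()}
    est["total"] = sum(est.values())
    return est

if __name__ == "__main__":
    for days in (7, 14, 30):
        print(days, "days:", round(estimate_storage_gb(days)["total"]), "GB")
```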
Performance Considerations
Resource Sizing
Local Development (Docker):
- Minimum: 8 GB RAM, 4 CPU cores
- Recommended: 16 GB RAM, 8 CPU cores
- Storage: 50 GB SSD
AWS Production (Per Service):
| Service | Memory | CPU | Storage (EFS) | Cost/Month |
|---|---|---|---|---|
| Alloy | 2 GB | 1 vCPU | - | ~$30 |
| Prometheus | 4 GB | 2 vCPU | 100 GB | ~$90 |
| Loki | 4 GB | 2 vCPU | 200 GB | ~$110 |
| Tempo | 4 GB | 2 vCPU | 400 GB | ~$150 |
| Grafana | 2 GB | 1 vCPU | 10 GB | ~$35 |
| Total | 16 GB | 8 vCPU | 710 GB | ~$415/month |
Query Performance
Prometheus Query Best Practices:
# Good: Specific time range
rate(http_requests_total[5m])

# Bad: Large time range (slow)
rate(http_requests_total[1d])

# Good: Pre-aggregated recording rules
job:http_requests:rate5m

# Good: Limit cardinality by aggregating away high-cardinality labels
sum by (service, status) (rate(http_requests_total[5m]))
Loki Query Best Practices:
# Good: Use labels for filtering
{service="svc-smarttab", level="error"} |= "payment failed"
# Bad: Full text search without labels (very slow)
{} |= "payment failed"
# Good: Time-bounded queries
count_over_time({service="svc-smarttab"}[5m])
Scaling Considerations
Horizontal Scaling (AWS):
- Prometheus: Use federation or remote write to multiple instances
- Loki: Deploy multiple ingesters with consistent hashing
- Tempo: Scale distributors independently from ingesters
- Grafana: Use RDS PostgreSQL for dashboard/user storage
Vertical Scaling:
- Increase ECS task memory/CPU when query latency increases
- Monitor memory usage: scale up at 80% utilization
- Monitor CPU usage: scale up at 70% utilization
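The vertical-scaling thresholds above can be encoded in a small helper for dashboards or automation. This is an illustrative sketch; only the 80% memory and 70% CPU thresholds come from this section, and in production the equivalent logic would normally live in ECS/CloudWatch target-tracking policies:

```python
MEMORY_SCALE_UP_PCT = 80.0  # scale up when memory utilization reaches 80%
CPU_SCALE_UP_PCT = 70.0     # scale up when CPU utilization reaches 70%

def should_scale_up(memory_pct: float, cpu_pct: float) -> bool:
    """True if either resource has crossed its scale-up threshold."""
    return memory_pct >= MEMORY_SCALE_UP_PCT or cpu_pct >= CPU_SCALE_UP_PCT

if __name__ == "__main__":
    print(should_scale_up(85.0, 40.0))  # memory-driven scale-up
    print(should_scale_up(50.0, 30.0))  # within limits
```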
See Also
Related Deployment:
- AWS Infrastructure - ECS cluster and networking
- Deployment Strategy - Deployment procedures
- Environments - Environment configurations
- CI/CD Pipeline - Automated deployments
Related Technical:
- Performance & Scalability - Detailed observability implementation
- Communication Patterns - Service integration
- Security Architecture - Security monitoring
Related Domains:
- Reporting & Analytics - Business metrics
- Payment Gateway - Payment processing metrics
Navigation: ↑ Back to Deployment Index | ← Previous: Environments | Next: CI/CD Pipeline →