# mpac-obs Observability Architecture

## Overview
mpac-obs is the centralized observability stack for the MPAC platform, providing:
- Logs - Aggregated via Loki
- Metrics - Stored in Prometheus
- Traces - Distributed tracing via Tempo
## Deployment Topology

### Where is Alloy Deployed?

Alloy runs inside the observability cluster (mpac-obs), **not** as sidecars in the application clusters.
```
+-----------------------------------------------------------------------------+
|                             APPLICATION CLUSTERS                            |
+------------------------+------------------------+---------------------------+
|    mpac-pgw cluster    | mpac-smartpos cluster  |      other clusters       |
|   +------------+       |   +------------+       |                           |
|   |  mpac-pgw  |       |   | svc-portal |       |                           |
|   |   :8080    |       |   |   :8002    |       |                           |
|   +-----+------+       |   +-----+------+       |                           |
|         |              |         |              |                           |
|   +-----+------+       |         |              |                           |
|   |svc-smarttab|       |         |              |                           |
|   |   :8081    |       |         |              |                           |
|   +-----+------+       |         |              |                           |
+---------+--------------+---------+--------------+---------------------------+
          |                        |
          |    OTLP (4317/4318)    |
          v                        v
+-----------------------------------------------------------------------------+
|                       MPAC-OBS CLUSTER (Observability)                      |
|                                                                             |
|   +-------------------------------------------------------------------+    |
|   |                         ALLOY (Collector)                         |    |
|   |                    :4317 (gRPC)   :4318 (HTTP)                    |    |
|   |   +-----------+    +-----------+    +-----------+                 |    |
|   |   |   OTLP    |    |   Batch   |    |  Export   |                 |    |
|   |   | Receiver  |--->| Processor |--->|  Router   |                 |    |
|   |   +-----------+    +-----------+    +-----+-----+                 |    |
|   +-------------------------------------------+-----------------------+    |
|                                               |                             |
|            +--------------------------+-------+----------------+            |
|            |                          |                        |            |
|            v                          v                        v            |
|  +-------------------+     +-------------------+     +-------------------+  |
|  |    PROMETHEUS     |     |       LOKI        |     |       TEMPO       |  |
|  |     (Metrics)     |     |      (Logs)       |     |     (Traces)      |  |
|  |       :9090       |     |       :3100       |     |       :3200       |  |
|  +---------+---------+     +---------+---------+     +---------+---------+  |
|            |                         |                         |            |
|            +-------------------------+-------------------------+            |
|                                      |                                      |
|                                      v                                      |
|                            +-------------------+                            |
|                            |      GRAFANA      |                            |
|                            |   (Dashboards)    |                            |
|                            |       :3000       |                            |
|                            +-------------------+                            |
+-----------------------------------------------------------------------------+
```

### Why Centralized Alloy (Not Sidecars)?
| Approach | Pros | Cons |
|---|---|---|
| Centralized (Current) | Single config, easier management, lower resource overhead | Single point of failure, network hop |
| Sidecar per service | Resilient, lower latency | Config sprawl, higher resource usage |
For MPAC platform scale (~15,000 RPS), centralized Alloy is sufficient. Consider sidecars only if:
- Network latency between clusters exceeds 10ms
- You need per-service buffering for reliability
- Service teams need independent collector configs
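The collector pipeline from the diagram above maps onto Alloy components roughly as follows. This is a hedged sketch, not the contents of `stack/alloy/config.alloy`: the component labels, TLS settings, and backend hostnames are illustrative assumptions; only the batch settings (5s / 1000 spans), the component names, and the remote-write/push paths echo what this document states elsewhere.

```alloy
// Sketch only: labels ("default", "tempo") and endpoints are assumptions.
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }
  output {
    metrics = [otelcol.processor.batch.default.input]
    logs    = [otelcol.processor.batch.default.input]
    traces  = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.batch "default" {
  timeout         = "5s"   // batch settings per the trace flow below
  send_batch_size = 1000
  output {
    metrics = [otelcol.exporter.prometheus.default.input]
    logs    = [otelcol.exporter.loki.default.input]
    traces  = [otelcol.exporter.otlp.tempo.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls { insecure = true }
  }
}

otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint { url = "http://prometheus:9090/api/v1/write" }
}

otelcol.exporter.loki "default" {
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint { url = "http://loki:3100/loki/api/v1/push" }
}
```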
## Data Flow

### 1. Traces (OTLP -> Tempo)

```
Application (OpenTelemetry SDK)
    |
    | OTLP gRPC/HTTP
    v
Alloy:4317/4318
    |
    | otelcol.receiver.otlp
    | otelcol.processor.batch (5s, 1000 spans)
    | otelcol.exporter.otlp
    v
Tempo:4317
    |
    | stores traces
    v
Grafana (TraceQL queries via :3200)
```

### 2. Metrics (OTLP -> Prometheus)
```
Application (OpenTelemetry SDK)
    |
    | OTLP metrics
    v
Alloy:4317/4318
    |
    | otelcol.receiver.otlp
    | otelcol.processor.batch
    | otelcol.exporter.prometheus
    | prometheus.remote_write
    v
Prometheus:9090/api/v1/write
    |
    | stores metrics (TSDB, 14d retention)
    v
Grafana (PromQL queries)
```

### 3. Logs (Two Paths)
#### Path A: OTLP Logs (from applications)

```
Application (OpenTelemetry SDK)
    |
    | OTLP logs
    v
Alloy:4317/4318
    |
    | otelcol.receiver.otlp
    | otelcol.exporter.loki
    v
Loki:3100/loki/api/v1/push
```

#### Path B: Docker Logs (container stdout/stderr) - Local Only
```
Docker Container (stdout/stderr)
    |
    | Docker socket
    v
Alloy (discovery.docker)
    |
    | loki.source.docker
    | loki.process.json_logs (parse JSON, extract fields)
    v
Loki:3100/loki/api/v1/push
```

## Service Discovery
### Local Development (Docker Compose)

Services communicate via Docker network DNS:

- `prometheus:9090`
- `loki:3100`
- `tempo:4317`
- `alloy:4317`

Alloy discovers containers via a read-only Docker socket mount:

```yaml
volumes:
  - /var/run/docker.sock:/var/run/docker.sock:ro
```

### AWS Production (ECS Fargate)
Services use AWS Cloud Map for DNS:

- `prometheus.mpac-obs.{env}.local:9090`
- `loki.mpac-obs.{env}.local:3100`
- `tempo.mpac-obs.{env}.local:4317`
- `alloy.mpac-obs.{env}.local:4317`

Application services connect to Alloy via:

```bash
OTEL_EXPORTER_OTLP_ENDPOINT=alloy.mpac-obs.{env}.local:4317
```

## Port Reference
| Service | Port | Protocol | Purpose |
|---|---|---|---|
| Alloy | 4317 | gRPC | OTLP receiver (apps send here) |
| Alloy | 4318 | HTTP | OTLP receiver (apps send here) |
| Alloy | 12345 | HTTP | Alloy UI/health |
| Prometheus | 9090 | HTTP | Metrics API, remote write |
| Loki | 3100 | HTTP | Log push/query API |
| Tempo | 3200 | HTTP | Trace query API (Grafana) |
| Tempo | 4317 | gRPC | OTLP receiver (internal, from Alloy) |
| Grafana | 3000 | HTTP | Dashboard UI |
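For local development, these ports are published by the Compose stack. A hedged excerpt of what the corresponding `docker-compose.yml` mappings could look like (illustrative only; see `stack/docker-compose.yml` for the real file):

```yaml
services:
  alloy:
    ports:
      - "4317:4317"    # OTLP gRPC receiver
      - "4318:4318"    # OTLP HTTP receiver
      - "12345:12345"  # Alloy UI/health
  grafana:
    ports:
      - "3000:3000"    # Dashboard UI
```

In AWS, the same ports are reachable via the Cloud Map DNS names listed under Service Discovery rather than host mappings.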
## Configuration Files

```
mpac-obs/
├── stack/
│   ├── alloy/
│   │   └── config.alloy          # Alloy pipeline configuration
│   ├── prometheus/
│   │   └── prometheus.yml        # Prometheus scrape config
│   ├── loki/
│   │   └── config.yaml           # Loki storage/ingestion config
│   ├── tempo/
│   │   └── config.yaml           # Tempo storage/receiver config
│   ├── grafana/
│   │   ├── provisioning/
│   │   │   └── datasources/      # Auto-configured datasources
│   │   └── dashboards/           # Pre-built dashboards
│   └── docker-compose.yml        # Local stack definition
└── ...
```

## Application Integration
### Go Services (mpac-pgw, svc-smarttab)

```go
import "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"

// Configure the OTLP trace exporter (ctx is a context.Context from the caller).
// Handle the error instead of discarding it.
exporter, err := otlptracegrpc.New(ctx,
	otlptracegrpc.WithEndpoint("localhost:4317"),
	otlptracegrpc.WithInsecure(),
)
if err != nil {
	return err
}
```

Environment variables:
```bash
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317                 # Local
OTEL_EXPORTER_OTLP_ENDPOINT=alloy.mpac-obs.dev.local:4317  # AWS
OTEL_SERVICE_NAME=mpac-pgw.backend
```

### Python Services (svc-portal)
```python
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    endpoint="localhost:4317",
    insecure=True,
)
```

## Correlation: Traces <-> Logs <-> Metrics
All telemetry is correlated via:
- trace_id / span_id - Links logs to specific traces
- service label - Groups all telemetry by service name
- request_id / corr_id - Application-level correlation
Alloy extracts these fields from JSON logs:

`trace_id`, `span_id`, `request_id`, `corr_id`, `merchant_id`, `store_id`

In Grafana:
- Click trace -> "Logs for this trace"
- Click log -> "View trace"
- Dashboard variables filter all panels by service
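The log-to-trace and trace-to-log links rely on Grafana datasource configuration. A hedged sketch of how the provisioned datasources could wire this up; the UIDs, regex, and URLs below are illustrative assumptions, not copied from `stack/grafana/provisioning/datasources/`:

```yaml
apiVersion: 1
datasources:
  - name: Loki
    uid: loki
    type: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        # Turns the trace_id field in a JSON log line into a "View trace" link.
        - name: TraceID
          matcherRegex: '"trace_id":"(\w+)"'
          datasourceUid: tempo
          url: '$${__value.raw}'
  - name: Tempo
    uid: tempo
    type: tempo
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        # "Logs for this trace" queries Loki filtered by the trace ID.
        datasourceUid: loki
        filterByTraceID: true
```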