Wallet Monitoring & Observability

Key Metrics

Business Metrics

| Metric | Description | Alert Threshold |
| --- | --- | --- |
| wallet.topup.total | Total top-ups per hour | N/A (trending) |
| wallet.topup.amount | Total amount topped up | N/A (trending) |
| wallet.auto_topup.success_rate | Auto top-up success percentage | < 95% |
| wallet.transaction.count | Transactions per minute | N/A (trending) |
| wallet.balance.total | Total wallet balances across system | N/A (trending) |

Technical Metrics

| Metric | Description | Alert Threshold |
| --- | --- | --- |
| wallet.api.latency_p99 | 99th percentile API latency | > 500ms |
| wallet.api.error_rate | API error percentage | > 1% |
| wallet.outbox.pending_count | Unprocessed outbox entries | > 100 |
| wallet.outbox.age_seconds | Age of oldest pending outbox entry | > 300s |
| wallet.pass.sync_lag | Time since last pass sync | > 3600s |
| wallet.queue.depth | BullMQ job queue depth | > 1000 |
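The thresholds listed above can be expressed as data and checked against a snapshot of current metric values. The following TypeScript sketch is illustrative only: the `ThresholdRule` type and the `breachedMetrics` function are hypothetical names, not part of the wallet service. Only the metric names and limits are taken from the tables.

```typescript
// Hypothetical rule shape: a metric breaches when it is above or below its limit.
type ThresholdRule = {
  metric: string;
  direction: "above" | "below";
  limit: number;
};

// Limits copied from the metric tables; units are noted per rule.
const rules: ThresholdRule[] = [
  { metric: "wallet.auto_topup.success_rate", direction: "below", limit: 95 }, // percent
  { metric: "wallet.api.latency_p99", direction: "above", limit: 500 }, // milliseconds
  { metric: "wallet.api.error_rate", direction: "above", limit: 1 }, // percent
  { metric: "wallet.outbox.pending_count", direction: "above", limit: 100 }, // entries
  { metric: "wallet.outbox.age_seconds", direction: "above", limit: 300 }, // seconds
  { metric: "wallet.pass.sync_lag", direction: "above", limit: 3600 }, // seconds
  { metric: "wallet.queue.depth", direction: "above", limit: 1000 }, // jobs
];

// Returns the names of all metrics in the snapshot that breach their rule.
function breachedMetrics(
  snapshot: Record<string, number>,
  ruleSet: ThresholdRule[] = rules,
): string[] {
  return ruleSet
    .filter((rule) => {
      const value = snapshot[rule.metric];
      if (value === undefined) return false; // metrics missing from the snapshot are skipped
      return rule.direction === "above" ? value > rule.limit : value < rule.limit;
    })
    .map((rule) => rule.metric);
}
```

Keeping the rules as a plain array means the trending-only metrics (marked N/A) simply have no entry and are never evaluated.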

Dashboards

Wallet Overview Dashboard

  • Total active wallets
  • Top-ups today (count and amount)
  • Auto top-up success rate (24h)
  • Transaction volume (by type)
  • Balance distribution histogram

Operations Dashboard

  • API latency (p50, p95, p99)
  • Error rate by endpoint
  • Outbox processing lag
  • Queue depths
  • Failed jobs

Pass Provider Dashboard

  • Passes installed (by provider)
  • Pass sync success rate
  • Push notification delivery rate
  • Pass uninstall rate

Alerting

Critical Alerts (PagerDuty)

| Alert | Condition | Runbook |
| --- | --- | --- |
| wallet-api-down | Health check fails 3x | See Operations team |
| wallet-outbox-stuck | Pending > 100 for 10min | See Operations team |
| wallet-balance-discrepancy | Reconciliation mismatch | Balance Discrepancy |

Warning Alerts (Slack)

| Alert | Condition | Action |
| --- | --- | --- |
| wallet-topup-failures-high | Auto top-up < 90% success | Investigate payment provider |
| wallet-api-slow | P99 > 300ms for 5min | Check database/cache |
| wallet-pass-sync-delayed | Sync lag > 1hr | Check pass provider status |
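Several of the conditions above only fire when a breach persists ("for 10min", "for 5min"). A minimal sketch of that "sustained condition" logic follows. The `SustainedCondition` class is a hypothetical helper for illustration; the actual alert rules presumably live in the monitoring platform rather than in application code.

```typescript
// Tracks whether a condition has held continuously for at least `holdMs`.
class SustainedCondition {
  private readonly isBreached: (value: number) => boolean;
  private readonly holdMs: number;
  private breachedSince: number | null = null;

  constructor(isBreached: (value: number) => boolean, holdMs: number) {
    this.isBreached = isBreached;
    this.holdMs = holdMs;
  }

  // Feed one sample; returns true once the breach has lasted holdMs or longer.
  observe(value: number, nowMs: number): boolean {
    if (!this.isBreached(value)) {
      this.breachedSince = null; // any healthy sample resets the timer
      return false;
    }
    if (this.breachedSince === null) {
      this.breachedSince = nowMs; // first breaching sample starts the timer
    }
    return nowMs - this.breachedSince >= this.holdMs;
  }
}

// wallet-outbox-stuck: pending > 100 for 10 minutes.
const outboxStuck = new SustainedCondition((pending) => pending > 100, 10 * 60 * 1000);

// wallet-api-slow: P99 > 300ms for 5 minutes.
const apiSlow = new SustainedCondition((p99Ms) => p99Ms > 300, 5 * 60 * 1000);
```

Passing the timestamp in explicitly, rather than reading the clock inside `observe`, keeps the logic deterministic and easy to test.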

Logging

Log Levels

| Level | Use For |
| --- | --- |
| ERROR | Unexpected failures, exceptions |
| WARN | Recoverable issues, degraded service |
| INFO | Significant business events |
| DEBUG | Detailed operation tracing |

Key Log Events

wallet.topup.initiated - Member started top-up
wallet.topup.completed - Top-up successful
wallet.topup.failed - Top-up failed (with reason)
wallet.auto_topup.triggered - Auto top-up threshold hit
wallet.transaction.posted - Transaction recorded
wallet.pass.installed - Digital card added
wallet.pass.updated - Pass synced with new data
wallet.reconcile.started - Reconciliation job started
wallet.reconcile.mismatch - Balance discrepancy found
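One way to emit these events so they remain searchable is as single-line JSON records. The sketch below is an assumption about the log shape, not the service's actual logger: `formatLogLine` and the field names `timestamp`, `level`, and `event` are hypothetical, and `reason: "card_declined"` is an invented sample value.

```typescript
// Levels from the Log Levels table.
type LogLevel = "ERROR" | "WARN" | "INFO" | "DEBUG";

// Builds one JSON log line. Field names are illustrative assumptions.
function formatLogLine(
  level: LogLevel,
  event: string,
  fields: Record<string, string | number | boolean>,
  now: Date = new Date(),
): string {
  // Caller fields are spread first so they cannot overwrite the reserved keys.
  return JSON.stringify({ ...fields, timestamp: now.toISOString(), level, event });
}

// Example: a failed top-up, logged with the member and a failure reason.
const failedTopupLine = formatLogLine("ERROR", "wallet.topup.failed", {
  memberId: "abc-123",
  reason: "card_declined", // hypothetical reason code
});
```

Returning the string instead of writing it leaves the choice of transport (stdout, a log shipper, a test buffer) to the caller.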

Log Search Queries

# Find all failed top-ups for a member
memberId="abc-123" level=ERROR wallet.topup.failed

# Auto top-up failures in last hour
wallet.auto_topup event=failed timestamp>=now-1h

# Reconciliation mismatches
wallet.reconcile.mismatch

Health Checks

Endpoints

| Endpoint | Purpose |
| --- | --- |
| GET /health/live | Kubernetes liveness |
| GET /health/ready | Kubernetes readiness |
| GET /health/wallet | Wallet-specific health |

Health Check Components

{
  "status": "healthy",
  "components": {
    "database": "healthy",
    "redis": "healthy",
    "paymentProvider": "healthy",
    "outboxProcessor": "healthy",
    "autoTopupWorker": "healthy"
  },
  "metrics": {
    "pendingOutbox": 5,
    "queueDepth": 12,
    "lastReconciliation": "2024-01-15T10:00:00Z"
  }
}
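The response above could be assembled from per-component checks along the following lines. This is a sketch under stated assumptions: the source only shows the value "healthy", so the "unhealthy" value, the rule that any failing component makes the overall status unhealthy, and the 200/503 mapping are all assumptions. `buildWalletHealth` and `httpStatusFor` are hypothetical names.

```typescript
// "unhealthy" is an assumed counterpart to the "healthy" value shown in the sample.
type ComponentStatus = "healthy" | "unhealthy";

interface WalletHealthMetrics {
  pendingOutbox: number;
  queueDepth: number;
  lastReconciliation: string; // ISO 8601 timestamp
}

interface WalletHealth {
  status: ComponentStatus;
  components: Record<string, ComponentStatus>;
  metrics: WalletHealthMetrics;
}

// Overall status is healthy only if every component is healthy (assumed rule).
function buildWalletHealth(
  components: Record<string, ComponentStatus>,
  metrics: WalletHealthMetrics,
): WalletHealth {
  const allHealthy = Object.keys(components).every(
    (name) => components[name] === "healthy",
  );
  return { status: allHealthy ? "healthy" : "unhealthy", components, metrics };
}

// Assumed mapping: 200 when healthy, 503 otherwise, so probes can act on the status code alone.
function httpStatusFor(health: WalletHealth): number {
  return health.status === "healthy" ? 200 : 503;
}
```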

Tracing

Trace Context

All wallet operations include trace context:

  • traceId - Distributed trace identifier
  • spanId - Current operation span
  • walletId - Wallet being operated on
  • memberId - Member identifier
  • tenantId - Tenant context
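The fields above can be modelled as a small context object that is created once per operation and copied into each child span. The sketch below is illustrative: `WalletTraceContext`, `startTrace`, and `childSpan` are hypothetical names, and the ID lengths (16-byte trace ID, 8-byte span ID, hex-encoded) follow the W3C Trace Context format, which the wallet service is only assumed to use.

```typescript
interface WalletTraceContext {
  traceId: string; // distributed trace identifier
  spanId: string; // current operation span
  walletId: string;
  memberId: string;
  tenantId: string;
}

// Returns `byteLength` random bytes as lowercase hex.
// Math.random keeps the sketch dependency-free; it is not cryptographically strong.
function randomHex(byteLength: number): string {
  let out = "";
  for (let i = 0; i < byteLength; i++) {
    const byte = Math.floor(Math.random() * 256);
    out += ("0" + byte.toString(16)).slice(-2);
  }
  return out;
}

// Starts a new trace for a wallet operation.
function startTrace(ids: {
  walletId: string;
  memberId: string;
  tenantId: string;
}): WalletTraceContext {
  return { traceId: randomHex(16), spanId: randomHex(8), ...ids };
}

// Creates a child span: same trace and wallet identifiers, new span ID.
function childSpan(parent: WalletTraceContext): WalletTraceContext {
  return { ...parent, spanId: randomHex(8) };
}
```

A production service would more likely take these IDs from its tracing library than generate them by hand; the point here is only which fields travel together.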

Key Trace Spans

Runbooks

Quick reference for common issues: