Wallet Monitoring & Observability
Key Metrics
Business Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| wallet.topup.total | Total top-ups per hour | N/A (trending) |
| wallet.topup.amount | Total amount topped up | N/A (trending) |
| wallet.auto_topup.success_rate | Auto top-up success percentage | < 95% |
| wallet.transaction.count | Transactions per minute | N/A (trending) |
| wallet.balance.total | Total wallet balances across system | N/A (trending) |
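`wallet.auto_topup.success_rate` is a ratio of two counts over a window. The sketch below shows how the percentage and its < 95% alert check might be computed. It is a minimal illustration that assumes hypothetical `attempted` and `succeeded` counters; these are not the service's actual metric API.

```typescript
// Hypothetical counter snapshot for one window (e.g. the last 24h).
interface AutoTopupCounts {
  attempted: number;
  succeeded: number;
}

// Success rate as a percentage. Returns null for an empty window so that
// "no auto top-ups ran" is not reported as a 0% success rate.
function autoTopupSuccessRate(c: AutoTopupCounts): number | null {
  if (c.attempted === 0) return null;
  // Multiply before dividing so whole-percent results stay exact.
  return (c.succeeded * 100) / c.attempted;
}

// Alert threshold from the Business Metrics table: fire below 95%.
const AUTO_TOPUP_ALERT_BELOW = 95;

function autoTopupAlert(c: AutoTopupCounts): boolean {
  const rate = autoTopupSuccessRate(c);
  return rate !== null && rate < AUTO_TOPUP_ALERT_BELOW;
}

console.log(autoTopupSuccessRate({ attempted: 200, succeeded: 188 })); // 94
console.log(autoTopupAlert({ attempted: 200, succeeded: 188 })); // true
```

Returning `null` instead of 0 for an idle window is a design choice. It keeps quiet periods from looking like a total outage.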
Technical Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| wallet.api.latency_p99 | 99th percentile API latency | > 500ms |
| wallet.api.error_rate | API error percentage | > 1% |
| wallet.outbox.pending_count | Unprocessed outbox entries | > 100 |
| wallet.outbox.age_seconds | Oldest pending outbox entry | > 300s |
| wallet.pass.sync_lag | Time since last pass sync | > 3600s |
| wallet.queue.depth | BullMQ job queue depth | > 1000 |
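The sketch below shows how three of these technical metrics could be derived from raw data and checked against their thresholds. It is an illustration only, and it makes three assumptions:

- It uses nearest-rank percentiles over a batch of latency samples.
- The outbox row shape (`createdAtMs`, `processed`) is hypothetical.
- A production setup would more likely compute p99 from histogram buckets in the metrics backend.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are <= it. Callers must pass a non-empty array.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Hypothetical outbox row; only creation time and processed flag matter here.
interface OutboxEntry {
  id: string;
  createdAtMs: number;
  processed: boolean;
}

// wallet.outbox.age_seconds: age of the oldest unprocessed entry (0 if none).
function outboxAgeSeconds(entries: OutboxEntry[], nowMs: number): number {
  const pending = entries.filter((e) => !e.processed);
  if (pending.length === 0) return 0;
  const oldest = Math.min(...pending.map((e) => e.createdAtMs));
  return Math.floor((nowMs - oldest) / 1000);
}

// Thresholds copied from the Technical Metrics table.
const THRESHOLDS = { latencyP99Ms: 500, outboxPendingCount: 100, outboxAgeSeconds: 300 };

// Returns the names of metrics currently above their alert threshold.
function technicalBreaches(latenciesMs: number[], outbox: OutboxEntry[], nowMs: number): string[] {
  const breaches: string[] = [];
  if (latenciesMs.length > 0 && percentile(latenciesMs, 99) > THRESHOLDS.latencyP99Ms) {
    breaches.push("wallet.api.latency_p99");
  }
  if (outbox.filter((e) => !e.processed).length > THRESHOLDS.outboxPendingCount) {
    breaches.push("wallet.outbox.pending_count");
  }
  if (outboxAgeSeconds(outbox, nowMs) > THRESHOLDS.outboxAgeSeconds) {
    breaches.push("wallet.outbox.age_seconds");
  }
  return breaches;
}
```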
Dashboards
Wallet Overview Dashboard
- Total active wallets
- Top-ups today (count and amount)
- Auto top-up success rate (24h)
- Transaction volume (by type)
- Balance distribution histogram
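A balance distribution histogram counts wallets per balance range. The sketch below shows the bucketing step. The bucket edges are purely hypothetical, because the document does not specify them, and balances are assumed to be integer minor units (e.g. cents).

```typescript
// Hypothetical bucket edges in minor currency units (cents).
const BUCKET_EDGES = [0, 1_000, 5_000, 10_000, 50_000];

// Counts balances into [edge_i, edge_{i+1}) buckets; the last bucket is
// open-ended. Negative balances fall below the first edge and are skipped.
function balanceHistogram(balances: number[], edges: number[] = BUCKET_EDGES): Map<string, number> {
  const labels = edges.map((lo, i) => (i + 1 < edges.length ? `${lo}-${edges[i + 1]}` : `${lo}+`));
  const counts = new Map<string, number>();
  for (const label of labels) counts.set(label, 0);
  for (const b of balances) {
    // Find the last edge that is <= the balance.
    let idx = -1;
    for (let i = 0; i < edges.length; i++) if (b >= edges[i]) idx = i;
    if (idx >= 0) counts.set(labels[idx], (counts.get(labels[idx]) ?? 0) + 1);
  }
  return counts;
}
```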
Operations Dashboard
- API latency (p50, p95, p99)
- Error rate by endpoint
- Outbox processing lag
- Queue depths
- Failed jobs
Pass Provider Dashboard
- Passes installed (by provider)
- Pass sync success rate
- Push notification delivery rate
- Pass uninstall rate
Alerting
Critical Alerts (PagerDuty)
| Alert | Condition | Runbook |
|---|---|---|
| wallet-api-down | Health check fails 3x | See Operations team |
| wallet-outbox-stuck | Pending > 100 for 10min | See Operations team |
| wallet-balance-discrepancy | Reconciliation mismatch | Balance Discrepancy |
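"Health check fails 3x" is read here as three consecutive failures. That reading is an assumption: the condition could also mean three failures within a window. A minimal sketch of a consecutive-failure counter for `wallet-api-down`, where the class name is hypothetical:

```typescript
// Fires once `threshold` health checks in a row have failed; any success
// resets the streak so a single flaky probe never pages.
class ConsecutiveFailureAlert {
  private failures = 0;
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Record one health-check result; returns true while the alert should fire.
  record(healthy: boolean): boolean {
    this.failures = healthy ? 0 : this.failures + 1;
    return this.failures >= this.threshold;
  }
}

const walletApiDown = new ConsecutiveFailureAlert(3);
```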
Warning Alerts (Slack)
| Alert | Condition | Action |
|---|---|---|
| wallet-topup-failures-high | Auto top-up < 90% success | Investigate payment provider |
| wallet-api-slow | P99 > 300ms for 5min | Check database/cache |
| wallet-pass-sync-delayed | Sync lag > 1hr | Check pass provider status |
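Conditions such as "P99 > 300ms for 5min" (and "Pending > 100 for 10min" above) fire only when the breach persists. This is similar in spirit to the `for:` clause of a Prometheus alerting rule. Below is a minimal sketch, assuming the condition is evaluated periodically. The class and helper names are hypothetical.

```typescript
// Tracks whether a condition has held at every evaluation for at least holdMs.
class SustainedCondition {
  private since: number | null = null;
  private readonly holdMs: number;

  constructor(holdMs: number) {
    this.holdMs = holdMs;
  }

  // Feed one evaluation; returns true once the condition has stayed true
  // continuously for holdMs or longer.
  observe(conditionTrue: boolean, nowMs: number): boolean {
    if (!conditionTrue) {
      this.since = null; // a single passing evaluation resets the clock
      return false;
    }
    if (this.since === null) this.since = nowMs;
    return nowMs - this.since >= this.holdMs;
  }
}

// wallet-api-slow: P99 > 300ms for 5 minutes.
const apiSlow = new SustainedCondition(5 * 60 * 1000);
const evaluateApiSlow = (p99Ms: number, nowMs: number): boolean => apiSlow.observe(p99Ms > 300, nowMs);
```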
Logging
Log Levels
| Level | Use For |
|---|---|
| ERROR | Unexpected failures, exceptions |
| WARN | Recoverable issues, degraded service |
| INFO | Significant business events |
| DEBUG | Detailed operation tracing |
Key Log Events
wallet.topup.initiated - Member started top-up
wallet.topup.completed - Top-up successful
wallet.topup.failed - Top-up failed (with reason)
wallet.auto_topup.triggered - Auto top-up threshold hit
wallet.transaction.posted - Transaction recorded
wallet.pass.installed - Digital card added
wallet.pass.updated - Pass synced with new data
wallet.reconcile.started - Reconciliation job started
wallet.reconcile.mismatch - Balance discrepancy found
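Emitting these events as structured JSON lets the search queries in the next section filter on fields such as `memberId`, `level`, and `event`. The sketch below shows one way to format such a line. It is illustrative, not the service's logger. The field layout and the `reason` value are assumptions, and the correlation fields mirror the trace context listed under Tracing.

```typescript
type LogLevel = "ERROR" | "WARN" | "INFO" | "DEBUG";

// Event names from the Key Log Events list.
type WalletLogEvent =
  | "wallet.topup.initiated"
  | "wallet.topup.completed"
  | "wallet.topup.failed"
  | "wallet.auto_topup.triggered"
  | "wallet.transaction.posted"
  | "wallet.pass.installed"
  | "wallet.pass.updated"
  | "wallet.reconcile.started"
  | "wallet.reconcile.mismatch";

// Correlation fields attached to every line.
interface LogContext {
  traceId?: string;
  memberId?: string;
  walletId?: string;
  tenantId?: string;
}

// Builds one JSON log line. The fixed keys are spread last so that
// caller-supplied fields cannot overwrite timestamp, level, or event.
function formatLogLine(
  level: LogLevel,
  event: WalletLogEvent,
  ctx: LogContext,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({ ...ctx, ...fields, timestamp: now.toISOString(), level, event });
}

// Hypothetical failure reason, for illustration only.
console.log(formatLogLine("ERROR", "wallet.topup.failed", { memberId: "abc-123" }, { reason: "card_declined" }));
```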
Log Search Queries
```
# Find all failed top-ups for a member
memberId="abc-123" level=ERROR wallet.topup.failed
# Auto top-up failures in last hour
wallet.auto_topup event=failed timestamp>=now-1h
# Reconciliation mismatches
wallet.reconcile.mismatch
```
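The query syntax above belongs to the log platform, which this document does not name. The sketch below expresses the same filters over parsed JSON records, which can help when working with exported logs. The record shape is an assumption.

```typescript
// Assumed shape of a parsed structured log record.
interface LogRecord {
  timestamp: string;
  level: string;
  event: string;
  memberId?: string;
  [field: string]: unknown;
}

// Equivalent of: memberId="abc-123" level=ERROR wallet.topup.failed
function failedTopupsForMember(records: LogRecord[], memberId: string): LogRecord[] {
  return records.filter(
    (r) => r.memberId === memberId && r.level === "ERROR" && r.event === "wallet.topup.failed",
  );
}

// Records whose event starts with `eventPrefix` and whose timestamp is within
// the last `windowMs` milliseconds (3_600_000 corresponds to "now-1h").
function recentEvents(records: LogRecord[], eventPrefix: string, windowMs: number, nowMs: number): LogRecord[] {
  const since = nowMs - windowMs;
  return records.filter((r) => r.event.startsWith(eventPrefix) && Date.parse(r.timestamp) >= since);
}
```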
Health Checks
Endpoints
| Endpoint | Purpose |
|---|---|
| GET /health/live | Kubernetes liveness |
| GET /health/ready | Kubernetes readiness |
| GET /health/wallet | Wallet-specific health |
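The three endpoints answer different questions:

- **Liveness** asks whether the process should be restarted.
- **Readiness** asks whether it should receive traffic.
- **Wallet** returns a detailed report.

The sketch below expresses this routing as a pure function, independent of any HTTP framework. It rests on two assumptions the document does not state: readiness is gated on the database and Redis, and `/health/wallet` returns 503 only when a component is `unhealthy`.

```typescript
type ComponentStatus = "healthy" | "degraded" | "unhealthy";

interface HealthResponse {
  httpStatus: number;
  body: Record<string, unknown>;
}

// Assumption: only these components gate readiness.
const READINESS_DEPENDENCIES = ["database", "redis"];

function handleHealth(path: string, components: Record<string, ComponentStatus>): HealthResponse {
  switch (path) {
    case "/health/live":
      // Liveness deliberately ignores dependencies: failing it during a
      // database outage would make Kubernetes restart otherwise-fine pods.
      return { httpStatus: 200, body: { status: "healthy" } };
    case "/health/ready": {
      const ready = READINESS_DEPENDENCIES.every((name) => components[name] === "healthy");
      return { httpStatus: ready ? 200 : 503, body: { status: ready ? "healthy" : "unhealthy" } };
    }
    case "/health/wallet": {
      const anyDown = Object.values(components).includes("unhealthy");
      return { httpStatus: anyDown ? 503 : 200, body: { components } };
    }
    default:
      return { httpStatus: 404, body: { error: "not found" } };
  }
}
```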
Health Check Components
```json
{
  "status": "healthy",
  "components": {
    "database": "healthy",
    "redis": "healthy",
    "paymentProvider": "healthy",
    "outboxProcessor": "healthy",
    "autoTopupWorker": "healthy"
  },
  "metrics": {
    "pendingOutbox": 5,
    "queueDepth": 12,
    "lastReconciliation": "2024-01-15T10:00:00Z"
  }
}
```
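The top-level `status` in this payload summarises the component statuses. The sketch below derives it as the worst status of any component. It assumes two non-healthy values, `degraded` and `unhealthy`; only `healthy` appears in the example above, so those extra states are hypothetical.

```typescript
type ComponentStatus = "healthy" | "degraded" | "unhealthy";

interface HealthMetrics {
  pendingOutbox: number;
  queueDepth: number;
  lastReconciliation: string; // ISO-8601 timestamp
}

interface WalletHealthReport {
  status: ComponentStatus;
  components: Record<string, ComponentStatus>;
  metrics: HealthMetrics;
}

// Severity order used to pick the worst component status.
const SEVERITY: Record<ComponentStatus, number> = { healthy: 0, degraded: 1, unhealthy: 2 };

// Overall status = the most severe status reported by any component.
function buildHealthReport(components: Record<string, ComponentStatus>, metrics: HealthMetrics): WalletHealthReport {
  let status: ComponentStatus = "healthy";
  for (const s of Object.values(components)) {
    if (SEVERITY[s] > SEVERITY[status]) status = s;
  }
  return { status, components, metrics };
}
```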
Tracing
Trace Context
All wallet operations include trace context:
- traceId - Distributed trace identifier
- spanId - Current operation span
- walletId - Wallet being operated on
- memberId - Member identifier
- tenantId - Tenant context
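A child operation keeps the parent's `traceId` and wallet/member/tenant fields but gets a new `spanId`. The sketch below illustrates that relationship and serialises the IDs as a W3C Trace Context `traceparent` header (`00-<32 hex trace-id>-<16 hex span-id>-<flags>`). It makes two assumptions:

- It uses `Math.random`, which is not cryptographically secure, for illustration only. In practice a tracing SDK such as OpenTelemetry would generate IDs and propagate the header.
- The `01` (sampled) flag is hardcoded.

```typescript
interface WalletTraceContext {
  traceId: string; // 32 lowercase hex chars
  spanId: string; // 16 lowercase hex chars
  walletId?: string;
  memberId?: string;
  tenantId?: string;
}

// Random lowercase hex string of `bytes` bytes (illustrative, not secure).
function randomHex(bytes: number): string {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return out;
}

// New span within the same trace: copy all context, replace only spanId.
function childSpan(parent: WalletTraceContext): WalletTraceContext {
  return { ...parent, spanId: randomHex(8) };
}

// W3C traceparent header: version 00, trace-id, parent span-id, sampled flag.
function toTraceparent(ctx: WalletTraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-01`;
}
```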
Key Trace Spans
Runbooks
Quick reference for common issues: