
Infrastructure Audit: Shared Infrastructure

RBAC | Messaging | Audit Log | Multi-Tenancy | Agent System
2026-03-22 | Spec 098-business-dag-audit
Legend: Pass (fully covered) | Partial (gaps exist) | Fail (missing / broken)

Overall Coverage

Total Checks: 30 | Pass: 18 | Partial: 9 | Fail: 3

1. RBAC & Permission System

All protected API routes must use permission-middleware.js (not the deprecated auth-middleware.js). The middleware checks resource:action pairs via requirePermission / requireAnyPermission / requireAllPermissions.
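The guard pattern described above can be sketched roughly as follows. This is an illustration only, not the contents of permission-middleware.js: the three function names come from the audit, while the shape of req.user, the permissions array, and the response bodies are assumptions.

```javascript
// Hypothetical sketch of resource:action guards for Express-style middleware.
// Assumption: an earlier auth step has set req.user.permissions to an array
// of 'resource:action' strings.
function hasPermission(user, permission) {
  return Array.isArray(user.permissions) && user.permissions.includes(permission);
}

// Builds a middleware from a predicate over the authenticated user.
function guard(predicate) {
  return function permissionGuard(req, res, next) {
    if (!req.user) {
      return res.status(401).json({ error: 'unauthenticated' });
    }
    if (!predicate(req.user)) {
      return res.status(403).json({ error: 'forbidden' });
    }
    return next();
  };
}

const requirePermission = (permission) =>
  guard((user) => hasPermission(user, permission));

const requireAnyPermission = (permissions) =>
  guard((user) => permissions.some((p) => hasPermission(user, p)));

const requireAllPermissions = (permissions) =>
  guard((user) => permissions.every((p) => hasPermission(user, p)));

// Usage with an Express-style router (not executed here):
//   router.get('/appointments', requirePermission('appointments:read'), handler);
```

Keeping all three variants on one predicate-based helper means the 401/403 behavior is defined in a single place.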

655 permission middleware calls
77 API files with RBAC guards
~1197 total route definitions (153 files)
Check Item | Verification Result
Core API RBAC coverage | 77 of ~106 non-public API files have requirePermission guards. The major modules (appointments, employees, schedules, marketing, reports, admin, stores, payroll, loyalty, settings) are all covered.
Payment routes permission guard | payment/index.js applies createExplicitRoutePermissionGuard with 40+ explicit route-permission rules. All JWT-protected payment routes pass through this guard before handler execution.
POS device authentication | POS device routes (pos-device-routes.js) use authenticateDevice() from the device-auth.js middleware. They are intentionally placed before the permission guard because they use device-token auth, not JWT.
Public routes intentionally unprotected | api/public/booking.js (21 routes) and api/public/ipad-showcase.js (1 route) are intentionally public. They are protected by rate limiters (publicGeneralLimiter, publicSlotsLimiter, publicConfirmLimiter) and by an optional guest JWT where needed.
Webhook routes intentionally unprotected | Webhook endpoints (webhooks/stripe.js, webhooks/twilio-sms.js, webhooks/resend-email.js) are validated by provider-specific signature verification (Stripe signature, Twilio signature). No user JWT is needed.
Agent API RBAC coverage | api/agent/content.js (8), api/agent/seo.js (9), api/agent/oauth.js (5), and api/agent/review.js (7) all have requirePermission guards.
Guest auth routes | api/guest-auth.js has 16 route definitions, but only 4 use tenantDb/platformDb. Guest auth routes are intentionally public-facing (login, register, send-code), but some internal management endpoints may lack permission checks. Needs manual review.
Deprecated auth-middleware usage | auth-middleware.js still exports permission-related methods (with deprecation warnings). A codebase search shows that most files have migrated, but legacy imports remain in a few modules.
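A centralized rule table like the one reported for the payment routes could look roughly like this. The function name createExplicitRoutePermissionGuard is taken from the audit; its signature, the rule shape, the example paths and permissions, and the default-deny behavior for unmatched routes are all assumptions made for illustration.

```javascript
// Hypothetical rule table: method, path pattern, and permission are illustrative.
const RULES = [
  { method: 'POST', pattern: /^\/refunds\/?$/, permission: 'payments:refund' },
  { method: 'GET', pattern: /^\/transactions\/[^/]+$/, permission: 'payments:read' },
];

// Assumed signature: takes a rule list and returns Express-style middleware.
function createExplicitRoutePermissionGuard(rules) {
  return function explicitGuard(req, res, next) {
    const rule = rules.find(
      (r) => r.method === req.method && r.pattern.test(req.path)
    );
    if (!rule) {
      // Assumption: routes without an explicit rule are denied by default.
      return res.status(403).json({ error: 'no permission rule for route' });
    }
    const granted = (req.user && req.user.permissions) || [];
    if (!granted.includes(rule.permission)) {
      return res.status(403).json({ error: 'forbidden', required: rule.permission });
    }
    return next();
  };
}
```

With default deny, a newly added payment route fails closed until someone writes a rule for it, which is one reason to prefer an explicit table on financial paths.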

2. Messaging & Circuit Breakers

All external service calls must be wrapped with CircuitBreaker. Breakers must not be bypassed. Multi-provider services must auto-failover when primary circuit opens.
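For readers unfamiliar with the pattern, a breaker of the kind this section audits can be sketched as below. The class name and the isOpen() method appear in the audit; the constructor options, the fire() method, and the thresholds are assumptions, and the project's real CircuitBreaker may behave differently.

```javascript
// Minimal circuit breaker sketch (assumed API, not the project's implementation).
class CircuitBreaker {
  constructor(name, { failureThreshold = 3, resetTimeoutMs = 30000, now = Date.now } = {}) {
    this.name = name;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock, useful in tests
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  // Open: calls are rejected. Once resetTimeoutMs has elapsed, trial calls
  // are allowed again (a simplified half-open state).
  isOpen() {
    return this.openedAt !== null && this.now() - this.openedAt < this.resetTimeoutMs;
  }

  // Runs fn through the breaker and records the outcome.
  async fire(fn) {
    if (this.isOpen()) {
      throw new Error(`circuit '${this.name}' is open`);
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) {
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

// Usage (not executed here): wrap every outbound call to the provider.
//   const twilioBreaker = new CircuitBreaker('twilio');
//   await twilioBreaker.fire(() => sendViaTwilio(message)); // sendViaTwilio is hypothetical
```

"Must not be bypassed" then means that no code path calls the provider directly instead of going through fire().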

16 files with CircuitBreaker
4 messaging clients with send-guard
13+ registered breakers
Check Item | Verification Result
SMS circuit breakers | twilio-client.js has twilioBreaker (CircuitBreaker 'twilio'). telnyx-client.js has telnyxBreaker (CircuitBreaker 'telnyx'). Both are exported and checked in sms-sender.js for failover.
SMS auto-failover | getActiveClient() in sms-sender.js checks both readiness and breaker.isOpen(). If the preferred provider's circuit is open, it automatically falls back to the alternate provider.
Email circuit breakers | ses-client.js has sesBreaker (CircuitBreaker 'ses'). resend-client.js has resendBreaker (CircuitBreaker 'resend').
Email auto-failover | email-sender.js only initializes sesClient. Unlike the SMS sender, it has no automatic failover from SES to Resend when the SES circuit opens. The Resend client exists but is not wired into the high-level sender as a fallback.
Send-guard coverage (SMS) | Both twilio-client.js (line 203) and telnyx-client.js (line 215) call checkPhone() from send-guard.js before sending. All SMS sends go through the guard.
Send-guard coverage (Email) | Both ses-client.js (line 290) and resend-client.js (line 169) call checkEmail() from send-guard.js before sending. All email sends go through the guard.
AI service circuit breakers | All AI services are wrapped: ai-service.js ('openai-email-ai'), deepseek-template-service.js ('deepseek-template-ai'), insight-llm-service.js ('insight-llm'), llm-processor.js ('openai-voice-processor'), intent-handler.js ('openai-voice-intent').
Acquisition service circuit breakers | gbp-sync-service.js ('gbp_api'), gbp-review-poller.js ('gbp_api'), review-reply-service.js ('anthropic-sonnet'), content-publisher.js ('meta_api' + 'gbp_posts_api'). All external API calls are wrapped.
Voice/Voicebot circuit breakers | retell-client.js ('retell-ai'), llm-processor.js ('openai-voice-processor'). External voice AI calls are wrapped.
Holiday API circuit breaker | holidayService.js has a nager-holiday-api breaker for external Nager.Date API calls.
CodePay payment gateway circuit breaker | pos-payment-service.js and codepay-query-service.js make HTTP calls to the CodePay API without any CircuitBreaker wrapper. This is a constitution violation on a critical payment path.
Circuit breaker registry | circuit-breaker-registry.js provides a global registry with getAllStatuses(), hasOpenCircuit(), and getOpenCircuits(). It pre-registers gbp_api, anthropic_sonnet, and google_places. Monitoring endpoint: /api/monitoring/circuits.
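The failover rule reported for the SMS sender (skip a provider that is not ready or whose circuit is open) can be sketched as a small selection function. The name getActiveClient and the provider names come from the audit; the provider record shape, the isReady() method, and the makeProvider helper are hypothetical. The second call shows how the same selection could in principle cover SES and Resend, which the audit says is not wired up today.

```javascript
// Returns the preferred provider if usable, otherwise the first usable alternate.
function getActiveClient(providers, preferredName) {
  const ordered = [
    ...providers.filter((p) => p.name === preferredName),
    ...providers.filter((p) => p.name !== preferredName),
  ];
  return ordered.find((p) => p.isReady() && !p.breaker.isOpen()) || null;
}

// Illustrative provider records; real clients would expose their own breaker.
function makeProvider(name, { ready = true, open = false } = {}) {
  return { name, isReady: () => ready, breaker: { isOpen: () => open } };
}

const smsProviders = [makeProvider('twilio', { open: true }), makeProvider('telnyx')];
const emailProviders = [makeProvider('ses', { open: true }), makeProvider('resend')];

console.log(getActiveClient(smsProviders, 'twilio').name); // telnyx
console.log(getActiveClient(emailProviders, 'ses').name); // resend
```

Returning null when every provider is unavailable forces the caller to decide explicitly between queueing, retrying, and failing the send.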

3. Audit Log System

Critical operations (payment, refund, permission changes, auth events) must be logged to the audit trail. Global audit middleware captures write operations automatically.
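A global middleware of this kind might look roughly like the sketch below. The name createAuditMiddleware and the { resource, action, summary } metadata come from the audit; the options object, the 'METHOD path' key format, the example route, the sink callback, and the req.user / req.tenant field names are assumptions.

```javascript
const WRITE_METHODS = new Set(['POST', 'PUT', 'PATCH', 'DELETE']);

// Hypothetical excerpt of a route map keyed by 'METHOD path'.
const AUDIT_ROUTE_MAP = {
  'POST /api/payment/refund': { resource: 'payment', action: 'refund', summary: 'Refund issued' },
};

// Assumed signature: routeMap supplies metadata, sink persists the entry.
function createAuditMiddleware({ routeMap, sink }) {
  return function auditMiddleware(req, res, next) {
    if (!WRITE_METHODS.has(req.method)) {
      return next(); // reads are not audited
    }
    // Log once the response has finished so the status code is known.
    res.on('finish', () => {
      const meta = routeMap[`${req.method} ${req.path}`] || {
        resource: 'unknown',
        action: req.method.toLowerCase(),
        summary: '',
      };
      sink({
        ...meta,
        userId: req.user ? req.user.id : null,
        tenantId: req.tenant ? req.tenant.id : null,
        statusCode: res.statusCode,
        at: new Date().toISOString(),
      });
    });
    return next();
  };
}
```

A generic capture like this records that a write happened, but it cannot know domain details such as a refund amount, which is why explicit audit calls are still worth adding on financial routes.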

457 routes in audit-route-map
51 files reference auditLogger
Check Item | Verification Result
Global audit middleware | server.js (line 1159) loads createAuditMiddleware from middleware/audit-middleware.js and applies it globally to all routes. It automatically logs POST/PUT/PATCH/DELETE operations.
Audit route map completeness | config/audit-route-map.js maps 457 write operations with explicit { resource, action, summary } metadata. It covers auth, appointments, payments, permissions, employees, stores, marketing, and more.
Permission change audit | services/permission-audit-service.js and api/admin/permissions.js both reference audit logging. Role and permission changes are tracked.
Payment/refund explicit audit | Payment routes in api/payment/ have no explicit auditLogger calls and rely entirely on auto-capture by the global audit middleware. This works, but explicit audit calls with enriched context (amount, transaction ID, refund reason) would give a higher-quality audit trail for financial operations.
Audit alert system | services/audit/audit-alert-service.js and jobs/auditAlertJob.js provide automated alerting on suspicious audit patterns. The admin/audit-alerts.js API handles alert management.
Tenant-level audit isolation | services/audit/tenant-audit.js ensures audit logs are stored within tenant schemas. Schema isolation prevents cross-tenant audit leakage.

4. Multi-Tenancy & Data Access Layer

All database queries must use the unified data access layer (tenantDb.query / platformDb.query). Legacy pool.query calls violate tenant isolation guarantees.
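To make the isolation argument concrete, here is one way a tenant-scoped query helper could work. It assumes PostgreSQL with one schema per tenant and a node-postgres-style pool (connect(), client.query(), client.release()); the audit does not state the database engine, and createTenantDb and the queryWithTenant signature shown here are hypothetical.

```javascript
// Only simple identifiers are accepted, so the schema name can be interpolated safely.
const SCHEMA_NAME = /^[a-z_][a-z0-9_]*$/;

// Hypothetical factory; assumes a node-postgres-style pool and PostgreSQL.
function createTenantDb(pool) {
  return {
    async queryWithTenant(tenantSchema, sql, params = []) {
      if (!SCHEMA_NAME.test(tenantSchema)) {
        throw new Error(`invalid tenant schema: ${tenantSchema}`);
      }
      const client = await pool.connect();
      try {
        await client.query('BEGIN');
        // SET LOCAL lasts only until the end of this transaction, so the
        // search_path cannot leak to the next user of the pooled connection.
        await client.query(`SET LOCAL search_path TO "${tenantSchema}"`);
        const result = await client.query(sql, params);
        await client.query('COMMIT');
        return result;
      } catch (err) {
        await client.query('ROLLBACK');
        throw err;
      } finally {
        client.release();
      }
    },
  };
}

// Usage in a background job (not executed here):
//   const tenantDb = createTenantDb(pool);
//   await tenantDb.queryWithTenant('tenant_42', 'SELECT id FROM appointments WHERE id = $1', [id]);
```

A bare pool.query call has no such scoping step, so whichever schema the connection happens to resolve decides which tenant's rows are read or written.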

1849 tenantDb/platformDb calls (213 files)
103 files with legacy pool.query
9 API files still using pool.query
Check Item | Verification Result
Unified data access layer exists | database/data-access/ provides tenantDb and platformDb with schema-aware query routing, transaction support, and queryWithTenant() for background jobs. There are 1849 usages across 213 files.
Legacy pool.query elimination | 103 non-test files still use pool.query directly. Breakdown: 9 API files, ~40 service files, ~16 job files, plus scripts, middleware, and utilities. Major offenders include sms-sender.js (4 calls), email-sender.js (4 calls), booking-service.js, sync-engine.js (21 calls), and multiple loyalty services.
API layer migration | 9 API files still use pool.query: public/booking.js (64 calls, the largest offender), dev.js (9), voice/outbound.js (9), voice/streaming-stt.js (7), logs.js (6), webhooks/resend-email.js (5), webhooks/twilio-sms.js (2), tenant-activate.js (1), tenants-validate.js (1).
Service layer migration | 40 service files still use pool.query, including critical paths: booking-service.js, sms-sender.js, email-sender.js, sync-engine.js, checkinService.js, multiple loyalty services, and voice services. These risk cross-tenant data leakage.
Background job migration | 16 job files use pool.query, including appointmentReminderJob.js, benchmarkCalculationJob.js, subscriptionExpirationJob.js, and trial-expiration-job.js (3 calls). Jobs should use tenantDb.queryWithTenant().
Tenant context middleware | middleware/tenant-context.js extracts the tenant from the JWT and injects it into req.tenant. It is applied globally before route handlers.

5. Agent System

Agent routing is implemented in store-network.js and orchestrator-network.js. Routing has three levels: a deterministic ROUTE_MAP, SOP escalation rules, and LLM classification (currently a stub). Escalation is handled by the escalation.js utility.
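The three routing levels can be illustrated with a small resolver. The checkout.completed to review_agent mapping, the agent names, and the daily-reconciliation SOP name appear in the audit; the other event name, the rule shape, the event fields, and the function names are hypothetical.

```javascript
// Level 1: deterministic event-to-agent map.
const ROUTE_MAP = new Map([
  ['checkout.completed', 'review_agent'],
  ['appointment.created', 'appointment-agent'], // illustrative event name
]);

// Level 2: SOP escalation rules, checked in order (rule shape is assumed).
const SOP_RULES = [
  { sop: 'daily-reconciliation', when: (event) => event.mismatch === true, agent: 'finance-agent' },
];

// Level 3: LLM classification. A stub here, as in the audited code.
function classifyWithLlm(_event) {
  return null;
}

// Tries each level in turn and reports which one produced the match.
function resolveAgentName(event) {
  const direct = ROUTE_MAP.get(event.type);
  if (direct) return { level: 1, agent: direct };

  const rule = SOP_RULES.find((r) => r.sop === event.sop && r.when(event));
  if (rule) return { level: 2, agent: rule.agent };

  const guess = classifyWithLlm(event);
  return guess ? { level: 3, agent: guess } : { level: null, agent: null };
}

console.log(resolveAgentName({ type: 'checkout.completed' }));
// { level: 1, agent: 'review_agent' }
```

Returning the level alongside the agent name makes it easy to log how often events fall through to the slower, less predictable levels.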

5 agent definitions
12 event types in ROUTE_MAP
3 acquisition agents not in network
Check Item | Verification Result
Store agent coverage | The store network's agents array includes 4 agents: appointment-agent, day-sop-agent, supervisor-agent, finance-agent. All have corresponding agent files in agents/agents/.
Acquisition agents in store network | ROUTE_MAP references review_agent, content_agent, and local_seo_agent for 5 event types, but these 3 agents are not in the agents: [...] array of storeNetwork. Route resolution will fail: network.agents.find(a => a.name === 'review_agent') returns undefined because the acquisition agents are not registered.
Orchestrator network | orchestrator-network.js routes all multi-store commands through orchestratorAgent, using single-agent routing via network.agents[0]. Simple and correct.
SOP definitions loaded | 5 SOP definitions are loaded at module level: appointment-checkout, day-start, day-end, walkin-to-appointment, daily-reconciliation. They are used for Level 2 escalation rule matching.
Escalation rules coverage | escalation.js provides generic createEscalationContext() and getEscalationPrompt() helpers. It is a utility library, not a rule engine. Escalation scenarios are embedded in agent system prompts via getEscalationPrompt(). No structured rule definitions exist for specific failure scenarios (e.g., payment fails, SMS fails, appointment conflict).
Agent permission checker | agents/lib/permission-checker.js exists for agent-level permission validation. It uses tenantDb for queries.
Agent event logging | agents/lib/log-decision.js and agents/lib/event-emitter.js provide structured logging and event emission for agent decisions. Both use tenantDb.
ROUTE_MAP multi-target support | A TODO comment in store-network.js notes that checkout.completed needs multi-target routing (both review_agent and appointment-agent). The current architecture supports only a single agent per event type. Workaround: the event routes to review_agent only.
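Both agent findings (unregistered route targets and single-target routing) could be caught or addressed with small helpers like the ones below. This is a sketch under assumptions: the function names, the array-valued route entries, and the example map are hypothetical, while the agent names and the checkout.completed event come from the audit.

```javascript
// Lists every route target that is not registered in the network's agents array.
function findUnregisteredTargets(routeMap, agents) {
  const registered = new Set(agents.map((a) => a.name));
  const missing = [];
  for (const [eventType, targets] of Object.entries(routeMap)) {
    // [].concat accepts both a single name and an array of names.
    for (const target of [].concat(targets)) {
      if (!registered.has(target)) missing.push({ eventType, target });
    }
  }
  return missing;
}

// Resolves one or more agents for an event and fails loudly on unknown names.
function resolveAgents(routeMap, agents, eventType) {
  const targets = [].concat(routeMap[eventType] || []);
  return targets.map((name) => {
    const agent = agents.find((a) => a.name === name);
    if (!agent) {
      throw new Error(`agent '${name}' is not registered for '${eventType}'`);
    }
    return agent;
  });
}

// Example data mirroring the audit: four registered store agents and a
// multi-target entry that references an unregistered acquisition agent.
const agents = [
  { name: 'appointment-agent' },
  { name: 'day-sop-agent' },
  { name: 'supervisor-agent' },
  { name: 'finance-agent' },
];
const routeMap = {
  'checkout.completed': ['review_agent', 'appointment-agent'],
  'appointment.created': 'appointment-agent', // illustrative event name
};

console.log(findUnregisteredTargets(routeMap, agents));
// [ { eventType: 'checkout.completed', target: 'review_agent' } ]
```

Running a check like findUnregisteredTargets at module load or in a unit test would surface the missing registration before an event reaches the router.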

Loose Points Summary

Critical (must fix)

Warnings (should fix)

Verification Commands

Run these commands to reproduce the audit findings:

RBAC permission middleware usage count:
    grep -r "requirePermission\|requireAnyPermission\|requireAllPermissions" backend/api/ | wc -l
Files with CircuitBreaker:
    grep -rl "CircuitBreaker" backend/services/
Legacy pool.query in non-test files:
    grep -rl "pool\.query" backend/ --include="*.js" | grep -v test | grep -v node_modules | wc -l
Send-guard integration:
    grep -rn "checkPhone\|checkEmail" backend/services/
Audit route map entry count:
    grep -E "^ '(POST|PUT|PATCH|DELETE)" backend/config/audit-route-map.js | wc -l
Agent ROUTE_MAP entries:
    grep -A1 "ROUTE_MAP" backend/agents/networks/store-network.js
RBAC permission-map test:
    cd backend && npm run test:permission-map
Explicit permission guard test:
    cd backend && npm run test:explicit-permission