AI Decision Architecture

Design Rationale, Competitive Research & Implementation Notes

2026-03-19 · Internal Document
Table of Contents
  1. Three-Layer Decision Architecture
  2. Override Flywheel: Corrected Design
  3. Data Retrieval Strategy
  4. Sparse Reward Problem
  5. Cross-Industry Competitive Research
  6. SaaS Value Layer Analysis

1. Three-Layer Decision Architecture

The agent system uses a three-layer hierarchy for decision-making, not a binary rules-vs-LLM split:

| Layer | Coverage | Mechanism | Cost |
|---|---|---|---|
| L1: Rules Engine | ~80% | Deterministic SOP state machines, configurable per store | $0 |
| L2: Contextual Bandit | ~15% | Thompson Sampling, cross-store shared model, learns from Yes/No + business outcomes | $0 (CPU-only) |
| L3: LLM | ~5% | Claude Sonnet for truly novel/unknown situations | ~$12/store/month |
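
One way to read the table is as a cost-ordered cascade. The sketch below assumes each layer either returns an action or abstains, and that a request only escalates when the cheaper layer abstains. The `decide` function, the `Decision` type, and the three `demo_*` stand-ins are hypothetical names for illustration, not the system's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    action: str
    layer: str  # which layer produced the action: "L1", "L2", or "L3"


def decide(
    context: dict,
    rules: Callable[[dict], Optional[str]],
    bandit: Callable[[dict], Optional[str]],
    llm: Callable[[dict], str],
) -> Decision:
    """Try the cheapest layer first; escalate only when a layer returns None."""
    action = rules(context)
    if action is not None:
        return Decision(action, "L1")
    action = bandit(context)
    if action is not None:
        return Decision(action, "L2")
    return Decision(llm(context), "L3")


# Hypothetical stand-ins for the real layers, for illustration only.
def demo_rules(ctx: dict) -> Optional[str]:
    return "assign_requested_tech" if ctx.get("requested_tech") else None


def demo_bandit(ctx: dict) -> Optional[str]:
    return "assign_senior" if ctx.get("known_context") else None


def demo_llm(ctx: dict) -> str:
    return "escalate_to_manager"
```

Recording the `layer` on each decision would also make it possible to check the coverage split in the table against real traffic.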

Why Contextual Bandit (L2) Instead of Direct LLM?

Candidate Pre-Filtering (Spotify Pattern)

The rules engine first narrows candidates (e.g., 3–5 eligible technicians based on skill, availability, certification). The Bandit then selects among qualified candidates only. This limits the negative impact of exploration: the Bandit cannot recommend someone unqualified.

Reference: Spotify pre-selects 100 most relevant items before contextual bandit explores, limiting UX impact. Result: 36.6% improvement in impression efficiency.
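
A minimal sketch of pre-filtering followed by Thompson Sampling, under simplifying assumptions: rewards are treated as Bernoulli with a Beta posterior per arm, and "context" is reduced to a string key that buckets the arms. The `BetaArm` class, the `eligible` and `choose` helpers, the technician fields, and the `fri_19_gel` key are all hypothetical; the production model's features and priors are not described in this document.

```python
import random


class BetaArm:
    """Beta(alpha, beta) posterior over the probability that a choice is accepted."""

    def __init__(self) -> None:
        self.alpha = 1.0  # uniform prior (an assumption)
        self.beta = 1.0

    def sample(self, rng: random.Random) -> float:
        return rng.betavariate(self.alpha, self.beta)

    def update(self, reward: float) -> None:
        # reward is expected in [0, 1]; 1 means accepted / good outcome
        self.alpha += reward
        self.beta += 1.0 - reward


def eligible(technicians: list, skill: str) -> list:
    """L1 pre-filter: keep only available technicians who have the skill."""
    return [t["name"] for t in technicians if t["available"] and skill in t["skills"]]


def choose(candidates: list, context_key: str, arms: dict, rng: random.Random) -> str:
    """Thompson Sampling: draw once per candidate posterior, pick the largest draw."""
    def draw(name: str) -> float:
        return arms.setdefault((context_key, name), BetaArm()).sample(rng)
    return max(candidates, key=draw)


technicians = [
    {"name": "Amy", "skills": {"gel_extension"}, "available": True},
    {"name": "Lisa", "skills": {"gel_extension", "acrylic_fullset"}, "available": True},
    {"name": "Mike", "skills": {"manicure"}, "available": True},        # lacks the skill
    {"name": "Chen", "skills": {"gel_extension"}, "available": False},  # unavailable
]
arms = {}
rng = random.Random(0)
pool = eligible(technicians, "gel_extension")
picks = [choose(pool, "fri_19_gel", arms, rng) for _ in range(50)]
```

Because `choose` only ever sees `pool`, exploration is confined to Amy and Lisa here; Mike and Chen can never be recommended regardless of what the posteriors look like.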

Two-Layer Reward Signal

reward = w1 × immediate_signal + w2 × delayed_signal

immediate_signal: store manager Yes=1, No=0 (available in seconds)
delayed_signal:   business outcome metrics (batch-computed daily)
  - customer satisfaction score
  - service duration vs. expected
  - rebooking rate
  - same-day revenue impact
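
The formula can be sketched as follows. The weights, the equal-weight average of the four outcome metrics, and the choice to return the immediate signal alone until the daily batch has run are all assumptions made for illustration; the document does not specify them.

```python
from typing import Optional

W_IMMEDIATE = 0.25  # hypothetical value for w1
W_DELAYED = 0.75    # hypothetical value for w2


def delayed_signal(satisfaction: float, on_time: float,
                   rebooked: float, revenue_score: float) -> float:
    """Average four outcome metrics, each assumed pre-normalized to [0, 1]."""
    return (satisfaction + on_time + rebooked + revenue_score) / 4.0


def reward(accepted: bool, delayed: Optional[float] = None) -> float:
    """Blend the manager's Yes/No with the batch-computed business outcome."""
    immediate = 1.0 if accepted else 0.0
    if delayed is None:
        # Daily batch has not run yet: only the immediate signal is known.
        return immediate
    return W_IMMEDIATE * immediate + W_DELAYED * delayed
```

For example, an accepted suggestion with a poor outcome, `reward(True, 0.0)`, scores lower than an overridden suggestion with a good outcome, `reward(False, 1.0)`, because the delayed signal carries more weight in this sketch.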

Why Two Layers Matter

| | Immediate = Yes | Immediate = No (Override) |
|---|---|---|
| Delayed = Good outcome | Strong positive: AI correct, human agrees | Weak negative: override worked, but AI may also have been fine |
| Delayed = Bad outcome | Most valuable case: AI wrong, human missed it too (blind spot) | Strong negative: AI wrong, override also didn't help |

The lower-left quadrant (accepted but bad outcome) is the most valuable training signal: it reveals systematic blind spots in both AI and human judgment.
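
As a small illustration, the four quadrants can be turned into labels so that blind-spot records are easy to pull out for review. The label strings, the `classify` helper, and the record fields are hypothetical.

```python
def classify(accepted: bool, good_outcome: bool) -> str:
    """Map (immediate, delayed) signals to one of the four quadrants."""
    if accepted and good_outcome:
        return "strong_positive"
    if accepted and not good_outcome:
        return "blind_spot"        # AI wrong, human missed it too
    if not accepted and good_outcome:
        return "weak_negative"     # override worked; AI may also have been fine
    return "strong_negative"       # AI wrong, override did not help either


records = [
    {"id": 1, "accepted": True, "good_outcome": True},
    {"id": 2, "accepted": True, "good_outcome": False},
    {"id": 3, "accepted": False, "good_outcome": True},
    {"id": 4, "accepted": False, "good_outcome": False},
]
blind_spots = [r["id"] for r in records
               if classify(r["accepted"], r["good_outcome"]) == "blind_spot"]
```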

Reference: DoorDash uses daily batch reward computation rather than instant feedback. Stitch Fix uses stylist curation (immediate) + customer keep/return (delayed) as two-layer signal.

Cross-Store Sharing

All stores contribute to a single Bandit model with store_id as a context feature. This effectively multiplies training signal by the number of active stores.

Reference: ServiceTitan uses industry benchmarks as priors for first 3–4 months, then transitions to per-company models. Toast leverages 130K+ locations for cross-location insights.

2. Override Flywheel: Corrected Design

The original product-definition.md stated: "Every time an employee rejects an AI suggestion → system learns a new rule." This was oversimplified. A single override is noise, not signal. The corrected design below was established 2026-03-19.

What Overrides Are NOT

What Overrides ARE

The Correct Flywheel

Stage 1: Data Collection (continuous, zero AI cost)
  Agent makes decision → Employee Yes/No → Store as raw structured record
  → Bandit distributions update immediately

Stage 2: Bandit Learning (automatic)
  Distributions converge over many observations
  → Recommendations improve incrementally
  → No explicit analysis needed

Stage 3: Convergence Detection (automatic)
  When a distribution becomes highly concentrated
  (one action dominates with high confidence, across multiple stores)
  → System generates "suggested rule" event

Stage 4: Human-in-the-Loop Rule Graduation
  Super Admin reviews suggested rule with supporting data:
  - Frequency and consistency across stores
  - Business outcome correlation
  - Which stores contributed data
  → Admin adopts / modifies / dismisses

Stage 5: Cost Reduction
  Adopted rules move from L2 (Bandit) to L1 (Rules Engine)
  → Zero marginal cost, deterministic execution
  → The system gets cheaper over time

The flywheel compounds on three axes: accuracy (more data → better Bandit), cost (rule graduation → fewer Bandit/LLM calls), and switching cost (accumulated decision data is non-portable).
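
Stage 3 could be approximated as shown below. The sketch assumes each action in a context has a Beta posterior, estimates the probability that each action is the best by Monte Carlo sampling, and emits a suggested-rule event only when one action clears a confidence threshold and enough stores contributed data. The `prob_best` and `suggest_rule` functions, the 0.95 threshold, and the three-store minimum are hypothetical choices, not values taken from the design.

```python
import random


def prob_best(posteriors: dict, draws: int = 2000, seed: int = 0) -> dict:
    """Estimate P(action is best) by sampling every Beta posterior repeatedly."""
    rng = random.Random(seed)
    wins = dict.fromkeys(posteriors, 0)
    for _ in range(draws):
        samples = {a: rng.betavariate(alpha, beta)
                   for a, (alpha, beta) in posteriors.items()}
        wins[max(samples, key=samples.get)] += 1
    return {a: w / draws for a, w in wins.items()}


def suggest_rule(posteriors: dict, stores: set,
                 threshold: float = 0.95, min_stores: int = 3):
    """Return a suggested-rule event, or None if the evidence is not concentrated."""
    if len(stores) < min_stores:
        return None  # too few stores contributed data
    probs = prob_best(posteriors)
    action, confidence = max(probs.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        return None
    return {"action": action, "confidence": confidence, "stores": sorted(stores)}


# Posterior parameters are (alpha, beta) pairs; the values here are made up.
converged = {"assign_senior": (90.0, 10.0), "assign_junior": (10.0, 90.0)}
event = suggest_rule(converged, {"queens_3", "brooklyn_1", "manhattan_2"})
```

The event carries the contributing stores so that the Stage 4 reviewer can see where the evidence came from before adopting, modifying, or dismissing the rule.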

3. Data Retrieval Strategy

Core Principle: No Vector Search Needed

Override data is highly structured (service type, time, employee, action). Use SQL, not embeddings.

Progressive Relaxation Query

When the Bandit has insufficient data for a specific context (cold start for a new combination), the system retrieves historical overrides using progressively relaxed SQL queries:

Round 1: Exact match
  service=gel_extension AND day=friday AND hour=19
  → 3 results → sufficient, use these

Round 1: Exact match (a different request)
  service=acrylic_fullset AND day=thursday AND hour=15
  → 0 results → relax

Round 2: Relax service dimension, match on duration
  estimated_minutes > 60 AND day=thursday AND hour=15
  → 0 results → relax further

Round 3: Relax time dimension, match on traffic tier
  estimated_minutes > 60 AND is_peak_hour=true
  → 8 results → sufficient

Each round is a SQL query (< 10ms). Results are aggregated into a compact summary (< 100 tokens) before being fed to the LLM as few-shot context.
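
The rounds above can be sketched with Python's built-in `sqlite3` module. The `overrides` table schema, the `min_rows=3` sufficiency threshold, the `retrieve` function, and the demo data are assumptions chosen to mirror the example; the real store and thresholds may differ.

```python
import sqlite3

# Ordered from strictest to most relaxed. These WHERE clauses are fixed
# constants, never user input, so interpolating them into the query is safe.
ROUNDS = [
    ("exact", "service = :service AND day = :day AND hour = :hour"),
    ("relax_service", "estimated_minutes > :min_minutes AND day = :day AND hour = :hour"),
    ("relax_time", "estimated_minutes > :min_minutes AND is_peak_hour = :is_peak"),
]


def retrieve(conn: sqlite3.Connection, params: dict, min_rows: int = 3):
    """Run each round in order and stop at the first that returns enough rows."""
    for name, where in ROUNDS:
        rows = conn.execute(f"SELECT * FROM overrides WHERE {where}", params).fetchall()
        if len(rows) >= min_rows:
            return name, rows
    return None, []


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE overrides (service TEXT, day TEXT, hour INTEGER, "
        "estimated_minutes INTEGER, is_peak_hour INTEGER)"
    )
    rows = ([("gel_extension", "friday", 19, 75, 1)] * 3
            + [("gel_extension", "saturday", 18, 90, 1)] * 5)
    conn.executemany("INSERT INTO overrides VALUES (?, ?, ?, ?, ?)", rows)
    return conn


conn = make_demo_db()
query = {"service": "acrylic_fullset", "day": "thursday", "hour": 15,
         "min_minutes": 60, "is_peak": 1}
round_name, matches = retrieve(conn, query)
```

With this demo data the acrylic request falls through the first two rounds and is answered by the third, while a `gel_extension` request for Friday at 19:00 is satisfied by the exact round.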

Why NOT Vector Search

Context Compression for LLM

❌ Wrong: Feed 20 raw override records (wastes tokens)
"3/7 Amy→Lisa, 3/14 Mike→Chen, 3/21 Amy→Lisa..."

✅ Right: Pre-aggregate in SQL, feed summary
"Past 60 days, Fri 18-21 gel extension walk-in assignment
 overridden 14 times, 86% junior→senior.
 Across 3 stores, 4 managers. Last: 3/21 Queens #3."

Result: < 100 tokens, same information density
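
A sketch of the aggregation step, done in Python here for brevity; the same counts could come from a SQL `GROUP BY`. The `summarize` function, the record fields (`from_level`, `to_level`, `store`, `manager`, `date`), and the sample data are hypothetical.

```python
from collections import Counter


def summarize(overrides: list, window_days: int, label: str) -> str:
    """Collapse raw override records into a one-paragraph summary for the LLM."""
    if not overrides:
        return f"Past {window_days} days, {label}: no overrides recorded."
    total = len(overrides)
    transitions = Counter((o["from_level"], o["to_level"]) for o in overrides)
    (src, dst), count = transitions.most_common(1)[0]
    stores = {o["store"] for o in overrides}
    managers = {o["manager"] for o in overrides}
    last = max(overrides, key=lambda o: o["date"])  # ISO dates sort correctly as text
    return (
        f"Past {window_days} days, {label} overridden {total} times, "
        f"{round(100 * count / total)}% {src}→{dst}. "
        f"Across {len(stores)} stores, {len(managers)} managers. "
        f"Last: {last['date']} {last['store']}."
    )


# Made-up sample: 12 junior→senior overrides and 2 senior→junior.
sample = [
    {"from_level": "junior", "to_level": "senior", "store": f"Store #{i % 3 + 1}",
     "manager": f"manager_{i % 4}", "date": f"2026-03-{i + 1:02d}"}
    for i in range(12)
] + [
    {"from_level": "senior", "to_level": "junior", "store": "Store #1",
     "manager": "manager_0", "date": "2026-03-13"},
    {"from_level": "senior", "to_level": "junior", "store": "Queens #3",
     "manager": "manager_1", "date": "2026-03-21"},
]
summary = summarize(sample, 60, "Fri 18-21 gel extension walk-in assignment")
```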

4. Sparse Reward Problem

The Challenge

With ~1M daily operations across 50 stores, only a tiny fraction contain valuable learning signals. Processing every record with AI is wasteful.

Solution: Let Signal Emerge Through Bandit Convergence

The Contextual Bandit naturally solves the sparse signal problem:

Previous approaches considered and rejected:

5. Cross-Industry Competitive Research

Researched: 2026-03-19

Beauty/Salon Vertical: No AI Decision Loops Exist

| Platform | AI Approach | Learns from Feedback? |
|---|---|---|
| Zenoti | Rule-based segmentation + NLP receptionist + Smart Marketing | No |
| Mindbody | Trigger-based automation + Attentive partnership | No |
| Phorest | Behavior-based triggers (Client Reconnect) | No |
| Boulevard | Manual tags, no AI decision layer | No |
| Vagaro | AI receptionist only | No |
| MaSe | None | No |

Cross-Industry References

ServiceTitan: Dispatch Pro (Closest Analogy)

DoorDash: MAB Platform

Spotify: Contextual Bandits (Mar 2025)

Netflix: Artwork Personalization

Stitch Fix: Human-in-the-Loop Gold Standard

Industry Maturity Spectrum

| Approach | Who Uses It | Celoria Relevance |
|---|---|---|
| Rule-based automation | All salon SaaS (Zenoti, Phorest, etc.) | Our L1: table stakes, not differentiator |
| Batch retraining | Healthcare (Viz.ai, Aidoc), Uber | Too heavy for our stage; relevant post-scale |
| Multi-armed bandits | DoorDash, Netflix, Stitch Fix | Our L2: proven at scale, lightweight |
| Contextual bandits | Spotify, Netflix | Our L2 target: context-aware decisions |
| Full RL (value iteration) | Uber (matching), DoorDash (dispatch) | Future consideration for multi-store orchestration |
| Self-improving agent loops | Forethought, Presto | Interesting for SOP auto-generation |

6. SaaS Value Layer Analysis

Three-Layer Value Model

┌─────────────────────────────────────────────┐
│  Layer 1: UI / Interaction                  │  ← AI Agents replacing this layer
│  (Booking, scheduling, admin CRUD, POS)     │
├─────────────────────────────────────────────┤
│  Layer 2: Business Logic / Permissions      │  ← Agent tools can partially replace
│  (RBAC, multi-tenant, workflow automation)  │
├─────────────────────────────────────────────┤
│  Layer 3: Data / Intelligence / Compliance  │  ← Enduring value, new access layer
│  (Domain models, decision data, audit)      │
└─────────────────────────────────────────────┘

Celoria's Current Position (as of 2026-03-19)

"SaaS is Dead": Nuanced Take

The key question for any SaaS in the AI era: "What percentage of your value is in Layer 3?" If most value is in Layers 1–2, you're vulnerable. If Layer 3 is where your differentiation lives, AI agents are your distribution channel, not your competitor.