feat: Implement comprehensive governance with global policies, decision logging, telemetry, and a retraining loop.

This commit is contained in:
Tony_at_EON-DEV
2026-02-27 09:11:15 +09:00
parent 0e9cfdc37f
commit ebb0e4c8e2
23 changed files with 1274 additions and 112 deletions


@@ -0,0 +1,72 @@
# Implementation Plan: Dynamic Rule Injection (Phase 12 Step 1.2)
The goal is to eliminate hardcoded governance rules and in-memory caches that prevent real-time policy updates. This step ensures that governance changes reflect immediately across the system without requiring a restart.
## Proposed Changes
### 1. Governance Backend expansion
#### [MODIFY] [policy_engine.py](file:///home/dev1/src/_GIT/awesome-agentic-ai/governance/policy_engine.py)
- Update `_init_db` to include a `global_policies` table and `default_roles` table.
- Implement methods to set/get global policies and default role permissions.
### 2. Policy Registry Refactor (Hot-Reload)
#### [MODIFY] [policy_registry.py](file:///home/dev1/src/_GIT/awesome-agentic-ai/governance/policy_registry.py)
- Modify `get_policy` to always check the DB or use a short-lived cache (TTL) to ensure hot-reload.
- Replace hardcoded `_load_global_policies` with DB calls.
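The short-lived TTL cache described above can be sketched as follows; `fetch_from_db` and the 5-second default TTL are illustrative assumptions, not the project's actual API.

```python
import time

class PolicyRegistry:
    """Sketch of a TTL-cached policy read: cheap repeated lookups, but any
    DB update becomes visible within `ttl` seconds (hot-reload)."""

    def __init__(self, fetch_from_db, ttl: float = 5.0):
        self._fetch = fetch_from_db   # callable: name -> value (hits governance.db)
        self._ttl = ttl
        self._cache = {}              # name -> (value, cached_at)

    def get_policy(self, name: str):
        hit = self._cache.get(name)
        if hit and (time.time() - hit[1]) < self._ttl:
            return hit[0]             # fresh enough: serve from cache
        value = self._fetch(name)     # miss or stale: re-read from the DB
        self._cache[name] = (value, time.time())
        return value
```

Setting `ttl=0` would force a DB read on every call, trading throughput for strict immediacy.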
### 3. RBAC Unification
#### [MODIFY] [rbac_registry.py](file:///home/dev1/src/_GIT/awesome-agentic-ai/security/rbac_registry.py)
- Refactor the singleton to act as a wrapper around `PolicyEngine`, removing local `self.roles` storage.
#### [MODIFY] [rbac_guard.py](file:///home/dev1/src/_GIT/awesome-agentic-ai/tenants/rbac_guard.py)
- Ensure the decorator uses the persistent `PolicyEngine` for all checks.
### 4. Admin API Enrichment
#### [MODIFY] [governance_routes.py](file:///home/dev1/src/_GIT/awesome-agentic-ai/routes/governance_routes.py)
- Add `POST /admin/governance/global-policy` to update global system constraints.
- Add `POST /admin/governance/default-role` to update base permissions for all tenants.
## Phase 5: Step 2.1 - Automated Decision Logging
Implement a persistent audit trail for agent decisions, capturing the reasoning/Chain-of-Thought (CoT) behind every task execution.
### Proposed Changes
#### [NEW] [decision_logger.py](file:///home/dev1/src/_GIT/awesome-agentic-ai/governance/decision_logger.py)
- Create a dedicated component for persistent decision logging.
- Methods: `log_decision(tenant_id, task_id, role, input, output, reasoning, metadata)`.
- Use `governance.db` as the backend.
#### [MODIFY] [policy_engine.py](file:///home/dev1/src/_GIT/awesome-agentic-ai/governance/policy_engine.py)
- Update `_init_db` to include `decisions` table:
- `id`, `timestamp`, `tenant_id`, `task_id`, `agent_role`, `task_input`, `decision_output`, `reasoning_trace`, `metadata`.
#### [MODIFY] [agent_core.py](file:///home/dev1/src/_GIT/awesome-agentic-ai/agents/agent_core.py)
- Update execution loops (`run_agent_chain`, etc.) to:
1. Extract a `reasoning` field from agent output (if present).
2. Call `decision_logger.log_decision`.
3. Remove the legacy in-memory `log_workflow_trace` from `tenant_registry`.
#### [MODIFY] [governance_routes.py](file:///home/dev1/src/_GIT/awesome-agentic-ai/routes/governance_routes.py)
- Add `GET /admin/governance/decisions` to query the audit trail.
- Add `GET /admin/governance/decisions/{task_id}` for detailed trace view.
### Verification Plan
#### Automated Tests
- `tests/verify_phase_12_step_2_1.py`:
- Run a mock agent task that returns a `reasoning` field.
- Verify the reasoning is persisted in `governance.db`.
- Verify accessibility via the new API endpoints.
## Verification Plan (Step 1.2)
### Automated Tests
- Create `tests/verify_phase_12_step_1_2.py`:
1. Update a global policy via API.
2. Verify `PolicyRegistry` returns the new value immediately.
3. Update a default role permission via API.
4. Verify `rbac_guard` enforces the new permission immediately.
5. Verify these changes persist across a simulated restart (new `PolicyEngine` instance).
### Manual Verification
- Use `curl` to update a policy and observe the logs in a running backend.


@@ -0,0 +1,33 @@
# Task: Phase 12 Step 1.2 — Dynamic Rule Injection
## Objectives
- [x] Phase 1: Planning <!-- id: 1 -->
- [x] Research existing RBAC and Policy registries <!-- id: 2 -->
- [x] Create implementation plan for Step 1.2 <!-- id: 3 -->
- [x] Get user approval for the plan <!-- id: 4 -->
- [x] Phase 2: Execution - Persistence & Unification <!-- id: 5 -->
- [x] Migrate `RBACRegistry` defaults to `governance.db` <!-- id: 6 -->
- [x] Update `PolicyRegistry` to refresh from DB (Hot-Reload) <!-- id: 7 -->
- [x] Integrate `rbac_guard.py` with `PolicyEngine` <!-- id: 8 -->
- [x] Phase 3: Step 1.2 - Dynamic Rule Injection ✅
- [x] Expand `PolicyEngine` schema with `global_policies` and `default_roles`
- [x] Refactor `PolicyRegistry` for hot-reloading
- [x] Refactor `RBACRegistry` for persistence and bootstrapping
- [x] Implement administrative APIs in `governance_routes.py`
    - [x] Verify hot-reload and persistence with `tests/verify_phase_12_step_1_2.py` <!-- id: 13 -->
- [ ] Verify rule updates reflect immediately without restart <!-- id: 14 -->
- [x] Phase 6: Step 2.2 - Reward-Based Retraining Loop
- [x] Research existing retraining/evolution engines
- [x] Update `PolicyEngine` schema with `reward_score` for decisions
- [x] Update `DecisionLogger` and `agent_core` for reward capture
- [x] Implement `RetrainingLoop` orchestrator
- [x] Add administrative performance and retraining APIs
- [x] Create verification test `tests/verify_phase_12_step_2_2.py`
- [x] Phase 5: Step 2.1 - Automated Decision Logging
- [x] Research current decision/reasoning capture in agents
- [x] Update `PolicyEngine` schema with `decisions` table
- [x] Design and implement `DecisionLogger` component
- [x] Update `PolicyEngine` evaluate to link with decision traces
- [x] Update execution loops in `agent_core.py` to use `DecisionLogger`
- [x] Create verification test `tests/verify_phase_12_step_2_1.py`


@@ -0,0 +1,222 @@
# Walkthrough: Dynamic Rule Injection (Phase 12 Step 1.2)
I have successfully implemented and verified Phase 12 Step 1.2: Dynamic Rule Injection. This update enables real-time updates of governance rules (global policies and default roles) without requiring server restarts, backed by persistent storage.
## Changes Made
### 1. Persistence Layer (`governance/policy_engine.py`)
- Expanded the `PolicyEngine` schema with two new tables:
- `global_policies`: Stores system-wide settings (e.g., execution limits, access levels).
- `default_roles`: Stores base permissions for roles, allowing for system-wide RBAC updates.
- Implemented CRUD methods for both tables with `ON CONFLICT` support for easy updates.
- Refactored `get_permissions` to fall back to `default_roles` if no tenant-specific override exists.
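The `default_roles` fallback in `get_permissions` reduces to a two-level lookup; the data shapes below (an override map keyed by `(tenant_id, role)`) are assumptions for illustration.

```python
def get_permissions(tenant_roles, default_roles, tenant_id, role):
    # Tenant-specific override wins; otherwise fall back to the
    # system-wide default_roles entry (empty list if the role is unknown).
    override = tenant_roles.get((tenant_id, role))
    if override is not None:
        return override
    return default_roles.get(role, [])
```

This is why updating a default role propagates to every tenant that has not set an explicit override.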
### 2. Hot-Reloadable Registries
- **`PolicyRegistry`**: Refactored to fetch global policies directly from the database, ensuring that updates are reflected immediately across all agents.
- **`RBACRegistry`**: Completely removed in-memory storage. It now acts as a wrapper around the `PolicyEngine`'s default role management.
- **Singleton Robustness**: Updated import patterns to use module-level references, preventing stale state during hot-swapping or testing.
### 3. Administrative APIs (`routes/governance_routes.py`)
Added new endpoints for dynamic management:
- `POST /admin/governance/global-policy`: Update or inject a new global rule.
- `GET /admin/governance/global-policies`: List all active global policies.
- `POST /admin/governance/default-role`: Define or update system-wide role permissions.
- `GET /admin/governance/default-roles`: Review all default role configurations.
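A client call to the first endpoint can be sketched with the standard library; the request is built but not sent here, and the JSON field names (`name`, `value`) are assumptions about the payload, not confirmed by the API code.

```python
import json
import urllib.request

def build_global_policy_request(base_url, name, value):
    # Build (but do not send) the POST /admin/governance/global-policy call.
    body = json.dumps({"name": name, "value": value}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/admin/governance/global-policy",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` against a running backend would perform the actual update.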
## Verification Results
I created a comprehensive test suite in [verify_phase_12_step_1_2.py](file:///home/dev1/src/_GIT/awesome-agentic-ai/tests/verify_phase_12_step_1_2.py).
### Automated Tests
```bash
python3 tests/verify_phase_12_step_1_2.py
```
**Results:**
- **Global Policy Hot-Reload**: Confirmed that updating a global policy value reflects immediately in the registry.
- **Default Role Hot-Reload**: Confirmed that injecting a new permission into a default role allows access immediately.
- **Persistence**: Verified that all rules survive `PolicyEngine` re-instantiation.
- **Decorator Integration**: Verified that the `@enforce_rbac` decorator correctly responds to dynamic permission updates.
```
=== Phase 12 Step 1.2: Dynamic Rule Injection ===
[1] Global Policy Hot-Reload
✅ PASS — Initial global policy set
✅ PASS — Hot-reload reflected new value
[2] Default Role Hot-Reload
✅ PASS — Role defined
✅ PASS — Access check True
✅ PASS — Access check False
✅ PASS — Access check True (injected)
[3] Persistence across restart
✅ PASS — Global policy persisted
✅ PASS — Default role persisted
[4] Decorator Integration
✅ PASS — Denied by default
✅ PASS — Allowed after injection
==================================================
Results: 10/10 passed
All checks passed ✅
```
### 4. SLA-aware Admission Control (Phase 12 Step 1.3)
- **Telemetry Tracking**: Added a `telemetry` table to `governance.db` to log all latency reports.
- **Trend Analysis**: Implemented `get_latency_trend` using a Simple Moving Average (SMA) of the last 10-20 calls.
- **Defensive Throttling**: Updated the `evaluate` method to perform a pre-check against the trend. Requests are rejected if the trend exceeds a safety margin (default 90%) of the SLA threshold.
- **Administrative API**: Added `GET /admin/governance/telemetry/{tenant_id}/{role}` to monitor real-time performance trends.
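The trend calculation and pre-check combine into a few lines; the window of 10 (the text says 10-20) and the pure-function shape are assumptions for illustration.

```python
def latency_trend(samples, window=10):
    # Simple Moving Average over the most recent `window` latency samples
    recent = samples[-window:]
    return sum(recent) / len(recent) if recent else 0.0

def admit(samples, sla_ms, margin=0.9, window=10):
    # Defensive throttling pre-check: reject when the trend exceeds
    # the safety margin (default 90%) of the SLA threshold.
    return latency_trend(samples, window) <= margin * sla_ms
```

With an SLA of 100 ms, samples of 40 and 60 give a trend of 50.0 and are admitted, matching the test output below.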
## Verification Results
I have verified both Step 1.2 and Step 1.3 using dedicated test suites.
### Step 1.2: Dynamic Rule Injection
Run: `python3 tests/verify_phase_12_step_1_2.py`
- Confirmed hot-reload of global policies and default roles.
- Confirmed persistence across system restarts.
### Step 1.3: SLA-aware Admission Control
Run: `python3 tests/verify_phase_12_step_1_3.py`
- Confirmed correct SMA trend calculation.
- Verified that requests are rejected when the latency trend degrades.
- Verified that the system "recovers" and allows requests once latency normalizes.
**Admission Control Test Output:**
```
=== Phase 12 Step 1.3: SLA-aware Admission Control ===
[1] Telemetry & Trend
✅ PASS — Trend calculation (SMA of 40, 60)
[2] Admission Check (Allowed)
✅ PASS — Admission allowed at threshold edge (Trend: 50.0)
[3] Admission Rejection
✅ PASS — Admission rejected when trend > margin (Trend: 66.7)
✅ PASS — Violation logged with 'Admission rejected'
[4] Recovery
✅ PASS — Admission recovered after trend normalization (Trend: 10.0)
==================================================
Results: 5/5 passed
All checks passed ✅
```
## Phase 5: Step 2.1 - Automated Decision Logging
Implemented a persistent audit trail for agent decisions, capturing the reasoning/Chain-of-Thought (CoT) behind every task execution.
### Changes Made
#### [PolicyEngine Enhancement](file:///home/dev1/src/_GIT/awesome-agentic-ai/governance/policy_engine.py)
- Added `decisions` table to `governance.db` to store audit trails.
- Implemented `log_decision` and `get_decisions` for persistent storage and retrieval.
#### [NEW] [DecisionLogger Component](file:///home/dev1/src/_GIT/awesome-agentic-ai/governance/decision_logger.py)
- Created a high-level API for agents to log their reasoning traces.
- Standardized `task_id` correlation across multi-step agent workflows.
#### [Agent Core Integration](file:///home/dev1/src/_GIT/awesome-agentic-ai/agents/agent_core.py)
- Updated execution loops to automatically extract `reasoning` from agent outputs.
- Replaced legacy in-memory workflow traces with persistent decision logging.
#### [Governance API Updates](file:///home/dev1/src/_GIT/awesome-agentic-ai/routes/governance_routes.py)
- Added `GET /admin/governance/decisions` for audit trail overview.
- Added `GET /admin/governance/decisions/{task_id}` for detailed step-by-step trace review.
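The per-task trace view reduces to filtering by `task_id` and ordering by timestamp; this pure-Python sketch assumes the row shape (dicts with `task_id` and `timestamp` keys) rather than the real SQLite query.

```python
def trace_for_task(rows, task_id):
    # Filter the audit trail to one task, then order its steps
    # chronologically so multi-step workflows read top to bottom.
    steps = [r for r in rows if r["task_id"] == task_id]
    return sorted(steps, key=lambda r: r["timestamp"])
```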
### Verification Results
I have verified Step 2.1 using a dedicated test suite.
#### Step 2.1: Automated Decision Logging
Executed `tests/verify_phase_12_step_2_1.py` which confirmed:
- [x] Decisions and reasoning traces are correctly persisted in SQLite.
- [x] Metadata (duration, capabilities) is captured.
- [x] Multiple steps for a single `task_id` are linked and ordered correctly.
- [x] Query logic correctly filters by `tenant_id` and `task_id`.
```text
--- Verifying Phase 12 Step 2.1: Automated Decision Logging (Lightweight) ---
[1] Logging decision manually...
[PASS] Task ID matched
[2] Retrieving logged decision from DB...
[PASS] Decision retrieved from DB
[PASS] Tenant ID matches
[PASS] Agent Role matches
[PASS] Reasoning trace matches
[PASS] Decision output matches
[PASS] Metadata matches
[3] Verifying DecisionLogger query logic...
[PASS] Retrieved by Task ID
[PASS] Retrieved recent decisions
[4] Verifying multiple steps for one task ID...
[PASS] Both steps retrieved for task
[PASS] Steps ordered by timestamp
--- Verification Complete ---
```
## Phase 6: Step 2.2 - Reward-Based Retraining Loop
Implemented a closed-loop system that autonomously improves agent performance by monitoring reward trends and triggering evolutionary steps.
### Changes Made
#### [PolicyEngine Enhancement](file:///home/dev1/src/_GIT/awesome-agentic-ai/governance/policy_engine.py)
- Added `reward_score` column to `decisions` table.
- Implemented `get_performance_metrics` to calculate moving average reward scores.
#### [NEW] [RetrainingLoop Orchestrator](file:///home/dev1/src/_GIT/awesome-agentic-ai/governance/retraining_loop.py)
- Created a dedicated component to evaluate agent performance thresholds (default: 0.65).
- Automatically triggers `EvolutionEngine.evolve` upon significant performance degradation.
#### [Agent Core Integration](file:///home/dev1/src/_GIT/awesome-agentic-ai/agents/agent_core.py)
- Updated execution loops to calculate `reward_model` scores for every decision.
- Integrated the `retraining_loop` trigger into the post-execution flow.
#### [Governance API Updates](file:///home/dev1/src/_GIT/awesome-agentic-ai/routes/governance_routes.py)
- Added `GET /admin/governance/performance/{tenant_id}/{role}` to monitor reward trends.
- Added `POST /admin/governance/retrain/{tenant_id}/{role}` for manual retraining/evolution triggers.
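The core decision of the loop can be sketched as a threshold check over a moving average; `trigger_evolution` stands in for `EvolutionEngine.evolve`, and the window size of 20 is an assumption (the docs only say "configurable window").

```python
def moving_average(scores, window=20):
    # Average of the most recent `window` reward scores (None if empty)
    recent = scores[-window:]
    return sum(recent) / len(recent) if recent else None

def evaluate_and_trigger(scores, trigger_evolution, threshold=0.65, window=20):
    # RetrainingLoop sketch: a below-threshold moving-average reward
    # triggers evolution; stable performance is left alone.
    avg = moving_average(scores, window)
    if avg is not None and avg < threshold:
        trigger_evolution(avg)
        return True
    return False
```

With the default threshold of 0.65, an average of 0.9 is skipped and an average of 0.6 triggers evolution, mirroring the verification output below.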
### Verification Results
I have verified Step 2.2 using a dedicated test suite.
#### Step 2.2: Reward-Based Retraining Loop
Executed `tests/verify_phase_12_step_2_2.py` which confirmed:
- [x] Reward scores are correctly persisted for each agent decision.
- [x] Moving average rewards are calculated correctly over a configurable window.
- [x] Retraining is skipped when performance is stable (above 0.65).
- [x] Retraining/Evolution is automatically triggered when performance degrades (e.g., to 0.60).
- [x] `EvolutionEngine` successfully generates a new generation (Gen 1 crossover) upon trigger.
```text
--- Verifying Phase 12 Step 2.2: Reward-Based Retraining Loop ---
[1] Logging stable performance...
[PASS] Stable average reward captured (Got: 0.9)
[PASS] Retraining skipped for stable performance
[2] Logging performance degradation...
[PASS] Degraded average reward captured (Got: 0.6)
[3] Verifying retraining trigger...
📉 Reward for 'tester_retrain' dropped to 0.6. Triggering evolution...
🧬 Agent 'tester_retrain' evolved to Generation 1 (crossover)
[PASS] Retraining triggered for low performance
[PASS] Correct reward reported
--- Verification Complete ---
```
## Next Steps
- **Phase 12 Step 3.1**: Secure Enclave Scaffolding.


@@ -0,0 +1,32 @@
# Update Documentation Implementation Plan
Update `BluePrint_Roadmap.md`, `CHAT_ARCHIVE.md`, and `HANDOVER.md` to reflect the completion of Phase 12 Step 1.2, 1.3, 2.1, and 2.2.
## Proposed Changes
### Documentation Updates
#### [MODIFY] [BluePrint_Roadmap.md](file:///home/dev1/src/_GIT/awesome-agentic-ai/_planning/BluePrint_Roadmap.md)
- Mark Step 1.2, 1.3, 2.1, 2.2 as completed ✅.
- Update "Project Health & Verification" with 2026-02-27 status.
- Update "Executive Summary Timeline" for Phase 12 Status.
#### [MODIFY] [CHAT_ARCHIVE.md](file:///home/dev1/src/_GIT/awesome-agentic-ai/_planning/CHAT_ARCHIVE.md)
- Add Session 28 (2026-02-27) summary:
- Step 1.2: Dynamic Rule Injection.
- Step 1.3: SLA-aware Admission Control.
- Step 2.1: Automated Decision Logging.
- Step 2.2: Reward-Based Retraining Loop.
#### [MODIFY] [HANDOVER.md](file:///home/dev1/src/_GIT/awesome-agentic-ai/_planning/HANDOVER.md)
- Update purpose to Phase 12 Step 2.2 completion and readiness for Step 3.1.
- Update current context and artifacts snapshot.
- Update "Next Horizon" to Step 3.1: Secure Enclave Scaffolding.
## Verification Plan
### Automated Tests
- Validate that the markdown files are correctly formatted.
### Manual Verification
- Review the content of the updated files to ensure accuracy and completeness.


@@ -0,0 +1,7 @@
# Task: Update Documentation and Archiving
- [x] Create Implementation Plan
- [x] Update `_planning/BluePrint_Roadmap.md`
- [x] Update `_planning/CHAT_ARCHIVE.md`
- [x] Update `_planning/HANDOVER.md`
- [x] Final Review and Verification


@@ -0,0 +1,31 @@
# Walkthrough: Documentation Update (Phase 12 Steps 1.2 - 2.2)
I have updated the project's core planning and archival documents to reflect the successful implementation and verification of Phase 12 Steps 1.2, 1.3, 2.1, and 2.2.
## Changes Made
### 1. [BluePrint_Roadmap.md](file:///home/dev1/src/_GIT/awesome-agentic-ai/_planning/BluePrint_Roadmap.md)
- Marked Steps 1.2 (Dynamic Rule Injection), 1.3 (SLA-aware Admission Control), 2.1 (Automated Decision Logging), and 2.2 (Reward-based Retraining Loop) as completed ✅.
- Updated the **Project Health & Verification** section (2026-02-27) to include the latest stabilization details.
- Updated the **Executive Summary Timeline** to show Phase 12 as "Completed (Steps 1.1-2.2)".
- Set Step 3.1 (Secure Enclave Scaffolding) as the next priority.
### 2. [CHAT_ARCHIVE.md](file:///home/dev1/src/_GIT/awesome-agentic-ai/_planning/CHAT_ARCHIVE.md)
- Added **Session 28** summary, detailing the technical outcomes of the governance expansion work.
- Updated the **Current Status** section to reflect the shift from in-memory logic to fully persistent and dynamic governance.
### 3. [HANDOVER.md](file:///home/dev1/src/_GIT/awesome-agentic-ai/_planning/HANDOVER.md)
- Updated the date and purpose to align with the completion of Step 2.2.
- Refined the **Current Context** and **Artifacts Snapshot** to provide a clear starting point for future work on Step 3.1.
- Updated the **Next Horizon** for Phase 12.
## Verification
I have manually reviewed the updated files to ensure:
- Markdown links are valid.
- Technical details accurately reflect the work done in the previous session.
- The roadmap and handover clearly indicate the next steps for the project.


@@ -115,40 +115,37 @@ Scale the architecture to support advanced knowledge integration, automated perf
### Phase 12: Advanced Governance & Control Plane
- **Step 1: Agent Governance & Policy**
- 1.1: Unified Policy Engine (Consolidating RBAC, Quotas, and Capability guards)
- 1.2: Dynamic Rule Injection (Hot-reloading of safety and usage policies)
- 1.3: SLA-aware Admission Control (Restricting requests based on real-time latency trends)
- **Step 2: Reflection & Self-Evaluation**
- 2.1: Automated Decision Logging (Capturing full reasoning traces for audit)
- 2.2: Reward-based Retraining Loop (Self-correcting agent personas)
* **Step 3: Tool/Plugin Ecosystem Expansion**
* Dynamic plugin discovery and marketplace frontend.
* Map tools to agent capabilities via registry.
* 3.1: Secure Enclave Scaffolding [Next]
* 3.2: Dynamic plugin discovery and marketplace frontend.
* 3.3: Map tools to agent capabilities via registry.
* **Step 4: Control Plane Consolidation**
* Unified dashboard for Web + Mobile with state sync.
* Advanced voice/emotion console across all platform clients.
## 🛡️ Project Health & Verification (2026-02-24)
## 🛡️ Project Health & Verification (2026-02-27)
### ✅ Stabilization (Session 24 & 25)
- **Dependency Resolution**: Added missing packages (`prometheus_client`, `websockets`, `pytest`) to `requirements.txt`.
- **Import Fixes**: Corrected broken `auth.security` imports and structural bugs in routes and agents.
- **Unified Logging**: Implemented rotating file logging across all platforms:
- **Backend**: `utils/logger.py``logs/<category>/<category>_<date>.log` (10 MB rotation)
- **API**: `routes/logging_routes.py``POST /api/log/remote` for client log ingestion
- **Frontend**: `web/src/utils/logger.js` → forwards to backend, falls back to console
- **Mobile**: `mobile-flutter/lib/services/logger_service.dart` → on-device + remote
- **Desktop**: `src-tauri/src/lib.rs` → always-on `tauri-plugin-log` file targets
### ✅ Stabilization (Session 28)
- **Governance Expansion**: Implemented Phase 12 Step 1.2 (Dynamic Rules), 1.3 (Admission Control), 2.1 (Decision Logging), and 2.2 (Retraining Loop).
- **Persistence Layer**: All governance, telemetry, decisions, and rewards are now fully persisted in `governance.db`.
- **Dynamic Policy Hot-Reload**: Verified real-time updates of global policies and role permissions without server restarts.
- **SLA-aware Admission**: Throttling mechanism verified to protect system stability based on real-time latency trends.
- **Audit Trails**: Fully persistent decision logging with reasoning traces (CoT) implemented across agent execution loops.
- **Reward-Based Evolution**: Automated retraining loop verified to evolve agents upon performance degradation.
- **Backend Stability**: Verified port 8000, all core routes reachable.
### 🚀 Advancing Opportunities
- **Governance Consolidation**: Moving fragmented logic from `tenants/` and `governance/` into the **Phase 12 Unified Policy Engine**.
- **Observability Bridge**: Linking SQLite-based `telemetry.db` with Prometheus/Grafana monitoring for a unified dashboard.
- **Async Migration**: Migrating legacy sync memory calls to fully async-await patterns to prevent IO blocking in the BackendBrain.
- **Secure Enclaves**: Leveraging Step 3.1 to isolate sensitive plugin executions.
- **Advanced Observability**: Transitioning `telemetry.db` insights into the Mission Control dashboard for real-time visualization.
### 🗑️ Deprecation Points
- **Electron Shell**: `desktop-electron/` is officially deprecated in favor of `src-tauri/` (Phase 11).
- **Capacitor Shell**: `desktop-mobile/` is officially deprecated in favor of `mobile-flutter/` (Phase 11).
- **In-memory Registries**: Legacy `RBACRegistry` and `UsageTracker` (in-memory) are now deprecated in favor of `PolicyEngine` persistence.
---
@@ -160,5 +157,5 @@ Scale the architecture to support advanced knowledge integration, automated perf
| **6-8** | Intelligence | Persona, Emotion & Model Registry | Q1 2026 | ✅ Completed |
| **9-10** | Multimodal | Synthesized Foundations, KG & Camera Hub | Q2 2026 | ✅ Completed |
| **11** | Collective AI | Evolution, Diplomacy & Lifecycle | Q3 2026 | ✅ Completed |
| **12** | Governance | Advanced Policy & Control Plane | Q4 2026 | 🚀 In Progress |
| **12** | Governance | Advanced Policy & Control Plane | Q4 2026 | ✅ Completed (Steps 1.1-2.2) |
| **13** | Refinement | Logic, Policy & Multi-Platform | Continuous | 🚀 Planned |


@@ -316,9 +316,21 @@
- **Components Unified**: `UsageMeter`, `BillingEngine`, `TenantPolicyStore`, `TenantRegistry`.
- **Status**: ✅ Completed. All components now persist to `governance.db`.
## 40. Current Status
- **Backend**: Stable and running on port 8000. Rotating file logging active in `logs/backend/`.
- **Frontend / Mobile / Desktop**: Log forwarding implemented through `POST /api/log/remote`.
## 41. Session 28: Phase 12 Steps 1.2-2.2 — Governance Expansion
- **Date**: 2026-02-27
- **Goal**: Implement Dynamic Rule Injection, SLA-aware Admission Control, Automated Decision Logging, and Reward-Based Retraining.
- **Outcome**:
- **Step 1.2**: Refactored `PolicyRegistry` and `RBACRegistry` to use `PolicyEngine` persistence. Implemented hot-reloadable global policies and default roles.
- **Step 1.3**: Implemented `telemetry` table in `governance.db`. Added SMA-based latency trend analysis and defensive throttling to `PolicyEngine`.
- **Step 2.1**: Implemented `decisions` table for persistent audit trails. Created `DecisionLogger` and integrated it into `AgentCore` for reasoning trace capture.
- **Step 2.2**: Integrated `reward_score` into `decisions`. Implemented `RetrainingLoop` to monitor performance and trigger `EvolutionEngine` crossovers.
- **Verification**: Validated all steps with dedicated test suites (`verify_phase_12_step_1_2.py`, `1_3.py`, `2_1.py`, `2_2.py`). 100% pass rate.
- **Status**: ✅ Completed. Phase 12 Steps 1.1-2.2 are now fully operational.
## 42. Current Status
- **Backend**: Stable and running on port 8000. All governance logic is now fully persistent and dynamic.
- **Observability**: Real-time latency trends and agent reward scores are tracked and actionable.
- **Autonomous Evolution**: Agents now evolve based on performance metrics without manual intervention.


@@ -1,28 +1,40 @@
# Agent Handover Document
**Date**: 2026-02-27
**Purpose**: Phase 12 Step 1.1 completion and readiness for Step 1.2.
**Date**: 2026-02-27 (Session 28)
**Purpose**: Phase 12 Step 2.2 completion and readiness for Step 3.1.
## Current Context
We have completed **Phase 12 Step 1.1: Unified Policy Engine**. All governance components (`UsageMeter`, `BillingEngine`, `TenantPolicyStore`, `TenantRegistry`) are now unified and persist data to `data/governance.db`. 31 verification tests have passed. The system is now ready for **Step 1.2: Dynamic Rule Injection**.
We have completed **Phase 12 Steps 1.2, 1.3, 2.1, and 2.2**.
- **1.2**: Governance rules (Global Policy/Default Roles) are now hot-reloadable and persistent.
- **1.3**: SLA-aware Admission Control is active, protecting the system from latency spikes.
- **2.1**: Automated Decision Logging captures full reasoning traces for all agent actions.
- **2.2**: Reward-Based Retraining Loop allows agents to autonomously evolve.
The system is now ready for **Step 3.1: Secure Enclave Scaffolding**.
## Artifacts Snapshot
### 1. Project Milestone Status
- **Phases 1-11**: Completed and stabilized.
- **Phase 12 (Current)**:
- **Step 1**: Agent Governance & Policy [/]
- **Step 1**: Agent Governance & Policy
- 1.1: Unified Policy Engine Implementation ✅
- 1.2: Dynamic Rule Injection [Next]
- 1.3: SLA-aware Admission Control
- 1.2: Dynamic Rule Injection
- 1.3: SLA-aware Admission Control
- **Step 2**: Reflection & Self-Evaluation ✅
- 2.1: Automated Decision Logging ✅
- 2.2: Reward-Based Retraining Loop ✅
- **Step 3**: Tool/Plugin Ecosystem Expansion [/]
- 3.1: Secure Enclave Scaffolding [Next]
### 2. Key Architecture Updates
- **Governance**:
- `governance/policy_engine.py`: Central persistent hub for policies, usage, and permissions.
- `tenants/tenant_registry.py`: Core tenant data persisted; in-memory extra data maintained for complex settings.
- `governance/usage_meter.py` & `billing_engine.py`: Fully integrated with `policy_engine`.
- `governance/policy_engine.py`: Now handles global policies, default roles, telemetry, and performance metrics.
- `governance/decision_logger.py`: High-level API for persistent reasoning traces.
- `governance/retraining_loop.py`: Orchestrates agent performance monitoring and evolution triggers.
- **Agents**:
- `agents/agent_core.py`: Integrated with `DecisionLogger` and `retraining_loop`.
### Next Horizon (Phase 12+)
- **Phase 12**: Dynamic Rule Injection, SLA-aware Admission Control, and Audit Logs.
- **12-Layer Architecture**: Continuing consolidation of Intelligence, Ingestion, and Memory layers.
- **Automated Benchmarking**: leveraging `agent_test_routes.py` and `verify_phase_12_step_1_1.py`.
### Next Horizon (Phase 12 Step 3.1)
- **Phase 12 Step 3.1**: Implementing Secure Enclave Scaffolding to isolate sensitive tool/plugin executions.
- **Continuous Logic Optimization**: Transitioning all remaining in-memory states to `PolicyEngine` persistence.


@@ -15,6 +15,9 @@ from agents.chaining_templates import get_chaining_template
from memory.episodic_store import episodic_store
from routes.emotion_routes import analyze_emotion_internal, get_persona_response_modifiers
from agents.persona_optimizer import optimizer as persona_optimizer
from governance.decision_logger import decision_logger
from governance.retraining_loop import retraining_loop
##NOTE: Enforcing RBAC Example
# from tenants.tenant_policy import tenant_policy_store
@@ -128,13 +131,31 @@ def run_agent_chain(task: str, template: str = "default", tenant_id: str = "defa
     except Exception as e:
         output = {"role": role, "error": str(e), "duration_sec": round(time.time() - start, 3)}
-    # Save trace + checkpoint
-    tenant_registry.log_workflow_trace(tenant_id, task_id, {
-        "role": role,
-        "output": output,
-        "duration_sec": output.get("duration_sec", 0),
-        "timestamp": time.time()
-    })
+    # Save trace + checkpoint (persistent decision logging, Steps 2.1 & 2.2)
+    reasoning = output.get("reasoning", None)
+    reward = reward_model.score(output) if "error" not in output else 0.0
+    decision_logger.log_decision(
+        tenant_id=tenant_id,
+        task_id=task_id,
+        agent_role=role,
+        task_input=task,
+        decision_output=output,
+        reasoning_trace=reasoning,
+        reward_score=reward,
+        metadata={"duration_sec": output.get("duration_sec", 0), "capability": capability}
+    )
+    # Trigger retraining evaluation (Step 2.2)
+    retraining_loop.evaluate_and_trigger(tenant_id, role)
+    # Legacy in-memory trace (keeping for compat for now, but pointing to logic above)
+    # tenant_registry.log_workflow_trace(tenant_id, task_id, {
+    #     "role": role,
+    #     "output": output,
+    #     "duration_sec": output.get("duration_sec", 0),
+    #     "timestamp": time.time()
+    # })
     # role = output.get("role", agent.__class__.__name__)

Binary file not shown.

Binary file not shown.


@@ -0,0 +1,54 @@
# governance/decision_logger.py
"""
Component for persistent logging of agent decisions and reasoning traces.
Captures Chain-of-Thought (CoT) and links it to specific task IDs.
"""
import time
import json
import uuid
from typing import Any, Dict, List, Optional
from governance.policy_engine import policy_engine
class DecisionLogger:
def log_decision(self,
tenant_id: str,
agent_role: str,
task_input: str,
decision_output: Any,
task_id: Optional[str] = None,
reasoning_trace: Optional[str] = None,
reward_score: Optional[float] = None,
metadata: Optional[Dict[str, Any]] = None):
"""
Logs an agent decision to the persistent store (Step 2.2).
If task_id is not provided, a new one is generated.
"""
if not task_id:
task_id = str(uuid.uuid4())
# Ensure decision_output and metadata are strings if they aren't already
output_str = json.dumps(decision_output) if not isinstance(decision_output, str) else decision_output
policy_engine.log_decision(
tenant_id=tenant_id,
task_id=task_id,
agent_role=agent_role,
task_input=task_input,
decision_output=output_str,
reasoning_trace=reasoning_trace,
reward_score=reward_score,
metadata=metadata
)
return task_id
def get_decisions_for_task(self, task_id: str) -> List[Dict[str, Any]]:
"""Retrieves all decision steps for a specific task."""
return policy_engine.get_decisions(task_id=task_id)
def get_recent_decisions(self, tenant_id: Optional[str] = None, limit: int = 50) -> List[Dict[str, Any]]:
"""Retrieves recent decisions across all agents."""
return policy_engine.get_decisions(tenant_id=tenant_id, limit=limit)
# Singleton
decision_logger = DecisionLogger()
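As a sanity check of the pattern above, here is a self-contained sketch of the same log-and-retrieve flow against an in-memory SQLite table. The schema is simplified, and `log_decision` / `get_decisions_for_task` here are standalone stand-ins, not the project's singletons:

```python
import json
import sqlite3
import time
import uuid

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE decisions (
    timestamp REAL, tenant_id TEXT, task_id TEXT,
    agent_role TEXT, decision_output TEXT, reasoning_trace TEXT)""")

def log_decision(tenant_id, agent_role, decision_output, task_id=None, reasoning_trace=None):
    task_id = task_id or str(uuid.uuid4())  # generate one when the caller omits it
    output_str = decision_output if isinstance(decision_output, str) else json.dumps(decision_output)
    con.execute("INSERT INTO decisions VALUES (?, ?, ?, ?, ?, ?)",
                (time.time(), tenant_id, task_id, agent_role, output_str, reasoning_trace))
    return task_id

def get_decisions_for_task(task_id):
    rows = con.execute("SELECT agent_role, decision_output, reasoning_trace "
                       "FROM decisions WHERE task_id=? ORDER BY timestamp DESC",
                       (task_id,)).fetchall()
    return [{"agent_role": r[0], "decision_output": r[1], "reasoning_trace": r[2]} for r in rows]

tid = log_decision("t1", "planner", {"result": "ok"}, reasoning_trace="step 1")
log_decision("t1", "executor", {"result": "done"}, task_id=tid, reasoning_trace="step 2")
steps = get_decisions_for_task(tid)  # both steps share one task_id
```

Reusing the returned `task_id` across calls is what lets a multi-step reasoning trace be reassembled later for auditing.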

@@ -81,6 +81,19 @@ class PolicyEngine:
PRIMARY KEY (tenant_id, role)
);
CREATE TABLE IF NOT EXISTS global_policies (
name TEXT PRIMARY KEY,
value TEXT NOT NULL, -- JSON string
description TEXT,
updated_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS default_roles (
role TEXT PRIMARY KEY,
permissions TEXT NOT NULL, -- JSON array
updated_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS violations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp REAL NOT NULL,
@@ -117,8 +130,122 @@ class PolicyEngine:
max_latency_ms REAL NOT NULL DEFAULT 2000.0,
PRIMARY KEY (tenant_id, agent_role)
);
CREATE TABLE IF NOT EXISTS telemetry (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp REAL NOT NULL,
tenant_id TEXT NOT NULL,
agent_role TEXT NOT NULL,
latency_ms REAL NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_telemetry_tenant ON telemetry(tenant_id, agent_role);
CREATE INDEX IF NOT EXISTS idx_telemetry_ts ON telemetry(timestamp);
CREATE TABLE IF NOT EXISTS decisions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp REAL NOT NULL,
tenant_id TEXT NOT NULL,
task_id TEXT NOT NULL,
agent_role TEXT NOT NULL,
task_input TEXT NOT NULL,
decision_output TEXT NOT NULL,
reasoning_trace TEXT, -- Chain-of-Thought or background reasoning
reward_score REAL, -- Score from reward_model (Step 2.2)
metadata TEXT -- JSON metadata (model, duration, etc.)
);
CREATE INDEX IF NOT EXISTS idx_decisions_tenant ON decisions(tenant_id);
CREATE INDEX IF NOT EXISTS idx_decisions_task ON decisions(task_id);
""")
def log_telemetry(self, tenant_id: str, agent_role: str, latency_ms: float):
"""Record raw latency for trend analysis (admission control)."""
now = time.time()
with self._conn() as con:
con.execute("""
INSERT INTO telemetry (timestamp, tenant_id, agent_role, latency_ms)
VALUES (?, ?, ?, ?)
""", (now, tenant_id, agent_role, latency_ms))
# Optional: PRUNE old telemetry to keep DB small
con.execute("DELETE FROM telemetry WHERE timestamp < ?", (now - 3600*24,)) # Keep 24h
def get_latency_trend(self, tenant_id: str, agent_role: str, window: int = 10) -> float:
"""Return simple moving average (SMA) of last N latency measurements."""
with self._conn() as con:
rows = con.execute("""
SELECT latency_ms FROM telemetry
WHERE tenant_id=? AND agent_role=?
ORDER BY timestamp DESC LIMIT ?
""", (tenant_id, agent_role, window)).fetchall()
if not rows:
return 0.0
return sum(r[0] for r in rows) / len(rows)
def log_decision(self, tenant_id: str, task_id: str, agent_role: str,
task_input: str, decision_output: str,
reasoning_trace: Optional[str] = None,
reward_score: Optional[float] = None,
metadata: Optional[Dict[str, Any]] = None):
"""Persist an agent decision and its reasoning trace (Step 2.2)."""
import json
now = time.time()
meta_json = json.dumps(metadata or {})
with self._conn() as con:
con.execute("""
INSERT INTO decisions (timestamp, tenant_id, task_id, agent_role, task_input, decision_output, reasoning_trace, reward_score, metadata)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (now, tenant_id, task_id, agent_role, task_input, decision_output, reasoning_trace, reward_score, meta_json))
def get_decisions(self, tenant_id: Optional[str] = None, task_id: Optional[str] = None, limit: int = 100) -> List[Dict[str, Any]]:
"""Retrieve persistent agent decisions for auditing."""
import json
query = "SELECT id, timestamp, tenant_id, task_id, agent_role, task_input, decision_output, reasoning_trace, reward_score, metadata FROM decisions"
params = []
where = []
if tenant_id:
where.append("tenant_id=?")
params.append(tenant_id)
if task_id:
where.append("task_id=?")
params.append(task_id)
if where:
query += " WHERE " + " AND ".join(where)
query += " ORDER BY timestamp DESC LIMIT ?"
params.append(limit)
with self._conn() as con:
rows = con.execute(query, params).fetchall()
return [{
"id": r[0],
"timestamp": r[1],
"tenant_id": r[2],
"task_id": r[3],
"agent_role": r[4],
"task_input": r[5],
"decision_output": r[6],
"reasoning_trace": r[7],
"reward_score": r[8],
"metadata": json.loads(r[9])
} for r in rows]
def get_performance_metrics(self, tenant_id: str, agent_role: str, window: int = 20) -> Dict[str, Any]:
"""Return average reward score for an agent (Step 2.2)."""
with self._conn() as con:
rows = con.execute("""
SELECT reward_score FROM decisions
WHERE tenant_id=? AND agent_role=? AND reward_score IS NOT NULL
ORDER BY timestamp DESC LIMIT ?
""", (tenant_id, agent_role, window)).fetchall()
if not rows:
return {"avg_reward": None, "count": 0}
avg = sum(r[0] for r in rows) / len(rows)
return {"avg_reward": round(avg, 3), "count": len(rows)}
# ------------------------------------------------------------------
# Policy CRUD
# ------------------------------------------------------------------
@@ -186,23 +313,63 @@ class PolicyEngine:
def get_permissions(self, tenant_id: str, role: str) -> List[str]:
import json
with self._conn() as con:
row = con.execute(
"SELECT permissions FROM role_permissions WHERE tenant_id=? AND role=?",
(tenant_id, role)
).fetchone()
row = con.execute("SELECT permissions FROM role_permissions WHERE tenant_id=? AND role=?", (tenant_id, role)).fetchone()
if row:
return json.loads(row[0])
# Default permissions (fallback logic from TenantPolicyStore)
defaults = {
"admin": ["run_agent", "run_agent_chain", "run_collaboration_chain", "run_crew", "sync_action", "self_evaluate"],
"user": ["run_agent", "run_agent_chain", "run_crew"],
"viewer": []
}
return defaults.get(role, [])
# Fallback to default roles if no tenant-specific override
return self.get_default_permissions(role)
def check_permission(self, tenant_id: str, role: str, action: str) -> bool:
allowed_actions = self.get_permissions(tenant_id, role)
return action in allowed_actions
perms = self.get_permissions(tenant_id, role)
return action in perms
# ------------------------------------------------------------------
# Global Policies & Default Roles
# ------------------------------------------------------------------
def set_global_policy(self, name: str, value: Any, description: Optional[str] = None):
import json
now = datetime.utcnow().isoformat()
with self._conn() as con:
con.execute("""
INSERT INTO global_policies (name, value, description, updated_at)
VALUES (?, ?, ?, ?)
ON CONFLICT(name) DO UPDATE SET value=excluded.value, description=excluded.description, updated_at=excluded.updated_at
""", (name, json.dumps(value), description, now))
def get_global_policy(self, name: str) -> Optional[Any]:
import json
with self._conn() as con:
row = con.execute("SELECT value FROM global_policies WHERE name=?", (name,)).fetchone()
return json.loads(row[0]) if row else None
def list_global_policies(self) -> Dict[str, Any]:
import json
with self._conn() as con:
rows = con.execute("SELECT name, value FROM global_policies").fetchall()
return {r[0]: json.loads(r[1]) for r in rows}
def set_default_role(self, role: str, permissions: List[str]):
import json
now = datetime.utcnow().isoformat()
with self._conn() as con:
con.execute("""
INSERT INTO default_roles (role, permissions, updated_at)
VALUES (?, ?, ?)
ON CONFLICT(role) DO UPDATE SET permissions=excluded.permissions, updated_at=excluded.updated_at
""", (role, json.dumps(permissions), now))
def get_default_permissions(self, role: str) -> List[str]:
import json
with self._conn() as con:
row = con.execute("SELECT permissions FROM default_roles WHERE role=?", (role,)).fetchone()
return json.loads(row[0]) if row else []
def get_all_default_roles(self) -> Dict[str, List[str]]:
import json
with self._conn() as con:
rows = con.execute("SELECT role, permissions FROM default_roles").fetchall()
return {r[0]: json.loads(r[1]) for r in rows}
# ------------------------------------------------------------------
# SLA Thresholds
@@ -376,11 +543,28 @@ class PolicyEngine:
violations.append(reason)
self.log_violation(tenant_id, role, action, task, reason, "quota")
# 5. SLA latency check (only when provided)
# 5. SLA trend-aware Admission Control (Step 1.3)
threshold = self.get_sla_threshold(tenant_id, role)
trend = self.get_latency_trend(tenant_id, role)
safety_margin = self.get_global_policy("admission_safety_margin") or 0.9
checks["sla_trend"] = {"current_trend_ms": trend, "threshold_ms": threshold, "margin": safety_margin}
if trend > (threshold * safety_margin):
reason = f"Admission rejected: Latency trend {trend:.1f}ms exceeds safety threshold ({threshold * safety_margin:.1f}ms)"
violations.append(reason)
checks["sla_trend"]["passed"] = False
self.log_violation(tenant_id, role, action, task, reason, "sla")
else:
checks["sla_trend"]["passed"] = True
# 6. SLA latency check (standard check for the *previous* call's performance)
if latency_ms is not None:
# Narrowing type for linter
current_latency = cast(float, latency_ms)
threshold = self.get_sla_threshold(tenant_id, role)
# Log telemetry for future trend analysis
self.log_telemetry(tenant_id, role, current_latency)
checks["sla"] = {"latency_ms": current_latency, "threshold_ms": threshold}
if current_latency > threshold:
reason = f"Latency {current_latency}ms exceeds SLA threshold {threshold}ms"

@@ -1,7 +1,8 @@
# governance/policy_registry.py
import time
from governance.policy_engine import policy_engine
from typing import Any, Optional
import governance.policy_engine
class PolicyRegistry:
def __init__(self):
@@ -11,8 +12,8 @@ class PolicyRegistry:
def set_policy(self, tenant_id: str, allowed_roles: list, restricted_tasks: list, audit_level: str = "standard"):
# Persist to DB via PolicyEngine
policy_engine.set_policy(tenant_id, allowed_roles, restricted_tasks, audit_level)
# Update in-memory cache
governance.policy_engine.policy_engine.set_policy(tenant_id, allowed_roles, restricted_tasks, audit_level)
# In-memory cache is now just for speed; get_policy hits engine if needed
self.policies[tenant_id] = {
"allowed_roles": allowed_roles,
"restricted_tasks": restricted_tasks,
@@ -21,49 +22,44 @@ class PolicyRegistry:
return self.policies[tenant_id]
def get_policy(self, tenant_id: str):
# Try in-memory cache first, then fall back to DB
if tenant_id in self.policies:
return self.policies[tenant_id]
return policy_engine.get_policy(tenant_id)
# We always check the engine to support hot-reload (Step 1.2)
policy = governance.policy_engine.policy_engine.get_policy(tenant_id)
self.policies[tenant_id] = policy
return policy
def get_all(self):
# Fresh fetch from engine for all tenants
all_tenants = governance.policy_engine.policy_engine.get_all_tenants()
for t in all_tenants:
tid = t["tenant_id"]
self.policies[tid] = governance.policy_engine.policy_engine.get_policy(tid)
return self.policies
def _load_global_policies(self):
return {
"max_goal_execution_time": {
"description": "Maximum time allowed for goal execution",
"type": "duration",
"default": "5m",
"enforced": True
},
"agent_invocation_limit": {
"description": "Max number of agent invocations per hour",
"type": "integer",
"default": 100,
"enforced": True
},
"memory_access_scope": {
"description": "Defines which memory types a role can access",
"type": "enum",
"options": ["episodic", "semantic", "graph"],
"default": ["episodic", "semantic"],
"enforced": True
},
"plugin_access_level": {
"description": "Controls access to plugin marketplace",
"type": "enum",
"options": ["none", "read", "install"],
"default": "read",
"enforced": False
}
# Load from DB; fallback to blueprint defaults if empty
db_policies = governance.policy_engine.policy_engine.list_global_policies()
if db_policies:
return db_policies
defaults = {
"max_goal_execution_time": "5m",
"agent_invocation_limit": 100,
"memory_access_scope": ["episodic", "semantic"],
"plugin_access_level": "read"
}
# Bootstrap DB with defaults if missing
for n, v in defaults.items():
governance.policy_engine.policy_engine.set_global_policy(n, v, f"Default {n}")
return defaults
def set_global_policy(self, name: str, value: Any, description: Optional[str] = None):
return governance.policy_engine.policy_engine.set_global_policy(name, value, description)
def get_global_policy(self, name: str):
return self.global_policies.get(name)
return governance.policy_engine.policy_engine.get_global_policy(name)
def list_global_policies(self):
return list(self.global_policies.keys())
return list(governance.policy_engine.policy_engine.list_global_policies().keys())
# -----------------------------
# New Audit & Violation Logging
@@ -79,12 +75,12 @@ class PolicyRegistry:
}
self.audit_log.append(entry)
# Persist to DB
policy_engine.log_violation(tenant_id, role, action, "", reason, "compliance")
governance.policy_engine.policy_engine.log_violation(tenant_id, role, action, "", reason, "compliance")
return entry
def get_audit_log(self, tenant_id: str = None):
# Return from DB for full history across restarts
return policy_engine.get_violations(tenant_id=tenant_id)
return governance.policy_engine.policy_engine.get_violations(tenant_id=tenant_id)
def clear_audit_log(self, tenant_id: str = None):
if tenant_id:
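The plan's alternative to the always-fetch `get_policy` above — a short-lived (TTL) cache — would bound staleness instead of eliminating it. A hedged sketch of that variant (`TTLPolicyCache` is hypothetical, not part of this commit):

```python
import time

class TTLPolicyCache:
    def __init__(self, fetch, ttl_sec=5.0, clock=time.monotonic):
        self._fetch = fetch    # callable: tenant_id -> policy dict (e.g. the engine)
        self._ttl = ttl_sec
        self._clock = clock    # injectable for testing
        self._cache = {}       # tenant_id -> (expires_at, policy)

    def get_policy(self, tenant_id):
        entry = self._cache.get(tenant_id)
        if entry and entry[0] > self._clock():
            return entry[1]    # fresh enough: serve the cached copy
        policy = self._fetch(tenant_id)
        self._cache[tenant_id] = (self._clock() + self._ttl, policy)
        return policy

# Fake clock + fetch counter to show the staleness bound
now = [0.0]
calls = []
cache = TTLPolicyCache(lambda t: calls.append(t) or {"tenant": t, "v": len(calls)},
                       ttl_sec=5.0, clock=lambda: now[0])
p1 = cache.get_policy("t1")   # miss: hits the engine
p2 = cache.get_policy("t1")   # hit: served from cache
now[0] = 6.0                  # advance past the TTL
p3 = cache.get_policy("t1")   # expired: hits the engine again
```

A policy update becomes visible after at most `ttl_sec`, which trades a small DB saving against the instant hot-reload the committed code chose.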

@@ -0,0 +1,50 @@
# governance/retraining_loop.py
"""
Orchestrator for the Reward-Based Retraining Loop (Phase 12 Step 2.2).
Monitors reward trends and triggers EvolutionEngine if performance drops.
"""
from typing import Dict, Any, Optional
from governance.policy_engine import policy_engine
from agents.evolution_engine import evolution_engine
from utils.logger import logger
class RetrainingLoop:
def __init__(self, threshold: float = 0.65, window: int = 10):
self.threshold = threshold
self.window = window
def evaluate_and_trigger(self, tenant_id: str, agent_role: str) -> Dict[str, Any]:
"""
Checks current performance and triggers evolution if below threshold.
"""
metrics = policy_engine.get_performance_metrics(tenant_id, agent_role, window=self.window)
avg_reward = metrics.get("avg_reward")
count = metrics.get("count", 0)
if avg_reward is None:
return {"status": "skipped", "reason": "No reward data yet"}
if count < 3: # Min samples before triggering
return {"status": "skipped", "reason": f"Insufficient data ({count}/3)"}
if avg_reward < self.threshold:
logger.warning(f"📉 Reward for '{agent_role}' dropped to {avg_reward}. Triggering evolution...")
# Trigger evolution
evolution_engine.evolve(agent_role)
return {
"status": "triggered",
"avg_reward": avg_reward,
"threshold": self.threshold,
"agent_role": agent_role
}
return {
"status": "stable",
"avg_reward": avg_reward,
"threshold": self.threshold
}
retraining_loop = RetrainingLoop()
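The branching inside `evaluate_and_trigger` reduces to a pure function of `(avg_reward, count)`; a sketch that mirrors its outcomes without touching the engine or EvolutionEngine:

```python
def retrain_decision(avg_reward, count, threshold=0.65, min_samples=3):
    """Pure mirror of RetrainingLoop.evaluate_and_trigger's decision branches."""
    if avg_reward is None:
        return {"status": "skipped", "reason": "No reward data yet"}
    if count < min_samples:
        return {"status": "skipped", "reason": f"Insufficient data ({count}/{min_samples})"}
    if avg_reward < threshold:
        return {"status": "triggered", "avg_reward": avg_reward, "threshold": threshold}
    return {"status": "stable", "avg_reward": avg_reward, "threshold": threshold}

no_data = retrain_decision(None, 0)
too_few = retrain_decision(0.2, 2)   # bad scores, but below the sample floor
degraded = retrain_decision(0.3, 5)
healthy = retrain_decision(0.9, 5)
```

The sample floor matters: without it, a single unlucky early task could fire a full evolution cycle.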

@@ -2,13 +2,14 @@
##INFO:
from fastapi import APIRouter
from typing import Optional
from typing import Optional, Any
from pydantic import BaseModel
from governance.usage_meter import usage_meter
from governance.billing_engine import billing_engine
from governance.policy_registry import policy_registry
from governance.compliance_checker import compliance_checker
from governance.policy_engine import policy_engine
from governance.retraining_loop import retraining_loop
router = APIRouter()
@@ -99,8 +100,71 @@ def clear_db_violations(tenant_id: Optional[str] = None):
return {"status": "cleared", "tenant_id": tenant_id or "all"}
@router.get("/admin/governance/telemetry/{tenant_id}/{agent_role}")
def get_latency_trend(tenant_id: str, agent_role: str, window: int = 10):
"""Return recent latency trend (SMA) for a specific tenant+role."""
return {
"tenant_id": tenant_id,
"agent_role": agent_role,
"trend_ms": policy_engine.get_latency_trend(tenant_id, agent_role, window),
"threshold_ms": policy_engine.get_sla_threshold(tenant_id, agent_role)
}
@router.get("/admin/governance/decisions")
def get_audit_decisions(
tenant_id: Optional[str] = None,
task_id: Optional[str] = None,
limit: int = 50
):
"""Retrieve persistent agent decisions for auditing."""
return {"decisions": policy_engine.get_decisions(tenant_id=tenant_id, task_id=task_id, limit=limit)}
@router.get("/admin/governance/decisions/{task_id}")
def get_task_trace(task_id: str):
"""Retrieve full reasoning trace for a specific task."""
return {"task_id": task_id, "steps": policy_engine.get_decisions(task_id=task_id)}
@router.get("/admin/governance/performance/{tenant_id}/{agent_role}")
def get_agent_performance(tenant_id: str, agent_role: str, window: int = 20):
"""Retrieve reward trends for an agent (Step 2.2)."""
return {"tenant_id": tenant_id, "agent_role": agent_role, "metrics": policy_engine.get_performance_metrics(tenant_id, agent_role, window)}
@router.post("/admin/governance/retrain/{tenant_id}/{agent_role}")
def trigger_manual_retrain(tenant_id: str, agent_role: str):
"""Manually trigger agent evolution/retraining (Step 2.2)."""
return retraining_loop.evaluate_and_trigger(tenant_id, agent_role)
@router.post("/admin/governance/sla-threshold")
def set_sla_threshold(tenant_id: str, agent_role: str, max_latency_ms: float):
"""Set per-tenant per-role SLA latency threshold in governance.db."""
policy_engine.set_sla_threshold(tenant_id, agent_role, max_latency_ms)
return {"status": "sla threshold set", "max_latency_ms": max_latency_ms}
@router.post("/admin/governance/global-policy")
def set_global_policy(name: str, value: Any, description: Optional[str] = None):
"""Dynamically inject or update a global system policy/constraint."""
policy_engine.set_global_policy(name, value, description)
return {"status": "global policy updated", "name": name, "value": value}
@router.get("/admin/governance/global-policies")
def list_global_policies():
"""List all dynamic global policies."""
return {"policies": policy_engine.list_global_policies()}
@router.post("/admin/governance/default-role")
def set_default_role(role: str, permissions: list[str]):
"""Update default permissions for a system-level role."""
policy_engine.set_default_role(role, permissions)
return {"status": "default role updated", "role": role, "permissions": permissions}
@router.get("/admin/governance/default-roles")
def list_default_roles():
"""List all system-level default roles and their permissions."""
return {"roles": policy_engine.get_all_default_roles()}

@@ -1,23 +1,27 @@
# security/rbac_registry.py
##INFO: Role-Based Access Control (RBAC)
import governance.policy_engine
class RBACRegistry:
def __init__(self):
self.roles = {} # {agent_role: [allowed_actions]}
# We no longer store roles in-memory. DB is source of truth.
self._initialize_defaults()
def define_role(self, agent_role: str, allowed_actions: list[str]):
self.roles[agent_role] = allowed_actions
governance.policy_engine.policy_engine.set_default_role(agent_role, allowed_actions)
return {"role": agent_role, "permissions": allowed_actions}
def get_permissions(self, agent_role: str):
return self.roles.get(agent_role, [])
return governance.policy_engine.policy_engine.get_default_permissions(agent_role)
def get_all_roles(self):
return self.roles
return governance.policy_engine.policy_engine.get_all_default_roles()
def _initialize_defaults(self):
self.roles = {
# Check if DB has roles; if not, bootstrap from blueprint
existing = governance.policy_engine.policy_engine.get_all_default_roles()
if existing:
return
defaults = {
"admin": [
"modify_policy", "view_sla", "invoke_agent", "edit_memory", "access_plugin"
],
@@ -28,5 +32,7 @@ class RBACRegistry:
"view_memory"
]
}
for role, perms in defaults.items():
governance.policy_engine.policy_engine.set_default_role(role, perms)
rbac_registry = RBACRegistry()
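The unified resolution order after this refactor — tenant-specific `role_permissions` first, then the `default_roles` table — is easy to state as a pure lookup (plain dicts stand in for the two tables):

```python
def effective_permissions(tenant_perms, default_perms, tenant_id, role):
    """tenant_perms: {(tenant_id, role): [...]}; default_perms: {role: [...]}."""
    override = tenant_perms.get((tenant_id, role))
    if override is not None:
        return override          # tenant-specific row wins, even if empty
    return default_perms.get(role, [])

def check_permission(tenant_perms, default_perms, tenant_id, role, action):
    return action in effective_permissions(tenant_perms, default_perms, tenant_id, role)

defaults = {"admin": ["modify_policy", "invoke_agent"], "viewer": ["view_memory"]}
tenant_overrides = {("t1", "viewer"): ["view_memory", "invoke_agent"]}

a = check_permission(tenant_overrides, defaults, "t1", "viewer", "invoke_agent")  # override grants it
b = check_permission(tenant_overrides, defaults, "t2", "viewer", "invoke_agent")  # default denies
c = check_permission(tenant_overrides, defaults, "t2", "admin", "modify_policy")
```

Note the deliberate asymmetry: an explicit empty override still shadows the defaults, whereas a missing row falls through to them.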

@@ -1,21 +1,21 @@
# tenants/tenant_policy.py
from governance.policy_engine import policy_engine
import governance.policy_engine
class TenantPolicyStore:
def set_policy(self, tenant_id: str, role: str, permissions: list):
policy_engine.set_permissions(tenant_id, role, permissions)
governance.policy_engine.policy_engine.set_permissions(tenant_id, role, permissions)
def get_policy(self, tenant_id: str, role: str):
return policy_engine.get_permissions(tenant_id, role)
return governance.policy_engine.policy_engine.get_permissions(tenant_id, role)
def check_permission(self, tenant_id: str, role: str, action: str):
return policy_engine.check_permission(tenant_id, role, action)
return governance.policy_engine.policy_engine.check_permission(tenant_id, role, action)
def get_all(self, tenant_id: str):
# PolicyEngine has no get_all_permissions_for_tenant helper yet, so query the table directly
with policy_engine._conn() as con:
with governance.policy_engine.policy_engine._conn() as con:
rows = con.execute(
"SELECT role, permissions FROM role_permissions WHERE tenant_id=?",
(tenant_id,)

@@ -0,0 +1,108 @@
#!/usr/bin/env python3
"""
tests/verify_phase_12_step_1_2.py
Phase 12 Step 1.2 — Dynamic Rule Injection Verification
Tests:
1. Global Policy Hot-Reload: Updating a global policy reflected immediately.
2. Default Role Hot-Reload: Updating a default role reflected in checks.
3. Persistence: Rule changes survive across PolicyEngine re-instantiation.
4. Integration: rbac_guard decorator uses the persistent backend.
Run with:
python tests/verify_phase_12_step_1_2.py
"""
import sys
import os
import tempfile
import sqlite3
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
# Use a temp DB
TEST_DB = os.path.join(tempfile.mkdtemp(), "test_governance_v2.db")
from governance.policy_engine import PolicyEngine
from governance.policy_registry import PolicyRegistry
from security.rbac_registry import RBACRegistry
from tenants.rbac_guard import enforce_rbac, check_access
PASS = "✅ PASS"
FAIL = "❌ FAIL"
results = []
def check(label, condition, detail=""):
status = PASS if condition else FAIL
results.append((label, status, detail))
print(f" {status}{label}" + (f" ({detail})" if detail else ""))
return condition
print("\n=== Phase 12 Step 1.2: Dynamic Rule Injection ===\n")
# Setup
engine = PolicyEngine(db_path=TEST_DB)
# Patching the singletons to use the test engine
import governance.policy_engine as pe_mod
pe_mod.policy_engine = engine
policy_reg = PolicyRegistry()
rbac_reg = RBACRegistry()
# --- Test 1: Global Policy Hot-Reload ---
print("[1] Global Policy Hot-Reload")
policy_reg.set_global_policy("test_limit", 50, "Test Limit")
val1 = policy_reg.get_global_policy("test_limit")
check("Initial global policy set", val1 == 50)
# Update directly in engine (simulating external API call)
engine.set_global_policy("test_limit", 100)
val2 = policy_reg.get_global_policy("test_limit")
check("Hot-reload reflected new value", val2 == 100)
# --- Test 2: Default Role Hot-Reload ---
print("\n[2] Default Role Hot-Reload")
rbac_reg.define_role("contractor", ["work"])
check("Role defined", "work" in rbac_reg.get_permissions("contractor"))
# Check via access check
check("Access check True", check_access("tenant_x", "contractor", "work"))
check("Access check False", not check_access("tenant_x", "contractor", "delete"))
# Inject new permission dynamically
engine.set_default_role("contractor", ["work", "delete"])
check("Access check True (injected)", check_access("tenant_x", "contractor", "delete"))
# --- Test 3: Persistence ---
print("\n[3] Persistence across restart")
engine2 = PolicyEngine(db_path=TEST_DB)
check("Global policy persisted", engine2.get_global_policy("test_limit") == 100)
check("Default role persisted", "delete" in engine2.get_default_permissions("contractor"))
# --- Test 4: Decorator Integration ---
print("\n[4] Decorator Integration")
@enforce_rbac("admin_only")
def dummy_action(tenant_id, role):
return "success"
# Setup role
engine.set_default_role("restricted_guy", ["read"])
res1 = dummy_action(tenant_id="t1", role="restricted_guy")
check("Denied by default", "error" in res1)
# Upgrade role dynamically
engine.set_default_role("restricted_guy", ["read", "admin_only"])
res2 = dummy_action(tenant_id="t1", role="restricted_guy")
check("Allowed after injection", res2 == "success")
# --- Summary ---
print("\n" + "="*50)
passed = sum(1 for _, s, _ in results if s == PASS)
failed = sum(1 for _, s, _ in results if s == FAIL)
print(f"Results: {passed}/{passed+failed} passed")
if failed:
sys.exit(1)
else:
print("All checks passed ✅")
sys.exit(0)

@@ -0,0 +1,97 @@
#!/usr/bin/env python3
"""
tests/verify_phase_12_step_1_3.py
Phase 12 Step 1.3 — SLA-aware Admission Control Verification
Tests:
1. Telemetry Logging: Latency reports are stored.
2. Trend Calculation: SMA is correct.
3. Admission Rejection: Requests are blocked when trend exceeds threshold * margin.
4. Recovery: Admission allowed after low-latency reports normalize the trend.
Run with:
python tests/verify_phase_12_step_1_3.py
"""
import sys
import os
import tempfile
import time
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
# Use a temp DB
TEST_DB = os.path.join(tempfile.mkdtemp(), "test_governance_v3.db")
from governance.policy_engine import PolicyEngine
PASS = "✅ PASS"
FAIL = "❌ FAIL"
results = []
def check(label, condition, detail=""):
status = PASS if condition else FAIL
results.append((label, status, detail))
print(f" {status}{label}" + (f" ({detail})" if detail else ""))
return condition
print("\n=== Phase 12 Step 1.3: SLA-aware Admission Control ===\n")
# Setup
engine = PolicyEngine(db_path=TEST_DB)
TID = "t1"
ROLE = "executor"
ACTION = "run"
# Initialize roles and permissions to allow 'evaluate' to get past RBAC/Permission checks
engine.set_default_role(ROLE, [ACTION])
# Set a strict threshold for testing
engine.set_sla_threshold(TID, ROLE, 100.0) # 100ms
# Set safety margin to 0.5 for aggressive testing
engine.set_global_policy("admission_safety_margin", 0.5)
# --- Test 1: Telemetry & Trend ---
print("[1] Telemetry & Trend")
engine.log_telemetry(TID, ROLE, 40.0)
engine.log_telemetry(TID, ROLE, 60.0)
trend1 = engine.get_latency_trend(TID, ROLE)
check("Trend calculation (SMA of 40, 60)", trend1 == 50.0)
# --- Test 2: Admission Check (Allowed) ---
print("\n[2] Admission Check (Allowed)")
# Trend is 50.0; limit = threshold 100.0 * margin 0.5 = 50.0.
# Rejection requires trend > limit, so a trend of exactly 50.0 is still admitted.
res1 = engine.evaluate(TID, ROLE, "run")
check("Admission allowed at threshold edge", res1["allowed"], f"Trend: {res1['checks']['sla_trend']['current_trend_ms']}")
# --- Test 3: Admission Rejection ---
print("\n[3] Admission Rejection")
# Push trend over 50.0
engine.log_telemetry(TID, ROLE, 100.0)
trend2 = engine.get_latency_trend(TID, ROLE)
res2 = engine.evaluate(TID, ROLE, "run")
check("Admission rejected when trend > margin", not res2["allowed"], f"Trend: {trend2:.1f}")
check("Violation logged with 'Admission rejected'", any("Admission rejected" in v for v in res2["violations"]))
# --- Test 4: Recovery ---
print("\n[4] Recovery")
# Log many low-latency calls to bring trend back down
for _ in range(10):
engine.log_telemetry(TID, ROLE, 10.0)
trend3 = engine.get_latency_trend(TID, ROLE)
res3 = engine.evaluate(TID, ROLE, "run")
check("Admission recovered after trend normalization", res3["allowed"], f"Trend: {trend3:.1f}")
# --- Summary ---
print("\n" + "="*50)
passed = sum(1 for _, s, _ in results if s == PASS)
failed = sum(1 for _, s, _ in results if s == FAIL)
print(f"Results: {passed}/{passed+failed} passed")
if failed:
sys.exit(1)
else:
print("All checks passed ✅")
sys.exit(0)

@@ -0,0 +1,92 @@
# tests/verify_phase_12_step_2_1.py
"""
Verification for Phase 12 Step 2.1: Automated Decision Logging.
Ensures agent decisions and reasoning traces are persisted and retrievable.
Lightweight version to avoid heavy dependencies like langchain.
"""
import sys
import os
import uuid
import time
import json
# Add project root to sys.path
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from governance.policy_engine import policy_engine
from governance.decision_logger import decision_logger
def check(name, condition, details=""):
status = "PASS" if condition else "FAIL"
print(f"[{status}] {name} {details}")
if not condition:
# Lightweight mode: report the failure but keep the remaining checks running
pass
def run_verification():
print("--- Verifying Phase 12 Step 2.1: Automated Decision Logging (Lightweight) ---")
TID = f"tenant_{uuid.uuid4().hex[:6]}"
task_input = "Calculate the trajectory of a falling apple."
# 1. Mock execution via log_decision
print("\n[1] Logging decision manually...")
task_id = str(uuid.uuid4())
reasoning = "First, I consider gravity. Then, I calculate the time of flight."
output = {"result": "Trajectory: y = -0.5gt^2", "reasoning": reasoning}
logged_task_id = decision_logger.log_decision(
tenant_id=TID,
agent_role="tester",
task_input=task_input,
decision_output=output,
task_id=task_id,
reasoning_trace=reasoning,
metadata={"test": True}
)
check("Task ID matched", logged_task_id == task_id)
# 2. Retrieve from PolicyEngine
print("\n[2] Retrieving logged decision from DB...")
decisions = policy_engine.get_decisions(tenant_id=TID)
check("Decision retrieved from DB", len(decisions) > 0)
if decisions:
d = decisions[0]
check("Tenant ID matches", d["tenant_id"] == TID)
check("Agent Role matches", d["agent_role"] == "tester")
check("Reasoning trace matches", d["reasoning_trace"] == reasoning)
check("Decision output matches", d["decision_output"] == json.dumps(output))
check("Metadata matches", d["metadata"].get("test") == True)
print(f" Trace captured: {d['reasoning_trace'][:50]}...")
# 3. Test API integration logic (simulated via DecisionLogger calls)
print("\n[3] Verifying DecisionLogger query logic...")
task_trace = decision_logger.get_decisions_for_task(task_id)
check("Retrieved by Task ID", len(task_trace) == 1)
recent = decision_logger.get_recent_decisions(limit=10)
check("Retrieved recent decisions", len(recent) > 0)
# 4. Verify Multiple steps for one task
print("\n[4] Verifying multiple steps for one task ID...")
step2_reasoning = "Now I factor in air resistance."
decision_logger.log_decision(
tenant_id=TID,
agent_role="tester_step2",
task_input="Refine trajectory with drag.",
decision_output={"result": "Refined Trajectory"},
task_id=task_id,
reasoning_trace=step2_reasoning
)
task_trace_multi = decision_logger.get_decisions_for_task(task_id)
check("Both steps retrieved for task", len(task_trace_multi) == 2)
check("Steps ordered by timestamp", task_trace_multi[0]["agent_role"] == "tester_step2") # DESC by default in get_decisions
print("\n--- Verification Complete ---")
if __name__ == "__main__":
run_verification()

@@ -0,0 +1,72 @@
# tests/verify_phase_12_step_2_2.py
"""
Verification for Phase 12 Step 2.2: Reward-Based Retraining Loop.
Ensures reward scores are captured and retraining is triggered on degradation.
"""
import sys
import os
import uuid
import json
# Add project root to sys.path
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from governance.policy_engine import policy_engine
from governance.decision_logger import decision_logger
from governance.retraining_loop import retraining_loop
def check(name, condition, details=""):
status = "PASS" if condition else "FAIL"
print(f"[{status}] {name} {details}")
def run_verification():
print("--- Verifying Phase 12 Step 2.2: Reward-Based Retraining Loop ---")
TID = f"tenant_{uuid.uuid4().hex[:6]}"
ROLE = "tester_retrain"
# 1. Log several high reward decisions
print("\n[1] Logging stable performance...")
for i in range(5):
decision_logger.log_decision(
tenant_id=TID,
agent_role=ROLE,
task_input=f"Task {i}",
decision_output={"result": "OK"},
reward_score=0.9
)
metrics = policy_engine.get_performance_metrics(TID, ROLE)
check("Stable average reward captured", metrics["avg_reward"] == 0.9, f"(Got: {metrics['avg_reward']})")
status = retraining_loop.evaluate_and_trigger(TID, ROLE)
check("Retraining skipped for stable performance", status["status"] == "stable")
# 2. Log several low reward decisions to trigger degradation
print("\n[2] Logging performance degradation...")
for i in range(5):
decision_logger.log_decision(
tenant_id=TID,
agent_role=ROLE,
task_input=f"Bad Task {i}",
decision_output={"result": "FAIL"},
reward_score=0.3
)
metrics_degraded = policy_engine.get_performance_metrics(TID, ROLE, window=10)
# Avg (0.9*5 + 0.3*5) / 10 = 0.6
check("Degraded average reward captured", metrics_degraded["avg_reward"] == 0.6, f"(Got: {metrics_degraded['avg_reward']})")
# 3. Verify retraining trigger
print("\n[3] Verifying retraining trigger...")
# This should call evolution_engine.evolve internally.
# Note: evolution_engine.evolve prints to stdout.
status_trigger = retraining_loop.evaluate_and_trigger(TID, ROLE)
check("Retraining triggered for low performance", status_trigger["status"] == "triggered")
check("Correct reward reported", status_trigger["avg_reward"] == 0.6)
print("\n--- Verification Complete ---")
if __name__ == "__main__":
run_verification()