Self-Evolution
The self-evolution system is 9,806 lines of Rust across 22 modules. It enables PRX to improve its own behavior over time — not by retraining models, but by evolving the prompts, memory structures, and operational strategies that surround them.
Pipeline
Self-evolution operates on a three-phase cycle:

    Record (realtime)
      │  Every request/response pair is logged with metadata
      ▼
    Analyze (daily)
      │  Patterns extracted: failure modes, slow paths, cost outliers
      ▼
    Evolve (every 3 days)
         Generate candidate improvements, test in shadow mode, promote or rollback

Record Phase (Realtime)
Every interaction is recorded with:
- Request content and classified intent
- Selected model and routing score breakdown
- Response content and latency
- Outcome signals: user acceptance, retries, follow-up corrections, tool call success/failure
- Token counts and estimated cost
This data accumulates in a local store, building the evidence base for analysis.
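The fields listed above could be modeled roughly as follows. This is a minimal sketch, not the PRX schema: every type, field, and method name here (`InteractionRecord`, `OutcomeSignals`, `looks_like_failure`, and so on) is an assumption made for illustration, and the failure heuristic is only one plausible reading of the outcome signals.

```rust
/// Hypothetical outcome signals for one interaction (names are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct OutcomeSignals {
    pub user_accepted: bool,
    pub retries: u32,
    pub follow_up_corrections: u32,
    pub tool_calls_ok: u32,
    pub tool_calls_failed: u32,
}

/// Hypothetical shape of one recorded request/response pair.
#[derive(Debug, Clone, PartialEq)]
pub struct InteractionRecord {
    pub request: String,
    pub intent: String,
    pub model: String,
    /// Routing score breakdown as (component name, score) pairs.
    pub routing_scores: Vec<(String, f64)>,
    pub response: String,
    pub latency_ms: u64,
    pub outcome: OutcomeSignals,
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub estimated_cost_usd: f64,
}

impl InteractionRecord {
    /// Total tokens consumed by the request and the response.
    pub fn total_tokens(&self) -> u32 {
        self.prompt_tokens + self.completion_tokens
    }

    /// Illustrative heuristic only: treat a rejected response or any
    /// failed tool call as a failure worth analyzing later.
    pub fn looks_like_failure(&self) -> bool {
        !self.outcome.user_accepted || self.outcome.tool_calls_failed > 0
    }
}
```

Keeping the outcome signals in their own struct would let the analysis job filter on them without touching request or response content.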
Analyze Phase (Daily)
A daily analysis job processes recent records to identify:
| Pattern | Detection Method |
|---|---|
| Repeated failures on a task type | Cluster failed requests by intent + model |
| Prompt weaknesses | Identify where system prompts lead to off-target responses |
| Cost inefficiencies | Flag requests where a cheaper model would have sufficed |
| Latency outliers | Detect models or tools with degrading performance |
| Memory gaps | Find topics where the agent lacks stored knowledge |
Analysis results are stored as structured findings, each tagged with severity and actionable recommendations.
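As an illustration of the first row of the table, clustering failed requests by intent and model can be as simple as counting by a composite key. The sketch below assumes a hypothetical `FailedRequest` type and a `min_count` cutoff that the document does not specify.

```rust
use std::collections::HashMap;

/// Hypothetical minimal view of a failed interaction.
pub struct FailedRequest {
    pub intent: String,
    pub model: String,
}

/// Count failures per (intent, model) pair.
pub fn cluster_failures(failed: &[FailedRequest]) -> HashMap<(String, String), usize> {
    let mut counts = HashMap::new();
    for f in failed {
        *counts
            .entry((f.intent.clone(), f.model.clone()))
            .or_insert(0) += 1;
    }
    counts
}

/// Keep only clusters with at least `min_count` failures (an assumed
/// cutoff), ordered by descending count and then by key so the result
/// is deterministic.
pub fn repeated_failures(
    failed: &[FailedRequest],
    min_count: usize,
) -> Vec<((String, String), usize)> {
    let mut clusters: Vec<_> = cluster_failures(failed)
        .into_iter()
        .filter(|(_, n)| *n >= min_count)
        .collect();
    clusters.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    clusters
}
```

Each surviving cluster would then become one structured finding, with severity derived from the count.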
Evolve Phase (Every 3 Days)
Based on accumulated analysis findings, the evolution engine generates candidate changes.
Three Evolution Layers
Prompt Evolution
Modifies system prompts to address identified weaknesses:
- Rewrites instructions that led to misinterpretations
- Adds or refines few-shot examples based on real failure cases
- Adjusts tone, verbosity, or formatting directives
- Tunes tool-use instructions when tool calls are failing
Memory Evolution
Improves the knowledge stored in prx-memory:
- Extracts reusable patterns from successful interactions
- Consolidates fragmented knowledge entries
- Deprecates outdated or contradictory memories
- Indexes new domain knowledge discovered during operation
Strategy Evolution
Adjusts operational parameters:
- Router scoring weights (alpha, beta, gamma, delta, epsilon)
- Automix confidence thresholds
- Concurrency limits per channel
- Timeout budgets
- Default model preferences per intent category
Safety System
Self-evolution is powerful but dangerous. PRX enforces multiple safety layers to prevent regressions.
Gate Checks
Before any candidate change is applied, it must pass gate checks:
- Syntax validation — the change must produce valid configuration/prompts
- Regression test — replay a sample of recent successful interactions with the proposed change and verify outputs remain acceptable
- Scope check — the change must not exceed its permitted scope (e.g., a prompt evolution cannot modify security policy)
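The three gates can be pictured as a short fail-fast pipeline. The sketch below is heavily simplified and rests on assumptions: the key-prefix convention for scopes (`prompt.`, `memory.`, `strategy.`), the `min_pass_rate` parameter, and the non-empty check standing in for real syntax validation are all hypothetical, not PRX behavior.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
    Prompt,
    Memory,
    Strategy,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GateFailure {
    Syntax,
    Regression,
    Scope,
}

/// Hypothetical candidate change.
pub struct Candidate {
    pub layer: Layer,
    /// Proposed prompt or configuration text.
    pub body: String,
    /// Configuration keys the change would modify.
    pub touched_keys: Vec<String>,
}

/// Assumed convention: each layer may only touch keys under its own prefix.
fn permitted_prefix(layer: Layer) -> &'static str {
    match layer {
        Layer::Prompt => "prompt.",
        Layer::Memory => "memory.",
        Layer::Strategy => "strategy.",
    }
}

/// Run the gates in order and stop at the first failure.
/// `regression_pass_rate` is the fraction of replayed interactions whose
/// outputs stayed acceptable; `min_pass_rate` is an assumed cutoff.
pub fn run_gates(
    c: &Candidate,
    regression_pass_rate: f64,
    min_pass_rate: f64,
) -> Result<(), GateFailure> {
    // Stand-in for real syntax validation: reject empty bodies.
    if c.body.trim().is_empty() {
        return Err(GateFailure::Syntax);
    }
    if regression_pass_rate < min_pass_rate {
        return Err(GateFailure::Regression);
    }
    let prefix = permitted_prefix(c.layer);
    if !c.touched_keys.iter().all(|k| k.starts_with(prefix)) {
        return Err(GateFailure::Scope);
    }
    Ok(())
}
```

Under these assumptions, a prompt candidate that also touches a key such as `security.policy` fails the scope gate even when its regression replay is perfect.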
Shadow Mode
Candidate changes are first deployed in shadow mode:
- The current (production) configuration handles all real requests
- The candidate configuration runs in parallel on the same inputs
- Outputs are compared but the candidate’s responses are never sent to users
- Metrics are collected: quality delta, latency delta, cost delta
Shadow mode runs for a configurable evaluation period (default: 24 hours).
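One way to compute the three deltas is to pair each production measurement with the candidate's measurement on the same input and average the differences. The sketch below assumes hypothetical `Metrics` and `ShadowPair` types and a per-request quality score; how PRX actually measures quality is not described here.

```rust
/// Hypothetical per-request measurements.
#[derive(Debug, Clone, Copy)]
pub struct Metrics {
    pub quality: f64,
    pub latency_ms: f64,
    pub cost_usd: f64,
}

/// Production and candidate measurements for the same input.
#[derive(Debug, Clone, Copy)]
pub struct ShadowPair {
    pub production: Metrics,
    pub candidate: Metrics,
}

/// Mean of (candidate - production) for each metric.
/// Returns `None` when no pairs were collected.
pub fn shadow_deltas(pairs: &[ShadowPair]) -> Option<Metrics> {
    if pairs.is_empty() {
        return None;
    }
    let n = pairs.len() as f64;
    let mut sum = Metrics { quality: 0.0, latency_ms: 0.0, cost_usd: 0.0 };
    for p in pairs {
        sum.quality += p.candidate.quality - p.production.quality;
        sum.latency_ms += p.candidate.latency_ms - p.production.latency_ms;
        sum.cost_usd += p.candidate.cost_usd - p.production.cost_usd;
    }
    Some(Metrics {
        quality: sum.quality / n,
        latency_ms: sum.latency_ms / n,
        cost_usd: sum.cost_usd / n,
    })
}
```

With this sign convention, a positive quality delta favors the candidate, while positive latency and cost deltas count against it.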
Judge Model
A separate judge model evaluates shadow mode results. It compares production vs. candidate outputs across a sample of requests and scores the candidate on a 0-1 scale.
- Threshold: 0.6 — candidates scoring below 0.6 are rejected
- The judge model is typically a strong reasoning model (e.g., Claude Opus, o3) different from the models being evaluated
- Judge prompts are themselves versioned and not subject to self-evolution (to prevent gaming)
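The threshold rule reduces to a single comparison once per-request judge scores are aggregated. The document does not say how scores are aggregated, so the sketch below assumes a plain mean; `Verdict` and `judge_verdict` are hypothetical names.

```rust
/// Threshold stated in this document: scores below 0.6 are rejected.
pub const JUDGE_THRESHOLD: f64 = 0.6;

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Reject,
}

/// Aggregate per-request judge scores (each in 0..=1) by their mean,
/// which is an assumption, and compare against the threshold.
/// A candidate with no scored samples is rejected.
pub fn judge_verdict(sample_scores: &[f64]) -> Verdict {
    if sample_scores.is_empty() {
        return Verdict::Reject;
    }
    let mean = sample_scores.iter().sum::<f64>() / sample_scores.len() as f64;
    if mean >= JUDGE_THRESHOLD {
        Verdict::Pass
    } else {
        Verdict::Reject
    }
}
```

Rejecting on an empty sample is a conservative choice: a candidate that was never actually compared should not be promoted by default.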
Circuit Breaker
If a promoted change causes a spike in failures after deployment:
- The circuit breaker triggers when the failure rate exceeds 2x the baseline within a 1-hour window
- The change is automatically rolled back
- An incident record is created with the failure evidence
- The change is marked as failed and will not be retried without modification
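The trigger condition can be sketched as a sliding one-hour window over post-deployment outcomes. The 2x multiplier and the one-hour window come from the text above; the `CircuitBreaker` type, its `min_samples` guard against tripping on a handful of requests, and the use of explicit timestamps are assumptions for illustration.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Hypothetical sliding-window circuit breaker.
pub struct CircuitBreaker {
    baseline_failure_rate: f64,
    /// Assumed guard: do not evaluate until this many events are in the window.
    min_samples: usize,
    window: Duration,
    /// (time since deployment, whether the request failed)
    events: VecDeque<(Duration, bool)>,
}

impl CircuitBreaker {
    pub fn new(baseline_failure_rate: f64, min_samples: usize) -> Self {
        Self {
            baseline_failure_rate,
            min_samples,
            window: Duration::from_secs(3600),
            events: VecDeque::new(),
        }
    }

    /// Record one outcome observed at time `at` (measured from deployment,
    /// assumed non-decreasing across calls). Returns true when the failure
    /// rate in the last hour exceeds 2x the baseline.
    pub fn record(&mut self, at: Duration, failed: bool) -> bool {
        self.events.push_back((at, failed));
        // Evict events that have fallen out of the window.
        while let Some(&(t, _)) = self.events.front() {
            if at.saturating_sub(t) > self.window {
                self.events.pop_front();
            } else {
                break;
            }
        }
        if self.events.len() < self.min_samples {
            return false;
        }
        let failures = self.events.iter().filter(|(_, f)| *f).count();
        let rate = failures as f64 / self.events.len() as f64;
        rate > 2.0 * self.baseline_failure_rate
    }
}
```

A caller would invoke the rollback path the first time `record` returns true, then attach the events still in the window to the incident record as failure evidence.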
Version Snapshots
Every configuration state is versioned:

    v1.0 ── initial config
    v1.1 ── prompt evolution: improved coding instructions
    v1.2 ── strategy evolution: adjusted router weights
    v1.3 ── ROLLED BACK (circuit breaker triggered)
    v1.2 ── restored (current)

Snapshots include the full prompt set, memory state hash, router weights, and all configuration parameters. Any version can be restored instantly.
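Because every snapshot is retained, restoring a version only has to move a pointer. The sketch below illustrates that idea with hypothetical `Snapshot` and `SnapshotStore` types; the field layout, including a five-element array for the router weights, is assumed rather than taken from PRX.

```rust
/// Hypothetical snapshot of one configuration state.
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub version: String,
    pub prompts: Vec<String>,
    pub memory_state_hash: String,
    /// alpha, beta, gamma, delta, epsilon
    pub router_weights: [f64; 5],
}

/// Keeps every snapshot and tracks which one is live.
#[derive(Debug, Default)]
pub struct SnapshotStore {
    snapshots: Vec<Snapshot>,
    current: Option<usize>,
}

impl SnapshotStore {
    /// Store a new snapshot and make it the live one.
    pub fn commit(&mut self, snapshot: Snapshot) {
        self.snapshots.push(snapshot);
        self.current = Some(self.snapshots.len() - 1);
    }

    /// Point the live configuration at an earlier version.
    /// Nothing is deleted, so the rolled-back version stays available.
    pub fn restore(&mut self, version: &str) -> bool {
        match self.snapshots.iter().position(|s| s.version == version) {
            Some(i) => {
                self.current = Some(i);
                true
            }
            None => false,
        }
    }

    pub fn current(&self) -> Option<&Snapshot> {
        self.current.and_then(|i| self.snapshots.get(i))
    }

    pub fn len(&self) -> usize {
        self.snapshots.len()
    }
}
```

In the history shown above, restoring `v1.2` after `v1.3` is rolled back would leave all four snapshots in the store with `v1.2` marked as current.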
Experiment Tracking
All evolution attempts are tracked with:
| Field | Description |
|---|---|
| Candidate ID | Unique identifier for the proposed change |
| Layer | prompt / memory / strategy |
| Trigger | Analysis finding that motivated the change |
| Shadow metrics | Quality, latency, cost deltas observed |
| Judge score | Score from the judge model |
| Outcome | promoted / rejected / rolled_back |
| Duration | Time the change was active before rollback (if applicable) |
This provides a complete audit trail of how and why the system changed over time.
RollbackManager
The RollbackManager handles automatic and manual rollbacks:
- Automatic — triggered by the circuit breaker when failure rates spike
- Manual — operators can roll back to any previous version via CLI or API
- Selective — roll back only one layer (e.g., revert prompt changes but keep strategy changes)
    # List evolution history
    prx evolution history

    # Roll back to a specific version
    prx evolution rollback v1.2

    # Roll back only prompt changes
    prx evolution rollback --layer prompt v1.1

Rollbacks are instantaneous because all version snapshots are retained locally.
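A selective rollback can be thought of as copying one layer from the target snapshot while leaving the other layers as they are. The sketch below is an illustration under that assumption, not the RollbackManager's actual code; `ConfigState`, `Layer`, and `rollback` are hypothetical names, and each layer is reduced to a single string for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
    Prompt,
    Memory,
    Strategy,
}

/// Hypothetical configuration state, one field per evolution layer.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ConfigState {
    pub prompts: String,
    pub memory_hash: String,
    pub strategy: String,
}

/// Build the state that results from rolling back to `target`.
/// With `layer == None` the whole state is restored; otherwise only the
/// named layer is taken from `target` and the rest stays as in `current`.
pub fn rollback(current: &ConfigState, target: &ConfigState, layer: Option<Layer>) -> ConfigState {
    match layer {
        None => target.clone(),
        Some(Layer::Prompt) => ConfigState {
            prompts: target.prompts.clone(),
            ..current.clone()
        },
        Some(Layer::Memory) => ConfigState {
            memory_hash: target.memory_hash.clone(),
            ..current.clone()
        },
        Some(Layer::Strategy) => ConfigState {
            strategy: target.strategy.clone(),
            ..current.clone()
        },
    }
}
```

Returning a new state instead of mutating in place means the result can itself be stored as a fresh snapshot, which keeps the audit trail intact.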