Design Document: Centralized Agentic Skill Registry
Subject: Transitioning from File-Based Markdown Skills to a Database-Backed Skill Registry
1. Executive Summary
Currently, most AI agent frameworks rely on localized Markdown (.md) files to define agent "skills." While Markdown is highly LLM-native and human-readable, it creates significant bottlenecks at enterprise scale in three areas: strict typing, API integration, and context window limits.
This document proposes transitioning to an Agentic Control Plane (Centralized Skill Registry) backed by a database. By decoupling skill metadata, schemas, and instructions, and by utilizing dynamic routing, we will achieve hierarchical structuring, strict schema enforcement, and progressive disclosure of tools to agents.
2. Problem Statement
Managing agent skills as flat Markdown files introduces several scaling challenges:
- Lack of Strict Typing: Markdown cannot enforce data types (e.g., ensuring a parameter is an integer vs. string), leading to hallucinated or malformed tool inputs.
- Context Window Exhaustion: Loading dozens or hundreds of skill definitions at startup overwhelms the LLM context window, increasing latency, token costs, and tool misuse.
- Static Deployments: Updating a skill or changing access permissions requires a full application redeploy.
- Poor Discoverability: Flat file structures offer no native mechanism for progressive disclosure or tool search.
3. Data Models & Formats
To solve the limitations of purely text-based skills, we will adopt a hybrid, structured format stored within a database (e.g., PostgreSQL/MongoDB). The architecture uses the right format for the right job:
- JSON Schema: Used strictly for defining parameters, inputs, and tool shapes. Natively supported by OpenAI/Anthropic/Google tool-calling APIs.
- LightAPI Description (YAML/JSON): Used to map endpoint-level API capabilities to skills across REST, JSON-RPC, gRPC, and MCP.
- OpenAPI / OpenRPC / Protobuf: Referenced by LightAPI where protocol-native specifications already exist.
- Executable Code (Python/JS) / URI: Stores the actual execution logic or the endpoint reference.
- Markdown: Retained only for the instructions or prompt fields, as LLMs excel at parsing Markdown headers and lists for constraints and persona instructions.
LightAPI is the preferred source format for API-backed skills because it describes endpoint identity, protocol invocation, input schema, request mapping, result shape, examples, and behavior notes in one agent-oriented document. See LightAPI Description Design for the endpoint description model.
YAML and JSON are the external skill document formats. In the portal database,
they should not replace the Markdown instruction field. The normalized model is
structured columns and relationships for identity, versioning, taxonomy, tools,
and execution metadata, plus content_markdown for the LLM-facing instruction
body. If the portal later needs to persist a full structured skill document,
add a nullable JSONB skill-spec column beside content_markdown and normalize
YAML imports to JSON.
3.1 Proposed Database Schema Structure
Light Portal stores skills in structured catalog tables. Below is a representation of the skill payload:
{
  "skill_id": "sk_finance_001",
  "name": "generate_financial_report",
  "version": "1.2.0",
  "tags": ["finance", "reporting"],
  "tool_schema": {
    "type": "function",
    "function": {
      "name": "generate_financial_report",
      "description": "Generates a Q3 report based on ticker symbol.",
      "parameters": {
        "type": "object",
        "properties": {
          "ticker": {"type": "string", "description": "The stock ticker"}
        },
        "required": ["ticker"]
      },
      "response_schema": {
        "type": "object",
        "properties": {
          "report_url": {"type": "string"},
          "status": {"type": "string"}
        }
      }
    }
  },
  "execution": {
    "type": "rest_api",
    "endpoint_id": "ep_finance_report_001",
    "endpoint": "https://internal-api.company.com/v1/finance/report",
    "method": "POST"
  },
  "instructions": "## Role\nYou are a financial analyst.\n## Constraints\n- Never hallucinate financial data.\n- Always return exact numbers."
}
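As an illustration of what strict typing buys over Markdown, the sketch below checks a tool call's arguments against the parameters object from the payload above. It is a minimal, hand-rolled check covering only type and required; validate_arguments and JSON_TYPES are hypothetical names, and a production registry would more likely rely on a full JSON Schema validator.

```python
# Hypothetical helper: covers only a small subset of JSON Schema
# ("type" on top-level properties and "required").
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_arguments(parameters: dict, arguments: dict) -> list[str]:
    """Return a list of errors; an empty list means the call is acceptable."""
    errors = []
    for name in parameters.get("required", []):
        if name not in arguments:
            errors.append(f"missing required parameter: {name}")
    properties = parameters.get("properties", {})
    for name, value in arguments.items():
        spec = properties.get(name)
        if spec is None:
            # Assumption: reject undeclared keys. Plain JSON Schema allows
            # them unless additionalProperties is set to false.
            errors.append(f"unknown parameter: {name}")
            continue
        expected = JSON_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and spec.get("type") in ("integer", "number")
        if expected is not None and (not isinstance(value, expected) or is_bool_as_number):
            errors.append(f"{name}: expected {spec['type']}, got {type(value).__name__}")
    return errors


# The "parameters" object from the skill payload above.
parameters = {
    "type": "object",
    "properties": {"ticker": {"type": "string", "description": "The stock ticker"}},
    "required": ["ticker"],
}

print(validate_arguments(parameters, {"ticker": "ACME"}))
print(validate_arguments(parameters, {"ticker": 42}))
```

Running the same check in the gateway or the agent before execution would turn a hallucinated argument into an explicit error rather than a malformed downstream request.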
4. Hierarchical Structure & Progressive Disclosure
Dumping 500 JSON schemas into an LLM's context window degrades tool selection, inflates latency and token cost, and can exceed the context limit outright. The Centralized Controller will act as a mediator, enforcing hierarchy and progressive disclosure (giving the agent only the schemas it needs, exactly when it needs them).
4.1 Implementing Hierarchy & Tagging
Because JSON Schema does not have built-in folders, hierarchy and categorization are enforced via the platform's global entity management system:
- Namespacing: Tool names follow a strict convention: [domain]_[subdomain]_[action] (e.g., aws_rds_provision).
- Tags & Categories: Instead of hardcoded columns, the registry utilizes the entity_tag_t and entity_category_t tables (with entity_type = 'skill'). This allows for unlimited flat tagging and deep hierarchical folder structures that are consistent across the entire portal.
- Discovery API: Portal-query filters by these tags/categories to scope skill sets for specific agent personas. Agents cache the effective catalog locally and reload it when runtime cache-management invalidation is triggered.
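The naming convention and tag scoping above can be sketched as follows. This is an illustration only: the regular expression assumes lowercase alphanumeric segments, and SkillSummary, category_path, parse_tool_name, and scope_skills are hypothetical names rather than portal APIs.

```python
import re
from dataclasses import dataclass, field

# Assumption: domain and subdomain are single lowercase segments; the action
# may itself contain underscores.
TOOL_NAME = re.compile(r"^(?P<domain>[a-z0-9]+)_(?P<subdomain>[a-z0-9]+)_(?P<action>[a-z0-9_]+)$")


@dataclass
class SkillSummary:
    name: str
    tags: set[str] = field(default_factory=set)
    category_path: str = ""  # hypothetical flattened category, e.g. "cloud/aws"


def parse_tool_name(name: str) -> tuple[str, str, str]:
    """Split a tool name into (domain, subdomain, action) or raise ValueError."""
    match = TOOL_NAME.match(name)
    if match is None:
        raise ValueError(f"tool name does not follow domain_subdomain_action: {name!r}")
    return match["domain"], match["subdomain"], match["action"]


def scope_skills(skills, required_tags: set[str], category_prefix: str = ""):
    """Keep skills that carry every required tag and sit under the category prefix."""
    def in_category(path: str) -> bool:
        return (
            not category_prefix
            or path == category_prefix
            or path.startswith(category_prefix + "/")
        )
    return [s for s in skills if required_tags <= s.tags and in_category(s.category_path)]


skills = [
    SkillSummary("aws_rds_provision", {"devops", "database"}, "cloud/aws"),
    SkillSummary("mongo_atlas_create", {"devops", "database"}, "cloud/mongo"),
    SkillSummary("finance_report_generate", {"finance"}, "finance/reporting"),
]
print(parse_tool_name("aws_rds_provision"))
print([s.name for s in scope_skills(skills, {"database"}, "cloud/aws")])
```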
4.2 Progressive Disclosure Patterns
Agents should not load every executable tool into the LLM context. Instead, they should load their assigned skill/tool catalog from the portal API, cache it locally, and use one of the following progressive disclosure patterns:
Phase 5 starts with the Rust light-agent. The agent loads
genai-query/getEffectiveAgentCatalog, keeps a local cache keyed by
hostId + agentDefId + serviceId + envTag, ranks cached skill/tool entries with
keyword and routing-field matching, and intersects the selected tool names with
the live gateway tools/list result before giving schemas to the model.
Execution remains gateway tools/call.
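The agent-side selection described above might look like the sketch below. It is written in Python for brevity even though the phase 5 agent is Rust, and every name in it (CacheKey, CatalogEntry, CatalogCache, rank, select_tools) and the scoring weights are assumptions made for illustration, not the light-agent implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheKey:
    # Mirrors the hostId + agentDefId + serviceId + envTag cache key.
    host_id: str
    agent_def_id: str
    service_id: str
    env_tag: str


@dataclass
class CatalogEntry:
    tool_name: str
    description: str
    routing_keywords: tuple[str, ...] = ()  # hypothetical routing field


class CatalogCache:
    def __init__(self) -> None:
        self._entries: dict[CacheKey, list[CatalogEntry]] = {}

    def store(self, key: CacheKey, entries: list[CatalogEntry]) -> None:
        self._entries[key] = list(entries)

    def invalidate(self, key: CacheKey) -> None:
        # Called when cache-management invalidation is triggered.
        self._entries.pop(key, None)

    def get(self, key: CacheKey) -> list[CatalogEntry]:
        return self._entries.get(key, [])


def rank(entries: list[CatalogEntry], query: str, limit: int = 3) -> list[str]:
    """Score by keyword overlap; routing-field hits count double (an assumed weight)."""
    terms = set(query.lower().split())
    scored = []
    for entry in entries:
        words = set(entry.description.lower().split()) | set(entry.tool_name.lower().split("_"))
        routing = {k.lower() for k in entry.routing_keywords}
        score = len(terms & words) + 2 * len(terms & routing)
        if score > 0:
            scored.append((score, entry.tool_name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


def select_tools(cache: CatalogCache, key: CacheKey, query: str, live_tools: set[str]) -> list[str]:
    """Intersect ranked catalog entries with the live gateway tools/list result."""
    return [name for name in rank(cache.get(key), query) if name in live_tools]


key = CacheKey("host1", "agent1", "svc1", "dev")
cache = CatalogCache()
cache.store(key, [
    CatalogEntry("aws_rds_provision", "Creates AWS RDS DB", ("database", "provision")),
    CatalogEntry("mongo_atlas_create", "Creates Mongo cluster", ("database",)),
])
print(select_tools(cache, key, "provision database", {"aws_rds_provision"}))
```

The intersection step means a tool that is assigned in the catalog but not currently exposed by the gateway never reaches the model.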
Pattern A: Meta-Tools (Dynamic Injection)
The agent is booted with only two "meta-tools" designed for discovery.
- Local catalog search: Agent searches its cached assigned skills. The cache contains lightweight summaries and mapped tool names.
- Schema loading: Once the agent identifies the correct tool, it loads the schema from the local catalog cache or refreshes the cache from portal-query.
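A minimal sketch of the two meta-tools follows. The tool names search_skills and load_tool_schema, the in-memory CATALOG, and the mongo_atlas_create schema are hypothetical; the aws_rds_provision parameters are taken from the example flow in section 5.

```python
# Hypothetical local catalog cache: lightweight summary plus full schema per tool.
CATALOG = {
    "aws_rds_provision": {
        "summary": "Creates AWS RDS DB",
        "schema": {
            "type": "object",
            "properties": {
                "instance_type": {"type": "string"},
                "storage_gb": {"type": "integer"},
            },
            "required": ["instance_type", "storage_gb"],
        },
    },
    "mongo_atlas_create": {
        "summary": "Creates Mongo cluster",
        "schema": {
            "type": "object",
            "properties": {"cluster_name": {"type": "string"}},
            "required": ["cluster_name"],
        },
    },
}

# The only tool definitions placed in the model context at boot.
META_TOOLS = [
    {"type": "function", "function": {
        "name": "search_skills",
        "description": "Search the assigned skill catalog by keyword.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}},
    {"type": "function", "function": {
        "name": "load_tool_schema",
        "description": "Load the full JSON Schema for one tool by name.",
        "parameters": {"type": "object",
                       "properties": {"name": {"type": "string"}},
                       "required": ["name"]}}},
]


def search_skills(query: str) -> list[dict]:
    """Return token-efficient summaries whose name or summary contains a query term."""
    terms = query.lower().split()
    return [
        {"name": name, "description": entry["summary"]}
        for name, entry in CATALOG.items()
        if any(term in f"{name} {entry['summary']}".lower() for term in terms)
    ]


def load_tool_schema(name: str) -> dict:
    """Return the full parameter schema once the agent has chosen a tool."""
    return CATALOG[name]["schema"]


def dispatch(tool_name: str, arguments: dict):
    """Route a model-issued meta-tool call to its local handler."""
    handlers = {"search_skills": search_skills, "load_tool_schema": load_tool_schema}
    return handlers[tool_name](**arguments)


print(dispatch("search_skills", {"query": "rds"}))
print(dispatch("load_tool_schema", {"name": "aws_rds_provision"})["required"])
```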
Pattern B: Semantic Tool RAG (Zero-Shot Discovery)
For highly complex systems with thousands of skills:
- Tool descriptions are embedded into a Vector Database (e.g., pgvector).
- When the user prompts the system (e.g., "Reset my AWS password"), portal-query or the agent's local cache performs semantic search and retrieves the Top-3 most relevant JSON Schemas.
- The agent boots with only those 3 tools in its context.
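The retrieval step can be sketched with cosine similarity over toy vectors. Note the loud assumption: embed below is a bag-of-words stand-in so the example runs without dependencies; a real deployment would use a learned embedding model with vectors stored in something like pgvector. The tool names and descriptions are also made up for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, descriptions: dict[str, str], k: int = 3) -> list[str]:
    """Return the k tool names whose descriptions are most similar to the query."""
    q = embed(query)
    ranked = sorted(descriptions.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical tool descriptions.
descriptions = {
    "aws_iam_reset_password": "reset the password for an aws iam user",
    "aws_rds_provision": "create an aws rds database",
    "mongo_atlas_create": "create a mongo atlas cluster",
    "finance_report_generate": "generate a quarterly financial report",
}
print(top_k("Reset my AWS password", descriptions))
```

Only the schemas for the returned names would then be loaded into the agent's context.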
Pattern C: Multi-Agent Orchestration (Supervisor / Worker)
Hierarchy is mapped to agent teams.
- A Supervisor Agent holds routing tools (e.g., delegate_to_finance, delegate_to_devops).
- When delegate_to_devops is triggered, the supervisor routes to a DevOps Worker Agent, loading only the specific DevOps JSON schemas into its context.
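A sketch of the supervisor's routing step, assuming a static team-to-tool mapping. WORKER_TOOLSETS, WorkerContext, and delegate are hypothetical names, and the finance tool name is invented for the example.

```python
from dataclasses import dataclass

# Hypothetical mapping from worker team to the tool schemas it may load.
WORKER_TOOLSETS = {
    "finance": ["finance_report_generate"],
    "devops": ["aws_rds_provision", "mongo_atlas_create"],
}

# The supervisor's context holds only these routing tools.
ROUTING_TOOLS = [f"delegate_to_{team}" for team in WORKER_TOOLSETS]


@dataclass
class WorkerContext:
    team: str
    tool_names: list[str]


def delegate(routing_tool: str) -> WorkerContext:
    """Resolve a routing tool call to a worker context with only that team's tools."""
    prefix = "delegate_to_"
    if not routing_tool.startswith(prefix):
        raise ValueError(f"not a routing tool: {routing_tool!r}")
    team = routing_tool[len(prefix):]
    if team not in WORKER_TOOLSETS:
        raise ValueError(f"unknown worker team: {team!r}")
    return WorkerContext(team=team, tool_names=list(WORKER_TOOLSETS[team]))


worker = delegate("delegate_to_devops")
print(ROUTING_TOOLS)
print(worker.team, worker.tool_names)
```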
5. Example Flow: Dynamic Loading in Action
User: "I need to provision a new database for the marketing team."
- Turn 1: Discovery
  - Agent Context: Has a local cache of assigned skill summaries.
  - Agent Action: Searches the local cache for provision database.
- Turn 2: High-Level Awareness
  - Local Cache Result: Returns token-efficient summaries from the portal catalog: [{"name": "aws_rds_provision", "description": "Creates AWS RDS DB"}, {"name": "mongo_atlas_create", "description": "Creates Mongo cluster"}]
  - Agent Action: Decides AWS is needed and loads the cached schema for aws_rds_provision.
- Turn 3: Strict Execution
  - Agent Catalog: Provides the full JSON schema (requiring instance_type, storage_gb).
  - Agent Action: Understands parameters and safely executes aws_rds_provision through the gateway tools/call path.
6. Operational Benefits & Security
By centralizing skills in a database, the platform gains enterprise-grade operational capabilities:
- Dynamic Updates: API endpoints, instructions, and schemas can be updated in the database without restarting agents.
- Permission-Aware Discovery (RBAC): By linking skills to LightAPI endpoint descriptions and api_endpoint_t, portal-query can limit catalog disclosure to the current agent or tenant, while runtime gateway policy still authorizes execution.
- A/B Testing: Portal catalog metadata can route 50% of an agent's requests to skill_v1 and 50% to skill_v2 to measure prompt/tool efficacy.
- Audit Logging: Catalog disclosure and gateway execution can be logged separately, preserving a compliance trail without moving tool execution into the registry.
- Distilled Memory RAG: Following the "Hindsight" pattern, raw conversation history (agent_session_history_t) is separated from RAG-optimized memory (session_memory_t). This prevents the "noisy context" problem while maintaining a complete audit trail.
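One way to realize the 50/50 split in the A/B testing item is deterministic hash bucketing, sketched below. This is an assumed technique rather than a documented portal feature; choose_variant and the request-id input are hypothetical.

```python
import hashlib


def choose_variant(request_id: str, variants=("skill_v1", "skill_v2"), split: float = 0.5) -> str:
    """Map a request id to a variant; the same id always gets the same variant."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a fraction in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return variants[0] if bucket < split else variants[1]


counts = {"skill_v1": 0, "skill_v2": 0}
for i in range(10_000):
    counts[choose_variant(f"req-{i}")] += 1
print(counts)
```

Hashing a stable identifier instead of drawing a random number keeps retries of the same request on the same skill version, which makes efficacy comparisons easier to attribute.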
7. LightAPI As Skill Source
API-backed skills should be generated from endpoint-level LightAPI descriptions whenever possible.
The skill registry should store skill metadata, access control, grouping, and agent-facing instructions. The LightAPI description should remain the source of truth for endpoint invocation and verification details.
Recommended flow:
- Light-Portal creates or imports endpoint-level LightAPI descriptions.
- API owners enrich endpoint descriptions with examples, behavior notes, result cases, and visibility.
- Approved endpoint descriptions are published as agent skills.
- The agent loads assigned skill summaries from portal-query and caches them locally.
- When the agent selects a skill, it loads the relevant LightAPI disclosure level from the local cache or refreshes from portal-query.
- Execution goes through the gateway tools/call path, preserving runtime policy and downstream authorization.
This avoids manually duplicating every API endpoint as a separate hand-written skill while still giving agents strict schemas and progressive disclosure.
8. Workflow-Backed Skills
Some skills need more than instructions and a curated tool set. A skill that
must orchestrate several tools, wait for human approval, retry failed steps,
run assertions, or preserve a durable audit trail should be backed by
light-workflow.
The boundary should stay clear:
| Layer | Responsibility |
|---|---|
| Skill | Discovery metadata, taxonomy, instructions, allowed tools, and agent guidance. |
| Workflow | Ordered execution, branching, retries, assertions, human tasks, durable state, and audit events. |
| Gateway | Runtime tool execution through tools/list and tools/call. |
Workflow backing should be optional. Simple skills can stay as instructions plus
tool mappings. Durable or regulated processes should link to workflow
definitions and let light-workflow own execution.
Recommended storage:
- Keep wf_definition_t.definition as the canonical workflow YAML.
- Keep skill_t.content_markdown as the LLM-facing skill instruction body.
- Add skill_workflow_t to link skills to workflow definitions with a role such as primary, validation, remediation, or test.
- Treat skill_tool_t as the allowed tool set for a workflow-backed skill. Validation should flag workflow tool-call steps that are not linked to the skill.
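The validation rule in the last item reduces to a set-membership check, sketched here. The step shape (a parsed YAML step as a dict with an optional "tool" key) and the approval tool name are assumptions for illustration; the actual light-workflow definition format is not specified in this document.

```python
def unlinked_workflow_tools(workflow_steps: list[dict], allowed_tools: set[str]) -> list[str]:
    """Return tool names called by workflow steps but absent from the skill's allowed set."""
    flagged: list[str] = []
    for step in workflow_steps:
        tool = step.get("tool")  # steps without a tool call (e.g. human tasks) are skipped
        if tool is not None and tool not in allowed_tools and tool not in flagged:
            flagged.append(tool)
    return flagged


# Hypothetical parsed workflow steps and skill_tool_t links.
steps = [
    {"id": "provision", "tool": "aws_rds_provision"},
    {"id": "approve", "type": "human_task"},
    {"id": "notify", "tool": "slack_channel_post"},
]
allowed = {"aws_rds_provision"}
print(unlinked_workflow_tools(steps, allowed))
```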
The Portal Skill Workspace should embed a generic Workflow Editor instead of creating a skill-specific workflow runtime. The editor provides YAML editing, step preview, reference lookup, validation, and test runs. Skill authoring provides the surrounding context: skill metadata, taxonomy, allowed tools, effective prompt preview, and workflow link configuration.
9. Next Steps
- Complete phase 3 by adding category and tag assignment to existing skill create/update forms, backed by entity_category_t and entity_tag_t with entity_type = 'skill'.
- Save skill taxonomy through a composite skill command so the skill row and selected taxonomy associations are emitted from the same user action.
- Move the richer authoring workspace, effective prompt preview, skill_tool_t.config formalization, workflow-backed skills, and "create skill from LightAPI/tool" flows to phase 3.5.
- Build the generic Workflow Editor for YAML editing, parsed step preview, catalog references, validation, and workflow test runs.
- Complete phase 4 agent assignment by improving the agent_skill_t UI, adding an Agent Definition assignment context, and adding a batch assignment composite command that emits one AgentSkillCreatedEvent per selected skill.
- Enforce phase 4 assignment validation in command handlers and UI preflight: assigned skills must be active and must have at least one active direct skill_tool_t link. Workflow-backed skills still rely on skill_tool_t as the allowed tool set.
- Keep live gateway tools/list runtime executability checks as a diagnostics or governance concern, not as phase 4 persistence validation.
- Complete phase 5 for the Rust agent with the genai-query getEffectiveAgentCatalog endpoint, claim checks against host, sid, and env, local catalog caching, keyword/routing search, gateway tools/list intersection, and controller-driven cache invalidation.
- Complete phase 6 governance for the Rust agent only: normalize sensitivity tiers to public, internal, confidential, and restricted; filter blocked tools before catalog disclosure; compare the effective catalog with gateway tools/list through /diagnostics/tools; and keep execution through gateway tools/call.
- Enforce destructive, approval-required, and sensitivity metadata at the gateway with debug/auditInfo fields when a call is blocked. Do not use workflow audit_log_t for catalog disclosure; use auditInfo/file logging until a generic governance audit table is introduced.
- Add publishing from LightAPI endpoint descriptions into the skill registry.
- Migrate existing file-based skills into structured catalog payloads, keeping instructions in Markdown and converting parameters to JSON Schema.
- Implement Pattern B (Semantic Tool RAG) after indexed catalog fields and embeddings are ready for production search.
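The phase 4 assignment rules listed above (assigned skills must be active and have at least one active direct skill_tool_t link, with one AgentSkillCreatedEvent per selected skill) can be sketched as follows. SkillRow, the event field names, and the all-or-nothing rejection of a batch are assumptions; the document does not specify how partial failures are handled.

```python
from dataclasses import dataclass, field


@dataclass
class SkillRow:
    skill_id: str
    active: bool
    # One flag per direct skill_tool_t link; True means the link is active.
    tool_link_active: list[bool] = field(default_factory=list)


def assignment_errors(skill: SkillRow) -> list[str]:
    """Return the reasons a skill cannot be assigned; empty means it can."""
    errors = []
    if not skill.active:
        errors.append("skill is not active")
    if not any(skill.tool_link_active):
        errors.append("skill has no active direct skill_tool_t link")
    return errors


def batch_assign(agent_def_id: str, skills: list[SkillRow]) -> list[dict]:
    """Validate every selected skill, then emit one event per skill.

    Assumption: the whole batch is rejected if any skill fails validation.
    """
    rejected = {}
    for skill in skills:
        errors = assignment_errors(skill)
        if errors:
            rejected[skill.skill_id] = errors
    if rejected:
        raise ValueError(f"assignment rejected: {rejected}")
    return [
        {"eventType": "AgentSkillCreatedEvent", "agentDefId": agent_def_id, "skillId": s.skill_id}
        for s in skills
    ]


events = batch_assign("agent_a", [
    SkillRow("sk_1", True, [True]),
    SkillRow("sk_2", True, [False, True]),
])
print([e["skillId"] for e in events])
```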