Light-Fabric
Light-Fabric is a high-performance, unified platform for managing the lifecycle, governance, and orchestration of enterprise AI services, including agentic services, agents, tools, skills, memories, MCP servers, APIs, gateways, and workflows.
Why Light-Fabric?
We chose the name Light-Fabric because it embodies the "Unified Governance" required for enterprise-grade AI:
- Unified Control Plane: Light-Fabric provides a single point of truth for discovering, governing, and auditing agents, MCP servers, and APIs via the `light-portal`.
- Enterprise Governance: It prioritizes security and policy enforcement (such as fine-grained authorization) over pure decentralized autonomy, making it safe for corporate environments.
- Integrated Ecosystem: It "weaves" together distributed components—from memory units (Hindsight) to centralized skills—into a cohesive, observable system.
- Durable Identity: The name emphasizes the platform's role as the infrastructure foundation, remaining relevant regardless of the underlying implementation details.
Technical Advantages
By building Light-Fabric on a Rust foundation, we achieve:
- Performance: Built on top of `tokio` and `axum` for maximum throughput and memory safety.
- Native Intelligence: Specialized crates for Hindsight memory, tool calling, and workflow orchestration.
- Production Ready: Includes robust features like retries, failover, and observability out of the box.
Core Components
Light-Fabric is composed of modular crates, infrastructure frameworks, and reference applications:
Crates
- `crates/model-provider`: A unified interface for multiple LLM providers (Ollama, etc.).
- `crates/hindsight-client`: Client for the Hindsight biomimetic memory system.
- `crates/mcp-client`: Implementation of the Model Context Protocol (MCP) for tool discovery and execution.
- `crates/portal-registry`: Integration with the Light-Portal for service registration and discovery.
- `crates/light-runtime`: Core runtime foundation for building agentic and microservice components.
- `crates/light-rule`: High-performance rule engine for fine-grained authorization and data filtering.
- `crates/workflow-core` & `crates/workflow-builder`: Core engine and builder for complex agentic workflows.
- `crates/config-loader`: Flexible configuration management for enterprise environments.
- `crates/asymmetric-decryptor` & `crates/symmetric-decryptor`: Security utilities for sensitive data handling.
Frameworks
- `frameworks/light-axum`: A specialized microservice & agentic framework built on top of the Axum web ecosystem.
- `frameworks/light-pingora`: A high-performance proxy and gateway framework built on top of Cloudflare's Pingora.
Applications
- `apps/light-agent`: A managed AI agent capable of using tools, accessing memory, and executing complex tasks.
- `apps/light-gateway`: An enterprise-grade gateway for securing and governing API and agent traffic.
- `apps/light-workflow`: A service for orchestrating and executing long-running agentic workflows.
Getting Started with Light-Fabric
This guide will help you set up a local development environment for Light-Fabric, including the AI Gateway, Agent Engine, and the management Portal.
Prerequisites
- Rust: Latest stable version.
- Docker: For running database and backend services.
- Node.js: For running the `portal-view` UI.
- Git: To clone the necessary repositories.
Local Development Setup
To run the entire ecosystem locally, we use the `portal-config-loc` and `service-asset` repositories to manage configuration and pre-built assets.
1. Initialize Workspace
Create a unified workspace directory (e.g., `~/lightapi`) and clone the core management repositories:
```sh
cd ~
mkdir -p lightapi
cd lightapi
# Clone configuration and assets
git clone git@github.com:lightapi/portal-config-loc.git
git clone git@github.com:lightapi/service-asset.git
```
2. Deploy Local Services
Light-Fabric services are orchestrated via Docker Compose scripts in `portal-config-loc`. The following command starts the PostgreSQL database and the core services (including the Rust-based components):
```sh
cd ~/lightapi/portal-config-loc
./scripts/deploy-local.sh pg rust
```
3. Import Initial Data
Use the importer script in `service-asset` to populate the local database with initial events, users, and configurations:
```sh
cd ~/lightapi/service-asset
./importer.sh -f events.json
```
4. Update /etc/hosts
The platform uses virtual hosts for local routing. Add the following entry to your `/etc/hosts` file (replace `127.0.0.1` with your actual local IP if necessary):
```
127.0.0.1 local.lightapi.net locsignin.lightapi.net
```
Running the Management Portal
The Light-Portal provides a unified UI for onboarding MCP servers, configuring AI Gateways, and interacting with agents.
```sh
cd ~/lightapi
git clone git@github.com:lightapi/portal-view.git
cd portal-view
npm install
npm run dev
```
Navigate to https://localhost:3000 and log in with your developer credentials.
Cloud Development (Coming Soon)
We are currently preparing a Cloud Development Server. This will allow developers to:
- Connect to a shared, high-performance AI Gateway.
- Onboard and test MCP servers without a full local installation.
- Collaborate on shared agentic workflows and Hindsight memory banks.
Stay tuned for the connection details and onboarding guide for the cloud environment.
Contributing to Light-Fabric
If you are developing for the Rust crates specifically:
```sh
cd ~/lightapi
git clone git@github.com:networknt/light-fabric.git
cd light-fabric
cargo build
```
Model Providers
Light-Fabric provides a unified, high-performance interface for interacting with diverse Large Language Model (LLM) providers. This abstraction is centered around the `Provider` trait, allowing applications to remain model-agnostic while leveraging advanced capabilities like native tool calling and prompt caching.
The Provider Trait
All model integrations implement the `Provider` trait, which supports:
- One-shot and Multi-turn Chat: Simple APIs for both single prompts and full conversation histories.
- Structured Tool Calling: Native integration for function calling (OpenAI-style).
- Capabilities Detection: Programmatic checks for vision, native tool support, and prompt caching.
Supported Cloud Providers
Light-Fabric supports all major LLM providers. Because the Provider trait is model-agnostic, the framework is compatible with the latest flagship releases as soon as they are available.
- OpenAI: Native support for the GPT-5 series (5.4, mini, nano), the o4 reasoning models, and full legacy support for GPT-4o and GPT-4 Turbo.
- Anthropic: Support for the Claude 4 generation, including Opus 4.7, Sonnet, and Haiku.
- Google Gemini: Support for Gemini 3.1 Pro and Flash, leveraging Vertex AI or AI Studio for multi-modal and long-context tasks.
- Azure OpenAI: Enterprise-grade OpenAI deployments with support for the latest model deployments.
- AWS Bedrock: Access to the latest Claude and Titan models hosted on Amazon Web Services.
- OpenRouter: Access to hundreds of open-source and proprietary models via a single unified API.
- Telnyx: Support for models hosted on the Telnyx platform.
- GLM (Zhipu AI): Support for the ChatGLM/GLM-5 series of models.
Local & Specialized Providers
- Ollama: Seamless integration with local models running on your machine.
- OpenAI-Compatible: A generic `CompatibleProvider` for any service implementing the OpenAI REST API.
- GitHub Copilot: Integration with GitHub Copilot Chat for developer-centric workflows.
Meta-Providers (Orchestration)
These providers wrap other providers to add resilient or intelligent behavior:
- `ReliableProvider`: Enhances any base provider with retries, exponential backoff, and automatic failover to fallback models.
- `RouterProvider`: Dynamically routes requests to different models based on hints or input complexity.
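As an illustration of the meta-provider pattern above, here is a hypothetical configuration sketch for a `ReliableProvider` that retries a cloud model and fails over to a local Ollama model. The key names (`type`, `retries`, `backoff`, `fallbacks`) are assumptions for illustration, not a documented schema:

```yaml
# Hypothetical provider configuration; key names are illustrative only.
provider:
  type: reliable            # wrap the base provider with retry/failover behavior
  retries: 3                # retry transient failures
  backoff: exponential
  base:
    type: openai
    model: gpt-4o
  fallbacks:                # tried in order when the base provider fails
    - type: ollama
      model: llama3
```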
CLI & Tooling Integrations
Light-Fabric includes specialized integrations for developer tools and terminal environments:
- Claude Code CLI: Integration with Anthropic's Claude Code environment.
- Gemini CLI: Terminal-based access to Google's Gemini models.
- KiloCLI: Light-Fabric's native CLI integration for rapid testing and automation.
Key Capabilities
Providers can be queried for their support of advanced features:
- Native Tool Calling: Efficiently generate structured function calls.
- Vision: Process images alongside text prompts.
- Prompt Caching: Leverage provider-side caching to reduce latency and costs for long contexts.
Agentic Workflow Design
Agentic Workflow in Light-Fabric implements a hybrid orchestration model for enterprise business processes. The workflow is deterministic, auditable, and stateful, while selected steps can be executed by agents, API calls, rule engine checks, or humans.
The design goal is not to replace enterprise process control with an open-ended agent loop. The goal is to let agents work inside a managed process that has clear state, clear ownership, repeatable execution, and human approval where needed.
Enterprise Challenge
In regulated or operationally sensitive environments, a purely autonomous AI agent is not enough for long-running business work.
- Compliance requires deterministic process paths, approval records, and audit history.
- Reliability requires long-running state to survive process restarts, UI disconnects, and agent failures.
- Safety requires human-in-the-loop checkpoints for decisions with business, security, or financial impact.
- Coordination requires multiple humans and roles to participate in the same process.
- Testing requires the same workflow to run interactively with humans or headlessly with example data.
Light-Fabric solves this by separating orchestration from execution.
Hybrid Model
The workflow is the deterministic process manager. It defines the ordered steps, conditions, retries, error handling, human checkpoints, and outputs.
Agents are workers inside that process. They can reason, call tools, ask for missing data, and use skills, but they do not own the overall process state.
| Feature | Traditional Workflow | Pure Agent Loop | Light-Fabric Hybrid |
|---|---|---|---|
| Path | Fixed | Dynamic | Fixed path with flexible task execution |
| State | Durable | Often transient | Durable workflow and task state |
| Human input | Forms and approvals | Ad hoc chat | First-class waiting tasks |
| Audit | Strong | Weak | Step-level audit and agent trace |
| API calls | Built into code | Tool calls | Spec-described endpoint invocations |
| Testing | Separate test harness | Prompt replay | Same workflow can run live tests |
Core Separation
There are two related specifications:
- Agentic Workflow Specification: describes orchestration, including task order, branching, human input, assertions, API calls, retries, errors, exports, and state transitions.
- LightAPI Description Specification: describes API capabilities at the endpoint level, including how an endpoint is invoked, what inputs it accepts, what result shape it returns, examples, behavior notes, and result expectations.
This separation is important. The workflow should not duplicate every endpoint contract. It should reference endpoint descriptions and use them to invoke calls, guide agents, and verify results.
Endpoint-Level Consumption
Light-Portal manages API descriptions at the endpoint level, not only at the whole API level.
This is necessary because real workflows often combine one endpoint from one API with one endpoint from another API. For example, onboarding an API to an AI gateway may involve:
- register an API
- create an API version from a specification
- create a development API instance
- configure the API through config server
- link the API instance to a gateway instance
- select endpoints to expose as MCP tools
- create a gateway config snapshot
- reload the gateway through controller
- run MCP tests against the gateway
Each step may come from a different API surface. The workflow consumes only the endpoints it needs.
The recommended model is:
- API-level descriptions can be authored for convenience and consistency.
- Endpoint-level descriptions are published and consumed by agents and workflows.
- Endpoint descriptions inherit shared context such as authentication, environments, sources, and secrets from an API catalog.
- Agents progressively load endpoint information by disclosure level instead of receiving the entire catalog up front.
Progressive Disclosure
Endpoint descriptions should be disclosed to agents in layers:
- index: operation id, title, tags, visibility
- summary: purpose, capability group, lifecycle
- invocation: input shape, request mapping, auth, examples
- behavior: result cases, errors, edge cases, assertions
- full: complete description for debugging or generation
This allows the agent to discover capabilities cheaply, load invocation details only for selected endpoints, and load behavior details only when verification or failure analysis needs it.
Workflow Task Types
The updated workflow specification adds first-class support for the task types needed by agentic API workflows.
Ask Task
`ask` pauses the workflow and waits for human input. It supports prompts, choices, validation, defaults, timeouts, and sensitive input.
The task returns the user's answer as task output. The normal `export` block should move the answer into workflow context.
Example:
```yaml
- ask-authz:
    ask:
      prompt: Do you want to configure endpoint authorization?
      mode: choice
      options:
        - label: Configure authorization
          value: configure
        - label: Skip
          value: skip
    export:
      as:
        authzChoice: ${ .result }
```
Assert Task
`assert` validates workflow state or API results. It is used for both live tests and interactive workflows.
It supports simple comparisons, JSONPath-style checks, length checks, regex checks, and rule-engine-backed assertions for complex business logic.
Assertion failures should produce structured, catchable errors so workflows can route failures to remediation, task creation, or agent investigation. Complex business assertions can delegate to Light-Rule.
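A minimal sketch of an `assert` task, written in the same style as the `ask` example above; the condition keys (`that`, `path`, `op`, `expected`, `onError`) are assumptions for illustration, not confirmed spec keywords:

```yaml
# Illustrative assert task; key names are assumptions, not confirmed spec keywords.
- assert-deployment:
    assert:
      that:
        - path: $.result.status        # JSONPath-style check
          op: ==
          expected: DEPLOYED
        - path: $.result.endpoints     # existence check
          op: exists
    onError: create-remediation-task   # hypothetical structured failure route
```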
API Call Tasks
The workflow supports direct and description-backed API calls:
- HTTP / OpenAPI
- JSON-RPC
- OpenRPC
- gRPC
- MCP tool/resource/prompt calls
For direct internal calls, `jsonrpc` can be used with an endpoint, method, params, id, notification flag, and error policy.
For cataloged JSON-RPC, `openrpc` references an OpenRPC document and method.
For MCP, the workflow references a tool, resource, or prompt and passes arguments. MCP capability descriptions belong in the API description layer; the workflow only selects and invokes them.
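As a sketch of a direct `jsonrpc` call task using the fields named above (endpoint, method, params, notification flag, error policy); the endpoint URL and method name are hypothetical:

```yaml
# Illustrative jsonrpc task; the endpoint and method name are hypothetical.
- link-gateway:
    jsonrpc:
      endpoint: https://local.lightapi.net/portal/command
      method: linkApiInstanceToGateway
      params:
        apiInstanceId: ${ .apiInstanceId }
        gatewayInstanceId: ${ .gatewayInstanceId }
      notification: false        # expect a response, not a fire-and-forget call
      onError: fail              # illustrative error policy
    export:
      as:
        linkResult: ${ .result }
```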
Explanation Metadata
Tasks can include explain metadata to help an agent or UI explain what is happening.
Useful fields include:
- purpose
- visible
- before
- success
- failure
- requires
Example:
```yaml
explain:
  purpose: Link the API instance to the development gateway.
  visible: true
  requires:
    - portal-command-token authentication
    - apiInstanceId from prior step
```
Human Task State
Human-in-the-loop behavior must be represented as durable workflow state.
Recommended task states:
```
A = active
W = waiting for input
C = completed
F = failed
X = canceled
```
When an `ask` or approval task reaches `W`, the process remains active but the task is no longer picked up by the executor. A user, CLI, scheduler, or agent must complete the task through the workflow API.
Waiting tasks should carry:
- prompt
- input mode
- options
- validation rules
- default value
- sensitive flag
- assignment metadata
- explanation metadata
- timeout policy
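To make this concrete, here is an illustrative shape for a waiting task as a client might receive it; all field names are assumptions consistent with the list above, not a published schema:

```yaml
# Hypothetical waiting-task payload; field names are assumptions.
taskId: task-1042
state: W                        # waiting for input
prompt: Approve the gateway config snapshot for production?
inputMode: choice
options: [approve, reject]
default: reject
sensitive: false
assignedRole: portal-admin
explain:
  purpose: Gate the gateway reload behind human approval.
timeout:
  after: 24h
  action: escalate              # illustrative timeout policy
```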
Assignment And Worklist
Enterprise workflows need more than chat. Some tasks must be assigned to roles or users and coordinated across multiple humans.
Human tasks should support:
- assigned user
- assigned role
- candidate roles
- claimed by
- claimed timestamp
- due timestamp
- priority
- comments
- audit trail
A role-based task appears in the worklist for users with a matching role. Once claimed, it belongs to the claiming user until completed, released, delegated, or timed out.
Client Architecture
`light-workflow` should run as a containerized backend service alongside other portal services. It owns workflow execution and state. Portal chat, worklist, CLI, scheduler, and agents are all clients of the same workflow APIs.
The client surfaces are:
- Portal Chat: conversational guidance for a single user.
- Worklist: role-based task inbox for approvals, reviews, and coordination.
- CLI: developer, CI/CD, live test, and automation interface.
- Scheduler: periodic headless execution, such as hourly live integration tests.
- Agent: task executor that can call APIs, use skills, and report results back to the workflow.
See Workflow Client Architecture for the dedicated client design.
Workflow Service API
The workflow service should expose one stable API boundary for all clients.
Core operations:
```
workflow.start
workflow.getInstance
workflow.listInstances
workflow.getEvents
workflow.listTasks
workflow.getTask
workflow.claimTask
workflow.releaseTask
workflow.completeTask
workflow.delegateTask
workflow.cancelInstance
```
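For example, completing a waiting task might look like the following JSON-RPC style call; the parameter names are assumptions, not a finalized API contract:

```yaml
# Hypothetical workflow.completeTask request; parameter names are assumptions.
method: workflow.completeTask
params:
  taskId: task-1042
  answer: approve
  comment: Reviewed the config snapshot; safe to reload.
```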
Streaming clients should subscribe to workflow events through Server-Sent Events, WebSocket, or another portal-standard event mechanism.
Important event types:
- workflow started
- task started
- task completed
- task failed
- task waiting for input
- task assigned
- task claimed
- task completed by human
- agent started
- agent completed
- workflow completed
- workflow failed
Live Testing
The same workflow runtime should support interactive runs and headless live tests.
Interactive workflows use `ask` tasks when decisions or missing values are needed.
Live tests should use example data from LightAPI endpoint descriptions and workflow input fixtures instead of asking the user. Assertions should verify results through `assert` tasks or rule-engine checks.
This lets the scheduler run workflows every hour against the latest deployed services. When a test fails, the workflow can create a task with the failure detail and assign an agent or human to investigate.
Example: API Onboarding To AI Gateway
An API onboarding workflow can guide a user through a complex multi-endpoint process without requiring a dedicated UI for every operation.
The workflow can:
- ask for or infer the API metadata
- call the register API endpoint
- create an API version from an OpenAPI specification
- create a development API instance
- configure the API
- ask whether fine-grained authorization should be configured
- route to create or select authorization rules
- link the API instance to the development AI gateway
- select endpoints to expose as MCP tools
- create a gateway config snapshot
- reload the gateway through controller
- run MCP tests through the gateway
- assert expected results
- report success or create remediation tasks
The same workflow can run interactively through portal chat, be managed through the worklist, or run headlessly with examples as a live test.
Technical Implementation
The Light-Fabric implementation is split across:
- `workflow-core`: Rust models for the workflow specification.
- `workflow-builder`: fluent builders for programmatic workflow construction.
- `light-workflow`: runtime service and executor.
- `light-agent`: agent execution surface for delegated agent tasks.
- `light-rule`: rule engine used by workflow and assertion tasks. See Light-Rule Design.
Runtime responsibilities include:
- deserializing workflow definitions
- claiming active tasks
- executing supported task types
- storing task output
- applying exports into process context
- creating next tasks
- pausing waiting tasks
- resuming after human completion
- failing or completing process instances
- exposing workflow APIs to clients
The current executable slice supports API invocation and verification tasks such as HTTP, JSON-RPC, OpenRPC, MCP over enterprise HTTP transports, rules, assertions, and waiting human input. MCP stdio transport is intentionally not a priority for enterprise deployment.
Design Rule
There must be one workflow runtime and one task state model.
Chat, worklist, CLI, scheduler, and agents should never implement their own workflow execution. They should all use the same light-workflow service APIs.
This keeps enterprise workflow behavior auditable, testable, and consistent regardless of how a process is started, resumed, or observed.
Workflow Client Architecture
Light-Fabric workflow execution should run as a containerized backend service, not as logic embedded in a portal screen, CLI, scheduler, or agent. The workflow service owns process state, task state, audit records, API invocation, agent invocation, and human-in-the-loop transitions. Clients are thin interaction surfaces over the same service APIs.
This separation lets the same workflow instance be driven by a portal chat session, a worklist user, a CLI command, a scheduler, or an AI agent without creating multiple execution models.
Goals
- Provide one authoritative workflow runtime for long-running enterprise processes.
- Support human-in-the-loop tasks from both conversational and worklist interfaces.
- Support headless execution for live tests, scheduled runs, and CI/CD.
- Keep all clients stateless or lightly stateful; workflow state lives in `light-workflow`.
- Make role assignment, audit, and retry behavior consistent across UI, CLI, scheduler, and agent use.
Runtime Service
`light-workflow` should be deployed as a portal service in a container alongside the other portal services. It should expose APIs for workflow definitions, workflow instances, task claiming, task completion, event streaming, and operational control.
The service is responsible for:
- loading workflow definitions
- starting workflow instances
- persisting `process_info_t` and `task_info_t`
- executing API calls and assertions
- invoking agents for agent-owned tasks
- pausing on `ask` and approval tasks
- assigning human tasks to users or roles
- resuming workflows when a human answer is submitted
- emitting workflow and task events
- recording audit history
Clients should never execute workflow steps themselves. They should only start workflows, inspect workflow state, and complete assigned tasks.
Client Surfaces
Portal Chat
The portal chat client is the guided conversational interface for a single user working through a process. It is useful when the workflow needs to ask clarifying questions, explain the next action, or guide a user through a complex multi-endpoint operation.
Typical uses:
- API onboarding
- API endpoint publication to an AI gateway
- guided configuration
- troubleshooting and remediation workflows
- interactive approval with explanation
The chat client should call the workflow service for current state and submit answers to waiting tasks. It may stream workflow events and render agent explanations, but it should not own workflow state.
Worklist
The worklist is the enterprise task inbox. It is the right interface for multi-user coordination, role-based assignment, approvals, escalations, and audit-sensitive operations.
Typical uses:
- approval tasks
- compliance review
- operations handoff
- role-based queue processing
- task claim and release
- delegated work
- due-date and priority management
The worklist should be built around waiting human tasks. A task may have:
- assigned user
- candidate roles
- assigned role
- priority
- due time
- claim status
- comments
- completion payload
- audit trail
The worklist is especially important because many enterprise workflows are not purely conversational. They need accountable ownership and coordination between multiple humans.
CLI
The CLI is a developer and automation client. It should use the same workflow service APIs as portal-view and should not contain separate execution logic.
Typical uses:
- local workflow testing
- live parity tests
- CI/CD automation
- scheduled headless runs
- debugging stuck workflow instances
- submitting test data
- completing simple waiting tasks from scripts
Example commands:
```sh
light-workflow start portal.onboard-api --input input.yaml
light-workflow status <instance-id>
light-workflow tasks --role portal-admin
light-workflow claim <task-id>
light-workflow answer <task-id> --value approve
light-workflow logs <instance-id>
light-workflow cancel <instance-id>
```
The CLI should be added after the workflow APIs stabilize. It will be valuable for developers and automation, but the worklist and portal chat should drive the primary enterprise UX.
API Boundary
The workflow service should expose a stable API boundary that all clients use. The API can be HTTP, JSON-RPC, or both, but the concepts should remain the same.
Core operations:
```
workflow.start
workflow.getInstance
workflow.listInstances
workflow.getEvents
workflow.listTasks
workflow.getTask
workflow.claimTask
workflow.releaseTask
workflow.completeTask
workflow.delegateTask
workflow.cancelInstance
```
For streaming clients, the service should expose workflow events through Server-Sent Events, WebSocket, or another portal-standard event mechanism.
Important event types:
- workflow started
- task started
- task completed
- task failed
- task waiting for input
- task assigned
- task claimed
- task completed by human
- agent started
- agent completed
- workflow completed
- workflow failed
Human Task State
`ask` and approval-style tasks should enter a waiting state. While waiting, the workflow instance remains active, but the task is no longer executable by the worker loop until a human answer is submitted.
Recommended states:
```
A = active
W = waiting for input
C = completed
F = failed
X = canceled
```
The waiting task should include enough metadata for all clients:
- prompt
- input mode
- options
- validation rules
- default value
- sensitivity flag
- assignment metadata
- explanation metadata
- timeout policy
The completion API should validate submitted input against the task definition before resuming the workflow.
Assignment Model
Human tasks should support both direct assignment and role-based queues.
Recommended fields:
```
assigned_user
assigned_role
candidate_roles
claimed_by
claimed_ts
due_ts
priority
comments
```
A role-based task can appear in the worklist for all users with a matching role. Once a user claims it, the task becomes owned by that user until completed, released, delegated, or timed out.
Recommended Build Order
- Implement stable workflow service APIs for start, status, events, task list, task claim, and task completion.
- Harden the `ask` resume path and waiting task state machine.
- Build the worklist because it forces the assignment, audit, and state model to be correct.
- Build the portal chat workflow interaction on top of the same task APIs.
- Add the CLI after the API shape stabilizes.
- Add scheduler integration for hourly live tests and headless workflow runs.
Design Rule
There must be one workflow runtime and one task state model. Chat, worklist, CLI, scheduler, and agents are only clients of that runtime.
This keeps enterprise workflow behavior auditable, testable, and consistent regardless of how a workflow is started or resumed.
LightAPI Description Design
LightAPI Description is the endpoint capability specification used by Light-Fabric agents, workflows, live tests, and portal API administration.
It describes how an API endpoint is discovered, invoked, explained, and verified. It is intentionally separate from the Agentic Workflow Specification. Workflow describes process orchestration. LightAPI describes endpoint capability.
Why LightAPI
OpenAPI is useful for REST APIs, and OpenRPC is useful for JSON-RPC APIs, but Light-Fabric needs a common description model across multiple enterprise protocols:
- REST / HTTP
- OpenAPI-described HTTP
- JSON-RPC 2.0
- OpenRPC-described JSON-RPC
- gRPC
- MCP tools, resources, and prompts
LightAPI provides a single agent-facing and workflow-facing description layer over these protocols.
The goal is not to replace OpenAPI or OpenRPC. The goal is to reference them where they exist and add the missing information needed by agents and workflow live tests.
API-Level Authoring, Endpoint-Level Consumption
Light-Portal may let teams author descriptions at the API level for convenience. However, workflows and agents consume descriptions at the endpoint level.
This distinction is important because real workflow processes rarely use a whole API. They usually combine selected endpoints from multiple APIs.
For example, onboarding an API to an AI gateway may consume:
- one endpoint from API registration
- one endpoint from API version management
- one endpoint from API instance management
- one endpoint from config server
- one endpoint from gateway linking
- one endpoint from controller reload
- one or more MCP tools exposed through the gateway
Each consumed operation should have an endpoint-level description with a stable `endpointId`.
API-level descriptions are still useful as catalogs. Endpoint-level descriptions may inherit shared API context such as:
- environments
- authentication
- secrets
- sources
- common tags
- lifecycle metadata
Relationship To Agentic Workflow
Agentic Workflow and LightAPI have different responsibilities.
| Concern | Agentic Workflow | LightAPI Description |
|---|---|---|
| Process order | Yes | No |
| Branching and retries | Yes | No |
| Human-in-the-loop | Yes | No |
| Endpoint invocation contract | Reference only | Yes |
| Input and result examples | Optional workflow fixtures | Yes |
| Result verification expectations | Calls assert | Describes expected result cases |
| Agent progressive disclosure | Uses selected endpoints | Defines disclosure levels |
| Live testing | Orchestrates execution | Supplies examples and expected results |
In live tests, the workflow should use example data from LightAPI descriptions and workflow fixtures instead of asking for user input.
In interactive runs, the workflow may ask the user for missing values, then invoke endpoints described by LightAPI.
Relationship To Centralized Agent Skills
LightAPI endpoint descriptions are a source of agent skills.
The centralized skill registry should not require every API operation to be manually rewritten as a separate skill. Instead, Light-Portal can publish selected LightAPI endpoint descriptions into the skill registry as invokable capabilities.
The skill registry adds:
- permission-aware discovery
- semantic search
- skill grouping
- agent persona scoping
- audit around skill disclosure and execution
LightAPI provides:
- endpoint identity
- protocol details
- input schema
- request mapping
- result shape
- examples
- behavior notes
- result cases
Together, they allow an agent to discover a capability as a skill, progressively load only the endpoint details it needs, and execute through the workflow or controller runtime.
See Centralized Agentic Skill Registry for the skill registry design.
Core Document Concepts
A LightAPI document should support both API-level catalogs and endpoint-level documents.
Important top-level concepts:
- `lightapi`: specification version
- `profile`: `api` or `endpoint`
- `info`: name, title, version, namespace, owner, contact
- `context`: inherited catalog context for endpoint-level documents
- `sources`: OpenAPI, OpenRPC, protobuf, MCP, or raw protocol references
- `environments`: environment-specific server details
- `secrets`: required secret names
- `authentications`: reusable authentication policies
- `operations`: endpoint operation descriptions
- `testSequences`: linear endpoint test sequences
- `agent`: progressive disclosure and skill metadata
For `profile: endpoint`, the document should describe at most one operation.
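The following is a minimal endpoint-level document sketch assembled from the concepts above. The top-level keys follow the list above, but every value, and the nested keys under `input`, `request`, and `result`, are illustrative assumptions rather than a published schema:

```yaml
# Minimal endpoint-level sketch; values and nested keys are illustrative.
lightapi: "1.0"
profile: endpoint
info:
  name: register-api
  title: Register API
  version: 0.1.0
context:
  catalog: portal-apis                 # inherited API catalog context
operations:
  - operationId: registerApi
    endpointId: ep-portal-register-api # globally stable identifier
    title: Register a new API
    visibility: internal
    input:
      schema:
        type: object
        properties:
          apiName: { type: string }
        required: [apiName]
    request:
      protocol: http
      method: POST
      path: /portal/apis
    result:
      cases:
        - name: created
          status: 201
```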
Operation Model
Each operation represents one endpoint-level capability.
Common fields include:
- `operationId`: local operation identifier
- `endpointId`: globally stable endpoint identifier
- `title`
- `summary`
- `description`
- `visibility`
- `lifecycle`
- `tags`
- `capability`
- `agent`
- `input`
- `request`
- `result`
- `examples`
The `input` section describes the logical interface the agent or workflow sees.
The `request` section describes how logical input maps to the wire protocol.
The `result` section describes expected output, result cases, and failure shapes.
Protocol Coverage
HTTP And OpenAPI
For raw HTTP, the operation describes method, endpoint, headers, query, path, and body mappings.
For OpenAPI, LightAPI references the OpenAPI document and operation, then adds agent-oriented behavior, examples, and result expectations.
JSON-RPC And OpenRPC
For direct JSON-RPC, the operation describes endpoint, method, params, id behavior, notification behavior, and error policy.
For OpenRPC, LightAPI references the OpenRPC document and method. The workflow runtime can use the OpenRPC document to validate that the method exists and that required params are present before calling it.
gRPC
For gRPC, the operation describes service, method, protobuf source, transport, metadata, request mapping, and result mapping.
For browser or gateway-mediated enterprise deployments, gRPC over WebSocket can be represented as a transport on the structured protocol operation.
MCP
For MCP, the operation describes tool, resource, or prompt invocation.
Tool listing alone is not enough. The description must also include:
- input schema
- result shape
- examples
- behavior differences for important input cases
- error cases
- verification expectations
MCP stdio is not a priority for enterprise portal deployment. HTTP and streamable HTTP transports should be the main runtime targets.
Result Cases And Verification
LightAPI should describe expected result behavior, but Agentic Workflow should execute the actual assertions.
This keeps verification orchestration in one place.
Recommended model:
- LightAPI operation result cases describe expected outputs, failure shapes, and examples.
- Workflow test steps invoke the operation.
- Workflow `assert` tasks verify actual output against expected result cases.
- Complex business checks can call the rule engine.
This allows the same endpoint description to support:
- agent skill usage
- workflow execution
- live integration testing
- failure diagnosis
Progressive Disclosure For Agents
A LightAPI document should support progressive disclosure so an agent can load only the information needed at each stage.
Recommended levels:
- `index`: endpoint id, title, tags, visibility
- `summary`: purpose, capability group, lifecycle
- `invocation`: input schema, request mapping, authentication, examples
- `behavior`: result cases, edge cases, errors, assertions
- `full`: complete endpoint description
The portal can expose query APIs such as:
```
lightapi.listOperations
lightapi.getOperation
lightapi.getCapabilityGroup
```
Agents should start with `index` or `summary` data, load `invocation` details only for selected endpoints, and load `behavior` details only for testing, troubleshooting, or failure repair.
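A request for one disclosure level might look like this; the parameter names are assumptions layered on the operations listed above:

```yaml
# Hypothetical lightapi.getOperation request; parameter names are assumptions.
method: lightapi.getOperation
params:
  endpointId: ep-portal-register-api
  level: invocation       # index | summary | invocation | behavior | full
```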
Portal Publishing Flow
Light-Portal should manage endpoint descriptions as part of API endpoint administration.
Recommended flow:
- API owner creates or imports API metadata.
- Portal extracts initial endpoint descriptions from OpenAPI, OpenRPC, protobuf, MCP, or raw endpoint configuration.
- API owner enriches endpoint descriptions with examples, behavior notes, result cases, and visibility.
- Portal stores endpoint-level LightAPI descriptions.
- Authorized agents and workflows query descriptions by endpoint, tag, lifecycle, visibility, or capability.
- Selected endpoints can be published into the centralized skill registry.
- Workflow instances reference endpoint descriptions during execution and live testing.
Live Test Use
Live tests should be workflow-driven.
LightAPI supplies:
- example input data
- expected result cases
- protocol invocation details
- error behavior
Agentic Workflow supplies:
- sequence
- fixtures
- environment selection
- endpoint invocation
- assertions
- failure routing
- task creation
- agent assignment
This avoids building a second test runner model outside the workflow engine.
Design Rule
LightAPI describes endpoint capability. Agentic Workflow orchestrates endpoint use. Centralized Skills expose selected capabilities to agents.
Keeping these responsibilities separate lets Light-Fabric support API administration, agent skill discovery, workflow execution, and live integration testing without duplicating endpoint definitions across multiple systems.
Light-Rule Design
Light-Rule is the local YAML rule engine used by Light-Fabric services and workflows for deterministic business checks, transformations, authorization decisions, and workflow assertions.
It complements agentic workflow by keeping critical decisions explicit, repeatable, and auditable. Agents can propose or select rules, but the rule engine executes the deterministic logic.
Purpose
Light-Rule is designed for enterprise services that need fast local policy and transformation logic without a database call on every request.
Primary uses:
- fine-grained authorization
- request transformation
- response transformation
- workflow assertions
- business validation
- permission and filter injection
- reusable rule templates selected from Light-Portal
The rule configuration is loaded locally by the target service. When permissions or rule mappings change, the controller can trigger a config reload so the service swaps to the latest rules.
Relationship To Agentic Workflow
Agentic Workflow orchestrates process steps. Light-Rule evaluates deterministic logic inside those steps.
Workflow uses Light-Rule in two main ways:
- Rule call task: a workflow task can call a named rule to validate or mutate workflow context.
- Assert task extension: simple checks can be handled directly by `assert`, while complex business checks can delegate to Light-Rule.
This separation keeps workflows readable. The workflow says when a check happens; Light-Rule defines the reusable business logic for the check.
Example workflow responsibilities:
- decide when authorization configuration is needed
- select or create a rule
- invoke a rule during live testing
- route failures to a human or agent
Example Light-Rule responsibilities:
- evaluate role, group, position, or attribute checks
- inject endpoint permissions into the context
- compute row or column filters
- execute transformation plugins
- return pass/fail for business assertions
See Agentic Workflow Design for the workflow orchestration model.
Relationship To LightAPI
LightAPI endpoint descriptions describe endpoint invocation and expected result behavior. Light-Rule can implement complex result checks that are too business-specific for simple schema assertions.
Recommended model:
- LightAPI describes endpoint result cases and expected behavior.
- Agentic Workflow invokes the endpoint and runs `assert` tasks.
- `assert` handles simple checks directly.
- Light-Rule handles complex checks, authorization logic, row filters, column filters, and reusable business policies.
See LightAPI Description Design for endpoint capability descriptions.
Rule Specification
Rules are described by the rule specification in `rule-specification/schema/rule.yaml`.
The top-level configuration contains:
- `ruleBodies`: named rule definitions
- `endpointRules`: endpoint-to-rule mappings
Each rule can contain:
- `ruleId`
- `ruleDesc`
- `version`
- `author`
- `updatedAt`
- `conditions`
- `actions`
Each endpoint mapping can contain:
- `req-tra`: request transformation rules
- `res-tra`: response transformation rules
- `access-control`: access control rules
- `permission`: permission values injected into context
- `x-*`: extension rule phases
Rule Conditions
Conditions evaluate fields in the input context.
Supported operand forms:
- direct field: `role`
- dotted path: `user.role`
- JSON Pointer: `/user/role`
- JSONPath-like path: `$.user.roles[0]`
Supported operators:
```
==   !=   >   <   >=   <=
eq   ne   contains   matches
startsWith   endsWith   exists   notExists
```
`expected` is typed and may be a string, number, boolean, array, object, or null.
Flat condition arrays are evaluated left-to-right. `joinCode` combines the current condition with the previous result.
For example, `A AND B OR C` is evaluated as `(A AND B) OR C`.
If explicit grouping is required, split logic into multiple rules and combine them through endpoint mapping or workflow orchestration.
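To illustrate the left-to-right evaluation, here is a condition list for `(A AND B) OR C`. The `expected` and `joinCode` fields come from the specification described above, while the `operand` and `operator` key names are assumptions for illustration:

```yaml
# Illustrative conditions; operand/operator key names are assumptions.
conditions:
  - operand: user.role                 # A
    operator: ==
    expected: manager
  - joinCode: AND                      # (A AND B)
    operand: $.user.attributes.region
    operator: ==
    expected: us-east
  - joinCode: OR                       # (A AND B) OR C
    operand: /user/groups
    operator: contains
    expected: platform-admins
```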
Rule Actions
Actions execute plugin logic after conditions pass.
An action contains:
- `actionId`
- `actionClassName`
- `actionValues`

`actionClassName` identifies the registered plugin. `actionValues` carries plugin-specific configuration.
Typical action plugins:
- add values to request context
- inject permission attributes
- compute filters
- transform request body
- transform response body
- call a local business function
Actions are intentionally plugin-based so the schema remains stable while implementation logic can evolve.
Endpoint Rule Phases
Endpoint mappings define when rules run.
Request Transformation
`req-tra` rules run before the service handles the request. They can enrich or transform request context.
Response Transformation
`res-tra` rules run after the service produces a response. They can filter, redact, or reshape response data.
Access Control
`access-control` rules validate whether a request is allowed. These rules normally run in parallel because they should not mutate shared state.
Permission Injection
`permission` values are injected into the evaluation context before rule execution. This lets API owners configure roles, groups, attributes, row filters, or column filters without editing the technical rule body.
Extension Phases
Custom phases must use the `x-*` prefix. This avoids silent typos in standard phase names while preserving controlled extensibility.
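Putting the pieces together, an illustrative top-level configuration might look like the following. The rule and plugin names are hypothetical, and the exact field layout is an assumption over the schema described above:

```yaml
# Illustrative rule configuration; names and layout are assumptions.
ruleBodies:
  manager-in-region:
    ruleId: manager-in-region          # must equal the map key
    ruleDesc: Allow managers to access reports for their own region.
    version: "1.0"
    conditions:
      - operand: user.role
        operator: ==
        expected: manager
    actions:
      - actionId: inject-region-filter
        actionClassName: RowFilterAction   # hypothetical registered plugin
        actionValues:
          column: region
endpointRules:
  "GET /v1/reports":
    access-control:
      - manager-in-region
    permission:
      rowFilter: region                # injected into the evaluation context
    x-audit:                           # custom extension phase
      - audit-report-access
```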
Execution Model
The Rust implementation lives in `crates/light-rule`.
Core components:
- `RuleConfig`: top-level config model
- `Rule`: rule definition
- `RuleCondition`: condition model
- `RuleAction`: action model
- `RuleEngine`: evaluates one rule
- `ActionRegistry`: maps action class names to plugins
- `MultiThreadRuleExecutor`: executes rule lists and endpoint phase mappings
Sequential phases such as `req-tra` and `res-tra` should run one rule at a time so transformations happen in order.
Access control can run in parallel because it should be a validation step rather than a mutation step.
Why Not Replace With Cedar Or Casbin
Cedar and Casbin are strong policy engines, but Light-Rule has a different role in this platform.
Light-Rule supports:
- local YAML configuration
- request and response transformation
- permission injection
- row and column filters
- endpoint-specific rule selection
- technical-team-authored reusable rules
- API-owner-selected rule parameters
- config reload through controller
Cedar is excellent for authorization policy, but it does not naturally cover transformation, row filter, and column filter use cases. Casbin is strong for policy enforcement, but it introduces a different policy storage and matching model.
Light-Rule should remain the native rule engine for Light-Fabric service configuration and workflow assertions. External policy engines can still be integrated as action plugins if needed.
Governance
Rule bodies should be authored and reviewed like code or controlled configuration.
Recommended governance metadata:
- `version`
- `author`
- `updatedAt`
- `ruleDesc`
Recommended operational controls:
- validate rule YAML against the schema before publishing
- reject endpoint phase typos
- keep `ruleId` equal to the `ruleBodies` map key
- audit rule publication and reload events
- test rules with representative input contexts
- use workflow live tests to verify rules in integrated environments
Workflow Live Testing
Light-Rule is useful in live tests because it can express business checks that are more specific than generic JSON assertions.
Example flow:
- Workflow invokes an endpoint using LightAPI description.
- Workflow captures the endpoint response.
- `assert` verifies simple fields.
- A rule task validates business-specific behavior.
- On failure, workflow creates a task for a human or agent to investigate.
This keeps live test orchestration in workflow while preserving reusable business rules in Light-Rule.
Design Rule
Use workflow for process control. Use LightAPI for endpoint capability. Use Light-Rule for deterministic business logic.
Agents may select, explain, or help author rules, but the rule engine should execute the final deterministic decision.
Design Document: Centralized Agentic Skill Registry
Subject: Transitioning from File-Based Markdown Skills to a Database-Backed Skill Registry
1. Executive Summary
Currently, most AI agent frameworks rely on localized Markdown (.md) files to define agent "skills." While Markdown is highly LLM-native and human-readable, it creates significant bottlenecks at an enterprise scale regarding strict typing, API integration, and context window limits.
This document proposes transitioning to an Agentic Control Plane (Centralized Skill Registry) backed by a database. By decoupling skill metadata, schemas, and instructions, and by utilizing dynamic routing, we will achieve hierarchical structuring, strict schema enforcement, and progressive disclosure of tools to agents.
2. Problem Statement
Managing agent skills as flat Markdown files introduces several scaling challenges:
- Lack of Strict Typing: Markdown cannot enforce data types (e.g., ensuring a parameter is an integer vs. string), leading to hallucinated or malformed tool inputs.
- Context Window Exhaustion: Loading dozens or hundreds of skill definitions at startup overwhelms the LLM context window, increasing latency, token costs, and tool misuse.
- Static Deployments: Updating a skill or changing access permissions requires a full application redeploy.
- Poor Discoverability: Flat file structures offer no native mechanism for progressive disclosure or tool search.
3. Data Models & Formats
To solve the limitations of purely text-based skills, we will adopt a hybrid, structured format stored within a database (e.g., PostgreSQL/MongoDB). The architecture uses the right format for the right job:
- JSON Schema: Used strictly for defining parameters, inputs, and tool shapes. Natively supported by OpenAI/Anthropic/Google tool-calling APIs.
- LightAPI Description (YAML/JSON): Used to map endpoint-level API capabilities to skills across REST, JSON-RPC, gRPC, and MCP.
- OpenAPI / OpenRPC / Protobuf: Referenced by LightAPI where protocol-native specifications already exist.
- Executable Code (Python/JS) / URI: Stores the actual execution logic or the endpoint reference.
- Markdown: Retained only for the `instructions` or `prompt` fields, as LLMs excel at parsing Markdown headers and lists for constraints and persona instructions.
LightAPI is the preferred source format for API-backed skills because it describes endpoint identity, protocol invocation, input schema, request mapping, result shape, examples, and behavior notes in one agent-oriented document. See LightAPI Description Design for the endpoint description model.
3.1 Proposed Database Schema Structure
The centralized Controller will store skills in a structured table/collection. Below is a representation of the skill payload:
```json
{
  "skill_id": "sk_finance_001",
  "name": "generate_financial_report",
  "version": "1.2.0",
  "tags": ["finance", "reporting"],
  "tool_schema": {
    "type": "function",
    "function": {
      "name": "generate_financial_report",
      "description": "Generates a Q3 report based on ticker symbol.",
      "parameters": {
        "type": "object",
        "properties": {
          "ticker": {"type": "string", "description": "The stock ticker"}
        },
        "required": ["ticker"]
      },
      "response_schema": {
        "type": "object",
        "properties": {
          "report_url": {"type": "string"},
          "status": {"type": "string"}
        }
      }
    }
  },
  "execution": {
    "type": "rest_api",
    "endpoint_id": "ep_finance_report_001",
    "endpoint": "https://internal-api.company.com/v1/finance/report",
    "method": "POST"
  },
  "instructions": "## Role\nYou are a financial analyst.\n## Constraints\n- Never hallucinate financial data.\n- Always return exact numbers."
}
```
4. Hierarchical Structure & Progressive Disclosure
Dumping 500 JSON schemas into an LLM's context window will cause system failure. The Centralized Controller will act as a mediator, enforcing hierarchy and progressive disclosure (giving the agent only the schemas it needs, exactly when it needs them).
4.1 Implementing Hierarchy & Tagging
Because JSON Schema does not have built-in folders, hierarchy and categorization are enforced via the platform's global entity management system:
- Namespacing: Tool names follow a strict convention: `[domain]_[subdomain]_[action]` (e.g., `aws_rds_provision`).
- Tags & Categories: Instead of hardcoded columns, the registry utilizes the `entity_tag_t` and `entity_category_t` tables (with `entity_type = 'skill'`). This allows for unlimited flat tagging and deep hierarchical folder structures that are consistent across the entire portal.
- Discovery API: The Controller's discovery service filters by these tags/categories to produce scoped skill sets for specific agent personas.
4.2 Progressive Disclosure Patterns
Agents will no longer load all skills at startup. Instead, the Controller will mediate access using one (or a combination) of the following patterns:
Pattern A: Meta-Tools (Dynamic Injection)
The agent is booted with only two "meta-tools" designed for discovery.
- `search_skills(query)`: The agent searches the DB. The Controller returns lightweight summaries (name and description only, no heavy schemas).
- `load_skill_schema(skill_name)`: Once the agent identifies the correct tool, it calls this. The Controller dynamically injects the heavy JSON schema into the context for the next turn.
Pattern B: Semantic Tool RAG (Zero-Shot Discovery)
For highly complex systems with thousands of skills:
- Tool descriptions are embedded into a vector database (e.g., `pgvector`).
- When the user prompts the system (e.g., "Reset my AWS password"), the Controller intercepts the prompt, performs a semantic search, and retrieves the Top-3 most relevant JSON Schemas.
- The agent boots with only those 3 tools in its context.
Pattern C: Multi-Agent Orchestration (Supervisor / Worker)
Hierarchy is mapped to agent teams.
- A Supervisor Agent holds routing tools (e.g., `delegate_to_finance`, `delegate_to_devops`).
- When `delegate_to_devops` is triggered, the Controller spins up a DevOps Worker Agent, loading only the specific DevOps JSON schemas into its context.
5. Example Flow: Dynamic Loading in Action
User: "I need to provision a new database for the marketing team."
- Turn 1: Discovery
  - Agent Context: Possesses only `search_skills(query)`.
  - Agent Action: Calls `search_skills(query="provision database")`.
- Turn 2: High-Level Awareness
  - Controller Response: Returns token-efficient summaries from the DB: `[{"name": "aws_rds_provision", "description": "Creates AWS RDS DB"}, {"name": "mongo_atlas_create", "description": "Creates Mongo cluster"}]`
  - Agent Action: Decides AWS is needed. Calls `load_skill_schema("aws_rds_provision")`.
- Turn 3: Strict Execution
  - Controller Response: Injects the full JSON schema (requiring `instance_type`, `storage_gb`).
  - Agent Action: Understands the parameters and safely executes `aws_rds_provision` via the Controller's execution engine.
6. Operational Benefits & Security
By centralizing skills in a database, the platform gains enterprise-grade operational capabilities:
- Dynamic Updates: API endpoints, instructions, and schemas can be updated in the database without restarting agents.
- Permission-Aware Discovery (RBAC): By linking skills to LightAPI endpoint descriptions and `api_endpoint_t`, the Controller ensures that an agent only "discovers" tools that the current user/agent session is authorized to execute based on their roles.
- A/B Testing: The Controller can route 50% of an agent's requests to `skill_v1` and 50% to `skill_v2` to measure prompt/tool efficacy.
- Audit Logging: Every tool injection and execution is logged at the Controller level, establishing a single pane of glass for multi-agent compliance.
- Distilled Memory RAG: Following the "Hindsight" pattern, raw conversation history (`agent_session_history_t`) is separated from RAG-optimized memory (`session_memory_t`). This prevents the "noisy context" problem while maintaining a perfect audit trail.
7. LightAPI As Skill Source
API-backed skills should be generated from endpoint-level LightAPI descriptions whenever possible.
The skill registry should store skill metadata, access control, grouping, and agent-facing instructions. The LightAPI description should remain the source of truth for endpoint invocation and verification details.
Recommended flow:
- Light-Portal creates or imports endpoint-level LightAPI descriptions.
- API owners enrich endpoint descriptions with examples, behavior notes, result cases, and visibility.
- Approved endpoint descriptions are published as agent skills.
- Skill discovery returns lightweight summaries first.
- When the agent selects a skill, the registry loads the relevant LightAPI disclosure level.
- Execution goes through the workflow service or controller runtime, preserving audit and authorization.
This avoids manually duplicating every API endpoint as a separate hand-written skill while still giving agents strict schemas and progressive disclosure.
8. Next Steps
- Provision the `agent_skills` table in the core database.
- Build the API layer (Controller) to handle `search`, `retrieve`, and `execute` requests from agents.
- Add publishing from LightAPI endpoint descriptions into the skill registry.
- Migrate existing Markdown-based skills into the structured DB payload (extracting prompts to the `instructions` field and converting parameters to JSON Schema).
- Implement Pattern B (Semantic Tool RAG) as the default progressive disclosure mechanism.
Hindsight Memory
Hindsight Memory is the core memory system for light-rs, designed to move beyond simple chat logs. Instead of just remembering what was said, the agent learns and forms mental models over time.
This design is strongly inspired by the paper Hindsight is 20/20: Building Agent Memory that Retains, Recalls, and Reflects and extends it with multi-tenant support.
1. Core Concepts
Hindsight memory organizes information into three distinct "Pathway" types:
- World Facts: Objective truths about the environment (e.g., "The production server is in US-East-1").
- Experiences: The agent's own history of actions and results (e.g., "I tried to deploy to US-East-1 and it failed due to a timeout").
- Mental Models: Synthesized understandings formed by reflecting on facts and experiences (e.g., "Deployments to US-East-1 are unstable during peak hours").
2. The Three Operations
Interaction with the memory system is standardized into three primary operations:
Retain (Storage)
The `retain` operation ingests information. Behind the scenes, the system:
- Extracts entities and relationships.
- Normalizes time and temporal data.
- Stores the data in `agent_memory_unit_t`.
Recall (Retrieval)
The `recall` operation retrieves relevant context using a hybrid strategy:
- Semantic: Vector similarity using the `hnsw` index.
- Graph: Following links in `agent_memory_link_t` (causes, enables, prevents).
- Temporal: Time-series filtering.
Reflect (Synthesis)
The `reflect` operation performs "deep thinking." It analyzes existing memories to generate new insights, which are stored in `agent_memory_reflection_t`.
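As a rough sketch, the three operations might be invoked with payloads like these; the field names are assumptions for illustration, not the published `hindsight-client` API:

```yaml
# Hypothetical request shapes for the three operations; field names are assumptions.
retain:
  bankId: bank-42
  factType: experience
  content: "Deploy to US-East-1 failed due to a timeout."

recall:
  bankId: bank-42
  query: "Is US-East-1 safe for deployment right now?"
  strategies: [semantic, graph, temporal]
  limit: 5

reflect:
  bankId: bank-42
  scope: recent-experiences       # compress recent experiences into mental models
```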
3. Database Architecture
The Hindsight system is fully integrated into the portal's multi-tenant schema:
| Table Name | Description |
|---|---|
| `agent_memory_bank_t` | The primary container. Defines personality and disposition (skepticism, empathy). |
| `agent_memory_doc_t` | Source documents (logs, files, transcripts) that provide the raw text for memory units. |
| `agent_memory_unit_t` | Sentence-level "atoms" of thought. Stores content, embeddings, and fact types (world, experience, etc.). |
| `agent_memory_entity_t` | Resolved Knowledge Graph nodes, optionally linked to platform users (`user_t`). |
| `agent_memory_unit_entity_t` | The join table linking individual memories to the entities they mention. |
| `agent_memory_entity_cooccur_t` | Association graph tracking concept relationships and co-occurrence counts. |
| `agent_memory_link_t` | Defines causal and semantic relationships between memories (causes, enables, etc.). |
| `agent_memory_directive_t` | "Hard rules" that override probabilistic learning. |
| `agent_memory_reflection_t` | Synthesized high-level insights generated during the "Reflect" phase. |
| `agent_session_history_t` | The live record of active conversations, linked to a specific bank for context. |
4. Privacy & Multi-Tenancy
Isolation is managed at the Bank level using three scoping tiers:
- Global Host Bank (`user_id` IS NULL, `agent_def_id` IS NULL):
  - Knowledge shared across all users and all agents within a specific `host_id`.
  - Ideal for organization-wide SOPs, common facts, and shared documentation.
- Shared Agent Bank (`user_id` IS NULL, `agent_def_id` IS NOT NULL):
  - Knowledge shared by all users interacting with a specific agent type.
  - Used for maintaining a consistent agent "Persona" or specialized domain expertise.
- Private User Bank (`user_id` IS NOT NULL):
  - Knowledge unique to a specific user.
  - Can be scoped further by `agent_def_id` to provide user-specific memory within a particular agent persona.
  - Used for personal preferences, private history, and individualized learning.
5. Implementation Guide
To implement a "Learning Agent," follow this sequence in your application logic:
- Ingestion: After every tool call or user interaction, call `retain` to update the bank.
- Context Loading: Before calling the LLM, call `recall` to fetch the most relevant 3-5 memories for the current prompt.
- Scheduled Reflection: Run the `reflect` operation during idle time to compress raw experiences into high-level mental models.
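A minimal sketch of that loop, written against a hypothetical trait that mirrors the three operations (the real `hindsight-client` surface may differ, and the LLM call is elided):

```rust
use uuid::Uuid;

// Hypothetical client surface mirroring retain/recall/reflect;
// the shipped hindsight-client API may differ.
#[async_trait::async_trait]
trait Memory {
    async fn retain(&self, bank_id: Uuid, content: &str) -> anyhow::Result<()>;
    async fn recall(&self, bank_id: Uuid, query: &str, limit: usize) -> anyhow::Result<Vec<String>>;
    async fn reflect(&self, bank_id: Uuid) -> anyhow::Result<()>;
}

async fn handle_turn(memory: &dyn Memory, bank_id: Uuid, user_input: &str) -> anyhow::Result<String> {
    // Context Loading: fetch the most relevant 3-5 memories for the prompt.
    let memories = memory.recall(bank_id, user_input, 5).await?;

    // Build the prompt from recalled context; the actual LLM call is elided here.
    let _prompt = format!("Context:\n{}\n\nUser: {}", memories.join("\n"), user_input);
    let response = String::from("...model output...");

    // Ingestion: retain the interaction so the bank keeps learning.
    memory.retain(bank_id, &format!("User said: {user_input}")).await?;
    memory.retain(bank_id, &format!("Agent responded: {response}")).await?;
    Ok(response)
}

// Scheduled Reflection: compress raw experiences into mental models during idle time.
async fn reflection_job(memory: &dyn Memory, bank_id: Uuid) -> anyhow::Result<()> {
    memory.reflect(bank_id).await
}
```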
Fine-Grained Authorization
The following is raw material collected while chatting with an AI assistant about migrating the light-4j fine-grained authorization to light-rs to secure light-agent and light-gateway.
The information still needs to be organized into a design document for light-rs fine-grained authorization.
Question: We have a Java implementation of fine-grained authorization that we want to migrate to light-rs. In our Java implementation, we have a middleware handler that verifies custom claims (role, group, attribute, etc.) in the JWT token against endpoint-specific rules defined in a database and cached at the service. How should we handle the same requirement in Rust?
Response:
In Rust, bridging the gap between dynamic, database-driven rules and high-performance web requests is a well-solved problem. If you are coming from Java, the architecture translates very cleanly, but takes advantage of Rust's async runtime and thread-safe concurrency primitives.
To handle fine-grained authorization (Attribute-Based Access Control / ABAC or Role-Based Access Control / RBAC) where rules are cached locally but backed by a database, you have two primary paths in Rust:
- The Custom Middleware Approach (Closest to your current Java implementation)
- The Policy Engine Approach (Using an industry-standard Rust authorization crate)
Here is how you handle this requirement idiomatically in Rust, assuming you are using Axum (the current standard web framework in Rust).
Approach 1: Custom Middleware with Async Caching
In Java, you likely used something like Guava or Caffeine for your local cache. In Rust, the closest equivalent for high-performance, concurrent, async-aware caching is the `moka` crate.
Here is the architectural blueprint:
- Define your custom claims using `serde`.
- Set up a shared Application State holding your database connection pool (e.g., `sqlx`) and your local cache (`moka`).
- Create an Axum middleware to intercept the request, parse the JWT, consult the cache based on the route, and authorize the request.
1. Dependencies (Cargo.toml)
axum = "0.7"
jsonwebtoken = "9.0"
moka = { version = "0.12", features = ["future"] } # Async cache (Rust's Caffeine)
sqlx = { version = "0.7", features = ["postgres", "runtime-tokio"] }
serde = { version = "1.0", features = ["derive"] }
2. Defining Claims and State
```rust
use axum::{extract::State, http::Request, middleware::Next, response::Response};
use moka::future::Cache;
use serde::{Deserialize, Serialize};
use sqlx::PgPool;
use std::sync::Arc;

// 1. Define your custom JWT claims
#[derive(Debug, Serialize, Deserialize, Clone)]
struct CustomClaims {
    sub: String,
    roles: Vec<String>,
    groups: Vec<String>,
    attributes: std::collections::HashMap<String, String>,
    exp: usize,
}

// 2. Define the Endpoint Rule (what you fetch from DB)
#[derive(Clone)]
struct EndpointRule {
    required_roles: Vec<String>,
    // add required_groups, attributes, etc.
}

// 3. Define the Application State to share across your app
struct AppState {
    db: PgPool,
    // Cache keyed by String (e.g., "GET:/api/data") returning an EndpointRule
    rule_cache: Cache<String, EndpointRule>,
}
```
3. The Authorization Middleware
In Axum, middleware is written as an asynchronous function that takes the request, does some work, and either rejects it or passes it to the Next handler.
```rust
use axum::http::StatusCode;
use axum::response::IntoResponse;

async fn fine_grained_authz_middleware(
    State(state): State<Arc<AppState>>,
    req: Request<axum::body::Body>,
    next: Next,
) -> Result<Response, StatusCode> {
    // 1. Extract and decode the JWT from the Authorization header
    let auth_header = req.headers().get("Authorization").and_then(|h| h.to_str().ok());
    let claims = decode_jwt(auth_header).ok_or(StatusCode::UNAUTHORIZED)?;

    // 2. Identify the endpoint being accessed (e.g., "GET:/api/users/123")
    // Note: In practice, use req.extensions() to get the matched generic route template
    // rather than the raw URI so "/api/users/123" matches "/api/users/{id}"
    let endpoint_key = format!("{}:{}", req.method(), req.uri().path());

    // 3. Check the cache or fall back to the DB
    let rule = match state.rule_cache.get(&endpoint_key).await {
        Some(cached_rule) => cached_rule,
        None => {
            // Cache miss: fetch from the database
            let db_rule = fetch_rule_from_db(&state.db, &endpoint_key)
                .await
                .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
            // Populate the cache
            state.rule_cache.insert(endpoint_key, db_rule.clone()).await;
            db_rule
        }
    };

    // 4. Execute the fine-grained authorization logic
    let is_authorized = rule.required_roles.iter().any(|role| claims.roles.contains(role));
    // ... check groups, evaluate custom attribute expressions, etc.
    if !is_authorized {
        return Err(StatusCode::FORBIDDEN);
    }

    // 5. Pass to the actual route handler!
    Ok(next.run(req).await)
}

// Dummy helper functions to make the example compile mentally
fn decode_jwt(_header: Option<&str>) -> Option<CustomClaims> { /* ... */ unimplemented!() }
async fn fetch_rule_from_db(_db: &PgPool, _key: &str) -> Result<EndpointRule, ()> { /* ... */ unimplemented!() }
```
4. Wiring it up
```rust
use axum::{routing::get, Router};

#[tokio::main]
async fn main() {
    let state = Arc::new(AppState {
        db: PgPool::connect("postgres://...").await.unwrap(),
        rule_cache: Cache::builder()
            .max_capacity(10_000)
            .time_to_live(std::time::Duration::from_secs(300)) // 5 minute TTL
            .build(),
    });

    let app = Router::new()
        .route("/api/secure-data", get(|| async { "Secret Data" }))
        // Apply the middleware to all routes above this line
        .route_layer(axum::middleware::from_fn_with_state(state.clone(), fine_grained_authz_middleware))
        .with_state(state);

    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```
Approach 2: Using a Dedicated Policy Engine (Recommended for complex rules)
If your endpoint-specific rules are highly complex (e.g., "User must be in Group X AND have Attribute Y = 'finance', UNLESS Role is 'Admin'"), writing that evaluation logic manually in Rust gets messy.
Instead of writing custom evaluation logic, the Rust ecosystem heavily relies on external Policy Engines for ABAC/RBAC. The two leaders are:
1. AWS Cedar (cedar-policy) - Highly Recommended
Developed by Amazon Web Services in Rust, Cedar is a blazing-fast language for defining permissions.
- You store your Cedar policies in your database as text.
- Your Rust middleware fetches the Cedar policy string from the DB, compiles it (and caches the compiled policy using `moka`).
- You pass the JWT `CustomClaims` into the Cedar engine as a context entity.
- Cedar evaluates the claims against the policy in microseconds.
2. Casbin (casbin)
Casbin is a very popular authorization library that exists in Java, Go, and Rust.
- It supports ACL, RBAC, and ABAC.
- It has native adapters for `sqlx` (Postgres/MySQL). You don't even need to write the caching logic yourself; `casbin-rs` handles reading the rules from the database and keeping them cached in memory automatically.
Summary
- For simple Role/Group matching: Stick to the custom middleware approach using `moka` for async caching and `axum::middleware` to intercept requests. It is functionally identical to your Java approach but extremely fast.
- For complex attribute evaluation: Use your middleware to parse the JWT, but delegate the actual rule evaluation to `cedar-policy` or `casbin`.
Question: Could you please provide some examples for cedar-policy and casbin? For example, what is the format in the database? In our system, we are using a YAML rule engine to evaluate fine-grained authorization. We provide some rules users can choose from, and they can also create their own rules if necessary.
Response:
Transitioning from a custom YAML rule engine to an industry-standard policy engine is a fantastic move. Both AWS Cedar and Casbin are designed precisely for this kind of "user-defined fine-grained rules" requirement, but they take very different approaches to how rules are written and stored.
Here is a breakdown of how both engines work, what their database format looks like, and how you map your JWT claims to them in Rust.
1. AWS Cedar (cedar-policy) - Best for Complex/Custom Rules
Cedar uses a custom, highly readable Domain Specific Language (DSL). It is specifically built to evaluate ABAC (Attribute-Based Access Control) using contextual data like JWT claims.
The Format (In the Database)
In your database, you do not need complex tables. You simply store the Cedar policies as plain text strings in a standard TEXT or VARCHAR column.
If a user wants to create a custom rule saying, "Allow access if the user's JWT has the 'finance' group and a clearance level greater than 3," the Cedar policy stored in the DB looks like this:
```
// Database column: policy_text
permit (
    principal,
    action == Action::"read_data",
    resource == Resource::"secure_endpoint"
)
when {
    context.jwt.groups contains "finance" &&
    context.jwt.attributes.clearance_level > 3
};
```
For "predefined rules that users can choose," Cedar has Policy Templates. You store the template once, and users just link their parameters to it.
// Template: "Require specific group"
permit(principal, action, resource)
when { context.jwt.groups contains ?required_group };
How you execute it in Rust
When a request comes in, you fetch the relevant Cedar text strings from your database, pass in the JWT claims as the Context, and let Cedar evaluate it.
```rust
use cedar_policy::{Authorizer, Context, Decision, Entities, PolicySet, Request};
use serde_json::json;
use std::str::FromStr;

fn evaluate_cedar_rule(db_policy_text: &str, jwt_claims: &serde_json::Value) {
    // 1. Parse the text policy from the database
    let policies = PolicySet::from_str(db_policy_text).expect("Invalid policy format");

    // 2. Map your JWT claims directly into the Cedar "Context" using JSON
    let context_json = json!({ "jwt": jwt_claims });
    let context = Context::from_json_value(context_json, None).unwrap();

    // 3. Create the evaluation request
    let request = Request::new(
        Some("User::\"Alice\"".parse().unwrap()),
        Some("Action::\"read_data\"".parse().unwrap()),
        Some("Resource::\"secure_endpoint\"".parse().unwrap()),
        context,
        None,
    ).unwrap();

    // 4. Evaluate
    let authorizer = Authorizer::new();
    let entities = Entities::empty(); // Used if fetching hierarchical data
    let answer = authorizer.is_authorized(&request, &policies, &entities);

    if answer.decision() == Decision::Allow {
        println!("Access Granted!");
    } else {
        println!("Access Denied. Failed policies: {:?}", answer.diagnostics().errors());
    }
}
```
2. Casbin (casbin-rs) - Best for Tabular/Database-Native Rules
Casbin works fundamentally differently. It splits authorization into two parts: the Model (a config file defining the structure of your rules) and the Policies/Rules (the actual data stored in the database).
The Format (In the Database)
Casbin doesn't store a DSL string. Instead, it natively connects to your database using an adapter (like sqlx-adapter) and creates a table usually called casbin_rule.
The table structure looks exactly like a CSV file, with columns: ptype, v0, v1, v2, v3, v4, v5.
If a user selects a predefined rule to allow the finance_admin role to read /api/data, Casbin inserts a row into your database like this:
| ptype | v0 (Subject) | v1 (Object) | v2 (Action) | v3 (Condition - optional) |
|---|---|---|---|---|
| p | finance_admin | /api/data | read | |
If a user writes a custom ABAC rule using a UI (e.g., clearance > 3), Casbin can store the evaluation expression as a string in one of the columns (evaluated internally via the rhai scripting language):
| ptype | v0 (Subject/Condition) | v1 (Object) | v2 (Action) |
|---|---|---|---|
| p | r.sub.groups.contains("finance") && r.sub.clearance > 3 | /api/data | read |
How you execute it in Rust
To make Casbin evaluate JWT attributes dynamically, you define a model.conf file that tells Casbin to evaluate (eval) the string stored in the database.
The model.conf File:
```ini
[request_definition]
r = sub, obj, act

[policy_definition]
p = sub_rule, obj, act

[policy_effect]
e = some(where (p.eft == allow))

[matchers]
# This matcher says: check if the object and action match,
# AND evaluate the ABAC rule stored in the database (p.sub_rule) against the request subject (r.sub)
m = r.obj == p.obj && r.act == p.act && eval(p.sub_rule)
```
The Rust Code:
```rust
use casbin::{CoreApi, Enforcer};
use serde::Serialize;
use sqlx_adapter::SqlxAdapter;

// A struct representing your JWT claims.
// Deriving Serialize lets casbin's rhai-based eval() read the fields as r.sub.*.
#[derive(Clone, Serialize)]
struct JwtClaims {
    groups: Vec<String>,
    clearance: i32,
}

async fn evaluate_casbin() {
    // 1. Connect to the DB. The adapter automatically fetches the rules from the `casbin_rule` table.
    let adapter = SqlxAdapter::new("postgres://user:pass@localhost/db", 8).await.unwrap();
    let mut enforcer = Enforcer::new("model.conf", adapter).await.unwrap();

    // 2. Parse the JWT for the incoming request
    let claims = JwtClaims {
        groups: vec!["finance".to_string()],
        clearance: 5,
    };

    // 3. Casbin in Rust uses `rhai` under the hood for dynamic object evaluation.
    // Passing the serializable claims struct maps its fields onto `r.sub`.
    // 4. Enforce the rules against the DB.
    // Request is: Subject (JWT claims), Object (/api/data), Action (read)
    if enforcer.enforce((claims, "/api/data", "read")).unwrap() {
        println!("Access Granted!");
    } else {
        println!("Access Denied!");
    }
}
```
Which one maps better to your YAML Rule Engine?
AWS Cedar is highly recommended for your use case.
Because you currently use a YAML rule engine where users define complex logical rules, your users are already accustomed to defining logical expressions.
- Translation: It is very easy to write a script that takes your existing YAML definitions and converts them into Cedar text policies.
- Safety: Cedar is statically typed and formally verified. If a user writes an invalid custom rule, Cedar will fail to parse it instantly and give you a helpful error message to show in your UI. Casbin's `eval()` logic can sometimes be brittle at runtime if the user types bad syntax.
- Database Schema: With Cedar, you control your database schema. You just add a `policy_text` column to your existing "Rules" table. Casbin requires taking over a table (`casbin_rule`) to map to its strict CSV-like format.
Question: In our current implementation, we define all the permissions and filters from the UI and save the permissions and the selected rules in the database. During server startup, everything is loaded locally in YAML format like the following. One of the pros is that our technical team writes the rules, while the user or API owner just defines the roles, groups, positions, and attributes for the endpoint. The rule is executed locally on the target server without going to the database, so performance is very good. If any permission changes for the target server, we can trigger a config reload from the controller to ask the server to reload the config/permissions. I am not sure if we should replace this implementation with Cedar. One of the features we have is the column and row filter, and I don't think Cedar supports it. What do you think?
```yaml
rule.endpointRules:
  echo@call:
    req-acc:
      - allow-role-based-access-control.lightapi.net
    permission:
      roles: mcp-reader
  /v1/pets@get:
    req-acc:
      - allow-role-based-access-control.lightapi.net
    permission:
      roles: host-admin
  /v1/accounts@get:
    req-acc:
      - allow-role-based-access-control.lightapi.net
    res-fil:
      - response-column-filter.lightapi.net
      - response-row-filter.lightapi.net
    permission:
      col:
        role:
          teller: '["accountNo","userId","accountType","firstName","lastName"]'
          account-manager: '["accountNo","userId","accountType","firstName","lastName","status"]'
      row:
        role:
          teller:
            - colName: accountType
              colValue: C
              operator: =
      roles: account-manager teller
```
Response:
You have excellent architectural instincts. You are absolutely right to hesitate, and your concern about column and row filtering is the exact reason you probably should not replace your current system with Cedar.
Your current design is actually highly sophisticated. Let’s break down why your current implementation is so strong, why Cedar would struggle with your specific requirements, and how you should idiomatically build your current design in Rust.
Why Cedar is NOT a drop-in replacement for your system
AWS Cedar (and most standard policy engines like Casbin or OPA) are designed as Binary Decision Engines. You give them a Request, and they return an Allow or Deny.
The Missing Feature: Data Filtering (Masking)
Your system handles two fundamentally different concerns in one elegant YAML file:
- Access Control (`req-acc`): Can the user access the `/v1/accounts` endpoint? (Cedar does this perfectly.)
- Data Filtering (`res-fil`): Which rows and columns can the user see in the response? (Cedar does not do this natively.)
If you forced Cedar to do column filtering, you would have to misuse its "Annotations" feature to attach custom metadata to policies, read the output, and then write a custom Rust engine to parse that metadata and filter the JSON anyway. You would essentially be rebuilding your existing system, just with Cedar awkwardly wedged in the middle.
Strengths of your current architecture
What you have built is essentially a custom API Gateway / Sidecar Data Masking Engine.
- Performance: Loading rules into memory at startup means zero network latency. (Matches Cedar).
- Hot Reloading: Fetching updates via a controller is standard cloud-native practice. (Matches Cedar).
- Separation of Concerns: The tech team writes the underlying engine logic (`allow-role-based-access-control`), while API owners just write YAML configs. (Matches Cedar's Policy Templates.)
- Row/Column Filtering: Natively understood by your engine. (Far exceeds Cedar.)
Verdict: Do not switch to Cedar. Your current YAML-based, locally cached evaluation engine is a perfect fit for your business requirements.
How to implement your architecture in Rust
Since you are migrating to or building in Rust, your current Java-based architecture translates beautifully. In fact, Rust will make this system significantly faster and more memory-efficient.
Here is how you replicate your exact Java architecture—including the hot-reloading and filtering—in Rust.
1. Hot-Reloading using arc-swap
In Java, hot-reloading configuration safely across threads can be tricky. In Rust, the industry standard for "read-heavy, occasionally-updated" memory structures is the arc-swap crate. It allows your web server to read the YAML rules lock-free at millions of requests per second, while your controller can swap out the rules in the background instantly.
```rust
use arc_swap::ArcSwap;
use serde::Deserialize;
use std::sync::Arc;
use std::collections::HashMap;

// 1. Map your YAML exactly to Rust structs
#[derive(Deserialize, Debug, Clone)]
struct EndpointRule {
    #[serde(rename = "req-acc")]
    req_acc: Option<Vec<String>>,
    #[serde(rename = "res-fil")]
    res_fil: Option<Vec<String>>,
    permission: PermissionConfig,
}

#[derive(Deserialize, Debug, Clone)]
struct PermissionConfig {
    roles: String,
    col: Option<HashMap<String, HashMap<String, String>>>, // Maps roles to column JSON strings
}

// 2. Wrap your entire config in an ArcSwap
lazy_static::lazy_static! {
    static ref CONFIG: ArcSwap<HashMap<String, EndpointRule>> =
        ArcSwap::from_pointee(HashMap::new());
}

// 3. Background task to listen to the Controller and hot-reload
async fn reload_config_from_controller(new_yaml_str: &str) {
    let new_config: HashMap<String, EndpointRule> = serde_yaml::from_str(new_yaml_str).unwrap();
    // Swaps the pointer atomically.
    // All new requests see the new rules instantly. Old requests finish using the old rules.
    CONFIG.store(Arc::new(new_config));
}
```
2. The Request Access (req-acc) Middleware
You would write an Axum middleware that intercepts the request, looks up the current endpoint in the arc-swap configuration, checks the JWT roles against permission.roles, and rejects the request if it fails.
```rust
// Inside your Axum middleware
let current_config = CONFIG.load();
if let Some(rule) = current_config.get("/v1/accounts@get") {
    let required_roles: Vec<&str> = rule.permission.roles.split_whitespace().collect();
    if !user_roles.iter().any(|r| required_roles.contains(&r.as_str())) {
        return Err(StatusCode::FORBIDDEN); // req-acc fails
    }
}
```
3. The Response Filtering (res-fil) Middleware
This is where Rust shines. Because Axum and Actix allow you to intercept the response body after the handler completes but before it goes to the user, you can apply your column filtering dynamically.
Using serde_json, you can strip out columns that the user isn't allowed to see based on the col mapping in your YAML.
```rust
use serde_json::Value;

fn apply_column_filter(mut response_json: Value, allowed_columns_json: &str) -> Value {
    // 1. Parse the allowed columns from the YAML string (e.g., '["accountNo","firstName"]')
    let allowed_cols: Vec<String> = serde_json::from_str(allowed_columns_json).unwrap();

    // 2. Filter the JSON object
    if let Value::Object(ref mut map) = response_json {
        // Retain only the keys that are in the allowed_cols list
        map.retain(|key, _| allowed_cols.contains(key));
    }
    response_json
}
```
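The row filter (`res-fil` with `colName`/`colValue`/`operator`) can be applied with the same technique. A minimal sketch, assuming the response body is a JSON array of objects and only the `=`/`!=` operators are needed (struct names are illustrative):

```rust
use serde::Deserialize;
use serde_json::Value;

// One row rule as defined in the YAML (field names mirror colName/colValue/operator).
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct RowRule {
    col_name: String,
    col_value: String,
    operator: String, // only "=" and "!=" are handled in this sketch
}

// Keep only the rows the caller's role is allowed to see.
fn apply_row_filter(mut response_json: Value, rules: &[RowRule]) -> Value {
    if let Value::Array(ref mut rows) = response_json {
        rows.retain(|row| {
            rules.iter().all(|rule| {
                // Compare the row's column value as a string against the rule.
                let cell = row.get(&rule.col_name).and_then(|v| v.as_str()).unwrap_or("");
                match rule.operator.as_str() {
                    "=" => cell == rule.col_value,
                    "!=" => cell != rule.col_value,
                    _ => false, // unknown operators deny by default
                }
            })
        });
    }
    response_json
}
```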
Summary
Keep your current system!
- The YAML approach is great. It acts as a perfect declarative contract between your API owners and your technical rules engine.
- Cedar is the wrong tool for Data Masking. Cedar is for strict Allow/Deny decisions, not mutating response bodies or filtering data.
- Rust will supercharge this. By using `serde_yaml` to parse your config, `arc-swap` for lock-free hot-reloading, and `serde_json` for column masking inside an Axum middleware, you will get a system that does exactly what your Java system does, but with significantly less memory overhead and higher throughput.
Agent Engine Pattern
The Agent Engine Pattern is the architectural standard for building industrial-grade, metadata-driven AI platforms within the Light-Fabric ecosystem.
In this model, the Rust Runtime acts as a high-performance Orchestrator, while the Application Logic resides in externalized metadata (JSON/YAML) and the Hindsight Memory database.
1. Why the Metadata-Driven Approach?
- Separation of Concerns: Complex platform logic (security, retries, database connectivity, LLM integration) is implemented once in Rust. Business logic—defining agent personas, goals, and steps—is "programmed" via JSON or Database records.
- Hot-Reloading: Using the `arc-swap` crate and YAML-based rule engines, agent personas, model parameters, and tool access can be updated in real time without a server restart.
- Elastic Scalability: Deploy a single, generic `light-agent` binary. At runtime, it specializes into a "Researcher," "Auditor," or "Support Specialist" based on the `workflow_id` or `agent_id` it retrieves from the registry.
- High Performance: Rust's asynchronous `tokio` runtime allows a single engine instance to manage thousands of concurrent agentic sessions with minimal memory overhead.
2. The Core Architecture: Engine vs. Content
To function as a generic interpreter, the Light-Fabric Engine relies on four primary components:
A. The Tool & Skill Registry (The "Hands")
The engine maps string identifiers in the workflow JSON (e.g., "call": "get_customer_data") to executable code or remote MCP tools.
- Implementation: Uses a `ToolRegistry` with trait objects (`Box<dyn Tool>`) or dynamic dispatch to MCP (Model Context Protocol) servers.
- Logic: When the LLM requests a tool call, the engine verifies permissions via Fine-Grained Authorization, executes the tool, and feeds the result back into the context.
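As a rough illustration of the trait-object approach (the trait shape and registry methods are assumptions, not the actual `light-runtime` API):

```rust
use std::collections::HashMap;
use serde_json::Value;

// Hypothetical tool trait; async_trait makes it usable behind Box<dyn Tool>.
#[async_trait::async_trait]
trait Tool: Send + Sync {
    fn name(&self) -> &str;
    async fn call(&self, params: Value) -> anyhow::Result<Value>;
}

// Maps string identifiers from workflow metadata to executable tools.
struct ToolRegistry {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl ToolRegistry {
    fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name().to_string(), tool);
    }

    // Resolves "call": "get_customer_data" style identifiers at runtime.
    async fn call(&self, name: &str, params: Value) -> anyhow::Result<Value> {
        let tool = self
            .tools
            .get(name)
            .ok_or_else(|| anyhow::anyhow!("unknown tool: {name}"))?;
        tool.call(params).await
    }
}
```

An MCP-backed tool would implement the same trait and forward `call` to the remote server.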
B. Hindsight State Manager (The "Memory")
Unlike simple session storage, the state manager persists every step of the agentic interaction into biomimetic memory banks.
- Implementation: Every "turn" in the conversation is saved as a
unit_tin the Hindsight database. - Benefit: Provides fault tolerance (resuming from a crashed step) and "Recall" capabilities, allowing agents to remember past interactions across different sessions.
C. Prompt Templating (The "Mind")
System prompts and instructions are stored as templates rather than hardcoded strings.
- Implementation: Uses the `tera` or `rinja` engines for high-performance string interpolation.
- Example: `"You are a {{agent_role}}. Your current objective is to {{agent_goal}}."`
- Rust Logic: The engine merges runtime context (user input, memory recall, tool results) into the template before calling the LLM.
D. Policy Engine (The "Shield")
Before any tool execution or data retrieval, the engine consults the Light-Rule middleware.
- Logic: Ensures the agent has the authority to access specific data or execute specific functions, preventing "prompt injection" from leading to unauthorized actions.
3. Conceptual Implementation in Rust
The AgentEngine in Light-Fabric follows a non-blocking, async loop:
```rust
pub struct AgentEngine {
    registry: Arc<ToolRegistry>,
    memory: Arc<HindsightClient>,
    rules: Arc<RuleEngine>,
    llm_provider: Arc<LlmProvider>, // needed by execute_step below
}

impl AgentEngine {
    pub async fn execute_step(&self, session_id: Uuid, task: Task) -> anyhow::Result<()> {
        // 1. Fetch current context from Hindsight Memory
        let mut context = self.memory.get_context(session_id).await?;

        // 2. Resolve the task type (agentic vs. tool call)
        match task {
            Task::LlmCall { agent_id, prompt_template } => {
                // Render the prompt with Tera
                let prompt = self.render_prompt(prompt_template, &context)?;
                // Call the LLM provider
                let response = self.llm_provider.chat(prompt, &context).await?;
                // Retain the turn in Hindsight
                self.memory.retain_turn(session_id, response).await?;
            }
            Task::ToolCall { tool_name, params } => {
                // 3. Enforce Fine-Grained Authorization
                if self.rules.authorize(session_id, &tool_name).await? {
                    let result = self.registry.call(&tool_name, params).await?;
                    context.add_result(tool_name, result);
                }
            }
        }

        // 4. Update the session state
        self.memory.checkpoint(session_id, context).await
    }
}
```
4. Operational Challenges & Solutions
- Tool Versioning: As the platform evolves, tools may change. Light-Fabric handles this by versioning tool definitions in the Registry, ensuring old workflows remain compatible with the tools they were designed for.
- Safe Execution: For dynamic "scripts" defined in metadata, Light-Fabric utilizes WebAssembly (WASM) runtimes to provide a high-performance, secure sandbox that is superior to traditional container-based isolation.
- Observability: Because the engine is generic, tracing is built into the `light-runtime`. Every step generates OpenTelemetry traces, allowing developers to visualize the "thought process" and execution path of any agent in real time.
The Recommendation
Light-Fabric adopts this "Engine-first" philosophy to ensure the platform remains sustainable. By treating the Agentic Workflow as data and the Rust Runtime as the interpreter, we achieve the perfect balance of extreme performance and business flexibility.
Database Design
The Light-Fabric utilizes a robust PostgreSQL schema to manage the entire lifecycle of agentic workflows, skills, and the biomimetic Hindsight memory system. The schema is organized into four logical layers:
1. Workflow Engine
These tables manage the definition and execution of long-running agentic workflows.
wf_definition_t
Stores the Agentic Workflow DSL (YAML) that defines the high-level orchestration logic.
process_info_t & task_info_t
Manage the runtime state of workflow instances (processes) and individual steps (tasks). They include input_data, context_data, and error_info to provide a resilient "scratchpad" for intermediate variables.
worklist_t & worklist_asst_t
Manage task assignments and visibility for human-in-the-loop interactions.
2. Agentic Core (The "Brain & Skills")
These tables define the identity, expertise, and capabilities of individual agents.
agent_definition_t
Defines the agent's persona, model provider (OpenAI, Anthropic, etc.), and runtime parameters like temperature and max tokens.
skill_t
Stores the "Expertise" of an agent in Markdown format. Skills are hierarchical and versioned.
tool_t & tool_param_t
The "Hands" of the agent. Defines executable functions, including REST endpoints, MCP server calls, or WASM scripts.
agent_skill_t & skill_tool_t
Maps agents to skills and skills to tools, implementing the Progressive Disclosure pattern where agents only see the tools required for their current skill context.
3. Hindsight Memory System
A biomimetic memory architecture that transitions from flat logs to structured "atoms of thought."
agent_memory_bank_t
Profiles for memory banks, defining the "Personality and Disposition" (e.g., skepticism, empathy) of the memory layer.
agent_memory_unit_t
The individual "Atoms" of memory. Each unit contains content and a vector embedding (384-dim) for semantic retrieval.
agent_memory_entity_t & agent_memory_link_t
A Knowledge Graph layer that resolves entities and causal/semantic relationships between memory units.
4. Session Management
agent_session_history_t
The "Source of Truth" for active conversations, linking specific sessions to their respective Hindsight memory banks.
DDL Specification
```sql
-- Workflow Definitions: Stores the Agentic Workflow DSL (YAML)
CREATE TABLE wf_definition_t (
host_id UUID NOT NULL,
wf_def_id UUID NOT NULL,
namespace VARCHAR(126) NOT NULL,
name VARCHAR(126) NOT NULL,
version VARCHAR(20) NOT NULL,
definition TEXT NOT NULL, -- The Agentic Workflow DSL in YAML
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT TRUE,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, wf_def_id),
UNIQUE(host_id, namespace, name, version)
);
CREATE TABLE worklist_t (
host_id UUID NOT NULL,
assignee_id VARCHAR(126) NOT NULL,
category_id VARCHAR(126) DEFAULT '(all)' NOT NULL,
status_code VARCHAR(10) DEFAULT 'Active' NOT NULL,
app_id VARCHAR(512) DEFAULT 'global' NOT NULL,
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN NOT NULL DEFAULT TRUE,
update_user VARCHAR (255) DEFAULT SESSION_USER NOT NULL,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
PRIMARY KEY(host_id, assignee_id, category_id)
);
CREATE TABLE worklist_column_t (
host_id UUID NOT NULL,
assignee_id VARCHAR(126) NOT NULL,
category_id VARCHAR(126) DEFAULT '(all)' NOT NULL,
sequence_id INTEGER NOT NULL,
column_id VARCHAR(126) NOT NULL,
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT TRUE,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, assignee_id, category_id, sequence_id),
FOREIGN KEY(host_id, assignee_id, category_id) REFERENCES worklist_t(host_id, assignee_id, category_id) ON DELETE CASCADE
);
CREATE TABLE process_info_t (
host_id UUID NOT NULL,
process_id UUID NOT NULL, -- generated uuid
wf_def_id UUID NOT NULL, -- workflow definition id
wf_instance_id VARCHAR(126) NOT NULL, -- workflow instance id
app_id VARCHAR(512) NOT NULL, -- application id
process_type VARCHAR(126) NOT NULL,
status_code CHAR(1) NOT NULL, -- process status code 'A', 'C'
started_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
ex_trigger_ts TIMESTAMP WITH TIME ZONE NOT NULL,
custom_status_code VARCHAR(126),
completed_ts TIMESTAMP WITH TIME ZONE,
result_code VARCHAR(126),
source_id VARCHAR(126),
branch_code VARCHAR(126),
rr_code VARCHAR(126),
party_id VARCHAR(126),
party_name VARCHAR(126),
counter_party_id VARCHAR(126),
counter_party_name VARCHAR(126),
txn_id VARCHAR(126),
txn_name VARCHAR(126),
product_id VARCHAR(126),
product_name VARCHAR(126),
product_type VARCHAR(126),
group_name VARCHAR(126),
subgroup_name VARCHAR(126),
event_start_ts TIMESTAMP WITH TIME ZONE,
event_end_ts TIMESTAMP WITH TIME ZONE,
event_other_ts TIMESTAMP WITH TIME ZONE,
event_other VARCHAR(126),
risk NUMERIC,
risk_scale INTEGER,
price NUMERIC,
price_scale INTEGER, -- Scale (number of digits to the right of the decimal) of the price column. NULL implies zero
product_qy NUMERIC,
currency_code CHAR(3),
ex_ref_id VARCHAR(126),
ex_ref_code VARCHAR(126),
product_qy_scale INTEGER,
parent_process_id VARCHAR(22),
deadline_ts TIMESTAMP WITH TIME ZONE,
parent_group_id NUMERIC,
process_subtype_code VARCHAR(126),
owning_group_name VARCHAR(126), -- Name of the group that owns the process
input_data JSONB, -- The initial data that triggered the workflow
context_data JSONB, -- The runtime "scratchpad" for intermediate variables
error_info TEXT, -- Detailed error or stack trace if the process fails
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT TRUE,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, process_id),
FOREIGN KEY(host_id, wf_def_id) REFERENCES wf_definition_t(host_id, wf_def_id) ON DELETE CASCADE
);
CREATE TABLE task_info_t
(
host_id UUID NOT NULL,
task_id UUID NOT NULL,
task_type VARCHAR(126) NOT NULL,
process_id UUID NOT NULL,
wf_instance_id VARCHAR(126) NOT NULL,
wf_task_id VARCHAR(126) NOT NULL,
status_code CHAR(1) NOT NULL, -- U, A, C
started_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
locked CHAR(1) NOT NULL,
priority INTEGER NOT NULL,
completed_ts TIMESTAMP WITH TIME ZONE NULL,
completed_user VARCHAR(126) NULL,
result_code VARCHAR(126) NULL,
locking_user VARCHAR(126) NULL,
locking_role VARCHAR(126) NULL,
deadline_ts TIMESTAMP WITH TIME ZONE NULL,
lock_group VARCHAR(126) NULL,
task_input JSONB, -- Specific data passed to the task
task_output JSONB, -- Result returned by the task action
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT TRUE,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, task_id),
FOREIGN KEY (host_id, process_id) REFERENCES process_info_t(host_id, process_id) ON DELETE CASCADE
);
CREATE TABLE task_asst_t
(
host_id UUID NOT NULL,
task_asst_id UUID NOT NULL,
task_id UUID NOT NULL,
assigned_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
assignee_id VARCHAR(126) NOT NULL,
reason_code VARCHAR(126) NOT NULL,
unassigned_ts TIMESTAMP WITH TIME ZONE NULL,
unassigned_reason VARCHAR(126) NULL,
category_code VARCHAR(126) NULL,
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT TRUE,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, task_asst_id),
FOREIGN KEY(host_id, task_id) REFERENCES task_info_t(host_id, task_id) ON DELETE CASCADE
);
CREATE TABLE audit_log_t
(
host_id UUID NOT NULL,
audit_log_id UUID NOT NULL,
source_type_id VARCHAR(126) NULL,
correlation_id VARCHAR(126) NULL,
user_id VARCHAR(126) NULL,
event_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
success CHAR(1) NULL,
message0 VARCHAR(126) NULL,
message1 VARCHAR(126) NULL,
message2 VARCHAR(126) NULL,
message3 VARCHAR(126) NULL,
message VARCHAR(500) NULL,
user_comment VARCHAR(500) NULL,
PRIMARY KEY(host_id, audit_log_id)
);
CREATE INDEX audit_log_idx1 ON audit_log_t (source_type_id, correlation_id, event_ts, user_id);
-- Agent Definitions: Stores the "Brain" configuration
CREATE TABLE agent_definition_t (
host_id UUID NOT NULL,
agent_def_id UUID NOT NULL,
agent_name VARCHAR(126) NOT NULL,
model_provider VARCHAR(64) NOT NULL, -- 'openai', 'anthropic', etc.
model_name VARCHAR(126) NOT NULL, -- 'gpt-4o', 'claude-3-5-sonnet'
api_key_ref VARCHAR(126), -- Reference to Secret Manager key
temperature NUMERIC(3,2) DEFAULT 0.7,
max_tokens INTEGER, -- max number of tokens can be used
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT TRUE,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, agent_def_id),
UNIQUE(host_id, agent_name)
);
-- Skills: Stores Instructions and Domain Knowledge (The "Expertise")
-- Note: Use entity_tag_t and entity_category_t with entity_type = 'skill'
-- for flat tagging and hierarchical folder structure of skills.
CREATE TABLE skill_t (
host_id UUID NOT NULL,
skill_id UUID NOT NULL,
parent_skill_id UUID, -- Self-reference for Hierarchy
name VARCHAR(126) NOT NULL,
description VARCHAR(500), -- High-level description for the initial LLM prompt
content_markdown TEXT NOT NULL, -- The actual instructions/prompts
description_embedding VECTOR(384), -- For semantic lookup/discovery
version VARCHAR(20) DEFAULT '1.0.0',
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, skill_id),
FOREIGN KEY(host_id, parent_skill_id) REFERENCES skill_t(host_id, skill_id)
);
CREATE INDEX idx_skill_active ON skill_t(active);
CREATE INDEX idx_skill_name ON skill_t(name);
-- Tools: Stores Executable Functions (The "Hands")
CREATE TABLE tool_t (
host_id UUID NOT NULL,
tool_id UUID NOT NULL,
name VARCHAR(126) NOT NULL,
description TEXT NOT NULL, -- Instructions for LLM on when/how to use this tool
-- Implementation specifics
implementation_type VARCHAR(50), -- 'java', 'mcp_server', 'rest', 'python', 'javascript'
implementation_class VARCHAR(500), -- FQCN if 'java'
mcp_server_name VARCHAR(126), -- MCP server name if 'mcp_server'
api_endpoint VARCHAR(1024), -- URL if 'rest'
api_method VARCHAR(10), -- HTTP Method if 'rest'
endpoint_id UUID, -- Reference to fine-grained auth endpoint
script_content TEXT, -- Source code if 'python'/'javascript'
response_schema JSONB, -- Strict output schema for tool results
description_embedding VECTOR(384), -- For semantic lookup/discovery
version VARCHAR(20) DEFAULT '1.0.0',
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, tool_id),
FOREIGN KEY(host_id, endpoint_id) REFERENCES api_endpoint_t(host_id, endpoint_id) ON DELETE CASCADE
);
CREATE INDEX idx_tool_host_endpoint ON tool_t(host_id, endpoint_id);
CREATE INDEX idx_tool_active ON tool_t(active);
CREATE INDEX idx_tool_name ON tool_t(name);
-- Tool Parameters: Defines the arguments for each tool
CREATE TABLE tool_param_t (
host_id UUID NOT NULL,
param_id UUID NOT NULL,
tool_id UUID NOT NULL,
name VARCHAR(255) NOT NULL,
param_type VARCHAR(50) NOT NULL, -- 'string', 'number', 'boolean', 'object', 'array'
required BOOLEAN DEFAULT true,
default_value JSONB,
description TEXT, -- Helps LLM understand what value to extract
validation_schema JSONB, -- JSON Schema for complex validation
order_index INTEGER DEFAULT 0,
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, param_id),
FOREIGN KEY(host_id, tool_id) REFERENCES tool_t(host_id, tool_id) ON DELETE CASCADE
);
-- Skill Dependencies: Manages hierarchies where one skill requires another
CREATE TABLE skill_dependency_t (
host_id UUID NOT NULL,
skill_id UUID NOT NULL,
depends_on_skill_id UUID NOT NULL,
required BOOLEAN DEFAULT true,
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY (host_id, skill_id, depends_on_skill_id),
FOREIGN KEY(host_id, skill_id) REFERENCES skill_t(host_id, skill_id),
FOREIGN KEY(host_id, depends_on_skill_id) REFERENCES skill_t(host_id, skill_id)
);
-- Agent-Skill Mapping: Links Agents to their Skills
CREATE TABLE agent_skill_t (
host_id UUID NOT NULL,
agent_def_id UUID NOT NULL,
skill_id UUID NOT NULL,
config JSONB DEFAULT '{}',
priority INTEGER DEFAULT 0,
sequence_id INTEGER DEFAULT 0, -- Order in which skills are concatenated
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, agent_def_id, skill_id),
FOREIGN KEY(host_id, agent_def_id) REFERENCES agent_definition_t(host_id, agent_def_id) ON DELETE CASCADE,
FOREIGN KEY(host_id, skill_id) REFERENCES skill_t(host_id, skill_id) ON DELETE CASCADE
);
CREATE INDEX idx_agent_skill_agent ON agent_skill_t(agent_def_id);
-- Skill-Tool Mapping: Implements Progressive Disclosure
CREATE TABLE skill_tool_t (
host_id UUID NOT NULL,
skill_id UUID NOT NULL,
tool_id UUID NOT NULL,
config JSONB DEFAULT '{}',
access_level VARCHAR(20) DEFAULT 'read', -- e.g., 'read', 'write', 'execute'
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, skill_id, tool_id),
FOREIGN KEY(host_id, skill_id) REFERENCES skill_t(host_id, skill_id) ON DELETE CASCADE,
FOREIGN KEY(host_id, tool_id) REFERENCES tool_t(host_id, tool_id) ON DELETE CASCADE
);
CREATE INDEX idx_skill_tool_skill ON skill_tool_t(skill_id);
-- Hindsight Advanced Memory System
-- Transitioned from flat logs to biomimetic memory banks (World, Experiences, Mental Models)
-- Memory bank profiles (Personality & Disposition)
CREATE TABLE agent_memory_bank_t (
host_id UUID NOT NULL,
bank_id UUID NOT NULL,
agent_def_id UUID, -- NULL if bank is shared across agents
user_id UUID, -- NULL if bank is global for the host/agent
bank_name VARCHAR(126) NOT NULL,
disposition JSONB NOT NULL DEFAULT '{"skepticism": 3, "literalism": 3, "empathy": 3}'::jsonb,
background TEXT,
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, bank_id),
FOREIGN KEY(host_id) REFERENCES host_t(host_id) ON DELETE CASCADE,
FOREIGN KEY(host_id, agent_def_id) REFERENCES agent_definition_t(host_id, agent_def_id) ON DELETE CASCADE,
FOREIGN KEY(user_id) REFERENCES user_t(user_id) ON DELETE CASCADE
);
-- Source documents for memory units
CREATE TABLE agent_memory_doc_t (
host_id UUID NOT NULL,
doc_id UUID NOT NULL,
bank_id UUID NOT NULL,
original_text TEXT,
content_hash TEXT,
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY (host_id, bank_id, doc_id),
FOREIGN KEY (host_id, bank_id) REFERENCES agent_memory_bank_t(host_id, bank_id) ON DELETE CASCADE
);
-- Individual sentence-level memories (The "Atoms" of thought)
CREATE TABLE agent_memory_unit_t (
host_id UUID NOT NULL,
unit_id UUID NOT NULL,
bank_id UUID NOT NULL,
doc_id UUID,
content TEXT NOT NULL,
embedding vector(384),
context TEXT,
event_date TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
occurred_start TIMESTAMP WITH TIME ZONE,
occurred_end TIMESTAMP WITH TIME ZONE,
mentioned_at TIMESTAMP WITH TIME ZONE,
fact_type VARCHAR(32) NOT NULL DEFAULT 'world' CHECK (fact_type IN ('world', 'experience', 'opinion', 'observation', 'mental_model')),
metadata JSONB DEFAULT '{}'::jsonb,
proof_count INT DEFAULT 1,
source_memory_ids UUID[] DEFAULT ARRAY[]::UUID[],
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, bank_id, unit_id),
FOREIGN KEY(host_id, bank_id) REFERENCES agent_memory_bank_t(host_id, bank_id) ON DELETE CASCADE,
FOREIGN KEY(host_id, bank_id, doc_id) REFERENCES agent_memory_doc_t(host_id, bank_id, doc_id) ON DELETE CASCADE
);
CREATE INDEX idx_mem_unit_bank ON agent_memory_unit_t(bank_id);
CREATE INDEX idx_mem_unit_embedding ON agent_memory_unit_t USING hnsw (embedding vector_cosine_ops);
-- Resolved entities (Knowledge Graph Nodes)
CREATE TABLE agent_memory_entity_t (
host_id UUID NOT NULL,
entity_id UUID NOT NULL,
bank_id UUID NOT NULL,
user_id UUID, -- Link to user_t if this entity is a platform user
canonical_name TEXT NOT NULL,
mention_count INT DEFAULT 1,
metadata JSONB DEFAULT '{}'::jsonb,
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY (host_id, bank_id, entity_id),
FOREIGN KEY (host_id, bank_id) REFERENCES agent_memory_bank_t(host_id, bank_id) ON DELETE CASCADE,
FOREIGN KEY (user_id) REFERENCES user_t(user_id) ON DELETE CASCADE
);
-- Association between memory units and entities
CREATE TABLE agent_memory_unit_entity_t (
host_id UUID NOT NULL,
bank_id UUID NOT NULL,
unit_id UUID NOT NULL,
entity_id UUID NOT NULL,
PRIMARY KEY (host_id, bank_id, unit_id, entity_id),
FOREIGN KEY (host_id, bank_id, unit_id) REFERENCES agent_memory_unit_t(host_id, bank_id, unit_id) ON DELETE CASCADE,
FOREIGN KEY (host_id, bank_id, entity_id) REFERENCES agent_memory_entity_t(host_id, bank_id, entity_id) ON DELETE CASCADE
);
-- Cache of entity co-occurrences (Concept Relationship Graph)
CREATE TABLE agent_memory_entity_cooccur_t (
host_id UUID NOT NULL,
bank_id UUID NOT NULL,
entity_id_1 UUID NOT NULL,
entity_id_2 UUID NOT NULL,
cooccur_count INT DEFAULT 1,
last_cooccurred TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY (host_id, bank_id, entity_id_1, entity_id_2),
CONSTRAINT entity_cooccur_order_check CHECK (entity_id_1 < entity_id_2),
FOREIGN KEY (host_id, bank_id, entity_id_1) REFERENCES agent_memory_entity_t(host_id, bank_id, entity_id) ON DELETE CASCADE,
FOREIGN KEY (host_id, bank_id, entity_id_2) REFERENCES agent_memory_entity_t(host_id, bank_id, entity_id) ON DELETE CASCADE
);
CREATE INDEX idx_mem_cooccur_e1 ON agent_memory_entity_cooccur_t(host_id, entity_id_1);
CREATE INDEX idx_mem_cooccur_e2 ON agent_memory_entity_cooccur_t(host_id, entity_id_2);
-- Links between memory units (Semantic & Causal relationships)
CREATE TABLE agent_memory_link_t (
host_id UUID NOT NULL,
bank_id UUID NOT NULL,
from_unit_id UUID NOT NULL,
to_unit_id UUID NOT NULL,
link_type VARCHAR(32) NOT NULL,
weight FLOAT NOT NULL DEFAULT 1.0,
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY (host_id, bank_id, from_unit_id, to_unit_id, link_type),
CONSTRAINT memory_links_type_check CHECK (link_type IN ('temporal', 'semantic', 'entity', 'causes', 'caused_by', 'enables', 'prevents')),
FOREIGN KEY (host_id, bank_id, from_unit_id) REFERENCES agent_memory_unit_t(host_id, bank_id, unit_id) ON DELETE CASCADE,
FOREIGN KEY (host_id, bank_id, to_unit_id) REFERENCES agent_memory_unit_t(host_id, bank_id, unit_id) ON DELETE CASCADE
);
-- Directives (Hard rules that override probabilistic learning)
CREATE TABLE agent_memory_directive_t (
host_id UUID NOT NULL,
directive_id UUID NOT NULL,
bank_id UUID NOT NULL,
name VARCHAR(256) NOT NULL,
content TEXT NOT NULL,
priority INT NOT NULL DEFAULT 0,
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, bank_id, directive_id),
FOREIGN KEY(host_id, bank_id) REFERENCES agent_memory_bank_t(host_id, bank_id) ON DELETE CASCADE
);
-- Reflections (Synthesized knowledge and high-level observations)
CREATE TABLE agent_memory_reflection_t (
host_id UUID NOT NULL,
reflection_id UUID NOT NULL,
bank_id UUID NOT NULL,
content TEXT NOT NULL,
embedding vector(384),
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, bank_id, reflection_id),
FOREIGN KEY(host_id, bank_id) REFERENCES agent_memory_bank_t(host_id, bank_id) ON DELETE CASCADE
);
CREATE INDEX idx_mem_reflection_embedding ON agent_memory_reflection_t USING hnsw (embedding vector_cosine_ops);
-- Raw Session History (The source of Truth for active conversations)
CREATE TABLE agent_session_history_t (
host_id UUID NOT NULL,
session_id UUID NOT NULL,
bank_id UUID NOT NULL, -- Links the session to a Hindsight bank
messages JSONB NOT NULL DEFAULT '[]'::jsonb,
metadata JSONB DEFAULT '{}'::jsonb,
aggregate_version BIGINT DEFAULT 1 NOT NULL,
active BOOLEAN DEFAULT true,
update_ts TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
update_user VARCHAR(126) DEFAULT SESSION_USER,
PRIMARY KEY(host_id, bank_id, session_id),
FOREIGN KEY(host_id, bank_id) REFERENCES agent_memory_bank_t(host_id, bank_id) ON DELETE CASCADE
);
CREATE INDEX idx_session_bank ON agent_session_history_t(host_id, bank_id);
```
Light-Deployer Design
light-deployer is the cluster-local Kubernetes deployment executor in
Light Fabric.
This document focuses only on the deployer service that lives in
apps/light-deployer. The broader Light Portal deployment workflow, approval
flow, deployment history model, controller routing, and portal UI are covered
outside this repository.
Purpose
light-deployer receives a deployment command, fetches Kubernetes templates,
renders them with deployment values, validates the resulting resources, applies
or deletes resources in the target Kubernetes cluster, and returns safe status
details.
It is intentionally narrow. It does not decide whether a user is allowed to deploy an instance, does not own portal deployment history, and does not create tenant business workflows. Those decisions belong to Light Portal, Light Controller, and the workflow engine.
Service Boundary
light-deployer owns:
- local deployment policy enforcement
- template repository fetch
- YAML template rendering
- manifest parsing and resource summary generation
- Kubernetes dry-run, apply, delete, status, and pruning
- safe event and error reporting
- direct local/MicroK8s deployment endpoints
light-deployer does not own:
- tenant authorization
- instance metadata
- deployment approval
- deployment history persistence
- config snapshot creation
- long-running human workflow decisions
The deployer should reject commands outside its local policy even if an upstream service sends them.
Runtime Model
The service follows the same runtime pattern as light-agent.
main.rs builds the domain service and starts it through:
```rust
LightRuntimeBuilder::new(AxumTransport::new(app))
```
The HTTP listener is owned by light-runtime and light-axum, not by
service-specific socket code. Bind address, HTTP/HTTPS ports, service identity,
and registry settings live in runtime config files.
Default config files:
- `config/server.yml`
- `config/deployer.yml`
- `config/portal-registry.yml`
Local cargo run resolves config from apps/light-deployer/config when run
from the workspace root. The container image runs from /app and uses
/app/config.
Public Endpoints
Phase 1 exposes a direct HTTP surface for local and MicroK8s testing:
```
GET /health
GET /ready
POST /mcp
GET /mcp/tools
GET /mcp/tools/list
GET /mcp/tools/{tool}
POST /deployments
POST /mcp/tools/{tool}
GET /events?request_id=...
```
POST /mcp is the MCP JSON-RPC 2.0 endpoint. It supports tools/list,
tools/call, and a minimal initialize response. This is the endpoint that
MCP clients, Light Portal, and AI agents should use.
/deployments accepts the canonical deployment request directly.
/mcp/tools/{tool} maps tool names onto the same internal service functions as
a REST-style local debugging convenience. The convenience tool-list endpoints
return metadata with name, description, inputSchema, endpoint, and
method, but they are not the MCP protocol endpoint.
Supported tool names:
- `deployment.render`
- `deployment.dryRun`
- `deployment.diff`
- `deployment.apply`
- `deployment.delete`
- `deployment.status`
- `deployment.rollback`
The direct HTTP mode is useful for development and managed environments. The same internal command handling should later be reused by controller-mediated WebSocket/MCP routing.
Request Model
A deployment request is explicit and auditable.
```json
{
"requestId": "01964b05-0000-7000-8000-000000000001",
"hostId": "01964b05-552a-7c4b-9184-6857e7f3dc5f",
"instanceId": "petstore-dev",
"environment": "dev",
"clusterId": "microk8s-local",
"namespace": "petstore-dev",
"action": "deploy",
"values": {
"name": "petstore",
"image": {
"repository": "networknt/openapi-petstore",
"tag": "latest"
}
},
"template": {
"repoUrl": "https://github.com/networknt/openapi-petstore.git",
"ref": "master",
"path": "k8s"
},
"options": {
"dryRun": false,
"waitForRollout": true,
"timeoutSeconds": 300,
"pruneOverride": false
}
}
```
The current implementation supports inline values. The request model also
contains fields for future values references and immutable snapshot metadata so
it can align with the full portal deployment workflow.
When invoking a specific /mcp/tools/{tool} endpoint, callers do not need to
send action. The deployer derives the action from the tool name. The generic
/deployments endpoint still expects an explicit action in the request body.
For the MCP endpoint, callers use JSON-RPC:
```json
{
"jsonrpc": "2.0",
"id": "tools-list-1",
"method": "tools/list",
"params": {}
}
```
Tool invocation uses tools/call:
```json
{
"jsonrpc": "2.0",
"id": "render-1",
"method": "tools/call",
"params": {
"name": "deployment.render",
"arguments": {
"hostId": "local-host",
"instanceId": "petstore-dev",
"environment": "dev",
"clusterId": "local",
"namespace": "light-deployer",
"values": {},
"template": {
"repoUrl": "local",
"ref": "main",
"path": "k8s"
}
}
}
}
```
tools/call derives the deployment action from params.name; callers should
not provide an action field in arguments.
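For local testing, a small client-side sketch that sends the `tools/list` request above to `POST /mcp` (the base URL and port are assumptions; assumes `reqwest` with the `json` feature, `tokio`, and `anyhow`):

```rust
use serde_json::{json, Value};

// Post a tools/list request to the MCP endpoint and print the result.
#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = reqwest::Client::new();
    let body = json!({
        "jsonrpc": "2.0",
        "id": "tools-list-1",
        "method": "tools/list",
        "params": {}
    });

    let resp: Value = client
        .post("http://localhost:8080/mcp") // port is an assumption for local runs
        .json(&body)
        .send()
        .await?
        .error_for_status()?
        .json()
        .await?;

    println!("{}", serde_json::to_string_pretty(&resp)?);
    Ok(())
}
```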
Actions
render
: Fetch templates, render manifests, add namespaces and management labels, and
return resource summaries plus a manifest hash.
dryRun
: Render manifests and validate them against Kubernetes using server-side
dry-run.
diff
: Render manifests, fetch current managed resources, calculate additions,
modifications, and pruned resources, and return a redacted diff summary.
deploy
: Accept the request, run the deployment in the background, apply manifests,
prune removed managed resources, and stream events.
undeploy
: Delete resources associated with the deployment.
status
: Return current managed resource status.
rollback
: Reserved for redeploying a previous immutable portal snapshot. Native
Kubernetes rollout undo is not the target rollback model because it does not
restore ConfigMaps, Secrets, or values snapshots.
Template Fetching
Templates are loaded through the TemplateSource trait.
The current source supports two modes:
- local template root through `LIGHT_DEPLOYER_TEMPLATE_BASE_DIR`
- remote HTTPS Git clone through `gix`
For remote repositories, the deployment request provides:
```json
{
"template": {
"repoUrl": "https://github.com/networknt/openapi-petstore.git",
"ref": "master",
"path": "k8s"
}
}
```
Private HTTPS Git access is controlled by environment variables:
- `LIGHT_DEPLOYER_GIT_TOKEN`: token or app password
- `LIGHT_DEPLOYER_GIT_USERNAME`: optional username override
Defaults:
- GitHub uses `x-access-token`
- Bitbucket Cloud uses `x-token-auth`
SSH authentication is intentionally deferred because it requires private key
handling and strict known_hosts validation.
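As orientation, a hedged sketch of the shape such an abstraction could take (the trait name comes from this document; the method names, types, and local-mode behavior are assumptions):

```rust
use std::path::PathBuf;

// Where the templates for one deployment request come from.
// Fields mirror the request's "template" object.
struct TemplateRef {
    repo_url: String,
    git_ref: String,
    path: String,
}

// Fetched templates plus the commit that produced them (for templateCommitSha).
struct FetchedTemplates {
    root: PathBuf,
    commit_sha: Option<String>,
}

#[async_trait::async_trait]
trait TemplateSource {
    async fn fetch(&self, template: &TemplateRef) -> anyhow::Result<FetchedTemplates>;
}

// A local source reads from LIGHT_DEPLOYER_TEMPLATE_BASE_DIR; a git source
// would clone over HTTPS via gix and check out `git_ref`.
struct LocalTemplateSource {
    base_dir: PathBuf,
}

#[async_trait::async_trait]
impl TemplateSource for LocalTemplateSource {
    async fn fetch(&self, template: &TemplateRef) -> anyhow::Result<FetchedTemplates> {
        Ok(FetchedTemplates {
            root: self.base_dir.join(&template.path),
            commit_sha: None, // local mode has no commit
        })
    }
}
```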
Template Format
The built-in renderer uses simple placeholders:
```yaml
image: ${image.repository}:${image.tag:latest}
```
Supported behavior:
- nested paths such as `image.repository`
- default values after `:`
- render failure when a required value is missing
- placeholder replacement only inside YAML string scalar values
The renderer parses YAML into serde_yaml::Value, traverses the AST, replaces
placeholders, and serializes or applies structured YAML values afterward. This
avoids the most common raw string replacement bugs around quoting,
indentation, certificates, and multi-line values.
Because placeholders currently produce strings, templates should avoid
placeholders in numeric-only Kubernetes fields unless Kubernetes accepts a
string value there. For example, containerPort should be fixed or rendered by
a future typed placeholder extension.
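A condensed sketch of this rendering pass (assuming `serde_yaml` 0.9; helper names are illustrative, not the deployer's actual code):

```rust
use serde_yaml::Value;

// Walk the YAML AST and rewrite string scalars only.
fn render_value(node: &mut Value, values: &Value) -> Result<(), String> {
    match node {
        Value::String(s) => *s = render_placeholders(s, values)?,
        Value::Sequence(seq) => {
            for item in seq {
                render_value(item, values)?;
            }
        }
        Value::Mapping(map) => {
            for (_, v) in map.iter_mut() {
                render_value(v, values)?;
            }
        }
        _ => {} // numbers, booleans, and nulls are left untouched
    }
    Ok(())
}

// Replace each ${path} or ${path:default} occurrence inside one string scalar.
fn render_placeholders(s: &str, values: &Value) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = s;
    while let Some(start) = rest.find("${") {
        let end = rest[start..].find('}').ok_or("unterminated placeholder")? + start;
        out.push_str(&rest[..start]);
        let inner = &rest[start + 2..end];
        let (path, default) = match inner.split_once(':') {
            Some((p, d)) => (p, Some(d)),
            None => (inner, None),
        };
        match lookup(values, path).or_else(|| default.map(str::to_string)) {
            Some(v) => out.push_str(&v),
            None => return Err(format!("missing required value: {path}")),
        }
        rest = &rest[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

// Resolve a dotted path such as "image.repository" against the values mapping.
fn lookup(values: &Value, path: &str) -> Option<String> {
    let mut cur = values;
    for seg in path.split('.') {
        cur = cur.get(seg)?;
    }
    cur.as_str().map(str::to_string)
}
```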
Resource Metadata
After rendering, the deployer ensures every resource has the target namespace and adds management labels:
- app.kubernetes.io/managed-by=light-deployer
- lightapi.net/host-id
- lightapi.net/instance-id
- lightapi.net/request-id
These labels are used for status lookup and pruning.
Kubernetes Execution
Kubernetes execution is behind the KubeExecutor trait.
Current implementations:
- KubeRsExecutor: real Kubernetes API execution through kube-rs
- NoopKubeExecutor: local render/test mode
Execution mode:
- LIGHT_DEPLOYER_KUBE_MODE=real: force real Kubernetes mode
- LIGHT_DEPLOYER_KUBE_MODE=noop: force no-op mode
- default: real mode when KUBERNETES_SERVICE_HOST is present, otherwise no-op
The production path uses kube-rs, not kubectl.
Kubernetes operations should use:
- in-cluster ServiceAccount auth when running as a pod
- server-side dry-run for validation
- server-side apply with the field manager light-deployer
- structured status and error handling
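The mode-selection rule above is small enough to show directly. A sketch using plain environment lookups:
use std::env;

// Default rule: real mode inside a cluster, no-op mode elsewhere,
// unless LIGHT_DEPLOYER_KUBE_MODE forces a choice.
fn use_real_kube() -> bool {
    match env::var("LIGHT_DEPLOYER_KUBE_MODE").as_deref() {
        Ok("real") => true,
        Ok("noop") => false,
        _ => env::var("KUBERNETES_SERVICE_HOST").is_ok(),
    }
}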
Pruning
The deployer is declarative. If a previously managed resource is no longer rendered from the template, it should be considered for pruning.
Pruning is calculated by comparing:
- current resources in the namespace carrying the lightapi.net/instance-id label
- resources rendered from the new template
The policy layer enforces blast-radius protection:
- maximum delete percentage
- sensitive kinds requiring override
- explicit pruneOverride in deployment options
This prevents stale resources while still protecting against accidental large-scale deletion.
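A sketch of the prune calculation and the percentage guard; the ResourceId key and the function shape are illustrative, and the real limits come from the deployer.yml policy:
use std::collections::HashSet;

#[derive(Hash, PartialEq, Eq, Clone)]
struct ResourceId {
    kind: String,
    namespace: String,
    name: String,
}

fn plan_prune(
    current: &HashSet<ResourceId>,  // labeled with lightapi.net/instance-id
    rendered: &HashSet<ResourceId>, // rendered from the new template
    max_delete_percent: usize,
) -> Result<Vec<ResourceId>, String> {
    // Anything previously managed but no longer rendered is a prune candidate.
    let to_prune: Vec<ResourceId> = current.difference(rendered).cloned().collect();
    if current.is_empty() {
        return Ok(to_prune);
    }
    // Blast-radius protection: refuse large-scale deletion without override.
    let percent = to_prune.len() * 100 / current.len();
    if percent > max_delete_percent {
        return Err(format!(
            "prune would delete {percent}% of managed resources, above the {max_delete_percent}% limit"
        ));
    }
    Ok(to_prune)
}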
Policy
The local deployer.yml policy constrains what a deployer is allowed to do.
Policy dimensions:
- allowed namespaces
- allowed repository hosts
- allowed repository URL prefixes
- allowed image registries
- allowed actions
- allowed Kubernetes kinds
- blocked Kubernetes kinds
- prune settings
- development insecure mode
Version 1 allows application-level resource kinds by default:
- Deployment
- Service
- Ingress
- ConfigMap
- Secret
Cluster-scoped and control-plane resources are blocked by default:
- Namespace
- ClusterRole
- ClusterRoleBinding
- CustomResourceDefinition
- admission webhooks
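A sketch of how a kind-level check can combine the two lists; the function is illustrative, not the crate's actual API. The block list always wins, and an empty allow list leaves the dimension unrestricted:
fn kind_allowed(kind: &str, allowed: &[String], blocked: &[String]) -> bool {
    if blocked.iter().any(|k| k == kind) {
        return false; // blocked kinds are rejected unconditionally
    }
    // An empty allow list means the dimension is not restricted.
    allowed.is_empty() || allowed.iter().any(|k| k == kind)
}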
Security
The deployer can mutate a Kubernetes cluster, so its default posture must be conservative.
Required practices:
- run in Kubernetes with a dedicated ServiceAccount
- prefer namespace-scoped Role and RoleBinding
- restrict allowed namespaces and resource kinds
- restrict template repository hosts or prefixes in production
- restrict image registries in production
- never log raw rendered Secret manifests
- never log raw Kubernetes patch/apply payloads containing Secret data
- return redacted summaries and diffs
Secret values in rendered manifests are redacted before being included in responses or diffs. Kubernetes Secret values are base64 encoded, not encrypted, so they must be treated as plaintext for logging purposes.
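A sketch of that redaction step over a rendered manifest, using serde_yaml like the renderer; the exact redaction marker is an assumption:
use serde_yaml::Value;

// Blank out data and stringData values on Secret manifests before they
// are summarized, diffed, or logged.
fn redact_secret(manifest: &mut Value) {
    let is_secret = manifest
        .get("kind")
        .and_then(Value::as_str)
        .map_or(false, |k| k == "Secret");
    if !is_secret {
        return;
    }
    for section in ["data", "stringData"] {
        if let Some(Value::Mapping(map)) = manifest.get_mut(section) {
            for (_, v) in map.iter_mut() {
                *v = Value::String("REDACTED".to_string());
            }
        }
    }
}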
Response Model
Responses include enough detail for callers to understand what happened without exposing secrets.
Important fields:
- requestId
- action
- status
- deployerId
- clusterId
- namespace
- manifestHash
- templateCommitSha
- resources
- diff
- events
- error
Resource summaries contain kind, namespace, name, apiVersion, and action. Full rendered manifests should not be returned or persisted by default.
Event Model
Long-running operations return quickly and continue in the background.
Clients can subscribe to:
GET /events?request_id=...
Events contain:
- request ID
- timestamp
- status
- message
- optional resource identity
The event stream is currently direct SSE. Controller-mediated mode can forward the same event shape later.
Installation
The app includes Kubernetes install manifests under apps/light-deployer/k8s:
- namespace
- RBAC
- deployment
- service
The deployment runs the container with LIGHT_DEPLOYER_KUBE_MODE=real. The
image contains /app/config, and server.yml defaults the HTTP port to 7088.
For MicroK8s testing:
./apps/light-deployer/build.sh latest
docker save networknt/light-deployer:latest | microk8s ctr image import -
microk8s kubectl apply -f apps/light-deployer/k8s/namespace.yaml
microk8s kubectl apply -f apps/light-deployer/k8s/rbac.yaml
microk8s kubectl apply -f apps/light-deployer/k8s/deployment.yaml
microk8s kubectl apply -f apps/light-deployer/k8s/service.yaml
Current Limitations
- Direct HTTP/MCP-style mode is implemented first; controller-mediated WebSocket routing is a later integration step.
- Inline values are implemented; config-server valuesRef fetching is still a future integration point.
- Rollback is represented in the model but needs portal snapshot integration.
- Helm and Kustomize are not implemented yet.
- Typed placeholders are not implemented yet.
- Rollout watch depth is intentionally basic in the first phase.
Design Direction
Keep light-deployer small and cluster-local.
The deployer should execute precise deployment commands, enforce local safety policy, and report structured results. It should not grow into a portal, workflow engine, or deployment database. That separation keeps the service easy to install inside customer clusters and reduces the security blast radius.
Asymmetric Decryptor
asymmetric-decryptor decrypts RSA encrypted configuration values.
It is used by config-loader when a service loads encrypted values that use
the CRYPT:RSA: prefix. The crate supports RSA private keys in PKCS#8 and
PKCS#1 PEM formats and decrypts payloads with RSA-OAEP using SHA-256.
Main Types
- AsymmetricDecryptor: owns the RSA private key and decrypts supported payloads.
- AsymmetricError: error type for prefix, base64, key, and decrypt failures.
- CRYPT_RSA_PREFIX: the required CRYPT:RSA: payload prefix.
Usage
use asymmetric_decryptor::AsymmetricDecryptor;

let decryptor = AsymmetricDecryptor::from_pem(private_key_pem)?;
let plaintext = decryptor.decrypt("CRYPT:RSA:...")?;
Notes
This crate is intentionally small. It does not fetch keys, rotate keys, or
perform configuration merging. Those concerns belong to config-loader and the
runtime layer.
Config Loader
config-loader loads, merges, resolves, and decrypts service configuration.
It provides the common configuration behavior used by fabric services and runtime modules. Configuration can be loaded from YAML, JSON, or TOML files, merged across layers, expanded from values maps, and decrypted when encrypted values are present.
Main Types
- ConfigLoader: loads files and resolves ${key:default} style values.
- ConfigManager<T>: stores hot-swappable typed configuration behind an atomic reference.
- ConfigError: shared error type for IO, parse, decrypt, and conversion failures.
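The hot-swap behavior can be pictured with a small stand-in. This sketch uses RwLock<Arc<T>> for simplicity where the crate describes an atomic reference, and the type and method names are illustrative:
use std::sync::{Arc, RwLock};

pub struct ConfigHolder<T> {
    current: RwLock<Arc<T>>,
}

impl<T> ConfigHolder<T> {
    pub fn new(config: T) -> Self {
        Self { current: RwLock::new(Arc::new(config)) }
    }

    // Readers take a cheap Arc clone; they keep a stable snapshot even
    // if the config is swapped while they hold it.
    pub fn get(&self) -> Arc<T> {
        self.current.read().unwrap().clone()
    }

    // A reload path swaps in new config without restarting the service.
    pub fn swap(&self, config: T) {
        *self.current.write().unwrap() = Arc::new(config);
    }
}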
Resolution Model
The loader supports:
- merging multiple config files in order
- external overlays through LIGHT_RS_CONFIG_DIR
- whole-value variable replacement
- embedded variable expansion inside strings
- typed deserialization through Serde
- symmetric encrypted values through symmetric-decryptor
- asymmetric encrypted values through asymmetric-decryptor
Usage
use config_loader::ConfigLoader;
use std::collections::HashMap;

let loader = ConfigLoader::from_values(HashMap::new(), None, None)?;
let config: MyConfig = loader.load_typed(["config/my-service.yml"])?;
Consumers
light-runtime uses this crate for service bootstrap and runtime config.
Application crates can also use it for app-specific policy or domain config.
Hindsight Client
hindsight-client provides a small client abstraction for persistent agent
memory.
It stores and recalls memory units from PostgreSQL. The current implementation
uses sqlx and pgvector for vector similarity search.
Main Types
- HindsightMemory: trait used by applications that need memory retention and recall without coupling to a specific database implementation.
- PgHindsightClient: PostgreSQL-backed implementation of HindsightMemory.
- MemoryUnit: returned memory record with content, type, metadata, and bank identity.
Usage
use hindsight_client::{HindsightMemory, PgHindsightClient};

let memory = PgHindsightClient::new(pool);
let unit_id = memory
    .retain(host_id, bank_id, "User prefers concise answers", "fact", None, metadata)
    .await?;
Data Model
The PostgreSQL implementation writes to agent_memory_unit_t and uses
host_id plus bank_id to isolate memory between tenants, users, or sessions.
Consumers
light-agent uses this crate to persist and recall agent conversation memory.
Light Rule
light-rule is the Rust rule engine for evaluating rule definitions and
executing registered actions.
It is designed to align with the rule.yaml specification while remaining
runtime-neutral. Java services can use yaml-rule; Rust services use this
crate.
Main Types
- RuleEngine: evaluates rule conditions and determines action execution.
- MultiThreadRuleExecutor: executes rules with runtime state.
- RuntimeState: input/output state passed through rule evaluation.
- ActionRegistry: registry for action plugins.
- RuleActionPlugin: trait implemented by Rust action handlers.
- Rule, RuleCondition, RuleAction, RuleConfig, EndpointConfig: rule model types.
Action Model
Rules reference actions by actionRef. In Rust, actionRef resolves to a
registered RuleActionPlugin; it is not a Java class name. This keeps the rule
format portable across Java and Rust executors.
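A sketch of what a registered handler can look like; the trait below is a stand-in for the real RuleActionPlugin signature, which may differ:
use std::collections::HashMap;

// Stand-in trait: the point is that actionRef resolves to a Rust
// handler registered in the ActionRegistry, not to a Java class name.
pub trait ActionHandler {
    fn perform(&self, state: &mut HashMap<String, String>) -> Result<(), String>;
}

// Example handler that masks a sensitive field in the runtime state.
struct MaskAccountNumber;

impl ActionHandler for MaskAccountNumber {
    fn perform(&self, state: &mut HashMap<String, String>) -> Result<(), String> {
        if let Some(v) = state.get_mut("accountNumber") {
            *v = "****".to_string();
        }
        Ok(())
    }
}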
Usage
use light_rule::{ActionRegistry, RuleEngine};

let registry = ActionRegistry::default();
let engine = RuleEngine::new(registry);
Related Design
See Light-Rule for the rule format and its relationship to workflow assertions and portal rule management.
Light Runtime
light-runtime is the shared service runtime for Light Fabric applications.
It owns bootstrap, configuration loading, transport startup, graceful shutdown,
and optional portal registry registration. Apps such as light-agent and
light-deployer should start through this crate instead of binding sockets
directly.
Main Types
- LightRuntimeBuilder: builds a runtime from a transport.
- LightRuntime: configured runtime before start.
- RunningRuntime: running service handle with shutdown support.
- Module: lifecycle hook abstraction.
- RuntimeConfig: resolved runtime configuration.
- ServerConfig: HTTP/HTTPS bind and service identity settings.
- BootstrapConfig: remote config bootstrap settings.
- PortalRegistryConfig: portal registry connection settings.
Startup Pattern
use light_axum::AxumTransport;
use light_runtime::LightRuntimeBuilder;

let runtime = LightRuntimeBuilder::new(AxumTransport::new(app))
    .with_config_dir("config")
    .build();
let running = runtime.start().await?;
running.shutdown().await?;
Configuration
At minimum, runtime services need server.yml. Optional files include
startup.yml, client.yml, and portal-registry.yml.
Related Frameworks
light-runtime is transport-neutral. light-axum supplies the Axum transport
implementation.
MCP Client
mcp-client is a client for calling MCP-compatible gateway endpoints.
It provides a small API for listing and invoking tools through a configured MCP gateway path. It is intentionally focused on the client side; MCP server implementations live in applications or framework layers.
Main Types
- McpGatewayClient: gateway client used by applications.
- McpTool: tool metadata returned by the gateway.
- McpContent: content item returned by MCP tool calls.
- McpToolCallResult: structured result for a tool invocation.
Usage
use mcp_client::McpGatewayClient;

let client = McpGatewayClient::new(gateway_url, path, timeout_ms);
let result = client.call_tool("tool.name", arguments).await?;
Consumers
light-agent uses this crate when an agent session needs to discover or invoke
tools exposed through an MCP gateway.
Model Provider
model-provider defines a common abstraction over LLM providers and implements
multiple provider adapters.
The goal is to let agent and workflow code depend on one Provider trait while
supporting local models, hosted APIs, and provider-specific features.
Main Types
- Provider: async trait implemented by model providers.
- ChatRequest, ChatResponse, ChatMessage: common chat data model.
- ToolSpec, ToolCall: tool-calling model.
- ProviderCapabilities: capability metadata.
- TokenUsage: usage accounting.
- ReliableProvider: reliability wrapper.
- RouterProvider: route requests across multiple providers.
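Because callers depend only on the trait, the concrete backend stays a configuration detail. A stand-in, synchronous sketch of that pattern; the crate's actual Provider trait is async and its signatures differ:
// Stand-in types to illustrate the dependency-inversion pattern; the
// real model-provider data model is richer than this sketch.
pub struct ChatRequest {
    pub prompt: String,
}

pub struct ChatResponse {
    pub text: String,
}

pub trait Provider {
    fn chat(&self, request: ChatRequest) -> Result<ChatResponse, String>;
}

// Agent code written against the trait works with any adapter
// (Ollama, OpenAI, Anthropic, ...) without hard-coding a backend.
fn ask(provider: &dyn Provider, prompt: &str) -> Result<String, String> {
    let response = provider.chat(ChatRequest { prompt: prompt.to_string() })?;
    Ok(response.text)
}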
Provider Implementations
Current modules include:
- Anthropic
- Azure OpenAI
- Bedrock
- Claude Code
- Codex
- OpenAI-compatible providers
- Copilot
- Gemini
- Gemini CLI
- GLM
- Kilo Code CLI
- Ollama
- OpenAI
- OpenRouter
- Telnyx
Consumers
light-agent uses this crate to send chat requests and tool specs without
hard-coding a single LLM provider.
Portal Registry
portal-registry provides client support for registering services with Light
Portal or Light Controller.
It uses a JSON-RPC style WebSocket protocol for service registration, metadata
updates, and skill search. Runtime services normally use this through
light-runtime, but applications can also use the client directly when they
need custom registry behavior.
Main Types
- PortalRegistryClient: WebSocket client for registry communication.
- RegistryHandler: trait for handling registry callbacks and messages.
- RegistrationState: client registration state.
- RegistrationBuilder: helper for constructing registration parameters.
- ServiceRegistrationParams: service identity and advertised endpoint.
- ServiceMetadataUpdate: metadata update payload.
- SkillSearchRequest, SkillSearchResponse: skill discovery messages.
Usage
use portal_registry::RegistrationBuilder;

let registration = RegistrationBuilder::new(
    "com.networknt.service-1.0.0",
    "1.0.0",
    "http",
    "127.0.0.1",
    8080,
)
.with_env("DEV")
.with_jwt(token)
.build();
Runtime Integration
light-runtime can register a service automatically when server.yml enables
registry support and portal-registry.yml supplies the portal connection.
Symmetric Decryptor
symmetric-decryptor decrypts legacy symmetric encrypted configuration values.
It supports payloads with the CRYPT prefix and decrypts AES-256-CBC data with
a key derived from the configured password using PBKDF2-HMAC-SHA256.
Main Types
- Decryptor: trait implemented by decryptors.
- SymmetricDecryptor: password-based decryptor.
- DecryptError: error type for prefix, format, hex, and cipher failures.
- CRYPT_PREFIX: the required CRYPT payload prefix.
Usage
use symmetric_decryptor::{Decryptor, SymmetricDecryptor};

let decryptor = SymmetricDecryptor::new("password");
let plaintext = decryptor.decrypt("CRYPT:...")?;
Consumers
config-loader uses this crate when it encounters symmetric encrypted values
and a config password is available.
Workflow Builder
workflow-builder provides fluent builders for creating Agentic Workflow
definitions programmatically.
It depends on workflow-core for the actual model types and layers a builder
API on top so applications and tests can construct valid workflows without
manually assembling nested maps.
Main Areas
- workflow metadata construction
- authentication definitions
- task definitions
- nested do, for, fork, try, and other task structures
- YAML/JSON serialization through workflow-core model types
Usage
use workflow_builder::services::workflow::WorkflowBuilder;

let workflow = WorkflowBuilder::new()
    .use_dsl("1.0.0")
    .with_namespace("lightapi")
    .with_name("example")
    .with_version("1.0.0")
    .build();
Relationship To Workflow Core
Use workflow-core when you need direct access to the schema model. Use
workflow-builder when you want an ergonomic construction API.
Workflow Core
workflow-core contains the Rust model for the Agentic Workflow DSL.
The crate is schema-oriented: its structs and enums represent workflow documents, tasks, authentication blocks, durations, timeouts, errors, and supporting map types.
Main Areas
- workflow document metadata
- task definitions
- call task protocol definitions
- ask and assert task definitions
- duration and timeout models
- error definitions
- ordered map support for workflow task lists
Usage
use workflow_core::models::workflow::{
    WorkflowDefinition, WorkflowDefinitionMetadata,
};

let document = WorkflowDefinitionMetadata::new(
    "lightapi",
    "example",
    "1.0.0",
    Some("Example".to_string()),
    None,
    None,
    None,
);
let workflow = WorkflowDefinition::new(document);
Consumers
workflow-builder builds on this crate. light-workflow and workflow-related
services use the model for loading, validating, and executing workflow
documents.
Light-Axum
light-axum adapts Axum applications to light-runtime.
Applications implement AxumApp and return an axum::Router. The framework
owns binding, optional TLS, runtime metadata resolution, and graceful shutdown
through the runtime transport contract.
Main Types
- AxumApp: trait implemented by an application.
- AxumTransport: transport passed to LightRuntimeBuilder.
- ServerContext: runtime context passed into the app when building routes.
- AxumBoundHandle: running Axum server handle.
Pattern
use light_axum::{AxumApp, AxumTransport, ServerContext};
use light_runtime::LightRuntimeBuilder;

#[derive(Clone)]
struct App;

impl AxumApp for App {
    fn router(&self, _context: ServerContext) -> axum::Router {
        axum::Router::new()
    }
}

let runtime = LightRuntimeBuilder::new(AxumTransport::new(App))
    .with_config_dir("config")
    .build();
Consumers
light-agent and light-deployer use this framework.
Light-Pingora
light-pingora adapts Pingora proxy services to light-runtime.
It is the framework layer for high-performance gateway and proxy products. The crate keeps runtime concerns such as configuration and service lifecycle separate from Pingora-specific proxy behavior.
Role
- bridge Pingora services into the common runtime lifecycle
- expose transport metadata to
light-runtime - support gateway products without duplicating bootstrap code
Consumers
light-gateway uses this framework.
Light-Agent
light-agent is the interactive agent service in Light Fabric.
It provides a WebSocket chat interface, integrates with model providers,
invokes MCP tools through mcp-client, and stores conversation memory through
hindsight-client.
Key Dependencies
- light-runtime
- light-axum
- model-provider
- mcp-client
- hindsight-client
- portal-registry
Runtime
The app follows the standard runtime pattern:
- load config from config/
- implement an Axum app
- start through LightRuntimeBuilder
- optionally register through portal registry
Light-Deployer
light-deployer is the cluster-local Kubernetes deployment executor for Light
Portal.
It renders Kubernetes templates, validates manifests, applies resources through
kube-rs, reports rollout status, and exposes deployment tools through an MCP
JSON-RPC endpoint for local and MicroK8s testing.
Key Capabilities
- MCP JSON-RPC endpoint at POST /mcp
- AST-based YAML template rendering
- Git template fetching with gix
- Kubernetes dry-run, apply, delete, status, and prune
- redacted manifest summaries and diffs
- SSE deployment events
Runtime
light-deployer uses light-runtime, light-axum, config-loader, and
portal-registry so it follows the same service boot model as light-agent.
Testing Path
Work through these pages in order when testing locally: Build Local, Prepare Config, Run Standalone, then Run Kubernetes.
Start with standalone noop mode to validate template rendering. Then move to
MicroK8s real mode once the render request and target templates are correct.
For MCP clients, Light Portal, and AI agents, use POST /mcp with JSON-RPC
methods such as tools/list and tools/call. The /mcp/tools/* routes are
kept only as local debugging conveniences.
Build Local
This page builds the light-deployer binary and container image from the
Light Fabric workspace.
Run all commands from the repository root:
cd ~/workspace/light-fabric
Rust Build
Use cargo check first for a quick compile validation:
cargo check -p light-deployer
Run the deployer tests:
cargo test -p light-deployer
Build a debug binary:
cargo build -p light-deployer
Build a release binary:
cargo build --release -p light-deployer
The release binary is written to:
target/release/light-deployer
Docker Image
Build the local image:
./apps/light-deployer/build.sh latest
The default image name is:
networknt/light-deployer:latest
To override the image name:
IMAGE=localhost:32000/light-deployer:latest ./apps/light-deployer/build.sh latest
Verify the image exists:
docker image inspect networknt/light-deployer:latest
What The Image Contains
The Dockerfile copies:
- /usr/local/bin/light-deployer (the service binary)
- /app/config (the runtime configuration)
The container runs from /app, so the default runtime config directory is:
/app/config
The default HTTP port is 7088, configured in:
apps/light-deployer/config/server.yml
Expected Result
Before moving on, these commands should pass:
cargo check -p light-deployer
cargo test -p light-deployer
./apps/light-deployer/build.sh latest
docker image inspect networknt/light-deployer:latest
Prepare Config
light-deployer uses two kinds of configuration:
- runtime config loaded by light-runtime
- deployment request data sent through MCP tools/call at POST /mcp
Runtime Config Files
Default config lives in:
apps/light-deployer/config
Files:
- server.yml: HTTP/HTTPS bind settings and service identity
- deployer.yml: local deployer policy
- portal-registry.yml: future portal/controller registry settings
When running from the workspace root, the deployer automatically uses:
apps/light-deployer/config
When running inside the Docker image, it uses:
/app/config
Override the config directory with:
LIGHT_DEPLOYER_CONFIG_DIR=/path/to/config
Server Config
The default server config listens on HTTP port 7088:
ip: ${server.ip:0.0.0.0}
httpPort: ${server.httpPort:7088}
enableHttp: ${server.enableHttp:true}
enableHttps: ${server.enableHttps:false}
serviceId: ${server.serviceId:com.networknt.light-deployer-0.1.0}
enableRegistry: ${server.enableRegistry:false}
To change the port without editing the file, provide values through the normal runtime values mechanism, or use a copied config directory for local testing.
Deployer Policy
The default policy is permissive enough for local testing:
deployerId: ${deployer.deployerId:local-light-deployer}
clusterId: ${deployer.clusterId:local}
allowedNamespaces: []
allowedRepoHosts: []
allowedRepoPrefixes: []
allowedImageRegistries: []
devInsecure: ${deployer.devInsecure:false}
Empty allow lists mean the policy does not restrict that dimension. For production, configure explicit values.
Example tighter policy:
deployerId: petstore-microk8s
clusterId: microk8s-local
allowedNamespaces:
- petstore-dev
allowedRepoHosts:
- github.com
allowedRepoPrefixes:
- https://github.com/networknt/
allowedImageRegistries:
- networknt
devInsecure: false
prune:
enabled: true
maxDeletePercent: 30
sensitiveKinds:
- PersistentVolumeClaim
overrideRequired: true
Git Access
Public repositories do not need credentials.
For private HTTPS repositories, set:
LIGHT_DEPLOYER_GIT_TOKEN=...
Defaults:
- GitHub username: x-access-token
- Bitbucket Cloud username: x-token-auth
For Bitbucket app passwords or other Git servers:
LIGHT_DEPLOYER_GIT_USERNAME=my-user
LIGHT_DEPLOYER_GIT_TOKEN=my-token-or-app-password
Only HTTPS token auth is supported in Phase 1. SSH auth is deferred.
Template Repository Requirements
The target application repository should contain a k8s/ directory with YAML
templates. The deployer reads all .yaml and .yml files under the requested
template path.
Example template reference:
{
"template": {
"repoUrl": "https://github.com/networknt/openapi-petstore.git",
"ref": "master",
"path": "k8s"
}
}
For local testing without Git clone, set:
LIGHT_DEPLOYER_TEMPLATE_BASE_DIR=/home/steve/workspace/openapi-petstore
Then use:
{
"template": {
"repoUrl": "local",
"ref": "master",
"path": "k8s"
}
}
Request Values
The request values object supplies placeholder values for templates.
Example for openapi-petstore:
{
"name": "openapi-petstore",
"image": {
"repository": "networknt/openapi-petstore",
"tag": "latest",
"pullPolicy": "IfNotPresent"
},
"service": {
"name": "openapi-petstore",
"type": "ClusterIP"
},
"resources": {
"requests": {
"memory": "64Mi",
"cpu": "250m"
},
"limits": {
"memory": "256Mi",
"cpu": "500m"
}
}
}
The current renderer replaces placeholders inside YAML string scalar values. Avoid placeholders in Kubernetes fields that must be numeric unless the template keeps those fields as fixed numbers.
Run Standalone
Standalone mode is the fastest way to test light-deployer before using a
real Kubernetes cluster.
Use noop mode first. It validates config, HTTP endpoints, template loading,
rendering, resource summaries, and response shape without mutating Kubernetes.
Run all commands from:
cd /home/steve/workspace/light-fabric
Start With Built-In Sample
Start the deployer with the sample template directory:
LIGHT_DEPLOYER_TEMPLATE_BASE_DIR=apps/light-deployer/examples/petstore \
LIGHT_DEPLOYER_KUBE_MODE=noop \
cargo run -p light-deployer
The service listens on:
http://127.0.0.1:7088
Check health from another terminal:
curl -fsSL http://127.0.0.1:7088/health
Expected output:
ok
List Tools With MCP JSON-RPC
The MCP endpoint is JSON-RPC 2.0 over HTTP at:
POST /mcp
List all deployment tools:
curl -fsSL http://127.0.0.1:7088/mcp \
-H 'content-type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "tools-list-1",
"method": "tools/list",
"params": {}
}'
Call a tool through MCP:
curl -fsSL http://127.0.0.1:7088/mcp \
-H 'content-type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "render-1",
"method": "tools/call",
"params": {
"name": "deployment.render",
"arguments": {
"hostId": "local-host",
"instanceId": "petstore-dev",
"environment": "dev",
"clusterId": "local",
"namespace": "light-deployer",
"values": {
"name": "petstore",
"image": {
"repository": "nginx",
"tag": "1.27"
},
"containerPort": 80
},
"template": {
"repoUrl": "local",
"ref": "main",
"path": "k8s"
}
}
}
}'
For local debugging, the deployer also exposes REST-style convenience endpoints:
curl -fsSL http://127.0.0.1:7088/mcp/tools/list
curl -fsSL http://127.0.0.1:7088/mcp/tools
curl -fsSL http://127.0.0.1:7088/mcp/tools/deployment.render
Use POST /mcp for MCP clients and AI agents.
Render The Built-In Sample
curl -fsSL http://127.0.0.1:7088/mcp \
-H 'content-type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "render-sample-1",
"method": "tools/call",
"params": {
"name": "deployment.render",
"arguments": {
"hostId": "local-host",
"instanceId": "petstore-dev",
"environment": "dev",
"clusterId": "microk8s-local",
"namespace": "light-deployer",
"values": {
"name": "petstore",
"replicas": 1,
"image": {
"repository": "nginx",
"tag": "1.27"
},
"containerPort": 80,
"service": {
"port": 80
}
},
"template": {
"repoUrl": "local",
"ref": "main",
"path": "k8s"
}
}
}
}'
Expected response shape:
{
"jsonrpc": "2.0",
"result": {
"isError": false,
"structuredContent": {
"action": "render",
"status": "rendered",
"deployerId": "local-light-deployer",
"clusterId": "local",
"resources": [
{
"kind": "Deployment",
"name": "petstore"
},
{
"kind": "Service",
"name": "petstore"
}
]
}
}
}
The exact requestId and manifestHash will differ.
Render openapi-petstore Locally
If /home/steve/workspace/openapi-petstore is available and has a k8s/
folder, run:
LIGHT_DEPLOYER_TEMPLATE_BASE_DIR=/home/steve/workspace/openapi-petstore \
LIGHT_DEPLOYER_KUBE_MODE=noop \
cargo run -p light-deployer
Render request:
curl -fsSL http://127.0.0.1:7088/mcp \
-H 'content-type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "render-openapi-petstore-1",
"method": "tools/call",
"params": {
"name": "deployment.render",
"arguments": {
"hostId": "local-host",
"instanceId": "openapi-petstore-dev",
"environment": "dev",
"clusterId": "microk8s-local",
"namespace": "petstore-dev",
"values": {
"name": "openapi-petstore",
"image": {
"repository": "networknt/openapi-petstore",
"tag": "latest",
"pullPolicy": "IfNotPresent"
},
"service": {
"name": "openapi-petstore",
"type": "ClusterIP"
},
"resources": {
"requests": {
"memory": "64Mi",
"cpu": "250m"
},
"limits": {
"memory": "256Mi",
"cpu": "500m"
}
}
},
"template": {
"repoUrl": "local",
"ref": "master",
"path": "k8s"
}
}
}
}'
Expected resources:
- Deployment/openapi-petstore
- Service/openapi-petstore
Test Git Fetch
Stop the local-template run and restart without LIGHT_DEPLOYER_TEMPLATE_BASE_DIR:
LIGHT_DEPLOYER_KUBE_MODE=noop \
cargo run -p light-deployer
Render from GitHub:
curl -fsSL http://127.0.0.1:7088/mcp \
-H 'content-type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "render-git-1",
"method": "tools/call",
"params": {
"name": "deployment.render",
"arguments": {
"hostId": "local-host",
"instanceId": "openapi-petstore-dev",
"environment": "dev",
"clusterId": "microk8s-local",
"namespace": "petstore-dev",
"values": {
"name": "openapi-petstore",
"image": {
"repository": "networknt/openapi-petstore",
"tag": "latest"
}
},
"template": {
"repoUrl": "https://github.com/networknt/openapi-petstore.git",
"ref": "master",
"path": "k8s"
}
}
}
}'
For a private repository:
LIGHT_DEPLOYER_GIT_TOKEN=... \
LIGHT_DEPLOYER_KUBE_MODE=noop \
cargo run -p light-deployer
For Bitbucket app-password style auth:
LIGHT_DEPLOYER_GIT_USERNAME=my-user \
LIGHT_DEPLOYER_GIT_TOKEN=my-app-password \
LIGHT_DEPLOYER_KUBE_MODE=noop \
cargo run -p light-deployer
Dry Run And Diff In Noop Mode
Noop mode can also exercise the request path for these tools:
curl -fsSL http://127.0.0.1:7088/mcp \
-H 'content-type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "dry-run-sample-1",
"method": "tools/call",
"params": {
"name": "deployment.dryRun",
"arguments": {
"hostId": "local-host",
"instanceId": "petstore-dev",
"environment": "dev",
"clusterId": "microk8s-local",
"namespace": "light-deployer",
"values": {
"name": "petstore",
"replicas": 1,
"image": {
"repository": "nginx",
"tag": "1.27"
},
"containerPort": 80,
"service": {
"port": 80
}
},
"template": {
"repoUrl": "local",
"ref": "main",
"path": "k8s"
}
}
}
}'
curl -fsSL http://127.0.0.1:7088/mcp \
-H 'content-type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "diff-sample-1",
"method": "tools/call",
"params": {
"name": "deployment.diff",
"arguments": {
"hostId": "local-host",
"instanceId": "petstore-dev",
"environment": "dev",
"clusterId": "microk8s-local",
"namespace": "light-deployer",
"values": {
"name": "petstore",
"replicas": 1,
"image": {
"repository": "nginx",
"tag": "1.27"
},
"containerPort": 80,
"service": {
"port": 80
}
},
"template": {
"repoUrl": "local",
"ref": "main",
"path": "k8s"
}
}
}
}'
These calls do not validate against Kubernetes unless real mode is enabled.
Stop The Service
Press Ctrl-C in the terminal running cargo run.
Run Kubernetes
This page runs light-deployer inside MicroK8s and uses the in-cluster
ServiceAccount with kube-rs.
Prerequisites
MicroK8s should be running and microk8s kubectl should work:
microk8s status --wait-ready
microk8s kubectl get nodes
Build the image first:
cd /home/steve/workspace/light-fabric
./apps/light-deployer/build.sh latest
Import Image Into MicroK8s
docker save networknt/light-deployer:latest | microk8s ctr image import -
If your MicroK8s install requires elevated permissions:
docker save networknt/light-deployer:latest | sudo microk8s ctr image import -
Verify the image is available:
microk8s ctr images ls | grep light-deployer
Install Deployer
Apply the included manifests:
microk8s kubectl apply -f apps/light-deployer/k8s/namespace.yaml
microk8s kubectl apply -f apps/light-deployer/k8s/rbac.yaml
microk8s kubectl apply -f apps/light-deployer/k8s/deployment.yaml
microk8s kubectl apply -f apps/light-deployer/k8s/service.yaml
Wait for the pod:
microk8s kubectl -n light-deployer rollout status deploy/light-deployer
microk8s kubectl -n light-deployer get pods
Check logs:
microk8s kubectl -n light-deployer logs deploy/light-deployer
The deployment sets:
LIGHT_DEPLOYER_KUBE_MODE=real
So the service uses real Kubernetes API calls from inside the cluster.
Port Forward
microk8s kubectl -n light-deployer port-forward svc/light-deployer 7088:7088
In another terminal:
curl -fsSL http://127.0.0.1:7088/health
Expected:
ok
List Tools
curl -fsSL http://127.0.0.1:7088/mcp \
-H 'content-type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "tools-list-1",
"method": "tools/list",
"params": {}
}'
The response contains the deployer's tool names, descriptions, input schemas, and invocation metadata. Light Portal can use this JSON-RPC response to populate MCP tools for the API details view.
Render In Kubernetes
Rendering does not mutate the cluster:
curl -fsSL http://127.0.0.1:7088/mcp \
-H 'content-type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "render-sample-1",
"method": "tools/call",
"params": {
"name": "deployment.render",
"arguments": {
"hostId": "local-host",
"instanceId": "petstore-dev",
"environment": "dev",
"clusterId": "microk8s-local",
"namespace": "light-deployer",
"values": {
"name": "petstore",
"replicas": 1,
"image": {
"repository": "nginx",
"tag": "1.27"
},
"containerPort": 80,
"service": {
"port": 80
}
},
"template": {
"repoUrl": "local",
"ref": "main",
"path": "k8s"
}
}
}
}'
Dry Run In Kubernetes
Dry-run renders the manifest and asks the Kubernetes API to validate it without persisting resources:
curl -fsSL http://127.0.0.1:7088/mcp \
-H 'content-type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "dry-run-sample-1",
"method": "tools/call",
"params": {
"name": "deployment.dryRun",
"arguments": {
"hostId": "local-host",
"instanceId": "petstore-dev",
"environment": "dev",
"clusterId": "microk8s-local",
"namespace": "light-deployer",
"values": {
"name": "petstore",
"replicas": 1,
"image": {
"repository": "nginx",
"tag": "1.27"
},
"containerPort": 80,
"service": {
"port": 80
}
},
"template": {
"repoUrl": "local",
"ref": "main",
"path": "k8s"
}
}
}
}'
Expected status:
{
"jsonrpc": "2.0",
"result": {
"isError": false,
"structuredContent": {
"status": "validated"
}
}
}
Deploy Sample
The sample request deploys into the light-deployer namespace so it matches
the included namespace-scoped RBAC.
curl -fsSL http://127.0.0.1:7088/mcp \
-H 'content-type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "apply-sample-1",
"method": "tools/call",
"params": {
"name": "deployment.apply",
"arguments": {
"hostId": "local-host",
"instanceId": "petstore-dev",
"environment": "dev",
"clusterId": "microk8s-local",
"namespace": "light-deployer",
"values": {
"name": "petstore",
"replicas": 1,
"image": {
"repository": "nginx",
"tag": "1.27"
},
"containerPort": 80,
"service": {
"port": 80
}
},
"template": {
"repoUrl": "local",
"ref": "main",
"path": "k8s"
}
}
}
}'
The response should return quickly with an accepted/applying-style status. The operation continues in the deployer.
Watch Kubernetes resources:
microk8s kubectl -n light-deployer get deploy,svc,pods
Stream Events
Use the requestId from the deployment response:
curl -N "http://127.0.0.1:7088/events?request_id=<requestId>"
The event stream reports deployment progress and failures for that request.
Check Status
curl -fsSL http://127.0.0.1:7088/mcp \
-H 'content-type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "status-sample-1",
"method": "tools/call",
"params": {
"name": "deployment.status",
"arguments": {
"hostId": "local-host",
"instanceId": "petstore-dev",
"environment": "dev",
"clusterId": "microk8s-local",
"namespace": "light-deployer",
"template": {
"repoUrl": "local",
"ref": "main",
"path": "k8s"
}
}
}
}'
Undeploy Sample
curl -fsSL http://127.0.0.1:7088/mcp \
-H 'content-type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "delete-sample-1",
"method": "tools/call",
"params": {
"name": "deployment.delete",
"arguments": {
"hostId": "local-host",
"instanceId": "petstore-dev",
"environment": "dev",
"clusterId": "microk8s-local",
"namespace": "light-deployer",
"template": {
"repoUrl": "local",
"ref": "main",
"path": "k8s"
}
}
}
}'
Then verify resources:
microk8s kubectl -n light-deployer get deploy,svc,pods
Deploy openapi-petstore From Git
After the openapi-petstore repository has a k8s/ folder committed, use a
request like this:
curl -fsSL http://127.0.0.1:7088/mcp \
-H 'content-type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "apply-openapi-petstore-1",
"method": "tools/call",
"params": {
"name": "deployment.apply",
"arguments": {
"hostId": "local-host",
"instanceId": "openapi-petstore-dev",
"environment": "dev",
"clusterId": "microk8s-local",
"namespace": "light-deployer",
"values": {
"name": "openapi-petstore",
"image": {
"repository": "networknt/openapi-petstore",
"tag": "latest",
"pullPolicy": "IfNotPresent"
},
"service": {
"name": "openapi-petstore",
"type": "ClusterIP"
}
},
"template": {
"repoUrl": "https://github.com/networknt/openapi-petstore.git",
"ref": "master",
"path": "k8s"
}
}
}
}'
For private Git access, set LIGHT_DEPLOYER_GIT_TOKEN on the deployer pod.
In Kubernetes this should be injected from a Secret, not written directly into
the deployment manifest.
Update The Deployer Image
After rebuilding locally:
./apps/light-deployer/build.sh latest
docker save networknt/light-deployer:latest | microk8s ctr image import -
microk8s kubectl -n light-deployer rollout restart deploy/light-deployer
microk8s kubectl -n light-deployer rollout status deploy/light-deployer
Remove The Deployer
microk8s kubectl delete -f apps/light-deployer/k8s/service.yaml
microk8s kubectl delete -f apps/light-deployer/k8s/deployment.yaml
microk8s kubectl delete -f apps/light-deployer/k8s/rbac.yaml
microk8s kubectl delete -f apps/light-deployer/k8s/namespace.yaml
Light-Gateway
light-gateway is the Pingora-based gateway product in Light Fabric.
It is intended to host gateway behavior such as routing, proxying, and eventually AI/MCP gateway integrations while using the shared runtime and config model.
Key Dependencies
- light-runtime
- light-pingora
- config-loader
Runtime
The gateway uses light-pingora as its transport framework and
light-runtime for lifecycle, bootstrap, and service configuration.
Light-Workflow
light-workflow is the workflow execution service for Agentic Workflow
documents.
It loads workflow definitions, executes workflow tasks, integrates with
light-rule for rule-backed checks, and exposes workflow execution APIs.
Key Dependencies
- workflow-core
- light-rule
- axum
- sqlx
- reqwest
Role
light-workflow is the runtime service that turns workflow specifications into
long-running execution state. It is used by agentic flows, human-in-the-loop
orchestration, and integration-test style automation.
Comparison: Light-Fabric vs. AgentGateway
This document provides a high-level comparison between Light-Fabric and AgentGateway to help architects and engineering leaders choose the right foundation for their agentic workflows.
Overview
While both systems aim to facilitate interaction with Large Language Models (LLMs), they operate at different layers of the AI stack and prioritize different architectural outcomes.
| Feature | Light-Fabric | AgentGateway |
|---|---|---|
| Primary Philosophy | Agentic Fabric: Unified Governance & Lifecycle | Agentic Gateway: High-performance Proxy |
| Core Architecture | Integrated Platform (Layer) | Standalone Gateway (Service) |
| Target User | Central IT / Platform Engineering | Application Developers / DevOps |
| Lifecycle Management | APIs, Agents, MCPs, and Gateways | Primarily LLM Request Routing |
| Language | Native Rust (Extreme Performance) | Rust / Go (Variable) |
1. Governance vs. Connectivity
Light-Fabric (Governance)
Light-Fabric is designed as a Single Control Plane. It assumes that in an enterprise environment, "freedom without governance is chaos." It provides:
- Centralized Registry: Every agent, skill, and tool is registered and governed via the light-portal.
- Fine-Grained Authorization: Deep policy enforcement at the endpoint level, including row- and column-level data masking.
- Auditability: A unified audit trail for all agentic interactions across the entire organization.
AgentGateway (Connectivity)
AgentGateway typically focuses on the North-South traffic between an application and multiple LLM providers. Its primary strength is:
- Simplified Routing: Getting a request from Point A to Point B with retries and failover.
- Provider Abstraction: Normalizing different LLM APIs into a single interface.
2. Integrated Intelligence: Hindsight
One of the defining differences of the Light-Fabric is the deep integration of Hindsight Memory.
- Light-Fabric: Memory is not an "add-on." The platform provides native biomimetic memory banks (World Facts, Experiences, Mental Models) that are automatically managed and scoped (Global, Shared, Private) as part of the fabric.
- AgentGateway: Typically treats memory as external state. The application or a separate vector database must manage context before sending the request through the gateway.
3. Skill & Tool Management
Centralized Skills (Fabric)
In Light-Fabric, skills (tools) are first-class citizens. They are registered, versioned, and governed centrally. An agent doesn't just "have" a tool; the Fabric grants the agent access to a skill based on its role and the current context.
Standard Tooling (Gateway)
AgentGateway generally passes tool definitions through to the provider. The management of who can use which tool and how those tools are secured is usually left to the application logic.
4. Orchestration: Hybrid Agentic Workflows
Light-Fabric (Integrated Orchestrator)
Light-Fabric treats orchestration as a foundational service. It implements a Hybrid Model:
- Deterministic Process: The overall business logic (e.g., insurance claim steps) is fixed and compliant.
- Autonomous Tasks: Individual steps within the process are delegated to agents.
- Statefulness: The Fabric manages long-running state across days or weeks, ensuring durability.
AgentGateway (Stateless Proxy)
AgentGateway is primarily a stateless component.
- External Orchestration: The workflow logic must reside in your application code or an external engine (like Temporal).
- Proxy Only: It handles the communication but does not "understand" or manage the multi-step business process itself.
5. Security: The Rule Engine
Light-Fabric (Integrated Governance)
Light-Fabric includes an integrated YAML-based Rule Engine (light-rule) designed for fine-grained authorization:
- Data Filtering: Automatically masks or filters response data (column/row level) based on policies.
- Policy Enforcement: Checks permissions before an agent executes a tool or accesses a memory unit.
- Hot-Reloading: Security rules can be updated in real-time without redeploying the platform.
AgentGateway (Basic Middleware)
AgentGateway typically provides basic security features like API key validation or rate limiting.
- Limited Filtering: While it can intercept traffic, implementing complex, context-aware data masking usually requires writing custom middleware or handling it at the application level.
6. MCP Support: Gateway vs. Ecosystem
Light-Fabric (Integrated Tooling)
Light-Fabric treats Model Context Protocol (MCP) as a primary source for agent tools.
- Direct Integration: Agents use the mcp-client to consume tools directly from MCP servers.
- Registry Management: MCP servers are registered in the light-portal, allowing for centralized discovery and governance.
- Unified Security: The same fine-grained authorization rules apply to MCP tools as they do to native Rust tools.
AgentGateway (Specialized MCP Proxy)
AgentGateway provides a highly specialized MCP Gateway layer.
- Protocol Translation: It excels at translating between different MCP transports (SSE, Streamable HTTP, etc.).
- Exposing Servers: Its primary role is to make MCP servers accessible to external applications through a normalized gateway interface.
- Advanced Networking: Includes features like stream merging and specialized MCP routing.
For a deep dive into the technical differences, see our Detailed MCP Feature Comparison.
Summary: Which to Choose?
Choose Light-Fabric if:
- You are building an Enterprise AI Strategy that requires unified governance, stateful workflows, and integrated security.
- You need to manage the entire lifecycle of agents and the business processes they participate in.
- You require advanced data privacy (masking) and long-term memory (Hindsight) as native platform features.
Choose AgentGateway if:
- You need a lightweight proxy to handle LLM provider failover and basic request normalization.
- You prefer to manage agent logic, workflows, memory, and security entirely within your external application stack.
- You are looking for a simple tool to solve immediate connectivity needs without implementing a comprehensive platform layer.
Detailed Comparison: MCP Gateway Features
This document provides a technical deep dive into the Model Context Protocol (MCP) implementations in Light-Fabric and AgentGateway.
Feature Matrix
| Feature | Light-Fabric | AgentGateway |
|---|---|---|
| Primary Role | Provider/Gateway/Portal: Exposes MCP/API Servers. | Provider/Gateway: Exposes MCP servers. |
| Onboarding | Auto-Discovery: Automatic tools/list sync. | Manual: K8s CRD/Manifest configuration. |
| Data Privacy | Deep: Row/Column level masking. | Basic: Allow/Deny access control. |
| Transports | SSE, Streamable HTTP, WebSocket | SSE, Streamable HTTP, WebSocket |
| Legacy Integration | Native: REST/RPC to MCP transformation. | External: Manual wrappers required. |
| Authorization | Managed: Roles, Groups, Positions, Attributes. | Infrastructure: CEL-based policies. |
| Hot-Reloading | Native: Integrated Control Plane & Registry. | Infrastructure: Istio/xDS sync. |
| Authentication | JWT (End-to-End Propagation) | JWT, Keycloak, OIDC, Passthrough |
| Observability | Distributed Tracing (OTEL) and Integrated Hindsight Memory | Distributed Tracing (OTEL) |
1. Architectural Intent
AgentGateway: The Network Proxy Layer
AgentGateway is designed as a high-availability proxy for MCP servers. Its primary focus is the North-South traffic between an application and multiple MCP backends.
- Multiplexing: Optimized for merging multiple MCP backends into a single upstream connection (mergestream.rs).
- Protocol Translation: Excels at translating between SSE, Streamable HTTP, and WebSocket transports.
- Infrastructure Focus: Operates as a Kubernetes-native component managed via manifests and standard networking policies.
Light-Fabric: The Managed Enterprise Platform
Light-Fabric provides a Unified Governance Fabric that treats AI agents and MCP tools as part of the broader enterprise API ecosystem.
- Unified Gateway: The AI Gateway (Rust/Pingora-based) serves as a single entry point for UI, Agents, and Tools, supporting both MCP and traditional REST/RPC APIs.
- Centralized Portal: Uses the Light-Portal as a control plane for onboarding (auto-discovery), configuration (hot-reloading), and security management.
- Governed Intelligence: Integrates the gateway directly with Hindsight Memory and the Fine-Grained Rule Engine, ensuring that every tool call is governed by corporate compliance rules (e.g., row/column masking).
- End-to-End Security: Maintains a single JWT-based identity from the user's chat interface all the way to the underlying MCP or API endpoint.
2. Security & Authorization
AgentGateway: Infrastructure-Aware RBAC
AgentGateway uses Common Expression Language (CEL) for its authorization policies.
- Capabilities: High-speed, network-level blocking based on JWT claims and request headers.
- Limitation: Lacks native support for content-aware data masking or organizational hierarchy logic.
Light-Fabric: Content-Aware Managed Auth
Light-Fabric provides a mature Fine-Grained Authorization layer:
- Managed ABAC/PBAC: Supports Role, Group, Corporate Position (Hierarchy), and Attribute-based protection.
- Data Privacy: Supports native Row and Column filtering (data masking), ensuring agents only see data they are authorized to process.
- End-to-End JWT: The same JWT token is propagated from the UI through the Agent to the AI Gateway and MCP tool.
3. Lifecycle & Tool Onboarding
AgentGateway: Configuration-Driven
Onboarding tools in AgentGateway is an infrastructure task:
- Manual Mapping: Requires defining Kubernetes Custom Resources (HTTPRoute, Backend) to map MCP servers to the gateway.
- Scope: Primarily focused on exposing existing MCP servers.
Light-Fabric: Registry-Driven
Light-Fabric provides a "Zero-Effort" onboarding experience via Light-Portal:
- Auto-Discovery: Registering an MCP API triggers an automatic tools/list call to populate the registry.
- Protocol Transformation: Automatically transforms existing OpenAPI/REST and RPC services into MCP tools without requiring wrappers.
- Centralized Governance: All tools (Native, REST, MCP) are managed in a single unified registry.
4. Control Plane & Configuration
AgentGateway: Kubernetes-Native
- Orchestration: Managed via the Istio/xDS control plane.
- Updates: Configuration changes are applied via Kubernetes manifests (YAML).
Light-Fabric: Portal-Managed
- Hot-Reloading: Uses a dedicated Config Server and Control Plane to update gateway and agent configurations in real-time without restarts.
- Enterprise Management: Business-centric UI for managing tool visibility, agent permissions, and security policies.
5. Conclusion
- Use AgentGateway if you are an infrastructure provider who needs to expose MCP-based tools to multiple external applications securely and reliably.
- Use Light-Fabric if you are building intelligent agents that need to use those tools to solve complex business problems within a governed framework.
Why Light-Fabric Already Covers the MCP Gateway — No Second Gateway Required
This document addresses a recommendation (produced by Grok AI) suggesting that an enterprise should deploy the open-source AgentGateway as a dedicated MCP layer alongside an existing API platform. After performing a side-by-side source code analysis of both projects (see vs-agentgateway.md and vs-agent-gateway-mcp.md), we present the findings below.
1. The Recommendation Was Generated Without Knowledge of Light-Fabric
The Grok-produced analysis operates under a critical blind spot: it has no knowledge of Light-Fabric (Rust-based, open-sourced to customers) or its capabilities. The recommendation frames the choice as "keep your existing REST platform + add AgentGateway for MCP," because Grok only knows about publicly documented open-source projects. It does not account for the fact that:
- Light-Fabric is already in production and serving agentic workloads today.
- Every feature listed in the recommendation — MCP federation, tool discovery, protocol translation, security, and observability — has already been built, demonstrated, and validated with the project team.
- The comparison is therefore not between "a REST framework" and "an MCP gateway." It is between two systems that both provide MCP gateway capabilities, where one (Light-Fabric/Light-Gateway) is already deployed and battle-tested in our environment.
2. Source Code Analysis: Light-Fabric Already Does What AgentGateway Does
We conducted a detailed, code-level comparison of both projects. The full results are documented in our High-Level Comparison and Detailed MCP Feature Comparison. The key findings are summarized below.
2.1 MCP Protocol Support
| Capability | Light-Fabric | AgentGateway |
|---|---|---|
| Transports | SSE, Streamable HTTP, WebSocket | SSE, Streamable HTTP, WebSocket |
| Tool Discovery | Auto-discovery via tools/list sync | Manual K8s CRD configuration |
| Protocol Translation | Native REST/RPC → MCP transformation | Manual wrappers required |
| Stream Handling | Supported | Supported (mergestream) |
Both projects support the same MCP transports. Light-Fabric goes further with automatic tool discovery and native protocol transformation from existing REST/RPC APIs — exactly the "OpenAPI-to-MCP mapping" that the Grok recommendation credits to AgentGateway, except Light-Fabric does it without requiring a separate component.
2.2 Security & Authorization
| Capability | Light-Fabric | AgentGateway |
|---|---|---|
| Authentication | JWT (end-to-end propagation) | JWT, Keycloak, OIDC, Passthrough |
| Authorization | Role, Group, Position, Attribute-based (ABAC/PBAC) | CEL-based policies |
| Data Privacy | Row/Column-level masking | Allow/Deny access control |
| Rule Engine | Integrated YAML-based, hot-reloadable | Basic middleware |
The Grok recommendation highlights "tool-level RBAC" and "MCP-compliant OAuth 2.1" as AgentGateway strengths. Our code analysis shows that Light-Fabric's authorization model is significantly deeper — it supports corporate-hierarchy-aware policies and content-level data masking that AgentGateway simply does not implement.
2.3 Lifecycle & Operations
| Capability | Light-Fabric | AgentGateway |
|---|---|---|
| Onboarding | Portal-driven, auto-discovery | K8s manifest-driven, manual |
| Hot-Reloading | Native (Config Server + Control Plane) | Infrastructure-dependent (Istio/xDS) |
| Observability | OTEL + integrated Hindsight Memory | OTEL + OpenInference |
| Orchestration | Integrated hybrid workflows (deterministic + autonomous) | None (stateless proxy) |
Light-Fabric manages the entire lifecycle — from tool registration through governance to runtime orchestration — while AgentGateway only handles the proxy layer.
3. Two Gateways Is Overkill
The Grok recommendation frames the architecture as a "clean separation of concerns." In practice, deploying both Light-Fabric and AgentGateway creates redundant infrastructure with real costs:
Duplicated Capabilities
Both systems would be performing the same core functions:
- Receiving MCP requests from agents
- Translating tool calls to backend HTTP requests
- Enforcing security policies on tool access
- Providing observability for agentic traffic
Running two gateways that do the same thing is not "separation of concerns" — it is duplication of concerns. Every MCP request would traverse two proxy layers instead of one, adding latency and operational complexity for zero additional capability.
Operational Burden
- Two deployment pipelines to maintain on EKS
- Two sets of security policies to keep in sync
- Two configuration surfaces (K8s CRDs for AgentGateway vs. Portal for Light-Fabric)
- Two failure domains to monitor and troubleshoot
- Two upgrade cycles to coordinate
The "No Code Changes" Claim Is Misleading
The Grok recommendation states AgentGateway requires "no code changes." This is true only if you ignore the work required to:
- Write and maintain Kubernetes Custom Resources for every MCP backend
- Build manual wrappers for non-MCP services (Light-Fabric does this natively)
- Implement application-level logic for everything AgentGateway doesn't cover (stateful workflows, data masking, memory management)
Light-Fabric also requires no code changes to existing backend services — and it provides the governance layer out of the box.
4. Addressing the "Rust Performance" Argument
The recommendation claims AgentGateway has a "performance edge" due to its Rust data plane. This argument does not hold:
- Light-Fabric's AI Gateway currently runs on the high-performance Java-based light-gateway, and a new Rust-based AI Gateway is also under way, built on the Pingora framework (Cloudflare's production proxy engine). Even the existing Java gateway delivers exceptional throughput, and the Rust gateway will remove the JVM from the critical path entirely.
- Both systems benefit from Rust's zero-cost abstractions, memory safety, and lack of garbage collection pauses.
- The performance comparison between the two Rust implementations would be marginal and workload-dependent — not a differentiator.
5. Addressing the "Custom Development" Concern
The recommendation warns against "implementing MCP directly" because it "involves significant custom development." This concern does not apply:
- Light-Fabric's MCP support is not custom development — it is a fully implemented, production-ready feature of the platform.
- The MCP client, gateway routing, tool registry, and security integration are all existing, tested components, not a backlog of work to be done.
- The project team has already seen these features demonstrated end-to-end.
6. Summary
| Concern from Grok Recommendation | Reality |
|---|---|
| "Light4j is a REST framework, not an AI proxy" | Light-Fabric is a full agentic platform with an AI Gateway already in production |
| "AgentGateway provides MCP federation and tool discovery" | Light-Fabric provides the same capabilities with deeper governance |
| "Rust performance advantage over JVM" | Light-Fabric's Java gateway is already very fast, and a Rust (Pingora-based) gateway is coming |
| "Clean separation of concerns" | Two gateways doing the same thing is duplication, not separation |
| "No code changes required" | True for both — but AgentGateway requires extensive K8s manifest management |
| "Custom MCP implementation is risky" | Light-Fabric's MCP support is already built, tested, and in production |
Conclusion
The Grok-generated recommendation is well-structured but fundamentally flawed because it was produced without knowledge of Light-Fabric's capabilities. When evaluated against the actual source code and production state of both systems, the case for adding AgentGateway collapses:
- Light-Fabric already provides every MCP gateway capability that AgentGateway offers.
- Light-Fabric goes significantly further with integrated governance, data privacy, memory, and orchestration.
- Adding a second gateway introduces operational complexity and latency with no net-new capability.
The pragmatic, low-risk path is to continue with the platform that is already built, already in production, and already proven to the team.