Light-Deployer Design
light-deployer is the cluster-local Kubernetes deployment executor in
Light Fabric.
This document focuses only on the deployer service that lives in
apps/light-deployer. The broader Light Portal deployment workflow, approval
flow, deployment history model, controller routing, and portal UI are covered
outside this repository.
Purpose
light-deployer receives a deployment command, fetches Kubernetes templates,
renders them with deployment values, validates the resulting resources, applies
or deletes resources in the target Kubernetes cluster, and returns safe status
details.
It is intentionally narrow. It does not decide whether a user is allowed to deploy an instance, does not own portal deployment history, and does not create tenant business workflows. Those decisions belong to Light Portal, Light Controller, and the workflow engine.
Service Boundary
light-deployer owns:
- local deployment policy enforcement
- template repository fetch
- YAML template rendering
- manifest parsing and resource summary generation
- Kubernetes dry-run, apply, delete, status, and pruning
- safe event and error reporting
- direct local/MicroK8s deployment endpoints
light-deployer does not own:
- tenant authorization
- instance metadata
- deployment approval
- deployment history persistence
- config snapshot creation
- long-running human workflow decisions
The deployer should reject commands outside its local policy even if an upstream service sends them.
Runtime Model
The service follows the same runtime pattern as light-agent.
main.rs builds the domain service and starts it through:
LightRuntimeBuilder::new(AxumTransport::new(app))
The HTTP listener is owned by light-runtime and light-axum, not by
service-specific socket code. Bind address, HTTP/HTTPS ports, service identity,
and registry settings live in runtime config files.
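For orientation, the sketch below shows a minimal axum Router of the kind that would be handed to AxumTransport. The route paths match the public endpoints listed later in this document; the handler wiring and the build_app helper are assumptions for illustration, not the actual light-deployer source.

use axum::{routing::get, Router};

// Placeholder handlers; the real ones delegate to the deployer domain service.
async fn health() -> &'static str { "ok" }
async fn ready() -> &'static str { "ready" }

// Build the HTTP surface that LightRuntimeBuilder/AxumTransport will serve.
// Route paths come from this document; everything else is illustrative.
fn build_app() -> Router {
    Router::new()
        .route("/health", get(health))
        .route("/ready", get(ready))
    // Remaining routes (/mcp, /mcp/tools/..., /deployments, /events) are omitted here.
}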
Default config files:
- config/server.yml
- config/deployer.yml
- config/portal-registry.yml
Local cargo run resolves config from apps/light-deployer/config when run
from the workspace root. The container image runs from /app and uses
/app/config.
Public Endpoints
Phase 1 exposes a direct HTTP surface for local and MicroK8s testing:
GET /health
GET /ready
POST /mcp
GET /mcp/tools
GET /mcp/tools/list
GET /mcp/tools/{tool}
POST /deployments
POST /mcp/tools/{tool}
GET /events?request_id=...
POST /mcp is the MCP JSON-RPC 2.0 endpoint. It supports tools/list,
tools/call, and a minimal initialize response. This is the endpoint that
MCP clients, Light Portal, and AI agents should use.
/deployments accepts the canonical deployment request directly.
/mcp/tools/{tool} maps tool names onto the same internal service functions as
a REST-style local debugging convenience. The convenience tool-list endpoints
return metadata with name, description, inputSchema, endpoint, and
method, but they are not the MCP protocol endpoint.
Supported tool names:
- deployment.render
- deployment.dryRun
- deployment.diff
- deployment.apply
- deployment.delete
- deployment.status
- deployment.rollback
The direct HTTP mode is useful for development and managed environments. The same internal command handling should later be reused by controller-mediated WebSocket/MCP routing.
Request Model
A deployment request is explicit and auditable.
{
"requestId": "01964b05-0000-7000-8000-000000000001",
"hostId": "01964b05-552a-7c4b-9184-6857e7f3dc5f",
"instanceId": "petstore-dev",
"environment": "dev",
"clusterId": "microk8s-local",
"namespace": "petstore-dev",
"action": "deploy",
"values": {
"name": "petstore",
"image": {
"repository": "networknt/openapi-petstore",
"tag": "latest"
}
},
"template": {
"repoUrl": "https://github.com/networknt/openapi-petstore.git",
"ref": "master",
"path": "k8s"
},
"options": {
"dryRun": false,
"waitForRollout": true,
"timeoutSeconds": 300,
"pruneOverride": false
}
}
The current implementation supports inline values. The request model also
contains fields for future values references and immutable snapshot metadata so
it can align with the full portal deployment workflow.
When invoking a specific /mcp/tools/{tool} endpoint, callers do not need to
send action. The deployer derives the action from the tool name. The generic
/deployments endpoint still expects an explicit action in the request body.
For the MCP endpoint, callers use JSON-RPC:
{
"jsonrpc": "2.0",
"id": "tools-list-1",
"method": "tools/list",
"params": {}
}
Tool invocation uses tools/call:
{
"jsonrpc": "2.0",
"id": "render-1",
"method": "tools/call",
"params": {
"name": "deployment.render",
"arguments": {
"hostId": "local-host",
"instanceId": "petstore-dev",
"environment": "dev",
"clusterId": "local",
"namespace": "light-deployer",
"values": {},
"template": {
"repoUrl": "local",
"ref": "main",
"path": "k8s"
}
}
}
}
tools/call derives the deployment action from params.name; callers should
not provide an action field in arguments.
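A small sketch of the tool-name-to-action mapping this implies; the enum and function are illustrative stand-ins, not the actual deployer code.

/// Deployment actions named in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DeployAction {
    Render,
    DryRun,
    Diff,
    Deploy,
    Undeploy,
    Status,
    Rollback,
}

/// Map an MCP tool name onto an internal action. Sketch only; the real
/// service may organize this differently.
fn action_from_tool(tool: &str) -> Option<DeployAction> {
    match tool {
        "deployment.render" => Some(DeployAction::Render),
        "deployment.dryRun" => Some(DeployAction::DryRun),
        "deployment.diff" => Some(DeployAction::Diff),
        "deployment.apply" => Some(DeployAction::Deploy),
        "deployment.delete" => Some(DeployAction::Undeploy),
        "deployment.status" => Some(DeployAction::Status),
        "deployment.rollback" => Some(DeployAction::Rollback),
        _ => None,
    }
}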
Actions
render
: Fetch templates, render manifests, add namespaces and management labels, and
return resource summaries plus a manifest hash.
dryRun
: Render manifests and validate them against Kubernetes using server-side
dry-run.
diff
: Render manifests, fetch current managed resources, calculate additions,
modifications, and pruned resources, and return a redacted diff summary.
deploy
: Accept the request, run the deployment in the background, apply manifests,
prune removed managed resources, and stream events.
undeploy
: Delete resources associated with the deployment.
status
: Return current managed resource status.
rollback
: Reserved for redeploying a previous immutable portal snapshot. Native
Kubernetes rollout undo is not the target rollback model because it does not
restore ConfigMaps, Secrets, or values snapshots.
Template Fetching
Templates are loaded through the TemplateSource trait.
The current source supports two modes:
- local template root through LIGHT_DEPLOYER_TEMPLATE_BASE_DIR
- remote HTTPS Git clone through gix
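As a rough sketch, the trait can be pictured along the following lines; only the TemplateSource name comes from this design, while the TemplateRef type, the method signature, and the async-trait/anyhow dependencies are assumptions.

use std::path::PathBuf;
use async_trait::async_trait;

/// Where the templates for one deployment request come from.
/// Field names are illustrative.
pub struct TemplateRef {
    pub repo_url: String, // "local" or an HTTPS Git URL
    pub git_ref: String,  // branch, tag, or commit
    pub path: String,     // subdirectory holding the YAML templates, e.g. "k8s"
}

/// Abstraction over local-directory and gix-based Git template sources.
/// The trait name is from the design; the signature is a sketch.
#[async_trait]
pub trait TemplateSource: Send + Sync {
    /// Fetch templates and return the local directory containing them,
    /// plus the resolved commit SHA when the source is a Git repository.
    async fn fetch(&self, template: &TemplateRef) -> anyhow::Result<(PathBuf, Option<String>)>;
}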
For remote repositories, the deployment request provides:
{
"template": {
"repoUrl": "https://github.com/networknt/openapi-petstore.git",
"ref": "master",
"path": "k8s"
}
}
Private HTTPS Git access is controlled by environment variables:
- LIGHT_DEPLOYER_GIT_TOKEN: token or app password
- LIGHT_DEPLOYER_GIT_USERNAME: optional username override
Defaults:
- GitHub uses x-access-token
- Bitbucket Cloud uses x-token-auth
SSH authentication is intentionally deferred because it requires private key
handling and strict known_hosts validation.
Template Format
The built-in renderer uses simple placeholders:
image: ${image.repository}:${image.tag:latest}
Supported behavior:
- nested paths such as image.repository
- default values after :
- render failure when a required value is missing
- placeholder replacement only inside YAML string scalar values
The renderer parses YAML into serde_yaml::Value, traverses the AST, replaces
placeholders, and serializes or applies structured YAML values afterward. This
avoids the most common raw string replacement bugs around quoting,
indentation, certificates, and multi-line values.
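A simplified sketch of that traversal, assuming serde_yaml and string-only substitution; it is not the shipped renderer, but it shows why substitution happens on parsed values rather than raw text.

use serde_yaml::Value;

// Recursively replace ${path.to.value:default} placeholders inside string
// scalars. Sketch only; the real renderer has more behavior around errors.
fn render_node(node: &mut Value, values: &Value) -> Result<(), String> {
    match node {
        Value::String(s) => {
            let rendered = substitute(s, values)?;
            *s = rendered;
        }
        Value::Sequence(items) => {
            for item in items {
                render_node(item, values)?;
            }
        }
        Value::Mapping(map) => {
            for (_key, value) in map.iter_mut() {
                render_node(value, values)?;
            }
        }
        _ => {}
    }
    Ok(())
}

// Expand every ${...} placeholder in one scalar, honoring ":default" suffixes
// and failing when a required value is missing.
fn substitute(input: &str, values: &Value) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after.find('}').ok_or("unterminated placeholder")?;
        let (path, default) = match after[..end].split_once(':') {
            Some((p, d)) => (p, Some(d)),
            None => (&after[..end], None),
        };
        match lookup(values, path) {
            Some(v) => out.push_str(v),
            None => out.push_str(default.ok_or(format!("missing value for {path}"))?),
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

// Walk a dotted path such as "image.repository" through the values mapping.
fn lookup<'a>(values: &'a Value, path: &str) -> Option<&'a str> {
    let mut current = values;
    for segment in path.split('.') {
        current = current.get(segment)?;
    }
    current.as_str()
}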
Because placeholders currently produce strings, templates should avoid
placeholders in numeric-only Kubernetes fields unless Kubernetes accepts a
string value there. For example, containerPort should be fixed or rendered by
a future typed placeholder extension.
Resource Metadata
After rendering, the deployer ensures every resource has the target namespace and adds management labels:
- app.kubernetes.io/managed-by=light-deployer
- lightapi.net/host-id
- lightapi.net/instance-id
- lightapi.net/request-id
These labels are used for status lookup and pruning.
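A small sketch of the label merge, using a plain map to stand in for a manifest's metadata.labels; the helper name is hypothetical.

use std::collections::BTreeMap;

// Management labels from this document; all values except managed-by come
// from the deployment request.
fn add_management_labels(
    labels: &mut BTreeMap<String, String>,
    host_id: &str,
    instance_id: &str,
    request_id: &str,
) {
    labels.insert("app.kubernetes.io/managed-by".into(), "light-deployer".into());
    labels.insert("lightapi.net/host-id".into(), host_id.into());
    labels.insert("lightapi.net/instance-id".into(), instance_id.into());
    labels.insert("lightapi.net/request-id".into(), request_id.into());
}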
Kubernetes Execution
Kubernetes execution is behind the KubeExecutor trait.
Current implementations:
- KubeRsExecutor: real Kubernetes API execution through kube-rs
- NoopKubeExecutor: local render/test mode
Execution mode:
- LIGHT_DEPLOYER_KUBE_MODE=real: force real Kubernetes mode
- LIGHT_DEPLOYER_KUBE_MODE=noop: force no-op mode
- default: real mode when KUBERNETES_SERVICE_HOST is present, otherwise no-op
The production path uses kube-rs, not kubectl.
Kubernetes operations should use:
- in-cluster ServiceAccount auth when running as a pod
- server-side dry-run for validation
- server-side apply with the field manager light-deployer
- structured status and error handling
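A hedged sketch of the executor abstraction and the kube-rs server-side apply call; the KubeExecutor name and the light-deployer field manager come from this document, while the method signatures and the async-trait/anyhow dependencies are assumptions.

use async_trait::async_trait;
use kube::api::{Api, DynamicObject, Patch, PatchParams};

// Sketch of the executor trait; the real method set likely differs.
#[async_trait]
pub trait KubeExecutor: Send + Sync {
    async fn apply(&self, obj: &DynamicObject, dry_run: bool) -> anyhow::Result<()>;
    async fn delete(&self, obj: &DynamicObject) -> anyhow::Result<()>;
}

// Server-side apply of one dynamic object with the light-deployer field
// manager; the Api<DynamicObject> is assumed to be pre-built for the
// object's group/version/kind and namespace.
pub async fn server_side_apply(
    api: &Api<DynamicObject>,
    obj: &DynamicObject,
    dry_run: bool,
) -> anyhow::Result<()> {
    let name = obj.metadata.name.clone().unwrap_or_default();
    let mut params = PatchParams::apply("light-deployer").force();
    params.dry_run = dry_run; // server-side dry-run is used for validation
    api.patch(&name, &params, &Patch::Apply(obj)).await?;
    Ok(())
}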
Pruning
The deployer is declarative. If a previously managed resource is no longer rendered from the template, it should be considered for pruning.
Pruning is calculated by comparing:
- current resources in the namespace with lightapi.net/instance-id
- resources rendered from the new template
The policy layer enforces blast-radius protection:
- maximum delete percentage
- sensitive kinds requiring override
- explicit pruneOverride in deployment options
This prevents stale resources while still protecting against accidental large-scale deletion.
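A sketch of the prune calculation and the blast-radius check, with illustrative names; the real policy evaluation also accounts for sensitive kinds.

use std::collections::BTreeSet;

/// Identity of a managed resource, enough to compare live state with the
/// freshly rendered manifests. Field names are illustrative.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct ResourceId {
    pub api_version: String,
    pub kind: String,
    pub namespace: String,
    pub name: String,
}

/// Resources that are currently managed (carrying lightapi.net/instance-id)
/// but no longer rendered from the template are prune candidates.
pub fn prune_candidates(
    live: &BTreeSet<ResourceId>,
    rendered: &BTreeSet<ResourceId>,
) -> Vec<ResourceId> {
    live.difference(rendered).cloned().collect()
}

/// Blast-radius guard: refuse to prune more than max_delete_percent of the
/// live managed resources unless the caller set pruneOverride.
pub fn prune_allowed(
    live_count: usize,
    prune_count: usize,
    max_delete_percent: u32,
    prune_override: bool,
) -> bool {
    if prune_override || live_count == 0 {
        return true;
    }
    let percent = prune_count * 100 / live_count;
    percent as u32 <= max_delete_percent
}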
Policy
The local deployer.yml policy constrains what a deployer is allowed to do.
Policy dimensions:
- allowed namespaces
- allowed repository hosts
- allowed repository URL prefixes
- allowed image registries
- allowed actions
- allowed Kubernetes kinds
- blocked Kubernetes kinds
- prune settings
- development insecure mode
Version 1 allows application-level resource kinds by default:
- Deployment
- Service
- Ingress
- ConfigMap
- Secret
Cluster-scoped and control-plane resources are blocked by default:
- Namespace
- ClusterRole
- ClusterRoleBinding
- CustomResourceDefinition
- admission webhooks
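One way to picture the policy model is the serde struct below; every field name here is an assumption for illustration, not the actual deployer.yml key set.

use serde::Deserialize;

/// Sketch of the local policy loaded from config/deployer.yml.
/// Field names are illustrative, not the real configuration keys.
#[derive(Debug, Deserialize)]
pub struct DeployerPolicy {
    pub allowed_namespaces: Vec<String>,
    pub allowed_repo_hosts: Vec<String>,
    pub allowed_repo_prefixes: Vec<String>,
    pub allowed_image_registries: Vec<String>,
    pub allowed_actions: Vec<String>,
    pub allowed_kinds: Vec<String>, // e.g. Deployment, Service, Ingress, ConfigMap, Secret
    pub blocked_kinds: Vec<String>, // e.g. Namespace, ClusterRole, CustomResourceDefinition
    pub prune: PrunePolicy,
    pub insecure_dev_mode: bool,
}

#[derive(Debug, Deserialize)]
pub struct PrunePolicy {
    pub enabled: bool,
    pub max_delete_percent: u32,
    pub sensitive_kinds: Vec<String>, // kinds that require pruneOverride
}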
Security
The deployer can mutate a Kubernetes cluster, so its default posture must be conservative.
Required practices:
- run in Kubernetes with a dedicated ServiceAccount
- prefer namespace-scoped Role and RoleBinding
- restrict allowed namespaces and resource kinds
- restrict template repository hosts or prefixes in production
- restrict image registries in production
- never log raw rendered Secret manifests
- never log raw Kubernetes patch/apply payloads containing Secret data
- return redacted summaries and diffs
Secret values in rendered manifests are redacted before being included in responses or diffs. Kubernetes Secret values are base64 encoded, not encrypted, so they must be treated as plaintext for logging purposes.
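A sketch of the Secret redaction step, assuming serde_yaml manifests; the marker string and helper name are illustrative.

use serde_yaml::Value;

// Replace Secret payloads with a fixed marker before the manifest is
// summarized, diffed, or logged. Sketch only; the real redaction may be
// broader.
fn redact_secret(manifest: &mut Value) {
    let is_secret = manifest
        .get("kind")
        .and_then(Value::as_str)
        .map(|k| k == "Secret")
        .unwrap_or(false);
    if !is_secret {
        return;
    }
    for key in ["data", "stringData"] {
        if let Some(section) = manifest.get_mut(key).and_then(Value::as_mapping_mut) {
            for (_name, value) in section.iter_mut() {
                // base64 is encoding, not encryption: treat values as plaintext.
                *value = Value::from("***REDACTED***");
            }
        }
    }
}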
Response Model
Responses include enough detail for callers to understand what happened without exposing secrets.
Important fields:
- requestId
- action
- status
- deployerId
- clusterId
- namespace
- manifestHash
- templateCommitSha
- resources
- diff
- events
- error
Resource summaries contain kind, namespace, name, apiVersion, and action. Full rendered manifests should not be returned or persisted by default.
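A hedged sketch of the response and resource-summary shapes implied by the field list above; the serde/serde_json dependencies and the concrete types are assumptions aligned with the camelCase JSON.

use serde::Serialize;

/// Sketch of the deployment response; only the listed field names come from
/// this document, the types are assumptions.
#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct DeploymentResponse {
    pub request_id: String,
    pub action: String,
    pub status: String,
    pub deployer_id: String,
    pub cluster_id: String,
    pub namespace: String,
    pub manifest_hash: Option<String>,
    pub template_commit_sha: Option<String>,
    pub resources: Vec<ResourceSummary>,
    pub diff: Option<serde_json::Value>,
    pub events: Vec<serde_json::Value>,
    pub error: Option<String>,
}

/// Resource summaries never carry the full rendered manifest.
#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct ResourceSummary {
    pub api_version: String,
    pub kind: String,
    pub namespace: String,
    pub name: String,
    pub action: String,
}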
Event Model
Long-running operations return quickly and continue in the background.
Clients can subscribe to:
GET /events?request_id=...
Events contain:
- request ID
- timestamp
- status
- message
- optional resource identity
The event stream is currently direct SSE. Controller-mediated mode can forward the same event shape later.
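A sketch of the event shape carried on the SSE stream; anything beyond the fields listed above (types, naming, status values) is an assumption.

use serde::Serialize;

/// One progress event on GET /events?request_id=...
/// Sketch only; the wire format may differ.
#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct DeploymentEvent {
    pub request_id: String,
    pub timestamp: String, // RFC 3339, assumed
    pub status: String,    // e.g. "applying", "applied", "failed"
    pub message: String,
    /// Optional resource identity (kind/namespace/name) the event refers to.
    pub resource: Option<String>,
}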
Installation
The app includes Kubernetes install manifests under apps/light-deployer/k8s:
- namespace
- RBAC
- deployment
- service
The deployment runs the container with LIGHT_DEPLOYER_KUBE_MODE=real. The
image contains /app/config, and server.yml defaults the HTTP port to 7088.
For MicroK8s testing:
./apps/light-deployer/build.sh latest
docker save networknt/light-deployer:latest | microk8s ctr image import -
microk8s kubectl apply -f apps/light-deployer/k8s/namespace.yaml
microk8s kubectl apply -f apps/light-deployer/k8s/rbac.yaml
microk8s kubectl apply -f apps/light-deployer/k8s/deployment.yaml
microk8s kubectl apply -f apps/light-deployer/k8s/service.yaml
Current Limitations
- Direct HTTP/MCP-style mode is implemented first; controller-mediated WebSocket routing is a later integration step.
- Inline values are implemented; config-server valuesRef fetching is still a future integration point.
- Rollback is represented in the model but needs portal snapshot integration.
- Helm and Kustomize are not implemented yet.
- Typed placeholders are not implemented yet.
- Rollout watch depth is intentionally basic in the first phase.
Design Direction
Keep light-deployer small and cluster-local.
The deployer should execute precise deployment commands, enforce local safety policy, and report structured results. It should not grow into a portal, workflow engine, or deployment database. That separation keeps the service easy to install inside customer clusters and reduces the security blast radius.