LLM Gateway
Status
Design proposal. The current light-agent runtime selects one active model
provider from model-provider.yml. That is acceptable if the selected provider
is an LLM gateway endpoint. The agent does not need to know how many upstream
providers the gateway can reach.
Purpose
The LLM gateway is a centralized model access layer for agents and services. Instead of each agent carrying credentials, endpoint details, routing rules, and provider fallback logic, each agent calls one gateway endpoint. The gateway then routes the request to OpenAI, Azure OpenAI, Bedrock, Anthropic, Gemini, Ollama, Codex, or another provider based on agent configuration, model policy, prompt characteristics, capability requirements, health, cost, and compliance constraints.
This keeps light-agent simple and matches the current bootstrap model:
- startup.yml is local.
- Runtime configuration is fetched from config-server.
- The agent loads one model provider after bootstrap.
- That provider can be an OpenAI-compatible LLM gateway.
- The gateway owns multi-provider fan-out.
Goals
- Keep agents configured with one active model endpoint.
- Support many upstream LLM providers at the gateway at the same time.
- Allow provider routing by agent, service id, environment, prompt intent, requested capability, logical model name, cost, latency, region, and health.
- Keep provider credentials out of agent pods and agent config.
- Preserve the existing model-provider abstraction for direct provider access and reuse it inside the gateway where useful.
- Expose a provider-compatible HTTP API so existing agents can use the gateway without a new SDK.
- Support normal light-fabric bootstrap, config-server overrides, module registry visibility, config reload, controller registration, and audit.
- Make gateway decisions explainable enough for operations and compliance.
Non-Goals
- Do not make light-agent load many providers directly for this use case. Multi-provider routing belongs in the gateway.
- Do not require every agent to understand provider-specific fields such as AWS region, Azure deployment name, or Anthropic max token settings.
- Do not expose upstream provider secrets through tools/list, diagnostics, or agent-visible configuration.
- Do not depend on an LLM classification call for every routing decision. The gateway should support deterministic routing first and optional classifier routing later.
- Do not merge the LLM gateway with the MCP router. The LLM gateway routes model calls; the MCP router routes tool calls.
Relationship To Existing Components
Light-Agent
light-agent should continue to select one model provider after runtime
bootstrap. For an LLM gateway deployment, the selected provider is the gateway:
model-provider.provider: compatible
model-provider.model: agent-default
compatible.name: llm-gateway
compatible.baseUrl: https://llm-gateway.light-gateway:8443/v1
compatible.apiKey: ${secret.llmGatewayApiKey}
The model-provider.model value becomes a logical model name. It does not need
to be an upstream provider model id. Examples:
model-provider.model: agent-default
model-provider.model: fast
model-provider.model: reasoning
model-provider.model: coding
model-provider.model: pii-safe
The gateway maps the logical model to a physical provider and model.
Light-Gateway
The LLM gateway should be implemented as a light-gateway product capability,
activated by handler/config. This keeps LLM egress under the same gateway
family that already handles MCP routing, auth, rule execution, metrics,
service discovery, bootstrap, and reload.
The first implementation can expose an OpenAI-compatible endpoint:
POST /v1/chat/completions
That is enough for CompatibleProvider and many external clients. Later phases
can add:
POST /v1/responses
GET /v1/models
Model Provider Crate
The gateway can reuse crates/model-provider for upstream calls. The crate
already contains concrete providers and meta-providers:
- OpenAI
- Azure OpenAI
- Anthropic
- Bedrock
- Codex
- Compatible
- Gemini
- GLM
- Ollama
- OpenRouter
- Telnyx
- Copilot
- CLI providers where operationally appropriate
- RouterProvider
- ReliableProvider
For the gateway, direct concrete providers are upstream adapters. Routing and fallback should be controlled by gateway config and policy, not by each agent.
Request Flow
agent
-> LLM provider trait
-> CompatibleProvider
-> light-gateway /v1/chat/completions
-> auth, correlation, policy, rate limit
-> LLM route decision
-> upstream provider adapter
-> upstream LLM provider
-> normalized response
-> audit, metrics, token usage
-> agent
The agent sees one model provider. The gateway sees the full routing context.
Routing Inputs
The gateway should make routing decisions from a combination of trusted inputs:
- Authenticated caller identity from JWT, mTLS, or gateway-authenticated service registration.
- Agent metadata such as host id, agent definition id, service id, environment, tenant, and account.
- Logical model name from the request body.
- Request capabilities: tool calling, vision, JSON mode, long context, reasoning, streaming, prompt caching.
- Prompt features: intent keywords, size, language, sensitivity markers, coding vs support vs workflow execution.
- Configured policy: allowed providers, blocked providers, region constraints, cost tier, data residency, fallback chain.
- Runtime health: provider availability, error rate, latency, quota pressure.
If metadata is supplied as headers, the gateway should only trust those headers from authenticated internal clients. Otherwise it should derive identity from the token or connection.
Suggested internal headers:
X-Light-Request-Id
X-Light-Service-Id
X-Light-Env-Tag
X-Light-Agent-Host-Id
X-Light-Agent-Definition-Id
X-Light-Tenant-Id
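The header trust rule above can be sketched as a small resolution function. This is a minimal illustration, not the gateway's actual code: the `Caller` enum and `resolve_service_id` are hypothetical names, and the sketch assumes header names have already been lowercased by the HTTP layer.

```rust
use std::collections::HashMap;

/// How the caller was authenticated (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum Caller {
    /// Authenticated internal client, for example mTLS or a registered service.
    Internal { token_service_id: String },
    /// Any other authenticated caller; identity comes from the token only.
    External { token_service_id: String },
}

/// Resolve the service id used for routing.
/// The X-Light-Service-Id header is honored only for internal callers;
/// for everyone else the identity derived from the token wins.
pub fn resolve_service_id(caller: &Caller, headers: &HashMap<String, String>) -> String {
    match caller {
        Caller::Internal { token_service_id } => headers
            .get("x-light-service-id")
            .cloned()
            .unwrap_or_else(|| token_service_id.clone()),
        Caller::External { token_service_id } => token_service_id.clone(),
    }
}
```

With this shape, a header sent by an external caller is ignored rather than rejected. Whether the gateway should instead fail such requests is a policy choice this sketch does not settle.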
Routing Stages
Routing should be deterministic before it is intelligent.
- Explicit route: If the request asks for a logical model with a direct configured route, use that route.
- Agent policy: Apply policy for the authenticated agent or service. This can narrow the allowed logical models and upstream providers.
- Capability filter: Remove upstreams that cannot satisfy required capabilities such as tools, vision, long context, or streaming.
- Prompt classifier: Optionally classify the prompt into a routing domain such as fast, reasoning, coding, customer-support, or restricted-data.
- Cost and latency preference: Choose the cheapest or fastest provider that satisfies policy and capability constraints.
- Health and fallback: If the selected upstream is unhealthy or returns a retryable error, follow a configured fallback chain.
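The deterministic stages can be sketched as one selection function. This is a simplified illustration under stated assumptions: all type and function names (`Target`, `Route`, `Provider`, `AgentPolicy`, `select_target`, `string_set`) are hypothetical, the prompt classifier and cost/latency stages are left out, and health is reduced to a boolean flag.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
pub struct Target {
    pub provider: String,
    pub model: String,
}

pub struct Route {
    pub primary: Target,
    pub fallbacks: Vec<Target>,
}

pub struct Provider {
    pub capabilities: HashSet<String>,
    pub healthy: bool,
}

pub struct AgentPolicy {
    pub allowed_routes: HashSet<String>,
    pub blocked_providers: HashSet<String>,
}

#[derive(Debug, PartialEq)]
pub enum RouteError {
    UnknownRoute,
    RouteNotAllowed,
    NoEligibleUpstream,
}

/// Small helper to build a set of owned strings from literals.
pub fn string_set(items: &[&str]) -> HashSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}

pub fn select_target(
    logical_model: &str,
    required: &[&str],
    routes: &HashMap<String, Route>,
    providers: &HashMap<String, Provider>,
    policy: &AgentPolicy,
) -> Result<Target, RouteError> {
    // Explicit route: the logical model must name a configured route.
    let route = routes.get(logical_model).ok_or(RouteError::UnknownRoute)?;

    // Agent policy: the caller must be allowed to use that route.
    if !policy.allowed_routes.contains(logical_model) {
        return Err(RouteError::RouteNotAllowed);
    }

    // Capability filter plus health and fallback: walk the primary target,
    // then the fallback chain, and take the first upstream that is not
    // blocked, is healthy, and offers every required capability.
    std::iter::once(&route.primary)
        .chain(route.fallbacks.iter())
        .find(|t| {
            !policy.blocked_providers.contains(&t.provider)
                && providers.get(&t.provider).map_or(false, |p| {
                    p.healthy && required.iter().all(|c| p.capabilities.contains(*c))
                })
        })
        .cloned()
        .ok_or(RouteError::NoEligibleUpstream)
}
```

Every outcome here follows from config and observed health alone, so the same inputs always yield the same target and the routing reason can be recorded for audit.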
Gateway Configuration
The gateway should use a dedicated config file, for example
llm-gateway.yml, loaded through the same runtime config layering as other
light-fabric modules.
Example:
enabled: ${llm-gateway.enabled:true}
pathPrefix: ${llm-gateway.pathPrefix:/v1}
defaultRoute: ${llm-gateway.defaultRoute:agent-default}
routes:
  agent-default:
    provider: openai-prod
    model: gpt-4o
    fallbacks:
      - provider: bedrock-us
        model: anthropic.claude-3-5-sonnet-20240620-v1:0
  fast:
    provider: openai-prod
    model: gpt-4o-mini
  reasoning:
    provider: bedrock-us
    model: anthropic.claude-3-7-sonnet-20250219-v1:0
    requiredCapabilities:
      - tools
      - long-context
providers:
  openai-prod:
    type: openai
    baseUrl: ${llm.openai.baseUrl:https://api.openai.com/v1}
    apiKey: ${llm.openai.apiKey:}
    maxTokens: ${llm.openai.maxTokens:}
    costTier: medium
    regions:
      - global
  bedrock-us:
    type: bedrock
    region: ${llm.bedrock.region:us-east-1}
    accessKeyId: ${llm.bedrock.accessKeyId:}
    secretAccessKey: ${llm.bedrock.secretAccessKey:}
    sessionToken: ${llm.bedrock.sessionToken:}
    costTier: high
    regions:
      - us-east-1
agentPolicies:
  com.networknt.agent.account-1.0.0:
    defaultRoute: agent-default
    allowedRoutes:
      - agent-default
      - fast
      - reasoning
    blockedProviders: []
    dataResidency:
      allowedRegions:
        - us-east-1
        - global
fallback:
  maxRetries: ${llm-gateway.fallback.maxRetries:1}
  baseBackoffMs: ${llm-gateway.fallback.baseBackoffMs:100}
The exact schema can evolve, but the important boundary is stable:
- Agent config points to one gateway endpoint.
- Gateway config owns provider inventory and route policy.
- Provider secrets are masked in module registry output.
Provider Inventory
Each configured provider should have:
- A stable provider id.
- A provider type.
- Provider-specific connection settings.
- Supported capabilities.
- Allowed regions.
- Cost tier.
- Timeout and retry settings.
- Optional quota metadata.
- Optional tenant or account restrictions.
Provider ids should be operational names, not user-visible model names:
providers:
  openai-prod:
    type: openai
  openai-eu:
    type: azure-openai
  bedrock-us:
    type: bedrock
  local-ollama:
    type: ollama
Logical model names are route names. They are safe for agents to request.
Request And Response Contract
The first API should be OpenAI-compatible enough for CompatibleProvider:
POST /v1/chat/completions
Authorization: Bearer <agent-or-service-token>
Content-Type: application/json
Request:
{
  "model": "agent-default",
  "messages": [
    {"role": "system", "content": "You are a support agent."},
    {"role": "user", "content": "Help me investigate this account."}
  ],
  "temperature": 0.7,
  "tools": []
}
Response:
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 1200,
    "completion_tokens": 240,
    "total_tokens": 1440
  },
  "light_gateway": {
    "route": "agent-default",
    "provider": "openai-prod",
    "model": "gpt-4o"
  }
}
The light_gateway field should be optional and controlled by diagnostics
policy. It is useful for internal debugging, but may be hidden from external
clients.
Tool Calling
Tool calling remains an agent responsibility, but model-native tool-call generation flows through the LLM gateway.
The LLM gateway should:
- Accept OpenAI-style tool definitions from the agent.
- Convert tool definitions to the upstream provider's native format when possible.
- Normalize provider tool-call responses back to the OpenAI-compatible shape.
- Return clear errors when a route cannot support tool calling.
The gateway should not execute MCP tools. The agent still calls
light-gateway MCP endpoints for tools/list and tools/call.
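As an illustration of the conversion and normalization steps, the sketch below maps an OpenAI-style function tool to the Anthropic tool shape and maps Anthropic stop reasons back to OpenAI finish reasons. The struct and function names are hypothetical. To stay dependency-free, the JSON Schema is carried as raw JSON text and the nested OpenAI `function` object is flattened into one struct. Treating any unrecognized stop reason as `stop` is an assumption of this sketch.

```rust
/// OpenAI-style function tool as sent by the agent (flattened for this sketch).
/// `parameters` holds the JSON Schema as raw JSON text.
#[derive(Debug, Clone, PartialEq)]
pub struct OpenAiTool {
    pub name: String,
    pub description: String,
    pub parameters: String,
}

/// Anthropic-style tool definition; the schema field is named `input_schema`.
#[derive(Debug, Clone, PartialEq)]
pub struct AnthropicTool {
    pub name: String,
    pub description: String,
    pub input_schema: String,
}

/// Request direction: rename `parameters` to `input_schema`, keep the rest.
pub fn to_anthropic_tool(tool: &OpenAiTool) -> AnthropicTool {
    AnthropicTool {
        name: tool.name.clone(),
        description: tool.description.clone(),
        input_schema: tool.parameters.clone(),
    }
}

/// Response direction: map an Anthropic `stop_reason` to an OpenAI `finish_reason`.
pub fn to_openai_finish_reason(stop_reason: &str) -> &'static str {
    match stop_reason {
        "tool_use" => "tool_calls",
        "max_tokens" => "length",
        "end_turn" | "stop_sequence" => "stop",
        // Assumption: unknown reasons are reported as a normal stop.
        _ => "stop",
    }
}
```

A real adapter would also translate the tool-call payloads themselves, which this sketch leaves out.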
Security
The gateway becomes the model egress control point, so it must enforce:
- Authentication for every model request.
- Authorization for logical models and provider routes.
- Tenant isolation.
- Provider allowlists and denylists.
- Secret masking in module registry and diagnostics.
- Optional request/response redaction or tokenization hooks.
- Data residency rules.
- Rate limits by tenant, agent, route, and provider.
- Audit records for route selection and usage.
Provider credentials should live only in gateway config or the secret system feeding config-server. They should not be copied into agent config.
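Secret masking for module registry and diagnostics output could look roughly like the following. This is a sketch only: `mask_secrets` and the `SENSITIVE_KEYS` list are hypothetical, the key names are taken from the example provider settings in this document, and matching on exact key names is an assumption rather than the module registry's real rule.

```rust
use std::collections::BTreeMap;

/// Setting names treated as secrets in this sketch (assumed list).
const SENSITIVE_KEYS: [&str; 4] = ["apiKey", "accessKeyId", "secretAccessKey", "sessionToken"];

/// Return a copy of the provider settings that is safe to display.
/// Non-empty secret values are replaced by a fixed-length mask so that
/// neither the value nor its length leaks; empty values stay empty.
pub fn mask_secrets(settings: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    settings
        .iter()
        .map(|(key, value)| {
            let shown = if SENSITIVE_KEYS.contains(&key.as_str()) && !value.is_empty() {
                "*".repeat(8)
            } else {
                value.clone()
            };
            (key.clone(), shown)
        })
        .collect()
}
```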
Observability
Each gateway model call should produce structured telemetry:
- Request id.
- Caller identity.
- Agent id and service id when available.
- Logical model.
- Selected provider and physical model.
- Routing reason.
- Fallback attempts.
- Prompt and completion token counts.
- Latency by stage.
- Provider status code and error class.
- Cache hit or prompt-cache usage where available.
- Policy decisions.
Metrics should support dashboards by route, provider, tenant, and agent.
Config Reload
The gateway should register llm-gateway.yml in the module registry and support
runtime reload.
Reload should be atomic:
- Load and validate the new config.
- Build provider clients and route tables.
- Reject invalid route references before swapping state.
- Swap active routing state.
- Keep in-flight requests on the old state.
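One way to get these reload semantics is to keep the active state behind a lock that holds a shared pointer. The sketch below uses `RwLock<Arc<...>>` from the standard library. `Gateway`, `RoutingState`, and the single validation rule are placeholders for illustration, not the actual gateway types.

```rust
use std::sync::{Arc, RwLock};

/// Immutable routing snapshot built from one version of the config.
#[derive(Debug)]
pub struct RoutingState {
    pub version: u64,
    pub default_route: String,
}

pub struct Gateway {
    active: RwLock<Arc<RoutingState>>,
}

impl Gateway {
    pub fn new(initial: RoutingState) -> Self {
        Gateway { active: RwLock::new(Arc::new(initial)) }
    }

    /// Called once at the start of a request. The returned Arc keeps that
    /// snapshot alive for the whole request, even if a reload happens.
    pub fn snapshot(&self) -> Arc<RoutingState> {
        let guard = self.active.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Validate the candidate first; swap only when it is valid.
    /// On error the previously active state stays in place.
    pub fn reload(&self, candidate: RoutingState) -> Result<(), String> {
        if candidate.default_route.is_empty() {
            return Err("defaultRoute must not be empty".to_string());
        }
        let next = Arc::new(candidate);
        *self.active.write().unwrap() = next;
        Ok(())
    }
}
```

The write lock is held only for the pointer swap, so provider clients and route tables can be built before taking it. The old state is dropped when the last in-flight request releases its snapshot.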
Validation should catch:
- Unknown provider ids.
- Unknown provider types.
- Routes without a provider.
- Fallbacks pointing to missing providers.
- Logical routes that require capabilities no provider can satisfy.
- Missing required provider settings for active routes.
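A few of these checks can be sketched as a pure function that returns every problem it finds, so an operator sees all errors from one reload attempt. The names `RouteConfig`, `KNOWN_TYPES`, and `validate` are hypothetical, the type list is only the subset used in this document's examples, and the capability and required-settings checks are omitted.

```rust
use std::collections::BTreeMap;

/// Provider types accepted by this sketch (illustrative subset).
const KNOWN_TYPES: [&str; 4] = ["openai", "azure-openai", "bedrock", "ollama"];

pub struct RouteConfig {
    pub provider: String,
    /// Provider ids of the fallback chain, in order.
    pub fallbacks: Vec<String>,
}

/// `providers` maps provider id to provider type; `routes` maps logical
/// model name to its route. Returns one message per problem found.
pub fn validate(
    providers: &BTreeMap<String, String>,
    routes: &BTreeMap<String, RouteConfig>,
) -> Vec<String> {
    let mut errors = Vec::new();

    for (id, kind) in providers {
        if !KNOWN_TYPES.contains(&kind.as_str()) {
            errors.push(format!("provider '{}' has unknown type '{}'", id, kind));
        }
    }

    for (name, route) in routes {
        if route.provider.is_empty() {
            errors.push(format!("route '{}' has no provider", name));
        } else if !providers.contains_key(&route.provider) {
            errors.push(format!("route '{}' uses unknown provider '{}'", name, route.provider));
        }
        for fallback in &route.fallbacks {
            if !providers.contains_key(fallback) {
                errors.push(format!(
                    "route '{}' fallback points to missing provider '{}'",
                    name, fallback
                ));
            }
        }
    }

    errors
}
```

An empty result means the candidate config passed these checks and may proceed to the state swap.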
Agent Configuration Pattern
For a direct provider deployment:
model-provider.provider: bedrock
model-provider.model: anthropic.claude-3-5-sonnet-20240620-v1:0
bedrock.region: us-east-1
For a gateway deployment:
model-provider.provider: compatible
model-provider.model: agent-default
compatible.name: llm-gateway
compatible.baseUrl: https://llm-gateway.light-gateway:8443/v1
compatible.apiKey: ${secret.llmGatewayApiKey}
The second form is the preferred enterprise model once centralized routing is available.
Phased Implementation
Phase 1: OpenAI-Compatible Gateway Endpoint
- Add llm-gateway.yml.
- Add a light-gateway handler for /v1/chat/completions.
- Support non-streaming OpenAI-compatible requests and responses.
- Route by logical model name to one configured upstream provider.
- Mask provider secrets in module registry.
- Add basic audit and metrics.
Phase 2: Policy, Fallback, And Reload
- Add per-agent route policy.
- Add health-aware fallback chains.
- Support runtime reload with atomic state swap.
- Add diagnostics endpoint or module registry details for active routes.
Phase 3: Capability-Aware Routing
- Add capability metadata for each provider route.
- Route by tools, vision, long context, streaming, and JSON mode.
- Normalize tool-call request and response shapes across providers.
Phase 4: Prompt-Aware Routing
- Add deterministic prompt classifiers.
- Add optional lightweight model or embedding classifier for complex routing.
- Record routing reasons for audit.
Phase 5: Advanced Provider Features
- Add streaming.
- Add /v1/responses.
- Add prompt caching hints.
- Add quota-aware routing.
- Add data redaction or tokenization hooks when the tokenization service contract is finalized.
Open Questions
- Should the first implementation live in light-pingora as a handler module or in a new llm-gateway crate used by light-gateway?
- Should logical model policy be stored only in config-server values, or also managed by portal database tables for runtime UI edits?
- Should gateway diagnostics expose selected provider/model to agents, or only to operators?
- Should prompt-aware routing use Light-Rule first, a dedicated classifier, or both?
- How should provider quota information be collected for cloud providers that do not expose uniform quota APIs?
Decision
Use the LLM gateway as the single model provider endpoint for enterprise
agents. light-agent stays single-provider from its own point of view. The
gateway owns multiple upstream providers, route selection, fallback,
credentials, policy, audit, and observability.