AI Harness

AI agents need an operating model before they touch enterprise workflows.

AI Harness is self-hosted TypeScript infrastructure for building provider-neutral LLM agent systems inside an application or platform, with typed agents, workflows, tools, skills, state, sandboxing, run events, traces, and eval helpers.

Why it matters

AI pilots are easy to demo and hard to operate. Prompts, tools, model calls, approvals, traces, state, and provider choices scatter across notebooks, SDK snippets, SaaS automations, and hidden scripts. That is not an enterprise operating model. It is liability with a chat interface.

What changes

AI Harness puts the AI runtime inside the application boundary. Teams define typed agents for model-driven loops, typed workflows for orchestration, explicit tools and skills for capability, provider adapters for model choice, and run events, traces, state, sandboxing, review gates, and eval helpers around the execution path.

How the model holds

The application creates a harness session; agents handle typed model loops, workflows orchestrate business processes, tools and skills provide capability, adapters connect providers, and state, sandbox, telemetry, run events, review gates, and eval helpers wrap the execution boundary.
[Diagram: AI Harness. Typed agent (intent): model loop + tools. Owned boundary (session): state, policy, review. Provider edge (adapters): models, sandbox, telemetry. Around the boundary: skills, run events, evals, traces.]

AI Harness is for the point where AI stops being an experiment and starts touching work that has owners, budgets, risks, and audit trails.

The core move is simple: do not let the model provider become the architecture. Put the agent runtime inside the enterprise application boundary. Keep the model call behind adapters. Keep tools explicit. Keep workflows typed. Keep approvals, state, traces, and evals close to the business process.
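To make "keep the model call behind adapters" concrete, here is a minimal sketch. Every name in it (`ModelAdapter`, `generateText`, `fakeAdapter`, `withFallback`) is a hypothetical illustration, not the actual AI Harness API, and the adapters are stand-ins that never call a real provider. The call is synchronous for brevity; a real model call would be asynchronous.

```typescript
// Hypothetical shapes for illustration; not the actual AI Harness API.
interface ModelRequest {
  system: string;
  prompt: string;
}

interface ModelResponse {
  text: string;
  provider: string;
}

// Application code depends on this interface, never on a provider SDK.
interface ModelAdapter {
  readonly provider: string;
  generateText(request: ModelRequest): ModelResponse;
}

// Stand-in adapter. A real adapter would wrap a provider SDK call here.
function fakeAdapter(provider: string, fail = false): ModelAdapter {
  return {
    provider,
    generateText(request) {
      if (fail) throw new Error(`${provider} unavailable`);
      return { text: `[${provider}] ${request.prompt}`, provider };
    },
  };
}

// Try adapters in order, so fallback is configuration rather than a rewrite.
function withFallback(adapters: ModelAdapter[]): ModelAdapter {
  return {
    provider: adapters.map((a) => a.provider).join(">"),
    generateText(request) {
      let lastError: unknown = new Error("no adapters configured");
      for (const adapter of adapters) {
        try {
          return adapter.generateText(request);
        } catch (error) {
          lastError = error; // remember the failure and try the next adapter
        }
      }
      throw lastError;
    },
  };
}

// The first adapter fails, so the second one answers.
const model = withFallback([fakeAdapter("primary", true), fakeAdapter("secondary")]);
const reply = model.generateText({ system: "Be brief.", prompt: "Summarize the ticket." });
console.log(reply.provider);
```

Because callers only see `ModelAdapter`, swapping or reordering providers in this sketch changes one array, not the agents or workflows built on top of it.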

Agents handle the model-driven loop: prepare messages, call a model, execute tools, continue until a validated output exists, and emit run events. Workflows handle the surrounding business process: sequence agents, branch, parallelize, apply deterministic logic, request review, write artifacts, and persist state.
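The agent loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the AI Harness implementation: `runAgent`, `AgentConfig`, `ModelTurn`, and `RunEvent` are hypothetical names, the model is a scripted stand-in, and the loop is synchronous for brevity. Note that the run events in this sketch carry only metadata (event type, step, tool name), never message or tool payloads.

```typescript
// Hypothetical types for illustration; not the actual AI Harness API.
type Message = { role: "user" | "assistant" | "tool"; content: string };

// The model either asks for a tool or proposes a final answer.
type ModelTurn =
  | { kind: "tool_call"; tool: string; input: string }
  | { kind: "final"; output: unknown };

// Run events carry metadata only, never prompts or tool payloads.
type RunEvent = { type: "model_call" | "tool_call" | "completed"; step: number; tool?: string };

interface AgentConfig<T> {
  model: (messages: Message[]) => ModelTurn;
  tools: Record<string, (input: string) => string>;
  validate: (output: unknown) => T | undefined; // undefined means invalid
  maxSteps: number;
}

function runAgent<T>(config: AgentConfig<T>, userInput: string, emit: (e: RunEvent) => void): T {
  const messages: Message[] = [{ role: "user", content: userInput }];
  for (let step = 1; step <= config.maxSteps; step++) {
    emit({ type: "model_call", step });
    const turn = config.model(messages);
    if (turn.kind === "tool_call") {
      const tool = config.tools[turn.tool];
      if (!tool) throw new Error(`Unknown tool: ${turn.tool}`);
      emit({ type: "tool_call", step, tool: turn.tool });
      messages.push({ role: "tool", content: tool(turn.input) });
      continue;
    }
    const validated = config.validate(turn.output);
    if (validated !== undefined) {
      emit({ type: "completed", step });
      return validated;
    }
    // Invalid output: tell the model and loop again.
    messages.push({ role: "user", content: "Output failed validation; try again." });
  }
  throw new Error("Agent exceeded maxSteps without a validated output");
}

// Scripted stand-in model: call a tool first, then answer using its result.
const events: RunEvent[] = [];
const total = runAgent<{ total: number }>(
  {
    model: (messages) => {
      const last = messages[messages.length - 1];
      return last?.role === "tool"
        ? { kind: "final", output: { total: Number(last.content) } }
        : { kind: "tool_call", tool: "sum", input: "2,3" };
    },
    tools: { sum: (input) => String(input.split(",").map(Number).reduce((a, b) => a + b, 0)) },
    validate: (output) => {
      const candidate = output as { total?: unknown } | null;
      return typeof candidate?.total === "number" ? { total: candidate.total } : undefined;
    },
    maxSteps: 4,
  },
  "Add 2 and 3",
  (event) => events.push(event),
);
console.log(total, events.map((e) => e.type));
```

The `maxSteps` bound is a design choice in this sketch: a loop that continues "until a validated output exists" needs an explicit limit so a model that never validates fails loudly instead of running forever.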

That gives decision makers a path from AI prototype to enterprise platform: provider choice remains open, tool execution is controlled, outputs are validated, review gates are visible, and operational evidence exists without accidentally retaining prompts and customer data in logs.

Where this connects to enterprise pressure.

Provider neutrality keeps model selection, procurement, fallback, and regional deployment choices open.
Typed input and output schemas make AI behavior reviewable before it becomes business state.
Review gates support human approval for mutations and policy decisions, and rejection of stale runs.
Privacy-safe telemetry defaults avoid persisting prompts, model outputs, tool payloads, files, memory, or user data in run events.
Sandbox and tool controls make execution capability explicit instead of implicit in an agent prompt.
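One way to picture a review gate with stale-run rejection is the sketch below. It assumes a simple version counter on the state being changed; `ReviewGate`, `ProposedMutation`, and `GateResult` are hypothetical names chosen for illustration and do not come from AI Harness itself.

```typescript
// Hypothetical shapes for illustration; not the actual AI Harness API.
interface ProposedMutation {
  runId: string;
  stateVersion: number; // version of the state the run read before proposing
  description: string;
}

type ReviewDecision = { approved: boolean; reviewer: string };

type GateResult =
  | { status: "applied"; newVersion: number }
  | { status: "rejected"; reason: "stale_run" | "denied" };

class ReviewGate {
  private currentVersion: number;

  constructor(initialVersion: number) {
    this.currentVersion = initialVersion;
  }

  // Apply a mutation only if the run is not stale and a human approved it.
  submit(mutation: ProposedMutation, decision: ReviewDecision): GateResult {
    if (mutation.stateVersion !== this.currentVersion) {
      return { status: "rejected", reason: "stale_run" };
    }
    if (!decision.approved) {
      return { status: "rejected", reason: "denied" };
    }
    this.currentVersion += 1;
    return { status: "applied", newVersion: this.currentVersion };
  }
}

const gate = new ReviewGate(7);
const approval: ReviewDecision = { approved: true, reviewer: "ops-lead" };

// Both runs read version 7; only the first to be approved may apply.
const first = gate.submit({ runId: "run-a", stateVersion: 7, description: "Update credit limit" }, approval);
const second = gate.submit({ runId: "run-b", stateVersion: 7, description: "Close account" }, approval);
console.log(first.status, second.status);
```

In this sketch the staleness check runs before the approval check, so an approval granted against outdated state can never be applied.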

AI Harness turns pressure into an operating model.

Typed agents for LLM conversation loops with tool execution and validated outputs

Typed workflows for sequencing, branching, parallel agents, review gates, durable writes, and artifacts

Provider-neutral model operations for text, structured objects, multimodal work, embeddings, and reranking

TypeScript, built-in, and MCP tools plus reusable skill directories

State, sandboxing, logs, traces, run events, and provider-neutral eval helpers

Provider adapters for OpenAI, Anthropic, Amazon Bedrock, and Azure AI Foundry
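As a rough illustration of what "typed workflows" can mean in practice, the sketch below composes two typed steps and branches deterministically. The names (`Step`, `sequence`, `extract`, `route`) and the invoice example are assumptions made for this sketch, not part of AI Harness, and the extraction step is a stand-in for an agent that would call a model.

```typescript
// Hypothetical shapes for illustration; not the actual AI Harness API.
interface Invoice {
  id: string;
  amount: number;
}

interface Extraction {
  invoice: Invoice;
  confidence: number;
}

type Outcome =
  | { route: "auto_approved"; invoiceId: string }
  | { route: "needs_review"; invoiceId: string; reason: string };

// A step is a typed function; agents and deterministic logic share the shape.
type Step<In, Out> = (input: In) => Out;

// Compose two steps so the compiler checks that outputs match inputs.
function sequence<A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> {
  return (input) => second(first(input));
}

// Stand-in for an agent step that would call a model in practice.
const extract: Step<string, Extraction> = (text) => {
  const parts = text.split(":");
  return {
    invoice: { id: parts[0] ?? "", amount: Number(parts[1] ?? "0") },
    confidence: 0.9,
  };
};

// Deterministic branch: low confidence or large amounts go to human review.
const route: Step<Extraction, Outcome> = ({ invoice, confidence }) =>
  confidence < 0.8 || invoice.amount > 10000
    ? { route: "needs_review", invoiceId: invoice.id, reason: "low confidence or high amount" }
    : { route: "auto_approved", invoiceId: invoice.id };

const workflow = sequence(extract, route);
const small = workflow("INV-1:250");
const large = workflow("INV-2:50000");
console.log(small.route, large.route);
```

Swapping the order of `extract` and `route` in `sequence` would fail type checking, which is the point of typing the workflow rather than wiring steps together with untyped payloads.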

Where decision makers should care.

Build internal agent platforms without handing the operating model to a single model provider
Move AI workflows from demo scripts into typed, observable, reviewable application code
Add human approval before mutation, state changes, or sensitive tool execution
Compare prompt candidates and run scorer tests before AI behavior reaches production workflows
Keep traces, run events, and retained state aligned with enterprise data-handling policy
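Comparing prompt candidates with a scorer might look like the sketch below. It is a minimal illustration, assuming an exact-match scorer and two stand-in candidates that return canned labels instead of calling a model; `EvalCase`, `Scorer`, `Candidate`, and `meanScore` are hypothetical names, not AI Harness eval helpers.

```typescript
// Hypothetical shapes for illustration; not the actual AI Harness API.
interface EvalCase {
  input: string;
  expected: string;
}

// A scorer returns a value between 0 and 1 for one output.
type Scorer = (output: string, expected: string) => number;

type Candidate = { name: string; run: (input: string) => string };

// Case-insensitive exact match: 1 for a hit, 0 for a miss.
const exactMatch: Scorer = (output, expected) =>
  output.trim().toLowerCase() === expected.trim().toLowerCase() ? 1 : 0;

// Average the scorer over all cases for one candidate.
function meanScore(candidate: Candidate, cases: EvalCase[], scorer: Scorer): number {
  if (cases.length === 0) return 0;
  const total = cases.reduce((sum, c) => sum + scorer(candidate.run(c.input), c.expected), 0);
  return total / cases.length;
}

const cases: EvalCase[] = [
  { input: "refund request", expected: "billing" },
  { input: "password reset", expected: "account" },
];

// Stand-ins for two prompt candidates; real ones would call a model through an adapter.
const candidates: Candidate[] = [
  { name: "prompt-v1", run: () => "billing" },
  { name: "prompt-v2", run: (input) => (input.includes("password") ? "Account" : "billing") },
];

// Rank candidates from best to worst mean score.
const ranked = candidates
  .map((c) => ({ name: c.name, score: meanScore(c, cases, exactMatch) }))
  .sort((a, b) => b.score - a.score);
console.log(ranked);
```

A ranking like this can run in continuous integration, so a prompt change that lowers the score is caught before it reaches a production workflow.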