AI Harness

AI agents need an operating model before they touch enterprise workflows.

AI Harness is self-hosted TypeScript infrastructure for building provider-neutral LLM agent systems inside an application or platform, with typed agents, workflows, tools, skills, state, sandboxing, run events, traces, and eval helpers.


Enterprise pressure

Why it matters

AI pilots are easy to demo and hard to operate. Prompts, tools, model calls, approvals, traces, state, and provider choices scatter across notebooks, SDK snippets, SaaS automations, and hidden scripts. That is not an enterprise operating model. It is liability with a chat interface.

Product answer

What changes

AI Harness puts the AI runtime inside the application boundary. Teams define typed agents for model-driven loops, typed workflows for orchestration, explicit tools and skills for capability, provider adapters for model choice, and run events, traces, state, sandboxing, review gates, and eval helpers around the execution path.

Operating model

How the model holds

The application creates a harness session; agents handle typed model loops, workflows orchestrate business processes, tools and skills provide capability, adapters connect providers, and state, sandbox, telemetry, run events, review gates, and eval helpers wrap the execution boundary.

[Diagram: AI Harness. Typed agent: intent (model loop + tools). Owned boundary: session (state, policy, review). Provider edge: adapters (models, sandbox, telemetry). Surrounding labels: skills, run events, evals, traces.]

Decision path

AI Harness is for the point where AI stops being an experiment and starts touching work that has owners, budgets, risks, and audit trails.

The core move is simple: do not let the model provider become the architecture. Put the agent runtime inside the enterprise application boundary. Keep the model call behind adapters. Keep tools explicit. Keep workflows typed. Keep approvals, state, traces, and evals close to the business process.

Agents handle the model-driven loop: prepare messages, call a model, execute tools, continue until a validated output exists, and emit run events. Workflows handle the surrounding business process: sequence agents, branch, parallelize, apply deterministic logic, request review, write artifacts, and persist state.
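
The agent loop described above can be sketched in a few lines of TypeScript. This is an illustrative sketch only, not the AI Harness API: `ModelAdapter`, `Tool`, `RunEvent`, `runAgent`, and `scriptedAdapter` are hypothetical names, the type guard stands in for a real output schema, and the scripted adapter stands in for a real provider.

```typescript
// Hypothetical types for illustration; these are NOT the AI Harness API.
type Message = { role: "user" | "assistant" | "tool"; content: string };

type ModelReply =
  | { kind: "tool_call"; tool: string; args: unknown }
  | { kind: "final"; output: unknown };

// The model call sits behind an interface, so the loop never names a provider.
interface ModelAdapter {
  complete(messages: Message[]): Promise<ModelReply>;
}

type Tool = (args: unknown) => Promise<string>;

type RunEvent = {
  type: "model_called" | "tool_executed" | "output_validated" | "output_rejected";
  step: number;
};

interface AgentOptions<T> {
  adapter: ModelAdapter;
  tools: Map<string, Tool>;
  validate: (output: unknown) => output is T; // stand-in for a schema check
  onEvent: (event: RunEvent) => void;
  maxSteps: number;
}

async function runAgent<T>(prompt: string, opts: AgentOptions<T>): Promise<T> {
  // 1. Prepare messages.
  const messages: Message[] = [{ role: "user", content: prompt }];

  for (let step = 1; step <= opts.maxSteps; step++) {
    // 2. Call the model through the adapter.
    const reply = await opts.adapter.complete(messages);
    opts.onEvent({ type: "model_called", step });

    if (reply.kind === "tool_call") {
      // 3. Execute only tools that were explicitly registered.
      const tool = opts.tools.get(reply.tool);
      if (!tool) throw new Error(`Unknown tool: ${reply.tool}`);
      messages.push({ role: "tool", content: await tool(reply.args) });
      opts.onEvent({ type: "tool_executed", step });
      continue;
    }

    // 4. Return only validated output; otherwise ask the model again.
    const output = reply.output;
    if (opts.validate(output)) {
      opts.onEvent({ type: "output_validated", step });
      return output;
    }
    opts.onEvent({ type: "output_rejected", step });
    messages.push({ role: "user", content: "Output failed validation; try again." });
  }
  throw new Error("No validated output within maxSteps");
}

// Scripted stand-in for a provider: returns the given replies in order.
function scriptedAdapter(replies: ModelReply[]): ModelAdapter {
  let next = 0;
  return {
    complete: async () => {
      const reply = replies[next++];
      if (!reply) throw new Error("Script exhausted");
      return reply;
    },
  };
}
```

The run events in this sketch carry only an event type and a step number. A workflow would call something like `runAgent` as one step among deterministic logic, review requests, and state writes.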

That gives decision makers a path from AI prototype to enterprise platform: provider choice remains open, tool execution is controlled, outputs are validated, review gates are visible, and operational evidence exists without turning prompts and customer data into accidental log retention.
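
One way to keep operational evidence without retaining payloads is to build each run event from an explicit allow-list of metadata fields. The sketch below assumes hypothetical shapes (`ToolCallRecord`, `ToolExecutedEvent`, `toRunEvent`); it is not the AI Harness event schema.

```typescript
// Hypothetical shapes for illustration; NOT the AI Harness event schema.
interface ToolCallRecord {
  runId: string;
  tool: string;
  input: string; // may contain prompts or customer data
  output: string; // may contain prompts or customer data
  startedAt: number; // epoch milliseconds
  endedAt: number; // epoch milliseconds
  ok: boolean;
}

interface ToolExecutedEvent {
  runId: string;
  type: "tool_executed";
  tool: string;
  durationMs: number;
  inputChars: number;
  outputChars: number;
  ok: boolean;
}

// Copy fields one by one instead of spreading the record, so payloads
// cannot reach telemetry by being serialized wholesale.
function toRunEvent(record: ToolCallRecord): ToolExecutedEvent {
  return {
    runId: record.runId,
    type: "tool_executed",
    tool: record.tool,
    durationMs: record.endedAt - record.startedAt,
    inputChars: record.input.length,
    outputChars: record.output.length,
    ok: record.ok,
  };
}
```

The event still answers operational questions (which tool ran, how long it took, whether it succeeded) while the input and output text stay out of the log.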

Enterprise relation

Where this connects to enterprise pressure.

Provider neutrality keeps model selection, procurement, fallback, and regional deployment choices open.

Typed input and output schemas make AI behavior reviewable before it becomes business state.

Review gates support human approval for mutations and policy decisions, and rejection of stale runs.

Privacy-safe telemetry defaults avoid persisting prompts, model outputs, tool payloads, files, memory, or user data in run events.

Sandbox and tool controls make execution capability explicit instead of implicit in an agent prompt.
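
As an illustration of the review-gate point above, the sketch below applies a proposed mutation only when a reviewer has approved it and the state has not moved since the run read it. All names (`ProposedChange`, `Versioned`, `reviewGate`) are hypothetical, and the version counter is an assumed mechanism for detecting stale runs, not a description of how AI Harness does it.

```typescript
// Hypothetical types for illustration; NOT the AI Harness review API.
interface Versioned<S> {
  version: number;
  state: S;
}

interface ProposedChange<S> {
  runId: string;
  basedOnVersion: number; // state version the run read before proposing
  apply: (state: S) => S; // the mutation awaiting approval
}

type ReviewResult<S> =
  | { status: "applied"; next: Versioned<S> }
  | { status: "rejected"; reason: "not_approved" | "stale_run" };

function reviewGate<S>(
  current: Versioned<S>,
  change: ProposedChange<S>,
  approved: boolean,
): ReviewResult<S> {
  // No approval, no mutation.
  if (!approved) return { status: "rejected", reason: "not_approved" };

  // The state changed after the run read it, so the proposal is stale.
  if (change.basedOnVersion !== current.version) {
    return { status: "rejected", reason: "stale_run" };
  }

  return {
    status: "applied",
    next: { version: current.version + 1, state: change.apply(current.state) },
  };
}
```

Returning a tagged result rather than throwing keeps both rejection reasons visible to the workflow that requested the review.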

Capabilities

AI Harness turns pressure into an operating model.

Typed agents for LLM conversation loops with tool execution and validated outputs

Typed workflows for sequencing, branching, parallel agents, review gates, durable writes, and artifacts

Provider-neutral model operations for text, structured objects, multimodal work, embeddings, and reranking

TypeScript, built-in, and MCP tools plus reusable skill directories

State, sandboxing, logs, traces, run events, and provider-neutral eval helpers

Provider adapters for OpenAI, Anthropic, Amazon Bedrock, and Azure AI Foundry

Enterprise use cases

Where decision makers should care.

Build internal agent platforms without handing the operating model to a single model provider

Move AI workflows from demo scripts into typed, observable, reviewable application code

Add human approval before mutation, state changes, or sensitive tool execution

Compare prompt candidates and run scorer tests before AI behavior reaches production workflows

Keep traces, run events, and retained state aligned with enterprise data-handling policy
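
To make the prompt-comparison use case concrete, here is a minimal sketch that ranks prompt candidates by mean scorer result over a small test set. `EvalCase`, `Generate`, `Scorer`, `exactMatch`, and `compareCandidates` are hypothetical names rather than AI Harness eval helpers, and `Generate` is a synchronous stand-in for what would normally be an asynchronous model call.

```typescript
// Hypothetical helper names for illustration; NOT the AI Harness eval API.
interface EvalCase {
  input: string;
  expected: string;
}

type Generate = (prompt: string, input: string) => string; // stand-in for a model call
type Scorer = (output: string, expected: string) => number; // score in [0, 1]

// Simple scorer: 1 for a case-insensitive exact match, otherwise 0.
const exactMatch: Scorer = (output, expected) =>
  output.trim().toLowerCase() === expected.trim().toLowerCase() ? 1 : 0;

// Score every prompt candidate on every case and rank by mean score, best first.
function compareCandidates(
  prompts: string[],
  cases: EvalCase[],
  generate: Generate,
  score: Scorer,
): { prompt: string; meanScore: number }[] {
  return prompts
    .map((prompt) => {
      const total = cases.reduce(
        (sum, c) => sum + score(generate(prompt, c.input), c.expected),
        0,
      );
      return { prompt, meanScore: cases.length === 0 ? 0 : total / cases.length };
    })
    .sort((a, b) => b.meanScore - a.meanScore);
}
```

Because the generator and scorer are passed in as functions, the same comparison can run against any provider adapter or a recorded fixture.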

Links

Docs

Repository
