
Runtime layer — connecting the toolkit to AI and LLMs

How the toolkit's OWL/SHACL/JSON-LD outputs become governance gates around live LLM calls. Covers flavors, grounding, payload assembly, input/output gates, the SDK, all five LLM adapters, and OCI Generative AI setup. For the SDK reference, see sdk.md.


8. Runtime — connecting the toolkit to AI and LLMs

The pipeline phases (1–5, TMF) build the semantic artifacts. The runtime layer is how an enterprise uses those artifacts to power AI agents and LLM applications. This section describes the architecture, the five components, and the two flows every deployment follows.

The core idea

After onboarding, an enterprise has:
- A master OWL 2 ontology describing every domain concept
- SHACL validation shapes enforcing data quality
- A JSON-LD context binding field names to ontology IRIs
- Ontology-annotated enterprise data in a relational database

The runtime layer assembles these into a governed payload sent to an LLM. The LLM responds. The response is validated, stamped with provenance, and stored back as a governed enterprise fact.

Ontology flavors — scoped views per agent type

An enterprise has many types of AI agents. A network monitoring agent, a billing agent, and a compliance agent all work with different concepts. You do not send the entire enterprise ontology to every agent — you send the relevant slice.

Each flavor is a named configuration file that defines:
- The subset of OWL classes and properties relevant to this agent type
- The corresponding SHACL shape subset for validation
- A scoped JSON-LD context with only the terms this agent needs
- The sensitivity-tier access level for this agent's authorisation

Flavors are stored in runtime/flavors/{name}.json. They are auto-generated from the sid_domain groupings in ontology_metadata and can be extended manually.
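As a rough illustration of the auto-generation step, the sketch below groups ontology classes by their sid_domain value and emits one flavor dict per domain. Everything beyond the idea of grouping is an assumption: the (class_name, sid_domain) row shape, the flavor keys (classes, shapes, context, sensitivity_tier), the example.org base IRI, the "{Class}Shape" naming convention, and the "internal" default tier are hypothetical, and the real ontology_metadata table and flavor JSON schema may differ.

```python
import json
from collections import defaultdict

# Hypothetical namespace; substitute the master ontology's real base IRI.
BASE_IRI = "https://example.org/ontology#"


def build_flavors(rows, default_tier="internal"):
    """Group (class_name, sid_domain) rows into one flavor dict per domain.

    The keys of each flavor dict are illustrative, not the toolkit's schema.
    """
    by_domain = defaultdict(set)
    for class_name, sid_domain in rows:
        by_domain[sid_domain].add(class_name)

    flavors = {}
    for domain, class_names in by_domain.items():
        classes = sorted(class_names)
        flavors[domain] = {
            "name": domain,
            "classes": classes,
            # One SHACL node shape per class, named by a hypothetical convention.
            "shapes": [f"{c}Shape" for c in classes],
            # Scoped JSON-LD context containing only this flavor's terms.
            "context": {c: BASE_IRI + c for c in classes},
            "sensitivity_tier": default_tier,
        }
    return flavors


rows = [
    ("Resource", "network-ops"),
    ("Alarm", "network-ops"),
    ("Product", "billing"),
    ("CustomerAccount", "billing"),
]
flavors = build_flavors(rows)
# Each entry could then be written to runtime/flavors/{name}.json.
print(json.dumps(flavors["billing"], indent=2))
```

Manual extension would then amount to editing the generated JSON, for example adding Agreement to the billing flavor's classes.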

Starter flavors (telecom):

| Flavor | Classes included | Typical agent use |
|---|---|---|
| network-ops | Resource, NetworkFunction, Alarm, PerformanceIndicator | Network monitoring, fault detection |
| billing | Product, ProductOrder, CustomerAccount, Agreement | Invoice generation, payment processing |
| compliance | Policy, Agreement, Party, PartyRole | Regulatory checks, audit trail queries |
| customer | Party, Service, Product, CustomerAccount | Customer service, CX analysis |
| fault-management | Alarm, ServiceProblem, Resource, Service | Incident management, root cause analysis |

Two flows

Flow A — Semantic preparation (offline, runs when schema or ontology changes)

Enterprise DB
    ↓ ontology_metadata annotations
Toolkit pipeline (Phases 1–5, TMF)
    ↓ OWL ontology + SHACL shapes + JSON-LD context
Flavor registry
    ↓ named scoped views over the master ontology
Ready for runtime consumption

This flow runs once per schema change or major ontology update. It populates the output/ directory with artifacts the runtime layer reads at query time.

Flow B — Runtime inference (per question, per agent invocation)

Step 1 — Select flavor
  Incoming question → choose relevant flavor (network-ops, billing, etc.)

Step 2 — Ground the data  ← THE CRITICAL STEP MOST IMPLEMENTATIONS MISS
  Query enterprise DB for records relevant to the question
  Serialise records as JSON-LD using the flavor's scoped context
  Result: every field value is bound to its ontology IRI — not raw data

Step 3 — Validate inbound data
  Run SHACL acceptance gate on the grounded records
  Reject malformed, incomplete, or low-confidence records here
  Bad data rejected before reaching the LLM

Step 4 — Assemble the payload
  Component 1: System prompt — ontology summary + domain rules + agent role
  Component 2: Ontology flavor — class and property definitions in natural language
  Component 3: Grounded data — JSON-LD serialised enterprise records
  Component 4: PROV-O context — provenance of each data point
  Component 5: Question + output format instructions

Step 5 — Send to LLM of choice
  Anthropic, OpenAI, Google, Llama, or any custom endpoint
  The payload is LLM-agnostic — JSON-LD works with any model

Step 6 — Govern the output  ← THE MISSING STEP IN MOST DESIGNS
  SHACL validate structured response against ontology shapes
  Stamp PROV-O provenance:
    prov:wasGeneratedBy = LLM identifier + model version
    prov:generatedAtTime = timestamp
    confidence_score = extracted from model output or metadata
    derivation_method = SYNTHESIZED
  Store as ObservationRecord back into the semantic layer
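Step 4 can be pictured as building a plain dict whose keys mirror the five components listed above. This is a hypothetical sketch, not the toolkit's PayloadAssembler: the key names, the roughly 4-characters-per-token estimate, the rule of dropping grounded records once the token budget is reached, and the "entity" field on provenance records are all assumptions made for illustration.

```python
import json


def estimate_tokens(text):
    # Crude heuristic (~4 characters per token); real tokenizers vary by model.
    return max(1, len(text) // 4)


def assemble_payload(system_prompt, flavor_summary, grounded_nodes, prov_records,
                     question, output_format="json", token_budget=8000):
    """Assemble a five-component, LLM-agnostic payload dict (illustrative only)."""
    used = sum(estimate_tokens(p) for p in (system_prompt, flavor_summary, question))

    # Add grounded JSON-LD nodes in order until the token budget would be exceeded.
    kept = []
    for node in grounded_nodes:
        cost = estimate_tokens(json.dumps(node))
        if used + cost > token_budget:
            break
        kept.append(node)
        used += cost

    # Keep provenance only for records that made it in (not counted against the budget here).
    kept_ids = {node["@id"] for node in kept}
    provenance = [p for p in prov_records if p["entity"] in kept_ids]

    return {
        "system": system_prompt,         # Component 1: system prompt
        "ontology": flavor_summary,      # Component 2: flavor definitions
        "data": {"@graph": kept},        # Component 3: grounded JSON-LD
        "provenance": provenance,        # Component 4: PROV-O context
        "question": question,            # Component 5: question ...
        "output_format": output_format,  # ... plus output format instructions
        "estimated_tokens": used,
    }


nodes = [
    {"@id": f"urn:nf:{i}", "@type": "NetworkFunction", "hasOperationalState": "Degraded"}
    for i in range(3)
]
prov = [{"entity": "urn:nf:0", "wasDerivedFrom": "tmf_resource"}]
payload = assemble_payload(
    "You are a network-ops agent.",
    "NetworkFunction: a deployed network function.",
    nodes,
    prov,
    "Which NFs are degraded?",
)
```

Because the result is an ordinary dict, each adapter in Step 5 only has to map these keys onto its vendor's request format.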

Why the grounding step matters

Raw enterprise data sitting next to an ontology in a prompt does not connect the two. An LLM reading a raw field such as status = Enabled cannot tell that the row describes an instance of the OWL class NetworkFunction whose hasOperationalState is Enabled, unless something makes that binding explicit. JSON-LD serialisation is that binding mechanism. Without it, you have data and a schema side by side in the same prompt; with it, you have semantically typed assertions the model can reason over precisely.
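To make the binding concrete, here is a minimal sketch that turns one relational row into a JSON-LD node using a scoped context. The example.org namespace, the column-to-property mapping, and the urn:nf: identifier scheme are hypothetical; the toolkit's Grounder derives its bindings from ontology_metadata rather than from a hard-coded dict.

```python
import json

NS = "https://example.org/ontology#"  # hypothetical ontology namespace

# Scoped JSON-LD context for the network-ops flavor (illustrative terms only).
CONTEXT = {
    "NetworkFunction": NS + "NetworkFunction",
    "hasOperationalState": NS + "hasOperationalState",
    "name": NS + "name",
}

# Hypothetical mapping from raw DB columns to ontology properties.
COLUMN_MAP = {"status": "hasOperationalState", "nf_name": "name"}


def ground_row(row, rdf_type):
    """Serialise one relational row as a JSON-LD node bound to ontology IRIs."""
    node = {"@context": CONTEXT, "@id": f"urn:nf:{row['id']}", "@type": rdf_type}
    for column, value in row.items():
        prop = COLUMN_MAP.get(column)
        if prop is not None:  # unmapped columns are dropped, not guessed
            node[prop] = value
    return node


row = {"id": 42, "nf_name": "amf-01", "status": "Enabled", "internal_flag": 1}
print(json.dumps(ground_row(row, "NetworkFunction"), indent=2))
```

A JSON-LD processor expanding this node resolves hasOperationalState to its full IRI through the context, which is exactly the explicit binding the raw status column lacked.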

Why the output governance step matters

An LLM response is a new assertion entering your enterprise knowledge base. Without output governance, it is an ungovernable, untraceable string. With PROV-O stamping it becomes a first-class enterprise fact with a full provenance chain: who asked, which model answered, when, with what confidence, derived from which source records. This is what makes AI output auditable — and what regulators increasingly require.
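The stamping step amounts to wrapping the model's answer in a node that carries PROV-O terms. The sketch below uses the PROV-O properties prov:wasGeneratedBy, prov:wasAssociatedWith, prov:generatedAtTime, and prov:wasDerivedFrom; because PROV-O attaches wasGeneratedBy to an activity, the model identifier and version are recorded on that activity's associated agent. The ex:ObservationRecord class, the ex:answer, ex:confidence_score, and ex:derivation_method property names, the example.org namespace, and the urn:model: scheme are assumptions for illustration, not the OutputGate's actual output.

```python
import uuid
from datetime import datetime, timezone

NS = "https://example.org/ontology#"  # hypothetical namespace


def stamp_observation(answer, model, model_version, source_ids,
                      confidence=None, question=None):
    """Wrap an LLM answer in a PROV-O-annotated ObservationRecord (illustrative)."""
    return {
        "@context": {
            "prov": "http://www.w3.org/ns/prov#",
            "xsd": "http://www.w3.org/2001/XMLSchema#",
            "ex": NS,
        },
        "@id": f"urn:uuid:{uuid.uuid4()}",
        "@type": ["ex:ObservationRecord", "prov:Entity"],
        "ex:answer": answer,
        "ex:question": question,
        # The generating activity is the LLM call; the model is its associated agent.
        "prov:wasGeneratedBy": {
            "@id": f"urn:uuid:{uuid.uuid4()}",
            "@type": "prov:Activity",
            "prov:wasAssociatedWith": {"@id": f"urn:model:{model}:{model_version}"},
        },
        "prov:generatedAtTime": {
            "@value": datetime.now(timezone.utc).isoformat(),
            "@type": "xsd:dateTime",
        },
        # Links back to the grounded source records the answer relied on.
        "prov:wasDerivedFrom": [{"@id": s} for s in source_ids],
        "ex:confidence_score": confidence,
        "ex:derivation_method": "SYNTHESIZED",
    }


# "my-llm" and "1.0" are placeholder identifiers, not real model names.
record = stamp_observation("amf-01 is degraded.", "my-llm", "1.0",
                           ["urn:nf:42"], confidence=0.82)
```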

Running the runtime phase

# Validate all flavors and generate runtime MCP tools
python3 toolkit.py --phase runtime

# Ground data for a question, output JSON-LD to stdout
python3 runtime/grounder.py --flavor network-ops \
    --question "Which NFs are degraded?" \
    --db db/enterprise.db

# Validate inbound records against SHACL shapes before they reach the LLM
python3 -c "
import sys; sys.path.insert(0, 'runtime')
from input_gate import InputGate
gate = InputGate('db/enterprise.db', min_confidence=0.6)
accepted, rejected = gate.screen_db_query('tmf_resource', flavor_name='network-ops')
print(gate.summary(accepted, rejected))
"
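For readers new to SHACL, the screening idea can be approximated in plain Python. This is neither the toolkit's InputGate nor a SHACL engine: it hand-codes two constraint types that SHACL expresses as sh:minCount and sh:datatype, plus a confidence threshold mirroring the min_confidence argument above. The PropertyRule class, the screen function, the "confidence" field name, and the choice to accept records that carry no confidence value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyRule:
    """Rough analogue of one SHACL property shape (sh:path, sh:minCount, sh:datatype)."""
    path: str
    min_count: int = 0
    datatype: Optional[type] = None


def screen(records, rules, min_confidence=0.6):
    """Split records into accepted and rejected (with reasons), SHACL-style."""
    accepted, rejected = [], []
    for rec in records:
        reasons = []
        for rule in rules:
            value = rec.get(rule.path)
            if value is None:
                if rule.min_count > 0:
                    reasons.append(f"missing {rule.path}")
                continue
            if rule.datatype is not None and not isinstance(value, rule.datatype):
                reasons.append(f"{rule.path} is not {rule.datatype.__name__}")
        # Assumption: records without a confidence value are treated as fully trusted.
        if rec.get("confidence", 1.0) < min_confidence:
            reasons.append("confidence below threshold")
        if reasons:
            rejected.append({"record": rec, "reasons": reasons})
        else:
            accepted.append(rec)
    return accepted, rejected


rules = [
    PropertyRule("@id", min_count=1, datatype=str),
    PropertyRule("hasOperationalState", min_count=1, datatype=str),
]
records = [
    {"@id": "urn:nf:1", "hasOperationalState": "Enabled", "confidence": 0.9},
    {"@id": "urn:nf:2", "confidence": 0.95},
    {"@id": "urn:nf:3", "hasOperationalState": "Degraded", "confidence": 0.4},
]
accepted, rejected = screen(records, rules)
```

The rejection reasons play the role of a SHACL validation report; the toolkit's InputGate records its rejections in semantic_loss_log.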

Runtime SDK

import sys; sys.path.insert(0, 'runtime')
from client import RuntimeClient

client = RuntimeClient(
    db_path="db/enterprise.db",
    adapter="anthropic",         # or "openai", "vertex", "ollama", "oci"
    model="claude-sonnet-4-5",
)

result = client.ask(
    question="Which 5G network functions are currently degraded and what is their impact on active services?",
    flavor="network-ops",
    output_format="json",
)

print(result["answer"])           # LLM response text
print(result["valid"])            # True if output gate SHACL check passed
print(result["observation_iri"])  # IRI of the PROV-O ObservationRecord stored
print(result["prov"])             # Full provenance dict: model, timestamp, confidence
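Because the result exposes the output gate's verdict, callers can decide what to do with an answer that failed SHACL validation. The sketch below is a hypothetical downstream helper, not part of the SDK: it relies only on the valid and prov keys documented above, reads the confidence inside prov defensively because the exact key name is not specified here ("confidence" is an assumption), and the "accept" and "human-review" routing labels and the 0.7 threshold are invented for illustration.

```python
def route_result(result, min_confidence=0.7):
    """Decide what to do with a RuntimeClient.ask() result (illustrative policy)."""
    if not result.get("valid", False):
        # The output gate rejected the structured response: keep it out of automation.
        return "human-review"
    prov = result.get("prov") or {}
    confidence = prov.get("confidence")  # key name assumed
    if confidence is not None and confidence < min_confidence:
        return "human-review"
    return "accept"
```

With a live client this would be called as route_result(result) right after client.ask(...).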

Supported LLM adapters

| Adapter | adapter= key | Default model | Install |
|---|---|---|---|
| Anthropic Messages API | "anthropic" | claude-sonnet-4-5 | pip install anthropic |
| OpenAI Chat Completions | "openai" | gpt-4o | pip install openai |
| Google Vertex AI (Gemini) | "vertex" | gemini-1.5-pro | pip install google-cloud-aiplatform |
| Ollama (local) | "ollama" | llama3 | Ollama server running at localhost:11434 |
| Oracle Cloud (OCI Generative AI) | "oci" | cohere.command-r-plus | pip install oci (see OCI Generative AI setup below) |

All adapters are optional — the core runtime modules (grounder, assembler, input_gate, output_gate) have zero external dependencies. Install only the adapter you need. The Anthropic adapter uses prompt caching on the system prompt for reduced latency and cost.
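One common way to keep vendor SDKs optional is to check for them only when an adapter is requested. The sketch below illustrates that pattern with importlib.util.find_spec; the ADAPTER_PACKAGES registry, the package_available and check_adapter functions, and the error messages are hypothetical and do not describe how the toolkit actually loads its adapters. Ollama needs no Python package according to the table above, so it maps to None.

```python
import importlib.util

# Hypothetical registry: adapter key -> importable module of its vendor SDK.
ADAPTER_PACKAGES = {
    "anthropic": "anthropic",
    "openai": "openai",
    "vertex": "google.cloud.aiplatform",
    "oci": "oci",
    "ollama": None,  # talks to a local server at localhost:11434; no SDK listed
}

PIP_NAMES = {"vertex": "google-cloud-aiplatform"}


def package_available(module_name):
    """Return True if module_name can be imported, without importing it fully."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # For dotted names, find_spec imports the parent package and raises if it is missing.
        return False


def check_adapter(name):
    """Fail early with an install hint if the adapter's SDK is not installed."""
    if name not in ADAPTER_PACKAGES:
        raise ValueError(f"unknown adapter {name!r}; expected one of {sorted(ADAPTER_PACKAGES)}")
    module = ADAPTER_PACKAGES[name]
    if module is not None and not package_available(module):
        pip_name = PIP_NAMES.get(name, module)
        raise ImportError(f"adapter {name!r} needs: pip install {pip_name}")
```

Deferring the check this way means a deployment that only uses Ollama never needs any cloud SDK installed.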

Runtime MCP tools

The runtime phase generates output/jsonld/runtime-mcp-tools.json with four MCP tool definitions: ground_data, assemble_payload, validate_response, and ask_ontology. These tools let any MCP-compatible agent call the runtime pipeline directly.
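In MCP, a tool definition consists of a name, a description, and a JSON Schema inputSchema. The sketch below shows what entries for the four runtime tools could look like, built as a Python list. The argument names (flavor, question, response), the descriptions, and the list-shaped file layout are assumptions; check the generated runtime-mcp-tools.json for the real parameters.

```python
import json


def tool(name, description, properties, required):
    """Build one MCP tool definition (name, description, JSON Schema input)."""
    return {
        "name": name,
        "description": description,
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


FLAVOR = {"type": "string", "description": "Flavor name, e.g. network-ops"}
QUESTION = {"type": "string", "description": "Natural-language question"}
RESPONSE = {"type": "string", "description": "Raw LLM response to validate"}

# Argument names below are hypothetical, not read from the toolkit's output.
TOOLS = [
    tool("ground_data", "Serialise relevant DB records as JSON-LD for a flavor.",
         {"flavor": FLAVOR, "question": QUESTION}, ["flavor", "question"]),
    tool("assemble_payload", "Build the five-component LLM payload.",
         {"flavor": FLAVOR, "question": QUESTION}, ["flavor", "question"]),
    tool("validate_response", "Validate an LLM response against the flavor's SHACL shapes.",
         {"flavor": FLAVOR, "response": RESPONSE}, ["flavor", "response"]),
    tool("ask_ontology", "Run the full governed pipeline for a question.",
         {"flavor": FLAVOR, "question": QUESTION}, ["flavor", "question"]),
]

print(json.dumps(TOOLS, indent=2))
```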

What the runtime layer does NOT do

The runtime layer is not a replacement for the toolkit pipeline. It does not generate ontologies, create SHACL shapes, or manage database schemas. Those are the pipeline's responsibility. The runtime layer is strictly a consumption layer — it reads the pipeline's outputs and uses them to power governed, auditable LLM interactions.

The runtime layer also does not choose which LLM to use; that is an enterprise decision. The payload it assembles is LLM-agnostic, and the adapters for Anthropic, OpenAI, Google Vertex AI, Ollama, and OCI Generative AI handle API-specific mechanics while the semantic payload stays identical.

Runtime components

| Component | Output |
|---|---|
| FlavorRegistry | 5 starter flavors, auto-discovery from runtime/flavors/ |
| Grounder | JSON-LD nodes with @type, ontology IRI bindings, PROV-O grounding record |
| InputGate | SHACL acceptance screening, rejection log to semantic_loss_log |
| PayloadAssembler | 5-component payload, token budget, LLM-agnostic dict output |
| OutputGate | SHACL response validation, PROV-O stamping, ObservationRecord storage |
| RuntimeClient | Full pipeline in one call, async support, 5 LLM adapters |

8.x OCI Generative AI setup

The runtime layer ships with an Oracle Cloud Infrastructure (OCI) adapter alongside Anthropic, OpenAI, Vertex AI, and Ollama. Use it when you want to route the toolkit's governed payloads to Cohere or Llama models hosted on OCI Generative AI.

1. Install the SDK

pip install oci

2. Configure credentials

The adapter follows the standard OCI config-file pattern described in the Python SDK Configuration page on docs.oracle.com. Create ~/.oci/config (or run oci setup config) with at least:

[DEFAULT]
user=ocid1.user.oc1..<your-user-ocid>
fingerprint=<api-key-fingerprint>
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..<your-tenancy-ocid>
region=us-chicago-1

3. Set the compartment

OCI Generative AI requires a compartment OCID for routing and billing:

export OCI_COMPARTMENT_ID=ocid1.compartment.oc1..<your-compartment-ocid>

Optional environment overrides:

| Variable | Default | Purpose |
|---|---|---|
| OCI_CONFIG_FILE | ~/.oci/config | Path to the OCI config file |
| OCI_CONFIG_PROFILE | DEFAULT | Profile name within the config file |
| OCI_GENAI_ENDPOINT | https://inference.generativeai.us-chicago-1.oci.oraclecloud.com | Service endpoint (set this for non-Chicago regions) |
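To check which settings the adapter will see, the sketch below resolves the documented variables and defaults using only the standard library. It approximates the lookup described in the table and is not the adapter's code. The region_endpoint helper is an assumption: it presumes other regions follow the same host pattern as the Chicago default, which you should confirm for your region in the OCI documentation before setting OCI_GENAI_ENDPOINT.

```python
import os
from typing import Mapping, Optional

DEFAULT_ENDPOINT = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"


def region_endpoint(region):
    # Assumption: other regions follow the same host pattern as the Chicago default.
    return f"https://inference.generativeai.{region}.oci.oraclecloud.com"


def resolve_oci_settings(env: Optional[Mapping[str, str]] = None):
    """Resolve OCI adapter settings from environment variables and documented defaults."""
    env = os.environ if env is None else env
    compartment = env.get("OCI_COMPARTMENT_ID")
    if not compartment:
        # The compartment has no default: OCI Generative AI needs it for routing and billing.
        raise RuntimeError("OCI_COMPARTMENT_ID must be set")
    return {
        "config_file": os.path.expanduser(env.get("OCI_CONFIG_FILE", "~/.oci/config")),
        "profile": env.get("OCI_CONFIG_PROFILE", "DEFAULT"),
        "endpoint": env.get("OCI_GENAI_ENDPOINT", DEFAULT_ENDPOINT),
        "compartment_id": compartment,
    }
```

Calling resolve_oci_settings() with no arguments reads the real environment, which is a quick way to confirm the exports above took effect.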

4. Use it from the runtime

from runtime import RuntimeClient

client = RuntimeClient(
    db_path="db/enterprise.db",
    adapter="oci",
    model="cohere.command-r-plus",   # Cohere model; Meta/Llama models need provider="meta" (see below)
)

result = client.ask(question="Which network functions are degraded?", flavor="network-ops")
print(result["answer"])

The adapter defaults to the Cohere request shape. To target a Meta/generic model, instantiate OCIAdapter directly with provider="meta" and pass it via RuntimeClient(adapter=<instance>).