Integration guide: get value in 5 minutes, not 5 days

The shortest path from pip install to a working ontology built from your data. If you read nothing else in this repository, read this. Deeper reference lives in install.md and the docs in this directory.

Contents

5-minute SQLite path
30-minute existing-database path
What you actually got
Adding LLMs / drift / federation later
You probably don't need (yet)
When to read what

5-minute SQLite path

Three commands, one bundled SQLite database, no API keys, no driver compiles.

# 1. Install (~50 MB, no compile)
pip install ontoforge

# 2. Run the full pipeline against the bundled retail demo DB
ontoforge --db db/demo.db --out output/demo

# 3. Open the report (`open` is macOS; use xdg-open on Linux or start on Windows)
open output/demo/reports/toolkit_report.html

That's it. You now have a working OWL ontology, SHACL shapes, JSON-LD context, mapping workbook, and a governance scorecard — all under output/demo/. Skim the HTML report and you've seen everything the pipeline produces.

To use your own SQLite file, swap in its path. Nothing else changes.

30-minute existing-database path

For PostgreSQL, MySQL, SQL Server, Oracle, or DB2.

Step 1: install the right driver only

pip install -r requirements-core.txt
pip install psycopg2-binary          # or mysql-connector-python / pyodbc / oracledb / ibm_db
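Before moving on, it can help to confirm that the driver you installed is importable from the same Python environment that will run the toolkit. The sketch below is not part of the toolkit; it only maps each pip package named above to the module name that package installs, and the helper names `driver_installed` and `driver_report` are hypothetical.

```python
import importlib.util

# Database -> (pip package from the step above, module name that package installs).
DRIVERS = {
    "PostgreSQL": ("psycopg2-binary", "psycopg2"),
    "MySQL": ("mysql-connector-python", "mysql.connector"),
    "SQL Server": ("pyodbc", "pyodbc"),
    "Oracle": ("oracledb", "oracledb"),
    "DB2": ("ibm_db", "ibm_db"),
}


def driver_installed(module_name: str) -> bool:
    """Return True if Python can locate the given module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised for dotted names (e.g. mysql.connector) when the parent package is absent.
        return False


def driver_report() -> dict:
    """Map each database name to whether its driver module can be found."""
    return {db: driver_installed(module) for db, (_package, module) in DRIVERS.items()}


if __name__ == "__main__":
    for db, ok in driver_report().items():
        package = DRIVERS[db][0]
        print(f"{db:<11} {'ok' if ok else 'missing: pip install ' + package}")
```

Only the driver for your own database needs to report `ok`; the others can stay missing.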

Step 2: add two system tables

The toolkit needs a small annotation control plane in your DB. Two tables, one-time setup, no changes to your existing tables:

-- copy from db/schema.sql (the relevant CREATE TABLE statements)
CREATE TABLE ontology_metadata (
    table_name      TEXT NOT NULL,
    column_name     TEXT,
    semantic_class  TEXT,        -- e.g. 'Customer', 'Order', 'NetworkFunction'
    business_term   TEXT,        -- plain-language meaning of this column
    sensitivity     TEXT,        -- Public | Internal | Confidential | Restricted
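    -- ...other optional columns; see db/schema.sql for the full list
    -- Table-level rows leave column_name NULL. SQLite accepts NULL in this composite
    -- key, but PostgreSQL, MySQL, SQL Server, Oracle, and DB2 reject NULL in
    -- primary-key columns, so check db/schema.sql for how those rows are keyed there.
    PRIMARY KEY (table_name, column_name)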
);

CREATE TABLE semantic_loss_log (
    finding_id      TEXT PRIMARY KEY,
    severity        TEXT,        -- info | warn | error
    description     TEXT,
    captured_at     TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
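To try the two tables without touching a real database, a minimal sketch like the following creates them in a throwaway in-memory SQLite database using Python's standard `sqlite3` module. It includes only the columns shown in the excerpt above, not the full list from db/schema.sql. The `IF NOT EXISTS` clauses and the helper names `create_control_tables` and `list_tables` are additions for illustration.

```python
import sqlite3

# Excerpt of the DDL shown above; the authoritative version lives in db/schema.sql.
DDL = """
CREATE TABLE IF NOT EXISTS ontology_metadata (
    table_name      TEXT NOT NULL,
    column_name     TEXT,
    semantic_class  TEXT,
    business_term   TEXT,
    sensitivity     TEXT,
    PRIMARY KEY (table_name, column_name)
);
CREATE TABLE IF NOT EXISTS semantic_loss_log (
    finding_id      TEXT PRIMARY KEY,
    severity        TEXT,
    description     TEXT,
    captured_at     TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""


def create_control_tables(conn: sqlite3.Connection) -> None:
    """Create both control-plane tables; safe to call more than once."""
    conn.executescript(DDL)
    conn.commit()


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_control_tables(conn)
print(list_tables(conn))  # ['ontology_metadata', 'semantic_loss_log']
```

Pointing `sqlite3.connect` at a file path instead of `":memory:"` would apply the same excerpt to a SQLite file of your own.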

Step 3: annotate the columns that matter

You don't need to annotate every column. The minimum that produces a useful ontology is:

One row per important table with a semantic_class (e.g. Customer, Order).
One row per primary key column so it becomes the identifier.
Sensitivity tier on any PII or restricted column.

Skip everything else on day 1. Re-run the pipeline and refine over time.

INSERT INTO ontology_metadata (table_name, column_name, semantic_class, business_term, sensitivity) VALUES
  ('orders',    NULL,          'Order',    'A customer purchase',                'Internal'),
  ('orders',    'id',          'Order',    'Order identifier',                   'Internal'),
  ('orders',    'customer_id', 'Customer', 'Foreign key to the placing customer','Internal'),
  ('customers', NULL,          'Customer', 'A person or organisation',           'Confidential'),
  ('customers', 'email',       'Customer', 'Primary contact address',            'Confidential');
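The minimum described above can be checked mechanically before you insert anything. The following is a sketch under two assumptions drawn from the example rows: a table-level row is one whose `column_name` is NULL, and `sensitivity` must be one of the four tiers named in the schema comment, spelled exactly. The function `check_annotations` is hypothetical and not part of the toolkit.

```python
# The four tiers listed in the ontology_metadata schema comment.
ALLOWED_SENSITIVITY = {"Public", "Internal", "Confidential", "Restricted"}

# Same rows as the INSERT above: (table, column, semantic_class, business_term, sensitivity).
ROWS = [
    ("orders", None, "Order", "A customer purchase", "Internal"),
    ("orders", "id", "Order", "Order identifier", "Internal"),
    ("orders", "customer_id", "Customer", "Foreign key to the placing customer", "Internal"),
    ("customers", None, "Customer", "A person or organisation", "Confidential"),
    ("customers", "email", "Customer", "Primary contact address", "Confidential"),
]


def check_annotations(rows) -> list:
    """Return a list of human-readable problems; an empty list means the rows look fine."""
    problems = []
    tables_seen = set()
    tables_with_class = set()
    for table, column, semantic_class, _term, sensitivity in rows:
        tables_seen.add(table)
        # A table-level row (column is None) should carry the semantic_class.
        if column is None and semantic_class:
            tables_with_class.add(table)
        if sensitivity is not None and sensitivity not in ALLOWED_SENSITIVITY:
            problems.append(
                f"{table}.{column or '*'}: unknown sensitivity {sensitivity!r}"
            )
    for table in sorted(tables_seen - tables_with_class):
        problems.append(f"{table}: no table-level row with a semantic_class")
    return problems


print(check_annotations(ROWS))  # []
```

A misspelled tier such as `'internal'` or a table annotated only at column level would show up in the returned list.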

Step 4: run the pipeline

python3 toolkit.py --db "postgresql://user:pass@host/mydb" --out output/
open output/reports/toolkit_report.html   # macOS; use xdg-open on Linux

If you'd rather use a wizard than write SQL, see install.md §2.
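Passwords containing characters such as `@`, `:` or `/` break a hand-written connection URL. The sketch below percent-encodes the credentials and assembles the same command line as above. It assumes the toolkit parses `--db` as a standard URL and decodes percent-escapes, which this guide does not confirm; `build_pg_url` and `build_command` are hypothetical helpers.

```python
from urllib.parse import quote


def build_pg_url(user: str, password: str, host: str, dbname: str) -> str:
    """Build a postgresql:// URL with the user and password percent-encoded."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}/{dbname}"
    )


def build_command(db_url: str, out_dir: str = "output/") -> list:
    """Return the argv list for the pipeline run shown above."""
    return ["python3", "toolkit.py", "--db", db_url, "--out", out_dir]


# Dummy credentials, for illustration only.
url = build_pg_url("user", "p@ss:w/rd", "host", "mydb")
print(url)  # postgresql://user:p%40ss%3Aw%2Frd@host/mydb
```

Passing the list from `build_command` to `subprocess.run` avoids shell quoting entirely, and reading the password from an environment variable keeps it out of your shell history.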

What you actually got

After output/ is populated, three files matter on day 1:

File	Why you care
`output/reports/toolkit_report.html`	One-page visual run summary — open this first
`output/ontology/enterprise.ttl`	The OWL 2 ontology you can hand to any RDF tool, reasoner, or graph store
`output/jsonld/enterprise-context.json`	The JSON-LD context used by every downstream agent payload

Everything else (SHACL shapes, SKOS vocab, mapping workbook, MCP tool definitions) is valuable but not needed to evaluate the toolkit's output.
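A quick way to confirm a run produced the three day-1 files is to check for them relative to the `--out` directory. This is a sketch using only the standard library; the relative paths come from the table above. It assumes the context file follows the usual JSON-LD layout with a top-level `@context` object, which this guide does not show. Both function names are hypothetical.

```python
import json
from pathlib import Path

# Paths from the table above, relative to the --out directory.
DAY_ONE_ARTIFACTS = [
    "reports/toolkit_report.html",
    "ontology/enterprise.ttl",
    "jsonld/enterprise-context.json",
]


def missing_artifacts(out_dir: str) -> list:
    """Return the day-1 artifacts that are not present under out_dir."""
    root = Path(out_dir)
    return [rel for rel in DAY_ONE_ARTIFACTS if not (root / rel).is_file()]


def context_terms(out_dir: str) -> list:
    """Return the sorted term names from the JSON-LD context, if it is an object."""
    path = Path(out_dir) / "jsonld" / "enterprise-context.json"
    with path.open(encoding="utf-8") as fh:
        doc = json.load(fh)
    # Assumption: the file has a top-level "@context" object.
    context = doc.get("@context", {}) if isinstance(doc, dict) else {}
    return sorted(context) if isinstance(context, dict) else []


if __name__ == "__main__":
    print("missing:", missing_artifacts("output"))
```

An empty list from `missing_artifacts("output")` means all three files exist; `context_terms("output")` then gives a rough view of the vocabulary the run produced.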

Adding LLMs / drift / federation later

These are opt-in. Skip them on day 1 — the pipeline runs fine without any of them.

When you want to…	Install	Read
Wire LLM calls through the SHACL gates and OWL grounding	`pip install -r requirements-runtime.txt`	docs/runtime.md · docs/sdk.md
Monitor production data drift against the ontology	`pip install -r requirements-drift.txt`	examples/infodrift/README.md
Use the browser-based wizard	`pip install -r requirements-advanced.txt`	install.md §2
Publish to Fuseki / Stardog / Neptune / GraphDB	`pip install -r requirements-advanced.txt` (Neptune only)	docs/advanced.md
Federate with another organisation's ontology	core only	docs/advanced.md
Generate compliance evidence bundles	core only	docs/advanced.md

You probably don't need (yet)

These are advanced features. Useful when you have a specific need — distracting otherwise. Each lives behind a --phase flag and is documented inside docs/advanced.md.

--phase reasoner — OWL 2 consistency checking via ROBOT (needs Java + a 100 MB jar)
--phase modular — splits the ontology into importable modules with cycle detection
--phase discover — NLP entity discovery from log corpora (requires spaCy + an English model)
--phase tmf630 — TMF Open API Task + Bulk operations
--phase evolve — autonomous ontology evolution proposals
--phase federate — cross-enterprise federation with signed manifests
--phase comply — regulatory evidence bundles (EU AI Act, Basel IV, HIPAA, Ofcom)
--phase embed / --phase retrieve — ontology-bounded vector retrieval
--phase monitor — production drift monitoring (drift_monitor / infodrift)

Run python3 toolkit.py --help to see every flag, but ignore most of them on first contact.
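Two of the phases above name external prerequisites: `--phase reasoner` needs Java and `--phase discover` requires spaCy. A small preflight like the one below can tell you whether those are present before you enable either phase. It is a sketch, not part of the toolkit. It does not check for the ROBOT jar or the spaCy English model, since this guide does not say where or under what name the toolkit expects them. `phase_prerequisites` is a hypothetical name.

```python
import importlib.util
import shutil


def phase_prerequisites() -> dict:
    """Report whether the basic prerequisite for each phase is available."""
    return {
        # --phase reasoner: a `java` executable must be on PATH.
        "reasoner": shutil.which("java") is not None,
        # --phase discover: the spaCy package must be installed.
        "discover": importlib.util.find_spec("spacy") is not None,
    }


if __name__ == "__main__":
    for phase, ready in phase_prerequisites().items():
        print(f"--phase {phase}: {'prerequisite found' if ready else 'prerequisite missing'}")
```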

When to read what

Document	When
This file	First contact, integration recipe, "what's the minimum?"
install.md	You hit a database driver issue or want every connection-string format
features.md	You want to see the full capability map and decide where to read next
docs/artifacts.md	You want to know what a specific generated file is for
docs/runtime.md	You're wiring LLM calls through the toolkit's gates
docs/advanced.md	You're enabling drift monitoring, federation, compliance, vector retrieval, etc.
docs/sdk.md	You're writing application code that calls `RuntimeClient`, `Grounder`, etc.
examples/	You want to see a runnable end-to-end demo