MUNICH · PRAGUE · ON-PREMISE AI ENGINEERS

AI that never
leaves the building.

Fine-tuned language models deployed on your GPU servers — air-gapped, GDPR-compliant by architecture, owned by you. For hospitals, law firms, and AI vendors locked out of regulated deals.

Live in 10+ public hospitals · Rowan Legal · T-Systems · Eurowag

10+

Public hospitals running our AI in production

0

External API calls in air-gapped mode

3 wks

Fastest contract-to-production deployment

100%

Data residency on your own hardware

LIVE AT

Fakultní nemocnice Plzeň · Fakultní nemocnice Hradec Králové · Penta Hospitals · T-Systems · Eurowag · Rowan Legal · EUC · Medicalc · Die Netzwerkpartner · Heggel · Anywhere.Legal

WHAT WE BELIEVE

Cloud AI is the wrong architecture for your data.

When the data is patient records, M&A drafts, or government correspondence, 'send it to the cloud' isn't a procurement decision. It's a liability. Every API call to a cloud LLM leaves a trace. Every query can become training material. Or evidence in court. We built the alternative: AI that runs on your hardware, trained on your data, owned by you.
01

No tokens. No quotas. No third-party access.

02

Your data never leaves your network. By architecture.

03

You own the model. You own the updates. You own the outcomes.

— Krystof Olik, Founder

WHAT WE DEPLOY

Four systems. One footprint: your hardware.

Fine-tuned open-weights models, local inference, and production integrations into the systems you already run.

VOICE → RECORDS

Dictation to structured records

Doctors and lawyers dictate; structured output lands in the system of record. Speaker separation, domain vocabulary, schema-valid output.

  • Olingo Speech
  • FHIR / HL7
  • OCR — 99.8% accuracy
  • Medical & legal vocabulary

DOCUMENTS → ANSWERS

Knowledge engine over your archive

Decades of contracts, records, and correspondence become a sourced answer system — citations down to the exact paragraph.

  • Local RAG
  • Vector search
  • Source citations
  • Legacy DB integration

BACKGROUND AGENTS

Air-gapped agents for operations

Intake classification, document routing, compliance checks — running continuously inside your network with full audit logs.

  • Multi-agent orchestration
  • Local inference
  • Audit logs
  • Zero egress

MODELS & HARDWARE

Fine-tuned models on sized hardware

We pick or fine-tune open-weights models per domain and spec the GPU servers they run on. You own both.

  • Mistral / Llama / Whisper
  • Olingo model line
  • NVIDIA DGX / RTX
  • CUDA / ROCm

HOW ENGAGEMENT WORKS

Fixed scope. Fixed price. Something you keep at every step.

Three phases. Each one ends with an artifact that is yours — a blueprint, a running system, or both.

01 · 2 WEEKS

Sovereignty assessment

€9,800 fixed

We map your data flows, size the hardware, and design the deployment architecture.

Credited in full against the build if you proceed.

02 · 6–10 WEEKS

Pilot deployment

from €120,000

One use case in production on your hardware — real users, real data, measured outcomes.

Fixed-price proposal before we start. Hardware procured at cost.

03 · ONGOING

Rollout & managed service

from €6,000 / month

Site-wide rollout, integrations, model updates, monitoring, and SLA — operated by us, owned by you.

No per-token costs. No per-seat licences.


FAQ

Questions buyers actually ask

The AI model and all processing run on hardware in your own data centre or server room. No data is sent to any external cloud service. You own the hardware, control the access, and the system can operate fully air-gapped — with no internet connection at all.

Our fastest deployment was 3 weeks from contract to production. A typical pilot takes 6–10 weeks, depending on integration complexity and your IT readiness.

Fixed per phase: a €9,800 sovereignty assessment (credited against the build), pilots from €120,000, and managed service from €6,000/month. There are no token-based running costs — just infrastructure and support.

We specify it during the assessment. Production-grade inference for a 70B-class model typically lands between €40,000 and €190,000 in hardware, depending on throughput. We procure at cost, or deploy on GPUs you already own.

Often contested, which is the problem. EU–US transfer frameworks remain under court challenge, and a cloud LLM prompt containing patient or client facts is a disclosure to a third party. On-premise deployment removes the question entirely: the data never leaves your network.

Open-weights models — Mistral, Llama, Whisper, and others — fine-tuned on your domain, plus our Olingo model line for healthcare and legal workloads. You own the resulting weights.

Your IT team, with our managed service behind it: monitoring, model updates (delivered offline for air-gapped sites), and an incident SLA. Full handover to your team is an option, not a hostage negotiation.

Yes. We port cloud AI products to customer hardware: local inference replaces external API calls, the stack is containerised, and updates work air-gapped. Fixed scope, typically four to ten weeks.

CONTACT

Talk to engineers, not sales.

We work with organisations ready to own their AI infrastructure. 30-minute session: we map your data flows and propose a deployment architecture. If we're not the right fit, we'll tell you.

WHAT HAPPENS NEXT

  1. We map your data flows and existing systems.
  2. We propose a deployment architecture for your infrastructure.
  3. You leave with a concrete plan. No pitch.
info@ollsoft.com

Or send a question by email — we reply within 48 hours.
