Turn your cameras into AI agents your team can govern — frame by frame.
Alquimia Vision is the sovereign platform for real-time computer vision at enterprise scale — configurable in plain language, composable across cameras, and governable from prompt to event. No retraining. No data science team.
Real-time at frame speed
Detect, track, and identify across cameras as it happens. Multi-camera identity stays persistent end to end, so the same person or vehicle is the same entity in your event stream.
Configurable by prompt
Define new use cases in plain language. No retraining, no labeling pipeline, no data science team required. The day your protocol changes, the prompt changes — the pipeline does not.
Sovereign, composable, governable
Run on your infrastructure — on-prem, private cloud, or hybrid. Replace any model. Audit every decision from prompt to event.
How Vision works.
Vision is composable by design. Specialized models do what they are best at — fast, precise detection and tracking. A vision-language model (VLM) does what only it can do — zero-shot semantic reasoning, configured by prompt. The two layers work together so the platform stays both fast and flexible.
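As an illustration of that split, here is a minimal Python sketch: the fast layer runs on every frame, and the VLM is called only when a trigger matches. Every name here (Detection, Trigger, fake_detector, fake_vlm) is a hypothetical stand-in, not Vision's actual API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Detection:
    track_id: int
    cls: str        # object class from the fast detector
    crop: object    # image region handed to the VLM (a string here, pixels in practice)

@dataclass
class Trigger:
    condition: Callable[[Detection], bool]  # cheap check, evaluated at frame speed
    prompt: str                             # plain-language question for the VLM

def fake_detector(frame):
    # Stand-in for the specialized detector + tracker layer.
    return [Detection(track_id=7, cls="person", crop=frame)]

def fake_vlm(crop, prompt):
    # Stand-in for a zero-shot VLM call on the selected region.
    return {"prompt": prompt, "answer": "visitor"}

def process_frame(frame, triggers, detector=fake_detector, vlm=fake_vlm):
    # The fast layer runs on every frame; the VLM fires only when a trigger matches.
    events = []
    for det in detector(frame):
        for trig in triggers:
            if trig.condition(det):
                events.append(vlm(det.crop, trig.prompt))
    return events

triggers = [Trigger(condition=lambda d: d.cls == "person",
                    prompt="Guard in an orange vest, or visitor without one?")]
events = process_frame(frame="frame-0", triggers=triggers)
```

The design point the sketch makes: the expensive model sits behind a cheap predicate, so VLM cost scales with triggered events, not with frames.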
Workflows
No-code configuration follows one pattern: when a condition is met, where in your camera grid, analyze with this prompt. Triggers and prompts are written in plain language.
Real-time pipeline
Detector, tracker, and cross-camera embeddings running at frame speed. Persistent identity within and across cameras, so an entity is the same entity from the moment it appears to the moment it leaves your scene.
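One common way to keep identity persistent across cameras is to match each new track's appearance embedding against known identities. The toy sketch below illustrates the idea with cosine similarity; the IdentityIndex class and its threshold are assumptions for illustration, not Vision's actual re-identification logic.

```python
import math

def cosine(a, b):
    # Cosine similarity between two appearance embeddings.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class IdentityIndex:
    """Toy cross-camera re-identification: reuse a global id when a new
    track's embedding is close enough to a known identity."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.identities = {}   # global_id -> reference embedding
        self._next_id = 0

    def resolve(self, embedding):
        best_id, best_sim = None, -1.0
        for gid, ref in self.identities.items():
            sim = cosine(embedding, ref)
            if sim > best_sim:
                best_id, best_sim = gid, sim
        if best_id is not None and best_sim >= self.threshold:
            return best_id           # same entity, seen again on another camera
        gid = self._next_id          # unfamiliar appearance: mint a new identity
        self._next_id += 1
        self.identities[gid] = embedding
        return gid

idx = IdentityIndex(threshold=0.9)
gid_cam01 = idx.resolve([1.0, 0.0, 0.0])    # entity appears on camera 01
gid_cam04 = idx.resolve([0.99, 0.05, 0.0])  # similar appearance on camera 04
gid_other = idx.resolve([0.0, 1.0, 0.0])    # a different-looking entity
```

In this sketch the two similar embeddings resolve to the same global id, while the dissimilar one gets a new identity.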
VLM reasoning, three levels
Visual reasoning invoked only when the rest of the system needs it: Level A on a single object crop (attribute classification), Level B on a full frame (spatial relations), Level C on a temporal sequence (what happened over time).
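The three levels differ mainly in what the VLM is shown. A small sketch, with hypothetical helpers (vlm_input, crop) and string-valued frames standing in for images:

```python
def crop(frame, detection):
    # Stand-in: a real pipeline would cut the detection's bounding box out of the frame.
    return (frame, detection)

def vlm_input(level, frame_buffer, detection=None, samples=5):
    # Pick what the VLM is shown for each reasoning level.
    if level == "A":                              # attribute question on one object crop
        return [crop(frame_buffer[-1], detection)]
    if level == "B":                              # spatial relations on the full frame
        return [frame_buffer[-1]]
    if level == "C":                              # temporal question over the entity's track
        step = max(1, len(frame_buffer) // samples)
        return frame_buffer[::step][:samples]
    raise ValueError(f"unknown level: {level}")

frames = [f"frame-{i:02d}" for i in range(20)]
level_a = vlm_input("A", frames, detection="person-7")
level_b = vlm_input("B", frames)
level_c = vlm_input("C", frames)
```

Level A sends one crop, Level B one full frame, Level C a handful of frames sampled across the entity's time on scene.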
Plugins
Opt-in vertical capabilities for domain-specific tasks: license plate OCR, PPE detection, human pose, face identity, person re-identification. Activate only the ones your case needs.
Event stream + governance
Structured events flow through a NATS broker to your existing systems — SOC, observability stack, custom integrations — with OpenTelemetry traces and full audit history. Every decision is inspectable end to end.
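To make the idea concrete, here is a sketch of what consuming such a stream could look like: a JSON event carrying a trace id, dispatched to subscribers by subject prefix. The schema and field names are illustrative assumptions, not the platform's actual event contract, and a real deployment would subscribe through a NATS client rather than call a local router.

```python
import json

# Illustrative event; the field names are assumptions, not the real schema.
event_json = """{
  "type": "checkpoint.review",
  "entity_id": "person-7",
  "camera": "cam-04",
  "answer": "reviewed",
  "reviewed_by": "guard-2",
  "timestamp": "2024-05-01T09:13:22Z",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"
}"""

def route(event, handlers):
    # Dispatch a structured event to every handler whose subject prefix matches,
    # mimicking subject-based subscription on a message broker.
    matched = [handler for prefix, handler in handlers if event["type"].startswith(prefix)]
    for handler in matched:
        handler(event)
    return len(matched)

received = []
delivered = route(json.loads(event_json), handlers=[("checkpoint.", received.append)])
```

Because every event carries a trace id, a consumer can walk back from the event to the prompt and frames that produced it.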
If your reference for vision is closed CCTV analytics or pure VLM-on-every-frame, this is a different architecture.
How a Workflow looks in production.
Take a security checkpoint use case: guards wear orange vests, and every visitor who arrives without one must be reviewed by a guard. The goal is to detect, for each visitor, whether that review actually happened. Two Workflows configured in plain language are enough.
- WHEN
- A new entity is detected (class: person)
- WHERE
- Camera 04 — Main Entry, anywhere in frame
- ANALYZE
- Use VLM Level A on a crop of the person. Prompt: "Is this person a guard wearing an orange vest, or a visitor without a vest?"
- RESULT
- Each person is tagged with their role, once, the moment they appear.
- WHEN
- A tracked entity leaves the scene (class: person, role: visitor, duration > 60 seconds)
- WHERE
- Anywhere in the camera grid
- ANALYZE
- Use VLM Level C on five crops sampled across the entity's time on scene. Prompt: "Was this person reviewed by a guard? At what moment, and by whom?"
- RESULT
- A structured event is published with the visitor's identity, the answer, the timestamp, and a reference to the guard who reviewed them.
No retraining. No new code. Two prompts and two trigger configurations. If tomorrow the convention changes — guards now wear green vests, or the use case shifts entirely — only the prompts change. The pipeline stays the same.
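The When / Where / Analyze / Result blocks above map naturally onto declarative data. As a sketch only (the keys and values below are assumptions, not Vision's real configuration schema), the two Workflows could be expressed as:

```python
# Illustrative config shape; every key and value is a hypothetical example.
workflows = [
    {
        "when": {"event": "entity.appeared", "class": "person"},
        "where": {"camera": "cam-04-main-entry", "region": "anywhere"},
        "analyze": {
            "vlm_level": "A",
            "input": "crop",
            "prompt": "Is this person a guard wearing an orange vest, "
                      "or a visitor without a vest?",
        },
        "result": {"tag": "role"},  # tag the entity once, on first appearance
    },
    {
        "when": {
            "event": "entity.left",
            "class": "person",
            "role": "visitor",
            "min_duration_s": 60,
        },
        "where": {"camera": "any"},
        "analyze": {
            "vlm_level": "C",
            "input": {"crops": 5},  # sampled across the entity's time on scene
            "prompt": "Was this person reviewed by a guard? "
                      "At what moment, and by whom?",
        },
        "result": {"publish": "checkpoint.review"},  # structured event out
    },
]
```

A protocol change (green vests instead of orange) edits one string in this structure; nothing else moves.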
Solutions in production.
Verify security checkpoint compliance
Detect, in real time, whether visitors are being reviewed by guards according to your protocol. Configure the rule in plain language; the system handles identity, attribution, and event structure.
Track perimeter access across cameras
Follow a person or vehicle across multiple cameras with persistent identity. Detect entries to restricted zones, lingering, and unauthorized cross-camera movement.
Enforce safety and quality on the production floor
Detect missing PPE, unsafe poses, and quality defects in real time. Route alerts to your operations stack with the evidence frame attached.
Industries we serve.
Open source by design.
Open code, replaceable components, no vendor lock-in. We craft Gaussia, our open evaluation suite, for the community — so every behavioral metric we publish is reproducible in your environment.
From the team.
Composable vision: where the model ends and the prompt begins.
Why the right architecture for real-time vision is not a single big model, but a pipeline that calls a VLM only when it needs to.
Persistent identity across a camera grid, explained.
What it takes to make sure the same person is the same entity from the moment they appear to the moment they leave the scene.
Workflows: the no-code config that survives a protocol change.
How When / Where / Analyze / Result keeps the pipeline stable while the prompt absorbs the change.
Bring your camera feed. We'll walk you through it.
We work with enterprise teams running real-time vision on their own infrastructure. A short call is enough to see if Alquimia Vision is the right fit for your case.
Get in touch