Our Take on AI Week 2025
Vision and Language: bringing multimodal AI to manufacturing

16 May 2025
Reading time: 3 minutes
When our CTO Luca Antiga took the stage, he didn’t deliver the usual polished story you often hear at AI conferences. Instead, he spoke about manufacturing, agriculture, and real processes.
He spoke about cutting-edge multimodal AI models – with both feet firmly on the ground.
Vision and language: AI that sees, reads, and understands
In his talk, “Vision and Language: Multimodal AI in the Real World”, Luca explored the potential of multimodal models for high-complexity industrial environments.
He focused on Vision-Language Models (VLMs) – still considered cutting-edge or academic by many, but about to hit the mainstream. In AI, things move fast: yesterday's research paper is today's product.
These are models that can see and interpret electrical diagrams, schematics, industrial images, and also read and interact with text, tables, and operational instructions. And they’re not just built to impress, they’re built to work where it matters most: on factory floors, in food production lines, out in the field.

Manufacturing: the critical last mile
Luca broke down what it really means to run visual inspections in production: shifting lighting, high line speeds, non-uniform materials, and performance thresholds that must exceed 99%.
This is the environment where our vision models operate, deployed at the edge, integrated with cameras, PLCs, and production systems, and optimized down to the millisecond.
This is also where AI-GO comes in: our platform that enables teams to train and deploy computer vision models in minutes, with just a handful of examples.
And we’re not stopping there. AI-GO is evolving to natively integrate Vision-Language Models, enabling prompt-based visual quality inspections with one goal in mind: remove friction, reduce overhead, and deliver results fast.
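To make "prompt-based visual quality inspection" concrete, here is a minimal sketch of the idea. AI-GO's internals are not public, so `vlm_inspect()` below is a hypothetical stand-in for a deployed VLM inference call; the stubbed return values exist only so the example runs.

```python
# Illustrative sketch only: vlm_inspect() is a hypothetical stand-in
# for a Vision-Language Model inference call, stubbed for this example.

def vlm_inspect(image_path: str, prompt: str) -> float:
    """Hypothetical VLM call: returns a defect probability in [0, 1].
    A real system would send the image and prompt to a deployed model."""
    # Stub: pretend images named "ok" are clean, everything else defective.
    return 0.005 if "ok" in image_path else 0.97

def quality_check(image_path: str, threshold: float = 0.99) -> bool:
    """Prompt-based inspection: the defect criterion lives in the prompt,
    not in a retrained model, so changing it requires no new labels."""
    prompt = "Is there a visible defect (scratch, dent, discoloration) on this part?"
    defect_prob = vlm_inspect(image_path, prompt)
    return (1.0 - defect_prob) >= threshold

print(quality_check("part_ok.png"))    # expected: True
print(quality_check("part_dent.png"))  # expected: False
```

The point of the pattern: the inspection criterion is a sentence you edit, rather than a dataset you relabel and retrain on.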
Agriculture: from field to data
Yes, we work in manufacturing – but also in agriculture, where traditional computer vision struggles due to extreme variability and complex conditions. Field inspections are still manual, subjective, and hard to scale.
With Qualyfruit on-the-go, we make the process automated, objective, and continuous. A simple consumer-grade camera, mounted on a tractor, collects geo-tagged images. AI does the rest.
From visual crop quality to disease mapping, from yield estimation to harvest planning: precision agriculture is already here. And it works. Low cost, measurable ROI.
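As a rough illustration of the "field to data" step: Qualyfruit's pipeline is not public, but once each image carries a GPS tag and a per-image score, turning them into a field map is essentially a binning problem. The grid size and score values below are made up for the example.

```python
# Hypothetical sketch: bin geo-tagged per-image disease scores into
# grid cells and average per cell, yielding a simple field map.
from collections import defaultdict

def field_map(observations, cell_deg=0.001):
    """observations: iterable of (lat, lon, score).
    Returns {(cell_lat, cell_lon): mean score} for ~100 m grid cells."""
    cells = defaultdict(list)
    for lat, lon, score in observations:
        key = (round(lat / cell_deg) * cell_deg,
               round(lon / cell_deg) * cell_deg)
        cells[key].append(score)
    return {k: sum(v) / len(v) for k, v in cells.items()}

obs = [
    (45.6980, 9.6770, 0.1),  # healthy vines, close together
    (45.6980, 9.6771, 0.2),
    (45.6991, 9.6780, 0.9),  # a likely diseased patch
]
m = field_map(obs)  # two cells: one averaging ~0.15, one at 0.9
```

From a map like this, downstream steps (disease hotspots, yield estimates, harvest routes) become aggregations over cells rather than manual field walks.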

Technical documentation: Teki Doc reads the manual, so you don’t have to
Anyone in production knows: manuals only get opened when there’s a problem.
But managing technical documentation remains a high-friction, high-impact challenge across industries.
With Teki Doc, technical documents become fully conversational.
Thanks to multimodal models, users can ask questions, even about visual elements like diagrams or technical drawings, and receive accurate answers, with direct links to the relevant pages.
And yes, it works with voice commands too, because your hands are probably busy.
We stress-tested it with the most complex assembly manual from a certain well-known Swedish furniture company (you know the one, the manual with just pictures). Spoiler: it passed the test. Here is the full video!
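At its core, answering "with direct links to the relevant pages" is a retrieval problem. Teki Doc's implementation is not public, so the sketch below uses simple word overlap over hypothetical page texts; a real multimodal system would use embeddings so that diagrams and drawings are searchable too.

```python
# Illustrative only: rank manual pages by word overlap with the question
# and return the top page numbers. PAGES and its texts are invented.

PAGES = {  # hypothetical text extracted per manual page
    3: "insert dowel a into panel b and tighten with the hex key",
    7: "attach the hinge using four screws see diagram",
    12: "safety notice two people are required for assembly",
}

def relevant_pages(question: str, top_k: int = 2):
    """Return the top_k page numbers most relevant to the question."""
    q = set(question.lower().replace("?", "").split())
    scores = {p: len(q & set(text.split())) for p, text in PAGES.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

pages = relevant_pages("which screws attach the hinge?")
# pages[0] should be 7, the hinge page
```

The answer shown to the user would then cite `pages` directly, which is what keeps the system trustworthy on the factory floor: you can always check the source page.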

Getting our hands dirty (always)
That’s how we like it: getting our hands dirty, learning from mistakes, and building AI solutions that actually work in production.
If you were in the room and want to go deeper, if you missed something, or if you’ve got a problem worth solving: reach out. 📩 info@orobix.com
We’re here to talk about multimodal AI in production. No buzzwords. Just a genuine drive to solve real problems.