MiniCPM-o 4.5 Multimodal Streaming AI: What if we built smarter, localized AI assistants?
Recently, I came across MiniCPM-o 4.5, a full-duplex streaming multimodal model that processes video and audio in real time and generates both text and speech. It understands English and Chinese, can clone voices and play roles, and handles OCR and document parsing in over 30 languages. Built on SigLip2, Whisper-medium, CosyVoice2, and Qwen3-8B, it packs 9 billion parameters and promises real-time dialogue and multimodal interaction.
For Norwegian SMBs, especially those with 10-50 employees, this kind of technology could revolutionize everyday operations. Imagine Jonas, who runs a 35-person ventilation company in Drammen. Handling customer calls, processing invoices through Tripletex or Fiken, and staying compliant with NAV and Skatteetaten all take time and attention. With admin work costing 400-600 NOK per hour and specialists over 1000 NOK per hour, automating conversations or document parsing with AI could save both money and effort.
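To make the savings concrete, here is a rough back-of-the-envelope sketch. The hourly rates come from the paragraph above; the 20 hours of admin work saved per month is purely a hypothetical figure for illustration, not a measurement from any real company.

```python
# Hourly admin rates quoted in the article (NOK).
ADMIN_RATE_LOW_NOK = 400
ADMIN_RATE_HIGH_NOK = 600


def monthly_saving_nok(hours_saved_per_month: int, hourly_rate_nok: int) -> int:
    """Value of the admin time freed up in one month, in NOK."""
    return hours_saved_per_month * hourly_rate_nok


# ASSUMPTION: 20 hours per month is an invented example, not real data.
hypothetical_hours = 20
low = monthly_saving_nok(hypothetical_hours, ADMIN_RATE_LOW_NOK)
high = monthly_saving_nok(hypothetical_hours, ADMIN_RATE_HIGH_NOK)
print(f"Estimated saving: {low}-{high} NOK per month")
```

Plug in your own estimate of hours saved; the point is only that even a modest number of hours adds up quickly at these rates.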
What if you could have a smart assistant that listens to your calls, understands your Norwegian business context, and automatically drafts texts, schedules meetings, or even files documents? Instead of relying on models limited to English and Chinese, imagine leveraging AI that integrates directly with Norwegian APIs like Vipps for payments or Altinn for reporting, all while respecting GDPR.
Here’s how this can be done: I’d run Whisper locally for Norwegian speech recognition and pair it with the OpenAI or Claude APIs for language understanding, connected through n8n workflows to the Tripletex, Fiken, Vipps, and NAV APIs. This low-code approach lets you build prototypes quickly, in 2-4 weeks, that handle voice commands, generate replies, and automate tedious tasks. Adding Telegram bots can bring notifications and data collection into one place. The key is not to chase perfect enterprise solutions but to focus on “good enough to test” that actually works in your daily business.
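A minimal sketch of the first step in such a pipeline might look like the following: transcribe a call with the open-source Whisper package, tag it with a rough intent, and hand it to an n8n webhook that takes over from there. The webhook URL, the keyword lists, and the function names are all assumptions made for illustration; in a real prototype the intent step would more likely be a call to the OpenAI or Claude API.

```python
import json
import urllib.request

# ASSUMPTION: hypothetical n8n webhook URL; use the one your own workflow exposes.
N8N_WEBHOOK_URL = "http://localhost:5678/webhook/call-transcript"

# ASSUMPTION: illustrative Norwegian keywords, standing in for an LLM call.
INTENT_KEYWORDS = {
    "invoice": ["faktura", "betaling"],
    "booking": ["avtale", "befaring"],
}


def transcribe_norwegian(audio_path: str) -> str:
    """Transcribe a recording locally with the open-source Whisper package."""
    import whisper  # pip install openai-whisper; imported lazily because it is heavy

    model = whisper.load_model("medium")
    result = model.transcribe(audio_path, language="no")  # "no" = Norwegian
    return result["text"].strip()


def classify_intent(transcript: str) -> str:
    """Return the first intent whose keywords appear in the transcript."""
    text = transcript.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(word in text for word in keywords):
            return intent
    return "other"


def build_payload(transcript: str, caller: str) -> dict:
    """Bundle what the n8n workflow needs to route the call."""
    return {
        "caller": caller,
        "transcript": transcript,
        "intent": classify_intent(transcript),
    }


def send_to_n8n(payload: dict, url: str = N8N_WEBHOOK_URL) -> int:
    """POST the payload as JSON to the webhook and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

In use, something like `send_to_n8n(build_payload(transcribe_norwegian("call.wav"), "Kari"))` would push one call into the workflow, where n8n nodes could then talk to Tripletex, Fiken, or a Telegram bot. Keeping the Python part this small is deliberate: the integrations stay in the low-code layer, where they are easier to change during a 2-4 week prototype.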
This approach fits SMB owners eager to optimize operations without hefty investments in custom AI training or complex programming. It’s less suited for massive enterprises or highly specialized tasks requiring bespoke ML models. Also, if you need native mobile apps or hardware integration, that’s outside this scope.
So, what if your next employee was an AI assistant that understood your business and spoke your language? How would that change your workflow and free up your time?