Chatzuri - AI Agents & Workflows for Business Automation

Multimodal means your agent can handle more than text. Customers can speak to it, send pictures, share documents, or even send video, and your agent can respond in kind.

Find these settings under Settings → Multimodal on any agent.

Voice

Turn on Voice to let customers send audio messages and hear the agent's reply spoken back. Pick a voice from the dropdown — preview each one to find the right tone.

Works on the website widget, WhatsApp (voice notes), and any channel that supports audio.
Real-time speech: customers can interrupt, just like a phone call.
Adds a small per-minute cost — see your team's plan for details.

Images

With Images on, customers can attach photos to a message. Your agent can describe what it sees, identify products, read text from a screenshot, or extract details from a receipt.

Useful for:

Customer support ("here's the error I'm seeing")
E-commerce ("do you have this in red?")
Field operations ("this is the part that broke")
Education ("can you explain this diagram?")

Files

With Files on, customers can attach PDFs, Word docs, and spreadsheets directly to a chat. Your agent reads the contents and answers questions about them.

Heads up

Files attached in chat are not added to the agent's permanent knowledge — they stay scoped to that conversation. Use the Sources tab for permanent training data.
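
To make the scoping rule concrete, here is a minimal sketch of how conversation-scoped attachments differ from permanent sources. It is an illustration only, not Chatzuri's implementation or API: the `Agent` and `Conversation` classes and their fields are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Agent:
    # Hypothetical stand-in for permanent knowledge added via the Sources tab.
    sources: Set[str] = field(default_factory=set)


@dataclass
class Conversation:
    # Hypothetical model of one chat; attachments live only on this object.
    agent: Agent
    attachments: Set[str] = field(default_factory=set)

    def attach(self, filename: str) -> None:
        """Record a file the customer attached in this chat."""
        self.attachments.add(filename)

    def visible_documents(self) -> Set[str]:
        """Documents the agent can draw on while answering in this chat."""
        return self.agent.sources | self.attachments


agent = Agent(sources={"faq.pdf"})
chat_a = Conversation(agent)
chat_b = Conversation(agent)
chat_a.attach("invoice.pdf")
```

In this sketch, `chat_a` can see both `faq.pdf` and `invoice.pdf`, while `chat_b` and the agent's permanent sources are unaffected by the attachment.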

Video

Video support lets customers share a clip — your agent extracts audio and key frames, then reasons about both. Best for product demos, onboarding walkthroughs, or troubleshooting.

Streaming

Streaming is enabled by default. Replies appear word by word so the conversation feels alive. Turn it off only for SMS-style channels, where partial messages don't make sense.
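
The difference between the two modes can be sketched in a few lines. This is a simplified illustration, not how Chatzuri delivers replies: `stream_words` and `deliver` are hypothetical helpers, and real streaming may chunk text differently than by whole words.

```python
from typing import Iterator, List


def stream_words(reply: str) -> Iterator[str]:
    """Yield the reply one word at a time, as a streaming display would show it."""
    for word in reply.split():
        yield word


def deliver(reply: str, streaming: bool) -> List[str]:
    """Return the chunks a channel would receive.

    With streaming on, each word arrives as its own chunk.
    With streaming off, the channel receives one complete message.
    """
    if streaming:
        return list(stream_words(reply))
    return [reply]
```

On a channel such as SMS, every chunk would arrive as a separate message, which is why a single complete reply is the better fit there.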

Channel support

Not every channel supports every modality. Roughly:

Website widget — supports everything
WhatsApp — text, images, voice notes, files
Telegram — text, images, voice notes, files, video
SMS — text only (and short text at that)
Email — text and file attachments
Slack / Messenger / Instagram — text, images, files
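
The list above can also be read as a lookup table. The sketch below encodes it in plain Python so a script could check a channel before sending a given kind of message. The names `CHANNEL_MODALITIES` and `supports` are hypothetical and not part of any Chatzuri API, and "everything" is assumed to mean the five modalities named on this page.

```python
# Illustrative encoding of the rough channel list above.
# All names here are hypothetical; this is not a Chatzuri API.
ALL_MODALITIES = frozenset({"text", "images", "voice", "files", "video"})

CHANNEL_MODALITIES = {
    "website_widget": ALL_MODALITIES,
    "whatsapp": frozenset({"text", "images", "voice", "files"}),
    "telegram": frozenset({"text", "images", "voice", "files", "video"}),
    "sms": frozenset({"text"}),
    "email": frozenset({"text", "files"}),
    "slack": frozenset({"text", "images", "files"}),
    "messenger": frozenset({"text", "images", "files"}),
    "instagram": frozenset({"text", "images", "files"}),
}


def supports(channel: str, modality: str) -> bool:
    """Return True if the channel is listed as supporting the modality.

    Unknown channels are treated as supporting nothing.
    """
    return modality in CHANNEL_MODALITIES.get(channel, frozenset())
```

For example, `supports("telegram", "video")` is true and `supports("whatsapp", "video")` is false, matching the list.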
