AI Services

AI Butler is model-agnostic and provider-agnostic: you bring the keys, and AI Butler handles the plumbing. This page covers the optional AI services that sit alongside the core text model.

Speech-to-Text (STT)

Used for voice messages on Telegram, Discord, Slack, WhatsApp, and the web chat microphone.

Provider	Notes
whisper	Default. Uses whisper.cpp through a local binary when one is present, otherwise falls back to the Whisper API.
stub	No-op — useful for testing without audio

configurations:
  voice:
    stt_provider: whisper

Check current status:

aibutler voice status
aibutler voice providers
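
The fallback rule for the `whisper` provider (local whisper.cpp binary when present, otherwise the Whisper API) can be approximated from a shell to predict which path a given host is likely to take. This is a minimal sketch, not AI Butler's actual detection logic: the function name `pick_stt_backend` is hypothetical, and the default binary name `whisper-cli` is an assumption about how whisper.cpp is installed and about the binary being found on `PATH`.

```shell
#!/bin/sh
# Hypothetical helper, not part of AI Butler.
# Prints "local" if the given binary is on PATH, otherwise "api".
# The default name "whisper-cli" is an assumption; pass your own binary name.
pick_stt_backend() {
    bin="${1:-whisper-cli}"
    if command -v "$bin" >/dev/null 2>&1; then
        echo "local"
    else
        echo "api"
    fi
}

pick_stt_backend "$@"
```

Treat the result as a hint only; `aibutler voice status` remains the authoritative check.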

Text-to-Speech (TTS)

Used for voice replies on channels that support voice messages (Telegram, Discord, Slack, WhatsApp).

Provider	Notes
stub	Default — no-op (no audio output)
piper	Fully local CPU-only TTS via the Piper binary

configurations:
  voice:
    tts_provider: piper

Vision

Image understanding is handled by the primary model if it’s vision-capable (Claude 3+, GPT-4o, Gemini 1.5+, LLaVA via Ollama). No extra configuration — just send an image in any channel.

AI Image Generation

Image generation for creative tasks is exposed as a tool. All three providers share the same `image.generate` tool.

Provider	Tool	Vault Key
DALL-E 3	`image.generate`	`openai_api_key`
Stable Diffusion	`image.generate`	`stability_api_key`
Flux	`image.generate`	`replicate_api_key`

AI Design Tools

Integration with design platforms for generating branded assets.

Provider	Tools	Vault Key
Canva	`design.canva_create`, `design.canva_get`	`canva_api_key`
Figma	`design.figma_read`, `design.figma_comment`	`figma_api_key`

3D Generation

Text-to-3D and image-to-3D for creative and smart-home projects.

Provider	Tools	Vault Key
Meshy	`threed.meshy_text_to_3d`	`meshy_api_key`
Tripo	`threed.tripo_text_to_3d`	`tripo_api_key`
Luma	`threed.luma_genie`	`luma_api_key`

BYOK Pattern

Every AI service follows the same bring-your-own-key (BYOK) pattern: tools are always registered, but they return "configure API key" errors until you store the credential in the vault:

aibutler vault set canva_api_key YOUR_KEY

This lets you enable services one at a time without restarting or touching config files.
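
The register-first, check-at-call-time behaviour can be illustrated with a small stand-in. This is a sketch of the pattern only, not AI Butler's implementation: a temporary directory plays the role of the vault, and `vault_set` and `call_tool` are hypothetical names. The credential is looked up on every call, which is why storing a key would take effect without a restart.

```shell
#!/bin/sh
# Illustration of the BYOK pattern; a temp directory stands in for the vault.
VAULT_DIR=$(mktemp -d)
trap 'rm -rf "$VAULT_DIR"' EXIT

# Hypothetical stand-in for `aibutler vault set KEY VALUE`.
vault_set() {
    printf '%s' "$2" > "$VAULT_DIR/$1"
}

# A tool is always callable, but it looks up its key on every call.
# Usage: call_tool TOOL_NAME VAULT_KEY
call_tool() {
    if [ -s "$VAULT_DIR/$2" ]; then
        echo "$1: ok"
    else
        echo "$1: configure API key ($2)"
        return 1
    fi
}

# Key not stored yet: the tool exists but reports the missing credential.
call_tool design.canva_create canva_api_key || true
vault_set canva_api_key YOUR_KEY
# Same call now succeeds; nothing was restarted or reconfigured.
call_tool design.canva_create canva_api_key
```

Because each tool checks only its own vault key, services can be switched on independently of one another.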

Local-First Option

For a fully local deployment with zero API keys:

Text model — Ollama (Llama 3.3, Mistral, Qwen, etc.)
STT — whisper.cpp or Ollama
TTS — Piper
Embeddings — Ollama
Vision — Ollama with LLaVA

docker compose -f docker-compose.ollama.yml up -d

See Choose Your AI for a comparison of providers.