OpenAI - CowAgent

OpenAI offers the most complete coverage and can simultaneously serve text chat, vision understanding, image generation, speech-to-text (ASR), text-to-speech (TTS), and embedding. A single open_ai_api_key lets the Agent use all of these capabilities.

All capabilities below can be configured in one place via the “Model Management” page in the Web Console, with no need to manually edit the configuration file.

Text Chat

{
  "model": "gpt-5.5",
  "open_ai_api_key": "YOUR_API_KEY",
  "open_ai_api_base": "https://api.openai.com/v1"
}

Parameter	Description
`model`	Same as OpenAI’s model parameter; supports `gpt-5.5`, `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.4-nano`, the `gpt-5` series, `gpt-4.1`, the o-series, etc. Agent mode defaults to `gpt-5.5`; use `gpt-5.4` for better cost-efficiency
`open_ai_api_key`	Create one on the OpenAI Platform
`open_ai_api_base`	Optional; change it to access a third-party proxy
`bot_type`	Not required when using OpenAI’s official models; set to `openai` when accessing other providers via the compatible protocol

Image Understanding

OpenAI models like gpt-5.5, gpt-5.4, gpt-4o, and gpt-4.1 natively support vision. Once open_ai_api_key is configured, the Agent’s Vision tool automatically uses the main model to recognize images. If the main model does not support vision or you want to specify it explicitly, set it in the configuration file:

{
  "tools": {
    "vision": {
      "model": "gpt-5.4-mini"
    }
  }
}

Supported Vision models: gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-5, gpt-4.1, gpt-4.1-mini, gpt-4o.

Image Generation

Specify the image generation model in the configuration file; the Agent automatically routes image generation skill calls to OpenAI:

{
  "skills": {
    "image-generation": {
      "model": "gpt-image-2"
    }
  }
}

Supported image generation models: gpt-image-2, gpt-image-1.

Speech-to-Text (ASR)

{
  "voice_to_text": "openai",
  "voice_to_text_model": "gpt-4o-mini-transcribe"
}

Parameter	Description
`voice_to_text`	Set to `openai` to enable OpenAI speech-to-text
`voice_to_text_model`	Optional, defaults to `gpt-4o-mini-transcribe`; can also be `gpt-4o-transcribe`, `whisper-1`

Credentials are automatically reused from open_ai_api_key.

Text-to-Speech (TTS)

{
  "text_to_voice": "openai",
  "text_to_voice_model": "tts-1",
  "tts_voice_id": "alloy"
}

Parameter	Description
`text_to_voice_model`	`tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`
`tts_voice_id`	Voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`, `ash`, `ballad`, `coral`, `sage`, `verse`

Embedding

{
  "embedding_provider": "openai",
  "embedding_model": "text-embedding-3-small"
}

Available models: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. After changing the embedding, run /memory rebuild-index to rebuild the index.

​Text Chat

​Image Understanding

​Image Generation

​Speech-to-Text (ASR)

​Text-to-Speech (TTS)

​Embedding

Text Chat

Image Understanding

Image Generation

Speech-to-Text (ASR)

Text-to-Speech (TTS)

Embedding