Qwen (Alibaba DashScope / Bailian) is one of the most fully-featured vendors. Text, image understanding, image generation, speech-to-text, text-to-speech, and embedding can all be enabled with a single dashscope_api_key.
All capabilities below can be configured in one place via the “Model Management” page in the Web Console, with no need to manually edit the configuration file.

Text Chat

{
  "model": "qwen3.6-plus",
  "dashscope_api_key": "YOUR_API_KEY"
}
| Parameter | Description |
| --- | --- |
| model | Can be qwen3.6-plus, qwen3.7-max, qwen3.5-plus, qwen3-max, qwen-max, qwen-plus, qwen-turbo, qwq-plus, etc. |
| dashscope_api_key | Create one in the Bailian Console; see the official docs |
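
Before starting the Agent, it can help to check a text chat fragment for obvious mistakes. The following is a minimal sketch, not part of the product: the helper name `check_text_chat_config` is hypothetical, and it assumes that both `model` and `dashscope_api_key` must be non-empty strings and that `YOUR_API_KEY` is only a placeholder.

```python
import json

# Assumption: both keys shown in the example above are treated as required.
REQUIRED_KEYS = ("model", "dashscope_api_key")
PLACEHOLDER = "YOUR_API_KEY"


def check_text_chat_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty: {key}")
    if config.get("dashscope_api_key") == PLACEHOLDER:
        problems.append("dashscope_api_key still holds the placeholder value")
    return problems


if __name__ == "__main__":
    fragment = json.loads(
        '{"model": "qwen3.6-plus", "dashscope_api_key": "YOUR_API_KEY"}'
    )
    for problem in check_text_chat_config(fragment):
        print(problem)
```

The check only inspects the structure of the fragment; it does not contact DashScope, so a key that passes here can still be rejected by the service.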

Image Understanding

Once dashscope_api_key is configured, the Agent’s Vision tool automatically calls Qwen’s vision models to recognize images. Models such as qwen3-max, qwen3.5-plus, and qwen3.6-plus are already multimodal; if the main model is text-only (e.g. qwen-turbo), the Vision tool automatically falls back to qwen-vl-max. To manually specify a Vision model:
{
  "tools": {
    "vision": {
      "model": "qwen3.6-plus"
    }
  }
}
Supported models: qwen3.6-plus, qwen3.5-plus, qwen3-max.
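
The selection rule described in this section can be sketched as a small function. This is an illustration under stated assumptions, not the Agent’s actual code: `resolve_vision_model` is a hypothetical name, an explicit `tools.vision.model` is assumed to take precedence, and the multimodal set is limited to the three models this page lists.

```python
# Models this page lists as supported for Vision.
MULTIMODAL_MODELS = {"qwen3.6-plus", "qwen3.5-plus", "qwen3-max"}
# Fallback named on this page for text-only main models.
FALLBACK_VISION_MODEL = "qwen-vl-max"


def resolve_vision_model(config: dict) -> str:
    """Pick a vision model from a parsed configuration dictionary."""
    # Assumption: a manually specified Vision model overrides everything else.
    explicit = config.get("tools", {}).get("vision", {}).get("model")
    if explicit:
        return explicit
    # A multimodal main model can be used for images directly.
    main_model = config.get("model")
    if main_model in MULTIMODAL_MODELS:
        return main_model
    # Text-only (or unknown) main model: fall back.
    return FALLBACK_VISION_MODEL
```

For example, a configuration with `"model": "qwen-turbo"` and no `tools.vision` entry resolves to `qwen-vl-max` under these assumptions.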

Image Generation

{
  "skills": {
    "image-generation": {
      "model": "qwen-image-2.0"
    }
  }
}
Available models: qwen-image-2.0, qwen-image-2.0-pro.

Speech-to-Text (ASR)

{
  "voice_to_text": "dashscope",
  "voice_to_text_model": "qwen3-asr-flash"
}
| Parameter | Description |
| --- | --- |
| voice_to_text | Set to dashscope to enable Qwen ASR |
| voice_to_text_model | Optional, defaults to qwen3-asr-flash |
Credentials are automatically reused from dashscope_api_key. Each audio segment should be smaller than 10 MB and no longer than 300 seconds.
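
The size and duration limits can be checked locally before audio is sent for recognition. The sketch below is an assumption-laden example rather than anything the Agent ships: `check_wav_segment` is a hypothetical helper, it only reads uncompressed WAV files through Python’s standard `wave` module, and it interprets “10 MB” as 10,000,000 bytes (the stricter reading).

```python
import os
import wave

# Assumption: "10 MB" is read as decimal megabytes, the stricter interpretation.
MAX_BYTES = 10 * 1000 * 1000
MAX_SECONDS = 300


def check_wav_segment(path: str) -> list[str]:
    """Return a list of limit violations for one WAV file (empty if none)."""
    problems = []

    size = os.path.getsize(path)
    if size >= MAX_BYTES:
        problems.append(f"file is {size} bytes; must be smaller than {MAX_BYTES}")

    # Duration = number of frames / frames per second.
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    if duration > MAX_SECONDS:
        problems.append(f"audio is {duration:.1f} s; must not exceed {MAX_SECONDS} s")

    return problems
```

Other container formats (MP3, M4A, and so on) would need a different way to read the duration; the size check applies to any file.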

Text-to-Speech (TTS)

{
  "text_to_voice": "dashscope",
  "text_to_voice_model": "qwen3-tts-flash",
  "tts_voice_id": "Cherry"
}
| Parameter | Description |
| --- | --- |
| text_to_voice_model | Optional, defaults to qwen3-tts-flash; covers Mandarin, dialects, and major foreign languages |
| tts_voice_id | Voice ID; see the common list below |
Common voice examples:
| Voice ID | Description |
| --- | --- |
| Cherry | Qianyue · Sunny Female Voice |
| Serena | Suyao · Gentle Female Voice |
| Ethan | Chenxu · Sunny Male Voice |
| Chelsie | Qianxue · Anime Girl |
| Dylan | Beijing Dialect · Xiaodong |
| Rocky | Cantonese · Aqiang |
| Sunny | Sichuan Dialect · Qing’er |
The full voice list (Mandarin / regional dialects / bilingual, etc.) can be selected visually in the Web Console under “Model Management → Text-to-Speech”.

Embedding

{
  "embedding_provider": "dashscope",
  "embedding_model": "text-embedding-v4"
}
The default model is text-embedding-v4. After changing the embedding provider or model, run /memory rebuild-index to rebuild the index, since vectors produced by different embedding models are not comparable.
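
When configuration changes are scripted, a small comparison can flag the cases where the index needs rebuilding. This is a hedged sketch: `embedding_changed` is a hypothetical helper, and it assumes a missing `embedding_model` means the default `text-embedding-v4` stated above, while a missing `embedding_provider` is compared as-is because this page does not state a default for it.

```python
# Default named on this page.
DEFAULT_EMBEDDING_MODEL = "text-embedding-v4"


def embedding_changed(old: dict, new: dict) -> bool:
    """Return True if the embedding provider or model differs between two configs."""
    old_model = old.get("embedding_model", DEFAULT_EMBEDDING_MODEL)
    new_model = new.get("embedding_model", DEFAULT_EMBEDDING_MODEL)
    provider_differs = old.get("embedding_provider") != new.get("embedding_provider")
    return provider_differs or old_model != new_model


if __name__ == "__main__":
    before = {"embedding_provider": "dashscope"}
    after = {"embedding_provider": "dashscope", "embedding_model": "text-embedding-v3"}
    if embedding_changed(before, after):
        print("Embedding settings changed: run /memory rebuild-index")
```

The model name `text-embedding-v3` in the usage example is only illustrative; this page does not list it as a supported option.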