Qwen - CowAgent

Qwen (Alibaba DashScope / Bailian) is one of the most fully-featured providers. Text, image understanding, image generation, speech-to-text, text-to-speech, and embedding can all be enabled with a single dashscope_api_key.

All capabilities below can be configured in one place via the “Model Management” page in the Web Console, with no need to manually edit the configuration file.

Text Chat

{
  "model": "qwen3.7-plus",
  "dashscope_api_key": "YOUR_API_KEY"
}

Parameter	Description
`model`	Can be `qwen3.7-plus`, `qwen3.7-max`, `qwen3.6-plus`, `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc.
`dashscope_api_key`	Create one in the Bailian Console; see the official docs

Image Understanding

Once dashscope_api_key is configured, the Agent’s Vision tool automatically calls Qwen’s vision models to recognize images. Models like qwen3.7-plus / qwen3.6-plus / qwen3.5-plus / qwen3-max are already multimodal; if the main model is text-only (e.g. qwen-turbo), it automatically falls back to qwen-vl-max. To manually specify a Vision model:

{
  "tools": {
    "vision": {
      "model": "qwen3.7-plus"
    }
  }
}

Supported models: qwen3.7-plus, qwen3.6-plus, qwen3.5-plus, qwen3-max.
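The selection rule described above can be sketched as a small function. This is only an illustration of the documented behavior, not CowAgent's actual implementation: the function name, the constant names, and the assumption that a multimodal main model is reused directly for vision are all hypothetical.

```python
# Hypothetical helper illustrating the documented fallback rule.
# Names are illustrative and do not come from the CowAgent code base.
MULTIMODAL_MODELS = {"qwen3.7-plus", "qwen3.6-plus", "qwen3.5-plus", "qwen3-max"}
FALLBACK_VISION_MODEL = "qwen-vl-max"


def pick_vision_model(main_model, vision_override=None):
    """Return the model the Vision tool would use under the rule above."""
    # An explicit tools.vision.model setting takes priority.
    if vision_override:
        return vision_override
    # Assumption: a multimodal main model handles images itself.
    if main_model in MULTIMODAL_MODELS:
        return main_model
    # A text-only main model (e.g. qwen-turbo) falls back to qwen-vl-max.
    return FALLBACK_VISION_MODEL
```

For example, `pick_vision_model("qwen-turbo")` yields `qwen-vl-max`, while `pick_vision_model("qwen-turbo", "qwen3.6-plus")` yields the manually specified model.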

Image Generation

{
  "skills": {
    "image-generation": {
      "model": "qwen-image-2.0"
    }
  }
}

Available models: qwen-image-2.0, qwen-image-2.0-pro.

Speech-to-Text (ASR)

{
  "voice_to_text": "dashscope",
  "voice_to_text_model": "qwen3-asr-flash"
}

Parameter	Description
`voice_to_text`	Set to `dashscope` to enable Qwen ASR
`voice_to_text_model`	Optional, defaults to `qwen3-asr-flash`

ASR automatically reuses the credentials from dashscope_api_key. Each audio segment should be smaller than 10 MB and no longer than 300 seconds.
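A pre-flight check against these limits might look like the sketch below. It assumes that "10MB" means 10 × 1024 × 1024 bytes and that the input is a WAV file readable by Python's standard `wave` module; the function names are hypothetical and not part of CowAgent.

```python
import os
import wave

MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" is interpreted as 10 MiB
MAX_SECONDS = 300             # documented upper bound on segment length


def within_asr_limits(size_bytes, duration_seconds):
    """True if a segment is smaller than 10 MB and at most 300 seconds long."""
    return size_bytes < MAX_BYTES and duration_seconds <= MAX_SECONDS


def wav_within_asr_limits(path):
    """Check a WAV file on disk against the documented limits."""
    size_bytes = os.path.getsize(path)
    with wave.open(path, "rb") as wav:
        # Duration in seconds = number of frames / frames per second.
        duration = wav.getnframes() / wav.getframerate()
    return within_asr_limits(size_bytes, duration)
```

Longer recordings would need to be split into segments before being sent for recognition; how to split them is outside the scope of this sketch.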

Text-to-Speech (TTS)

{
  "text_to_voice": "dashscope",
  "text_to_voice_model": "qwen3-tts-flash",
  "tts_voice_id": "Cherry"
}

Parameter	Description
`text_to_voice`	Set to `dashscope` to enable Qwen TTS
`text_to_voice_model`	Optional, defaults to `qwen3-tts-flash`; covers Mandarin, dialects, and major foreign languages
`tts_voice_id`	Voice ID; see the common list below

Common voice examples:

Voice ID	Description
`Cherry`	Qianyue · Sunny Female Voice
`Serena`	Suyao · Gentle Female Voice
`Ethan`	Chenxu · Sunny Male Voice
`Chelsie`	Qianxue · Anime Girl
`Dylan`	Beijing Dialect · Xiaodong
`Rocky`	Cantonese · Aqiang
`Sunny`	Sichuan Dialect · Qing’er

The full voice list (Mandarin / regional dialects / bilingual, etc.) can be selected visually in the Web Console under “Model Management → Text-to-Speech”.

Embedding

{
  "embedding_provider": "dashscope",
  "embedding_model": "text-embedding-v4"
}

The default model is text-embedding-v4. After changing the embedding model, run /memory rebuild-index to rebuild the index.
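To see how the fragments on this page fit together under a single dashscope_api_key, the sketch below assembles them into one configuration object and serializes it as JSON. It assumes all keys live at the top level of the same configuration file; `build_qwen_config` is a hypothetical helper, and the values are the examples used on this page.

```python
import json


def build_qwen_config(api_key):
    """Combine the per-capability fragments from this page into one dict.

    Assumption: every fragment is merged at the top level of one config file.
    """
    return {
        # Text chat
        "model": "qwen3.7-plus",
        "dashscope_api_key": api_key,
        # Image understanding (optional manual override)
        "tools": {"vision": {"model": "qwen3.7-plus"}},
        # Image generation
        "skills": {"image-generation": {"model": "qwen-image-2.0"}},
        # Speech-to-text
        "voice_to_text": "dashscope",
        "voice_to_text_model": "qwen3-asr-flash",
        # Text-to-speech
        "text_to_voice": "dashscope",
        "text_to_voice_model": "qwen3-tts-flash",
        "tts_voice_id": "Cherry",
        # Embedding
        "embedding_provider": "dashscope",
        "embedding_model": "text-embedding-v4",
    }


config_text = json.dumps(build_qwen_config("YOUR_API_KEY"), indent=2)
print(config_text)
```

The same result can be reached without editing files by using the “Model Management” page in the Web Console.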
