Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.cowagent.ai/llms.txt

Use this file to discover all available pages before exploring further.

Zhipu AI supports text chat, image understanding, speech-to-text (ASR), and embedding. A single zhipu_ai_api_key enables all capabilities.
All capabilities below can be configured in one place via the “Model Management” page in the Web Console, with no need to manually edit the configuration file.

Text Chat

{
  "model": "glm-5.1",
  "zhipu_ai_api_key": "YOUR_API_KEY"
}
ParameterDescription
modelCan be glm-5.1, glm-5-turbo, glm-5, glm-4.7, glm-4-plus, glm-4-flash, glm-4-air, etc. See model codes
zhipu_ai_api_keyCreate one in the Zhipu AI Console
zhipu_ai_api_baseOptional, defaults to https://open.bigmodel.cn/api/paas/v4

Image Understanding

Zhipu’s chat models (glm-5.1, glm-5-turbo, etc.) do not support vision; vision calls are uniformly routed to glm-5v-turbo. Once zhipu_ai_api_key is configured, the Agent’s Vision tool automatically uses this model, with no need to specify it explicitly in the configuration file.

Speech-to-Text (ASR)

{
  "voice_to_text": "zhipu",
  "voice_to_text_model": "glm-asr-2512"
}
ParameterDescription
voice_to_textSet to zhipu to enable Zhipu ASR
voice_to_text_modelOptional, defaults to glm-asr-2512
Credentials are automatically reused from zhipu_ai_api_key. Audio files should be smaller than 25MB; oversized files may be rejected by the server.

Embedding

{
  "embedding_provider": "zhipu",
  "embedding_model": "embedding-3"
}
Available models: embedding-3, embedding-2. After changing the embedding, run /memory rebuild-index to rebuild the index.