Zhipu AI supports text chat, image understanding, speech-to-text (ASR), and embedding. A single zhipu_ai_api_key enables all capabilities.
Text Chat
| Parameter | Description |
|---|---|
| model | Can be glm-5.1, glm-5-turbo, glm-5, glm-4.7, glm-4-plus, glm-4-flash, glm-4-air, etc. See model codes |
| zhipu_ai_api_key | Create one in the Zhipu AI Console |
| zhipu_ai_api_base | Optional, defaults to https://open.bigmodel.cn/api/paas/v4 |
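
As a rough illustration of how the three parameters above fit together, the sketch below assembles them into a dictionary and prints it as JSON. The helper name `build_zhipu_chat_config` is hypothetical, and the assumption that the configuration file is JSON is not confirmed by this page; only the parameter names and the default base URL come from the table.

```python
import json

# Default base URL, as documented in the parameter table.
DEFAULT_API_BASE = "https://open.bigmodel.cn/api/paas/v4"


def build_zhipu_chat_config(api_key, model, api_base=None):
    """Hypothetical helper: collect the text-chat parameters into one dict."""
    config = {
        "model": model,
        "zhipu_ai_api_key": api_key,
    }
    # zhipu_ai_api_base is optional, so it is only written when it
    # differs from the documented default.
    if api_base is not None and api_base != DEFAULT_API_BASE:
        config["zhipu_ai_api_base"] = api_base
    return config


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a real key.
    print(json.dumps(build_zhipu_chat_config("YOUR_API_KEY", "glm-4-flash"), indent=2))
```

Leaving out zhipu_ai_api_base keeps the configuration minimal and lets the documented default apply.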
Image Understanding
Zhipu’s chat models (glm-5.1, glm-5-turbo, etc.) do not support vision, so all vision calls are routed to glm-5v-turbo. Once zhipu_ai_api_key is configured, the Agent’s Vision tool uses this model automatically; there is no need to specify it in the configuration file.
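
The routing rule described above can be pictured with a minimal sketch. This is not the Agent's actual implementation; the function `select_model` and its parameters are hypothetical, and only the model name glm-5v-turbo comes from this page.

```python
# Vision model named in the documentation.
VISION_MODEL = "glm-5v-turbo"


def select_model(configured_model: str, has_image: bool) -> str:
    """Hypothetical routing rule: image requests go to the vision model,
    everything else goes to the configured chat model."""
    return VISION_MODEL if has_image else configured_model


if __name__ == "__main__":
    print(select_model("glm-5.1", has_image=True))
    print(select_model("glm-5.1", has_image=False))
```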
Speech-to-Text (ASR)
| Parameter | Description |
|---|---|
| voice_to_text | Set to zhipu to enable Zhipu ASR |
| voice_to_text_model | Optional, defaults to glm-asr-2512 |
ASR uses the same zhipu_ai_api_key. Audio files should be smaller than 25MB; the server may reject larger files.
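
One way to avoid a rejected upload is to check the file size locally before sending audio for transcription. The sketch below is only an illustration: the function `is_audio_size_ok` is hypothetical, and it assumes that 25MB means 25 × 1024 × 1024 bytes, which this page does not specify.

```python
import os

# Assumption: "25MB" is interpreted as 25 MiB; the server's exact limit may differ.
MAX_AUDIO_BYTES = 25 * 1024 * 1024


def is_audio_size_ok(path: str) -> bool:
    """Return True if the file at `path` is strictly smaller than the limit."""
    return os.path.getsize(path) < MAX_AUDIO_BYTES
```

A caller could split or re-encode the recording when this check returns False, rather than waiting for the server to refuse it.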
Embedding
Supported embedding models: embedding-3, embedding-2. After changing the embedding model, run /memory rebuild-index to rebuild the index.