A general-purpose image generation and editing skill supporting six providers: OpenAI, Gemini, Seedream (Volcengine Ark), Qwen (DashScope), MiniMax, and LinkAI. No need to choose a model manually — the script automatically selects a configured provider based on a fixed priority order.
Model Selection
image-generation uses a “fixed priority + automatic fallback” strategy — just configure your keys and it works:
- Priority order: OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI
- Unconfigured providers are skipped: only providers with an API key participate
- Automatic fallback on failure: on errors like 401, model not enabled, or network issues, the next provider is tried
- Specified model goes first: if a specific model name is provided, its provider is promoted to the front
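The routing rules above can be sketched roughly as follows. This is a minimal illustration rather than the skill's actual code: `candidate_providers`, `generate_with_fallback`, and the `call_provider` callback are hypothetical names, and the sketch assumes a pinned provider without a key is skipped like any other unconfigured provider.

```python
# Fixed priority order, as listed above.
PRIORITY = ["openai", "gemini", "seedream", "qwen", "minimax", "linkai"]

# Environment variable holding each provider's key (names from the API Key Reference table).
KEY_ENV = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "seedream": "ARK_API_KEY",
    "qwen": "DASHSCOPE_API_KEY",
    "minimax": "MINIMAX_API_KEY",
    "linkai": "LINKAI_API_KEY",
}


def candidate_providers(env, preferred=None):
    """Return configured providers in priority order, promoting `preferred` to the front."""
    configured = [p for p in PRIORITY if env.get(KEY_ENV[p])]
    if preferred in configured:
        configured.remove(preferred)
        configured.insert(0, preferred)
    return configured


def generate_with_fallback(prompt, env, call_provider, preferred=None):
    """Try each candidate in turn; `call_provider` is a hypothetical per-provider client."""
    errors = {}
    for provider in candidate_providers(env, preferred):
        try:
            return call_provider(provider, prompt)
        except Exception as exc:  # e.g. 401, model not enabled, network failure
            errors[provider] = str(exc)
    return {"error": errors or "no provider configured"}
```

In real use `env` would be `os.environ`; passing it as a plain mapping keeps the sketch easy to test.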
Supported Models
| Provider | Models / Aliases | Notes |
|---|---|---|
| OpenAI | gpt-image-2, gpt-image-1 | General-purpose, high quality, supports quality parameter |
| Gemini (Nano Banana) | nano-banana-2, nano-banana-pro, nano-banana | Map to the image variants of gemini-3.1-flash, gemini-3-pro, and gemini-2.5-flash, respectively |
| Seedream (Volcengine Ark) | seedream-5.0-lite, seedream-4.5 | Native 2K–4K, up to 14 reference images for fusion |
| Qwen (DashScope) | qwen-image-2.0, qwen-image-2.0-pro | Strong with Chinese text rendering and text-image layouts |
| MiniMax | image-01 | Fast and simple image generation |
| LinkAI | Any model | Universal proxy, used as fallback |
By default, the Agent does not pick a model — it uses automatic routing. If you want a specific model, just say so in the conversation, e.g. “use seedream to draw a cat” or “generate a poster with gpt-image-2”. You can also pin a default model via the “Custom Configuration” section below.
Custom Configuration
API Key Setup
You need at least one provider key. Configuring multiple providers enables automatic fallback. There are three ways to set up keys:
Option 1: Automatic Reuse of Existing Keys
If you have already configured model keys in the web console or config.json (e.g. openai_api_key, gemini_api_key, etc.), these keys are automatically synced to the corresponding environment variables at startup. In other words, if your chat model works, image generation can use the same key with zero extra configuration.
Option 2: Edit config.json
Add the key fields directly to config.json:
{
  "openai_api_key": "sk-xxx",
  "openai_api_base": "https://api.openai.com/v1",
  "gemini_api_key": "AIza-xxx",
  "ark_api_key": "xxx",
  "dashscope_api_key": "sk-xxx",
  "minimax_api_key": "xxx",
  "linkai_api_key": "xxx"
}
A restart is required after changes. Each key also has a corresponding *_api_base field for custom endpoints.
Option 3: Send the Key in Chat
Send an API key in the chat and the Agent saves it to ~/cow/.env using the env_config tool; no restart is needed. For example:
Set OPENAI_API_KEY to sk-xxx
Or:
Configure ARK_API_KEY as xxx
API Key Reference
| Environment Variable | config.json Field | Provider | Default Base URL |
|---|---|---|---|
| OPENAI_API_KEY | openai_api_key | OpenAI | https://api.openai.com/v1 |
| GEMINI_API_KEY | gemini_api_key | Gemini | https://generativelanguage.googleapis.com |
| ARK_API_KEY | ark_api_key | Volcengine Ark (Seedream) | https://ark.cn-beijing.volces.com/api/v3 |
| DASHSCOPE_API_KEY | dashscope_api_key | Alibaba DashScope (Qwen) | https://dashscope.aliyuncs.com |
| MINIMAX_API_KEY | minimax_api_key | MiniMax | https://api.minimaxi.com |
| LINKAI_API_KEY | linkai_api_key | LinkAI | https://api.link-ai.tech |
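The startup sync from config.json fields to environment variables might look something like the sketch below. The field and variable names come from the table; the function name `sync_keys_to_env` is hypothetical, and whether an existing environment value should take precedence is not stated here, so this version simply overwrites.

```python
# config.json field -> environment variable, taken from the table above.
FIELD_TO_ENV = {
    "openai_api_key": "OPENAI_API_KEY",
    "gemini_api_key": "GEMINI_API_KEY",
    "ark_api_key": "ARK_API_KEY",
    "dashscope_api_key": "DASHSCOPE_API_KEY",
    "minimax_api_key": "MINIMAX_API_KEY",
    "linkai_api_key": "LINKAI_API_KEY",
}


def sync_keys_to_env(config, env):
    """Copy non-empty key fields from a parsed config.json into `env`.

    Returns the names of the variables that were set. Overwriting existing
    values is an assumption of this sketch, not documented behavior.
    """
    applied = []
    for field, env_name in FIELD_TO_ENV.items():
        value = config.get(field)
        if value:
            env[env_name] = str(value)
            applied.append(env_name)
    return applied


# Typical call (assuming config.json sits in the working directory):
#   import json, os
#   with open("config.json") as fh:
#       sync_keys_to_env(json.load(fh), os.environ)
```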
Pinning a Default Model
To force all image generation through a specific provider’s model, add this to config.json:
"skill": {
  "image-generation": {
    "model": "seedream-5.0-lite"
  }
}
At startup, this is automatically converted to the environment variable SKILL_IMAGE_GENERATION_MODEL, and the script will always use this model’s provider for generation.
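As an illustration of that conversion, the sketch below flattens the `skill` block into environment-style names. The naming rule (prefix `SKILL_`, hyphens to underscores, upper case) is inferred from the single example SKILL_IMAGE_GENERATION_MODEL and is an assumption; `skill_env_name` and `skill_config_to_env` are hypothetical helpers.

```python
def skill_env_name(skill, field):
    """Build an env var name such as SKILL_IMAGE_GENERATION_MODEL (inferred rule)."""
    return "SKILL_{}_{}".format(skill, field).replace("-", "_").upper()


def skill_config_to_env(config):
    """Flatten config["skill"][<skill>][<field>] into {ENV_NAME: value} pairs."""
    env = {}
    for skill, fields in config.get("skill", {}).items():
        for field, value in fields.items():
            env[skill_env_name(skill, field)] = str(value)
    return env


config = {"skill": {"image-generation": {"model": "seedream-5.0-lite"}}}
print(skill_config_to_env(config))
# {'SKILL_IMAGE_GENERATION_MODEL': 'seedream-5.0-lite'}
```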
Enabling and Disabling
image-generation is a built-in skill that automatically adjusts its status based on API keys:
- Key configured: the skill is active — the Agent will invoke it when asked to draw
- Key not configured: the skill still appears in context (marked as “needs configuration”) — the Agent will guide the user to set up a key rather than failing silently
To control it manually:
/skill disable image-generation # Disable (won't be invoked even if keys are present)
/skill enable image-generation # Re-enable
In the terminal: cow skill disable image-generation / cow skill enable image-generation.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | n/a | Image description |
| image_url | string / list | No | null | Input image(s) for editing (local path or URL). Pass multiple for multi-image fusion |
| quality | string | No | auto | low / medium / high; only some providers support this |
| size | string | No | auto | 512 / 1K / 2K / 3K / 4K, or pixel value like 1024x1024 |
| aspect_ratio | string | No | null | 1:1 / 3:2 / 2:3 / 16:9 / 9:16 / 21:9; Gemini also supports 1:4 / 4:1 / 1:8 / 8:1 |
Higher quality and larger size cost more and take longer.
- For everyday conversations and quick previews, use the defaults (auto) or quality=low + size=1K, which takes roughly 20 seconds
- For posters or when the user explicitly asks for high resolution, use quality=high + size=2K/4K, which may take 1–5 minutes depending on the model
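To make the parameter rules concrete, here is a small validation sketch built from the table and the two recommendations above. `build_params`, `PREVIEW`, and `POSTER` are hypothetical names used only for illustration; how the real script receives its arguments is not shown in this document.

```python
import re

QUALITIES = {"auto", "low", "medium", "high"}
SIZE_LABELS = {"auto", "512", "1K", "2K", "3K", "4K"}
PIXEL_SIZE = re.compile(r"^\d+x\d+$")  # e.g. 1024x1024

# Presets matching the two recommendations above.
PREVIEW = {"quality": "low", "size": "1K"}
POSTER = {"quality": "high", "size": "2K"}


def build_params(prompt, image_url=None, quality="auto", size="auto", aspect_ratio=None):
    """Validate arguments against the parameter table and return a dict."""
    if not prompt:
        raise ValueError("prompt is required")
    if quality not in QUALITIES:
        raise ValueError("unsupported quality: {}".format(quality))
    if size not in SIZE_LABELS and not PIXEL_SIZE.match(size):
        raise ValueError("unsupported size: {}".format(size))
    params = {"prompt": prompt, "quality": quality, "size": size}
    if image_url:
        params["image_url"] = image_url  # a single path/URL or a list for fusion
    if aspect_ratio:
        params["aspect_ratio"] = aspect_ratio
    return params
```

For example, `build_params("a cat on a sofa", **PREVIEW)` yields the quick-preview settings, while `**POSTER` selects the slower high-resolution ones.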
Output
On success:
{
"model": "doubao-seedream-5-0-260128",
"images": [
{"url": "/path/to/output.png"}
]
}
On failure: { "error": "..." }. After an error, do not retry directly — it is almost always a configuration issue (wrong key, incorrect API base, model not enabled). Have the user fix the configuration first.
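A caller could handle both result shapes along these lines. This is a sketch that assumes the script's JSON result is available as a string; `parse_result` is a hypothetical helper, and it deliberately raises instead of retrying, in line with the guidance above.

```python
import json


def parse_result(result_text):
    """Return the image paths/URLs from a result, or raise on an error result."""
    result = json.loads(result_text)
    if "error" in result:
        # Do not retry here: the cause is usually configuration
        # (wrong key, incorrect API base, model not enabled).
        raise RuntimeError("image generation failed: {}".format(result["error"]))
    return [image["url"] for image in result.get("images", [])]


ok = '{"model": "doubao-seedream-5-0-260128", "images": [{"url": "/path/to/output.png"}]}'
print(parse_result(ok))  # ['/path/to/output.png']
```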
Common Use Cases
- Text-to-image: generate illustrations, posters, icons, avatars, storyboards, etc. from a description
- Image-to-image: change styles, swap elements, add decorations or text on an existing image
- Multi-image fusion: combine multiple reference images into one (outfit swaps, character group photos, etc.)
Notes
- Set the Bash timeout to 600 seconds: each provider call has a 300-second HTTP timeout, and the script may try several providers in sequence
- Input images are automatically compressed to ≤ 4 MB with the longest edge ≤ 4096 px
- Gemini / Seedream / Qwen / MiniMax do not support the quality parameter; passing it has no effect
- Seedream defaults to 2K; seedream-5.0-lite supports up to 3K; seedream-4.5 supports up to 4K
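The provider-specific limits in the last two points can be expressed as a small normalization step. The sketch below is illustrative only: `seedream_size` and `effective_params` are hypothetical helpers, and clamping an oversized request down to the model's maximum (rather than rejecting it) is an assumption, not documented behavior.

```python
SIZE_ORDER = ["512", "1K", "2K", "3K", "4K"]

# Maximum size label per Seedream model, from the note above.
SEEDREAM_MAX = {"seedream-5.0-lite": "3K", "seedream-4.5": "4K"}

# Providers for which the quality parameter has no effect.
NO_QUALITY = {"gemini", "seedream", "qwen", "minimax"}


def seedream_size(model, size="auto"):
    """Resolve the size for a Seedream model: default to 2K, clamp to the model's maximum."""
    if size == "auto":
        return "2K"
    limit = SEEDREAM_MAX.get(model)
    if size not in SIZE_ORDER or limit is None:
        return size  # pixel values and unknown models pass through unchanged
    if SIZE_ORDER.index(size) > SIZE_ORDER.index(limit):
        return limit  # assumption: clamp rather than fail
    return size


def effective_params(provider, params):
    """Drop `quality` for providers that ignore it; return a new dict."""
    if provider in NO_QUALITY:
        return {k: v for k, v in params.items() if k != "quality"}
    return dict(params)
```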