OpenAI Image Vision

Analyze images using OpenAI's GPT-4 Vision API. The model can understand visual elements including objects, shapes, colors, textures, and text within images.

Setup

This skill requires at least one of the following API keys (OpenAI is preferred when both are set):

OpenAI (preferred):

env_config(action="set", key="OPENAI_API_KEY", value="your-key")

LinkAI (fallback):

env_config(action="set", key="LINKAI_API_KEY", value="your-key")

Optional: Set custom API base URL:

env_config(action="set", key="OPENAI_API_BASE", value="your-base-url")
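The key preference described above can be sketched as a small shell check. This is only an illustration: it assumes the values set through env_config are visible to the script as environment variables of the same name, and `select_provider` is a hypothetical helper, not part of the skill.

```shell
# Hypothetical helper: report which provider would be used.
# Assumes OPENAI_API_KEY and LINKAI_API_KEY are exposed as environment variables.
select_provider() {
    if [ -n "${OPENAI_API_KEY:-}" ]; then
        # OpenAI wins whenever its key is set, even if LinkAI is also set.
        echo "openai"
    elif [ -n "${LINKAI_API_KEY:-}" ]; then
        # LinkAI is used only as a fallback.
        echo "linkai"
    else
        echo "no API key set: configure OPENAI_API_KEY or LINKAI_API_KEY" >&2
        return 1
    fi
}
```

Calling `select_provider` before running the script gives an early, readable failure when neither key is configured.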

Usage

Important

Scripts are located relative to this skill's base directory.
When you see this skill listed, note its base directory path.
CRITICAL: Always use the bash command to execute the script:

General pattern (MUST start with bash):

bash "/scripts/vision.sh" "image_path_or_url" "question" [model]

DO NOT execute the script directly like this (WRONG):

"/scripts/vision.sh" ...

Parameters:

- image_path_or_url: Local image file path or HTTP(S) URL (required)

- question: Question to ask about the image (required)

- model: OpenAI model to use (default: gpt-4.1-mini)

Options: gpt-4.1-mini, gpt-4.1, gpt-4o-mini, gpt-4-turbo
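The calling rules and parameters above could be wrapped in a small helper, sketched here under stated assumptions. `run_vision` and the `SKILL_BASE_DIR` variable are hypothetical names introduced for illustration, and rejecting models outside the listed options is a design choice of this sketch, not documented behavior of vision.sh.

```shell
# Hypothetical wrapper around vision.sh.
# SKILL_BASE_DIR is an assumed variable holding this skill's base directory.
run_vision() {
    local image="${1:-}" question="${2:-}"
    local model="${3:-gpt-4.1-mini}"   # documented default model

    if [ -z "${SKILL_BASE_DIR:-}" ]; then
        echo "SKILL_BASE_DIR is not set" >&2
        return 2
    fi
    # Both the image and the question are required.
    if [ -z "$image" ] || [ -z "$question" ]; then
        echo "usage: run_vision <image_path_or_url> <question> [model]" >&2
        return 2
    fi
    # Accept only the models listed under Options (an assumption of this sketch).
    case "$model" in
        gpt-4.1-mini|gpt-4.1|gpt-4o-mini|gpt-4-turbo) ;;
        *) echo "unsupported model: $model" >&2; return 2 ;;
    esac

    # Always invoke through bash; never execute the script directly.
    bash "${SKILL_BASE_DIR}/scripts/vision.sh" "$image" "$question" "$model"
}
```

Quoting every argument keeps questions that contain spaces or apostrophes intact as a single parameter.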

Examples

Analyze a local image

bash "/scripts/vision.sh" "/path/to/image.jpg" "What's in this image?"

Analyze an image from URL

bash "/scripts/vision.sh" "https://example.com/image.jpg" "Describe this image in detail"

Use specific model

bash "/scripts/vision.sh" "/path/to/photo.png" "What colors are prominent?" "gpt-4o-mini"

Extract text from image

bash "/scripts/vision.sh" "/path/to/document.jpg" "Extract all text from this image"

Analyze multiple aspects

bash "/scripts/vision.sh" "image.jpg" "List all objects you can see and describe the overall scene"

Supported Image Formats

JPEG (.jpg, .jpeg)

PNG (.png)

GIF (.gif)

WebP (.webp)
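A caller might want to check a file name against the formats listed above before invoking the script. The following is a minimal sketch of such a check; `image_mime_type` is a hypothetical helper, and matching on the file extension alone (rather than inspecting file contents) is an assumption of this example.

```shell
# Hypothetical helper: map a file name to the MIME type of a supported format.
# Prints the MIME type and returns 0, or prints nothing and returns 1.
image_mime_type() {
    local name
    # Lowercase the name so that .JPG and .jpg are treated the same.
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *.jpg|*.jpeg) echo "image/jpeg" ;;
        *.png)        echo "image/png" ;;
        *.gif)        echo "image/gif" ;;
        *.webp)       echo "image/webp" ;;
        *)            return 1 ;;
    esac
}
```

Note that a URL with a query string after the extension would not match these patterns, so this check is best suited to local paths.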

Performance Optimization

Files larger than 1MB are automatically downscaled so that the longest side is 800px, which keeps the request within command-line parameter limits. This happens transparently and should not noticeably affect analysis quality.
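To make the resizing rule concrete, here is a sketch of the arithmetic involved. It is not the script's actual code: it assumes that 1MB means 1024 * 1024 bytes, that the aspect ratio is preserved, and that images already within 800px are not upscaled. `needs_compression` and `scaled_dimensions` are hypothetical names.

```shell
MAX_BYTES=1048576   # assumed meaning of "1MB": 1024 * 1024 bytes
MAX_SIDE=800        # target length of the longest side, in pixels

# Returns 0 when the file at $1 is larger than the size threshold.
needs_compression() {
    local size
    # The arithmetic expansion strips any padding that wc adds on some systems.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -gt "$MAX_BYTES" ]
}

# Prints "WIDTH HEIGHT" after scaling so the longest side equals MAX_SIDE.
# $1 = original width, $2 = original height (both in pixels).
scaled_dimensions() {
    local w="$1" h="$2"
    if [ "$w" -le "$MAX_SIDE" ] && [ "$h" -le "$MAX_SIDE" ]; then
        # Already small enough: keep the original size.
        echo "$w $h"
    elif [ "$w" -ge "$h" ]; then
        # Landscape or square: width is the longest side (rounded division).
        echo "$MAX_SIDE $(( (h * MAX_SIDE + w / 2) / w ))"
    else
        # Portrait: height is the longest side (rounded division).
        echo "$(( (w * MAX_SIDE + h / 2) / h )) $MAX_SIDE"
    fi
}
```

For example, `scaled_dimensions 4000 3000` prints `800 600`, so a 4000x3000 photo keeps its 4:3 shape after downscaling.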

Response Format

The script returns a JSON response:

{
  "model": "gpt-4.1-mini",
  "content": "The image shows...",
  "usage": {
    "prompt_tokens": 1234,
    "completion_tokens": 567,
    "total_tokens": 1801
  }
}

Or in case of error:

{
  "error": "Error description",
  "details": "Additional error information"
}
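Because success and failure are both reported as JSON, a caller has to inspect the response to tell them apart. The sketch below does this with a plain pattern match and assumes that "error" is the first key of an error object, as in the format shown above. `is_error_response` and `handle_response` are hypothetical helpers.

```shell
# Returns 0 if the response in $1 looks like the documented error object.
# Assumes "error" is the first key of the top-level object.
is_error_response() {
    # Flatten the JSON to one line so the pattern can anchor at its start.
    printf '%s' "$1" | tr -d '\n' |
        grep -Eq '^[[:space:]]*\{[[:space:]]*"error"[[:space:]]*:'
}

# Prints a successful response to stdout; reports an error response on stderr.
handle_response() {
    if is_error_response "$1"; then
        echo "vision.sh failed: $1" >&2
        return 1
    fi
    printf '%s\n' "$1"
}
```

A real JSON parser is more robust than pattern matching. If jq is available, `jq -e 'has("error")'` performs the same check without relying on key order.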

Notes

Image size: Images are automatically resized if too large

Timeout: 60 seconds for API calls

Rate limits: Subject to your OpenAI API plan limits

Privacy: Images are sent to OpenAI's servers for processing

Local files: Automatically converted to base64 for API submission

URLs: Can be passed directly to the API without downloading
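The difference between local files and URLs noted above can be illustrated with a short sketch. It assumes that local files are submitted as base64 data URLs, which is a common way to send inline images but is not confirmed here as what vision.sh does internally. `image_reference` is a hypothetical helper, and its second argument (the MIME type) is an assumption of this example.

```shell
# Hypothetical helper: turn an image path or URL into a reference for the API.
# $1 = local path or HTTP(S) URL, $2 = MIME type for local files (default image/jpeg).
image_reference() {
    local src="$1" mime="${2:-image/jpeg}"
    case "$src" in
        http://*|https://*)
            # URLs are passed through unchanged; nothing is downloaded.
            printf '%s\n' "$src"
            ;;
        *)
            [ -f "$src" ] || { echo "file not found: $src" >&2; return 1; }
            # Encode the file and strip the line wraps some base64 tools insert.
            printf 'data:%s;base64,%s\n' "$mime" "$(base64 < "$src" | tr -d '\n')"
            ;;
    esac
}
```

Since base64 output is roughly a third larger than the file itself, inlining large local files is what makes the automatic resizing relevant.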
