Define every visual attribute as structured JSON instead of hoping natural language gets it right. VGL (Visual Generation Language) gives you explicit, deterministic control over objects, lighting, camera settings, composition, and style for Bria's FIBO models.
> **Related Skill:** Use `bria-ai` to execute these VGL prompts via the Bria API. VGL defines the structured control format; `bria-ai` handles generation, editing, and background removal.
## Core Concept
VGL replaces ambiguous natural language prompts with deterministic JSON that explicitly declares every visual attribute: objects, lighting, camera settings, composition, and style. This ensures reproducible, controllable image generation.
## Operation Modes

| Mode | Input | Output | Use Case |
|------|-------|--------|----------|
| Generate | Text prompt | VGL JSON | Create new image from description |
| Edit | Image + instruction | VGL JSON | Modify reference image |
| Edit_with_Mask | Masked image + instruction | VGL JSON | Fill grey masked regions |
| Caption | Image only | VGL JSON | Describe existing image |
| Refine | Existing JSON + edit | Updated VGL JSON | Modify existing prompt |
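The mode-to-input pairs above amount to a small lookup table. The sketch below is illustrative only: `MODE_INPUTS` and `required_inputs` are hypothetical helper names, not part of any Bria library.

```python
# Illustrative mapping of VGL operation modes to the inputs each expects.
# All five modes produce VGL JSON as their output.
MODE_INPUTS = {
    "generate": ["text_prompt"],
    "edit": ["image", "instruction"],
    "edit_with_mask": ["masked_image", "instruction"],
    "caption": ["image"],
    "refine": ["existing_json", "edit"],
}

def required_inputs(mode: str) -> list:
    """Return the inputs a given operation mode needs (hypothetical helper)."""
    if mode not in MODE_INPUTS:
        raise ValueError(f"Unknown VGL mode: {mode}")
    return MODE_INPUTS[mode]
```

A caller would dispatch on the mode before assembling a request, e.g. `required_inputs("caption")` returns `["image"]`.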
## JSON Schema

Output a single valid JSON object with these required keys:

1. `short_description` (String)
   Concise summary of image content, max 200 words. Include key subjects, actions, setting, and mood.

2. `objects` (Array)
   Up to 5 objects, each with `description`, `location`, `relative_size`, `shape_and_color`, `texture`, `appearance_details`, `relationship`, `orientation`, plus the human-specific fields `pose`, `expression`, `clothing`, `action`, `gender`, and `skin_tone_and_texture` (set to `null` for non-human objects).

3. `background_setting` (String)
   Description of the environment behind and around the subjects.

4. `lighting` (Object)
   `conditions`, `direction`, and `shadows`.

5. `aesthetics` (Object)
   `composition`, `color_scheme`, and `mood_atmosphere`.

6. `photographic_characteristics` (Object)
   `depth_of_field`, `focus`, `camera_angle`, and `lens_focal_length`. Prefer `"standard lens (35mm-50mm)"` or `"portrait lens (50mm-85mm)"`. Avoid wide-angle unless specified.
7. `style_medium` (String)
   `"photograph"` | `"oil painting"` | `"watercolor"` | `"3D render"` | `"digital illustration"` | `"pencil sketch"`
   Default to `"photograph"` unless explicitly requested otherwise.
8. `artistic_style` (String)
   If not a photograph, describe characteristics in max 3 words: `"impressionistic, vibrant, textured"`. For photographs, use `"realistic"` or similar.
9. `context` (String)
   Describe the image type/purpose:
   - "High-fashion editorial photograph for magazine spread"
   - "Concept art for fantasy video game"
   - "Commercial product photography for e-commerce"
10. `text_render` (Array)
    Default: empty array `[]`. Only populate if the user explicitly provides exact text content:

    ```json
    {
      "text": "Exact text from user (never placeholder)",
      "location": "center | top-left | bottom",
      "size": "small | medium | large",
      "color": "white | red | blue",
      "font": "serif typeface | sans-serif | handwritten | bold impact",
      "appearance_details": "Metallic finish | 3D effect | etc."
    }
    ```

    Exception: universal text integral to objects (e.g., "STOP" on a stop sign).
11. `edit_instruction` (String)
    Single imperative command describing the edit or generation.
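A skeleton containing all eleven required keys can be built up front and filled in per request. The `vgl_skeleton` helper below is a sketch; its empty-string defaults are assumptions, while the `style_medium`, `artistic_style`, `lens_focal_length`, and `text_render` defaults follow the schema rules above.

```python
def vgl_skeleton() -> dict:
    """Build an empty VGL prompt with every required top-level key.

    Defaults reflect the schema guidance above; empty strings are
    illustrative placeholders to be filled in per request.
    """
    return {
        "short_description": "",
        "objects": [],
        "background_setting": "",
        "lighting": {"conditions": "", "direction": "", "shadows": ""},
        "aesthetics": {"composition": "", "color_scheme": "", "mood_atmosphere": ""},
        "photographic_characteristics": {
            "depth_of_field": "",
            "focus": "",
            "camera_angle": "",
            "lens_focal_length": "standard lens (35mm-50mm)",  # preferred default
        },
        "style_medium": "photograph",   # default unless explicitly requested otherwise
        "artistic_style": "realistic",  # matches the "photograph" medium
        "context": "",
        "text_render": [],              # stays empty unless user supplies exact text
        "edit_instruction": "",
    }
```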
## Edit Instruction Formats

### Standard Edits (no mask)

Start with an action verb, describe the changes, and never reference the "original image":

| Category | Example Instruction |
|----------|--------------------|
| Style change | Turn the image into cartoon style. |
| Object attribute | Change the dog's color to black and white. |
| Add element | Add a wide-brimmed felt hat to the subject. |
| Remove object | Remove the book from the subject's hands. |
| Replace object | Change the rose to a bright yellow sunflower. |
| Lighting | Change the lighting from dark and moody to bright and vibrant. |
| Composition | Change the perspective to a wider shot. |
| Text change | Change the text "Happy Anniversary" to "Hello". |
| Quality | Refine the image to obtain increased clarity and sharpness. |
### Masked Region Edits

Reference the "masked regions" or "masked area" as the target:

| Intent | Example Instruction |
|--------|--------------------|
| Object generation | Generate a white rose with a blue center in the masked region. |
| Extension | Extend the image into the masked region to create a scene featuring... |
| Background fill | Create the following background in the masked region: A vast ocean extending to the horizon. |
| Atmospheric fill | Fill the background masked area with a clear, bright blue sky with wispy clouds. |
| Subject restoration | Restore the area in the mask with a young woman. |
| Environment infill | Create inside the masked area: a greenhouse with rows of plants under a glass ceiling. |
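The two instruction formats can be enforced with a small lint pass. The `check_instruction` function below is an illustrative sketch of such a checker, not part of any Bria tooling; it applies the two hard rules stated above (never reference the "original image"; masked edits must target the masked region).

```python
def check_instruction(instruction: str, masked: bool) -> list:
    """Lint an edit_instruction against the format rules (illustrative).

    Standard edits must never reference the "original image"; masked
    edits must explicitly target the "masked region" / "masked area".
    """
    problems = []
    lowered = instruction.lower()
    if "original image" in lowered:
        problems.append('references "original image"')
    if masked and "masked" not in lowered:
        problems.append("masked edit does not mention the masked region")
    if not masked and "masked" in lowered:
        problems.append("standard edit should not mention masks")
    return problems
```

An empty result means the instruction passes; a populated list names each violated rule.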
## Fidelity Rules

### Standard Edit Mode

Preserve ALL visual properties unless explicitly changed by the instruction:

- Subject identity, pose, appearance
- Object existence, location, size, orientation
- Composition, camera angle, lens characteristics
- Style/medium

Only change what the edit strictly requires.

### Masked Edit Mode

- Preserve all visible (non-masked) portions exactly
- Fill grey masked regions to blend seamlessly with unmasked areas
- Match existing style, lighting, and subject matter
- Never describe the grey masks; describe the content that fills them
## Example Output

```json
{
  "short_description": "A professional businesswoman in a navy blazer stands confidently in a modern glass office, holding a tablet. Natural daylight streams through floor-to-ceiling windows, creating a warm, productive atmosphere.",
  "objects": [
    {
      "description": "A confident businesswoman in her 30s with shoulder-length dark hair, wearing a tailored navy blazer over a white blouse. She holds a tablet in her left hand while gesturing naturally with her right.",
      "location": "center-right",
      "relative_size": "large within frame",
      "shape_and_color": "Human figure, navy and white clothing",
      "texture": "smooth fabric, professional attire",
      "appearance_details": "Minimal jewelry, well-groomed professional appearance",
      "relationship": "Main subject, interacting with tablet",
      "orientation": "facing slightly left, three-quarter view",
      "pose": "Standing upright, relaxed professional stance",
      "expression": "confident, approachable smile",
      "clothing": "Tailored navy blazer, white silk blouse, dark trousers",
      "action": "Presenting or reviewing information on tablet",
      "gender": "female",
      "skin_tone_and_texture": "Medium warm skin tone, healthy smooth complexion"
    },
    {
      "description": "A modern tablet device with a bright display showing charts and graphs",
      "location": "center, held by subject",
      "relative_size": "small",
      "shape_and_color": "Rectangular, silver frame with illuminated screen",
      "texture": "smooth glass and metal",
      "appearance_details": "Thin profile, business application visible on screen",
      "relationship": "Held by businesswoman, focus of her attention",
      "orientation": "vertical, screen facing viewer at slight angle",
      "pose": null,
      "expression": null,
      "clothing": null,
      "action": null,
      "gender": null,
      "skin_tone_and_texture": null,
      "number_of_objects": null
    }
  ],
  "background_setting": "Modern corporate office interior with floor-to-ceiling windows overlooking a city skyline. Minimalist furniture in neutral tones, potted plants adding touches of green.",
  "lighting": {
    "conditions": "bright natural daylight",
    "direction": "side-lit from left through windows",
    "shadows": "soft, natural shadows"
  },
  "aesthetics": {
    "composition": "rule of thirds, medium shot",
    "color_scheme": "professional blues and neutral whites with warm accents",
    "mood_atmosphere": "confident, professional, welcoming"
  },
  "photographic_characteristics": {
    "depth_of_field": "shallow, background slightly soft",
    "focus": "sharp focus on subject's face and upper body",
    "camera_angle": "eye-level",
    "lens_focal_length": "portrait lens (85mm)"
  },
  "style_medium": "photograph",
  "artistic_style": "realistic",
  "context": "Corporate portrait photography for company website or LinkedIn professional profile.",
  "text_render": [],
  "edit_instruction": "Generate a professional businesswoman in a modern office environment holding a tablet."
}
```
## Common Pitfalls

- **Don't invent text** - Keep `text_render` empty unless the user provides exact text
- **Don't over-describe** - Max 5 objects; prioritize the most important
- **Match the mode** - Use the correct `edit_instruction` format for masked vs. standard edits
- **Preserve fidelity** - Only change what's explicitly requested
- **Be specific** - Use concrete values ("85mm portrait lens"), not vague terms ("nice camera")
- **Null for irrelevant** - Human-specific fields should be `null` for non-human objects
## curl Example

```bash
curl -X POST "https://engine.prod.bria-api.com/v2/image/generate" \
  -H "api_token: $BRIA_API_KEY" \
  -H "Content-Type: application/json" \
  -H "User-Agent: BriaSkills/1.2.3" \
  -d '{
    "structured_prompt": "{\"short_description\": \"...\", ...}",
    "prompt": "Generate this scene",
    "aspect_ratio": "16:9"
  }'
```
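Note that `structured_prompt` in the request body is a JSON-encoded *string*, not a nested object. The sketch below assembles the same payload in Python; `build_generate_payload` is a hypothetical helper, and field names simply mirror the curl example (consult the `bria-ai` skill for authoritative endpoint documentation).

```python
import json

def build_generate_payload(vgl: dict, aspect_ratio: str = "16:9") -> dict:
    """Assemble the request body shown in the curl example (illustrative).

    The VGL dict is serialized into the `structured_prompt` string field,
    matching how the curl example embeds escaped JSON.
    """
    return {
        "structured_prompt": json.dumps(vgl),
        "prompt": "Generate this scene",
        "aspect_ratio": aspect_ratio,
    }
```

The resulting dict can be passed as the JSON body of a POST to the generate endpoint, with the `api_token` header set as in the curl example.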
## References

- **Schema Reference** - Complete JSON schema with all parameter values
- **bria-ai** - API client and endpoint documentation for executing VGL prompts