Generation Guide
Master every workspace — from text-to-image to 3D objects.
How generation works
Every generation on Gizmoji follows the same flow regardless of media type:
1. Choose a studio (Image, Video, Audio, Music, 3D, or Avatar).
2. Select a model — each model has a credit cost displayed on its card.
3. Configure parameters — each model has its own set of inputs (prompt, aspect ratio, duration, etc.).
4. Hit Generate — credits are pre-held and the job enters the queue.
5. Watch the real-time progress indicator as your asset is created.
6. Download, iterate, or approve the result.
If a job fails for any reason, your credits are automatically refunded in full.
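The flow above can be sketched as a simple job lifecycle: credits are pre-held on submission, consumed on success, and refunded in full on failure. Everything below (the `Account` and `run_job` names, the credit amounts) is a hypothetical illustration of that flow, not Gizmoji's actual API:

```python
class Account:
    def __init__(self, credits):
        self.credits = credits   # available balance
        self.held = 0            # credits pre-held for queued jobs

    def hold(self, amount):
        # Pre-hold credits when a job is submitted.
        if amount > self.credits:
            raise ValueError("insufficient credits")
        self.credits -= amount
        self.held += amount

    def settle(self, amount, succeeded):
        # On success the hold is consumed; on failure it is refunded in full.
        self.held -= amount
        if not succeeded:
            self.credits += amount

def run_job(account, cost, generate):
    account.hold(cost)
    succeeded = False
    try:
        asset = generate()       # the model runs while progress is streamed
        succeeded = True
        return asset
    except Exception:
        return None
    finally:
        account.settle(cost, succeeded)
```

The key property is that `settle` runs on every exit path, so a failed generation always restores the pre-held credits.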
Model selection
Gizmoji offers a wide range of AI models organized into three pricing tiers:
- Budget — Fast and affordable. Ideal for drafting, iteration, and exploring ideas quickly.
- Value — Balanced quality and cost. Best for most production work.
- Premium — Highest quality output. Use for hero assets, final renders, and client-facing work.
Each model card shows its credit cost, average generation time, and supported parameters. A good workflow is to draft with budget models, refine your prompt, then switch to a premium model for the final generation.
Image Studio
The Image Studio is the most versatile workspace, supporting two primary modes:
- Text-to-image — Describe what you want in a text prompt. Be specific about style, composition, lighting, mood, and subject matter for the best results.
- Image-to-image — Provide a reference image along with a prompt. The AI uses the reference as a starting point and transforms it according to your instructions. Adjust the strength parameter to control how much the output deviates from the input.
Common parameters include aspect ratio (portrait, landscape, square, and custom), quality level, seed (for reproducible results), and negative prompts (to exclude unwanted elements). Not all parameters are available on every model — the form dynamically adapts to each model's capabilities.
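One way to picture the dynamically adapting form: a request builder that only accepts parameters a model declares support for. The model names and capability sets below are made up for illustration — real models and their parameters vary:

```python
# Hypothetical capability declarations, for illustration only.
MODEL_CAPABILITIES = {
    "draft-image-v1":   {"prompt", "aspect_ratio", "seed"},
    "premium-image-v2": {"prompt", "aspect_ratio", "seed", "quality", "negative_prompt"},
}

def build_request(model, **params):
    # Reject any parameter the chosen model does not support,
    # mirroring how the form only shows supported fields.
    supported = MODEL_CAPABILITIES[model]
    unsupported = set(params) - supported
    if unsupported:
        raise ValueError(f"{model} does not support: {sorted(unsupported)}")
    return {"model": model, **params}
```

For example, `build_request("draft-image-v1", prompt="a tabby cat", seed=42)` succeeds, while passing `negative_prompt` to that model raises an error because it is not in the model's capability set.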
Video Studio
The Video Studio supports multiple generation modes for different creative needs:
- Text-to-video — Generate a video clip directly from a text description. Best for quick concepts and motion studies.
- Image-to-video — Animate an approved still frame into a video. This is the recommended workflow for production-quality output: generate and approve a frame in Image Studio first, then bring it to Video Studio.
- Video extend — Extend an existing video clip with additional frames, continuing the motion and narrative.
- Video transform — Apply style changes to an existing video while preserving its motion and composition.
- Visual effects — Add VFX treatments, transitions, and stylistic effects to video clips.
- Character animation — Animate characters with motion, expressions, and gestures from text or audio input.
Important: For image-to-video, the source image must be approved first. This ensures you're investing video credits in a frame you're happy with. Approve any image from its detail view or the review panel.
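The approval gate amounts to a precondition check before any video credits are held. The `ImageAsset` class and its `approved` flag here are hypothetical names, sketching the rule rather than the real implementation:

```python
from dataclasses import dataclass

@dataclass
class ImageAsset:
    url: str
    approved: bool = False   # set via the detail view or review panel

def start_image_to_video(image, prompt):
    # Refuse to spend video credits on an unapproved frame.
    if not image.approved:
        raise PermissionError("approve the source image before generating video")
    return {"mode": "image-to-video", "source": image.url, "prompt": prompt}
```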
Audio Studio
Generate spoken audio and sound effects with AI:
- Text-to-speech — Convert written text into natural-sounding speech. Choose from multiple voices with different accents, genders, and tonal qualities. Adjust speed and emphasis for the delivery you need.
- Sound effects — Describe an ambient sound, Foley effect, or audio texture and the AI generates it. Useful for adding atmosphere to video projects.
Output formats include MP3 and WAV. Audio assets integrate directly into your project timeline alongside visual assets.
Music Studio
Create original music tracks and scores:
- Text-to-music — Describe the mood, genre, tempo, and instrumentation you want. The AI generates an original instrumental track.
- Audio-to-music — Provide a reference audio clip and the AI generates music that matches its style, tempo, or mood.
Music assets are great for scoring video projects, creating background tracks for podcasts, or generating royalty-free music for content.
3D Studio
Create 3D objects from text descriptions or reference images:
- Text-to-3D — Describe an object and the AI generates a textured 3D model.
- Image-to-3D — Provide a reference image and the AI infers the 3D structure and generates a model that matches it.
Output formats include GLB and OBJ, ready for import into game engines (Unity, Unreal), 3D editors (Blender), AR/VR applications, and web viewers.
Avatar Studio
Create character animations and talking-head videos:
- Lip-sync — Provide a portrait image and an audio track. The AI animates the face to match the speech, producing a realistic talking-head video.
- Face animation — Animate facial expressions and head movements on a portrait image from a reference video or audio input.
- Face swap — Transfer one face onto another in a video or image, maintaining the original motion and expressions.
- Portrait transfer — Apply the style or likeness of a portrait across different contexts and poses.
Avatar models work best with clear, front-facing reference images and high-quality audio input for lip-sync.
Prompting tips
The quality of your output depends heavily on your prompt. Here are tips that apply across all studios:
- Be specific — Instead of “a cat,” try “a tabby cat sitting on a windowsill, afternoon sunlight, soft bokeh background, photorealistic.”
- Describe style and mood — Include art style (cinematic, anime, watercolor), lighting (golden hour, studio lighting, neon), and mood (serene, dramatic, playful).
- Use negative prompts — Where supported, negative prompts exclude unwanted elements (e.g. “blurry, low quality, text, watermark”).
- Iterate with budget models — Refine your prompt using fast, low-cost models. Once you're happy with the composition and style, switch to a premium model for the final render.
- Use the prompt optimizer — The built-in AI prompt optimizer rewrites your prompt to be more detailed and model-friendly. It adds technical parameters that improve output quality.
- Use seeds for consistency — If you find a result you like, note its seed value. Using the same seed with the same prompt produces similar output, letting you make small prompt tweaks while keeping the overall composition.
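Seeds behave the way they do in any pseudorandom generator: the same seed replays the same random stream, so the starting point of generation stays fixed while you tweak the prompt. A generic Python illustration of that property (not Gizmoji-specific code):

```python
import random

def sample_noise(seed, n=4):
    # The seed fixes the random number stream that drives generation,
    # so identical seeds yield an identical starting point.
    rng = random.Random(seed)
    return [round(rng.random(), 3) for _ in range(n)]

same_a = sample_noise(1234)
same_b = sample_noise(1234)   # identical to same_a
different = sample_noise(5678)
```

This is why reusing a seed with a slightly edited prompt keeps the overall composition: the random inputs are unchanged, and only the prompt-driven guidance shifts.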
Parallel generation
You can submit multiple generation jobs at once. Each plan tier includes a concurrency limit — the number of jobs that can run simultaneously:
- Free and Starter plans allow multiple parallel jobs.
- Creator, Pro, and Studio plans increase the limit for faster throughput.
Jobs from different studios can run at the same time — generate an image while a video renders, or produce audio while a 3D model is being created.
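Conceptually, a per-plan concurrency limit behaves like a semaphore: jobs beyond the limit wait for a running slot to free up. The plan names map to the tiers above, but the numeric limits in this sketch are invented for illustration:

```python
import threading

# Hypothetical per-plan limits, for illustration only.
PLAN_LIMITS = {"free": 2, "creator": 4, "studio": 8}

def run_jobs(plan, jobs):
    # At most PLAN_LIMITS[plan] jobs execute at once; the rest queue.
    slots = threading.Semaphore(PLAN_LIMITS[plan])
    results = [None] * len(jobs)

    def worker(i, job):
        with slots:
            results[i] = job()

    threads = [threading.Thread(target=worker, args=(i, j))
               for i, j in enumerate(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each job only needs a free slot, not a particular studio, an image job, a video job, and an audio job can all occupy slots at the same time.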
Frequently asked questions
How do I choose the right model?
Each studio shows available models with their credit cost, speed, and quality rating. Start with budget models for drafting, then switch to premium models for final output.
Can I generate multiple assets at once?
Yes. You can submit multiple jobs in parallel up to your plan's concurrency limit. Each job runs independently, so you can work across studios simultaneously.
Why does video require an approved frame?
Approving a frame first ensures you're happy with the visual direction before spending credits on the more expensive video generation. This prevents wasted credits on videos from frames you'd reject.
What is the prompt optimizer?
The built-in prompt optimizer rewrites your prompt to be more detailed and model-friendly. It adds style cues, composition details, and technical parameters that help the AI produce better results.
Do negative prompts work on all models?
Not all models support negative prompts. When available, the negative prompt field appears in the model's parameter form. Check each model's description for supported features.