I generated a blog header last month that looked perfect. Three days later I needed a second image in the same series and couldn’t reproduce it — close, but slightly off in ways I couldn’t name. Different line weight, maybe. Different color temperature. The images looked like they came from related but not identical sources.

That’s the consistency problem with AI image generation. It’s not capability — every model released in the last year can produce something usable. It’s that you get a great result, can’t remember exactly what you typed, and the next image is slightly off. Multiply that across a content operation and you end up with visual noise instead of a coherent brand.

I’d been generating images for this site, for the DeRP satirical products, and for Signal Over Noise long enough to hit this wall repeatedly. So I built a system. It started as a personal skill inside Cerebro — my AI second brain — and eventually got specific enough that it made sense to open-source it. It’s at github.com/aplaceforallmystuff/claude-art-skill. MIT licensed.

One Markdown File. Sixteen Workflows.

The art skill is a Claude Code skill — a markdown file that loads into Claude’s context via ~/.claude/skills/art/SKILL.md. When you invoke /art in a Claude Code session, you get a structured system for generating images via the Gemini API, not a generic “make an image” prompt.
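Based on the paths mentioned in this post, the installed layout looks roughly like this (the exact repo contents may differ):

```text
~/.claude/skills/art/
├── SKILL.md              # the skill definition Claude loads on /art
└── aesthetics/
    └── your-brand.md     # per-brand visual identity file
```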

Four things make it work:

16 specialized workflow types, each tuned for a different visual problem. You don’t just ask for “an image” — you specify the workflow (editorial illustration, technical diagram, sketchnote, comic strip, comparison chart, timeline, framework visualization, stats graphic, recipe card, among others) and the system knows what parameters, composition rules, and prompt structure that type actually needs.

A brand aesthetic file that you write once and every image respects automatically. Your color palette, visual style, typography preferences, recurring elements — all defined in ~/.claude/skills/art/aesthetics/your-brand.md. The skill reads it before generating anything.

A base prompt prefix pattern that locks visual parameters across an entire image set. You define it once after getting one image you like, then prepend it to every subsequent prompt. More on this below — it’s the part that actually makes everything else work.

A CLI (bun run generate-image.ts) that wraps the Gemini API with flags for model selection, output size, aspect ratio, thinking depth, web search grounding, reference image injection, and background removal.

Two Models, One Decision

The skill uses Google Gemini image models. The default is Nano Banana 2 (Gemini 2.0 Flash Image Generation), which runs about $0.067 per image. The Pro variant (Nano Banana Pro) costs roughly $0.134 per image and handles more complex compositions and iterative refinement better.

NB2 has a few capabilities worth calling out: web search grounding means it can look up actual logos and landmarks to render them accurately rather than hallucinating; text rendering is solid enough for diagrams and technical illustrations with legible callouts; and it supports output from 512px up to 4K. For most blog headers and social images, NB2 is fine. For product illustrations where you need precise multi-element compositions, Pro is worth the extra cost.
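At those per-image prices, even sizable batches stay cheap. A quick back-of-envelope check for a 40-image batch:

```shell
# Batch cost at the quoted per-image prices (NB2 $0.067, Pro $0.134)
awk 'BEGIN { printf "40 images -- NB2: $%.2f, Pro: $%.2f\n", 40 * 0.067, 40 * 0.134 }'
# prints: 40 images -- NB2: $2.68, Pro: $5.36
```

So the Pro model roughly doubles the cost of a batch, which only matters if you generate at real volume.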

The Base Prompt Prefix Pattern

Before I had this, I’d spend the first few attempts on any new image set just figuring out the visual parameters: what line weight, what camera angle, what color ratios, what lighting. Then I’d generate a dozen images and some would nail it and some wouldn’t — because I’d worded things slightly differently each time.

The base prompt prefix is a reusable string that locks these parameters for an entire project. You define it once — usually after getting one image you like — and prepend it to every subsequent prompt. The model gets identical visual constraints on every generation.

For the DeRP satirical products, the infomercial aesthetic has a base prefix:

Bold commercial illustration, high-contrast color blocking,
dramatic product staging, "As Seen on TV" visual language,
strong typographic hierarchy with serif display text,
stark white backgrounds with color accent panels,
professional product photography composition —

Every CARPETS image starts with that. They all look related.

For this site’s claymorphic blog headers, the prefix is:

Isometric 3D claymorphic diorama, soft polymer clay aesthetic,
rounded pillow-like edges on all objects, warm peach background (#F5D5C8),
soft ambient lighting with gentle shadows, Blender cycles render style,
pastel colors —

Then I add the scene-specific content. The visual parameters are already locked. I’m only writing the part that changes.
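In practice the pattern is just string concatenation: define the prefix once, then prepend it to each scene description. A minimal shell sketch (the scene text here is made up for illustration; the prefix is the claymorphic one above):

```shell
# Locked visual parameters, defined once per project
PREFIX='Isometric 3D claymorphic diorama, soft polymer clay aesthetic,
rounded pillow-like edges on all objects, warm peach background (#F5D5C8),
soft ambient lighting with gentle shadows, Blender cycles render style,
pastel colors —'

# Only this part changes between generations (hypothetical scene)
SCENE='A clay laptop on a desk surrounded by tiny clay coffee cups'

# Full prompt handed to the generator — identical constraints every time
PROMPT="$PREFIX $SCENE"
echo "$PROMPT"
```

The model sees the same constraint block on every call, so drift comes only from the part you intentionally vary.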

Write Your Visual Identity Once

The aesthetic file is a markdown document you write once and place in ~/.claude/skills/art/aesthetics/. Here’s the structure:

# Brand Aesthetic: [Your Brand Name]

## Visual Identity
- Style: [claymorphic 3D / hand-drawn sketch / flat vector / etc.]
- Palette: [primary colors, accent colors, background colors]
- Typography: [font preferences for text in images]
- Mood: [warm, technical, playful, editorial, etc.]

## Recurring Elements
- [List of props, motifs, or elements that appear across your images]

## Composition Rules
- [Camera angle preferences]
- [Negative space guidelines]
- [What to avoid]

## Base Prompt Prefix
[The locking string described above]

When you invoke /art with a brand aesthetic configured, the skill reads the file and applies your visual identity automatically. You don’t have to retype your color palette for every generation.

If you work on multiple brands, the skill supports multiple aesthetic files, selected by flag:

bun run generate-image.ts \
  --aesthetic your-brand \
  --workflow editorial-illustration \
  "Editor at vintage desk reviewing manuscripts"

Show It Instead of Describing It

The --reference-image flag lets you pass an existing image to extract visual style from. The model analyzes the reference and applies its aesthetic characteristics to the new generation.

Useful when you have an image you like but can’t fully articulate why — the lighting treatment, the color temperature, the compositional energy — and you want to match it. You don’t have to reverse-engineer the style into words. You just show it.

bun run generate-image.ts \
  --reference-image ./existing-hero.png \
  --workflow editorial-illustration \
  "Match the lighting and color treatment of the reference"

Installation

The skill installs into ~/.claude/skills/art/:

git clone https://github.com/aplaceforallmystuff/claude-art-skill.git ~/Dev/claude-art-skill
mkdir -p ~/.claude/skills/art
ln -s ~/Dev/claude-art-skill/SKILL.md ~/.claude/skills/art/SKILL.md

You’ll need Bun for the CLI and a Gemini API key. The README walks through both.
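For reference, the setup amounts to something like this — the Bun install command is the official one from bun.sh, and `GEMINI_API_KEY` is the conventional environment variable for the Gemini API, but treat the README as authoritative:

```shell
# Install Bun (official install script from bun.sh)
curl -fsSL https://bun.sh/install | bash

# Gemini API key — conventional variable name, confirm against the README
export GEMINI_API_KEY="your-key-here"
```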

How It Got Built

This grew out of Cerebro’s needs. I was generating blog headers, social images, and product illustrations at a pace where ad-hoc prompting wasn’t sustainable — each new image required rediscovering what worked. The system emerged from writing down what I already knew: these sixteen workflows, these composition rules, this aesthetic file structure.

The February post about systematizing AI art covers the underlying analysis — how studying 16 existing workflow patterns and actual Gemini model capabilities produced a prompting system that got 40+ production-quality illustrations in a week with zero failures. That’s Cerebro’s perspective on the same work. This post is about what the system looks like from the outside, as a tool you can install and use.

Is It Worth Setting Up?

If you’re generating images occasionally and the aesthetic doesn’t need to be tight — a one-off illustration here, a social image there — it’s probably more infrastructure than you need. Use whatever works.

If you’re generating images at scale, or you care about visual consistency across a content operation, or you’ve burned hours trying to reproduce that one good result you got three weeks ago — this is what I wish I’d had earlier. The brand aesthetic file and base prompt prefix pattern solve the consistency problem directly. The 16 workflows mean you’re not reinventing composition rules for every new content type.

Issues and PRs welcome.


I write about building with AI tools — the systems, the failures, and what actually works — over at Signal Over Noise.