AI Photo Prompts for Product Photography: A Complete E-commerce Guide

A traditional lifestyle product shoot costs between $200 and $5,000 per session. For a brand managing 50 SKUs across four seasonal backgrounds, that can add up to a $40,000–$100,000 annual photography budget before post-production.
Generative AI promises to eliminate this entirely. The problem is that standard AI workflows fail at the one thing e-commerce brands cannot compromise on: consistency. Ask any model to put your sneaker on a beach and it generates a different sneaker every time. Wrong colorway. Hallucinated logo. Slightly altered sole profile.
This guide explains exactly how to solve the consistency problem using structured Nano Banana prompts — and why locking product identity at the prompt level is the only reliable solution at scale.
Quick answer: AI product photography works at scale only when the prompt is split into two distinct layers — a frozen Identity Layer (exact product attributes that cannot change) and a variable Environment Layer (background, lighting, props). Mixing both layers in a single prompt causes the model to treat your product as a suggestion, not a constraint.
Why Generic AI Tools Fail E-commerce Brands
Tools like Photoroom, Claid, and Flair are built for background swapping and catalog cleanup. They are excellent at what they do — but they start from the image, not the prompt.
When you need to generate a product in a new lifestyle environment that does not exist yet (a lookbook shoot you have not done, a seasonal campaign for next quarter, an international market variant), you need to build the scene from scratch in Nano Banana. And that requires a structured prompt that keeps your product anatomy locked while the environment changes freely around it.
Generic prompt attempts fail because of one architectural problem: the model treats your product description as a suggestion, not a constraint. Without a hard identity lock at the prompt level, Nano Banana fills the gaps with its best statistical guess — which is never your specific SKU.
The Two-Layer Prompt Architecture for Product Consistency
Professional product photography with Nano Banana requires separating every prompt into two distinct layers:
Layer 1 — Identity Layer (frozen): The physical attributes of the product that must never change. Shape, dimensions, color code, material texture, logo placement, hardware details.
Layer 2 — Environment Layer (variable): Everything outside the product. Background, lighting setup, props, surface, mood, camera angle.
When you write a prompt that mixes both layers without separation, the model redistributes creative weight across the whole scene, product included. When you freeze Layer 1 explicitly, you signal that the product is not open to reinterpretation, so the model's variation concentrates in the environment instead.
The same principle applies to writing realistic human subjects — see the 8-part prompt formula for how layer separation works in portrait photography.
Writing the Identity Layer: What to Include
For a physical product, the Identity Layer should define:
- Form factor: exact dimensions relative to other elements, silhouette, weight impression
- Color: use precise descriptors — not "blue" but "deep navy matte with slight indigo undertone"
- Material: "brushed aluminum casing," "full-grain vegetable-tanned leather," "ribbed cotton jersey"
- Logo/branding: placement, size, and treatment — "embossed white logo, top left chest, 3cm"
- Distinguishing details: hardware, stitching, zipper type, sole profile, cap construction
The more specific the Identity Layer, the fewer liberties Nano Banana takes. Vague descriptions produce variants. Precise descriptions produce replicas.
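If your team keeps SKU descriptions in a spreadsheet or PIM, it can help to store the Identity Layer as structured fields rather than free text. The sketch below is a minimal, hypothetical Python model of the five fields listed above (the `IdentityLayer` class and its `to_prompt` method are illustrative names, not part of Nano Banana or CookedBanana). It renders the fields in a fixed order, so the same SKU always produces identical identity text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned once the SKU is defined
class IdentityLayer:
    """Hypothetical container for the Identity Layer fields described above."""
    form_factor: str
    color: str
    material: str
    branding: str
    details: tuple[str, ...] = ()

    def to_prompt(self) -> str:
        # Fixed field order + verbatim text = the same description on every render.
        parts = [self.form_factor, self.color, self.material, self.branding, *self.details]
        return ", ".join(p.strip() for p in parts if p.strip()) + "."


sneaker = IdentityLayer(
    form_factor="Low-top sneaker",
    color="deep navy matte upper with slight indigo undertone",
    material="canvas upper, white rubber cupsole",
    branding="small embossed brand logo on lateral heel in white, no other branding",
    details=("3mm lateral ridge on sole", "silver metal eyelets", "flat white laces"),
)

print(sneaker.to_prompt())
```

Freezing the dataclass is a small guard against someone "tweaking" a field for one scene and silently introducing drift across the catalog.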
A Complete Prompt Example: Sneaker in a Lifestyle Scene
Here is what a locked product prompt looks like in practice:
Identity Layer (frozen):
Low-top sneaker, deep navy matte canvas upper with slight indigo undertone, white rubber cupsole with 3mm lateral ridge, silver metal eyelets, flat white laces, small embossed brand logo on lateral heel in white, no other branding.
Environment Layer (variable — Scenario A: minimalist studio):
Resting on a concrete pedestal, brutalist studio setting, natural light from above-left, soft shadows, 85mm lens, shallow depth of field, matte background, no props.
Environment Layer (variable — Scenario B: lifestyle outdoor):
On a worn wooden bench in a sunlit park, golden hour, slight lens flare, 35mm, candid placement — laces slightly undone, casual.
Shared negative constraints:
No color shift on upper, no logo modification, no shape distortion, no extra details not in original, no plastic texture.
Same product. Two completely different environments. Zero identity drift between generations.
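Mechanically, the two-layer structure is just string assembly: the identity and negative text are reused verbatim and only the environment slot changes. A minimal Python sketch using the sneaker example above; `build_prompt` is a hypothetical helper, and the section labels simply mirror this article's headings rather than any syntax Nano Banana is documented to require.

```python
# Frozen text, copied once and never edited per scenario.
IDENTITY = (
    "Low-top sneaker, deep navy matte canvas upper with slight indigo undertone, "
    "white rubber cupsole with 3mm lateral ridge, silver metal eyelets, flat white laces, "
    "small embossed brand logo on lateral heel in white, no other branding."
)

NEGATIVES = (
    "No color shift on upper, no logo modification, no shape distortion, "
    "no extra details not in original, no plastic texture."
)

# Variable text: one entry per scenario.
ENVIRONMENTS = {
    "A_minimalist_studio": (
        "Resting on a concrete pedestal, brutalist studio setting, natural light from "
        "above-left, soft shadows, 85mm lens, shallow depth of field, matte background, no props."
    ),
    "B_lifestyle_outdoor": (
        "On a worn wooden bench in a sunlit park, golden hour, slight lens flare, 35mm, "
        "candid placement, laces slightly undone, casual."
    ),
}


def build_prompt(identity: str, environment: str, negatives: str) -> str:
    """Hypothetical helper: identity and negatives pasted verbatim, environment swapped in."""
    return (
        f"Identity Layer (frozen): {identity}\n"
        f"Environment Layer: {environment}\n"
        f"Negative constraints: {negatives}"
    )


prompts = {name: build_prompt(IDENTITY, env, NEGATIVES) for name, env in ENVIRONMENTS.items()}

for name, prompt in prompts.items():
    print(f"--- {name} ---\n{prompt}\n")
```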
The CookedBanana Workflow for E-commerce Teams
Writing and maintaining the Identity Layer manually across dozens of SKUs adds significant operational overhead. Every time you brief a new scene, you risk introducing inconsistencies in how the product is described.
CookedBanana solves this at the workflow level. Upload your clean product packshot — a white-background studio image works best — and activate Lock Ref. The engine extracts and locks the product's anatomical attributes as image_ref.1, freezing the Identity Layer automatically. You then describe only the environment you want.
The practical workflow for agency teams:
- Upload clean packshot (white background, single product)
- Activate Lock Ref; the Identity Layer is locked as image_ref.1
- Type the new environment in plain language ("marble surface, luxury bathroom, morning light")
- Copy the structured prompt output into Nano Banana
- Generate as many lifestyle variants as your plan allows; identity stays consistent across all of them
For brands with high SKU volume, the Pro Plan (700 generations) and Agency Plan (2,500 generations) provide the scale to replace an entire season's photography budget in a single workflow.
The Economics: What This Replaces
Industry benchmarks put traditional lifestyle product photography at $200–$5,000 per session. AI-generated alternatives deliver comparable quality at $0.10–$2.00 per image — an 80–95% cost reduction.
For a mid-sized brand with 100 SKUs requiring 4 lifestyle environments each:
| Method | Cost estimate | Time to delivery |
|---|---|---|
| Traditional shoot | $8,000–$20,000 | 2–4 weeks |
| Generic AI (no consistency) | $50–$200 | Hours, but output is unusable |
| Nano Banana + CookedBanana | $200–$500 | Same day |
The third row only works if product identity is locked correctly. Without it, you are back to the second row — fast generation, unusable output.
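The table's figures can be sanity-checked with simple arithmetic. This sketch assumes one kept image per SKU per environment (no rerolls) and uses the $0.10–$2.00 per-image range quoted above; all helper names are hypothetical. It works in integer cents to avoid floating-point rounding on money.

```python
def image_count(skus: int, environments_per_sku: int) -> int:
    """One kept lifestyle image per SKU per environment (rerolls not counted)."""
    return skus * environments_per_sku


def total_cents(images: int, cents_per_image: int) -> int:
    """Total cost in integer cents."""
    return images * cents_per_image


def dollars(cents: int) -> str:
    """Format integer cents as a dollar string, e.g. 123456 -> $1,234.56."""
    return f"${cents / 100:,.2f}"


images = image_count(skus=100, environments_per_sku=4)

# AI generation range quoted above: $0.10-$2.00 per image (10-200 cents).
ai_low, ai_high = total_cents(images, 10), total_cents(images, 200)

# Traditional row from the table ($8,000-$20,000), converted to an implied per-image cost.
trad_low_per_image = 800_000 // images     # cents
trad_high_per_image = 2_000_000 // images  # cents

print(f"{images} images")
print(f"AI generation: {dollars(ai_low)}-{dollars(ai_high)}")
print(f"Traditional, implied per image: {dollars(trad_low_per_image)}-{dollars(trad_high_per_image)}")
```

Run as-is, it reports 400 images, an AI range of $40.00–$800.00 (the $200–$500 row sits inside it), and an implied traditional cost of $20.00–$50.00 per image.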
Platform Compliance: Amazon and Shopify
Both Amazon and Shopify permit AI-generated product images, with platform-specific requirements:
Amazon: The main hero image must have a pure white background with the product filling at least 85% of the frame. AI lifestyle images are fully acceptable as secondary images, A+ content, and Sponsored Brand ad creatives. The product must accurately represent the item being sold — color, dimensions, and branding must match the physical product exactly.
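Before uploading a hero image, a rough automated pre-check can catch two common failures: an off-white background and a product that is too small in frame. The Python sketch below is a heuristic, not Amazon's validator. It assumes the image is already decoded into a nested list of RGB tuples (with Pillow, for example, via `Image.open(path).convert("RGB")` and `getpixel`), treats any pixel whose channels are all at or above 240 but not exactly 255 as off-white (an arbitrary threshold), and reads "filling at least 85% of the frame" as the product's bounding box spanning at least 85% of the image height or width. White or very light products will trip the off-white check and need manual review.

```python
from __future__ import annotations

Pixel = tuple[int, int, int]
WHITE: Pixel = (255, 255, 255)
NEAR_WHITE_MIN = 240  # assumption: channels >= 240 read as "white" but are not RGB 255


def is_off_white(px: Pixel) -> bool:
    """True for pixels that look white but would fail a pure-white (255, 255, 255) rule."""
    return px != WHITE and min(px) >= NEAR_WHITE_MIN


def check_main_image(image: list[list[Pixel]], min_fill: float = 0.85) -> dict:
    """Heuristic pre-check for a white-background hero image (not an official validator)."""
    h, w = len(image), len(image[0])
    # Anything neither pure white nor off-white counts as product (shadows included).
    product = [(y, x) for y in range(h) for x in range(w)
               if image[y][x] != WHITE and not is_off_white(image[y][x])]
    off_white = sum(1 for row in image for px in row if is_off_white(px))
    if not product:
        return {"fill": 0.0, "off_white_pixels": off_white, "passes": False}
    ys = [y for y, _ in product]
    xs = [x for _, x in product]
    box_h = max(ys) - min(ys) + 1
    box_w = max(xs) - min(xs) + 1
    # Assumption: fill = bounding-box extent relative to image height or width, whichever is larger.
    fill = max(box_h / h, box_w / w)
    return {"fill": fill, "off_white_pixels": off_white,
            "passes": fill >= min_fill and off_white == 0}


def make_image(size: int, bg: Pixel, box: tuple[int, int, int, int]) -> list[list[Pixel]]:
    """Synthetic square test image: navy rectangle (top, left, bottom, right, inclusive) on bg."""
    top, left, bottom, right = box
    navy: Pixel = (20, 30, 80)
    return [[navy if top <= y <= bottom and left <= x <= right else bg
             for x in range(size)] for y in range(size)]


good = check_main_image(make_image(20, WHITE, (1, 5, 18, 14)))
grey_bg = check_main_image(make_image(20, (250, 250, 250), (1, 5, 18, 14)))
too_small = check_main_image(make_image(20, WHITE, (5, 5, 14, 14)))
print(good, grey_bg, too_small, sep="\n")
```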
Shopify: No specific restrictions on AI imagery, but product images must accurately represent what is being sold. Lifestyle images on product pages, collection pages, and email campaigns are fully supported.
For on-body apparel and fashion products, Nano Banana's candid photography capabilities are also useful for generating authentic lifestyle and UGC-style content that performs well on social and paid channels.
Frequently Asked Questions
Why does my product look different every time I generate it with Nano Banana?
Because you are describing the product as part of the scene rather than isolating it as a locked identity constraint. Nano Banana treats unanchored descriptions probabilistically — it generates a plausible version of what you described, not a faithful replica. To prevent this, separate your prompt into a frozen Identity Layer and a variable Environment Layer, or use CookedBanana's Lock Ref feature to anchor the product automatically.
Can I use AI product photography for Amazon and Shopify listings?
Yes, with one important caveat: marketplace compliance. Amazon and Shopify both allow AI-generated images as long as they accurately represent the product and comply with background requirements (Amazon requires pure white backgrounds for main images). AI lifestyle images work best as secondary listing images, brand story content, and ad creatives — not as the hero PDP image where product accuracy is most scrutinized.
How many SKUs can I realistically manage with this workflow?
With CookedBanana's Agency Plan (2,500 generations), a team can realistically cover 600–800 SKUs across 3 lifestyle environments per product in a single billing cycle. The main bottleneck is not generation speed but packshot quality — every input image needs to be a clean, well-lit studio shot for the Lock Ref extraction to work reliably.
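The 600–800 SKU figure follows from the plan size: at 3 environments per SKU, 800 SKUs use 2,400 of 2,500 generations, and 600 SKUs leave 700 generations spare for rerolls. A minimal sketch of that arithmetic, where `rerolls_per_100` is a hypothetical parameter for extra generations you expect to discard per 100 kept images; it assumes one generation per image, which may not match how a given plan counts usage.

```python
def max_skus(plan_generations: int, environments_per_sku: int, rerolls_per_100: int = 0) -> int:
    """SKUs that fit in one billing cycle.

    Generations per SKU = environments * (1 + rerolls / 100); integer math avoids float error.
    """
    return (plan_generations * 100) // (environments_per_sku * (100 + rerolls_per_100))


PLANS = {"Pro": 700, "Agency": 2_500}  # generations per billing cycle, as stated above

for plan, gens in PLANS.items():
    print(f"{plan}: {max_skus(gens, 3)} SKUs, or {max_skus(gens, 3, rerolls_per_100=20)} with 20% rerolls")
```

With no rerolls this gives 833 SKUs on the Agency Plan and 233 on the Pro Plan; budgeting 20 rerolls per 100 images drops those to 694 and 194.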
What is the best type of input image for consistent AI product photography?
A clean white-background packshot with even, diffused lighting and no heavy shadows is the most reliable input. The product should fill 70–80% of the frame, all branding and hardware details should be in focus, and the image should be shot straight-on or at a 3/4 angle. Low-quality inputs with strong shadows or motion blur will produce inconsistent Identity Layer extractions regardless of the tool used.
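Motion blur and soft focus can be screened for automatically before upload. One common heuristic, not something this article's tools are documented to use, is the variance of the Laplacian: sharp edges produce large second-derivative responses, and blur spreads edges out and shrinks them. The pure-Python sketch below computes it on a grayscale image stored as a nested list; OpenCV users often compute a closely related measure with `cv2.Laplacian(gray, cv2.CV_64F).var()`. Any pass/fail threshold is an assumption you would calibrate on your own known-good packshots.

```python
def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over the interior pixels of a grayscale image.

    Low scores suggest a soft or motion-blurred input.
    """
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3 pixels")
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


# Synthetic 32x8 strips: a hard black-to-white step vs. the same edge ramped over 8 pixels.
W, H = 32, 8
sharp = [[0.0 if x < 16 else 255.0 for x in range(W)] for _ in range(H)]
soft = [[min(max((x - 12) * 255 / 8, 0.0), 255.0) for x in range(W)] for _ in range(H)]

print(laplacian_variance(sharp), laplacian_variance(soft))
```

On these synthetic strips the hard step scores 64 times higher than the ramped edge (4335.0 vs. about 67.7).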
Does this work for apparel and fashion products?
Yes, but apparel requires additional handling for fabric drape and fit simulation. For flat-lay garments and packshots, the workflow is identical. For on-body lifestyle scenes, you will need to specify body type, pose, and styling in the Environment Layer. CookedBanana's Outfit Ref system is specifically designed for this — it locks individual garment elements as separate reference layers so you can mix and match pieces across generations while keeping each item accurate.
What AI tools exist specifically for product photography?
The main options are Photoroom (background removal and replacement), Claid (catalog image enhancement), Flair (lifestyle scene staging), and Nano Banana with CookedBanana (full prompt-level control with locked identity). The key differentiator is where control happens: Photoroom and Claid work at the pixel level from an existing image, while Nano Banana + CookedBanana works at the prompt level, giving you the ability to build entirely new scenes that do not yet exist as photographs.
How do I maintain consistent lighting across multiple product shots?
Lock the lighting in the Environment Layer and reuse the same lighting description across all generations: light source direction, quality (hard/soft), color temperature, and shadow treatment. For a studio-consistent look, a starting point is: soft diffused light from above-left, 5500K, slight shadow detail on product surface, white reflector fill from right. Changing only the surface and background while keeping the lighting fixed produces a cohesive catalog aesthetic.
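Locking the lighting works the same way as locking the product: keep one lighting string and vary only the surface and background. A minimal Python sketch using the studio lighting description above; the three surface and background pairs are made-up examples, and `environment_layer` is a hypothetical helper.

```python
# Reused verbatim in every generation.
LIGHTING = (
    "soft diffused light from above-left, 5500K, slight shadow detail on product surface, "
    "white reflector fill from right"
)

# Hypothetical catalog scenes: only these change between shots.
SCENES = [
    ("a white marble slab", "seamless pale grey backdrop"),
    ("a light oak tabletop", "soft beige wall"),
    ("matte black acrylic", "charcoal gradient backdrop"),
]


def environment_layer(surface: str, background: str, lighting: str = LIGHTING) -> str:
    """Build one Environment Layer; the lighting clause is identical across the set."""
    return f"On {surface}, {background}, {lighting}."


catalog = [environment_layer(surface, bg) for surface, bg in SCENES]
for line in catalog:
    print(line)
```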