📄 Abstract
Creating advertising images is often a labor-intensive and time-consuming
process. Can we automatically generate such images using basic product
information like a product foreground image, taglines, and a target size?
Existing methods mainly focus on parts of the problem and lack a comprehensive
solution. To bridge this gap, we propose a novel product-centric framework for
advertising image design called T-Stars-Poster. It consists of four sequential
stages to highlight product foregrounds and taglines while achieving overall
image aesthetics: prompt generation, layout generation, background image
generation, and graphics rendering. We design and train a dedicated expert
model for each of the first three stages. First, a visual language model (VLM)
generates background prompts that match the products. Next, a VLM-based layout
generation model arranges the placement of product foregrounds, graphic
elements (taglines and decorative underlays), and various nongraphic elements
(objects from the background prompt). Following this, an SDXL-based model
accepts prompts, layouts, and foreground controls simultaneously to generate
images. To support T-Stars-Poster, we create two corresponding datasets with
over 50,000 labeled images. Extensive experiments and online A/B tests
demonstrate that T-Stars-Poster can produce more visually appealing advertising
images.
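
The four sequential stages can be pictured as a simple function pipeline. The sketch below is only an illustration of the data flow described in the abstract, not the authors' implementation: every name in it (`design_poster`, `LayoutElement`, the stage functions) is hypothetical, and each stage is a plain-Python stand-in for what the paper describes as a trained model (a VLM for prompts, a VLM-based layout model, an SDXL-based generator).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class LayoutElement:
    kind: str  # "product", "tagline", "underlay", or "object" (nongraphic)
    box: Box
    text: str = ""


@dataclass
class Poster:
    size: Tuple[int, int]
    prompt: str
    layout: List[LayoutElement]
    background: str  # placeholder string standing in for generated pixels
    rendered_text: List[str] = field(default_factory=list)


def generate_prompt(product_name: str) -> str:
    # Stage 1 stand-in: the real system would query a VLM with the product image.
    return f"{product_name} on a wooden table, soft morning light"


def generate_layout(prompt: str, taglines: List[str],
                    size: Tuple[int, int]) -> List[LayoutElement]:
    # Stage 2 stand-in: fixed heuristic boxes instead of a learned layout model.
    # A real model would also derive the nongraphic objects from the prompt.
    w, h = size
    elements = [LayoutElement("product", (w // 4, h // 3, w // 2, h // 2))]
    for i, line in enumerate(taglines):
        box = (w // 10, h // 20 + i * h // 10, w * 8 // 10, h // 12)
        elements.append(LayoutElement("underlay", box))       # decorative underlay
        elements.append(LayoutElement("tagline", box, line))  # text drawn on top
    # One nongraphic element, hard-coded to match the stand-in prompt above.
    elements.append(LayoutElement("object", (0, h * 5 // 6, w, h // 6), "wooden table"))
    return elements


def generate_background(prompt: str, layout: List[LayoutElement],
                        product_name: str) -> str:
    # Stage 3 stand-in: the generator receives prompt, layout, and foreground together.
    return (f"<image: '{prompt}' conditioned on {len(layout)} layout boxes "
            f"and foreground '{product_name}'>")


def render_graphics(layout: List[LayoutElement]) -> List[str]:
    # Stage 4 stand-in: collect the tagline text that would be drawn onto the image.
    return [e.text for e in layout if e.kind == "tagline"]


def design_poster(product_name: str, taglines: List[str],
                  size: Tuple[int, int]) -> Poster:
    prompt = generate_prompt(product_name)
    layout = generate_layout(prompt, taglines, size)
    background = generate_background(prompt, layout, product_name)
    rendered = render_graphics(layout)
    return Poster(size, prompt, layout, background, rendered)
```

Calling `design_poster("ceramic mug", ["Fresh brew", "20% off"], (800, 1200))` runs the stages in order; the point of the sketch is that each stage consumes the outputs of the earlier ones, with the background generator seeing the prompt, the layout, and the foreground at once.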