Text2Tex: Text-driven Texture Synthesis via Diffusion Models

Technical University of Munich, Snap Research

Text2Tex generates high-quality textures for 3D meshes from given text prompts. Our method incorporates inpainting into a pre-trained depth-aware image diffusion model to progressively synthesize high-resolution partial textures from multiple viewpoints. To avoid accumulating artifacts across views, we propose an automatic view sequence generation scheme to determine the next best view for updating the partial texture. Extensive experiments demonstrate that our method significantly outperforms existing text-driven approaches and GAN-based methods.

Method Overview

In Text2Tex, we progressively generate the texture via a generate-then-refine scheme.

In progressive texture generation, we start by rendering the object from an initial preset viewpoint. We generate a new appearance according to the input prompt via a depth-to-image diffusion model, and project the generated image back onto the partial texture. We repeat this process through the remaining preset viewpoints to output the initial textured mesh.
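The loop below is a minimal sketch of this stage under our own assumptions, not the official implementation: render_view() and project_to_texture() are hypothetical helpers standing in for the mesh renderer and UV back-projection, and the Hugging Face diffusers depth-to-image pipeline stands in for the depth-aware diffusion model.

# Minimal sketch of progressive texture generation (illustrative only).
# render_view() and project_to_texture() are hypothetical helpers that would
# wrap a mesh renderer and UV back-projection; they are not part of diffusers.
import torch
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
).to("cuda")

def generate_initial_texture(mesh, texture, prompt, preset_views):
    for view in preset_views:                    # fixed azimuth/elevation presets
        # Render the current partial texture and depth from this viewpoint.
        color, depth = render_view(mesh, texture, view)  # PIL image, (1,1,H,W) tensor
        # Depth conditioning keeps the output aligned with the geometry, while
        # the rendered image lets already-textured regions guide the result.
        image = pipe(
            prompt=prompt,
            image=color,
            depth_map=depth,
            strength=0.75,   # denoising strength: how much of the view is redrawn
        ).images[0]
        # Project the generated view back into the UV texture atlas.
        texture = project_to_texture(mesh, texture, image, view)
    return texture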

In the subsequent texture refinement, we update the initial texture from a sequence of automatically selected viewpoints to fix stretched and blurry artifacts.
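As an illustration of how such a view sequence could be chosen, the heuristic below (a simplification under our own assumptions, not necessarily the paper's exact criterion) scores each candidate view by how many visible texels it would observe more head-on than the view they were originally generated from, and greedily picks the best one. visible_texels() is a hypothetical helper that would come from rasterizing the mesh at a given viewpoint.

# Illustrative next-best-view heuristic (simplified sketch).
import numpy as np

def select_next_view(candidate_views, best_cos, visible_texels):
    # best_cos[i] caches the largest cos(viewing angle) texel i has been
    # generated at so far; larger means a more head-on, less stretched view.
    best_view, best_gain = None, 0.0
    for view in candidate_views:
        texel_ids, cosines = visible_texels(view)
        # A texel gains when this view observes it more head-on than before.
        gain = np.clip(cosines - best_cos[texel_ids], 0.0, None).sum()
        if gain > best_gain:
            best_view, best_gain = view, gain
    return best_view, best_gain

In a refinement loop, the selected view would be re-generated with the depth-to-image model as in the sketch above, best_cos updated for the texels it covers, and the process repeated until the gain falls below a threshold or a view budget is reached.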

Textured Objaverse Objects

"a compass"
"an ambulance"

Creative Texture Synthesis

Porsche in different styles
Porsche as other objects

BibTeX


@article{chen2023text2tex,
    title={Text2Tex: Text-driven Texture Synthesis via Diffusion Models},
    author={Chen, Dave Zhenyu and Siddiqui, Yawar and Lee, Hsin-Ying and Tulyakov, Sergey and Nie{\ss}ner, Matthias},
    journal={arXiv preprint arXiv:2303.11396},
    year={2023},
}