Text2Tex: Text-driven Texture Synthesis via Diffusion Models

1Technical University of Munich, 2Snap Research

Text2Tex generates high-quality textures for 3D meshes from given text prompts. Our method incorporates inpainting into a pre-trained depth-aware image diffusion model to progressively synthesize high-resolution partial textures from multiple viewpoints. To avoid accumulating artifacts, we propose an automatic view sequence generation scheme that determines the next best view for updating the partial texture. Extensive experiments demonstrate that our method significantly outperforms existing text-driven approaches and GAN-based methods.


Method Overview

In Text2Tex, we progressively generate the texture via a generate-then-refine scheme.

In progressive texture generation, we start by rendering the object from an initial preset viewpoint. We generate a new appearance according to the input prompt via a depth-to-image diffusion model, and project the generated image back onto the partial texture. We then repeat this process for the remaining preset viewpoints to obtain the initial textured mesh.
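The loop above can be sketched as follows. This is a minimal illustration of the control flow only: `render_depth`, `depth_to_image`, and the way each view is projected back into the UV texture are all hypothetical stand-ins, not the actual renderer or diffusion model.

```python
import numpy as np

def render_depth(mesh, viewpoint):
    """Render a depth map of the mesh from the given viewpoint (stub)."""
    return np.full((64, 64), fill_value=viewpoint + 1.0)

def depth_to_image(depth, prompt):
    """Generate an RGB view conditioned on depth and the prompt (stub)."""
    return np.stack([depth] * 3, axis=-1)

def progressive_generation(mesh, prompt, viewpoints):
    """Generate a partial texture by iterating over preset viewpoints.
    For illustration, each viewpoint simply owns one band of texels."""
    texture = np.zeros((len(viewpoints) * 64, 64, 3))
    for v in viewpoints:
        depth = render_depth(mesh, v)          # render from this viewpoint
        image = depth_to_image(depth, prompt)  # depth-conditioned generation
        texture[v * 64:(v + 1) * 64] = image   # back-project into the texture
    return texture
```

In the real pipeline the back-projection uses the mesh's UV parameterization and a visibility mask, so later views only fill texels the earlier views did not cover.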

In the subsequent texture refinement, we update the initial texture from a sequence of automatically selected viewpoints to fix stretching and blurring artifacts.
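One way to picture the automatic view selection is as a greedy loop that repeatedly picks the candidate viewpoint covering the most artifact texels, regenerates that view, and marks its visible texels as clean. The sketch below is an assumption about the selection criterion, not the paper's exact scoring; `artifact_mask` and `visibility` are illustrative names.

```python
import numpy as np

def next_best_view(artifact_mask, visibility):
    """Pick the viewpoint whose visible texels cover the most artifacts."""
    scores = [np.logical_and(artifact_mask, vis).sum() for vis in visibility]
    return int(np.argmax(scores))

def refine(artifact_mask, visibility, max_views=10):
    """Greedily refine until no artifact texels remain (or budget runs out).

    artifact_mask: boolean array marking stretched/blurry texels.
    visibility:    per-viewpoint boolean arrays of visible texels.
    """
    for _ in range(max_views):
        if not artifact_mask.any():
            break
        v = next_best_view(artifact_mask, visibility)
        # Regenerate the view and project it back (omitted here);
        # the texels visible from view v are now considered clean.
        artifact_mask &= ~visibility[v]
    return artifact_mask
```

The budget `max_views` caps how many extra diffusion passes refinement may spend, trading texture quality against runtime.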

Textured Objaverse Objects

"a compass"
"an ambulance"

Creative Texture Synthesis

Porsche in different styles
Porsche as other objects


@article{chen2023text2tex,
    title={Text2Tex: Text-driven Texture Synthesis via Diffusion Models},
    author={Chen, Dave Zhenyu and Siddiqui, Yawar and Lee, Hsin-Ying and Tulyakov, Sergey and Nie{\ss}ner, Matthias},
    journal={arXiv preprint arXiv:2303.11396},
    year={2023}
}