Pix2Vox

Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images

Haozhe Xie, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Shengping Zhang, Wenxiu Sun

Abstract

Recovering the 3D representation of an object from single-view or multi-view RGB images with deep neural networks has attracted increasing attention in the past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to sequentially fuse the feature maps extracted from the input images. However, given the same set of input images in different orders, RNN-based approaches cannot produce consistent reconstruction results. Moreover, due to long-term memory loss, RNNs cannot fully exploit the input images to refine the reconstruction. To solve these problems, we propose Pix2Vox, a novel framework for single-view and multi-view 3D reconstruction. Using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. A context-aware fusion module then adaptively selects high-quality reconstructions for each part (e.g., table legs) from the different coarse 3D volumes to obtain a fused 3D volume. Finally, a refiner further refines the fused 3D volume to generate the final output. Experimental results on the ShapeNet and Pix3D benchmarks indicate that Pix2Vox outperforms state-of-the-art methods by a large margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2 in terms of backward inference time. Experiments on unseen ShapeNet 3D categories show the superior generalization ability of our method.
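
The fusion step described in the abstract can be illustrated with a small sketch. It is not the released implementation: it assumes each view comes with a per-voxel quality score, which in practice would come from a learned module but is supplied by hand here. It also assumes the scores are turned into weights by a softmax across views. The names `softmax` and `fuse_volumes`, and the flattened-list volume layout, are hypothetical choices made for illustration. Because the result is a per-voxel weighted sum over views, it does not depend on the order in which the views are given, which is the property the RNN-based approaches lack.

```python
import math
from typing import List, Sequence

# Assumption: a volume is a flattened occupancy grid, one probability per voxel.
Volume = List[float]


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short sequence of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_volumes(coarse: Sequence[Volume], scores: Sequence[Volume]) -> Volume:
    """Fuse per-view coarse volumes voxel by voxel.

    coarse[v][i] is the occupancy that view v predicts for voxel i.
    scores[v][i] is a hypothetical quality score for that prediction.
    """
    if not coarse or len(coarse) != len(scores):
        raise ValueError("need one score volume per coarse volume")
    n_voxels = len(coarse[0])
    if any(len(c) != n_voxels for c in coarse) or any(len(s) != n_voxels for s in scores):
        raise ValueError("all volumes must have the same number of voxels")

    fused: Volume = []
    for i in range(n_voxels):
        # Normalize the scores of voxel i across views, then blend the views.
        weights = softmax([s[i] for s in scores])
        fused.append(sum(w * c[i] for w, c in zip(weights, coarse)))
    return fused


if __name__ == "__main__":
    # Two views, three voxels; view 0 is trusted for voxel 0, view 1 for voxel 1.
    coarse = [[0.9, 0.2, 0.5], [0.1, 0.8, 0.5]]
    scores = [[5.0, -5.0, 0.0], [-5.0, 5.0, 0.0]]
    print(fuse_volumes(coarse, scores))
    # Reversing the view order gives the same result up to rounding.
    print(fuse_volumes(coarse[::-1], scores[::-1]))
```

With a large score gap, the fused voxel is dominated by the higher-scored view, which approximates "selecting" the better reconstruction for that part. With equal scores, the views are simply averaged.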

Publications

  • Haozhe Xie, Hongxun Yao, Shengping Zhang, Shangchen Zhou, Wenxiu Sun. Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images. International Journal of Computer Vision (IJCV), 128(12): 2919-2935, 2020. (IF=6.071)
    @article{xie2020pix2vox++,
      title={Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images},
      author={Xie, Haozhe and Yao, Hongxun and Zhang, Shengping and Zhou, Shangchen and Sun, Wenxiu},
      journal={International Journal of Computer Vision (IJCV)},
      year={2020},
      volume={128},
      number={12},
      pages={2919--2935},
      doi={10.1007/s11263-020-01347-6}
    }
  • Haozhe Xie, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Shengping Zhang. Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images. ICCV, 2019.
    @inproceedings{xie2019pix2vox,
      title={Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images},
      author={Xie, Haozhe and Yao, Hongxun and Sun, Xiaoshuai and Zhou, Shangchen and Zhang, Shengping},
      booktitle={ICCV},
      year={2019}
    }

License

This project is open-sourced under the MIT license.

Contact Us

  • Tencent AI Lab, Shenzhen, China
  • cshzxie [at] gmail [dot] com