Research

CityDreamer: Compositional Generative Model of Unbounded 3D Cities
Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu
S-Lab, Nanyang Technological University
TL;DR: CityDreamer learns to generate unbounded 3D cities from Google Earth imagery and OpenStreetMap.
Abstract: In recent years, extensive research has focused on 3D natural scene generation, but the domain of 3D city generation has not received as much exploration. This is due to the greater challenges posed by 3D city generation, mainly because humans are more sensitive to structural distortions in urban environments.

RMNet

Efficient Regional Memory Network for Video Object Segmentation
Haozhe Xie, Hongxun Yao, Shangchen Zhou, Shengping Zhang, Wenxiu Sun
Abstract: Recently, several Space-Time Memory based networks have shown that object cues (e.g., video frames as well as the segmented object masks) from past frames are useful for segmenting objects in the current frame. However, these methods exploit the information in the memory through global-to-global matching between the current and past frames, which leads to mismatches with similar objects and high computational complexity.

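The contrast stated above can be made concrete with a small sketch: global-to-global matching scores every query location against every memory location, while a region-restricted variant only scores a local window. The NumPy formulation, feature shapes, and window radius below are illustrative assumptions, not RMNet's actual memory reader.

    # A minimal sketch contrasting global-to-global memory matching with
    # matching restricted to a local region around each query location.
    # Shapes and the window radius are toy values, not the paper's settings.
    import numpy as np

    C, H, W = 8, 16, 16                   # feature channels and spatial size
    key_mem = np.random.randn(C, H, W)    # key features of one memorized frame
    key_qry = np.random.randn(C, H, W)    # key features of the current frame

    def global_matching(k_mem, k_qry):
        """Every query location attends to every memory location: (HW)^2 scores."""
        m = k_mem.reshape(C, -1)          # (C, HW)
        q = k_qry.reshape(C, -1)          # (C, HW)
        return q.T @ m                    # (HW, HW) affinity matrix

    def regional_matching(k_mem, k_qry, radius=3):
        """Each query location attends only to a (2*radius+1)^2 window in memory."""
        scores = np.full((H * W, H * W), -np.inf)
        for y in range(H):
            for x in range(W):
                q = k_qry[:, y, x]
                y0, y1 = max(0, y - radius), min(H, y + radius + 1)
                x0, x1 = max(0, x - radius), min(W, x + radius + 1)
                for yy in range(y0, y1):
                    for xx in range(x0, x1):
                        scores[y * W + x, yy * W + xx] = q @ k_mem[:, yy, xx]
        return scores

    g = global_matching(key_mem, key_qry)
    r = regional_matching(key_mem, key_qry)
    print("global comparisons:", np.isfinite(g).sum())    # (H*W)^2 = 65536
    print("regional comparisons:", np.isfinite(r).sum())  # far fewer, window-limited
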
GRNet: Gridding Residual Network for Dense Point Cloud Completion
Haozhe Xie, Hongxun Yao, Shangchen Zhou, Jiageng Mao, Shengping Zhang, Wenxiu Sun
Abstract: Estimating the complete 3D point cloud from an incomplete one is a key problem in many vision and robotics applications. Mainstream methods (e.g., PCN and TopNet) use Multi-layer Perceptrons (MLPs) to directly process point clouds, which may cause the loss of details because the structure and context of point clouds are not fully considered.

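To illustrate the alternative to point-wise MLPs, the sketch below scatters an irregular point cloud onto a regular 3D grid so that 3D convolutions can exploit spatial structure and context. The grid resolution and the trilinear weighting are illustrative assumptions, not necessarily GRNet's exact Gridding layer.

    # A minimal sketch of converting an irregular point cloud in [0, 1]^3 into a
    # regular 3D grid of cell weights via trilinear scattering (toy resolution).
    import numpy as np

    def gridding(points, resolution=32):
        """Scatter points onto a resolution^3 grid, spreading each point over the
        eight surrounding grid vertices with trilinear weights."""
        grid = np.zeros((resolution,) * 3)
        coords = np.clip(points, 0.0, 1.0) * (resolution - 1)
        lower = np.floor(coords).astype(int)      # lower-corner vertex index
        frac = coords - lower                     # position inside the cell
        for corner in np.ndindex(2, 2, 2):        # 8 surrounding grid vertices
            idx = np.minimum(lower + corner, resolution - 1)
            # closer vertices receive a larger share of the point's weight
            w = np.prod(np.where(corner, frac, 1.0 - frac), axis=1)
            np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), w)
        return grid

    points = np.random.rand(2048, 3)              # toy incomplete point cloud
    voxels = gridding(points)
    print(voxels.shape, voxels.sum())             # (32, 32, 32), ~2048.0 total weight
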
Toward 3D Object Reconstruction from Stereo Images
Haozhe Xie, Hongxun Yao, Shangchen Zhou, Shengping Zhang, Xiaojun Tong, Wenxiu Sun
Abstract: Inferring the complete 3D shape of an object from an RGB image has shown impressive results; however, existing methods rely primarily on recognizing the most similar 3D model from the training set to solve the problem. These methods suffer from poor generalization and may produce low-quality reconstructions for unseen objects.

Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images
Haozhe Xie, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Shengping Zhang, Wenxiu Sun
Abstract: Recovering the 3D representation of an object from single-view or multi-view RGB images with deep neural networks has attracted increasing attention in the past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to sequentially fuse multiple feature maps extracted from the input images. However, when given the same set of input images in different orders, RNN-based approaches are unable to produce consistent reconstruction results.

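The order sensitivity described above can be demonstrated with a toy comparison between sequential (RNN-style) fusion and a permutation-invariant, score-weighted fusion. The GRU-like update and the stand-in score function below are assumptions for illustration, not Pix2Vox's context-aware fusion module.

    # A minimal sketch showing why sequential RNN fusion depends on view order
    # while a scored, permutation-invariant fusion does not (toy dimensions).
    import numpy as np

    rng = np.random.default_rng(0)
    views = [rng.standard_normal(4) for _ in range(3)]   # per-view feature vectors
    W_h, W_x = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))

    def rnn_fuse(feats):
        """Fold features sequentially: the result changes with the input order."""
        h = np.zeros(4)
        for x in feats:
            h = np.tanh(W_h @ h + W_x @ x)
        return h

    def scored_fuse(feats):
        """Weight each view by a (toy) score and average: order-invariant."""
        scores = np.array([f.sum() for f in feats])      # stand-in for learned scores
        weights = np.exp(scores) / np.exp(scores).sum()
        return sum(w * f for w, f in zip(weights, feats))

    print(np.allclose(rnn_fuse(views), rnn_fuse(views[::-1])))        # almost surely False
    print(np.allclose(scored_fuse(views), scored_fuse(views[::-1])))  # True
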
Weighted Voxel: a novel voxel representation for 3D reconstruction
Haozhe Xie, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Xiaojun Tong
Abstract: 3D reconstruction has attracted increasing attention in the past few years. With the surge of deep neural networks, the performance of 3D reconstruction has improved significantly. However, the voxel grids reconstructed by existing approaches usually contain considerable noise and lead to heavy computation. In this paper, we define a new voxel representation, named Weighted Voxel.

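The abstract does not spell out the representation itself; purely as an illustration of the general idea of softening hard 0/1 occupancy with neighborhood information, the sketch below box-filters a binary voxel grid. The 3x3x3 averaging kernel is an assumption, not the paper's weighting scheme.

    # A minimal sketch: blend each voxel's occupancy with the occupancy of its
    # 26 neighbors, turning a binary grid into a smoother, weighted one.
    import numpy as np

    def weighted_voxel(occupancy):
        """Average each voxel with its 3x3x3 neighborhood (zero-padded borders)."""
        padded = np.pad(occupancy.astype(float), 1)
        out = np.zeros_like(occupancy, dtype=float)
        for dz, dy, dx in np.ndindex(3, 3, 3):
            out += padded[dz:dz + occupancy.shape[0],
                          dy:dy + occupancy.shape[1],
                          dx:dx + occupancy.shape[2]]
        return out / 27.0                      # values in [0, 1] instead of 0/1

    occ = np.random.rand(32, 32, 32) > 0.7     # toy binary reconstruction
    print(weighted_voxel(occ).min(), weighted_voxel(occ).max())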