ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining
CVPR 2024
UC San Diego
*Equal contribution


ZeroRF performs novel view synthesis from only a few input views (six in the figure) with exceptional quality while remaining fast: it reaches competitive results within 2 minutes and finishes in about 25 minutes at the full $800^2$ resolution. At resolutions common in 3D generation applications, such as $256^2$ or $320^2$, ZeroRF reconstructs an object from generated sparse views in only 30 seconds.

6-View Reconstruction Speed Comparison


We present ZeroRF, a novel per-scene optimization method addressing the challenge of sparse-view 360° reconstruction in neural field representations. Recent breakthroughs such as Neural Radiance Fields (NeRF) have demonstrated high-fidelity image synthesis but struggle with sparse input views. Existing methods, such as Generalizable NeRFs and per-scene optimization approaches, face limitations in data dependency, computational cost, and generalization across diverse scenarios. To overcome these challenges, we propose ZeroRF, whose key idea is to integrate a tailored Deep Image Prior into a factorized NeRF representation. Unlike traditional methods, ZeroRF parametrizes feature grids with a neural network generator, enabling efficient sparse-view 360° reconstruction without any pretraining or additional regularization. Extensive experiments showcase ZeroRF's versatility and superiority in both quality and speed, achieving state-of-the-art results on benchmark datasets. ZeroRF also extends to applications in 3D content generation and editing.
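The key idea, optimizing the weights of a generator that outputs the feature grids rather than optimizing the grids directly, can be illustrated with a small pure-Python sketch. This is not the authors' implementation. A toy generator (2x nearest-neighbour upsampling followed by a per-cell two-layer MLP) stands in for the deep convolutional generator, only the density factors are shown, and nearest-neighbour lookup plus a ReLU activation are assumed for brevity. The names `init_generator`, `generate_plane`, `generate_line`, `density` and the sizes `RANK`, `RES`, `LATENT`, `HIDDEN` are hypothetical. The sketch covers only the forward pass: frozen noise goes in, TensoRF-VM-style plane and line factors come out, and a density is read off as a sum of plane-times-line products.

```python
import math
import random

RANK = 2     # components per (plane, line) pair; hypothetical value
RES = 8      # resolution of each generated plane/line; hypothetical value
LATENT = 4   # channels of the frozen noise input; hypothetical value
HIDDEN = 8   # hidden width of the toy generator; hypothetical value


def init_generator(rng):
    # Randomly initialised weights: these (not the grids) are what
    # per-scene optimization would update in this toy setup.
    w1 = [[rng.gauss(0, 1 / math.sqrt(LATENT)) for _ in range(HIDDEN)]
          for _ in range(LATENT)]
    w2 = [[rng.gauss(0, 1 / math.sqrt(HIDDEN)) for _ in range(RANK)]
          for _ in range(HIDDEN)]
    return w1, w2


def mlp(params, z):
    # Two-layer MLP with ReLU: maps one LATENT-vector to RANK features.
    w1, w2 = params
    h = [max(0.0, sum(z[i] * w1[i][j] for i in range(LATENT)))
         for j in range(HIDDEN)]
    return [sum(h[j] * w2[j][k] for j in range(HIDDEN)) for k in range(RANK)]


def generate_plane(params, noise):
    # 2x nearest-neighbour upsampling of the noise grid, then the per-cell MLP.
    return [[mlp(params, noise[i // 2][j // 2]) for j in range(RES)]
            for i in range(RES)]


def generate_line(params, noise):
    # Same idea in 1D for the vector (line) factors.
    return [mlp(params, noise[i // 2]) for i in range(RES)]


def to_index(c):
    # Map a coordinate in [-1, 1] to the nearest grid index.
    return min(RES - 1, max(0, round((c + 1) / 2 * (RES - 1))))


def density(planes, lines, p):
    # VM-style density: sum of plane * line products over the three axis
    # pairings and RANK components, followed by a ReLU (assumed activation).
    ix, iy, iz = (to_index(c) for c in p)
    pairs = [(planes[0][ix][iy], lines[0][iz]),  # XY plane with Z line
             (planes[1][ix][iz], lines[1][iy]),  # XZ plane with Y line
             (planes[2][iy][iz], lines[2][ix])]  # YZ plane with X line
    sigma = sum(m * v for pf, lf in pairs for m, v in zip(pf, lf))
    return max(0.0, sigma)


rng = random.Random(0)
half = RES // 2
# Frozen Gaussian noise inputs: sampled once at the start, never updated.
plane_noise = [[[[rng.gauss(0, 1) for _ in range(LATENT)]
                 for _ in range(half)] for _ in range(half)] for _ in range(3)]
line_noise = [[[rng.gauss(0, 1) for _ in range(LATENT)]
               for _ in range(half)] for _ in range(3)]
plane_gens = [init_generator(rng) for _ in range(3)]
line_gens = [init_generator(rng) for _ in range(3)]

planes = [generate_plane(g, z) for g, z in zip(plane_gens, plane_noise)]
lines = [generate_line(g, z) for g, z in zip(line_gens, line_noise)]
print(density(planes, lines, (0.1, -0.4, 0.7)))
```

In an actual training loop, an autodiff framework would backpropagate the rendering loss into the generator weights while the noise inputs stay fixed; the structure imposed by the generator, rather than an explicit regularizer, is what constrains the grids.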



Architecture of ZeroRF. It parametrizes the TensoRF-VM tensors with randomly initialized deep generator networks (Sec. 4.3), whose inputs are Gaussian noise sampled at the start of training and then kept frozen. The system performs per-scene optimization using the standard volume rendering procedure with a plain rendering loss.
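The standard volume rendering procedure with a plain rendering loss can be sketched as NeRF-style alpha compositing of samples along each ray, followed by a per-pixel squared error against the input view. This is a hedged sketch: mean squared error is assumed as the "plain" loss, `render_ray` and `rendering_loss` are hypothetical helpers, and a real implementation would batch many rays on the GPU.

```python
import math


def render_ray(sigmas, colors, deltas):
    """Alpha-composite samples along one ray (NeRF-style quadrature)."""
    transmittance = 1.0          # fraction of light not yet absorbed
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution to the pixel
        rgb = [acc + weight * c for acc, c in zip(rgb, color)]
        transmittance *= 1.0 - alpha            # light surviving past this sample
    return rgb


def rendering_loss(pred_pixels, gt_pixels):
    """Mean squared error over rays and RGB channels (assumed loss form)."""
    total = sum((p - g) ** 2
                for pred, gt in zip(pred_pixels, gt_pixels)
                for p, g in zip(pred, gt))
    return total / (3 * len(pred_pixels))


# Usage: an empty sample, a semi-transparent green one, an almost opaque blue one.
pixel = render_ray([0.0, 5.0, 50.0],
                   [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
                   [0.1, 0.1, 0.1])
print(pixel, rendering_loss([pixel], [(0.0, 0.5, 0.5)]))
```

Because the rendered pixel is a differentiable function of the densities and colors, and those come from the generator networks, minimizing this loss alone is enough to drive per-scene optimization.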

6-View Reconstruction (25min)

Text / Image to 3D (30s)



