Synthesizing 3D views of a scene captured from multiple camera angles, positions, and lighting conditions is a challenging task for computer vision models and an important prerequisite for AR and VR applications. A group of researchers from Google Research and Google Brain are working to solve this problem by developing deep learning models that can synthesize complex outdoor scenes using only unstructured collections of in-the-wild photographs.
“We build on neural radiance fields (NeRF), which uses the weights of a multilayer perceptron to implicitly model the volumetric density and color of a scene. While NeRF works well on images of static subjects captured under controlled settings, it is incapable of modeling many ubiquitous, real-world phenomena in uncontrolled images, such as variable illumination or transient occluders,” the researchers stated in their paper, “NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections.”
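The core NeRF idea the researchers build on can be illustrated with a toy sketch: a multilayer perceptron whose weights implicitly encode a scene, mapping a 3D point to a volumetric density and an RGB color. The NumPy model below is a minimal illustration under assumed layer sizes and frequency counts, not the authors' TensorFlow/Keras implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def positional_encoding(x, n_freqs=4):
    """Map coordinates to sin/cos features, as NeRF does before the MLP."""
    feats = [x]
    for i in range(n_freqs):
        feats.append(np.sin((2.0 ** i) * np.pi * x))
        feats.append(np.cos((2.0 ** i) * np.pi * x))
    return np.concatenate(feats, axis=-1)

class TinyNeRF:
    """Toy MLP: 3D point -> (density sigma, RGB color). Illustrative sizes only."""
    def __init__(self, hidden=64, n_freqs=4):
        in_dim = 3 * (1 + 2 * n_freqs)  # raw xyz plus sin/cos features
        self.n_freqs = n_freqs
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 4))  # sigma + rgb
        self.b2 = np.zeros(4)

    def __call__(self, xyz):
        h = np.maximum(positional_encoding(xyz, self.n_freqs) @ self.w1 + self.b1, 0.0)
        out = h @ self.w2 + self.b2
        sigma = np.log1p(np.exp(out[..., 0]))      # softplus keeps density >= 0
        rgb = 1.0 / (1.0 + np.exp(-out[..., 1:]))  # sigmoid keeps color in [0, 1]
        return sigma, rgb

model = TinyNeRF()
points = rng.uniform(-1.0, 1.0, (5, 3))  # five sample points along a camera ray
sigma, rgb = model(points)
print(sigma.shape, rgb.shape)  # (5,) (5, 3)
```

In the full method, densities and colors predicted at many such points along each camera ray are composited with volume rendering to produce a pixel; the untrained weights here are just placeholders for the optimized scene representation.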
The new model can capture per-image appearance variation in outdoor scenes, such as lighting and photometric post-processing, without affecting the underlying 3D geometry of the scene.
The researchers implemented their models in TensorFlow 2 and Keras and trained them on eight NVIDIA V100 GPUs. “We optimize all NeRF variants for 300,000 steps on 8 GPUs with the Adam optimizer,” the researchers stated. “For the Lego datasets, we optimize for 125,000 steps on 4 GPUs.” The team used publicly available datasets, as well as images from Flickr.
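The Adam update rule the researchers mention can be sketched in a few lines. The NumPy loop below fits a toy scalar parameter to show the moment estimates and bias correction at work; the learning rate and step count are illustrative assumptions, not the paper's training configuration.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update. Hyperparameters here are illustrative defaults."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize the toy loss (w - 1)^2 over many optimization steps.
w = np.array([5.0])
m = np.zeros(1)
v = np.zeros(1)
for t in range(1, 3001):
    grad = 2.0 * (w - 1.0)  # gradient of (w - 1)^2
    w, m, v = adam_step(w, grad, m, v, t)
print(float(w[0]))  # converges toward the minimum at 1.0
```

In the actual system, the same update is applied by TensorFlow's built-in Adam optimizer to the millions of MLP weights, for the step counts quoted above.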
According to the team, the model shows significant qualitative and quantitative improvement over the previous state-of-the-art approaches.
The researchers say their outdoor scene reconstruction model is still a work in progress, but they are confident they have made significant strides toward generating novel views of outdoor environments.