Efficient 3D reconstruction of large-scale urban environments from street-level video

Last Modified
  • March 20, 2019
  • Gallup, David
    • Affiliation: College of Arts and Sciences, Department of Computer Science
  • Recovering the 3-dimensional (3D) structure of a scene from 2-dimensional (2D) images is a fundamental problem in computer vision. This technology has many applications, including computer graphics, entertainment, robotics, transportation, manufacturing, and security. One application is 3D mapping. For example, Google Earth and Microsoft Bing Maps provide a 3D virtual replica of many of the Earth's cities. However, these 3D models are low-detail and lack ground-level realism. Google Street View and Bing Street Side provide high-resolution panoramas captured from the streets of many cities, but these stills cannot provide free navigation through the virtual world. In this dissertation, I will show how to automatically and efficiently create detailed 3D models of urban environments from street-level imagery. A major goal of this dissertation is to model large urban areas, even entire cities, which is an enormous challenge due to the sheer scale of the problem. Even a partial data capture of the town of Chapel Hill requires millions of frames of street-level video. The methods presented in this dissertation are highly parallel and use little memory, and can therefore utilize modern graphics hardware (GPU) technology to process video at the recording frame rate. In addition, structure in urban scenes, such as planarity, orthogonality, verticality, and texture regularity, can be exploited to achieve 3D reconstructions with greater efficiency, higher quality, and lower complexity. By examining the structure of an urban scene, I perform a multiple-direction plane-sweep stereo method on the GPU in real time. An analysis of stereo precision leads to a view selection strategy that guarantees constant depth resolution and improves bounds on time complexity. Depth measurements are further improved by segmenting the scene into piecewise-planar and non-planar regions, a process which is aided by learned planar surface appearance. Finally, depth measurements are fused and the final 3D surface is recovered using a multi-layer heightmap model that produces clean, complete, and compact 3D reconstructions. The effectiveness of these methods is demonstrated by results from thousands of frames of video from a variety of urban scenes.
Rights statement
  • In Copyright
  • "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science."
  • Pollefeys, Marc
Place of publication
  • Chapel Hill, NC
  • Open access
