Downloadable Content

Download PDF
Last Modified
  • March 20, 2019
  • Zheng, Enliang
    • Affiliation: College of Arts and Sciences, Department of Computer Science
  • The goal of image-based 3D reconstruction is to construct a spatial understanding of the world from a collection of images. For applications that seek to model generic real-world scenes, it is important that the reconstruction methods used are able to characterize both static scene elements (e.g. trees and buildings) as well as dynamic objects (e.g. cars and pedestrians). However, due to many inherent ambiguities in the reconstruction problem, recovering this 3D information with accuracy, robustness, and efficiency is a considerable challenge. To advance the research frontier for image-based 3D modeling, this dissertation focuses on three challenging problems in static scene and dynamic object reconstruction. We first target the problem of static scene depthmap estimation from crowd-sourced datasets (i.e. photos collected from the Internet). While achieving high-quality depthmaps using images taken under a controlled environment is already a difficult task, heterogeneous crowd-sourced data presents a unique set of challenges for multi-view depth estimation, including varying illumination and occasional occlusions. We propose a depthmap estimation method that demonstrates high accuracy, robustness, and scalability on a large number of photos collected from the Internet. Compared to static scene reconstruction, the problem of dynamic object reconstruction from monocular images is fundamentally ambiguous when not imposing any additional assumptions. This is because having only a single observation of an object is insufficient for valid 3D triangulation, which typically requires concurrent observations of the object from multiple viewpoints. Assuming that dynamic objects of the same class (e.g. all the pedestrians walking on a sidewalk) move in a common path in the real world, we develop a method that estimates the 3D positions of the dynamic objects from unstructured monocular images. Experiments on both synthetic and real datasets illustrate the solvability of the problem and the effectiveness of our approach. Finally, we address the problem of dynamic object reconstruction from a set of unsynchronized videos capturing the same dynamic event. This problem is of great interest because, due to the increased availability of portable capture devices, captures using multiple unsynchronized videos are common in the real world. To resolve the challenges that arises from non-concurrent captures and unknown temporal overlap among video streams, we propose a self-expressive dictionary learning framework, where the dictionary entries are defined as the collection of temporally varying structures. Experiments demonstrate the effectiveness of this approach to the previously unsolved problem.
Date of publication
Resource type
Rights statement
  • In Copyright
  • Frahm, Jan-Michael
  • Jojic, Vladimir
  • Sheikh, Yaser
  • Berg, Tamara
  • Dunn, Enrique
  • Doctor of Philosophy
Degree granting institution
  • University of North Carolina at Chapel Hill Graduate School
Graduation year
  • 2016

This work has no parents.