Language and Perceptual Categorization in Computational Visual Recognition
Citation
MLA
Ordonez Roman, Vicente. Language and Perceptual Categorization in Computational Visual Recognition. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School, 2015. https://doi.org/10.17615/xng7-nb97

APA
Ordonez Roman, V. (2015). Language and Perceptual Categorization in Computational Visual Recognition. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School. https://doi.org/10.17615/xng7-nb97

Chicago
Ordonez Roman, Vicente. 2015. Language and Perceptual Categorization in Computational Visual Recognition. Chapel Hill, NC: University of North Carolina at Chapel Hill Graduate School. https://doi.org/10.17615/xng7-nb97

- Last Modified
- March 19, 2019
- Creator
- Ordonez Roman, Vicente
- Affiliation: College of Arts and Sciences, Department of Computer Science
- Abstract
- Computational visual recognition, or giving computers the ability to understand images as well as humans do, is a core problem in Computer Vision. Traditional recognition systems often describe visual content by producing a set of isolated labels, object locations, or even by trying to annotate every pixel in an image with a category. People, in contrast, describe the visual world using language. The rich visually descriptive language produced by people incorporates information from human intuition, world knowledge, visual saliency, and common sense that goes beyond detecting individual visual concepts like objects, attributes, or scenes. Moreover, due to the rising popularity of social media, there exist billions of images with associated text on the web, yet systems that can leverage this type of annotation or try to connect language and vision are scarce. In this dissertation, we propose new approaches that explore the connections between language and vision at several levels of detail by combining techniques from Computer Vision and Natural Language Understanding. We first present a data-driven technique for understanding and generating image descriptions using natural language, including automatically collecting a large-scale dataset of images with visually descriptive captions. Then we introduce a system for retrieving short visually descriptive phrases that describe some part or aspect of an image, and a simple technique for generating full image descriptions by stitching short phrases together. Next we introduce an approach for collecting and generating referring expressions for objects in natural scenes at a much larger scale than previous studies. Finally, we describe methods for learning how to name objects by using intuitions from perceptual categorization related to basic-level and entry-level categories.
The main contribution of this thesis is in advancing our knowledge of how to leverage language and intuitions from human perception to create visual recognition systems that can better learn from and communicate with people.
- Date of publication
- May 2015
- Keyword
- Subject
- DOI
- Identifier
- Resource type
- Rights statement
- In Copyright
- Advisor
- Frahm, Jan-Michael
- Efros, Alexei
- Berg, Tamara
- Berg, Alexander
- Choi, Yejin
- Degree
- Doctor of Philosophy
- Degree granting institution
- University of North Carolina at Chapel Hill Graduate School
- Graduation year
- 2015
- Language
- Publisher
- Place of publication
- Chapel Hill, NC
- Access right
- There are no restrictions to this item.
- Date uploaded
- June 25, 2015
Items
| Thumbnail | Title | Date Uploaded | Visibility | Actions |
|---|---|---|---|---|
| — | OrdonezRoman_unc_0153D_15135.pdf | 2019-04-08 | Public | Download |