Languages of Perception
Mehdi Dastani

Abstract:
%Nr: DS-1998-05

In everyday life, we are confronted with visual information 
provided by the environment. This visual information may 
originate from the scenes of cities or forests, but also from 
television images, computer interfaces, and many other natural or 
artificial sources. In general, we have no difficulty in 
recognizing meaningful entities in the visual information we 
receive, and in organizing it coherently. For example, when we 
look at an urban neighborhood which we have never seen before, we 
perceive individual buildings and separate them from each other 
even when they are directly attached to each other. Also, in a 
natural environment, we easily perceive individual flowers, 
plants, or trees and discriminate them from each other even when 
one is partially hidden by the other. Although this ability seems 
to be effortless and direct, it is far from trivial to 
understand, describe, and model it. 
 
In order to understand and model the human visual system, one 
should analyze visual information as it is presented to human 
visual sensors (human eyes) and describe how this information can 
be mapped into meaningful entities for which we have names and 
which we can place in a conceptual framework. We assume two steps 
in mapping visual information into meaningful entities. The first 
step concerns the low-level structuring of visual information. 
This step provides the constituent structure of visual 
information, i.e. it determines (a) the constituents of visual 
information and (b) how they are composed to build up larger 
wholes. In the second step, the visual constituents resulting 
from the first step are interpreted in some conceptual 
framework. The interpretation of visual constituents is based on 
many factors, such as reasoning and past experience. It should be 
noted that these two steps interact with each other: when the 
structured visual information from the first step cannot be 
placed into a conceptual framework coherently, the low-level 
structuring step must provide an alternative constituent 
structure for the visual input.
 
In this thesis, we will concentrate on the first step of 
structuring visual information and investigate the principles on 
the basis of which visual constituents are composed to form 
larger wholes. However, we do not discuss how primitive visual 
constituents are determined. In an ultimate theory, these should 
probably be pixels, line-segments, and/or edges between 
contrastive areas. But for the moment we will avoid commitments 
about this issue, by focusing on classes of pictures for which a 
particular type of higher-level units may be assumed as 
primitives. What we focus on in this thesis is the problem of 
gestalt perception: assuming a set of primitive elements, we try 
to account for the phenomenon that pictures built up of these 
elements are perceived by humans as having a particular 
hierarchical constituent structure. In the study of gestalt 
perception, primitive elements are assumed to be composed and 
structured unconsciously and directly according to some innate 
principles that are believed to underlie the human visual system. 
During the last century, there have been several formulations of 
these proposed innate principles. We start with a recent 
formulation of the innate principles of the human visual system 
and develop a mathematical model for gestalt perception. We 
discuss various aspects of gestalt perception and work out an 
application in which a model of the human visual system is 
indispensable.