BurningEyedeas is a research group that investigates perceptual organization, human vision, natural vision processing (computer vision), image labeling and visual taxometrics. Our research is based primarily on ongoing experiments conducted at the Burning Man Arts Festival, hence the name “burning” “eye” (i)deas. Our data is available to all. Please cite: Barghout, Lauren; Winter, Haley; Riegal, Yurik. Empirical data on the configural architecture of human scene perception and linguistic labels using natural images and ambiguous figures. Vision Science Society Annual meeting. 2011.
We are looking for volunteers to record and analyze our large body of data. Graduate students who wish to supplement their data or undergraduates looking for a research project are particularly welcome. Please call Lauren at 510 919 9255 or email email@example.com
Figure-ground organization requires a combination of local and configural processing. Local processes include bottom-up analyses that fuse the smaller regions of a figure into the larger whole. Top-down configural processes use contextual scene information, prior experience and expectancy. The Berkeley Segmentation Data Set (Martin, Fowlkes, Tal & Malik, 2001) provides a corpus of hand-segmented images that are annotated for figure-ground status. Although this corpus is useful for studying local mechanisms, an additional dataset designed to capture configural information is also needed.
Once we began collecting data, it became clear that we required a broader definition of figural status than the one used in the literature. We coined the term spatial taxon, which refers to objects and object groups that have the Gestalt status of being perceived as figural. Spatial taxons allow an image to have multiple regions of interest, which may overlap to form a hierarchy. Figure 1 illustrates two overlapping spatial taxons. In the figure, the height of each spatial taxon layer indicates its inclusiveness (also referred to as its level of abstraction).
Figure 1: Spatial Taxons. An image can be organized by spatial taxon and parsed into an information architecture that allows for a range of inclusiveness in the status of Figure. In the photograph above, Figure can include the butterfly and flower or the butterfly alone. The spatial taxons are defined relationally in "layers of abstraction."
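The layered organization in Figure 1 can be pictured as a tree in which each node is a spatial taxon and depth stands for level of abstraction, with depth 0 as the most inclusive layer. The sketch below is a minimal illustration under that assumption; the `SpatialTaxon` class and its methods are hypothetical and do not represent the group's data format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpatialTaxon:
    """Hypothetical node in a nested spatial-taxon hierarchy.

    Each child is a less inclusive region nested inside its parent.
    """
    label: str
    children: list[SpatialTaxon] = field(default_factory=list)

    def add(self, child: SpatialTaxon) -> SpatialTaxon:
        self.children.append(child)
        return child

    def walk(self, depth: int = 0):
        """Yield (depth, label) pairs; depth 0 is the most inclusive layer."""
        yield depth, self.label
        for child in self.children:
            yield from child.walk(depth + 1)


# Figure 1 example: "Figure" can be the butterfly and flower together,
# or the butterfly alone, nested inside the more inclusive taxon.
scene = SpatialTaxon("butterfly and flower")
scene.add(SpatialTaxon("butterfly"))

for depth, label in scene.walk():
    print("  " * depth + f"level {depth}: {label}")
```

A strict tree only captures taxons that nest; regions that partially overlap without one containing the other would need a more general structure, such as a graph.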
The question of how best to label images becomes more pressing as the number of unlabeled photographs grows. Common image labeling methodologies, such as the pyramid image labeling model (Jörgensen, Jaimes, Benitez, and Chang, 2001) and the nine-class image content system (Burford, Briggs, and Eakins, 2003), depend strongly on object identification. Automating such labeling models to handle large numbers of photographs would require heavy use of object recognition.
Our research, however, suggests an alternative labeling system based on a visual taxonomy that is invariant with respect to image content. In other words, the taxonomy is always the same, regardless of what is in the picture. Labeling within such a taxonomy would draw on laws of least effort operating simultaneously in linguistics and visual perception. See eyegorithm.com for image segmentation and labeling technology based on this approach.
In a seminal study of pictorial object naming, Jolicoeur, Gluck and Kosslyn (1984) found that objects were identified first at an “entry point” level of abstraction. This entry point, analogous to the basic level category described by Rosch (1976, 1977), sits between a general (superordinate) level of abstraction and a subordinate (detailed) level of abstraction. The basic concept methodology of image labeling proposes that each “concept” corresponds to an entry-level category.
Analysis of the results presented in this study finds an organization of spatial taxons within images similar to the nested levels of abstraction found by Rosch (1976, 1977) and Jolicoeur et al. (1984). Human subjects organized scenes according to a nested hierarchy of abstraction, in which the superordinate level corresponded to the foreground, the entry level corresponded to the figure, and the subordinate level corresponded to highly salient object parts (or to individual objects, for scenes composed of multiple objects). The structure of these regions of abstraction (spatial taxons) was invariant with object type, and their frequency distribution mimicked those found in object category studies.
If scene architecture follows a consistent structure that is largely independent of the objects contained in the scenes, then object recognition may not be required for the configural processes that make scene segmentation meaningful. Meaningfulness would instead depend on the level of abstraction at which the human (or machine) segments the scene. If scene segmentation follows a law of least effort, similar to that found in other cognitive processes (Cancho and Sole, 2003), then the level of abstraction would result from a trade-off between the perceptual requirements for understanding the scene and the visual resources required to process and perceptually organize that scene. The power-law and exponential distributions relating word frequency to rank are thought to result from this principle of least effort.
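To make the distinction between the two distributions concrete, the candidate frequency-rank forms can be written as follows, where f(r) is the frequency of the item at rank r and C, α and β are positive constants fitted to the data (this notation is introduced here and is not from the original studies). Zipf's law is the power-law case with α close to 1.

```latex
\begin{aligned}
\text{power law:}\quad   & f(r) = C\,r^{-\alpha}
  &&\Longrightarrow\quad \log f(r) = \log C - \alpha \log r \\
\text{exponential:}\quad & f(r) = C\,e^{-\beta r}
  &&\Longrightarrow\quad \log f(r) = \log C - \beta r
\end{aligned}
```

A power law therefore appears as a straight line on log-log axes, while an exponential appears as a straight line when only the frequency axis is logarithmic, and curves downward on log-log axes.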
General Methodology of Experiments
Paper surveys consist of a series of photographs with the instructions: "Please put an X at the center of the subject of the photograph and write a few words to describe it." See the ambiguous figure study below for further analysis of this methodology.
Operational Definition - Spatial Taxon
We use the term "spatial taxon" rather than "figure" to refer to a region centered around a position indicated by survey participants. In many ways "spatial taxon" is analogous to the term "figure" as defined in the vision science literature, but it is broader: it includes discrete regions within a "figure" as well as regions composed of several smaller subjects (foreground). This allows figural status to be defined in cases where the "subject" of an image is ambiguous, for example, in images where there is more than one "figure" and/or a "figure" within a "figure". Our survey method assumes that the figural status of a region is defined by proxy, as centered around the point that survey participants mark as "the center of the subject."
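Under this operational definition, each participant's X is a point in image coordinates, and a spatial taxon corresponds to a cluster of such points. The group reports grouping the marks with k-means; the following is a minimal pure-Python sketch of that step, not the group's code. It assumes the number of clusters k is already known (the source does not say how k was chosen) and uses a deterministic farthest-point initialization chosen here only for reproducibility. The function name, sample coordinates and words are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def kmeans(points: list[tuple[float, float]], k: int, iterations: int = 100):
    """Plain Lloyd's k-means on 2-D (x, y) marks; returns (centroids, labels).

    Seeds are chosen deterministically: the first point, then repeatedly
    the point farthest from its nearest existing seed.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far)

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each mark joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its marks.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, labels


# Hypothetical survey responses: (x, y) of each participant's X plus one word.
marks = [((102, 80), "butterfly"), ((98, 84), "butterfly"), ((105, 78), "wings"),
         ((210, 190), "flower"), ((205, 185), "flower"), ((215, 195), "garden")]
centroids, labels = kmeans([xy for xy, _ in marks], k=2)

# Group and count the words attached to the marks in each spatial taxon.
words_by_taxon: dict[int, Counter] = {}
for (_, word), lab in zip(marks, labels):
    words_by_taxon.setdefault(lab, Counter())[word] += 1
print(words_by_taxon)
```

Each centroid then serves as the "center of the subject" for its taxon, and the per-taxon word counts feed the frequency-rank analysis.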
Method of analysis
Spatial taxons were determined via k-means clustering of the location measurements. Words and word phrases for each spatial taxon were grouped, counted and reported in a table on a separate webpage for each image. Words were counted and ranked by frequency, such that the most commonly occurring word was given a rank of one. In the ambiguous figure study, words with the same count were ranked sequentially. In the numerosity and Burning Man studies, words with the same count were given the same rank. This procedure enables us to examine whether the frequencies of the spatial taxons and their corresponding words decay as a power law of rank, as would be expected of a system governed by a principle of least effort. To explore the hypothesis that spatial taxons result from a law of least effort, spatial taxon frequency and rank were plotted on logarithmic coordinates. The word frequency-rank distribution for each spatial taxon was also plotted on log-log coordinates to look for power-law behavior.
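The two tie conventions can change the rank a word receives, which matters once the points are placed on log-log axes. A minimal sketch, assuming word counts have already been tallied per spatial taxon: `rank_sequential` and `rank_shared` are hypothetical helper names, the shared convention is assumed to be dense ranking (1, 1, 2, ...) because the source does not say whether the rank after a tie skips ahead, and the slope is an ordinary least-squares fit added for illustration (the source describes plotting, not fitting).

```python
import math
from collections import Counter


def rank_sequential(counts):
    """Tied counts get consecutive ranks (ambiguous figure study convention).

    Ties are broken alphabetically, an arbitrary choice for reproducibility.
    Returns a list of (rank, word, count).
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, n) for rank, (word, n) in enumerate(ordered, start=1)]


def rank_shared(counts):
    """Tied counts share one rank (numerosity / Burning Man convention).

    Dense ranking (1, 1, 2, ...) is assumed here.
    """
    distinct = sorted(set(counts.values()), reverse=True)
    rank_of = {n: i for i, n in enumerate(distinct, start=1)}
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank_of[n], word, n) for word, n in ordered]


def loglog_slope(ranked):
    """Least-squares slope of log(frequency) against log(rank).

    A roughly straight line with slope near -1 is the classic Zipf pattern.
    Needs at least two distinct ranks.
    """
    xs = [math.log(r) for r, _, _ in ranked]
    ys = [math.log(n) for _, _, n in ranked]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up counts for one spatial taxon, for illustration only.
words = Counter({"butterfly": 12, "flower": 6, "wings": 4,
                 "insect": 3, "orange": 3, "garden": 2})
for name, ranked in (("sequential", rank_sequential(words)),
                     ("shared", rank_shared(words))):
    print(name, [r for r, _, _ in ranked], round(loglog_slope(ranked), 2))
```

With these counts the two words tied at 3 receive ranks 4 and 5 under the sequential convention but share rank 4 under the shared one, which shifts the points and therefore the fitted slope.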
Specific Methods for individual studies
The images used for this study (used with permission of Brad Templeton) were chosen for their familiarity to participants of the Burning Man Arts Festival. Populations were sampled at Burning Man, at MacWorld Expo (San Francisco), and among people waiting in line outside Department of Motor Vehicles offices in Oakland, CA and Raleigh, NC.
The study attempts to manipulate the likely “entry point category” by sampling populations with different general knowledge of the objects within the images. Data collected from more photographs still need to be coded, analyzed and added to the site. Check back for more data.
The images used for the "numerosity" study were chosen for object type and configuration to match an N by M factorial design, where N represents the number of objects of the same type. Each survey includes single-subject images, images with two and three subjects, and images with a large group of subjects. At least 50 surveys were collected for each photographic image. Surveys were conducted at the Burning Man Arts Festival in Gerlach, Nevada, at the MacWorld Expo in San Francisco, at public venues within the local community in Oakland, California, and at public venues in Raleigh, North Carolina.
Ambiguous figure study
The methodology of these studies incorporates a paradigm that assumes that asking someone to mark the center of the subject of the photograph serves as a proxy for the figural status (and, as explained above, the level of abstraction) of the region centered at the marked point. Because this method does not distinguish between a foreground, a single object or an object within an object, the term spatial taxon was coined to refer to the object or object group centered at the position indicated.
In this study we test this underlying assumption by repeating the experiment with ambiguous figures. Because the alternative interpretations of an ambiguous figure are not perceived simultaneously, the data collected can be grouped according to which interpretation the participant identifies. Additionally, because the interpretations are mutually exclusive, they do not belong to the same nested spatial taxon hierarchy. Thus, if our assumption that the center of the subject serves as a proxy for figural status is correct, the words used to label each interpretation should yield significantly higher counts in the spatial taxon corresponding to that interpretation than in the mutually exclusive alternative.
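This prediction can be phrased as a contingency-table question: do words naming interpretation A occur more often among marks in taxon A, and words naming interpretation B more often among marks in taxon B? The source does not name the statistical test used; the sketch below assumes a Pearson chi-square test on a 2x2 table without continuity correction, and the counts, function name and row/column labels are invented for illustration.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom)
    for a 2x2 table of counts [[a, b], [c, d]], no continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 df, chi2 is the square of a standard normal variable,
    # so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: spatial taxon (marks clustered on the woman vs. on the sax player).
# Columns: label words ("woman"-type vs. "sax player"-type). Hypothetical counts.
table = [[18, 3],
         [4, 21]]
chi2, p = chi_square_2x2(table)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

The test assumes each participant contributes one independent mark-and-label pair and that expected cell counts are not too small (a common rule of thumb is at least 5); otherwise an exact test would be the safer choice.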
In this study we examine whether the entry-level categories for clothed and unclothed human subjects follow the same structure.
The data collected for each image is presented on its own webpage.
Ambiguous Figure Series
Woman Sax Player
As expected, and as predicted by our research hypothesis, the relevant word counts were significantly higher in the corresponding spatial taxons of the ambiguous figures. This result supports the assumption underlying the experimental methods and validates the operational definitions used in this study.
The hypothesis that spatial taxons result from a law of least effort, whereby the entry point segmentation minimizes visual resources and maximizes utility, was explored by plotting the frequency-rank distributions. As shown in the word frequency vs. word rank plots, ambiguous figures produce an exponential rather than a power-law distribution. This differs from the distributions found for the natural images in the numerosity study. However, the definition of word rank differed between the studies (tied words were ranked sequentially in the ambiguous figure study but given the same rank in the numerosity study), which may explain the difference in distributions.
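One simple way to compare the two candidate shapes is to fit a straight line to log frequency against log rank (power law) and against rank itself (exponential), then compare goodness of fit. The sketch below assumes that approach; it is not described in the source, the function names are hypothetical, and R² on linearized data is only a rough diagnostic (maximum-likelihood fitting is generally preferred before claiming a power law).

```python
import math


def r_squared(xs, ys):
    """Coefficient of determination of an ordinary least-squares line.

    Requires at least two distinct xs and two distinct ys.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def compare_fits(freqs):
    """Given frequencies ordered by rank (rank 1 first), return R^2 for a
    power law (log f vs. log r) and for an exponential (log f vs. r)."""
    ranks = range(1, len(freqs) + 1)
    log_f = [math.log(f) for f in freqs]
    return {
        "power_law": r_squared([math.log(r) for r in ranks], log_f),
        "exponential": r_squared(list(ranks), log_f),
    }


# Synthetic check: an exact power law and an exact exponential.
power = [100 / r for r in range(1, 9)]
expo = [100 * math.exp(-0.5 * r) for r in range(1, 9)]
print(compare_fits(power))
print(compare_fits(expo))
```

Because the tie convention moves points along the rank axis, running this comparison on the same counts ranked both ways is one way to check whether the convention alone could account for the difference in distributions.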
In linguistics, Zipf’s Law is thought to result from processes that minimize the effort required to utter a word while simultaneously maximizing the utility of that word. In language these trade-offs occur at many levels. At the phonological level, speakers minimize articulatory effort and listeners minimize the effort of understanding. At the lexical level, words with multiple meanings require less effort from the speaker but more effort from the listener, who must determine the meaning from context. In natural language, these constraints resulted in the evolution of usage patterns that reward word brevity but encourage multiple mappings between words and objects. This one-to-many mapping is required for least effort systems (Cancho & Sole, 2003). If spatial taxons result from a process of least effort, we expect this process to also occur at multiple levels of visual processing for images with multiple interpretations. It is possible that the difference between the ambiguous figure results and the natural image results is due to the constrained mapping of each segmentation to a unique object in the ambiguous figures.
In conclusion, this study supports the methodology for operationally defining spatial taxons and linking them to word sets. The combined analysis of these results indicates that a power-law distribution describes scene segmentation when a large number of nested hierarchical spatial taxons are possible interpretations, but that an exponential distribution fits images with fewer possible interpretations. We speculate that a law of least effort may explain some of these results.
To cite this work:
Barghout, L. (2009) Empirical data on the configural architecture of human scene perception using natural images. J Vis August 5, 2009 9(8): 964; doi:10.1167/9.8.964
Barghout, Lauren; Winter, Haley; Riegal, Yurik. Empirical data on the configural architecture of human scene perception and linguistic labels using natural images and ambiguous figures. Vision Science Society Annual meeting. 2011.
Please cite this website and link back to this page. Thank you.
Barghout, Lauren; Winter, Haley; Riegal, Yurik. Empirical data on the configural architecture of human scene perception and linguistic labels using natural images and ambiguous figures. Vision Science Society Annual meeting. 2011.
Barghout, Lauren. Empirical data on the configural architecture of human scene perception. Vision Science Society Annual meeting. 2009.
Barghout, Lauren. How Global Perceptual Context Changes Local Contrast Processing. Ph.D. Dissertation, University of California at Berkeley. 2003.
B. Burford, P. Briggs, and J.P. Eakins, A taxonomy of the image: On the classification of content for image retrieval, Visual Communication 2(2) (2003) 123-161.
Cancho, Ramon & Sole, Ricard. Least effort and the origins of scaling in human language. Proc Natl Acad Sci U S A. 2003 February 4; 100(3): 788-791.
C. Jörgensen, A. Jaimes, A.B. Benitez, and S.-F. Chang, A conceptual framework and empirical research for classifying visual descriptors, Journal of the American Society for Information Science and Technology 52(11) (2001) 938-947.
Lee, Hyuk-Jin and Neal, Diane M., "A New Model For Semantic Photograph Description Combining Basic Levels and User-assigned Descriptors" (2010). FIMS Library and Information Science Publications. Paper 18. http://ir.lib.uwo.ca/fimspub/18 pp. 1–22
Palmer, Stephen. (1999). Vision Science: Photons to Phenomenology. MIT Press, Cambridge, MA.
Rosch, E.H. (1975). Cognitive representation of semantic categories. Journal of Experimental Psychology: General 104(3): 192-233.
Rosch, E.H.; Mervis, C.B.; Gray, W.D.; Johnson, D.M.; Boyes-Braem, P. (1976). “Basic objects in natural categories”. Cognitive Psychology 8 (3): 382-439.
Zipf, G. K. (1972) Human Behaviour and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Cambridge, MA.
Ghost-woman: O’Regan, Kevin (2001) Experience is not something we feel but something we do: a principled way of explaining sensory phenomenology, with Change Blindness and other empirical consequences. Talk given at Bressanone on 24 Jan 2001.
Saxman-woman: Investigating the Relative Influence of Top-Down versus Bottom-Up Processing on Viewing Ambiguous Figures. Website reference: http://www.laurenscharff.com/courseinfo/SL2000/tdbuexp.html (Illusionworks, L.L.C. (1997)).
Heart-moon. By Lauren Barghout 2007.
- Ana Da Silva - project management, writing, data analysis and illustrations
- Yurik Riegel - data analysis and illustrations, http://yurikriegel.com/
- Haley Winter - data analysis, http://www.haleywinter.com
Special thanks to Ken Schwinghammer of St. Cloud, MN for his financial support of the 2008 Burning Man study.