“The question of why we see things the way we do in large measure still eludes us” (p.341). This article explores the elusive relationship between vision and cognition. Is what we see solely a function of the stimulation we receive on our retina, or is it “cognitively penetrable” and influenced by what we “expect” to see? Furthermore, does what we see change our beliefs and representations of the world? Pylyshyn’s hypothesis is that early vision encompasses the stage of vision in which computational and top-down processing produces a 3D-image, but this stage is impenetrable to cognitive influence. Rather, cognition is constrained to two points: 1) pre-perception allocation of attention and 2) post-perception pattern-recognition decisions.

Questioning the Continuity Thesis

Pylyshyn puts forward four reasons for questioning whether vision is continuous with cognition. First, he argues that perception is resistant to rational influence. To support this claim, he cites optical illusions – even when you “know” two lines are the same length, you perceive them as uneven. Second, he explains that the principles of perception differ from principles of inference, which follow rational rules of reasoning. Principles of perception are responsive only to visually presented information; they do not reflect simplicity. For example, even if parts of an image are blocked, we can picture what is behind the blocked areas quite accurately. Therefore, these principles are “insensitive to knowledge, expectations, and even to the effects of learning” (p. 345). Third, Pylyshyn cites clinical evidence from neuroscience that suggests at least partial dissociation of cognitive and visual functions. Finally, Pylyshyn presents methodological arguments that acknowledge the observed effects of expectations, beliefs, and contextual cues, but assigns these influences to stages of processing that lie outside of what is called “early vision” (p. 345).

Arguments For/Against Continuity

Pylyshyn explores three fields which have also explored the continuity thesis. First, he considers progress in artificial intelligence, in which the goal is to design a system that can “see.” The major progress that has come out of this field has been the development of a knowledge-based/model-based systems approach. These systems use stored general knowledge regarding objects to help determine whether said object appears in the field of view. Though this supports the continuity thesis, Pylyshyn argues that in addition to these systems, there needs to be development of systems that contain constraints on interpretation, which are congruent with the impenetrability thesis.

Next, he introduces discoveries in neuroscience indicating that attention can “sensitize or gate the visual field” (p. 347). This research partially favors the continuity theory, as there is evidence for top-down effects that modulate attention. However, there have been no discoveries of cells that influence the interpretation or emotional part of vision. Pylyshyn argues that what they illustrate is a pattern/motion response, not a content (cognitive) response.

Finally, Pylyshyn offers examples from clinical neurology in which pathology, specifically visual agnosia, strongly indicates a dissociation of vision and cognition. Upon impairment of one of the systems, the other continues to function. This view contradicts the continuity theory. Pylyshyn concurs with this evidence, although he admits that these observations could certainly be correlational, not causal.

Determining Visual Stages: Methodological Issues

Pylyshyn acknowledges that all evidence provided above has been highly debated due to the complexity of the visual process. He highlights the importance of clarifying the phases of vision to better interpret these and other findings. He reviews methodological problems associated with distinguishing the stages of the visual process. Signal detection theory explores the ways in which humans make decisions about and respond to stimuli. Two phases have been defined: a “perceptual” phase that recognizes a stimulus (cognitively impenetrable), and a “decision” phase that formulates a response (cognitively penetrable). The former is represented by the sensitivity parameter (d’), which represents the statistical relationship between the presence of a stimulus, such as a tone, and a person experiencing it. The latter is represented by the response bias/criterion measure (β), which represents the statistical relationship between the presence of the stimulus and the formulation of a response. While this interpretation locates cognitive effects in a post-perceptual stage, Pylyshyn argues that this decomposition is generally too coarse to accurately establish when cognitive influences play a role.
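To make the two parameters concrete, here is a minimal sketch of how d’ and β are commonly computed from a hit rate and a false-alarm rate. It assumes the textbook equal-variance Gaussian model of signal detection, which this summary does not spell out, and the function names are hypothetical labels chosen for illustration.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution (mean 0, sd 1)


def sdt_measures(hit_rate, false_alarm_rate):
    """Return (d_prime, beta) under the ASSUMED equal-variance Gaussian model.

    Both rates must lie strictly between 0 and 1.
    """
    z_hit = _STD.inv_cdf(hit_rate)
    z_fa = _STD.inv_cdf(false_alarm_rate)
    d_prime = z_hit - z_fa                   # sensitivity
    beta = _STD.pdf(z_hit) / _STD.pdf(z_fa)  # likelihood ratio at the criterion
    return d_prime, beta


def rates_from(d_prime, criterion):
    """Hypothetical helper: simulate an observer with a given sensitivity
    and criterion placement (0 means unbiased)."""
    hit = _STD.cdf(d_prime / 2 - criterion)
    false_alarm = _STD.cdf(-d_prime / 2 - criterion)
    return hit, false_alarm


# Two simulated observers with the same sensitivity but different criteria.
neutral = sdt_measures(*rates_from(1.5, 0.0))
cautious = sdt_measures(*rates_from(1.5, 0.5))
print(neutral, cautious)
```

In this sketch the cautious observer reports fewer hits and fewer false alarms than the neutral one, yet both recover the same d’; only β moves. That is the sense in which a shift in β is read as a change in the decision stage rather than in perception.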

It is important to note that we usually determine detection of a signal (sensitivity) through a response (criterion measure), so the attempt to locate where cognition enters fails here as well. However, one way to compensate for this is through measurement of event-related potentials, which allows for measurement of stimulus evaluation uncontaminated by the response decision-making process. Pylyshyn also finds fault with this method because it encompasses everything except the response selection process, including memory retrieval for recognition, decisions, and inferences. He argues that “we need to make further distinctions within the stimulus evaluation stage so as to separate functions such as categorization and identification, which require accessing memory and making judgments, from functions that do not” (p. 351).

He concludes this section by further questioning the relationship between the stages of perception and sensitivity, which seems to be rather inconclusive. It is obvious that we need to determine a sort of mechanism that can lead to specificity in our sensitivity. Therefore, we need some sort of filtering to formulate the hypothesis generation stage of visual perception.

Constraints and Attention

Pylyshyn discusses several examples in which vision “appears on the surface to be remarkably like cases of inference” (p. 354). In these cases the visual system appears to “choose” one interpretation over other possible ones, and the choice appears remarkably “rational” (p. 354). However, Pylyshyn insists the examples do not actually constitute cognitive penetration for two reasons. First, he argues against cognitive penetration of natural constraints, which “are typically stated as if they were assumptions about the physical world” (p. 354). Natural constraints fail to demonstrate inference because the visual system evolved to work as it does, and principles of the visual system are internal to the system. They are neither sensitive to beliefs and knowledge, nor accessible to the cognitive network (p. 355). Next, Pylyshyn argues that cases of so-called perceptual “intelligence” and “problem-solving” also fail, and for the same reasons as natural constraints (also p. 355). Specifically, the visual system often fails a simple test of rationality when judged against certain basic facts about the world known to every observer.

How Knowledge Affects Perception

Pylyshyn discusses apparent counterexamples to the discontinuity theory. While he acknowledges the value of these as having an impact on response time, he asserts that the improved response is the result of knowing where to look or what to look for, limiting cognition to the pre-perceptual and post-perceptual stages. Hints for finding meaningful images do not actually help and do not affect the content that is seen (which is required for cognitive penetration). “Expert perceivers” (p. 358) appear to have knowledge that increases their ability to perceive certain patterns, and to do so with increased speed, but this seems to result solely from learned classification of visual patterns to enhance recognition and identification. He argues that this is part of the post-perceptual process. Additionally, although findings show that what people see is altered through experience, he argues that cognition plays a role only in pre-early vision by indexing spatially relevant locations. This development of focal attention is an important mechanism by which vision is malleable to the transient external world, and represents the main interface between vision and cognition. Further, he allows that cognition can influence post-perception decision making based on knowledge and experience (although with practice, this can become automated and cognitively impenetrable).

Output of Visual Systems

Pylyshyn makes the case for the visual system being a single system with two outputs, neither of which is knowledge-dependent. 

He defines “early vision” from a functional perspective, as the “attentionally modulated activity of the eyes” (p. 361). In this way, he’s acknowledging that early vision happens after attentional gating, and depends on inputs from other modalities, including non-retinal spatial information. Pylyshyn presents research showing that the output of the visual system is organized in categories, such as shape-classes. Pylyshyn outlines that we form a 3D representation of surfaces (independent of knowledge), encode the layout of a scene (again, without knowledge or reasoning), and perceive a set of surfaces in depth. However, he stresses that “computing what the stimulus before you looks like…does not itself depend upon knowledge” (p. 361). Identifying a shape is not the same as recognizing what it actually is. For that, we need to draw upon other knowledge in memory and perform top-down processing, which can form a bridge between seeing the shape and knowing what it represents.

A second type of visual output is that which affects motor actions. These, too, Pylyshyn argues, happen separately from cognition, and he uses examples from clinical neurology to illustrate the “fractionation of the output of vision” (p.362) that allows patients to act as if they can see even when they don’t “think” that they can.

Questions

  • If vision is not continuous with cognition, does that suggest that perception cannot constitute basic beliefs, thus undermining Foundations Theory?
  • Is Pylyshyn’s hypothesis consistent with Traditional Epistemology or Natural Epistemology?
  • Given the information put forth in this article, as scientists, do you think that we can accurately “observe” any visual data?
  • Do you believe that there are two separate forms of visual output? If so, where do you think the motor-function output is derived from?

9 thoughts on ““Is Vision Continuous with Cognition?” by Zenon Pylyshyn – Emily Goins & Porter Knight”

  1. I find Pylyshyn’s argument about the cognitive impenetrability of early vision convincing. While some may object that pre-visual attention gating or post-visual response selection, when they can unconsciously change our perception of a stimulus, in effect constitute a form of cognitive penetrability, I disagree. Pylyshyn’s point, to me, appeared to be that early vision is a cognitively impenetrable process in the sense that our beliefs or expectations cannot affect the actual sensory impression – the strict translation of retinal stimulation to phenomenological impressions – produced by that process. However, pre-visual attention and post-visual selection can still affect the conscious perception (classification) of the object or scene. Knowing where to focus could, for example, determine whether someone concluded a chick was male or female, but even expecting a chicken to be female would not cause you to be unable to see a telltale bump (or however they evaluate these things) if you were looking at the right spot. Being told to look at a certain characteristic of a figure might influence which response you gave as to the identity of an ambiguous figure, but it would not change the shape or size of that characteristic as you saw it. While I remain unconvinced by foundationalism in general, I feel that the cognitive impenetrability of vision is more coherent with foundationalism than a contradiction of it. If our most basic sensory impressions, our appearance beliefs, were cognitively penetrable, then they couldn’t be basic beliefs as they would be predicated on the other beliefs that were cognitively penetrating the vision process. Finally, I don’t think that the presence of certain “top-down” processes within early vision negates their cognitive impenetrability. These “top-down” processes, as asserted by Pylyshyn, are simply clearly defined rules of visual processing within early vision. 
A sensory impression being partly defined by its relation to other concurrent sensory impressions in the same field of vision is not penetration by outside cognitions the way it would be if sensory impressions (by which I mean the end-product of early vision, before post-visual processing) were defined in any way by beliefs or expectations external to the early vision system.

  2. As I was reading the introduction of the article, I formulated the same opinion about early vision that Pylyshyn presents later on. I find this separation of early vision and higher cognitive processes appealing, because it is supported by the neuroscientific evidence that the system of information processing in the occipital lobe is separate from the more rudimentary structures that allow information to reach this structure (the eye, retina, rods, cones, etc.). Where I got confused, however, was where Pylyshyn began differentiating between top-down and bottom-up information processing, and how it fit in with the early vision hypothesis. If vision is a result of a system that flows linearly from stimulus, to perception of that stimulus, to interpretation of that stimulus, then where is the “top down” approach applicable? Does the problem of optical illusions not occur in the “interpretation of the stimulus” phase? I think it’s extremely difficult to think of these stages as separate events, but this is maybe because they take place in such quick succession that it is difficult for us to mentally separate interpretation from pure perception.

    Additionally, on p. 453, Pylyshyn discusses the difficulties in seeing early vision as a phenomenon separate from cognition, because of the “filtering out” that takes place when we look at different objects or scenes. My question is how can we know whether this takes place before or after visual cues have reached more complex brain structures? Would we be able to use new technologies that are able to register visual stimuli from the eye to prove definitively that the actual physical information that the brain receives is “untainted” by cognitive bias?

  3. I would like to add to the above discussion on Pylyshyn’s assertion that early vision is cognitively impenetrable and arises from top-down processing. I disagree with his initial argument that the visual system functions differently than the principles of inference:

    According to Pollock & Cruz, when one infers, one considers how perception of local objects or events justifies a generalization of the objects or events. The authors provide the example of seeing an object that is red and attributing the particular state of that object to define what is “being red” (56). These authors acknowledge that there is some arbitrary amount of evidence, some of which is perceptual, required for one to be confident in their inference.

    Pylyshyn proposes two differences between the principles of inference and those of perception: that “perceptual principles…are responsive only to visually presented information” and that “the principles of visual perception are different…[because] they do not appear to conform to what might be thought of as tenets of ‘rationality’” (344). I believe Pylyshyn contradicts both of these arguments in his paper. He describes how individuals who view an ambiguous object maintain certain features of the object when it is reversed due to “perceptual coupling” (344). Does not this support the idea that vision is based in expectations of an object’s appearance based on previous perceptions of the object?

    Secondly, Pylyshyn references “amodal completion” to support his second argument for the difference between the principles of vision and those of inference. “Amodal completion” describes how individuals view line fragments connected to solid figures as being complete objects obstructed by the solid figures (344-345). Does not this example suggest that individuals use inferences from previous perceptual experiences of occluded objects to conclude that the present object is blocked by objects in the foreground?

    Similarly, Pylyshyn writes that his paper is focused on the top-down influences on vision, or influences that affect “the content of visual perception…in a certain meaning-dependent way that we call cognitive penetration” (343). By referencing “meaning-dependent” scenarios, Pylyshyn acknowledges that expectations are involved in our visual perceptions. As expectations combined with sensory input lead to our visual interpretation, is not this process synonymous with the principles of inference?

    Ultimately, my objections to Pylyshyn’s arguments might reflect our different interpretations of rationalization. From my perspective, rationalizing is not limited to conscious thinking. It is inevitable that our unconscious is involved in our perceptions and conclusions about our environment.

  4. One of Pylyshyn’s claims is “the early vision system carries out complex computations… many of these computations involve what is called top-down processing” (343). He continues that top-down processing involves Gestalt psychology, in which elements in the environment are grouped or organized to create a rational view of our environment. From my understanding, top-down processing is the influence of knowledge on the perceptual process, which can also include attentional visual selection. Pylyshyn proposes that early vision in particular is cognitively impenetrable, and attentional selection is only occurring before and after early vision, so what visual processes are defined as part of early vision? How and why is early vision isolated from what Pylyshyn defines as pre-early vision, of which attentional selection is a part?

    To expand upon the first proposed question of “If vision is not continuous with cognition, does that suggest that perception cannot constitute basic beliefs, thus undermining Foundations Theory?” I’d like to ask, does Pylyshyn’s argument that early vision is reliably separated from cognition support evidence of physical-object beliefs and/or appearance beliefs or not?

  5. As was mentioned above, Pylyshyn states that early vision is “cognitively impenetrable” (344), and that cognitive penetration occurs at two stages of perception: both prior to, and after, the operation of early vision (344). He elaborates on this at several points throughout the reading, giving the example of focused attention on page 353, saying that if our attention is focused on the point in space in which the stimulus will appear, that “it can be used to direct attention… to that location, thus increasing the signal-to-noise ratio for signals falling in that region” (353); d’ can be altered depending on the attention of the subject.

    Could one not argue, then, that the influence of knowledge during the pre-perceptual stage of the initial response to a stimulus does in fact demonstrate an example of the cognitive penetrability of early vision? Where is the distinction between the idea that knowledge may have an influence on early vision before it takes place, and the (rejected) hypothesis of the cognitive penetrability of early vision?

    In contrast to this, however, is the example of optical illusions, which show that often we cannot change how we perceive something even if we know that, rationally, what we are “seeing” cannot possibly be correct.

    Is it possible, then, that early vision is too broad a term? Perhaps there is more than one type of early vision, depending on what is being seen and the viewpoint of the viewer, and that knowledge plays a different role in each, meaning that some forms of early vision are cognitively penetrable and others are not.

    I also had trouble with the concept of “event-related potentials (ERPs)” (351). If “both the amplitude and the latency of the P300 measure vary with certain cognitive states evoked by the stimulus” (351), then why are they not included as a measurement of reaction time? Pylyshyn states that the P300 measure is affected by different variables than is reaction time, but what is the difference between latency of a response and reaction time to a stimulus?

  6. While this article focused on vision as a source of perception, I couldn’t help but think about the topics discussed in terms of our other senses. The quote, “perceptual principles…are responsive only to visually presented information” (p. 344), confused me because we have other vehicles of perception, such as taste, sound, touch, and smell. How are those incorporated, and would they follow the same guidelines and conclusions as visual perception?

    Pylyshyn provides arguments both for and against the continuity thesis. However, it seems as if he concluded that early vision is cognitively impenetrable. I interpreted this as meaning no outside influence can affect your early vision. But what happens if you have a brain lesion on your visual processing center? The lesion will obviously affect what you’re seeing, therefore “penetrating” the early vision… This example makes it seem like early vision is in fact penetrable.

  7. Pylyshyn’s assertion that there are two forms of visual output, “early vision” and “motor-function,” parallels what we know about the visual system neurologically. When a visual stimulus is sensed, the visual pathway takes it through a visual reaction center (the superior colliculus), which responds reflexively—your motor-function. This pathway, however, does not seem to justify vision as continuous with cognition, as only reflexive, survival-based outputs are performed. If a Frisbee catches your brain’s attention from the corner of your eye and you jump out of the way before it hits you, higher-level processing of the visual input has not taken place. Therefore, I’m uncertain that this form of visual output can form basic beliefs. Taking the Frisbee example, by the time your brain has responded to get you out of the way, you have yet to make the visual connection that it is a Frisbee, not a bird or a pillow or a rocket flying towards you; the higher-level processes of understanding what the object is and decision making centers—whether to get out of the way, catch the object, or let it hit you—have not been consulted before survival mechanisms kick in. Thus, a basic belief in this situation cannot be: “I saw a Frisbee flying towards me, so I decided to jump out of the way.” Instead, I wonder if a basic belief can be stripped down to the very vague and basic: “There was a dangerous situation from which I removed myself reflexively.” Can reflex actions—“motor-function”—be a source of basic beliefs?

    On the other hand, Pylyshyn recognizes top-down processing of visual stimuli. Maybe you and your friend are playing Frisbee together; you recognize your friend is throwing something to you, that something is a Frisbee, and that you should decide to catch it and throw it back. All of these involve integration, recognition, and decision-making centers throughout the forebrain. In this scenario, I would consider vision capable of contributing to basic beliefs.

    If one form of visual output does not constitute basic beliefs, can vision as a whole constitute basic beliefs? Is it possible to argue that reflex actions are performed based on very rudimentary basic beliefs?

  8. Pylyshyn, in his discussion of the categories of outputs of Visual Systems, distinguishes between “what we see- the content of our phenomenological experience” and “the output of the visual system itself” (p. 362). He also claims that “we form a 3D representation of surfaces, encode the layout of a scene, and perceive a set of surfaces in depth” (as described in “Output of Visual Systems” above).

    Does this presuppose a Foundational approach in visual perceptions?

    If these “3D representations of surfaces” are generated by all Visual Systems (independent of knowledge and reasoning), then to what extent can these “representations” serve as “basic beliefs” regarding visual perception?

  9. On page 344, Pylyshyn describes how the “early vision system is cognitively impenetrable” while “vision as a whole is cognitively penetrable” (Pylyshyn 344). Pylyshyn then shares that “early vision is defined functionally and the neuroanatomic locus of early vision is not known with any precision” (Pylyshyn 344). How can we know that the early vision is cognitively impenetrable if there is not even a clear definition for early vision?

    On page 345, Pylyshyn discusses the perspective from artificial intelligence regarding early approaches to computer vision. He classifies this as a primarily “bottom-up” approach. Based on how a computer functions, is it possible to have a “top-down” approach in computer vision? It seems as if the input has to be at a lower level before a larger perspective can be seen.

    Throughout the reading, there are valid opinions in favor of “top-down” processing and “bottom-up” processing. Why does it have to be one or the other? Everyone could be different and have different response strategies.

    Pylyshyn discusses “expert perceivers” on page 358. Can anyone be an “expert perceiver” if they invest enough time? Do you think IQ has any impact on how quickly you perceive?
