How does seeing become knowing?
- Entering high-level visual processing — Chapter 24 of Principles of Neural Science
- We already know what V1 & V2 do (intermediate-level processing) — now we move beyond, into the regions that build actual perception and meaning
- The ventral stream's high-level pipeline: V4 → Posterior IT → Anterior IT → Amygdala / Hippocampus — each stage adds a new layer of understanding
The High-Level Visual Pipeline
Four stages transform raw visual features into cognition. Each stage builds on the last — from shape to object to meaning to memory.
- V4: invariance begins
- Posterior IT: complete object representation (perception)
- Anterior IT: percept connected to meaning
- Amygdala / Hippocampus: cognition (emotion & memory)
V4 — Shape, Colour Constancy & the Beginning of Invariance
V4 is the bridge between intermediate features and object perception. It does three critical things.
- 1) Colour constancy — computes the actual colour of a surface by factoring out illumination across the whole scene. A red apple looks red in sunlight and under fluorescent light because V4 compares colour relative to the surroundings, not in isolation
- 2) Shape & curve extraction — understands geometric properties of contours, curves, and angles. Not just detecting edges (that's V1/V2) but understanding the shape they form
- 3) Transformation invariance begins — small changes in position, rotation, and size are tolerated. The same shape shifted slightly is still recognised as the same shape
- V4 also receives heavy top-down signals from IT — higher regions feed predictions back down, telling V4 what to expect
- Damage to V4 produces cerebral achromatopsia: loss of colour perception while everything else (edges, motion, shape) survives — strong evidence that V4 is a dedicated colour-computation stage
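The "colour relative to the surroundings" idea can be illustrated with the grey-world heuristic from computational vision: estimate the illuminant as the scene-average colour and divide it out, so a global tint cancels. This is a toy stand-in for the kind of computation V4 performs, not a model of V4 itself; all numbers below are made up.

```python
import numpy as np

def grey_world(scene_rgb):
    """Estimate the illuminant as the scene-average colour and divide it out.

    scene_rgb: (H, W, 3) array of linear RGB values.
    Returns illumination-corrected RGB (grey-world heuristic).
    """
    illum = scene_rgb.reshape(-1, 3).mean(axis=0)  # global average per channel
    return scene_rgb / illum

# The same red patch seen under white light and under a yellow-tinted illuminant.
white_light = np.full((4, 4, 3), 0.5)
white_light[0, 0] = [0.9, 0.2, 0.2]           # "red apple" patch
yellow_light = white_light * [1.0, 1.0, 0.4]  # same scene, yellow illuminant

a = grey_world(white_light)[0, 0]
b = grey_world(yellow_light)[0, 0]
print(np.allclose(a, b))  # True: corrected patch colour is the same under both lights
```

Because the illuminant scales every pixel of a channel by the same factor, dividing by the scene average cancels it exactly — the apple "stays red".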
Posterior IT — Complete Object Representation (Perception)
This is at the end of the V1 → V2 → V4 pipeline. Here, the brain builds complete object percepts — not features, but things.
- Posterior IT neurons respond to complex things — not just colour or orientation, but a crescent with a particular texture, a hand with specific fingers extended
- Some IT neurons are specialised for understanding faces — dedicated face-selective regions exist here
- V4 begins invariance, but in IT the invariance is full — whether the face is large or small, near or far, it doesn't matter. The same neuron fires regardless
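Position tolerance of this kind can be sketched with a classic trick from artificial vision: match a template at every location, then keep only the best match (a global max pool). The response then signals *that* a shape is present, not *where*. A hypothetical toy, not a model of IT neurons:

```python
import numpy as np

def invariant_response(image, template):
    """Slide a template over the image and keep only the best match.

    Taking the max over all positions makes the response
    translation-invariant: shifting the pattern does not change it.
    """
    th, tw = template.shape
    H, W = image.shape
    best = -np.inf
    for i in range(H - th + 1):
        for j in range(W - tw + 1):
            best = max(best, float((image[i:i+th, j:j+tw] * template).sum()))
    return best

shape = np.array([[1, 0], [0, 1]])  # a tiny "shape" template

img1 = np.zeros((6, 6))
img1[1:3, 1:3] = shape              # shape near the top-left
img2 = np.zeros((6, 6))
img2[3:5, 2:4] = shape              # same shape, shifted

print(invariant_response(img1, shape) == invariant_response(img2, shape))  # True
```

Real IT invariance also spans size and viewpoint; the max-over-positions idea is just the simplest member of that family (it is the same mechanism pooling layers use in convolutional networks).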
IT Cortical Columns — The Architecture of Object Knowledge
Push an electrode straight down through the thickness of IT cortex — every neuron you hit responds to similar stimuli. That's a column: ~400 μm wide, running the full depth of cortex.
- This columnar organisation resembles V1, where every neuron in an orientation column prefers the same angle (e.g. 45°). But there's a critical difference: IT columns overlap with one another
- There's no hard border between columns — unlike V1, where 45° columns and 90° columns are clearly delineated, IT columns blend into each other
- This overlap is not a bug — it's a feature. Columns "share" different aspects of objects so that the cortex, with limited neurons, can understand ~30,000 different objects. They share knowledge
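The capacity argument behind "sharing" can be made concrete: if each object is encoded by a small combination of feature columns instead of one dedicated column per object, the number of representable objects grows combinatorially. A back-of-the-envelope sketch (the column counts are illustrative assumptions, not anatomy):

```python
from math import comb

n_columns = 600  # hypothetical pool of feature columns
k_active = 4     # columns jointly encoding one object

# One dedicated column per object caps you at n_columns objects.
dedicated_capacity = n_columns

# Sharing columns across objects: any k-subset can stand for an object.
shared_capacity = comb(n_columns, k_active)

print(dedicated_capacity)            # 600
print(shared_capacity >= 30_000)     # True: easily covers ~30,000 objects
```

Even a modest pool of shared columns vastly exceeds the ~30,000-object figure, which is why overlap is a feature rather than a bug.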
Population Coding — How IT Actually Recognises Objects
The same principle as distributed representations in artificial neural networks: the code is graded and probabilistic, not binary.
- No single neuron says "this is a face" — instead, a population of neurons each fire at different rates, and the pattern across the population encodes the object
- Vector similarity happens here — the firing pattern for a cat is more similar to a dog than to a chair. Objects that share features have overlapping neural representations
- This is why ambiguous or partially occluded objects can still be recognised — the population code is robust to noise because it's a distributed representation
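The three points above — distributed patterns, vector similarity, robustness to noise — can be sketched with toy firing-rate vectors (all numbers invented): decode an input by comparing its population pattern to stored patterns with cosine similarity, and note that a noisy, "partially occluded" pattern still decodes correctly.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(u, v):
    """Cosine similarity between two firing-rate vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical firing rates of 8 IT neurons for three objects.
# Cat and dog share features, so their patterns overlap; chair does not.
patterns = {
    "cat":   np.array([9., 8., 7., 1., 0., 1., 6., 5.]),
    "dog":   np.array([8., 7., 6., 2., 1., 0., 5., 6.]),
    "chair": np.array([0., 1., 0., 9., 8., 7., 1., 0.]),
}

# Vector similarity: cat is closer to dog than to chair.
print(cosine(patterns["cat"], patterns["dog"]) >
      cosine(patterns["cat"], patterns["chair"]))  # True

# A noisy ("partially occluded") cat pattern is still decoded as cat.
noisy_cat = patterns["cat"] + rng.normal(0, 1.0, 8)
decoded = max(patterns, key=lambda k: cosine(noisy_cat, patterns[k]))
print(decoded)
```

No single neuron carries the answer: decoding reads the whole pattern, which is why perturbing a few entries leaves the result intact.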
Horizontal Connections in IT — Distributed Recognition
- There are long-range horizontal connections across IT cortex — recognition isn't a single-column operation
- Recognising a face isn't a column thing — it's distributed. You have to combine a lot of geometrical features (eyes, nose, mouth, spacing) which are encoded across many columns
- This is why face recognition is so robust — the distributed representation means damage to any single column doesn't destroy the ability entirely
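Graceful degradation under damage can be sketched the same way: silence one group of units (a hypothetical "lesioned column") and check that the remaining distributed pattern still matches the right object. Toy numbers throughout:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two firing-rate vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical face code spread over 12 units grouped into 4 "columns" of 3.
face  = np.array([5., 6., 4., 7., 2., 3., 6., 5., 4., 3., 7., 6.])
other = np.array([1., 0., 2., 1., 8., 7., 0., 1., 2., 8., 0., 1.])

lesioned = face.copy()
lesioned[3:6] = 0.0  # silence one "column" (units 3-5)

# Recognition survives: the lesioned pattern still matches "face" best.
print(cosine(lesioned, face) > cosine(lesioned, other))  # True
```

Because the representation is spread across many columns, knocking out any one of them lowers the match but does not abolish it.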
Damage to Posterior IT — Apperceptive Agnosia
When posterior IT is damaged, perception itself fails.
- Patients can see edges, colours, and motion (all from V1 & V2) — low-level vision is intact
- But they can't assemble these features into a coherent object — they cannot even copy a simple drawing, because perception never builds the object representation
- They see the parts but not the whole — the binding of features into objects is broken
Anterior IT — Where Percepts Connect to Meaning
Posterior IT handles perception ("I see a face"), but anterior IT relates it to memory ("this face belongs to my mother").
- This is the transition from seeing to knowing — the percept gets connected to stored semantic knowledge
- Damage to anterior IT causes associative agnosia: the patient can copy a drawing of an object accurately, but can't say what it is called or what it's used for
- The representation is intact (they can perceive and copy) — but the link to meaning is severed
Prosopagnosia — The Face Case
A specific agnosia just for faces — patients can recognise that something is a face, see its expression, describe it in detail, but can't assign it to a person.
- Face recognition is critical from an evolutionary standpoint — it tells us who is friend, enemy, or threat. It conveys emotions
- Dedicated face-selective patches of cortex contain many neurons tuned to subtle differences in eyes, noses, and cheeks — a hyperspecialised system for human face recognition
- Prosopagnosia proves that face recognition uses dedicated neural machinery separate from general object recognition
Category-Specific Agnosias
Different regions of IT understand different categories of objects — and they can be selectively damaged.
- Some patients lose the ability to recognise living things but can still recognise tools. Others lose vegetables/fruits specifically
- This isn't random — it reflects systematic organisation. Living things share common traits (eyes, limbs, organic texture), so the brain groups them together
- Non-living things (tools, vehicles) share different traits (rigid geometry, manufactured surfaces) and are processed by different regions
- This category-based organisation means the brain can efficiently reuse feature detectors within a category
Amygdala & Hippocampus — Vision Becomes Cognition
The final stage of the ventral stream — where visual perception connects to emotion and memory.
- After anterior IT, the signal reaches the amygdala and hippocampus — vision is no longer just about seeing, it's about knowing and feeling
- The amygdala assigns emotional significance: is this face threatening? Is this object desirable?
- The hippocampus stores the context: when did I see this, where was I, who was I with?
- This is where the ventral stream's journey ends — from photons hitting the retina to a fully contextualised, emotionally tagged memory