Day 6 — Fri, Mar 27

How does seeing become knowing?

  • Entering high-level visual processing — Chapter 24 of Principles of Neural Science
  • We already know what V1 & V2 do (intermediate-level processing) — now we move beyond, into the regions that build actual perception and meaning
  • The ventral stream's high-level pipeline: V4 → Posterior IT → Anterior IT → Amygdala / Hippocampus — each stage adds a new layer of understanding

The High-Level Visual Pipeline

Four stages transform raw visual features into cognition. Each stage builds on the last — from shape to object to meaning to memory.

[Pipeline diagram: V4 (shape, colour constancy; invariance begins) → Posterior IT (complete object representation: perception) → Anterior IT (percepts connect to meaning) → Amygdala/Hippocampus (vision becomes cognition); features → meaning]

V4 — Shape, Colour Constancy & the Beginning of Invariance

V4 is the bridge between intermediate features and object perception. It does three critical things.

  • 1) Colour constancy — computes the actual colour of a surface by factoring out illumination across the whole scene. A red apple looks red in sunlight and under fluorescent light because V4 compares colour relative to the surroundings, not in isolation
  • 2) Shape & curve extraction — understands geometric properties of contours, curves, and angles. Not just detecting edges (that's V1/V2) but understanding the shape they form
  • 3) Transformation invariance begins — small changes in position, rotation, and size are tolerated. The same shape shifted slightly is still recognised as the same shape
  • V4 also receives heavy top-down signals from IT — higher regions feed predictions back down, telling V4 what to expect
  • Damage to V4 produces achromatopsia: total loss of colour perception while everything else (edges, motion, shape) survives — proving V4 is the dedicated colour computation centre
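
The "colour relative to the surroundings" idea can be sketched with the classic grey-world algorithm. This is a toy analogy for what V4 achieves, not its actual circuit; the scene, tints, and function name are invented for illustration:

```python
import numpy as np

# Grey-world colour constancy: assume the scene-average reflectance
# is grey, so any tint in the per-channel means is attributed to the
# illuminant and divided out. (A toy analogy for "colour relative to
# the surroundings", not V4's actual computation.)
def grey_world(image):
    means = image.reshape(-1, 3).mean(axis=0)     # per-channel average
    return image / (means / means.mean())         # factor out the tint

# An invented scene: one red patch among neutral grey surroundings.
reflectance = np.array([[0.8, 0.2, 0.2],   # the red patch
                        [0.5, 0.5, 0.5],
                        [0.5, 0.5, 0.5],
                        [0.5, 0.5, 0.5]])

warm = reflectance * np.array([1.2, 1.0, 0.8])   # sunlight-like tint
cool = reflectance * np.array([0.8, 1.0, 1.2])   # fluorescent-like tint

# The raw patch colours differ; the corrected ones nearly agree.
print(warm[0], cool[0])
print(grey_world(warm)[0], grey_world(cool)[0])
```

The point of the sketch: the patch's raw pixel values change with the light, but dividing by the scene-wide average cancels most of the tint, so the "red apple" stays red under both illuminants.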

Posterior IT — Complete Object Representation (Perception)

This is at the end of the V1 → V2 → V4 pipeline. Here, the brain builds complete object percepts — not features, but things.

  • Posterior IT neurons respond to complex things — not just colour or orientation, but a crescent with a particular texture, a hand with specific fingers extended
  • Some IT neurons are specialised for understanding faces — dedicated face-selective regions exist here
  • V4 begins invariance, but in IT the invariance is full — whether the face is large or small, near or far, it doesn't matter. The same neuron fires regardless
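
One common way the invariance above is modelled in artificial networks (which these notes compare IT to later) is pooling over position-shifted copies of one detector. A minimal sketch with an invented 1-D edge detector:

```python
import numpy as np

# Toy model of tolerance to position shifts: a pooling unit takes
# the max over position-shifted copies of one shape detector, so a
# small translation of the input leaves its output unchanged.
# (An artificial-network analogy, not IT's actual circuit.)
template = np.array([1.0, -1.0])   # a tiny invented edge detector

def pooled_response(signal):
    # Detector response at every position, then max-pool over positions.
    responses = [signal[i:i + 2] @ template for i in range(len(signal) - 1)]
    return max(responses)

edge_here  = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
edge_there = np.array([0.0, 0.0, 0.0, 1.0, 0.0])  # same edge, shifted

print(pooled_response(edge_here), pooled_response(edge_there))  # 1.0 1.0
```

The single detector's response changes when the edge moves; the pooled unit's does not. Stacking such stages is the standard story for how position, size, and rotation tolerance can grow along a pipeline.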

IT Cortical Columns — The Architecture of Object Knowledge

Push an electrode straight down through the thickness of IT cortex — every neuron you hit responds to similar stimuli. That's a column: ~400 μm wide, running the full depth of cortex.

[Diagram: V1 columns (45°, 90°, 135°) have hard borders between them; IT columns (face, hand, body) overlap and share knowledge. ~400 μm wide vertical stack · full depth of cortex · ~30,000 objects recognised]
  • This columnar organisation is similar to V1 (where horizontal connections link all 45° neurons). But there's a critical difference: IT columns overlap with each other
  • There's no distinct hard border between columns — unlike V1 where 45° neurons and 90° neurons have clear boundaries, IT columns blend into each other
  • This overlap is not a bug — it's a feature. Columns "share" different aspects of objects so that the cortex, with limited neurons, can understand ~30,000 different objects. They share knowledge
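
The efficiency argument for shared, overlapping columns can be made concrete with a back-of-the-envelope count. The numbers (50 columns, 4 active per object) are arbitrary illustrations, not anatomy:

```python
from math import comb

# Toy count of representational capacity. The numbers (50 columns,
# 4 active per object) are arbitrary illustrations, not anatomy.
n_columns = 50

# If every object had its own private column with hard borders
# (V1-style), n columns could code for only n objects.
private_capacity = n_columns

# If an object is instead a combination of active, overlapping
# columns, capacity grows combinatorially.
shared_capacity = comb(n_columns, 4)

print(private_capacity, shared_capacity)  # 50 vs 230300
```

Even this tiny toy count clears the ~30,000-object figure by a wide margin, which is the sense in which sharing lets a limited pool of neurons cover a huge object vocabulary.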

Population Coding — How IT Actually Recognises Objects

Same principle as artificial neural networks — this is all probabilistic, not binary.

[Chart: each bar is one neuron's firing rate; the cat and dog patterns are similar, the chair pattern differs. Similar objects have similar firing patterns: vector similarity in the brain]
  • No single neuron says "this is a face" — instead, a population of neurons each fire at different rates, and the pattern across the population encodes the object
  • Vector similarity happens here — the firing pattern for a cat is more similar to a dog than to a chair. Objects that share features have overlapping neural representations
  • This is why ambiguous or partially occluded objects can still be recognised — the population code is robust to noise because it's a distributed representation
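
The three bullets above can be sketched numerically: hypothetical firing-rate vectors, cosine similarity as the "vector similarity", and a perturbed pattern that still decodes correctly. All rates are made up for illustration:

```python
import numpy as np

# Hypothetical firing-rate vectors (spikes/s) for four feature-tuned
# neurons. The rates are invented: cats and dogs share features, so
# their patterns overlap; a chair's pattern does not.
cat   = np.array([40.0, 35.0, 10.0,  5.0])
dog   = np.array([35.0, 30.0, 15.0,  5.0])
chair = np.array([ 5.0,  5.0, 40.0, 45.0])

def cosine(a, b):
    # Vector similarity between two population firing patterns.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(cat, dog), cosine(cat, chair))   # cat~dog is far higher

# Robustness to noise: perturb the cat pattern, then decode it by
# finding the most similar stored pattern. The distributed code
# still reads as "cat".
noisy = cat + np.array([5.0, 4.0, -3.0, -2.0])
sims = {name: cosine(noisy, v)
        for name, v in {"cat": cat, "dog": dog, "chair": chair}.items()}
print(max(sims, key=sims.get))
```

No single entry in the vector decides anything; only the whole pattern does, which is exactly why corrupting a few entries barely moves the decoded answer.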

Horizontal Connections in IT — Distributed Recognition

  • There are long-range horizontal connections across IT cortex — recognition isn't a single-column operation
  • Recognising a face isn't a column thing — it's distributed. You have to combine a lot of geometrical features (eyes, nose, mouth, spacing) which are encoded across many columns
  • This is why face recognition is so robust — the distributed representation means damage to any single column doesn't destroy the ability entirely

Damage to Posterior IT — Apperceptive Agnosia

When posterior IT is damaged, perception itself fails.

  • Apperceptive agnosia (posterior IT damaged): ✓ edges, colours, motion · ✓ V1 & V2 intact · ✗ can't form objects · ✗ can't draw from memory
  • Associative agnosia (anterior IT damaged): ✓ can see & draw objects · ✓ perception intact · ✗ can't name objects · ✗ can't say what it's for
  • Prosopagnosia (face regions damaged): ✓ sees faces · ✓ reads expressions · ✗ can't assign identity · ✗ "whose face is this?"
  • Patients can see edges, colours, and motion (all from V1 & V2) — low-level vision is intact
  • But they can't assemble these features into a coherent object — they can't draw anything from memory because their perception fails to create the representation
  • They see the parts but not the whole — the binding of features into objects is broken

Anterior IT — Where Percepts Connect to Meaning

Posterior IT handles perception ("I see a face"), but anterior IT relates it to memory ("this face belongs to my mother").

  • This is the transition from seeing to knowing — the percept gets connected to stored semantic knowledge
  • Damage to anterior IT causes associative agnosia: the patient can draw an object from memory perfectly, but can't tell you what it's called or what it's used for
  • The representation is intact (they can perceive and copy) — but the link to meaning is severed

Prosopagnosia — The Face Case

A specific agnosia just for faces — patients can recognise that something is a face, see its expression, describe it in detail, but can't assign it to a person.

  • Face recognition is critical from an evolutionary standpoint — it tells us who is friend, enemy, or threat. It conveys emotions
  • There are dedicated cortical patches, packed with neurons tuned to patterns and subtle changes in eyes, noses, and cheeks: human face recognition gets its own hyperfocused machinery
  • Prosopagnosia proves that face recognition uses dedicated neural machinery separate from general object recognition

Category-Specific Agnosias

Different regions of IT understand different categories of objects — and they can be selectively damaged.

  • Living things (animals, faces, plants, veggies/fruits): eyes, limbs, organic texture, bilateral symmetry
  • Non-living things (tools, vehicles, buildings, instruments): rigid geometry, manufactured surfaces, functional parts
  • Separate IT regions: damage to one category leaves the others intact
  • Some patients lose the ability to recognise living things but can still recognise tools. Others lose vegetables/fruits specifically
  • This isn't random — it's deliberate organisation. All living things share certain common traits (eyes, limbs, organic texture), so the brain groups them together
  • Non-living things (tools, vehicles) share different traits (rigid geometry, manufactured surfaces) and are processed by different regions
  • This category-based organisation means the brain can efficiently reuse feature detectors within a category

Amygdala & Hippocampus — Vision Becomes Cognition

The final stage of the ventral stream — where visual perception connects to emotion and memory.

  • After anterior IT, the signal reaches the amygdala and hippocampus — vision is no longer just about seeing, it's about knowing and feeling
  • The amygdala assigns emotional significance: is this face threatening? Is this object desirable?
  • The hippocampus stores the context: when did I see this, where was I, who was I with?
  • This is where the ventral stream's journey ends — from photons hitting the retina to a fully contextualised, emotionally tagged memory