How does seeing become knowing?
- Entering high-level visual processing — Chapter 24 of Principles of Neural Science
- We already know what V1 & V2 do (intermediate-level processing) — now we move beyond, into the regions that build actual perception and meaning
- The ventral stream's high-level pipeline: V4 → Posterior IT → Anterior IT → Amygdala / Hippocampus — each stage adds a new layer of understanding
The High-Level Visual Pipeline
Four stages transform raw visual features into cognition. Each stage builds on the last — from shape to object to meaning to memory.
- V4: invariance begins
- Posterior IT: complete object representation (perception)
- Anterior IT: percept connected to meaning
- Amygdala / Hippocampus: cognition (emotion & memory)
V4 — Shape, Colour Constancy & the Beginning of Invariance
V4 is the bridge between intermediate features and object perception. It does three critical things.
- 1) Colour constancy — computes the actual colour of a surface by factoring out illumination across the whole scene. A red apple looks red in sunlight and under fluorescent light because V4 compares colour relative to the surroundings, not in isolation
- 2) Shape & curve extraction — understands geometric properties of contours, curves, and angles. Not just detecting edges (that's V1/V2) but understanding the shape they form
- 3) Transformation invariance begins — small changes in position, rotation, and size are tolerated. The same shape shifted slightly is still recognised as the same shape
- V4 also receives heavy top-down signals from IT — higher regions feed predictions back down, telling V4 what to expect
- Damage to V4 produces cerebral achromatopsia: loss of colour perception while everything else (edges, motion, shape) survives — strong evidence that V4 is a dedicated colour-computation stage
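The "colour relative to the surroundings" idea can be illustrated with the grey-world heuristic from computational vision: estimate the illuminant as the scene-average colour and divide it out, so a global tint cancels. This is a toy stand-in for the kind of computation V4 performs, not a model of V4 itself; all numbers below are made up.

```python
import numpy as np

def grey_world(scene_rgb):
    """Estimate the illuminant as the scene-average colour and divide it out.

    scene_rgb: (H, W, 3) array of linear RGB values.
    Returns illumination-corrected RGB (grey-world heuristic).
    """
    illum = scene_rgb.reshape(-1, 3).mean(axis=0)  # global average per channel
    return scene_rgb / illum

# The same red patch seen under white light and under a yellow-tinted illuminant.
white_light = np.full((4, 4, 3), 0.5)
white_light[0, 0] = [0.9, 0.2, 0.2]           # "red apple" patch
yellow_light = white_light * [1.0, 1.0, 0.4]  # same scene, yellow illuminant

a = grey_world(white_light)[0, 0]
b = grey_world(yellow_light)[0, 0]
print(np.allclose(a, b))  # True: corrected patch colour is the same under both lights
```

Because the illuminant scales every pixel of a channel by the same factor, dividing by the scene average cancels it exactly — the apple "stays red".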
Posterior IT — Complete Object Representation (Perception)
This is at the end of the V1 → V2 → V4 pipeline. Here, the brain builds complete object percepts — not features, but things.
- Posterior IT neurons respond to complex things — not just colour or orientation, but a crescent with a particular texture, a hand with specific fingers extended
- Some IT neurons are specialised for understanding faces — dedicated face-selective regions exist here
- V4 begins invariance, but in IT the invariance is full — whether the face is large or small, near or far, it doesn't matter. The same neuron fires regardless
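Position tolerance of this kind can be sketched with a classic trick from artificial vision: match a template at every location, then keep only the best match (a global max pool). The response then signals *that* a shape is present, not *where*. A hypothetical toy, not a model of IT neurons:

```python
import numpy as np

def invariant_response(image, template):
    """Slide a template over the image and keep only the best match.

    Taking the max over all positions makes the response
    translation-invariant: shifting the pattern does not change it.
    """
    th, tw = template.shape
    H, W = image.shape
    best = -np.inf
    for i in range(H - th + 1):
        for j in range(W - tw + 1):
            best = max(best, float((image[i:i+th, j:j+tw] * template).sum()))
    return best

shape = np.array([[1, 0], [0, 1]])  # a tiny "shape" template

img1 = np.zeros((6, 6))
img1[1:3, 1:3] = shape              # shape near the top-left
img2 = np.zeros((6, 6))
img2[3:5, 2:4] = shape              # same shape, shifted

print(invariant_response(img1, shape) == invariant_response(img2, shape))  # True
```

Real IT invariance also spans size and viewpoint; the max-over-positions idea is just the simplest member of that family (it is the same mechanism pooling layers use in convolutional networks).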
IT Cortical Columns — The Architecture of Object Knowledge
Push an electrode straight down through the thickness of IT cortex — every neuron you hit responds to similar stimuli. That's a column: ~400 μm wide, running the full depth of cortex.
- This columnar organisation resembles V1, where every neuron in an orientation column prefers the same angle (e.g. 45°). But there's a critical difference: IT columns overlap with one another
- There's no hard border between columns — unlike V1, where 45° columns and 90° columns are clearly delineated, IT columns blend into each other
- This overlap is not a bug — it's a feature. Columns "share" different aspects of objects so that the cortex, with limited neurons, can understand ~30,000 different objects. They share knowledge
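The capacity argument behind "sharing" can be made concrete: if each object is encoded by a small combination of feature columns instead of one dedicated column per object, the number of representable objects grows combinatorially. A back-of-the-envelope sketch (the column counts are illustrative assumptions, not anatomy):

```python
from math import comb

n_columns = 600  # hypothetical pool of feature columns
k_active = 4     # columns jointly encoding one object

# One dedicated column per object caps you at n_columns objects.
dedicated_capacity = n_columns

# Sharing columns across objects: any k-subset can stand for an object.
shared_capacity = comb(n_columns, k_active)

print(dedicated_capacity)            # 600
print(shared_capacity >= 30_000)     # True: easily covers ~30,000 objects
```

Even a modest pool of shared columns vastly exceeds the ~30,000-object figure, which is why overlap is a feature rather than a bug.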
Population Coding — How IT Actually Recognises Objects
The same principle as distributed representations in artificial neural networks: the code is graded and probabilistic, not binary.
- No single neuron says "this is a face" — instead, a population of neurons each fire at different rates, and the pattern across the population encodes the object
- Vector similarity happens here — the firing pattern for a cat is more similar to a dog than to a chair. Objects that share features have overlapping neural representations
- This is why ambiguous or partially occluded objects can still be recognised — the population code is robust to noise because it's a distributed representation
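The three points above — distributed patterns, vector similarity, robustness to noise — can be sketched with toy firing-rate vectors (all numbers invented): decode an input by comparing its population pattern to stored patterns with cosine similarity, and note that a noisy, "partially occluded" pattern still decodes correctly.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(u, v):
    """Cosine similarity between two firing-rate vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical firing rates of 8 IT neurons for three objects.
# Cat and dog share features, so their patterns overlap; chair does not.
patterns = {
    "cat":   np.array([9., 8., 7., 1., 0., 1., 6., 5.]),
    "dog":   np.array([8., 7., 6., 2., 1., 0., 5., 6.]),
    "chair": np.array([0., 1., 0., 9., 8., 7., 1., 0.]),
}

# Vector similarity: cat is closer to dog than to chair.
print(cosine(patterns["cat"], patterns["dog"]) >
      cosine(patterns["cat"], patterns["chair"]))  # True

# A noisy ("partially occluded") cat pattern is still decoded as cat.
noisy_cat = patterns["cat"] + rng.normal(0, 1.0, 8)
decoded = max(patterns, key=lambda k: cosine(noisy_cat, patterns[k]))
print(decoded)
```

No single neuron carries the answer: decoding reads the whole pattern, which is why perturbing a few entries leaves the result intact.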
Horizontal Connections in IT — Distributed Recognition
- There are long-range horizontal connections across IT cortex — recognition isn't a single-column operation
- Recognising a face isn't a column thing — it's distributed. You have to combine a lot of geometrical features (eyes, nose, mouth, spacing) which are encoded across many columns
- This is why face recognition is so robust — the distributed representation means damage to any single column doesn't destroy the ability entirely
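Graceful degradation under damage can be sketched the same way: silence one group of units (a hypothetical "lesioned column") and check that the remaining distributed pattern still matches the right object. Toy numbers throughout:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two firing-rate vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical face code spread over 12 units grouped into 4 "columns" of 3.
face  = np.array([5., 6., 4., 7., 2., 3., 6., 5., 4., 3., 7., 6.])
other = np.array([1., 0., 2., 1., 8., 7., 0., 1., 2., 8., 0., 1.])

lesioned = face.copy()
lesioned[3:6] = 0.0  # silence one "column" (units 3-5)

# Recognition survives: the lesioned pattern still matches "face" best.
print(cosine(lesioned, face) > cosine(lesioned, other))  # True
```

Because the representation is spread across many columns, knocking out any one of them lowers the match but does not abolish it.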
Damage to Posterior IT — Apperceptive Agnosia
When posterior IT is damaged, perception itself fails.
- Patients can see edges, colours, and motion (all from V1 & V2) — low-level vision is intact
- But they can't assemble these features into a coherent object — they cannot even copy a simple drawing, because perception never builds the object representation
- They see the parts but not the whole — the binding of features into objects is broken
Anterior IT — Where Percepts Connect to Meaning
Posterior IT handles perception ("I see a face"), but anterior IT relates it to memory ("this face belongs to my mother").
- This is the transition from seeing to knowing — the percept gets connected to stored semantic knowledge
- Damage to anterior IT causes associative agnosia: the patient can copy a drawing of an object accurately, but can't say what it is called or what it's used for
- The representation is intact (they can perceive and copy) — but the link to meaning is severed
Prosopagnosia — The Face Case
A specific agnosia just for faces — patients can recognise that something is a face, see its expression, describe it in detail, but can't assign it to a person.
- Face recognition is critical from an evolutionary standpoint — it tells us who is friend, enemy, or threat. It conveys emotions
- Dedicated face-selective patches of cortex contain many neurons tuned to subtle differences in eyes, noses, and cheeks — a hyperspecialised system for human face recognition
- Prosopagnosia proves that face recognition uses dedicated neural machinery separate from general object recognition
Category-Specific Agnosias
Different regions of IT understand different categories of objects — and they can be selectively damaged.
- Some patients lose the ability to recognise living things but can still recognise tools. Others lose vegetables/fruits specifically
- This isn't random — it reflects systematic organisation. Living things share common traits (eyes, limbs, organic texture), so the brain groups them together
- Non-living things (tools, vehicles) share different traits (rigid geometry, manufactured surfaces) and are processed by different regions
- This category-based organisation means the brain can efficiently reuse feature detectors within a category
Amygdala & Hippocampus — Vision Becomes Cognition
The final stage of the ventral stream — where visual perception connects to emotion and memory.
- After anterior IT, the signal reaches the amygdala and hippocampus — vision is no longer just about seeing, it's about knowing and feeling
- The amygdala assigns emotional significance: is this face threatening? Is this object desirable?
- The hippocampus stores the context: when did I see this, where was I, who was I with?
- This is where the ventral stream's journey ends — from photons hitting the retina to a fully contextualised, emotionally tagged memory