Day 5 — Thu, Mar 26

How does the brain perceive depth?

  • Depth perception — how does the brain figure out how far away things are?
  • We have two eyes separated by ~6 cm — this gives us binocular vision and the ability to compute depth from disparity
  • When we fixate on an object, both eyes converge on it — zero disparity at the fixation plane. Everything else is either in front or behind
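Disparity-to-depth can be sketched with simple triangulation. This is a toy pinhole-stereo model, not a claim about neural computation; the focal length and pixel disparities are made-up illustrative values, and only the ~6 cm baseline comes from the notes.

```python
def depth_from_disparity(baseline_m, focal_px, disparity_px):
    """Pinhole stereo triangulation: depth = baseline * focal / disparity."""
    return baseline_m * focal_px / disparity_px

# Eye separation of ~6 cm; focal length and disparities are hypothetical.
B = 0.06   # metres between the two "cameras"
f = 800.0  # focal length in pixels (assumed)
for d in (40.0, 20.0, 10.0):
    z = depth_from_disparity(B, f, d)
    print(f"disparity {d:5.1f} px -> depth {z:.2f} m")
```

Note how halving the disparity doubles the estimated depth, which is why disparity is such a sensitive cue at near range and a weak one far away.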

Three Types of Depth Neurons

The brain has specialised neurons that respond to different depth relationships relative to the fixation plane.

Diagram: tuning curves for tuned excitatory, tuned inhibitory, and near/far cells along a near–far disparity axis
  • Tuned excitatory — fires at only one specific depth. Like a depth-selective filter
  • Tuned inhibitory — goes silent at one specific depth, fires everywhere else. Useful for detecting the exact fixation plane (where disparity = 0)
  • Near / Far cells — broadly tuned. Near cells fire for anything closer than fixation, far cells fire for background. Coarse depth categorisation
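The three response profiles can be caricatured as simple functions of disparity: a Gaussian bump, an inverted bump, and a sigmoid pair. This is purely a toy model; the shapes, widths, and slopes are assumptions, not measured tuning curves.

```python
import math

def tuned_excitatory(disparity, pref=0.0, width=0.2):
    """Fires only near one preferred disparity (Gaussian bump)."""
    return math.exp(-((disparity - pref) / width) ** 2)

def tuned_inhibitory(disparity, pref=0.0, width=0.2):
    """Silent at the preferred disparity, active everywhere else."""
    return 1.0 - tuned_excitatory(disparity, pref, width)

def near_cell(disparity, slope=10.0):
    """Broadly prefers near (crossed) disparities: sigmoid over the sign."""
    return 1.0 / (1.0 + math.exp(slope * disparity))

def far_cell(disparity, slope=10.0):
    """Mirror image of the near cell: prefers far (uncrossed) disparities."""
    return 1.0 - near_cell(disparity, slope)

# At the fixation plane (disparity = 0) the inhibitory cell goes quiet:
print(tuned_excitatory(0.0), tuned_inhibitory(0.0))  # prints 1.0 0.0
```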

The Occlusion Problem — Amodal Completion

A cup in front of a book — each eye sees a different part hidden. Yet we perceive one complete book, not two halves. The brain "fills the gaps."

Diagram: left eye sees the right side of the book (left hidden by cup) + right eye sees the left side (right hidden by cup) = brain perceives one complete book, gap filled in
  • Modal perception is based on actual stimulation — what we physically see in front of our eyes
  • Amodal perception is what we don't see — the brain just fills it in. Hence "a-modal" (without a mode of sensation)
  • How does amodal completion work? Three cues: (1) Depth: the closer object is in front, so the brain infers continuation behind it. (2) Alignment & continuity: edges align via horizontal connections in V1, favouring smooth continuation. (3) Contrast & texture: same colour/brightness = same object

Kanizsa Triangles — Seeing Edges That Don't Exist

The brain perceives a complete triangle even though no edges are drawn. The triangle's illusory contours are modally completed (we "see" them), while the pac-man shapes are amodally completed into full discs behind it.

no edges drawn — yet you see a triangle
  • There's no direct edge between the pac-man shapes — yet the brain perceives a bright white triangle on top
  • This demonstrates that intermediate-level vision actively constructs boundaries from partial information

Global Correspondence Problem — V1 to V2

Moving one step higher: V1 handles local edge disparity, V2 handles global disparity.

  • V1 detects edges and computes basic (local) disparity — matching features between the two eyes at a small scale
  • V2 handles global disparity — combining multiple V1 outputs to build a coherent depth map of the whole scene
  • As scenes get more complex, local edge matching gets ambiguous: too many possible pairings between the eyes. We need global constraints (V2) to resolve the ambiguity
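A quick way to see why local matching explodes: with n candidate features per eye there are n! one-to-one matchings, and a global constraint prunes almost all of them. The disparity limit used here is a hypothetical smoothness constraint, chosen only to illustrate the pruning.

```python
from itertools import permutations

def all_matchings(n):
    """Every left-eye feature could in principle pair with any right-eye feature."""
    return sum(1 for _ in permutations(range(n)))

def constrained_matchings(n, max_disparity=1):
    """Keep only matchings where no feature shifts by more than max_disparity
    positions: a crude stand-in for a global smoothness constraint."""
    return sum(
        1 for p in permutations(range(n))
        if all(abs(p[i] - i) <= max_disparity for i in range(n))
    )

print(all_matchings(6))          # 720 candidate matchings
print(constrained_matchings(6))  # only 13 survive the constraint
```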

Disparity Capture — Depth Propagates Across Surfaces

Depth is coherent across a surface. The brain exploits this.

Diagram: depth estimates spread outward from high-confidence anchor points to ambiguous regions
  • Brain starts with high-confidence regions: high-contrast edges, distinct features, clear texture boundaries
  • Then propagates depth estimates outward to ambiguous regions — neighbouring regions "capture" their depth from confident neighbours
  • This propagation effect is called disparity capture
  • Stereograms & 3D films exploit exactly this: two images, one slightly shifted. One eye alone sees no depth, but with both eyes the global disparity machinery (V2) kicks in and we see 3D!
  • The brain understands depth based on coherent observations across the visual field
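A toy 1-D version of this propagation: unknown positions repeatedly average their neighbours while confident anchors stay fixed. The iteration scheme is my sketch of the idea, not a claim about the actual neural circuit.

```python
def propagate_depth(depth, confident, iters=200):
    """Spread depth outward from high-confidence anchors by local averaging."""
    d = list(depth)
    for _ in range(iters):
        new = list(d)
        for i in range(len(d)):
            if confident[i]:
                continue  # anchors keep their measured depth
            neighbours = [d[j] for j in (i - 1, i + 1) if 0 <= j < len(d)]
            new[i] = sum(neighbours) / len(neighbours)
        d = new
    return d

# Two confident anchors at the ends; the ambiguous middle gets "captured".
depth     = [1.0, 0.0, 0.0, 0.0, 2.0]
confident = [True, False, False, False, True]
print([round(x, 2) for x in propagate_depth(depth, confident)])
# -> [1.0, 1.25, 1.5, 1.75, 2.0]
```

The ambiguous region ends up smoothly interpolated between the anchors, which is the "depth is coherent across a surface" assumption made explicit.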

Da Vinci Stereopsis

Da Vinci observed that depth can be inferred from what is not seen: regions hidden from one eye.

Diagram: an occluder in front of a background wall; with eyes ~6 cm apart, one sliver of wall is seen only by the right eye and another only by the left
Key insight: Each eye sees a sliver of background the other can't — the occluded region itself becomes a depth signal. No feature matching needed — absence of information = depth.
  • The brain doesn't need matching features in both eyes. The occluded area in each eye is itself a signal for depth
  • In Da Vinci stereopsis, features appear in only one eye — there's nothing to compare it with, yet the brain still extracts depth
  • This is remarkable — absence of information becomes information

Monocular Cues — Depth With One Eye

  • Even with one eye closed, we perceive depth fairly well — because higher brain regions "know" the approximate size & shape of familiar objects
  • Depth & shadows, familiar size, comparative understanding — we can "guess" the depth
  • This is top-down processing: deeper regions (IT, etc.) that know what a car is tell even V1 neurons "what to see"
  • There are also other monocular cues, e.g. motion parallax: as you move, nearby things sweep across your view faster than distant ones
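Motion parallax fits in one line: for sideways observer motion, the angular speed of a point on the retina falls off with its distance (small-angle approximation; the speeds and distances below are illustrative, not from the notes).

```python
def retinal_angular_speed(observer_speed, depth):
    """Small-angle approximation: angular speed (rad/s) of a point `depth`
    metres away while the observer translates sideways at `observer_speed` m/s."""
    return observer_speed / depth

# Walking past a scene at 1 m/s: near objects sweep by much faster.
for z in (0.5, 2.0, 20.0):
    print(f"object at {z:5.1f} m -> {retinal_angular_speed(1.0, z):.3f} rad/s")
```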

Border Ownership — Image Segmentation in V2

Based on contrast, brightness, lighting & shading, the brain performs a kind of image segmentation.

Diagram: the figure owns the border; the ground continues behind it
  • When we see a black dot on a white surface, the border belongs to the dot — not the white background
  • The background is continuous and extends behind the dot. The dot is in front and "owns" its edge
  • V2 neurons signal which side of an edge is the "figure" and which is the "ground" — this is border ownership
  • This marks the end of depth estimation — the brain has now segmented the scene into objects and background

Other Key Points from Chapter 23

Colour & Brightness Are Not Fixed

We don't have "pixel values" — the brain uses comparative perception, not absolute measurement.

Diagram: the same grey looks darker on a white background and lighter on a black one
  • A grey box on a white background looks darker, but the same grey on a black background looks lighter: identical pixel value, completely different perception
  • This is why colour constancy matters — the brain always judges relative to context, never in isolation
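A toy simultaneous-contrast model makes the point concrete: the percept is pushed away from the surround. The linear form and the gain `k` are pure assumptions for illustration, not a model from the chapter.

```python
def perceived_lightness(target, surround, k=0.5):
    """Shift the percept away from the surround by a contrast gain k (assumed)."""
    return target + k * (target - surround)

grey = 0.5  # identical "pixel value" in both cases
print(perceived_lightness(grey, surround=1.0))  # on white: 0.25 (looks darker)
print(perceived_lightness(grey, surround=0.0))  # on black: 0.75 (looks lighter)
```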

The Brain Only Responds to Edges

Most neurons don't respond to flat, uniform surfaces at all.

  • The brain only responds to edges — it then "fills in" the big blank surfaces between them
  • This is remarkably efficient: instead of encoding every pixel, just encode the boundaries and let the brain interpolate the rest
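The efficiency argument can be shown in one dimension: store only the jumps (edges) and recover the flat surfaces by integrating them back. In this sketch the edge code is mostly zeros, which is the whole point.

```python
def encode_edges(signal):
    """Keep the first value plus the jumps; flat regions become zeros."""
    return [signal[0]] + [signal[i] - signal[i - 1] for i in range(1, len(signal))]

def fill_in(edges):
    """'Filling in': integrate the edge code back into full surfaces."""
    out = [edges[0]]
    for e in edges[1:]:
        out.append(out[-1] + e)
    return out

surface = [3, 3, 3, 7, 7, 7, 2, 2]
code = encode_edges(surface)
print(code)                      # [3, 0, 0, 4, 0, 0, -5, 0]: mostly zeros
print(fill_in(code) == surface)  # True: nothing is lost
```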

Classical vs Non-Classical Receptive Fields

Diagram: a classical receptive field responds only to its own field; a non-classical one is influenced by the surround via horizontal connections
  • In the classical view, neurons only activate based on what falls inside their own receptive field
  • In the non-classical view, they have a much broader influence — via horizontal connections to same-layer neurons
  • This is the mechanism behind contextual modulation we saw on Day 4

Horizontal Connections & Feedback

  • Horizontal connections: neurons consider what their neighbours are seeing (same level, lateral communication)
  • Feedback connections: come from the top down — higher regions tell lower regions "what you should be seeing"
  • These two together explain most of the brain's ability to resolve ambiguity in visual scenes

Perceptual Learning

V1 is plastic — we can actually rewire early visual processing through practice.

  • We can rewire V1 by practising specific tasks — e.g. detecting slanted lines in parallel becomes faster with training
  • But this learning is highly specific and not really transferable to other orientations or positions
  • This is exactly why evolution pushed processing closer to the cortex — the cortex is plastic and connected to millions of things, so it can adapt

Pop-Out & Visual Search

Sometimes things jump out instantly, sometimes you have to search one by one.

Diagram: a unique feature pops out instantly; finding the 9 among 6s needs serial search
  • A red dot among yellow dots pops out instantly — because we process basic features (colour, orientation) simultaneously across the whole image
  • But when features are combined (searching for 9 among 6s), pop-out fails — we have to search serially, item by item
  • Visual search is context-aware: we know what 6 and 9 mean, so context helps. Tilt the paper 90° and the top-down contextual signal fades; search gets harder when each digit must be mentally rotated
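The set-size logic behind pop-out vs serial search, as a toy simulation. Treating pop-out as "one parallel step" is an assumption in the spirit of feature-integration accounts, not a measurement.

```python
import random

def feature_search_steps(display, target):
    """Pop-out: one parallel pass over the whole feature map.
    Cost is independent of how many distractors there are."""
    return 1

def serial_search_steps(display, target):
    """Conjunction search: inspect items one at a time, in random order."""
    order = random.sample(range(len(display)), len(display))
    for steps, i in enumerate(order, start=1):
        if display[i] == target:
            return steps
    return len(display)

display = ["6"] * 8 + ["9"]
random.shuffle(display)
print(feature_search_steps(display, "9"))  # always 1
print(serial_search_steps(display, "9"))   # anywhere from 1 to 9
```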

Attention — The Final Gate

Similar to attention in neural networks: we can miss parts of an image that are changing because we're not paying attention (change blindness).

  • Attention is the top-down signal telling us what to focus on — it selects which parts of the visual field get priority processing
  • The top-down attention signal reaches all the way down to the last neuron in V1 — it's not just a high-level filter, it modulates the very first stages of visual processing
  • This concludes Chapter 23 of Principles of Neural Science — intermediate-level visual processing