Day 5 — Thu, Mar 26

How does the brain perceive depth?

  • Depth perception — how does the brain figure out how far away things are?
  • We have two eyes separated by ~6 cm — this gives us binocular vision and the ability to compute depth from disparity
  • When we fixate on an object, both eyes converge on it — zero disparity at the fixation plane. Everything else is either in front or behind
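Disparity-to-depth can be sketched with simple triangulation. This is a toy pinhole-stereo model, not a claim about neural computation; the focal length and pixel disparities are made-up illustrative values, and only the ~6 cm baseline comes from the notes.

```python
def depth_from_disparity(baseline_m, focal_px, disparity_px):
    """Pinhole stereo triangulation: depth = baseline * focal / disparity."""
    return baseline_m * focal_px / disparity_px

# Eye separation of ~6 cm; focal length and disparities are hypothetical.
B = 0.06   # metres between the two "cameras"
f = 800.0  # focal length in pixels (assumed)
for d in (40.0, 20.0, 10.0):
    z = depth_from_disparity(B, f, d)
    print(f"disparity {d:5.1f} px -> depth {z:.2f} m")
```

Note how halving the disparity doubles the estimated depth, which is why disparity is such a sensitive cue at near range and a weak one far away.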

Three Types of Depth Neurons

The brain has specialised neurons that respond to different depth relationships relative to the fixation plane.

Diagram: tuning curves for tuned excitatory, tuned inhibitory, and near/far cells along a near–far disparity axis
  • Tuned excitatory — fires at only one specific depth. Like a depth-selective filter
  • Tuned inhibitory — goes silent at one specific depth, fires everywhere else. Useful for detecting the exact fixation plane (where disparity = 0)
  • Near / Far cells — broadly tuned. Near cells fire for anything closer than fixation, far cells fire for background. Coarse depth categorisation
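The three response profiles can be caricatured as simple functions of disparity: a Gaussian bump, an inverted bump, and a sigmoid pair. This is purely a toy model; the shapes, widths, and slopes are assumptions, not measured tuning curves.

```python
import math

def tuned_excitatory(disparity, pref=0.0, width=0.2):
    """Fires only near one preferred disparity (Gaussian bump)."""
    return math.exp(-((disparity - pref) / width) ** 2)

def tuned_inhibitory(disparity, pref=0.0, width=0.2):
    """Silent at the preferred disparity, active everywhere else."""
    return 1.0 - tuned_excitatory(disparity, pref, width)

def near_cell(disparity, slope=10.0):
    """Broadly prefers near (crossed) disparities: sigmoid over the sign."""
    return 1.0 / (1.0 + math.exp(slope * disparity))

def far_cell(disparity, slope=10.0):
    """Mirror image of the near cell: prefers far (uncrossed) disparities."""
    return 1.0 - near_cell(disparity, slope)

# At the fixation plane (disparity = 0) the inhibitory cell goes quiet:
print(tuned_excitatory(0.0), tuned_inhibitory(0.0))  # prints 1.0 0.0
```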

The Occlusion Problem — Amodal Completion

A cup in front of a book — each eye sees a different part hidden. Yet we perceive one complete book, not two halves. The brain "fills the gaps."

Diagram: left eye sees the right side of the book (left hidden by cup) + right eye sees the left side (right hidden by cup) = brain perceives one complete book, gap filled in
  • Modal perception is based on actual stimulation — what we physically see in front of our eyes
  • Amodal perception is what we don't see — the brain just fills it in. Hence "a-modal" (without a mode of sensation)
  • How does amodal completion work? Three cues: (1) Depth: the closer object is in front, so the brain infers continuation behind it. (2) Alignment & continuity: edges align via horizontal connections in V1, favouring smooth continuation. (3) Contrast & texture: same colour/brightness = same object

Kanizsa Triangles — Seeing Edges That Don't Exist

The brain perceives a complete triangle even though no edges are drawn. The triangle's illusory contours are modally completed (we "see" them), while the pac-man shapes are amodally completed into full discs behind it.

no edges drawn — yet you see a triangle
  • There's no direct edge between the pac-man shapes — yet the brain perceives a bright white triangle on top
  • This demonstrates that intermediate-level vision actively constructs boundaries from partial information

Global Correspondence Problem — V1 to V2

Moving one step higher: V1 handles local edge disparity, V2 handles global disparity.

  • V1 detects edges and computes basic (local) disparity — matching features between the two eyes at a small scale
  • V2 handles global disparity — combining multiple V1 outputs to build a coherent depth map of the whole scene
  • As scenes get more complex, local edge matching gets ambiguous: too many possible pairings between the eyes. We need global constraints (V2) to resolve the ambiguity
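A quick way to see why local matching explodes: with n candidate features per eye there are n! one-to-one matchings, and a global constraint prunes almost all of them. The disparity limit used here is a hypothetical smoothness constraint, chosen only to illustrate the pruning.

```python
from itertools import permutations

def all_matchings(n):
    """Every left-eye feature could in principle pair with any right-eye feature."""
    return sum(1 for _ in permutations(range(n)))

def constrained_matchings(n, max_disparity=1):
    """Keep only matchings where no feature shifts by more than max_disparity
    positions: a crude stand-in for a global smoothness constraint."""
    return sum(
        1 for p in permutations(range(n))
        if all(abs(p[i] - i) <= max_disparity for i in range(n))
    )

print(all_matchings(6))          # 720 candidate matchings
print(constrained_matchings(6))  # only 13 survive the constraint
```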

Disparity Capture — Depth Propagates Across Surfaces

Depth is coherent across a surface. The brain exploits this.

Diagram: depth estimates spread outward from high-confidence anchor points to ambiguous regions
  • Brain starts with high-confidence regions: high-contrast edges, distinct features, clear texture boundaries
  • Then propagates depth estimates outward to ambiguous regions — neighbouring regions "capture" their depth from confident neighbours
  • This propagation effect is called disparity capture
  • Stereograms & 3D films exploit exactly this: two images, one slightly shifted. One eye alone sees no depth, but with both eyes the global disparity machinery (V2) kicks in and we see 3D!
  • The brain understands depth based on coherent observations across the visual field
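A toy 1-D version of this propagation: unknown positions repeatedly average their neighbours while confident anchors stay fixed. The iteration scheme is my sketch of the idea, not a claim about the actual neural circuit.

```python
def propagate_depth(depth, confident, iters=200):
    """Spread depth outward from high-confidence anchors by local averaging."""
    d = list(depth)
    for _ in range(iters):
        new = list(d)
        for i in range(len(d)):
            if confident[i]:
                continue  # anchors keep their measured depth
            neighbours = [d[j] for j in (i - 1, i + 1) if 0 <= j < len(d)]
            new[i] = sum(neighbours) / len(neighbours)
        d = new
    return d

# Two confident anchors at the ends; the ambiguous middle gets "captured".
depth     = [1.0, 0.0, 0.0, 0.0, 2.0]
confident = [True, False, False, False, True]
print([round(x, 2) for x in propagate_depth(depth, confident)])
# -> [1.0, 1.25, 1.5, 1.75, 2.0]
```

The ambiguous region ends up smoothly interpolated between the anchors, which is the "depth is coherent across a surface" assumption made explicit.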

Da Vinci Stereopsis

Da Vinci observed that depth can be inferred from what is not seen: regions hidden from one eye.

Diagram: an occluder in front of a background wall; with eyes ~6 cm apart, one sliver of wall is seen only by the right eye and another only by the left
Key insight: Each eye sees a sliver of background the other can't — the occluded region itself becomes a depth signal. No feature matching needed — absence of information = depth.
  • The brain doesn't need matching features in both eyes. The occluded area in each eye is itself a signal for depth
  • In Da Vinci stereopsis, features appear in only one eye — there's nothing to compare it with, yet the brain still extracts depth
  • This is remarkable — absence of information becomes information

Monocular Cues — Depth With One Eye

  • Even with one eye closed, we perceive depth fairly well — because higher brain regions "know" the approximate size & shape of familiar objects
  • Depth & shadows, familiar size, comparative understanding — we can "guess" the depth
  • This is top-down processing: deeper regions (IT, etc.) that know what a car is tell even V1 neurons "what to see"
  • There are also other monocular cues, e.g. motion parallax: as you move, nearby things sweep across your view faster than distant ones
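Motion parallax fits in one line: for sideways observer motion, the angular speed of a point on the retina falls off with its distance (small-angle approximation; the speeds and distances below are illustrative, not from the notes).

```python
def retinal_angular_speed(observer_speed, depth):
    """Small-angle approximation: angular speed (rad/s) of a point `depth`
    metres away while the observer translates sideways at `observer_speed` m/s."""
    return observer_speed / depth

# Walking past a scene at 1 m/s: near objects sweep by much faster.
for z in (0.5, 2.0, 20.0):
    print(f"object at {z:5.1f} m -> {retinal_angular_speed(1.0, z):.3f} rad/s")
```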

Border Ownership — Image Segmentation in V2

Based on contrast, brightness, lighting & shading, the brain performs a kind of image segmentation.

Diagram: the figure owns the border; the ground continues behind it
  • When we see a black dot on a white surface, the border belongs to the dot — not the white background
  • The background is continuous and extends behind the dot. The dot is in front and "owns" its edge
  • V2 neurons signal which side of an edge is the "figure" and which is the "ground" — this is border ownership
  • This marks the end of depth estimation — the brain has now segmented the scene into objects and background

Other Key Points from Chapter 23

Colour & Brightness Are Not Fixed

We don't have "pixel values" — the brain uses comparative perception, not absolute measurement.

Diagram: the same grey looks darker on a white background and lighter on a black one
  • A grey box on a white background looks darker, but the same grey on a black background looks lighter: identical pixel value, completely different perception
  • This is why colour constancy matters — the brain always judges relative to context, never in isolation
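A toy simultaneous-contrast model makes the point concrete: the percept is pushed away from the surround. The linear form and the gain `k` are pure assumptions for illustration, not a model from the chapter.

```python
def perceived_lightness(target, surround, k=0.5):
    """Shift the percept away from the surround by a contrast gain k (assumed)."""
    return target + k * (target - surround)

grey = 0.5  # identical "pixel value" in both cases
print(perceived_lightness(grey, surround=1.0))  # on white: 0.25 (looks darker)
print(perceived_lightness(grey, surround=0.0))  # on black: 0.75 (looks lighter)
```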

The Brain Only Responds to Edges

Most neurons don't respond to flat, uniform surfaces at all.

  • The brain only responds to edges — it then "fills in" the big blank surfaces between them
  • This is remarkably efficient: instead of encoding every pixel, just encode the boundaries and let the brain interpolate the rest
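The efficiency argument can be shown in one dimension: store only the jumps (edges) and recover the flat surfaces by integrating them back. In this sketch the edge code is mostly zeros, which is the whole point.

```python
def encode_edges(signal):
    """Keep the first value plus the jumps; flat regions become zeros."""
    return [signal[0]] + [signal[i] - signal[i - 1] for i in range(1, len(signal))]

def fill_in(edges):
    """'Filling in': integrate the edge code back into full surfaces."""
    out = [edges[0]]
    for e in edges[1:]:
        out.append(out[-1] + e)
    return out

surface = [3, 3, 3, 7, 7, 7, 2, 2]
code = encode_edges(surface)
print(code)                      # [3, 0, 0, 4, 0, 0, -5, 0]: mostly zeros
print(fill_in(code) == surface)  # True: nothing is lost
```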

Classical vs Non-Classical Receptive Fields

Diagram: a classical receptive field responds only to its own field; a non-classical one is influenced by the surround via horizontal connections
  • In the classical view, neurons only activate based on what falls inside their own receptive field
  • In the non-classical view, they have a much broader influence — via horizontal connections to same-layer neurons
  • This is the mechanism behind contextual modulation we saw on Day 4

Horizontal Connections & Feedback

  • Horizontal connections: neurons consider what their neighbours are seeing (same level, lateral communication)
  • Feedback connections: come from the top down — higher regions tell lower regions "what you should be seeing"
  • These two together explain most of the brain's ability to resolve ambiguity in visual scenes

Perceptual Learning

V1 is plastic — we can actually rewire early visual processing through practice.

  • We can rewire V1 by practising specific tasks — e.g. detecting slanted lines in parallel becomes faster with training
  • But this learning is highly specific and not really transferable to other orientations or positions
  • This is exactly why evolution pushed processing closer to the cortex — the cortex is plastic and connected to millions of things, so it can adapt

Pop-Out & Visual Search

Sometimes things jump out instantly, sometimes you have to search one by one.

Diagram: a unique feature pops out instantly; finding the 9 among 6s needs serial search
  • A red dot among yellow dots pops out instantly — because we process basic features (colour, orientation) simultaneously across the whole image
  • But when features are combined (searching for 9 among 6s), pop-out fails — we have to search serially, item by item
  • Visual search is context-aware: we know what 6 and 9 mean, so context helps. Tilt the paper 90° and the top-down contextual signal fades; search gets harder when each digit must be mentally rotated
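The set-size logic behind pop-out vs serial search, as a toy simulation. Treating pop-out as "one parallel step" is an assumption in the spirit of feature-integration accounts, not a measurement.

```python
import random

def feature_search_steps(display, target):
    """Pop-out: one parallel pass over the whole feature map.
    Cost is independent of how many distractors there are."""
    return 1

def serial_search_steps(display, target):
    """Conjunction search: inspect items one at a time, in random order."""
    order = random.sample(range(len(display)), len(display))
    for steps, i in enumerate(order, start=1):
        if display[i] == target:
            return steps
    return len(display)

display = ["6"] * 8 + ["9"]
random.shuffle(display)
print(feature_search_steps(display, "9"))  # always 1
print(serial_search_steps(display, "9"))   # anywhere from 1 to 9
```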

Attention — The Final Gate

Similar to attention in neural networks: we can miss parts of an image that are changing because we're not paying attention (change blindness).

  • Attention is the top-down signal telling us what to focus on — it selects which parts of the visual field get priority processing
  • The top-down attention signal reaches all the way down to the last neuron in V1 — it's not just a high-level filter, it modulates the very first stages of visual processing
  • This concludes Chapter 23 of Principles of Neural Science — intermediate-level visual processing