How does the brain perceive depth?
- Depth perception — how does the brain figure out how far away things are?
- We have two eyes separated by ~6 cm — this gives us binocular vision and the ability to compute depth from disparity
- When we fixate on an object, both eyes converge on it — zero disparity at the fixation plane. Everything else is either in front of it or behind it (see the geometry sketch below)
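A minimal sketch of the underlying geometry, assuming the standard small-angle stereo model (the ~6 cm baseline is from the notes; the fixation distances are illustrative choices):

```python
import math

B = 0.06  # interocular baseline in metres (~6 cm, as in the notes)

def angular_disparity(Z, Z_fix):
    """Disparity (radians) of a point at distance Z relative to fixation
    at Z_fix, using the small-angle approximation: B/Z - B/Z_fix."""
    return B / Z - B / Z_fix

# Fixating at 1 m: a nearer point has crossed (positive) disparity, a
# farther point has uncrossed (negative) disparity, and the fixation
# plane itself has exactly zero disparity.
for Z in (0.8, 1.0, 1.5):
    print(f"Z = {Z} m -> disparity = {math.degrees(angular_disparity(Z, 1.0)):+.2f} deg")
```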
Three Types of Depth Neurons
The brain has specialised neurons that respond to different depth relationships relative to the fixation plane.
- Tuned excitatory — fires at only one specific depth. Like a depth-selective filter
- Tuned inhibitory — goes silent at one specific depth, fires everywhere else. Useful for detecting the exact fixation plane (where disparity = 0)
- Near / Far cells — broadly tuned. Near cells fire for anything closer than fixation, far cells for anything farther than it (the background). Coarse depth categorisation (all three profiles are sketched below)
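A toy sketch of the three response profiles, using common idealisations (a Gaussian for the tuned excitatory cell, its complement for the tuned inhibitory cell, sigmoids for near/far); the shapes and parameters are illustrative assumptions, not values from the chapter:

```python
import numpy as np

d = np.linspace(-1.0, 1.0, 201)   # disparity axis; 0 = fixation plane, d > 0 = nearer

# Idealised tuning profiles (all parameters are arbitrary illustrative choices):
tuned_excitatory = np.exp(-(d - 0.2) ** 2 / (2 * 0.05 ** 2))  # fires at one depth only
tuned_inhibitory = 1.0 - np.exp(-d ** 2 / (2 * 0.05 ** 2))    # silent exactly at fixation
near_cell = 1.0 / (1.0 + np.exp(-d / 0.1))                    # broad: nearer than fixation
far_cell  = 1.0 / (1.0 + np.exp(d / 0.1))                     # broad: farther than fixation

i_fix, i_peak = 100, 120   # indices for d = 0 (fixation) and d = +0.2 (a near depth)
for name, r in [("tuned exc", tuned_excitatory), ("tuned inh", tuned_inhibitory),
                ("near", near_cell), ("far", far_cell)]:
    print(f"{name:9s} at fixation: {r[i_fix]:.2f}, at near depth: {r[i_peak]:.2f}")
```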
The Occlusion Problem — Amodal Completion
A cup in front of a book — each eye has a different part of the book hidden from view. Yet we perceive one complete book, not two halves. The brain "fills in the gaps".
- Modal perception is based on actual stimulation — what we physically see in front of our eyes
- Amodal perception is what we don't see — the brain just fills it in. Hence "a-modal" (without a mode of sensation)
- How does amodal completion work? Three cues:
  1. Depth — the closer object is in front, so the brain infers that the occluded surface continues behind it
  2. Alignment & continuity — edges that line up (linked via horizontal connections in V1) are completed as one smooth contour
  3. Contrast & texture — the same colour/brightness on both sides of the occluder signals one and the same object
Kanizsa Triangles — Seeing Edges That Don't Exist
The brain perceives a complete triangle even though no edges are drawn. The bright triangle itself is a modal percept built from illusory contours (we actually "see" it), while the pac-man discs are amodally completed into full circles behind it.
- There's no direct edge between the pac-man shapes — yet the brain perceives a bright white triangle on top
- This demonstrates that intermediate-level vision actively constructs boundaries from partial information
Global Correspondence Problem — V1 to V2
Moving one step higher: V1 handles local edge disparity, V2 handles global disparity.
- V1 detects edges and computes basic (local) disparity — matching features between the two eyes at a small scale
- V2 handles global disparity — combining multiple V1 outputs to build a coherent depth map of the whole scene
- As scenes get more complex, local edge matching gets messy — too many possible matches between the two eyes. We need global detection (V2) to resolve the ambiguity (see the toy scanline example below)
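A toy 1-D illustration of why purely local matching is ambiguous, assuming a simple sum-of-absolute-differences matcher over a small window (the stimulus, window size, and disparity range are all illustrative choices):

```python
import numpy as np

# With repetitive texture, local (V1-style) window matching finds several
# equally good disparities; picking the right one needs context from the
# rest of the scene (V2-style global disparity).

left = np.array([0, 1, 0, 1, 0, 1, 0, 0, 1, 0], dtype=float)
right = np.roll(left, 2)   # true disparity: 2 pixels (circular shift for simplicity)

def match_costs(centre, window=1, max_disp=4):
    """Sum of absolute differences for each candidate disparity."""
    lo, hi = centre - window, centre + window + 1
    costs = {}
    for d in range(max_disp + 1):
        if lo >= 0 and hi + d <= len(left):
            costs[d] = float(np.abs(left[lo:hi] - right[lo + d:hi + d]).sum())
    return costs

# Around the periodic region, disparities 0 and 2 tie at zero cost:
print(match_costs(centre=4))   # {0: 0.0, 1: 3.0, 2: 0.0, 3: 3.0, 4: 1.0}
```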
Disparity Capture — Depth Propagates Across Surfaces
Depth is coherent across a surface. The brain exploits this.
- Brain starts with high-confidence regions: high-contrast edges, distinct features, clear texture boundaries
- Then propagates depth estimates outward to ambiguous regions — neighbouring regions "capture" their depth from confident neighbours
- This propagation effect is called disparity capture (a toy version is sketched after this list)
- Stereograms & 3D films exploit exactly this — two images of the same scene, one slightly shifted. One eye alone can't see depth, but together the global-disparity mechanism kicks in and we see 3D!
- The brain understands depth based on coherent observations across the visual field
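A toy sketch of confidence-driven propagation on a 1-D strip, assuming a simple confidence-weighted neighbour-averaging update (my illustrative rule, not a circuit model from the chapter):

```python
import numpy as np

# Disparity is known only at two high-confidence edges; the blank surface
# in between inherits ("captures") its depth from those neighbours.
disparity  = np.array([2.0, 0, 0, 0, 0, 0, 0, 5.0])
confidence = np.array([1.0, 0, 0, 0, 0, 0, 0, 1.0])   # 1 = trusted edge

for _ in range(50):
    for i in range(1, len(disparity) - 1):
        if confidence[i] < 1.0:                        # only uncertain sites update
            w_l, w_r = confidence[i - 1], confidence[i + 1]
            if w_l + w_r > 0:
                disparity[i] = (w_l * disparity[i - 1] + w_r * disparity[i + 1]) / (w_l + w_r)
                confidence[i] = 0.5 * (w_l + w_r)      # confidence spreads too

print(np.round(disparity, 2))   # interior now interpolates smoothly between 2 and 5
```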
Da Vinci Stereopsis
Da Vinci observed that depth is analysed by things which are "not seen" — hidden regions.
- The brain doesn't need matching features in both eyes. The occluded area in each eye is itself a signal for depth
- In Da Vinci stereopsis, a feature appears in only one eye — there is nothing to match it with in the other, yet the brain still extracts depth
- This is remarkable — absence of information becomes information
Monocular Cues — Depth With One Eye
- Even with one eye closed, we perceive depth fairly well — because higher brain regions "know" the approximate size & shape of familiar objects
- Depth & shadows, familiar size, comparative understanding — we can "guess" the depth
- This is top-down processing: deeper regions (IT, etc.) that know what a car is tell even V1 neurons "what to see"
- There is also motion parallax (nearby things sweep across the retina faster than distant ones) and other monocular depth cues (see the sketch below)
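Motion parallax in one line of arithmetic: for sideways head translation at speed v, a point at depth Z moves across the retina at roughly v/Z radians per second (small-angle approximation; the numbers are illustrative):

```python
# Nearby points race across the retina; distant points barely move.
v = 1.0   # observer's sideways speed in m/s (illustrative)
for Z in (1.0, 2.0, 10.0, 100.0):
    print(f"depth {Z:6.1f} m -> angular image speed {v / Z:.3f} rad/s")
```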
Border Ownership — Image Segmentation in V2
Based on contrast, brightness, lighting & shading, the brain performs a kind of image segmentation.
- When we see a black dot on a white surface, the border belongs to the dot — not the white background
- The background is continuous and extends behind the dot. The dot is in front and "owns" its edge
- V2 neurons signal which side of an edge is the "figure" and which is the "ground" — this is border ownership (a toy assignment follows this list)
- This marks the end of depth estimation — the brain has now segmented the scene into objects and background
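A toy border-ownership assignment for the dot-on-background example, using a crude stand-in heuristic (the smaller, enclosed region is the figure and owns the border); V2 of course relies on far richer cues than region size:

```python
import numpy as np

img = np.zeros((7, 7), dtype=int)
img[2:5, 2:5] = 1                                   # the black "dot" on white ground

# Crude figure/ground heuristic: the smaller region is the enclosed figure.
sizes = {0: int((img == 0).sum()), 1: int((img == 1).sum())}
figure = min(sizes, key=sizes.get)

# Count the edges between unlike pixels; each is assigned to the figure side.
vertical = (img[:, :-1] != img[:, 1:]).sum()
horizontal = (img[:-1, :] != img[1:, :]).sum()
print(f"{vertical + horizontal} border segments, all owned by region {figure} (the dot)")
```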
Other Key Points from Chapter 23
Colour & Brightness Are Not Fixed
We don't have "pixel values" — the brain uses comparative perception, not absolute measurement.
- A grey box on a white background looks darker, but the same grey on a black background looks lighter — identical pixel value, completely different perception (a quick arithmetic version follows this list)
- This is why colour constancy matters — the brain always judges relative to context, never in isolation
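A back-of-the-envelope version of the comparison, treating "perceived lightness" as a crude ratio against the local surround (a stand-in heuristic, not a real lightness model):

```python
grey = 128                    # the same patch value in both scenes
for surround in (255, 0):     # white vs black background
    relative = grey / (0.5 * (grey + surround))
    print(f"surround {surround:3d}: grey / local mean = {relative:.2f}")
# < 1 against a white surround (looks darker), > 1 against black (looks
# lighter): same pixel value, opposite judgements relative to context.
```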
The Brain Only Responds to Edges
Most neurons don't respond to flat, uniform surfaces at all.
- The brain only responds to edges — it then "fills in" the big blank surfaces between them
- This is remarkably efficient: instead of encoding every pixel, just encode the boundaries and let the brain interpolate the rest
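The efficiency argument in its simplest 1-D form: store only the changes (the edges) and reconstruct the flat regions by integration (a sketch of the coding idea, not of actual neural filling-in):

```python
import numpy as np

signal = np.array([3, 3, 3, 3, 7, 7, 7, 2, 2, 2])

edges = np.diff(signal)                   # nonzero only at the two boundaries
print("edge code:", edges)                # mostly zeros -> cheap to transmit

# "Filling in": integrate the edge code back into the full surface.
reconstructed = signal[0] + np.concatenate(([0], np.cumsum(edges)))
assert (reconstructed == signal).all()
print("reconstructed:", reconstructed)
```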
Classical vs Non-Classical Receptive Fields
- In the classical view, neurons only activate based on what falls inside their own receptive field
- In the non-classical view, they have a much broader influence — via horizontal connections to same-layer neurons
- This is the mechanism behind contextual modulation we saw on Day 4
Horizontal Connections & Feedback
- Horizontal connections: neurons consider what their neighbours are seeing (same level, lateral communication)
- Feedback connections: come from the top down — higher regions tell lower regions "what you should be seeing"
- These two together explain most of the brain's ability to resolve ambiguity in visual scenes (a toy combination of both is sketched below)
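A toy combination of the two routes, assuming a weak, ambiguous local response that gets pulled toward its neighbours (horizontal connections) and toward a higher-level expectation (feedback); all weights are arbitrary illustrative choices:

```python
import numpy as np

local = np.array([0.9, 0.8, 0.2, 0.85, 0.9])   # one weak, ambiguous edge response
expectation = 1.0                               # feedback: "a contour runs through here"

# Horizontal route: each unit looks at its neighbours (circular for simplicity).
lateral = 0.5 * (np.roll(local, 1) + np.roll(local, -1))

# Feedforward drive, lateral context, and top-down expectation combined:
resolved = 0.6 * local + 0.25 * lateral + 0.15 * expectation
print(np.round(resolved, 2))   # the ambiguous unit (0.2) is lifted toward its context
```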
Perceptual Learning
V1 is plastic — we can actually rewire early visual processing through practice.
- We can rewire V1 by practising specific tasks — e.g. detecting slanted lines in parallel becomes faster with training
- But this learning is highly specific and not really transferable to other orientations or positions
- This is exactly why evolution pushed processing up into the cortex — the cortex is plastic and massively interconnected, so it can adapt
Pop-Out & Visual Search
Sometimes things jump out instantly, sometimes you have to search one by one.
- A red dot among yellow dots pops out instantly — because we process basic features (colour, orientation) simultaneously across the whole image
- But when features are combined (searching for 9 among 6s), pop-out fails — we have to search serially, item by item
- Visual search is context-aware: we know what 6 and 9 mean, so context helps. But tilt the paper 90° and the contextual (top-down) signal fades — does the search get harder when the digits must be mentally rotated? (a toy reaction-time model of both search modes follows below)
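A toy reaction-time model of the two search modes, assuming the classic pattern (flat RT for parallel feature search, RT rising linearly with set size for serial conjunction search); the timing constants are arbitrary illustrative values:

```python
def feature_search_rt(n_items, base=300):
    # Pop-out: colour/orientation are processed in parallel across the
    # whole field, so reaction time barely depends on set size.
    return base

def conjunction_search_rt(n_items, base=300, per_item=50):
    # Serial search: on average half the items are inspected before
    # the target ("9 among 6s") is found.
    return base + per_item * n_items / 2

for n in (4, 8, 16, 32):
    print(f"{n:2d} items: feature {feature_search_rt(n)} ms, "
          f"conjunction {conjunction_search_rt(n):.0f} ms")
```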
Attention — The Final Gate
Similar to attention in neural networks — without paying attention, we can miss even large parts of an image that are changing (change blindness).
- Attention is the top-down signal telling us what to focus on — it selects which parts of the visual field get priority processing
- The top-down attention signal reaches all the way down to the last neuron in V1 — it's not just a high-level filter, it modulates the very first stages of visual processing
- This concludes Chapter 23 of Principles of Neural Science — intermediate-level visual processing