Why My Neural Network Only Learned the Brightest Pixels
I spent weeks debugging a neural network that looked like it was training perfectly. The loss was decreasing. The curves were smooth. Everything seemed fine—until I visualized the predictions.
The model had learned to reconstruct the brightest regions beautifully. But everything else? Dark, featureless mud. The deeper tissue layers, the subtle boundaries, the actual structures I cared about—all gone.
The bug wasn’t in my architecture. It wasn’t in my data loader. It was in a single assumption I had carried over from years of working with natural images:
Just normalize to [0, 1] and you’re good.
That assumption works for photographs. It doesn’t work for medical imaging. And learning why taught me more about signal processing than any textbook.
The RGB Comfort Zone
When you work with natural images—photographs, videos, anything from a standard camera—life is simple. Pixel values live in a comfortable 8-bit range: 0 to 255. That’s 256 possible values per channel.
Normalization is trivial: divide by 255, done. Your neural network sees values between 0 and 1, gradients flow nicely, everyone’s happy.
This works because cameras are designed for human perception. The sensor, the processing pipeline, the compression—everything conspires to produce images that look reasonable to our eyes, which already perceive brightness logarithmically. The hard work of dynamic range compression happened before you ever loaded the JPEG.
Medical imaging doesn’t give you that luxury.
Welcome to the Real Dynamic Range
The first time I loaded raw OCT (Optical Coherence Tomography) data, I naively checked the value range:
print(f"min: {data.min()}, max: {data.max()}")
# min: 234, max: 4847291
That’s not 0-255. That’s not even close. OCT signals can span four to eight orders of magnitude—from near-zero noise floors to intense surface reflections.
And here’s what makes it worse: this range isn’t uniformly distributed. Due to optical attenuation, signal intensity decays exponentially with depth. The tissue surface might have values in the millions. One millimeter deeper, you’re looking at thousands. Another millimeter, hundreds.
If you normalize this linearly to [0, 1], here’s what happens:
| Depth | Raw Intensity | After Linear Normalization |
|---|---|---|
| Surface | 1,000,000 | 1.000 |
| 0.5mm | 100,000 | 0.100 |
| 1.0mm | 10,000 | 0.010 |
| 1.5mm | 1,000 | 0.001 |
Your neural network now sees the deep tissue as essentially zero. And when you train with MSE loss, what does the network learn? Ignore the zeros. The gradient signal from getting those deep regions wrong is microscopic compared to the gradient from the bright surface.
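To make this concrete, here is the per-pixel MSE a model pays for lazily predicting zero everywhere, using the illustrative intensities from the table above:

```python
import numpy as np

# Depth profile from the table above: surface -> 1.5mm
raw = np.array([1_000_000.0, 100_000.0, 10_000.0, 1_000.0])

# Linear normalization to [0, 1]
x = raw / raw.max()

# If the model predicts 0 everywhere, the per-pixel MSE it pays is:
per_pixel_loss = x ** 2
print(per_pixel_loss)  # surface: 1.0, deepest pixel: ~1e-6
```

Getting the surface wrong costs a million times more loss than getting the deepest tissue wrong, so the optimizer focuses on the surface and leaves the rest as mud.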
This is exactly what happened to me. My model wasn’t broken. It was rationally optimizing for what I told it to optimize for—and what I told it, accidentally, was that only the brightest pixels matter.
The Decibel Solution
The fix is deceptively simple in retrospect: work in logarithmic scale.
I_dB = 10 * np.log10(I_linear + epsilon)  # epsilon: small constant to avoid log(0)
This is the decibel (dB) transformation, and it’s not just a mathematical trick—it’s how the physics actually works. Optical attenuation is exponential, so logarithmic representation converts multiplicative decay into linear decay. Suddenly, that impossible dynamic range becomes manageable:
| Depth | Raw Intensity | dB Scale |
|---|---|---|
| Surface | 1,000,000 | 60 dB |
| 0.5mm | 100,000 | 50 dB |
| 1.0mm | 10,000 | 40 dB |
| 1.5mm | 1,000 | 30 dB |
Now your deep tissue isn’t represented as 0.001—it’s represented as 30 dB, just 30 units below the surface. The network can actually see it. Gradients flow to all depth levels. The model learns structure, not just brightness.
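The dB column in that table is just the transform applied to the raw intensities; a quick check:

```python
import numpy as np

# Raw intensities from the table (surface -> 1.5mm)
I_linear = np.array([1e6, 1e5, 1e4, 1e3])

# Exponential decay in linear scale becomes a linear ramp in dB
I_dB = 10 * np.log10(I_linear)
print(I_dB)  # [60. 50. 40. 30.]
```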
But Wait—What’s 0 dB?
Here’s where it gets subtle. The dB scale is relative. When you compute $10 \log_{10}(I)$, you get some absolute number in decibels. But what does “60 dB” actually mean? Sixty decibels relative to what?
In my first attempt, I used absolute dB values directly. This created a new problem: the same tissue, imaged with different laser power or detector gain, would produce completely different dB values. Same structure, different numbers. My training data was inconsistent.
The solution is percentile-based normalization:
I_ref = np.percentile(I_linear, 99.9)
I_dB = 10 * np.log10(I_linear / I_ref)
Now 0 dB is defined as the 99.9th percentile of your data—essentially, “the brightest meaningful signal.” Everything else is expressed relative to this reference: -10 dB means “ten times dimmer than the reference,” -20 dB means “a hundred times dimmer.”
This is invariant to acquisition settings. Double the laser power? Both numerator and reference double, ratio stays the same, dB values unchanged. Your model now learns contrast and structure, not absolute intensities that depend on how the machine was configured that day.
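A quick sanity check of that invariance claim, on synthetic data standing in for a real scan:

```python
import numpy as np

def to_db(I_linear):
    # 0 dB = 99.9th percentile of this acquisition
    I_ref = np.percentile(I_linear, 99.9)
    return 10 * np.log10(I_linear / I_ref)

# Synthetic "scan": strictly positive intensities with a heavy tail
rng = np.random.default_rng(0)
scan = rng.exponential(scale=1e4, size=(256, 256)) + 1.0

# Doubling the laser power scales every pixel -- and the reference -- by 2,
# so the dB representation is unchanged.
assert np.allclose(to_db(scan), to_db(2 * scan))
```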
The Broader Pattern
This isn’t unique to OCT. The same fundamental tension—massive dynamic range, exponential signal behavior, acquisition variability—shows up across medical imaging:
CT (Computed Tomography): Raw projection data has enormous dynamic range. But what you usually work with is reconstructed images in Hounsfield Units—a standardized scale where water is 0 HU and air is -1000 HU. Someone already did the careful normalization work for you. This is why CT is often “easier” for deep learning than raw modalities.
X-Ray: The raw detector signal is high dynamic range. But suspiciously often, you receive datasets as clean 8-bit images. Where did the dynamic range go? Someone made choices—and those choices may not match your task.
I once received X-ray data with multiple exposures per case (30-40 frames each). My approach at the time: compute mean and standard deviation per case, clip at ±5σ, then normalize. Each case looked fine individually.
But here’s what I’d do differently now: when each case defines its own normalization, the model learns scale, not structure. A bright case and a dim case showing the same anatomy would produce different normalized values. The fix is the same principle as OCT—use a global reference. Compute statistics across all frames in a case (or even across your entire dataset), fix that as your reference, and apply it consistently. The model then learns that “0.7 means this tissue type,” not “0.7 means 70% of whatever this particular image’s max happened to be.”
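A sketch of that fix, with hypothetical names — `global_mean` and `global_std` would be computed once over the whole dataset (or at least the whole case), then reused everywhere:

```python
import numpy as np

def normalize_frames(frames, global_mean, global_std, n_sigma=5.0):
    """Clip at +/- n_sigma around a FIXED reference, then standardize.
    Because the reference is shared, the same raw intensity maps to the
    same normalized value no matter which case it came from."""
    x = np.asarray(frames, dtype=np.float64)
    lo = global_mean - n_sigma * global_std
    hi = global_mean + n_sigma * global_std
    x = np.clip(x, lo, hi)
    return (x - global_mean) / global_std

# Usage: compute the reference once, apply it consistently
# all_frames = np.concatenate([case1, case2, ...])
# mu, sigma = all_frames.mean(), all_frames.std()
# normalized = normalize_frames(case1, mu, sigma)
```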
MRI: Different sequences produce wildly different intensity scales. There’s no physical unit like Hounsfield. Intensity normalization is an active research problem with no universal solution.
Ultrasound: Log-compression is standard in the display pipeline, but the degree of compression varies by manufacturer and setting.
The lesson: whenever you work with a new medical imaging modality, your first question should be “what is the actual dynamic range of this data, and what has already been done to it?”
What I Do Now
My current preprocessing pipeline for OCT:
1. Work in the complex domain when possible (preserve phase information)
2. Compute intensity as magnitude squared
3. Transform to dB with a global percentile reference
4. Clip to a reasonable range (typically -50 to 0 dB)
5. Standardize (zero mean, unit variance) for training
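Put together, the pipeline looks roughly like this — a sketch, with my own simplifications: the log floor value is arbitrary, and the percentile reference here is per-volume, whereas in practice it should be global as discussed above:

```python
import numpy as np

def preprocess_oct(field, db_floor=-50.0, ref_percentile=99.9):
    """Complex OCT field -> standardized dB image."""
    # Intensity as magnitude squared of the complex field
    intensity = np.abs(field) ** 2
    # dB relative to a percentile reference (floor avoids log(0))
    ref = np.percentile(intensity, ref_percentile)
    db = 10 * np.log10(np.maximum(intensity / ref, 1e-12))
    # Clip to a reasonable range
    db = np.clip(db, db_floor, 0.0)
    # Standardize for training
    return (db - db.mean()) / db.std()
```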
The key insight: steps 3 and 4 (the dB transform and clipping) ensure that my representation captures relative structure across the full depth range, not absolute intensities that would be dominated by surface reflections.
For visualization, I use the same dB scale with consistent clipping. What you train on should match what you evaluate on.
The Meta-Lesson
The debugging took weeks because I was looking in the wrong places. I checked my architecture, my loss function, my data augmentation. I didn’t question my preprocessing because normalization felt too simple to be the problem.
But “simple” preprocessing encodes assumptions. Dividing by 255 assumes your data fits in 8 bits. Dividing by max assumes uniform importance across the intensity range. Min-max scaling assumes your min and max are meaningful.
Medical imaging breaks all these assumptions, often quietly. The data loads, the shapes match, the training runs. You only discover the problem when you look at what the model actually learned.
Now, whenever I encounter a new modality, I start by staring at histograms. Not the images—the histograms. What’s the actual distribution? Where’s the signal? Where’s the noise floor? How many orders of magnitude am I dealing with?
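My first-look routine is only a few lines (illustrative, not any library's API):

```python
import numpy as np

def first_look(data):
    """Histogram in log space: dynamic range, signal, noise floor."""
    positive = data[data > 0].astype(np.float64)
    span = np.log10(positive.max() / positive.min())
    print(f"min: {data.min():.4g}, max: {data.max():.4g}")
    print(f"orders of magnitude: {span:.1f}")
    # ASCII histogram over log10(intensity)
    counts, edges = np.histogram(np.log10(positive), bins=16)
    for c, lo in zip(counts, edges[:-1]):
        print(f"10^{lo:6.2f} | {'#' * int(40 * c / counts.max())}")
```

If the span is more than two or three orders of magnitude, "divide by the max" is already off the table.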
The preprocessing that emerges from understanding your data is rarely “just normalize to [0, 1].”
And that’s probably the most important lesson: in medical imaging, there’s no such thing as “just.”
No figures in this post—I was too busy debugging the actual training. All I have left are research notes and hard-won lessons.